Use Databases for Insights - Not Data
by Simon Crosby, on Jun 2, 2021 9:00:00 AM
For a growing class of business-critical applications, what’s in an application database is likely to be out of sync with the real world: insights are computed too late to be useful for automation and real-time responses. Continuous intelligence aims to fix this using an "analyze then store" architecture.
There is a sea change in architectures for applications that need to process streaming data to gain continuous intelligence. Ten years ago enterprises thought they could store everything and analyze the data later – the big-data approach. It was a reasonable assumption given the costs of storage, networking and compute at the time, and given the data rates of sources in the environment. But those assumptions have proved to be wrong:
- The number of connected sources is growing exponentially
- Bandwidth at the edge is growing courtesy of 5G, and devices will readily use it
- Insights are needed to drive automated responses, so they must be computed in real time to keep applications in sync with the environments they respond to
Many legacy devices, including traffic infrastructure, generate vast amounts of data. For example, the traffic infrastructure for the city of Palo Alto, CA generates more data than Twitter’s Firehose. The infrastructure for the city of Las Vegas streams more than ten times that! A smart city application that gives delivery vehicles granular predictions, so that they can choose the best routes, has to handle astounding volumes of data. Moreover, the need for real-time processing is obvious: delivering out-of-date predictions is useless.
But more bandwidth at the edge, while important, doesn’t solve the problem. What’s needed is real-time, data-driven computation in which analysis and responses are driven by the arrival of updates from the real world, and in which computation runs at CPU and memory speeds without storing data first. We call this an “analyze, react and then store” architecture. Predictions need to be continuously computed, and at all times the application has to have the latest answer, not something computed as part of a batch run. A ride-share vehicle, for example, needs continuous predictions of traffic in order to adapt and use the best route. From this observation we can conclude that a new category of applications, continuous intelligence applications, exists for which:
- The application must always have a timely response (e.g., within 100 ms)
- Therefore store-then-analyze is not an appropriate architecture
Continuous intelligence demands stateful in-memory processing to optimize performance and to enable real-time responses. It embraces event streaming and other infrastructure patterns that have emerged recently but focuses on the application layer functions needed to develop and operate stateful applications that consume streaming events.
Although modern databases can store streaming data for later analysis, update relational tables or modify graphs, continuous intelligence drives analysis from the arrival of data, using an “analyze, react, then store” architecture that builds and executes a live computational model from streaming data. This is a big change from the architectural assumptions of the database era, or indeed from the cloud-centric “stateless REST API plus stateful database” pattern. Moving state into memory allows fast analysis.
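To make the ordering concrete, here is a minimal Python sketch of an “analyze, react, then store” loop. It is an illustration only, not SwimOS code: the `Reading` type, the three-reading moving average, the 20 km/h congestion threshold and the list standing in for a database are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Reading:
    source_id: str    # hypothetical source name, e.g. one intersection
    speed_kmh: float  # hypothetical measurement from that source


@dataclass
class SourceState:
    """In-memory state for one source: a short window of recent readings."""
    window: deque = field(default_factory=lambda: deque(maxlen=3))

    def update(self, value: float) -> float:
        """Fold a new reading into the window and return the moving average."""
        self.window.append(value)
        return sum(self.window) / len(self.window)


def process(events, congestion_threshold=20.0):
    """Handle each event as it arrives: analyze, react, then store."""
    states = {}  # live state per source, held in memory
    alerts = []  # reactions, emitted as soon as the insight is available
    store = []   # stand-in for a database, written last
    for event in events:
        state = states.setdefault(event.source_id, SourceState())
        average = state.update(event.speed_kmh)   # analyze
        if average < congestion_threshold:        # react
            alerts.append((event.source_id, round(average, 1)))
        store.append(event)                       # then store
    return alerts, store
```

The point of the sketch is that the alert is raised from in-memory state before the raw reading is written anywhere, so the reaction never waits on storage or on a later batch query.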
In SwimOS, our Apache 2.0 licensed platform for continuous intelligence, an application is an automatically created, distributed graph of actors called Web Agents. These are in effect smart, analytical digital twins of data sources: each one concurrently cleans and processes streaming data from a single source and analyzes its resulting state changes in the context of static and other dynamically evolving state.
Web Agents dynamically link to related Web Agents. Links build an in-memory graph in which each vertex is a Web Agent that concurrently and in real-time analyzes both its own state and the states of Web Agents to which it is linked. Links are made and broken dynamically based on changing real-world relationships between data sources, so the graph is continually in flux. In the physical world many relationships can be inferred from a geospatial context, but links can also express simpler notions like containment or even correlation. Web Agents can analyze, learn and predict from their own states and those of linked Web Agents. And their insights, computed concurrently in the graph, stream in real-time to applications and GUIs.
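The idea of links that follow real-world relationships can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the SwimOS API: the `Agent` class, the `relink` helper, the planar coordinates and the fixed linking radius are all hypothetical, and a real deployment would run agents concurrently rather than in a single loop.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy stand-in for a Web Agent: one in-memory object per data source."""
    name: str
    x: float
    y: float
    value: float = 0.0                         # latest state from the source
    links: set = field(default_factory=set)    # names of linked agents

    def neighborhood_mean(self, agents):
        """Analyze this agent's state together with its linked agents' states."""
        values = [self.value] + [agents[n].value for n in sorted(self.links)]
        return sum(values) / len(values)


def relink(agents, radius):
    """Make and break links so each agent is linked to all others within radius."""
    for a in agents.values():
        a.links = {
            b.name
            for b in agents.values()
            if b is not a and math.hypot(a.x - b.x, a.y - b.y) <= radius
        }
```

For instance, with a hypothetical `bus` agent at (0, 0) and two stop agents at (1, 0) and (5, 0), calling `relink(agents, radius=2.0)` links the bus to the first stop only. After the bus moves to (4.5, 0), a second call drops that link and creates one to the second stop, so `neighborhood_mean` is then computed over a different set of neighbors.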
Using a dynamic graph of in-memory Web Agents that compute in real time on data from their real-world twins yields orders of magnitude greater performance than database-coupled applications. Web Agent-based, object-oriented applications can process data on the fly, delivering results before storing raw data. They take the view that the real world, and its dynamic changes, offers the truest view of its own state, and that by computing at memory speed close to the sources of raw data, they can deliver a new class of applications that are no more than the blink of an eye out of sync with the real world – whether analyzing, learning or predicting what’s next.
To get started with continuous intelligence, developers and architects can prototype applications using open source tools: Apache Kafka or Apache Pulsar for event streaming, and Apache 2.0 licensed SwimOS as the Web Agent runtime application platform. SwimOS can easily integrate with stream analysis tools you may have used on other projects, such as Apache Beam, Apache Flink, or Apache Spark.