Why Smarter Databases Can’t Deliver Continuous Intelligence
by Simon Crosby, on May 18, 2021 11:03:50 AM
Organizations are drowning in streams of data, and it's only going to get worse. Sure, lots of it can be stored, but most of it is only fleetingly useful. Traditional big-data “store-then-analyze” architectures struggle to find valuable insights in time. High data volumes and geographically distributed sources make centralized architectures challenging and expensive.
Can databases keep up? There are hundreds to choose from. All aim to solve hard infrastructure problems under the hood. A single API call can trigger a snapshot or transaction roll-back. They offer features for analysis (even machine learning), caching, and clustered and distributed operation. They are resilient to faults, and increasingly secure. What’s not to love? But perhaps a more useful question is: Can databases help enterprises seeking continuous intelligence from their data?
It’s the architectural assumption of databases as separate state stores that is limiting: Databases are updated by clients and queried by users, but they don’t drive (re-)computation of insights when new data arrives, nor do they transform data into a stream of insights. That’s not enough for continuous intelligence applications that need an answer the moment data arrives. Databases don’t run applications and can’t make sense of data.
Continuous intelligence aims to solve this problem. It delivers insights on-the-fly using streaming data to dynamically build models and drive analysis, learning and prediction - often for automation.
What is Continuous Intelligence?
The goal of continuous intelligence is to always have the answer, enabling an instant response. It is achieved by:
- Data-driven computation: Analysis is driven by arriving data (as opposed to queries or batch analysis) for three reasons: users need automated responses in real time; data streams are boundless; and real-world data is only ephemerally useful.
- Stateful, contextual analysis: Relationships such as containment, proximity, or even analytical relationships like correlation or predictions are vital for applications that reason about the collective meaning of events. Relationships are fluid and are continuously re-evaluated.
- Time is fundamental: Since data drives computation and results are available immediately, the concept of time is “built in” – insights stream continuously to applications, storage, and users, and can be used to drive automation.
Each new event triggers re-evaluation of all dependencies. For example: A smart-city application might alert an inspector to stop any truck with bad braking behavior when it is predicted to be on the same road as the inspector in 2 minutes. Since there is no point telling an inspector to stop a truck that has already passed, responses must be real-time and situationally relevant: Only an inspector on the same street as and ahead of a flagged truck should be alerted. The application must deliver results for all trucks and all inspectors in the city, in real-time.
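The alerting rule in this example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the original scenario: each road is modeled as a straight line, positions are distances in meters along it, inspectors are stationary, and the names `Truck`, `Inspector`, `should_alert`, and `on_truck_event` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Truck:
    truck_id: str
    road: str
    position_m: float      # distance along the road (assumed 1-D model)
    speed_mps: float       # positive means moving toward larger positions
    flagged: bool = False  # True once bad braking has been detected


@dataclass
class Inspector:
    inspector_id: str
    road: str
    position_m: float


def should_alert(truck, inspector, horizon_s=120.0):
    """True if a flagged truck is predicted to reach the inspector in time."""
    if not truck.flagged or truck.road != inspector.road:
        return False
    gap_m = inspector.position_m - truck.position_m
    if gap_m <= 0 or truck.speed_mps <= 0:
        return False  # inspector is behind the truck, or truck is not advancing
    return gap_m / truck.speed_mps <= horizon_s


def on_truck_event(truck, inspectors):
    """Re-evaluate every inspector each time a truck reports new data."""
    return [i.inspector_id for i in inspectors if should_alert(truck, i)]
```

The point of the sketch is that the check runs when the truck's event arrives, not when someone issues a query; a real deployment would replace the straight-line prediction with route prediction.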
Fleeting relationships, like that between a truck and an inspector, require complex analysis that uses the positions and velocities of each truck, moment by moment, to evaluate “bad braking” and, if needed, predict its route and alert an inspector. These relationships cannot be represented in any traditional database. For continuously evolving real-world systems the idea of a database as a repository of “truth” is inappropriate because the current state of a source is less useful than its behavior over time:
- Distributions, trends, or statistics computed from changes over time are often more useful
- ML, regression, and other predictive tools use past behavior to predict the future
- Often real-world systems report data values that are themselves estimates or a function of time
- Complex relationships between sources can often only be found in the time domain
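As a small illustration of why behavior over time matters more than the latest value, the sketch below flags “bad braking” from a stream of speed readings. It is a sketch under stated assumptions, not a definition from the article: the class name `BrakingMonitor`, the 4 m/s² threshold, and the exponential smoothing factor are all hypothetical choices.

```python
class BrakingMonitor:
    """Tracks smoothed deceleration from successive (time, speed) samples."""

    def __init__(self, threshold_mps2=4.0, alpha=0.5):
        self.threshold = threshold_mps2  # assumed cutoff for "bad braking"
        self.alpha = alpha               # weight given to the newest sample
        self.last = None                 # previous (timestamp_s, speed_mps)
        self.smoothed_decel = 0.0

    def update(self, timestamp_s, speed_mps):
        """Fold in one reading; return True if braking currently looks bad."""
        if self.last is not None:
            dt = timestamp_s - self.last[0]
            if dt > 0:
                # Deceleration is the drop in speed per second (never negative).
                decel = max(0.0, (self.last[1] - speed_mps) / dt)
                self.smoothed_decel = (
                    self.alpha * decel + (1 - self.alpha) * self.smoothed_decel
                )
        self.last = (timestamp_s, speed_mps)
        return self.smoothed_decel > self.threshold
```

A single speed reading says nothing about braking; only the change between readings does, which is why the monitor keeps a small amount of state per truck and updates it on every event.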
The behavior of the entire system over time is needed to ascertain its current likely state and predict its likely future states to extract meaning. Continuous intelligence applications deliver a continuous stream of insights and responses that result from the continuous interpretation of the joint effects of all events on a stateful system model over time. As relationships change (geospatially, mathematically, or otherwise), the set of relationships, kept in memory as a dynamically updated graph of relationships between data sources, is continuously updated. When new data is received, new insights and updated relational links are computed.
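One way to picture a dynamically updated graph of relationships is a proximity graph whose links are recomputed whenever a source reports a new position. The sketch below is a minimal, single-threaded illustration; the class name `ProximityGraph`, the 2-D coordinates, and the fixed radius are assumptions made for the example, and the linear scan over all sources would need a spatial index at scale.

```python
import math


class ProximityGraph:
    """In-memory graph linking sources that are within `radius` of each other."""

    def __init__(self, radius):
        self.radius = radius
        self.positions = {}  # source id -> (x, y)
        self.links = {}      # source id -> set of linked source ids

    def update(self, source_id, x, y):
        """Apply one position event; return (links made, links broken)."""
        self.positions[source_id] = (x, y)
        old = self.links.get(source_id, set())
        new = {
            other
            for other, (ox, oy) in self.positions.items()
            if other != source_id
            and math.hypot(x - ox, y - oy) <= self.radius
        }
        # Keep the graph symmetric: update the other end of each changed link.
        for other in old - new:
            self.links[other].discard(source_id)
        for other in new - old:
            self.links.setdefault(other, set()).add(source_id)
        self.links[source_id] = new
        return new - old, old - new
```

Each event returns the links it made and broke, so downstream logic can react to the change in relationships rather than re-reading the whole graph.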
Continuous Intelligence imposes two important changes on the data processing architecture:
- Algorithms need to be adapted to deal with boundless inputs, for example using sketches and unsupervised learning.
- Analysis needs to be stateful and must include (immediate) re-evaluation of dependencies to determine cascading impacts. This in turn leads to an in-memory architecture to avoid the need for long round-trips to a database.
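To make the idea of a sketch concrete, here is a minimal Count-Min sketch, one well-known structure for counting items in an unbounded stream using fixed memory. The article does not name a specific sketch, so this choice, along with the `width` and `depth` defaults, is an assumption for illustration.

```python
import hashlib


class CountMinSketch:
    """Approximate per-item counts for an unbounded stream in fixed memory."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # A different salt per row gives `depth` independent-looking hashes.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum across rows is
        # the tightest estimate and never undercounts.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )
```

Memory use stays at `width * depth` counters no matter how many events arrive, at the cost of estimates that may overcount when distinct items collide.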
SwimOS: An OSS Continuous Intelligence Platform
SwimOS is a lightweight, Apache 2.0 licensed, distributed runtime for continuous intelligence applications. Each data source is represented by a stateful, concurrent actor called a Web Agent. A Web Agent is like a concurrent digital twin of a data source that processes its own data, but it can also execute complex business logic, evaluate predicates, and even learn and predict in real-time without database round-trips. It can even react - delivering responses in real-time.
Web Agents dynamically link to each other based on real-world relationships between the sources they represent, like containment or proximity or even analytical relationships, such as correlation. As Web Agents link to each other they form a fluid in-memory graph of context-rich associations between data sources. A link allows concurrent, stateful Web Agents to share their in-memory states. Web Agents make and break links as they process events, based on continuously evaluated changes in the real world. The resulting in-memory graph is a bit like a live “LinkedIn for things”: Web Agents, which are like ‘intelligent digital twins’ of data sources, inter-link to form a graph.
The magic of linking is that it enables each Web Agent to concurrently compute using its own state and the states of other agents to which it is linked, enabling granular contextual analysis, learning and prediction, and an active response. So, the knock-on effects of changes to an entity in the real world are immediately visible as state changes in its Web Agent - and all related (linked) Web Agents.
Web Agents also act as concurrent materialized views that are continuously re-evaluated. They can link to millions of other Web Agents to derive KPIs or aggregate views. Using the power of links, relationships, analysis and learning in an application services tier of Web Agents allows developers to easily add additional tiers of services to an already active continuous intelligence deployment.
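The linking and aggregate-view ideas can be illustrated with a toy model in plain Python. This is not the SwimOS API: the classes `Agent` and `RoadAgent` and their methods are hypothetical, and the sketch is single-threaded, whereas Web Agents run concurrently.

```python
class Agent:
    """Toy stateful agent that notifies any agent linked to it on change."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.links = set()      # agents whose state this agent reads
        self.observers = set()  # agents that have linked to this agent

    def link(self, other):
        self.links.add(other)
        other.observers.add(self)

    def unlink(self, other):
        self.links.discard(other)
        other.observers.discard(self)

    def on_event(self, key, value):
        """Update own state, then let linked agents re-evaluate."""
        self.state[key] = value
        for observer in list(self.observers):
            observer.on_linked_change(self)

    def on_linked_change(self, source):
        pass  # plain agents ignore changes in agents they link to


class RoadAgent(Agent):
    """Aggregate view: mean speed of the vehicles currently linked."""

    def on_linked_change(self, source):
        speeds = [a.state["speed"] for a in self.links if "speed" in a.state]
        self.state["mean_speed"] = sum(speeds) / len(speeds) if speeds else None
```

In this toy model the road's KPI is recomputed the moment any linked vehicle changes, which mirrors the continuously re-evaluated materialized view described above; making or breaking a link changes which vehicles contribute on the next event.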