How SwimOS Enables Stateful Streaming Applications
by Brad Johnson, on Aug 28, 2019 1:48:34 PM
One of the key differentiators for all swimOS applications is statefulness. As today’s applications become more complex, distributed, and real-time, the logistical challenges of managing application state start to outweigh the benefits of traditional “stacked” architectures.
Today, if you’re building a “stateful” application, you’re likely to consider things like Apache Beam, Flink, or Spark Streaming, each of which offers its own flavor of stateful stream processing. Using Kubernetes or another container orchestration solution, you may construct a microservices architecture with stateful processing as part of your strategy. That’s awesome, and likely to get you part of the way there.
Recently, Lightbend announced their open source project CloudState for running stateful workloads on the Kubernetes serverless stack. Jonas Bonér and the rest of the Lightbend team continue to do brilliant work, and I’m excited to see how they continue to develop the Akka ecosystem to support stateful applications. However, all these innovations focus on enabling stateful application architectures; they don’t actually help you build stateful applications.
If It’s Not Streaming (The Whole Time), It’s Not Stateful
At Swim.ai, we look at stateful applications slightly differently from how they’re commonly understood. The current paradigm for stateful applications is to isolate your stateful stream processor, your message broker, your application server, and your database. Data is then streamed via RESTful APIs between components or to downstream systems or “near real-time” user interfaces. However, for today’s complex, distributed, and increasingly real-time applications, this approach makes architectures inflexible and difficult to manage at scale.
RESTful APIs exist as a mechanism for transferring state between a centralized datastore and services running elsewhere in an application, whether on a device, in the cloud, at the edge, or in a browser. The problem is that today's applications generate high volumes of data at accelerating rates. In this context, RESTful APIs are prone to buffer bloat, as RESTful state messages are received faster than they can be processed to generate a response. The more messages arrive, the longer message queues grow and, ultimately, the longer application delays become. This can lead to significant downtime, as message brokers get overloaded by RESTful traffic.
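To make the queue-growth argument concrete, here is a minimal sketch in Python. It is a toy model rather than a measurement of any real broker: the function name `simulate_backlog` and the fixed per-second arrival and service rates are assumptions chosen purely for illustration.

```python
def simulate_backlog(arrival_rate, service_rate, seconds):
    """Return the queue depth at the end of each second.

    Hypothetical model: a fixed number of messages arrives every second
    and the service handles at most `service_rate` of them per second.
    """
    depth = 0
    history = []
    for _ in range(seconds):
        depth += arrival_rate              # new messages join the queue
        depth -= min(depth, service_rate)  # process as many as capacity allows
        history.append(depth)
    return history


# Messages arrive 20% faster than they can be processed: the backlog grows.
overloaded = simulate_backlog(arrival_rate=120, service_rate=100, seconds=5)

# Arrivals stay below capacity: the queue drains completely every second.
healthy = simulate_backlog(arrival_rate=80, service_rate=100, seconds=5)

print(overloaded)
print(healthy)
```

In the overloaded case the backlog grows by the same amount every second, so the wait experienced by a newly arriving message keeps increasing for as long as the imbalance lasts.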
There’s an alternative, which is to maintain local state wherever it’s needed. This stateful paradigm ensures that application services can always reach the state they need, without waiting for a round trip to the database. This model works especially well for streaming applications. As streaming data flows continuously, application services can continuously update application state locally and act on data in real time, while broadcasting updates throughout the system via streaming links. Furthermore, stateful architectures enable the real-time application of ML/AI techniques using edge computing, where stateful ML processes can run and train in real time as streaming data is created. This approach requires a stateful data model like Recon, which REST does not support.
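The local-state pattern described above can be sketched in a few lines of Python. This is not the swimOS API: the class `LocalStateAgent` and its methods `link`, `on_event`, and `get` are hypothetical names invented for this illustration, and an in-process callback stands in for a streaming link.

```python
from typing import Callable, Dict, List


class LocalStateAgent:
    """Hypothetical service that keeps its state in memory and pushes changes."""

    def __init__(self) -> None:
        self._state: Dict[str, float] = {}
        self._subscribers: List[Callable[[str, float], None]] = []

    def link(self, callback: Callable[[str, float], None]) -> None:
        """Register a downstream listener (a stand-in for a streaming link)."""
        self._subscribers.append(callback)

    def on_event(self, key: str, value: float) -> None:
        """Update local state and broadcast only when the value changes."""
        if self._state.get(key) == value:
            return  # nothing changed, so nothing is sent downstream
        self._state[key] = value
        for notify in self._subscribers:
            notify(key, value)

    def get(self, key: str) -> float:
        """Read current state locally, with no database round trip."""
        return self._state[key]


agent = LocalStateAgent()
received = []
agent.link(lambda key, value: received.append((key, value)))

# Feed a small stream of readings; the repeated reading is not re-broadcast.
for reading in [("temp", 20.5), ("temp", 20.5), ("temp", 21.0)]:
    agent.on_event(*reading)

print(agent.get("temp"))
print(received)
```

The design choice worth noticing is that reads are served from memory and subscribers are notified only on change, so downstream traffic is proportional to state changes rather than to the number of incoming messages.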
What About WebSockets or Other Streaming Solutions?
You'd think that using a more efficient messaging solution, like WebSockets, would solve this problem. If you can stream RESTful messages freely point-to-point, isn't that the same as streaming? Not quite. With a stateful application, data streams can be processed as they flow and state updates are broadcast as they occur. But with a RESTful application, data streams are broken into RESTful messages. Instead of getting a continuous stream of data, you receive snapshots of the data stream. In this scenario, developers are on the hook for estimating message frequency, and must predetermine how much compute power is necessary to process messages at each node. The wrong guess leads to costly downtime.
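The difference between snapshots and a continuous stream can be shown with a toy example. The sensor values and the sampling interval below are made up for illustration, and list slicing stands in for a client that only sees every third value.

```python
# Hypothetical readings produced by a source, in arrival order.
values = [1, 5, 2, 9, 3, 4]

# A stream consumer sees every value as it flows past.
streamed_peak = max(values)

# A snapshot consumer only sees every third value (indices 2 and 5).
snapshots = values[2::3]
snapshot_peak = max(snapshots)

print(streamed_peak)
print(snapshots, snapshot_peak)
```

The snapshot consumer never observes the spike to 9, because it happened between two samples. Choosing a shorter sampling interval reduces that risk but raises the message rate, which is the estimation trade-off described above.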
It turns out that the real challenge is not streaming data point-to-point; it's continuously processing real-time data efficiently across a distributed system. This requires an approach where stream messaging and universal state management are integrated. A stateful approach can lead to more efficient streaming applications, while reducing downtime and data storage costs, and enabling new real-time use cases.