Stream Processing vs Streaming Apps: What’s the Difference?
by Brad Johnson, on Nov 5, 2018 3:34:24 AM
The world isn’t slowing down. New streaming data sources are increasingly being integrated into business applications, coinciding with a rapidly growing demand for real-time applications. To achieve new efficiencies and react more quickly to streaming information, applications must process higher volumes of data more quickly than ever before. Recently, ZDNet observed that “streaming is one of the top trends we've been keeping up with.” And as the software world continues to bend towards real-time, stream processing has become critical to most enterprise applications.
Open source stream processors, like Apache Spark and Apache Flink, continue to see increased adoption across industries. And while there’s a lot of buzz around stream processing, what businesses really want are streaming applications. That may sound like a minor semantic difference, but in reality there are major differences between the two.
What is stream processing?
The makers of Apache Flink, data Artisans, describe stream processing as “the processing of data in motion, or in other words, computing on data directly as it is produced or received.” To accomplish this, stream processors typically use in-memory databases to provide for stateful processing on streaming data. This makes processing real-time data streams in parallel more efficient than with other open source solutions like Hadoop and other MapReduce-style processors. But a stateful stream processor does not make a stateful application. Besides stream processors, distributed applications are composed of other components, such as message brokers (e.g. Apache Kafka) and databases (e.g. Apache Cassandra). For example, I’ve highlighted the common SMACK stack model below:
The SMACK stack may have a stateful stream processor, but applications built using a SMACK stack and REST APIs are still stateless. Original source: DZone.
Ideally, distributed applications using streaming data would be stateful everywhere. Why? Because in a fully stateful distributed application, data can remain in-motion all the way to the end user. However, when only the stream processor is stateful and the rest of the application is stateless, data must be broken into batches and shipped via REST APIs to databases or other application services. As each specialized component of an application stack (broker, application server, processor, etc.) is optimized to be internally efficient, integrating these disparate components requires buffers, queues, timers, and other mechanisms which create bottlenecks between components.
These bottlenecks may be manageable at the prototype phase, but in production, integration points between components make open-source application stacks rigid and difficult to scale. Furthermore, as applications process higher and higher volumes of data, they become increasingly susceptible to dreaded “buffer bloat.” Regardless of whether a stream processor is stateful, if the rest of the application is REST-based, then the entire application is stateless. Considering the end application, data at REST isn’t data-in-motion at all.
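To make the cost of those integration points concrete, here is a minimal sketch of how buffering between components adds delay. It assumes a simplified model in which each hop holds events in a buffer and flushes on a fixed timer; the arrival times, the flush intervals, and the function names are all hypothetical and chosen only for illustration.

```python
import math

def batched_delivery_times(arrivals, flush_interval):
    """Return the time each event leaves a buffer that flushes every
    `flush_interval` seconds (hypothetical fixed-timer model)."""
    # An event arriving at time t waits for the next flush boundary at or after t.
    return [math.ceil(t / flush_interval) * flush_interval for t in arrivals]

def mean_delay(arrivals, deliveries):
    """Average time between an event's arrival and its delivery."""
    return sum(d - a for a, d in zip(arrivals, deliveries)) / len(arrivals)

# Hypothetical event arrival times, in seconds.
arrivals = [0.5, 1.0, 2.5, 4.0, 4.5]

# Hop 1: e.g. a broker-side buffer flushed every 5 seconds.
one_hop = batched_delivery_times(arrivals, flush_interval=5)

# Hop 2: a downstream component with its own 3-second timer.
two_hops = batched_delivery_times(one_hop, flush_interval=3)

print(mean_delay(arrivals, one_hop))   # 2.5
print(mean_delay(arrivals, two_hops))  # 3.5
```

In this toy model every buffered hop adds waiting time on top of the last, and the timers of independent components rarely line up. Forwarding each event as it arrives would bring the per-hop delay to zero in the same model.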
What are streaming apps?
Stream processors make it possible to do stateful analytics, but streaming applications require more than just analytics. Streaming applications require intelligence. Intelligence is what you get when you combine analytics with self and situational awareness. Intelligence enables streaming apps to be self-aware, so they can continuously evaluate new data based on current application state. Unlike REST-based applications, streaming applications know their own state continuously, without ever having to ask a database. This enables data to remain in-motion throughout the entire distributed application.
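As a rough illustration of evaluating new data against current application state, the sketch below keeps a running mean in memory and checks each incoming value against it, with no database round trip. The `SensorAgent` class, its threshold, and the sample readings are hypothetical; this is not the SWIM API, only one way the idea could look.

```python
class SensorAgent:
    """Hypothetical in-memory agent that owns its running state."""

    def __init__(self, threshold):
        self.threshold = threshold  # allowed distance from the running mean
        self.count = 0              # number of values seen so far
        self.mean = 0.0             # running mean of those values

    def on_event(self, value):
        """Evaluate one value against current state, then update that state."""
        # Compare against what the agent already knows; the first value
        # has nothing to be compared with, so it is never flagged.
        is_anomaly = self.count > 0 and abs(value - self.mean) > self.threshold
        # Fold the value into the state incrementally (no stored history).
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return is_anomaly

agent = SensorAgent(threshold=10.0)
readings = [20.0, 22.0, 21.0, 45.0, 23.0]  # hypothetical sensor readings
flags = [agent.on_event(v) for v in readings]
print(flags)  # [False, False, False, True, False]
```

Because the state lives with the agent, each event is handled in a single step. A stateless handler would first have to fetch the previous mean from a datastore and then write the new one back.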
Because they’re intelligent, streaming applications can talk amongst themselves. This enables all types of automation and collaboration between different nodes in a distributed system. This is contrasted with REST APIs, which can’t coordinate. REST-based applications are always going to be dependent on an application server or central datastore to keep them informed.
What does a streaming application look like? Consider the SWIM Smart City application, which analyzes live data from connected traffic infrastructure in Palo Alto, CA. This application consists of intelligent web agents representing each intersection. These web agents observe local traffic behavior and subscribe to similar information from their neighboring intersections to inform deep neural networks (DNNs). The DNNs predict when lights will change and future traffic behavior. Click on the intersection of University and Middlefield (blue dot) if you want to see for yourself.
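The agent-per-intersection idea can be sketched in a few lines. Everything here is an assumption made for illustration: the `IntersectionAgent` class, its method names, the neighbor names, and the vehicle counts are hypothetical, and a simple sum stands in for the DNN prediction step. It is not how the SWIM application is actually implemented.

```python
class IntersectionAgent:
    """Hypothetical agent representing one intersection."""

    def __init__(self, name):
        self.name = name
        self.local_count = 0       # latest locally observed vehicle count
        self.neighbor_counts = {}  # latest count pushed by each neighbor
        self.subscribers = []      # agents that want this agent's updates

    def subscribe(self, other):
        """Ask `other` to push its future observations to this agent."""
        other.subscribers.append(self)

    def observe(self, vehicle_count):
        """Record a local observation and push it to all subscribers."""
        self.local_count = vehicle_count
        for subscriber in self.subscribers:
            subscriber.on_neighbor_update(self.name, vehicle_count)

    def on_neighbor_update(self, neighbor_name, vehicle_count):
        """Keep only the most recent count reported by each neighbor."""
        self.neighbor_counts[neighbor_name] = vehicle_count

    def expected_inflow(self):
        # Placeholder for the prediction step: a plain sum, not a neural network.
        return sum(self.neighbor_counts.values())

center = IntersectionAgent("University & Middlefield")
north = IntersectionAgent("neighbor-north")  # hypothetical neighbor
south = IntersectionAgent("neighbor-south")  # hypothetical neighbor
center.subscribe(north)
center.subscribe(south)

north.observe(12)
south.observe(7)
north.observe(9)  # replaces north's earlier count of 12

print(center.expected_inflow())  # 16
```

Updates are pushed from neighbor to neighbor as they happen, so each agent always holds the latest view of its surroundings without polling a central server.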
How do you build a streaming app?
In our opinion, here are the key attributes of a streaming application architecture:
- Persistence without a database
- Real-time without a message broker
- Scheduling without a job manager
- APIs that don’t rely on an app server
- Scalability without load balancers
- Compute which scales without limits
If you’re curious about how we incorporate those attributes into our streaming applications, connect with us at www.swim.ai/developers.
Find out more about SWIM.AI and how we used an agent-based architecture to power cost-effective, scalable streaming applications at www.swim.ai.