Building a Smart City? Have a Strategy for Streaming Data
by Brad Johnson, on Dec 7, 2018 12:52:20 PM
How "smart" are today's Smart Cities, really?
The buzz around Smart Cities has been building for over a decade, and the headlines keep coming. But how many “smart” cities today actually feel that much smarter? As cities begin to update and connect their infrastructure, the standard approach has been to store all that new data in a centralized datacenter and analyze it using big data techniques. Which is great! But once all your city data is stored in a database, what can you really do with it?
Big data is ideal for situations where you’re analyzing historical data. If you’re doing city planning, estimating the impact of events or construction projects, or planning public transit routes, then big data can help you find the answers you need. But the Smart City vision makes promises beyond analyzing what happened in the past. Smart Cities are about what’s going on right now, and about using that information to empower citizens, city managers, and public servants to make decisions in real time. As data streams in from connected city infrastructure, citizens’ mobile devices, emergency vehicles, and a variety of other sources, cities need a way to analyze that data as it’s generated, so that insights reflect the real-time context streaming data provides.
Streaming data needs to be treated differently
When you’re trying to act on data as soon as possible, it doesn’t make sense to store it in a database first. Relying on a database as the single source of truth means multiple, unnecessary trips over the network, incurring latency at each hop. This slows the discovery of time-sensitive insights and physically separates an app’s “intelligence” from where automation or control needs to occur (at the edge). On-premises databases don’t necessarily solve this issue either: while the data is stored physically closer to the edge, if the application is stateless (meaning state is managed centrally by a database), it must still query the database before taking any action.
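The contrast can be sketched in a few lines of Python. This is an illustrative toy, not a SWIM.AI or database API: `detect_anomaly`, `trigger_alert`, and `SensorAgent` are all hypothetical names. The point is simply that a stateful agent holds each sensor’s recent readings in memory, so it can act on a new reading immediately instead of paying a network round trip first.

```python
def detect_anomaly(recent, value, threshold=2.0):
    """Flag a reading that deviates sharply from the recent mean."""
    if not recent:
        return False
    mean = sum(recent) / len(recent)
    return abs(value - mean) > threshold

alerts = []

def trigger_alert(sensor_id, value):
    # Stand-in for real actuation (e.g., notifying a traffic controller).
    alerts.append((sensor_id, value))

class SensorAgent:
    """Keeps one sensor's state locally, so decisions need no database hop."""

    def __init__(self, sensor_id, window=10):
        self.sensor_id = sensor_id
        self.window = window
        self.recent = []  # in-memory state, not a round trip to a central store

    def handle_event(self, value):
        # Decide immediately, using local state only.
        if detect_anomaly(self.recent, value):
            trigger_alert(self.sensor_id, value)
        # Then fold the new reading into a bounded window of recent values.
        self.recent = (self.recent + [value])[-self.window:]
```

A stateless version of `handle_event` would instead have to query a central database for the sensor’s history before deciding, and write the new reading back afterward, adding two network hops to every single event.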
Ideally, you want a software platform that processes data as it flows: one that analyzes and transforms the data into a reduced form, acts on discovered insights, and keeps only what matters for further historical analysis. Stateless big data architectures are a mismatch for applications that must analyze and act on data on the fly. Even applications that use stateful stream processors, like Apache Flink and Apache Spark Streaming, often still rely on a central database to provide state throughout the application. To ensure timely discovery and delivery of insights from streaming data, architectures that provide statefulness throughout will achieve significant performance and efficiency advantages.
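The “reduce and keep only what matters” idea above can be sketched as a windowed fold. Again, this is an illustrative assumption, not a Flink, Spark Streaming, or SWIM.AI API: `summarize_stream` and its window semantics are hypothetical, standing in for whatever reduction a real pipeline would apply.

```python
def summarize_stream(events, window_size=60):
    """Fold raw readings into per-window summaries as they arrive,
    so only the reduced form needs long-term storage."""
    summaries = []
    count, total, peak = 0, 0.0, float("-inf")
    for value in events:
        # Update running state on-the-fly; raw readings are never stored.
        count += 1
        total += value
        peak = max(peak, value)
        if count == window_size:
            # Keep only what matters: one compact summary per window.
            summaries.append({"avg": total / count, "peak": peak})
            count, total, peak = 0, 0.0, float("-inf")
    return summaries
```

With a 60-reading window, 120 raw readings reduce to just two summary records, which is what lands in historical storage instead of the full firehose.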
In reality, maximizing value from Smart City data sources will involve applications combining elements of stateless big data and stateful streaming data analysis. However, assuming that a one-size-fits-all big data strategy will also work for streaming data is a recipe for failure. When dealing with high volumes of continuous data from streaming sources, big data architectures are expensive to maintain and difficult to scale. The mismatch between batch analysis and streaming data limits the potential benefits of real-time insights, drives up operating costs, and ultimately fails to deliver on the promise of building a smarter city. As Smart Cities continue to bring new streaming data sources online from connected citizens, vehicles, and infrastructure, it’s increasingly critical for city CTOs and IT managers to have technology strategies for building both big data and streaming data systems.
Find out more about SWIM.AI and how we used a stateful agent-based architecture to power cost-effective, scalable streaming Smart City applications at www.swim.ai.