Big Data? No, It's About Endless Data Now.

by Simon Crosby, on Feb 19, 2020 9:43:28 AM

It's no longer about big data; it's about endless data.

Back at the beginning of this century (a long time ago in tech), some brilliant chaps escaped from Google with the recipe for taming enterprise big data: Hadoop. If it worked for Google, it had to be what enterprises needed, right? Cloudera and Hortonworks advanced the idea that the big-data lake was a platform for the future of enterprise applications.

It worked for a while, and for some apps. Both companies IPOed, and soon enough the big cloud vendors started to ramp up their own big-data offerings. But as more and more devices became connected, the trickle of data turned into a flood (today devices connect at a rate of about 2M/hour), and most orgs realized that the only place to run a big-data stack was the cloud. Moreover, the powerful analytical software frameworks - from Power BI (Azure) to Spark and Flink - are easier to consume as services. Hortonworks was swallowed by Cloudera amid growing evidence that customer appetite for on-prem data lakes was dwindling.

But it is the “store then analyze” philosophy that is limiting, not whether it is delivered on prem or as a service. In a highly dynamic world of billions of connected devices, no database-oriented application can deliver an accurate, granular, continuous synthesis of contextual insights. Applications or users that need a granular view of the current state of the world in real time (e.g., Uber) need to continually process and analyze data to know what has changed, what context is relevant, how to predict, and how to react. The problem is not data, or even data analytics - it is how to make sense of the real world, in real time.

Accessing stored state on disk is thousands of times slower than the CPU, and even distributed databases end up with performance and consistency problems. Boundless data is also expensive to store, but that is not the main challenge: the need is to continuously process data from millions of boundless streams, in real time. The streams come from data sources that have dynamic contextual relationships to each other in space, in time, and in the data domain (e.g., they are correlated). The goal is to deliver live insights to users and applications that need to make sense of the real world. Legacy, proprietary “complex event processing” stacks don’t cut it, because they assume a monolithic architecture that solves a single problem for a fixed set of streams from devices on fixed networks.
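To make the contrast with "store then analyze" concrete, here is a minimal Python sketch of analyzing each event as it arrives, keeping only a small piece of in-memory state per data source. It is an illustration under stated assumptions, not how swimOS or any particular product works: the names `RunningStats`, `on_event` and the `sensor-*` IDs are hypothetical, and a running mean stands in for whatever analysis an application actually needs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningStats:
    """In-memory state for one stream, updated on every event (hypothetical)."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> None:
        # Incremental mean: no history is stored or re-read from disk.
        self.count += 1
        self.mean += (value - self.mean) / self.count


# One small state object per data source, created the first time it is seen.
state = defaultdict(RunningStats)


def on_event(source_id: str, value: float) -> float:
    """Process a single event on arrival and return the updated mean."""
    stats = state[source_id]
    stats.update(value)
    return stats.mean


# Simulated interleaved events from two hypothetical sources.
for source_id, value in [("sensor-a", 10.0), ("sensor-b", 4.0), ("sensor-a", 20.0)]:
    on_event(source_id, value)
```

The point of the design is that the cost per event is constant and the answer is always current; nothing has to be written to a database and queried back before an insight is available.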

Mobile devices of all types, products, inventory and even people continually move through smart infrastructure, so a traditional schema-based relational model won’t fit. Proximity, containment, even similarity are dynamic because of changes in the real world and changes in the states of things. To properly model the new world we need an application architecture that facilitates a dynamically changing graph between data sources in which relationships are continually formed and broken, and where applications fuse context, state and relationships into live insights of value to users, and deliver them in real time.
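As a rough illustration of a graph in which relationships are continually formed and broken, the Python sketch below links two entities whenever they come within a fixed distance of each other and unlinks them when they move apart. Everything in it is an assumption made for the example: the `ProximityGraph` class, the 2D coordinates, the single `radius` threshold and the entity names are hypothetical, and a real system would track many more kinds of relationship (containment, similarity) at far larger scale.

```python
import math


class ProximityGraph:
    """Hypothetical sketch: links exist only while two entities are close."""

    def __init__(self, radius: float) -> None:
        self.radius = radius
        self.positions = {}   # entity_id -> (x, y)
        self.links = set()    # frozenset({id_a, id_b}) for each live link

    def update_position(self, entity_id: str, x: float, y: float):
        """Record a new position; return (formed, broken) neighbor lists."""
        self.positions[entity_id] = (x, y)
        formed, broken = [], []
        for other_id, (ox, oy) in self.positions.items():
            if other_id == entity_id:
                continue
            link = frozenset((entity_id, other_id))
            near = math.hypot(x - ox, y - oy) <= self.radius
            if near and link not in self.links:
                self.links.add(link)       # relationship formed
                formed.append(other_id)
            elif not near and link in self.links:
                self.links.remove(link)    # relationship broken
                broken.append(other_id)
        return formed, broken


graph = ProximityGraph(radius=5.0)
graph.update_position("truck", 0.0, 0.0)
arrived = graph.update_position("pallet", 3.0, 4.0)    # within range of truck
departed = graph.update_position("pallet", 30.0, 40.0)  # moved out of range
```

Here the graph is derived from the live state of the entities rather than from a fixed schema, so an application can react to the `formed` and `broken` lists the moment a relationship changes.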

Learn More

Let us know what you're building using the open source swimOS platform. You can get started with swimOS here and make sure to STAR us on GitHub. You can also learn more about the Swim Platform for enterprise apps here.

Topics: Stateful Applications, Edge Computing, distributed computing, web applications, streaming, open source, microservices, cloud data analytics, apache spark, decentralized, complex event processing, big data