Beyond Serverless: Why We Need A Stateful Data Fabric
by Brad Johnson, on Jun 13, 2019 12:06:10 PM
The first iPhone was released on June 29, 2007. And while the advent of the iPhone was hardly the only catalyst of the smartphone revolution, I consider this to be as good a birthdate as any for one of humankind’s most consequential innovations. Since then, smartphones have been adopted faster than any other disruptive technology in modern history. But I’m not actually here to write about smartphones, because I think there was an even more important development that day in 2007. The development that really changed the world? It was the debut of the iOS operating system.
In my opinion, iOS fundamentally changed how humans interact with technology, in ways that will far outlast the smartphone era. What I mean is that iOS brought apps into the mainstream. Don’t get me wrong, we’ve been calling application software “apps” since at least 1981. And this didn’t just happen overnight. Until 2010, Symbian was the world's most widely used smartphone operating system. But iOS crystallized the modern notion of how users engage with apps, and made them accessible to users with even the most limited technical ability. Like written language, the printing press, and telecommunications before them, apps have changed how we communicate with the world.
From Microservices to Serverless
*That seems like an awfully long lead-in for an article about data fabrics…*
I know, I know... Thanks for sticking with me. All this is important because while iOS changed users’ relationships with apps, it also changed our relationships with application infrastructures. Instead of shipping bytes of static data from one machine to another, apps now needed to interact with dynamic, continuously changing datasets. Whether data was being generated by mobile users, sensors, or devices, traditional SQL database architectures were soon stretched to their limits by this new generation of mobile apps. Apps were now expected to be reactive, real-time, and highly available while dealing with unprecedented volumes of newly created data. A new generation of specialized data processing and networking software would have to be created as the foundation for this new generation of apps.
Around this time, we saw the rise of microservices architectures and actor-based systems like Akka. We also saw the dawn of AWS and public cloud services. A new class of social media apps created the need for real-time databases like Apache Cassandra and performant message brokers like Apache Kafka. Today, microservices have become a ubiquitous part of enterprise architectures. And we’re starting to see even newer paradigms like serverless.
While serverless seeks to decouple an app’s operations from its infrastructure, this is only a first step. Mike Roberts defines two primary implementations of serverless: Backend-as-a-Service (BaaS) and Functions-as-a-Service (FaaS). Of the former, Roberts explains that BaaS applications “significantly or fully incorporate third-party, cloud-hosted applications and services, to manage server-side logic and state.” Most of these are rich front-end applications, where a relatively inflexible server-side architecture is perfectly cromulent and can be outsourced to multiple vendors. As for the latter, FaaS apps are “run in stateless compute containers that are event-triggered, ephemeral (may only last for one invocation), and fully managed by a third party.”
This model works great for ephemeral tasks, but what about continuous processes with continuously variable event streams? These streaming apps would be better served by a stateful network of serverless functions.
One way to accomplish this is by creating a stateful data fabric.
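To make the stateless-versus-stateful distinction above concrete, here is a minimal Python sketch. It is not tied to any FaaS vendor or to any data fabric product; `external_store`, `stateless_handler`, and `StatefulProcessor` are hypothetical names invented for illustration, and the dictionary merely stands in for whatever external datastore a real stateless function would call.

```python
from typing import Dict

# Hypothetical stand-in for an external datastore (an assumption for this sketch).
external_store: Dict[str, float] = {}


def stateless_handler(sensor_id: str, reading: float) -> float:
    """FaaS-style handler: it keeps nothing between invocations, so the
    running total is fetched from and written back to the store every time."""
    total = external_store.get(sensor_id, 0.0) + reading
    external_store[sensor_id] = total
    return total


class StatefulProcessor:
    """Long-lived process: the running total lives in memory across events."""

    def __init__(self, sensor_id: str) -> None:
        self.sensor_id = sensor_id
        self.total = 0.0

    def on_event(self, reading: float) -> float:
        self.total += reading
        return self.total
```

Both versions compute the same running total. The difference is where the state lives: the stateless handler pays a round trip to the store on every event, while the stateful processor already holds the answer when the next event arrives.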
So What Exactly is a Data Fabric?
According to Isabelle Nuage, a data fabric “is a single, unified platform for data integration and management that enables you to manage all data within a single environment and accelerate digital transformation.” Data fabrics must integrate with external solutions, manage data across all environments (multi-cloud and on-premises), support batch, real-time and big data use cases, and provide APIs for new data transformations.
Data fabrics weave a cohesive whole from multiple disparate apps and microservices, many of which are hosted by third-party vendors. As such, a data fabric should reduce the effort required to integrate new microservices or platforms, while ensuring existing systems continue to operate as normal. In this regard, data fabrics are the ultimate middleware, around which all other systems orbit. At the same time, data fabrics are the medium for communication across complex app architectures. Whether communicating via stateless REST APIs or stateful streams, data fabrics ensure that all microservices have the latest relevant state.
For Stateful Apps, It's All About the APIs
Stateful microservices alone don't make a stateful data fabric. That's because there's a key difference between a collection of independent stateful microservices and a cohesive stateful system. Many serverless architectures today can claim to be stateful, in the sense that stateful data processors, such as Apache Flink, can be deployed via stateless containers. However, if the primary exchange of data is stateless, such as via REST APIs, then the application itself is primarily stateless.
Deploying stateful microservices via stateless containers is not the same as a data fabric being stateful. A stateful data fabric weaves a network of stateful, streaming links between multiple persistent microservices. This allows for the creation of truly stateful apps: microservices can continuously observe data streams for critical events, and peers are continuously subscribed to receive relevant updates. In other words, a stateful data fabric enables real-time, peer-to-peer (P2P) computation instead of a stateless hub-and-spoke architecture oriented around a central datastore.
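As a rough illustration of what a stateful, streaming link means in practice, the Python sketch below models a link that remembers its latest value and pushes every change to subscribed peers. It runs in a single process and makes no claim about how any real fabric implements links; `Stream` and `Peer` are hypothetical names made up for this example.

```python
from typing import Callable, List, Optional


class Stream:
    """In-process stand-in for a stateful streaming link (hypothetical)."""

    def __init__(self) -> None:
        self._latest: Optional[int] = None
        self._subscribers: List[Callable[[int], None]] = []

    def subscribe(self, callback: Callable[[int], None]) -> None:
        self._subscribers.append(callback)
        # The link is stateful: a late subscriber is synced to the
        # current value immediately instead of having to poll for it.
        if self._latest is not None:
            callback(self._latest)

    def publish(self, value: int) -> None:
        self._latest = value
        for callback in self._subscribers:
            callback(value)  # push the update to every subscribed peer


class Peer:
    """A peer that records every update it is sent."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.seen: List[int] = []

    def on_update(self, value: int) -> None:
        self.seen.append(value)
```

A peer that subscribes before any value is published receives each update as it happens. A peer that subscribes later first receives the current value, then every subsequent update. Neither peer ever issues a request to a central datastore.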
The open-source Swim platform is an example of a fully stateful data fabric, which leverages stateful streaming APIs to communicate across microservices. Stateful microservices (called Web Agents) can perform real-time transformations as data flows, or be combined with others to create real-time aggregations. Due to the real-time nature of streaming data, each Swim microservice is guaranteed eventual state consistency with peers. However, Swim can also be configured for other delivery guarantees. Because state consistency is automatically managed by the fabric, developers are no longer on the hook for managing state across isolated microservices via REST APIs.
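The transformation-plus-aggregation pattern described above can be sketched in a few lines of plain Python. To be clear, this is not the Swim API and does not reflect how Web Agents are actually declared; `SensorAgent`, `Aggregator`, and the Celsius-to-Fahrenheit conversion are assumptions chosen purely to show the shape of the idea.

```python
from typing import Dict


class Aggregator:
    """Keeps the latest value from each upstream agent and exposes a mean."""

    def __init__(self) -> None:
        self.latest: Dict[str, float] = {}

    def on_update(self, agent_id: str, value: float) -> None:
        self.latest[agent_id] = value

    @property
    def mean(self) -> float:
        if not self.latest:
            return 0.0
        return sum(self.latest.values()) / len(self.latest)


class SensorAgent:
    """Per-sensor agent: transforms each reading as it arrives and pushes
    the transformed value downstream to the aggregator."""

    def __init__(self, agent_id: str, downstream: Aggregator) -> None:
        self.agent_id = agent_id
        self.downstream = downstream
        self.fahrenheit = 0.0

    def on_reading(self, celsius: float) -> float:
        self.fahrenheit = celsius * 9 / 5 + 32  # real-time transformation
        self.downstream.on_update(self.agent_id, self.fahrenheit)
        return self.fahrenheit
```

Each agent holds its own transformed state, and the aggregator's mean is current as soon as any agent processes a reading, with no separate batch job or query step in between.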
Share your thoughts about stateful applications below and let us know what you're building using the open-source Swim platform.
You can get started with Swim here, and make sure to STAR us on GitHub.