Searching For A Scalable Streaming API
by Brad Johnson, on Jun 21, 2019 10:34:55 AM
Let’s take a moment to appreciate the noble REST API. REST APIs are anywhere and everywhere. REST is the lingua franca of the application world, and REST APIs have been the equivalent of software duct tape since Roy T. Fielding published his doctoral thesis all the way back in 2000. Whether you’re talking about Google Docs, Facebook, Snapchat, Uber, Waze, Yelp or just about anything else, chances are there are hundreds or thousands of REST APIs creating relationships between application services.
But there are some use cases where REST APIs aren’t such a great fit, such as pipelining streaming data. Sure, there are WebSockets, which are fine for a few streams. But applications need to open a new WebSocket connection for every URI to which they want to connect. What about apps with thousands or millions of streams? At that scale, WebSockets are too inflexible, too difficult to aggregate, and too expensive to run. As streaming data becomes even more common, the next ubiquitous API will be built for transporting real-time streaming data, in addition to batch and historical data.
Designing the Perfect Streaming API
If REST and WebSockets aren’t enough, then we need a new API for today’s data-driven landscape. Let’s take a moment to imagine what the ideal, universal API would look like. What use cases must it cover, and what would it need to do?
Let’s start with the basics. A perfect API would:
- Work with batch (REST), historical, and real-time streaming data
- Use simple grammar to facilitate many compatible implementations
- Support asynchronous (and synchronous) use cases
- Provide a universal data type (compatible with JSON, XML, and other popular data languages)
It’s important for our new API to satisfy the streaming data use case first. Unlike REST and historical datasets, data streams require stateful endpoints in order to process data efficiently in-memory, as opposed to storing streams before processing (which inevitably ends in bufferbloat).
By supporting both synchronous and asynchronous use cases, we ensure that the same API can be used for both REST and streaming data sources. Simple grammar makes our new APIs easy to understand and configure. Likewise, a universal data type ensures that the API can be reused across a variety of use cases. There are also more advanced considerations, such as design decisions about syntax, composability, polymorphism, and parsing.
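One way to picture an endpoint that serves both styles is a small in-memory object that answers one-off reads and also pushes updates to subscribers. The sketch below is a minimal illustration in TypeScript, not Swim's API; `StatefulLane`, `get`, `subscribe`, and `set` are hypothetical names chosen for this example.

```typescript
// Hypothetical sketch: one stateful endpoint serving both a one-shot
// (REST-style) read and a continuous (streaming) subscription.
type Listener<T> = (value: T) => void;

class StatefulLane<T> {
  private listeners: Listener<T>[] = [];

  constructor(private value: T) {}

  // Synchronous, one-time read of the current in-memory state.
  get(): T {
    return this.value;
  }

  // Streaming read: delivers the current state first, then every later
  // update. Returns a function that cancels the subscription.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.push(listener);
    listener(this.value);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Updates the state in memory and pushes it to subscribers right away,
  // rather than storing the stream before processing it.
  set(next: T): void {
    this.value = next;
    // Copy the list so a listener that unsubscribes mid-update is safe.
    this.listeners.slice().forEach((listener) => listener(next));
  }
}

const speed = new StatefulLane<number>(0);
const seen: number[] = [];
const unsubscribe = speed.subscribe((v) => {
  seen.push(v);
});
speed.set(12);
speed.set(30);
unsubscribe();
speed.set(45); // no longer delivered to the cancelled subscriber
```

The same object backs both access styles, so a client that only needs the latest value and a client that wants every change share a single source of state.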
Evaluating Streaming APIs Today
There are a few options if you’re looking for a streaming API today. For example, Spring Cloud Stream or gRPC may get you part of the way there. However, building the endpoints of the stream is equally challenging. A stateful application layer becomes essential to route streams to granular endpoints within server clusters.
It makes sense that utilizing a different API would have far-reaching effects on how an application is architected. This is where the open source Swim platform comes in; it solves the problem of scaling a stateful application architecture that can efficiently maintain distributed real-time coherency. Swim uses the WARP streaming protocol, which is primarily a semantic model for streaming cache coherency between stateful application objects. WARP is an alternative model to RPC, and by extension, REST. The WARP coherency model can be implemented over many transports, just as the RPC model can.
Building a Stateful Streaming API
In order to construct an API using Swim, it's important to understand the primitives of Swim's architecture. The fundamental building block of Swim is a Web Agent, which is defined on the Swim Server. Each Web Agent is assigned a unique URI, so that it can be addressed by other Agents. Web Agents have Lanes, which are like data objects, each being identified by a name unique to that Agent. A Swim API is a combination of the host name (where the Swim Server is running), the nodeURI (URI of the Web Agent) and the laneURI (lane name). It is similar to a REST endpoint except that it's a streaming data API.
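The three-part address described above can be modelled as a plain record plus a lookup. The following TypeScript sketch only illustrates that addressing scheme under stated assumptions: `SwimApiAddress`, `AgentsByNodeUri`, and `resolveLane` are hypothetical names, the vehicle identifiers are made up, and none of this is Swim's own server or client API.

```typescript
// Hypothetical model of the host / node / lane address triple.
interface SwimApiAddress {
  hostUri: string; // where the Swim Server is running
  nodeUri: string; // URI of the Web Agent, unique on that server
  laneUri: string; // lane name, unique within that Web Agent
}

// One server's agents, keyed by node URI; each agent's lanes keyed by name.
type Lanes = { [laneUri: string]: unknown };
type AgentsByNodeUri = { [nodeUri: string]: Lanes };

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Resolves an address to lane state on a single server. The hostUri picks
// the server, so only nodeUri and laneUri are consulted here.
function resolveLane(agents: AgentsByNodeUri, addr: SwimApiAddress): unknown {
  const lanes = hasOwn(agents, addr.nodeUri) ? agents[addr.nodeUri] : undefined;
  if (lanes === undefined) {
    throw new Error("no Web Agent at " + addr.nodeUri);
  }
  if (!hasOwn(lanes, addr.laneUri)) {
    throw new Error("no lane " + addr.laneUri + " on " + addr.nodeUri);
  }
  return lanes[addr.laneUri];
}

// Illustrative data only; the vehicle ids are invented for this sketch.
const agents: AgentsByNodeUri = {
  "/agency/US/N-CA/sf-muni": { vehicles: ["bus-1", "bus-2"] },
  "/state/US/N-CA": { vehicles: ["bus-1", "bus-2", "bus-3"] },
};

const agencyVehicles = resolveLane(agents, {
  hostUri: "ws://transit.swim.services",
  nodeUri: "/agency/US/N-CA/sf-muni",
  laneUri: "vehicles",
});
```

Because a lane name only has to be unique within its agent, both agents here can expose a lane called `vehicles` without colliding.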
Accessing a Swim API
Swim provides a command-line client (CLI) for subscribing to and sending data to a Swim server. This section describes accessing a Swim API; the prerequisite is a Swim server with Swim APIs, as described in the previous section. Swim APIs stream data continuously, unlike REST APIs, where clients must pull data with a separate request each time.
- Install Swim CLI
> npm i @swim/cli
- Swim API Subscription with the CLI
A sync request is similar to an HTTP GET method in REST, except that it subscribes to a data stream.
To subscribe to a Swim API:
> swim-cli sync -h <hostUri> -n <nodeUri> -l <laneUri>
A get request is a one-time subscription. It is similar to an HTTP GET request.
To get data from a Swim API:
> swim-cli get -h <hostUri> -n <nodeUri> -l <laneUri>
Try a Real-world Example
Using this pattern, Swim APIs can handle real-time streaming, batch, and historical data in a variety of formats. Furthermore, Swim vastly simplifies doing aggregations and transformations across streams of data. Swim has even open sourced an example application that transforms the publicly available NextBus API into stateful data streams reporting on transit agencies in the United States. The example app is available on GitHub.
Using the Swim transit app, here are some examples of APIs you can subscribe to:
- To get a real-time stream for all vehicles in Northern California (aggregation):
swim-cli sync -h ws://transit.swim.services -n /state/US/N-CA -l vehicles
- To get a real-time stream for all vehicles that belong to the SF-muni agency:
swim-cli sync -h ws://transit.swim.services -n /agency/US/N-CA/sf-muni -l vehicles
- To get data for all vehicles in Northern California:
swim-cli get -h ws://transit.swim.services -n /state/US/N-CA -l vehicles
- To get data for all vehicles that belong to the SF-muni agency:
swim-cli get -h ws://transit.swim.services -n /agency/US/N-CA/sf-muni -l vehicles
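The four commands above differ only in the verb (`sync` or `get`) and the node URI. As a rough sketch, they can be generated from the address parts. The helper below is hypothetical (`swimCliLine` and `SwimApiAddress` are names invented here) and only formats strings; it uses just the `-h`, `-n`, and `-l` flags shown in this post and does not run the CLI.

```typescript
type SwimCliCommand = "sync" | "get";

interface SwimApiAddress {
  hostUri: string;
  nodeUri: string;
  laneUri: string;
}

// Formats one swim-cli command line. Values are not shell-quoted, so this
// sketch assumes URIs without spaces or shell metacharacters.
function swimCliLine(command: SwimCliCommand, addr: SwimApiAddress): string {
  return [
    "swim-cli", command,
    "-h", addr.hostUri,
    "-n", addr.nodeUri,
    "-l", addr.laneUri,
  ].join(" ");
}

const hostUri = "ws://transit.swim.services";
const nodeUris = ["/state/US/N-CA", "/agency/US/N-CA/sf-muni"];
const verbs: SwimCliCommand[] = ["sync", "get"];

// Two verbs times two node URIs: the same four commands, in the same order.
const lines: string[] = [];
for (const verb of verbs) {
  for (const nodeUri of nodeUris) {
    lines.push(swimCliLine(verb, { hostUri: hostUri, nodeUri: nodeUri, laneUri: "vehicles" }));
  }
}
```

Swapping `sync` for `get` against the same address changes a continuous subscription into a one-time read, which is the only difference between the first pair of commands and the second.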