A Simple Swim App for Apache Kafka Event Streams

by Simon Crosby, on Apr 29, 2021 9:00:00 AM


How can you derive continuously useful intelligence from your event streams to help automate business decisions? This blog provides a worked example using streaming events from Apache Kafka.

There are a couple of fundamental requirements:

  1. Applications need to be “in sync” with the real world so their reactions are precise, accurate and useful in the real-world.  I blogged about that here.
  2. Analysis of event data has to be done on-the-fly - before storing anything.

Swim gives you an easy way to do both.  In this post I'll show you how to build applications that analyze, learn and predict on the fly.  To deliver continuously useful insights, apps need to:

  • Always have an answer from the latest data: Algorithms need to analyze, train and predict continuously
    • Each new event must be statefully analyzed in real-time as soon as it arrives
    • As a result, insights and predictions are necessarily “given data thus far”, and the outputs  therefore also form a real-time stream
  • Events are only meaningful in a real-world context: Fluid relationships between data sources - like containment, proximity, or even correlation - are critical for deep insights that result from the joint meaning of events over time. 

Accessing a database to gather state is a million times slower than stateful, in-memory analysis. And though in-memory databases and data grids can help enhance performance, none addresses the need to compute on-the-fly as data arrives, or to continuously find and compute in the context of the event source (eg “within 100m” or “correlated to”).  Dynamic, fluid relationships between sources are crucial for accurate, deep insights.

In Swim each data source is represented by a concurrent, stateful actor - called a Web Agent. Swim uses streaming data to build a scaled-out graph of  Web Agents - one for each source. Each continuously receives events from its real-world source and statefully evolves - like a smart “digital twin”. It continuously processes its own event data and dynamically links to other agents based on their real-world context, dynamically building an in-memory graph whose links indicate relationships like proximity, containment, or correlation.

Linked Web Agents see each other’s state changes in memory, in real-time.  Web Agents continuously analyze, learn and predict using their own state and the states of their linked neighbors in the graph, and stream the resulting insights to UIs, brokers, data lakes, and enterprise applications. 

Developing The App

A Swim application is just an object-oriented Java application.  Here’s a  “hands-on” example application that uses Swim to track satellites in orbit and show them on a real-world map.  Each concurrent satellite Web Agent continuously computes its own trajectory from streaming data. 

It’s easy to envisage far more sophisticated analysis that would be fun to add, for example an unsupervised learn/predict pair that allows satellites to predict their future trajectories from their past behavior.  Swim provides a rich suite of algorithms that you can use to add your own analysis, including models of mathematical and geometric structures, rings, fields, vector modules and spaces, affine spaces, tensor spaces, probability distributions, and associated operators, but there are lots of cool libraries and other projects too.

For publicly available data on the current positions of satellites, spacecraft and space junk, we used the API offered by SpaceTrack.org.  We used NodeJS as a generic Kafka Producer that delivers satellite position event data to a Kafka Broker running inside a Docker instance (that you can even run on your local machine). When you run the Swim application it receives events continuously from Kafka, and creates a WebAgent for each satellite in the data - similar to a “digital twin”.

The Swim server in the example application also listens for http requests, so each Web Agent can render itself on a canvas of the real-world map.  This is done by JavaScript objects that run in-browser and link to the satellite Web Agents to continuously see their positions, and in turn update them continuously on-screen.


Topics:nodejsstateful lambdaskafka