"The Edge" is Not a Place
by Simon Crosby, on Feb 21, 2019 6:30:00 AM
Today’s architects and developers are comfortable with powerful cloud infrastructure service abstractions that help them quickly build applications. The (REST API + stateless microservice + database) model for cloud apps works remarkably well - from Netflix to Office 365.
Though many cloud apps have complex architectures (eg: here’s Drupal in AWS), there is a beautiful simplicity when compared to IT-oriented app architectures of a decade ago: You don’t need to worry about what devices a service runs on or even where it runs. Your app consumes scalable infrastructure services via REST APIs. Add orchestration to deal with dynamic needs like service scaling or replication, and you’re set. Cloud computing, therefore, is an established set of abstractions that make it easier for application developers to develop quickly, deploy fast, and reliably scale their application with the infrastructure hassles taken care of.
Contrast this with a typical discussion about “edge computing”. You will likely get mired quickly in physical details, such as whether a service (eg: data reduction or inference) runs “on an edge gateway” or “in the fog”. The problem is that we associate the term “edge” with a place-centric, physical world view, without focusing on what makes edge computing different from cloud computing, and how to meet those needs with a different set of infrastructure abstractions.
The edge is not a place. It’s a way of computing.
Every org I’ve met that has “an edge computing problem” has tried to use familiar stacks from the cloud to build their solution - and come unstuck as a result. Indeed, most edge vendors are touting solutions that are sure to fail at scale – either because of cost or complexity. Naively assuming that the abstractions that are so powerful in the cloud will also work for the edge ignores fundamental differences in the infrastructure each needs, and misses an opportunity to simplify and accelerate edge application development - and cut costs by making better use of edge resources.
Edge computing is about combining spatial, time, and application context to discover relevant insights. Edge isn't about "where," edge is about "why."
Cloud computing builds on a powerful triumvirate: REST APIs, stateless micro-services and stateful databases. Edge applications are fundamentally distributed and process streaming data on-the-fly. They need different infrastructure abstractions. Here are some key reasons why:
- The edge is stateful: The REST stateless model serves the cloud well because it lets any server (including “serverless” Lambda / Functions) process an event. For each new event, the app loads the previous state from a database, computes a new value, and stores the new state back in the database. This is simple, but problematic when dealing with real-world state changes - where “things” change state a lot. Calling a REST API for each event wastes billions of CPU cycles on an edge CPU - per event.
- Big data is a bad idea: Things in the physical world change fast. Saving myriad state changes in a big-data store for post-processing may be technically feasible, but it will likely be expensive and of low value: Most raw data is only ephemerally useful. Moreover, putting a data store of any type on the “hot path” of a noisy edge application is a terrible idea because performance is limited to the response time of the database. As databases scale and applications become distributed, challenges with replication and synchronization crawl out of the woodwork. Finally, database-centric computing is always about past data, whereas edge computing is necessarily about present data. Sure, save whatever you want. Just don’t put it on the “hot path” of the application.
- I need to know now! Edge applications need to always know how to respond. They can’t wait for the next batch calculation, and insights from the last time the analysis was run just won’t cut it. For example: Swim predicts the future behavior of traffic at every intersection in a city (minutes ahead) to within a few hundred ms so that vehicle routing algorithms can dynamically adjust. (Here’s real-time state and predictions for Palo Alto CA. You can download the source code here.)
- Context counts: Data streams aren’t sourced in isolation. Real-world containment, proximity and adjacency of things are crucial to understanding a dynamic world. So, edge computation is necessarily “in (a dynamically changing) data graph” rather than “over (a pre-built) graph”.
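To make the first two points concrete, here is a minimal Python sketch of the two styles. It is an illustration only and not Swim's API: `FakeDatabase`, `stateless_handler`, `StatefulAgent` and the running-mean `update` function are all hypothetical names invented for this example. The fake database counts round trips rather than performing real I/O.

```python
from dataclasses import dataclass, field


class FakeDatabase:
    """Hypothetical stand-in for a remote store; counts round trips, no real I/O."""

    def __init__(self):
        self.rows = {}
        self.round_trips = 0

    def load(self, key):
        self.round_trips += 1
        return self.rows.get(key, {"count": 0, "mean": 0.0})

    def store(self, key, state):
        self.round_trips += 1
        self.rows[key] = state


def update(state, reading):
    """Fold one reading into a running mean and return the new state."""
    count = state["count"] + 1
    mean = state["mean"] + (reading - state["mean"]) / count
    return {"count": count, "mean": mean}


def stateless_handler(db, sensor_id, reading):
    """Cloud style: load state, compute, store state, on every single event."""
    state = db.load(sensor_id)
    new_state = update(state, reading)
    db.store(sensor_id, new_state)
    return new_state["mean"]


@dataclass
class StatefulAgent:
    """Edge style: one long-lived object per 'thing' keeps its state in memory."""

    sensor_id: str
    state: dict = field(default_factory=lambda: {"count": 0, "mean": 0.0})

    def on_event(self, reading):
        self.state = update(self.state, reading)
        return self.state["mean"]


readings = [10.0, 20.0, 30.0, 40.0]

# Stateless path: two database round trips per event.
db = FakeDatabase()
for r in readings:
    stateless_mean = stateless_handler(db, "sensor-1", r)

# Stateful path: no store on the hot path at all.
agent = StatefulAgent("sensor-1")
for r in readings:
    stateful_mean = agent.on_event(r)

print(stateless_mean, stateful_mean, db.round_trips)
```

Both paths compute the same running mean, but the stateless one pays two store round trips per event while the stateful one pays none. Nothing stops the stateful agent from saving its state somewhere as well; the point is only that the save does not sit on the “hot path”.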
This list is far from complete. My aim is simply to illustrate the need for a new edge computing architecture. In my next post I’ll start to present an architecture that meets these needs – and motivate the development of Swim.
Swim is open source software that offers everything you need to build massively scalable, distributed, edge applications that stream their insights in real-time.
Developers who want to learn more about the open source Swim platform can get started here.