Rethinking the World Wide Web for Streaming Data

by Chris Sachs, on Mar 14, 2019 10:51:24 AM

This is the second post in a 4-part series about the future of the World Wide Web. Click here to read part 1. Stay tuned: parts 3 and 4 are coming soon!


"The World Wide Web was designed to transport documents over the Internet. But now the Web supports distributed applications, and that's a problem," says SWIM.AI founder and Chief Architect Chris Sachs.

In my last post, we explored why a Distributed Operating System is necessary to kickstart the next generation of connected applications. In this post, I want to connect the dots between the way we build distributed apps today and the World Wide Web. We’ll extend the principles of the Web to continuous processes, breaking through the Web’s document-centric orientation while preserving the virtues of universality and decentralization. In Part 3, I’ll describe how swimOS implements a fully functioning, distributed operating system based on these principles. Finally, in the last post in this series, we’ll examine what widespread adoption of a distributed operating system could mean for the future of Web applications.

World Wide Web of Documents

What is the World Wide Web, anyway? We’re all intuitively familiar with it. But even people who work in technology sometimes conflate the Internet, the Web, and the applications that run on top of them. Here are my basic definitions of the first two:

  • The Internet is a globally interconnected network, used to exchange data packets between endpoints.
  • The World Wide Web is a global hypertext library application, used to transport documents over the Internet.

Where do Web pages live? Does the Wikipedia article about operating systems exist on a single Web server somewhere? That used to be the case. But today, most Web pages are stored and distributed, often by hashing, across clusters of servers. The Web hides much of the complexity of spreading documents across the Internet, in part by giving each Web page a universal address, called a URI.
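Real sites differ in how they decide which machine serves a given URI. As one possible illustration of "hashing" documents across a cluster while the URI stays the same, here is a minimal Python sketch of rendezvous (highest-random-weight) hashing. The server names and the `score` and `locate` helpers are hypothetical, not how Wikipedia or any particular site actually works.

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical cluster of servers that share responsibility for Web pages.
SERVERS = ["server-a", "server-b", "server-c", "server-d"]


def score(server: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (server, key) pair."""
    digest = hashlib.sha256(f"{server}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def locate(uri: str, servers: list[str] = SERVERS) -> str:
    """Pick the server for a URI via rendezvous (highest-random-weight) hashing."""
    parts = urlsplit(uri)
    # Place by host and path only; query strings and fragments don't move a page.
    key = parts.netloc + parts.path
    return max(servers, key=lambda s: score(s, key))


# The client only ever sees the URI; which server answers is hidden behind it.
print(locate("https://en.wikipedia.org/wiki/Operating_system"))
```

A useful property of this scheme: if a server leaves the cluster, only the URIs it owned move elsewhere, while every other URI keeps its placement.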

The Web also spreads documents across many networked machines, while presenting each document as a single, uniquely addressed resource. Sounds like our definition of an Operating System from Part 1, right? An OS spreads software programs across CPU cores, while each program behaves as though it has the whole machine to itself. Isn't the Web a distributed operating system, then?

Not quite. The Web is not a distributed OS, because documents and processes are different. A document is a static snapshot; a process runs continuously, and its state keeps changing. That doesn’t fit the Web’s stateless, request/response HTTP model. And OSes run processes. No big deal, you might say; the Web was never intended to be a distributed operating system. But regardless of its original purpose, the dominant use case for the Web today is running Web applications. And using this document exchange mechanism as the backbone for distributed applications has created an entire industry’s worth of problems.
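To make the distinction concrete, here is a toy Python sketch. The `Process` class and the `run` helper are hypothetical, not part of any Web or Swim API. A client that observes a running process through stateless requests only sees whatever snapshot exists when it happens to ask; a client that subscribes to a stream sees every change.

```python
from typing import Callable


class Process:
    """A hypothetical continuously running process whose state changes over time."""

    def __init__(self) -> None:
        self.state = 0
        self.subscribers: list[Callable[[int], None]] = []

    def subscribe(self, callback: Callable[[int], None]) -> None:
        # Streaming model: the process pushes every change to its subscribers.
        self.subscribers.append(callback)

    def step(self) -> None:
        # One unit of continuous work; every step changes the state.
        self.state += 1
        for callback in self.subscribers:
            callback(self.state)

    def get(self) -> int:
        # Document model: a stateless request returns only the current snapshot.
        return self.state


def run(ticks: int, poll_every: int) -> tuple[list[int], list[int]]:
    """Run a process, observing it both by polling and by subscription."""
    process = Process()
    streamed: list[int] = []
    process.subscribe(streamed.append)
    polled: list[int] = []
    for tick in range(1, ticks + 1):
        process.step()
        if tick % poll_every == 0:
            polled.append(process.get())
    return polled, streamed


print(run(9, 3))  # ([3, 6, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9])
```

The poller misses two out of every three states; polling faster narrows the gap but never closes it, and every extra poll is another full request.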

Messaging is the Problem

Imagine trying to drive a car by writing letters to the driver. The driver mails you details about what she sees, and you write back with a description of how to turn the steering wheel. That’s what running distributed applications over the Web is like, in the best-case scenario. Usually, the problem is much worse. It’s more like trying to drive a car by having the driver mail letters to a library describing the obstacles she sees, while you repeatedly mail letters to the same library asking for reports from drivers looking for obstacles. For each report you receive, you mail driving instructions back to the library, hoping the driver will request new instructions again before she kills someone.
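The cost of the "library" indirection can be put in rough numbers with a toy model. The sketch below assumes integer time steps, that the driver files each report the moment an obstacle appears, that the controller and the driver each poll the shared store at a fixed interval, and that network transit is instant. `next_poll` and `relayed_latency` are hypothetical helpers for illustration only.

```python
def next_poll(t: int, interval: int) -> int:
    """First polling instant at or after time t, with polls at 0, interval, 2*interval, ..."""
    return -(-t // interval) * interval


def relayed_latency(event: int, controller_poll: int, driver_poll: int) -> int:
    """Steps between an obstacle appearing and the driver receiving instructions,
    when both sides communicate only by polling a shared store (the "library")."""
    # The driver files her report when the obstacle appears; the controller
    # only notices it on its next scheduled poll of the store.
    seen_by_controller = next_poll(event, controller_poll)
    # The controller files instructions immediately; the driver only picks
    # them up on her own next scheduled poll.
    seen_by_driver = next_poll(seen_by_controller, driver_poll)
    return seen_by_driver - event


# Controller polls every 5 steps, driver every 7; transit time is ignored.
delays = {event: relayed_latency(event, 5, 7) for event in (1, 6, 11)}
print(delays)  # {1: 6, 6: 8, 11: 10}
```

With polling intervals P and Q, the delay in this model can reach (P - 1) + (Q - 1) steps before any transit time is counted. A direct stream between driver and controller removes both waits.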

Okay, so we don’t use the Web to drive cars, for obvious reasons. But these very real problems affect every non-trivial Web Application. Just replace driving a car with hailing an Uber, or catching a Pokémon, or using AI to predict machine failure, or collaborating on a shared document. Web Applications are constantly sending messages everywhere. That's why message brokers and stream messaging exist. But building these kinds of applications on a digital library model leads to deeply rooted cost, complexity, and capability problems.

The solution is to upgrade the Web’s HTTP protocol to support massively multiplexed streaming between URIs. The WARP protocol, implemented by Swim, accomplishes this. But the idea of streaming data between URIs raises a deeper question, one that will circle us right back to where we started: in order to have a streaming conversation with a URI, the URI has to be a continuous process that runs somewhere. And what do we call a piece of software that smears continuously running processes across networked machines? A distributed operating system.
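Multiplexing is what lets a single connection carry many independent URI streams at once. The Python sketch below illustrates the general idea with a made-up, length-prefixed frame format. `encode_frame`, `Demultiplexer`, and the example URIs are hypothetical; this is not the WARP wire format or Swim's API.

```python
import struct
from collections import defaultdict

# Hypothetical frame layout (NOT the WARP wire format):
#   2-byte URI length | URI bytes | 4-byte payload length | payload bytes
URI_LEN = struct.Struct(">H")
PAYLOAD_LEN = struct.Struct(">I")


def encode_frame(uri: str, payload: bytes) -> bytes:
    """Tag a payload with the URI it belongs to, so streams can share a connection."""
    uri_bytes = uri.encode("utf-8")
    return (URI_LEN.pack(len(uri_bytes)) + uri_bytes
            + PAYLOAD_LEN.pack(len(payload)) + payload)


class Demultiplexer:
    """Splits one shared byte stream back into per-URI message streams."""

    def __init__(self) -> None:
        self.buffer = b""
        self.streams: dict[str, list[bytes]] = defaultdict(list)

    def feed(self, chunk: bytes) -> None:
        # Bytes may arrive in arbitrary chunks, so buffer until a whole frame is present.
        self.buffer += chunk
        while len(self.buffer) >= URI_LEN.size:
            (uri_len,) = URI_LEN.unpack_from(self.buffer, 0)
            len_at = URI_LEN.size + uri_len
            if len(self.buffer) < len_at + PAYLOAD_LEN.size:
                return
            (payload_len,) = PAYLOAD_LEN.unpack_from(self.buffer, len_at)
            payload_at = len_at + PAYLOAD_LEN.size
            end = payload_at + payload_len
            if len(self.buffer) < end:
                return
            uri = self.buffer[URI_LEN.size:len_at].decode("utf-8")
            self.streams[uri].append(self.buffer[payload_at:end])
            self.buffer = self.buffer[end:]


# Interleave updates for two URIs on one connection, delivered in small chunks.
wire = b"".join([
    encode_frame("/vehicle/42/obstacles", b"pedestrian ahead"),
    encode_frame("/vehicle/42/steering", b"turn left 5deg"),
    encode_frame("/vehicle/42/obstacles", b"clear"),
])
demux = Demultiplexer()
for i in range(0, len(wire), 3):
    demux.feed(wire[i:i + 3])
print(dict(demux.streams))
```

Each URI gets its own ordered stream, yet neither stream has to wait for a new request/response round trip, and neither needs a connection of its own.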

Rethinking the World Wide Web for Data Streams

We need a streaming World Wide Web to serve as a more suitable inter-process communication mechanism for the distributed operating systems of the future. The Web’s principles of universal addressability and decentralization make it an ideal foundation for loosely coupled distributed systems. But the Web’s innate bias towards treating every resource as a document hinders its ability to cope with next-generation application patterns. Upgrading the Web for multiplexed streaming remedies this key limitation.

But we haven’t reached the bottom of the rabbit hole just yet. Before we can build our way out of the mess of cobbled-together middleware, we need to tackle one final sticking point: statefulness.

Other Posts in This Series
  • Part 1 examines why a Distributed Operating System is necessary to kickstart the next generation of connected applications.
  • Part 3 (coming soon) introduces Swim and the WARP protocol as a fully functioning implementation of a Web-native, distributed operating system.
  • Part 4 (coming soon) dives into the compelling kinds of applications that a distributed operating system like Swim enables you to build.

Learn More

Swim is an open-source platform for building stateful, data-driven applications that continually compute and stream their insights in real time. You can get Swim here.
