Why We Need a Distributed Operating System
by Chris Sachs, on Mar 12, 2019 11:10:11 AM
"How do you write a simple application to run on a complex, heterogeneous mess of devices? You need a distributed operating system," says SWIM.AI founder and Chief Architect Chris Sachs.
Where does software run? Does it run on a Central Processing Unit (a CPU)? That used to be the case. But most modern devices have several CPU cores, and today's applications just get smeared across those cores. This works for developers because they don't have to manage the "smearing" themselves. Instead of writing software to run on bare metal, developers write applications to run on an intermediate piece of software called an Operating System. The Operating System manages utilization of the cores and hides the whole sloppy mess from applications. It’s beautiful!
Until you want to run a single piece of software across many distributed devices simultaneously. Then it’s a nightmare. You have to do all the smearing yourself, and it gets really, really messy. Most software written today must be painstakingly spread across desktop computers, mobile devices, Cloud virtual machines, and, increasingly, the so-called Edge of the network. Unlike multi-core software, which runs neatly and cleanly on traditional operating systems, multi-device distributed software runs on goopy stacks of middleware. What gives?
How We Got Here
Like much of history, we arrived here by accident. The predominant technologies used to create distributed software, HTTP and REST, weren’t designed to support distributed apps at all; they were invented to distribute a global library of documents: the World Wide Web. There is no operating system that transparently smears software across internet-connected devices because the underlying Web technology is fundamentally unfit for the job. The tangled stacks of spaghetti middleware holding together today’s Cloud applications exist simply to overcome this technology mismatch.
Can you imagine a world where, in order to build a mobile app, developers had to first stitch together a file system, memory management system, process scheduler, inter-process communication mechanism, and multithreaded consistency model? In this hypothetical world, would the Apple App Store and Google Play Store have as many apps in them? Would the apps be as capable? Such a world seems unthinkable, right?
Yet to build a distributed application, developers must rig up a database, an application server, and often a message broker, a job manager, and more. The Cloud is that unthinkable dystopia, where apps don’t have operating systems that provide the bare necessities of their existence.
What killer apps might we be forgoing? What sacrifices are we accepting? We put up with apps that are less automated, less collaborative, and less intelligent than they ought to be. And when you consider that databases have no agency, application servers have no memory, message brokers can’t take feedback, and job managers only run tasks on occasion, it’s no wonder our apps are so dumb.
The solution is to build a distributed operating system that can uniformly smear applications across heterogeneous networked machines, much as traditional operating systems distribute compute across multi-core computers, while letting distributed applications believe that they’re ordinary, general-purpose software that has agency, memory, awareness, and continuity.
What might such a distributed operating system look like? It will build on traditional operating systems, so it won’t have to re-solve low-level hardware problems. It will have analogues of file systems, inter-process communication pipes, process schedulers, and I/O system calls. But the nature of these subsystems will be quite different from their traditional OS counterparts. Past attempts at building distributed operating systems have failed precisely by trying to replicate a traditional OS too literally in a distributed environment.
The key to building an effective distributed operating system is to blend the abstractions of a traditional OS with the universal principles of the World Wide Web.
Other Posts in This Series
- Part 2 examines how the principles of the Web can be fused together with the fundamentals of a traditional OS to create a truly distributed operating system, on which general purpose software can faithfully execute.
- Part 3 introduces Swim and the WARP protocol as a fully functioning implementation of a Web-native, distributed operating system.