Real Time Observability for Kubernetes with SwimOS
by Krishnan Subramanian, on Aug 30, 2019 1:40:30 PM
Enterprise IT infrastructure is undergoing a paradigm shift: it is becoming more distributed, and IT is losing its tight hold on it. With containers, microservices architectures and serverless functions, the needs of modern applications are changing. As container adoption increases, enterprises face growing pressure to ensure business continuity while supporting these modern infrastructure trends. Kubernetes has emerged as the de facto standard for container orchestration, and every enterprise is facing the prospect of managing multiple Kubernetes clusters in its environment. Along with the changes in the underlying infrastructure and application architectures, there is a cultural change that is forcing enterprises to realign their IT teams to collaborate closely with their developers. Software engineering teams are increasingly responsible for the entire lifecycle of their applications. Traditional approaches like monitoring and logging are being replaced by newer approaches to infrastructure resiliency like real-time observability. In this post, we will talk about how SwimOS can handle the real-time observability needs of Kubernetes environments.
The need for Observability
With the proliferation of containers, the infrastructure has become even more complex, with more components to handle than in traditional virtual machine environments. This leads to unpredictability and can result in partial failures and even grey failures. With such systems, we need to embrace resiliency over reliability. It is important to accept that failures will happen and to focus on reducing MTTR (Mean Time To Repair) rather than only on increasing MTBF (Mean Time Between Failures). This rethink requires a newer approach to logging and monitoring. Traditional logging and monitoring helped organizations do postmortems on failures, but managing containerized environments requires a proactive approach to limiting failures. We need to look beyond the traditional logging and monitoring tools.
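To make the MTTR versus MTBF trade-off concrete, here is a small sketch using the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The hour values are illustrative assumptions, not measurements from any real cluster.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative numbers only (assumed for this example).
baseline = availability(mtbf_hours=100.0, mttr_hours=1.0)
longer_mtbf = availability(mtbf_hours=200.0, mttr_hours=1.0)    # failures half as frequent
faster_repair = availability(mtbf_hours=100.0, mttr_hours=0.5)  # repairs twice as fast

print(f"baseline:      {baseline:.4%}")
print(f"longer MTBF:   {longer_mtbf:.4%}")
print(f"faster repair: {faster_repair:.4%}")
```

Under this formula, halving MTTR improves availability by the same amount as doubling MTBF. In a complex containerized system, shortening repair time is often the more achievable of the two.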
Observability is not just about collecting logs, monitoring data and traces. It is about building systems that can be:
- Tested in realistic environments (testing in production) and tested for potential failures (Chaos Monkey)
- Deployed to production incrementally, so that a release can be either rolled back or rolled forward (GitOps) to ensure the resiliency needed for business continuity
In short, observable systems collect enough data that the system can be understood (in spite of its complexity), debugged in real time and evolved to keep it resilient. Real-time observability becomes an important tool for software engineering teams.
SwimOS for Kubernetes Observability
Real-time observability requires logs, along with monitoring and tracing data, to be analyzed in real time (say, within a few milliseconds). In Kubernetes environments, you can set up cluster-level logging outside of the Pods so that the logs survive the ephemeral nature of those Pods. Such an architecture is useful not only for postmortems but also for real-time observability, as streaming logs from the applications (or microservices) running on the Pods can be observed in real time along with monitoring and tracing data. The key is to collect the different streams of logging and monitoring data, compose them into one unified stream and make that stream actionable.
SwimOS is designed to handle this need by helping you aggregate multiple data streams into one, so you can run analytics or set up rule-based or heuristics-based actions. SwimOS also gives you flexibility in how you aggregate and compose streams. You can apply rules on the fly to filter the data, cutting down the noise and collecting the critical data that helps you take manual or automated actions to limit failures. SwimOS provides a single streaming API for all the observability data from your Kubernetes clusters, so you can either run analytics or build the automation needed to meet your SLAs.
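The idea of composing several streams into one and filtering it with a rule can be sketched in plain Python. This is not the SwimOS API. The event shape (`ts`, `level`, `name`, `value`), the function names and the threshold are all assumptions made for illustration.

```python
import heapq
from typing import Callable, Dict, Iterable, Iterator

Event = Dict[str, object]  # hypothetical event shape, not a SwimOS type


def unify(*streams: Iterable[Event]) -> Iterator[Event]:
    """Merge streams that are each ordered by 'ts' into one ordered stream."""
    return heapq.merge(*streams, key=lambda e: e["ts"])


def apply_rule(events: Iterable[Event], rule: Callable[[Event], bool]) -> Iterator[Event]:
    """Keep only the events that the rule marks as worth acting on."""
    return (e for e in events if rule(e))


def critical(e: Event) -> bool:
    """Example rule (assumed): error logs, or CPU usage above 90%."""
    return e.get("level") == "ERROR" or (e.get("name") == "cpu" and e.get("value", 0) > 0.9)


# Two small, made-up streams: application logs and resource metrics.
logs = [
    {"ts": 1, "source": "log", "level": "INFO", "msg": "started"},
    {"ts": 4, "source": "log", "level": "ERROR", "msg": "db timeout"},
]
metrics = [
    {"ts": 2, "source": "metric", "name": "cpu", "value": 0.35},
    {"ts": 3, "source": "metric", "name": "cpu", "value": 0.97},
]

critical_events = list(apply_rule(unify(logs, metrics), critical))
for event in critical_events:
    print(event)
```

Keeping the rule as an ordinary function means it can be swapped at run time without touching the merge step, which mirrors the on-the-fly filtering described above.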
Setting up SwimOS as a Kubernetes observability platform is straightforward. The web agent that sends log and monitoring streams to SwimOS can run in the same Pod as the user application. The SwimOS application can run outside the Kubernetes environment. The web agents stream the data to the SwimOS application, which then provides the single API on which your analytics dashboards and automation can operate. This not only protects the logs from Pod restarts and cluster failures but also gives you the real-time insights needed to prevent catastrophic failures. Though we do not recommend it, you can also run the SwimOS application inside the Kubernetes cluster if you want to manage both your applications and SwimOS using Kubernetes.
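As a rough sketch of the in-Pod agent role described above, the following Python function tags each log line with the Pod it came from and hands it to a sender. It does not use the SwimOS API. `forward_logs`, the `send` callable and the `POD_NAME` environment variable are hypothetical names chosen for this example, and a real agent would replace `send` with its actual transport to the SwimOS application.

```python
import json
import os
from typing import Callable, Iterable


def forward_logs(lines: Iterable[str], send: Callable[[str], None], pod_name: str) -> int:
    """Tag each non-empty log line with its Pod and pass it to `send`.

    Returns the number of records forwarded.
    """
    count = 0
    for line in lines:
        text = line.rstrip("\n")
        if not text:
            continue  # skip blank lines
        send(json.dumps({"pod": pod_name, "line": text}))
        count += 1
    return count


# Stand-in transport: collect records in a list instead of sending them anywhere.
outbox = []
sent = forward_logs(
    ["GET /health 200\n", "\n", "GET /orders 500\n"],
    outbox.append,
    pod_name=os.environ.get("POD_NAME", "demo-pod"),  # POD_NAME is an assumed variable
)
print(sent, outbox)
```

Because the records leave the Pod as soon as they are produced, they remain available even if the Pod is later restarted or rescheduled.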
Conclusion
The key to ensuring resilient containerized environments under Kubernetes lies in real-time observability and additional automation to limit failures (AIOps). The key to making Kubernetes clusters observable and enabling proactive action lies in collecting all the data in one place and applying the necessary filters to gain the critical insights that lead to actions that prevent failures. SwimOS thus plays an important role in enabling observability for Kubernetes environments. Getting started with SwimOS is easy. It is open source software and you can download it from the GitHub repository here.
Learn More
Let us know what you're building using the open source swimOS platform. You can get started with swimOS here and make sure to STAR us on GitHub.