Shifting Your Mindset From Big Data to Intelligent Edge Apps

by Brad Johnson, on Oct 16, 2019 11:03:30 AM


In my most recent post, I made four recommendations about planning for an intelligent edge application. My primary takeaway was that building real-time applications requires a different mindset than building traditional big data applications. Big data applications treat data as a static resource. The focus is on breaking data streams into manageable chunks and then transporting those chunks into a database. Any subsequent data processing and “near real-time” automation becomes a separate and isolated application challenge. 

But when it comes to real-time use cases like process optimization, predictive maintenance, smart cities, autonomous vehicles, and AR/VR, it’s no longer safe to assume “store first, then process” is an appropriate data strategy. Any use case with a need for local real-time data analysis should look to edge computing and AI/ML techniques to optimize data flows and minimize application latency. In other words, you should have an intelligent edge strategy.

Here are four considerations to help you determine a successful intelligent edge strategy:

  • Match your architecture to your use case
    Your particular use case should be the single biggest determinant for choosing your application architecture. Choosing an appropriate architecture helps ensure efficient operation and simplifies IT/OT management. For applications with many real-time data sources, edge computing can provide a way to efficiently process and analyze data streams without saturating costly network and cloud resources.

    By choosing an established cloud vendor like Microsoft Azure or AWS that also offers an edge computing solution (IoT Edge and Greengrass, respectively), you can deploy consistently across edge and cloud environments. Pair that with an open source platform like swimOS or enterprise-ready DataFabric, which creates a stateful, real-time mesh of distributed application services, and it’s possible to create efficient, reliable edge-cloud applications.

  • Understand the challenges of using real-time data
    One of the biggest challenges that we experience when working with new clients is overcoming a lack of data-readiness. Typically, when building proof-of-concept applications for clients, the first thing we receive is a CSV, JSON or XML file of sample data. Rarely is a customer able to present us with access to their real-time data streams.

    That’s because the traditional paradigm our customers were accustomed to involved first transporting edge data to the cloud and storing it in a historian database. Only then would they analyze anything. So they were used to building apps that were constantly sharing CSV, JSON and XML files. But if you need local insights quickly, it’s more efficient to process data as it streams (at the edge and before storing it). That means using streaming APIs, stateful processing and edge computing techniques.

  • Choose an appropriate state model
    Most enterprise applications today are built around a stateless RESTful model. In these applications, state is maintained by a centralized datastore, and external services must query it to establish state context before any analysis or automation occurs. For real-time use cases, that round trip often makes stateless architectures too slow to keep up with streaming edge and IoT data sources.

    In these cases, it makes sense to consider stateful application architectures. Stateful applications store relevant state locally, so it’s always accessible as data streams. There are stateful processors available today, such as Apache Flink and Spark Streaming, but these are only stateful in isolation. swimOS and DataFabric take the notion of statefulness a step further by providing a universal state model based on eventual consistency, automatically keeping distributed applications in sync.

  • Optimize for data locality
    When planning for the edge, be deliberate about where each processing task should run. Optimizing for data locality means processing data as close to its source as possible, so that data can be transformed and irrelevant data discarded before it consumes downstream resources. This reduces IT/OT expenses, because the application needs fewer network and storage resources to operate.

    Edge computing also lets you encrypt data earlier in the pipeline and monitor security at a finer granularity. With a Web Agent architecture like swimOS, intelligent edge agents can actively monitor for adherence to policies and regulations like GDPR. Because data is computed locally, it can be configured to remain on a device, in a particular factory, or within a country's borders while still being accessible to the rest of the application. And because each Web Agent is an isolated application, local automation can quarantine a compromised device while the rest of the system continues to operate unaffected.
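
To make the last three considerations concrete, here is a minimal sketch in plain Python of a stateful agent that processes readings as they stream in and forwards only the relevant ones. It is an illustration only and does not use the swimOS or DataFabric APIs. The names SensorAgent, process_stream and threshold, and the "flag readings that deviate from the running mean" rule, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass
class SensorAgent:
    """Hypothetical per-device agent that keeps its own state in memory."""
    sensor_id: str
    threshold: float   # assumed rule: flag readings this far from the running mean
    count: int = 0
    mean: float = 0.0

    def on_reading(self, value: float) -> Optional[dict]:
        # Compare against the state held locally; no datastore lookup is needed.
        deviation = abs(value - self.mean) if self.count else 0.0
        # Update the running mean incrementally, one reading at a time.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        if deviation > self.threshold:
            return {"sensor": self.sensor_id, "value": value}
        return None  # an unremarkable reading is discarded at the edge


def process_stream(readings: Iterable[Tuple[str, float]],
                   threshold: float) -> Iterator[dict]:
    """Route each reading to its agent and yield only the events worth forwarding."""
    agents: Dict[str, SensorAgent] = {}
    for sensor_id, value in readings:
        if sensor_id not in agents:
            agents[sensor_id] = SensorAgent(sensor_id, threshold)
        event = agents[sensor_id].on_reading(value)
        if event is not None:
            yield event
```

Calling list(process_stream(readings, threshold=5.0)) on a sequence of (sensor_id, value) pairs returns only the outlier events, so everything else is dropped before it consumes network or storage resources. A real deployment would also need to handle persistence, synchronization between agents and security, none of which this sketch attempts.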

Learn More

Let us know what you're building using the open source swimOS platform. You can get started with swimOS here and make sure to STAR us on GitHub. You can also learn more about our enterprise product DataFabric here.

Topics: Machine Learning, Stateful Applications, Edge Analytics, Manufacturing, Asset Tracking, Asset Management, Digital Twin, SWIM AI, Edge Computing, distributed computing, serverless, web applications, swimOS, REST, streaming, web agents, streaming api, open source software, microservices, datafabric, cloud data analytics, apache spark, decentralized, use cases