Making Sense of Sensor Data

What can you do with sensor data? Real-time, event-driven process automation, you say? Sign me up!

But what do you actually need on the tech side to do this? This is the question that a lot of small and medium-sized enterprises (SMEs), and plenty of large enterprises too, are asking themselves.

Unfortunately, they can’t always turn to Amazon, Google, Microsoft, and similar hyperscale cloud providers for simple answers. On a recent episode of DM Radio, a weekly, data management-themed radio program, host Eric Kavanagh aptly compared the array of services that Amazon offers via its Amazon Web Services (AWS) Marketplace to the in-store inventory of a giant home-improvement retailer such as Home Depot. AWS is home to thousands of services, some of which are developed and supported by Amazon, most by third-party providers. Even if the goal is to simplify the process of buying, deploying, and using cloud software, the sheer profusion of choices confuses some customers.

Besides, to ask this question is just to scratch the cloud-services surface, argued guest Christian Lutz, founder and president of Crate.io, which develops CrateDB, a PostgreSQL-based distributed RDBMS platform that it positions for real-time data ingest, real-time analytics, and other demanding use cases.

Lutz says the typical enterprise should start simple, e.g., by identifying specific use cases for real-time analytics and figuring out what resources it needs to support them. In most cases, enterprises focus on process improvements, typically in the form of real-time monitoring of process yields, efficiency, waste, and other metrics. However, he told Kavanagh, the real business value comes via the predictive dimension: “It’s not only about knowing the process … it’s also [about] seeing ahead and trying to predict if there is going to come a problem so you can react now to stay within your process window, for example. Or reduce the cost of poor quality by reducing waste. And you get ahead of the problem.”
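To make the "seeing ahead" idea concrete, here is a minimal sketch of one simple way to estimate when a metric will drift out of its process window: fit a straight line to recent readings and extrapolate. This is an illustration only, not a method Lutz or Crate.io described; the function names, the evenly spaced readings, and the window limits are all assumptions.

```python
def fit_line(values):
    """Least-squares line through evenly spaced values (needs at least two).

    Returns (slope, intercept), with x taken as 0, 1, 2, ...
    """
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def steps_until_breach(values, lower, upper, horizon=10):
    """Return how many steps ahead the fitted trend first leaves
    [lower, upper], or None if it stays inside for the whole horizon."""
    slope, intercept = fit_line(values)
    last_x = len(values) - 1
    for step in range(1, horizon + 1):
        predicted = intercept + slope * (last_x + step)
        if not lower <= predicted <= upper:
            return step
    return None


# Hypothetical temperature readings and a hypothetical process window.
temps = [70.0, 70.5, 71.0, 71.5, 72.0]
print(steps_until_breach(temps, lower=65.0, upper=73.2))
```

A real deployment would likely use a more robust model, but the shape is the same: the alert fires on the forecast, not on the breach itself.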

You can start simple with this, too, Lutz told Kavanagh. The answer isn’t a Hail-Mary pass – for example, a black-box “solution” for sensor-infused analytics – but a logical, building blocks-type approach: you start with a solid foundation for sensor-infused analytics and you build on top of that.

Simplifying the inherently complex

Fine. But how does one do this? How does an enterprise lay a “solid foundation” for real-time sensor-infused analytics? Once again, Lutz argues, there’s virtue in simplicity: you start with a scalable, reliable data management platform. In his case, of course, Lutz would prefer customers to start with CrateDB.

As he sees it, the virtuous simplicity of a platform such as CrateDB is that it consolidates multiple, nominally separate functions into a single platform. These include real-time ingest and analysis of sensor data; real-time analysis of sensor data contextualized with OLTP data; and analysis of both real-time sensor and OLTP data contextualized with historical data. CrateDB is a multi-model DBMS, too, which means it can query across time-series and relational data models. Lutz said Crate.io offers CrateDB Cloud for AWS and Azure and CrateDB On-Premises for the data center: “Our core idea [is] that we combine this into one system that can infinitely scale and can be set up very efficiently.”

In other words, a platform such as CrateDB permits an enterprise to ingest, manage, and contextualize data from different types of sensors, as well as to enrich it with data from other contexts – such as core OLTP systems, for example – to create richer, more realistic analytic views. All of the data is already there: in a single repository, stored in separate tables or across multiple data models.
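What "contextualizing" sensor data with relational data looks like in practice can be sketched with a single SQL join. The example below uses Python's built-in SQLite module purely as a stand-in so that it runs anywhere; it is not CrateDB, and the table names, columns, and values are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a single shared repository.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Time-series side: raw sensor readings (hypothetical schema).
    CREATE TABLE sensor_readings (machine_id TEXT, ts INTEGER, temperature REAL);
    -- Relational side: order context of the kind an ERP system might hold.
    CREATE TABLE work_orders (machine_id TEXT, product TEXT, max_temperature REAL);

    INSERT INTO sensor_readings VALUES
        ('m1', 1, 71.0), ('m1', 2, 75.0), ('m2', 1, 60.0), ('m2', 2, 62.0);
    INSERT INTO work_orders VALUES
        ('m1', 'gearbox', 72.0), ('m2', 'housing', 65.0);
""")


def readings_over_limit(connection):
    """Join readings to their order context and keep only the readings
    that exceed the limit defined for the product being made."""
    query = """
        SELECT r.machine_id, w.product, r.ts, r.temperature
        FROM sensor_readings AS r
        JOIN work_orders AS w ON w.machine_id = r.machine_id
        WHERE r.temperature > w.max_temperature
        ORDER BY r.machine_id, r.ts
    """
    return connection.execute(query).fetchall()


violations = readings_over_limit(conn)
print(violations)
```

Because both tables live in one place, the limit check is one query rather than a pipeline that copies data between two systems and keeps them in sync.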

In this respect, said Lutz, CrateDB functions as a kind of central data hub for real-time (sensor), right-time (OLTP), and historical data. Data scientists, machine learning (ML) and artificial intelligence (AI) engineers, data engineers, ETL developers, data modelers, software developers, and other experts have one place to go to get the data they need. Instead of designing, testing, and maintaining the complex pipelines (complete with error-correction and resiliency logic) that are necessary to acquire and engineer data from dispersed data sources, and to manage the data dependencies across them, they can query against just CrateDB. This makes it easier to design reliable analytic and ML solutions, cloud services, and service workflows that incorporate real-time, right-time, and historical data in context.

“Data without context doesn’t help, and the context usually comes from an ERP system or another data management system that is typically relational. And the point is that we then have to run these two [typically separate] systems and keep them in sync. And that’s our core idea that we combine this into one system that can essentially infinitely scale and can do the setup very efficiently,” he explained. “Very often you need to take a decision not just on the data on the stream, but you have to look up historical data or you have to reference ERP data … and this can be very time consuming.”

A consolidated platform such as CrateDB conceivably simplifies other questions, too. For example, how does an enterprise get started collecting data at its edge? Relatedly, how does it shift data processing to the edge, such that, instead of shipping terabytes of sensor data from edge to cloud over constricted WAN pipes, it ships only the data it needs? Is this beyond the ken of the average SME?

Lutz’s answer: deploy Crate.io’s new CrateDB Edge service and have it ingest and process data in real time. That product, now offered via a “controlled” early adopter program, consists of a Kubernetes cluster that’s deployed at the edge or in a cloud hosting region local to the edge use case.
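The general idea of shipping "only the data it needs" can be illustrated with a short sketch that condenses raw readings into per-window summaries before anything crosses the WAN. This is a generic downsampling example, not a description of how CrateDB Edge works; the function name, the 60-second window, and the sample data are assumptions.

```python
from collections import defaultdict


def summarize(readings, window_seconds=60):
    """Collapse (timestamp_seconds, value) pairs into one summary per
    fixed time window, ordered by window start."""
    buckets = defaultdict(list)
    for ts, value in readings:
        # Round each timestamp down to the start of its window.
        buckets[ts - ts % window_seconds].append(value)
    return [
        {
            "window_start": start,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        }
        for start, values in sorted(buckets.items())
    ]


# Five hypothetical raw readings become two summary rows.
raw = [(0, 20.0), (30, 22.0), (59, 24.0), (60, 30.0), (90, 32.0)]
summary = summarize(raw)
for row in summary:
    print(row)
```

The trade-off is the usual one: summaries cut bandwidth, but any detail not captured in them is unavailable downstream unless the raw data is also retained at the edge.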

Not a panacea, nor meant to be

No platform is or can be a panacea, and Lutz was not so hubristic as to make this claim. He told Kavanagh that a managed service such as CrateDB Cloud provides a scalable, reliable substrate for designing, deploying, and maintaining event-driven apps and services that are triggered by real-time stimuli, such as data from sensors. It provides a no less scalable, reliable substrate for DevOps, data science, ML and AI engineering, data engineering, and similar practices. Lutz said CrateDB simplifies the nuts-and-bolts tasks involved in configuring, deploying, and maintaining a data management platform; this frees experts to focus on their own value-creating work.

It also consolidates all useful data in a single context and permits experts to design apps, services, pipelines, etc. that query across distinct data models – both relational and time-series, for example.

It doesn’t in any sense automate the jobs of these practitioners, Lutz stressed – although it does make them more straightforward. From the enterprise core to the edge to the public cloud, he said, CrateDB “is a single data hub for all your data, so you can combine [operational data] with sensor data – and I’m talking about lots of sensor data – and together with relational data, and all of that in one database, and you can access this with SQL, which makes it super easy to handle and super easy to implement.”

About Vitaly Chernobyl

Vitaly Chernobyl is a technologist with more than 40 years of experience. Born in Moscow in 1969 to Ukrainian academics, Chernobyl solved his first differential equation when he was 7. By the early 1990s, Chernobyl, then 20, along with his oldest brother, Semyon, had settled in New Rochelle, NY. During this period, he authored a series of now-classic Usenet threads that explored the design of Intel’s then-new i860 RISC microprocessor. In addition to dozens of technical papers, he is the co-author, with Pavel Chichikov, of Eleven Ecstatic Discourses: On Programming Intel’s Revolutionary i860.
