With the industry on the cusp of what proponents describe as a revolution in real-time analytics, businesses must balance known risks against potential rewards as they assess the benefits of technology patterns such as stream-processing. For most businesses, the transformative possibilities of real-time everything should prove to be compelling – in spite of the technology learning curve.

“The ability to make decisions in real-time or right-time has always been a bit of a luxury to most companies,” said Shawn Rogers, vice president of analytics strategy with TIBCO, on a recent episode of DMRadio. “And I can remember … having conversations about how important streaming data and streaming decisions can be to a firm. And … depending on who you were speaking to, you would get sort of … stiff armed on the subject. [People would say] ‘It would be nice if we could do that, but we’re a decade away from being able to leverage streaming analytics, streaming decisioning.’”

Today, real-time analysis is not merely possible but surprisingly affordable, Rogers argued, citing Apache Kafka, one of the most popular open-source stream-processing frameworks.

“The ability to apply an analytic model to streaming messages or streaming data of just about any type … [gives an organization] the ability to react in a [period of] time where the value lives,” Rogers contended. “As time is applied to the ability to act on information, often the value disappears. These types of technologies are making it easier to afford [and] easier to experiment with [streaming analytics]. Frankly, it’s powering innovation across most of the companies that I see.”

Data access, movement, integration, etc. reimagined

In point of fact, real-time analytics is just one (especially compelling) use case for stream-processing.

Proponents argue that stream-processing has the potential to transform business processes and operations, not only by supplying those processes with real-time data, but also by permitting new types of event-driven actions (or sequences of actions) that span multiple (internal and external) processes. Today, most automated operations depend on event triggers to kick off; in many cases, however, the event data[i] that triggers actions is not refreshed in real time. If a task is not especially time-sensitive, it might take a few minutes, an hour, or sometimes even a whole day for an action to trigger. Sometimes it is easier to assign a human being to monitor such tasks, detecting delays and intervening when they occur. This latency is a function of the traditional, batch-oriented data integration paradigm that is still regnant in most organizations. Innovation within that paradigm has focused on shrinking the latency interval, as with (for example) micro-batch processing.
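
To make the contrast concrete, consider a minimal sketch in Python. The function names (fetch_new_rows, handle) and the hourly interval are hypothetical stand-ins for real integration logic, not anything described above; the point is only that in the batch pattern, an event landing just after a poll waits out the full interval, while in the event-driven pattern the handler fires as soon as the event is delivered.

```python
import time

POLL_INTERVAL_SECONDS = 3600  # hypothetical hourly batch window

def batch_poll(fetch_new_rows, handle):
    """Batch pattern: wake up on a schedule, pull whatever is new, then act.
    An event that arrives just after a poll waits up to a full interval."""
    while True:
        for row in fetch_new_rows():   # e.g., rows changed since the last run
            handle(row)
        time.sleep(POLL_INTERVAL_SECONDS)

def on_event(event, handle):
    """Event-driven pattern: a callback registered with a streaming source
    fires once per event, so latency is bounded by delivery time, not a schedule."""
    handle(event)
```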

Stream-processing explodes this paradigm. It is predicated on the idea that intervention – action – is most valuable if it can occur as a more or less immediate response to an event. (This is the inescapable logic at the heart of high-frequency trading.) Latency is still baked into the world of stream-processing, but it is typically measured in milliseconds, as distinct from seconds, minutes, hours, or days. This right-time latency permits businesses to design more tightly knit processes, as well as to automate and orchestrate fine-grained, time-sensitive interactions that span business processes – inclusive, notionally, of external processes, too. Streams can capture data not just from signalers at the enterprise edge (sensors, telemetry devices), and not just from applications, systems, and services, but also from databases and file systems. Similarly, streams can support consumers of all kinds – be they machines (applications, services, etc.) or human beings (data scientists, business analysts, etc.).
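
As an illustration of what reacting "where the value lives" can look like in practice, here is a minimal, hypothetical sketch using the open-source kafka-python client. The topic name, broker address, and threshold rule are assumptions made for the example, not details from the broadcast; any per-event rule or analytic model could sit in the same place.

```python
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "sensor-readings",                          # hypothetical topic
    bootstrap_servers="localhost:9092",         # hypothetical broker
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="latest",                 # react to new events only
)

# Each message is handled as it arrives: latency is per event, not per batch.
for message in consumer:
    reading = message.value
    if reading.get("temperature", 0) > 90:      # illustrative rule / "analytic model"
        print(f"alert: {reading}")              # stand-in for a real downstream action
```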

So what’s not to like? Well, remember that old saw about putting one’s cart before one’s horse?[ii]

Unfortunately, success with stream-processing is by no means a slam dunk, stressed Neil Barton, CTO with start-up Clean Data Inc. In the first place, Barton observed, some businesses – especially small and medium-sized enterprises – need help identifying and formalizing business-focused stream-processing use cases. This is a far-from-trivial task. In the second place, from the point of view of many of the people who work in enterprise IT, stream-processing is still a relatively new technology paradigm: i.e., a new way of thinking about data access, movement, and engineering. Not only does it involve new (sometimes strange) concepts, but also new and unfamiliar tools. The upshot, Barton noted, is that companies need help selecting and implementing a stream-processing stack.

“It [is] challenging for … IT staffs to understand how do I bring in the necessary technologies, what technologies do I choose?” he pointed out. “How do I fit [this] into my ecosystem?”

Lastly, the most aggressive adopters are thinking about integrating stream-processing into their core data infrastructures – e.g., using Kafka as a backbone for data movement, in-flight data transformations, and data distribution. This last entails a reimagining of data architecture itself.

“Putting in Kafka as a backbone allows [businesses] to get access to real-time data, it allows them to decouple [data sources from data consumers], which makes it much simpler to manage the evolution of their data infrastructures over time. They don’t have this big monolith,” Barton said. “[Businesses are] starting to lay [Kafka] down as a foundation to bring … streaming data into their ecosystems [along with] some of their batch-based data and then feed that into the data warehouse environment.”
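
A rough sketch of the decoupling Barton describes might look like the following, again using kafka-python, with illustrative topic, broker, and consumer-group names that are assumptions rather than anything Barton specified. The source system publishes to a topic once; a real-time consumer and a warehouse loader each subscribe in their own consumer group, neither aware of the other.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# A source system publishes events without knowing who will read them.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda obj: json.dumps(obj).encode("utf-8"),
)
producer.send("orders", {"order_id": 42, "amount": 99.5})
producer.flush()

# Consumer 1: a real-time process, in its own consumer group.
realtime = KafkaConsumer(
    "orders", bootstrap_servers="localhost:9092", group_id="fraud-checks",
)

# Consumer 2: a batch-style loader feeding the warehouse, in a separate group.
warehouse_loader = KafkaConsumer(
    "orders", bootstrap_servers="localhost:9092", group_id="warehouse-load",
)
```

Because each consumer group tracks its own offsets, consumers can be added, removed, or replayed without touching the producer, which is what makes the evolution of the data infrastructure "much simpler to manage."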

A paradigm shift in the actual sense of the term

Justin Reock, chief evangelist for open source software and API management with Perforce Software, said that stream-processing can entail a steep learning curve and usually requires a significant technology commitment: not only in terms of the selection and implementation of a stream-processing stack, but, more fundamentally, in terms of redesigning software architecture on the basis of cloud-native principles. “The world doesn’t produce data in batches and it doesn’t produce data one payload at a time; the world produces data in streams, right? Now, what’s the goal … of a business that wants real-time analytics? What are they really trying to do? They’re trying to build a pattern that’s being referred to more and more as … a digital twin of the organization,” he said, describing a “digital twin” as a richly textured representation of a business that is constructed entirely on the basis of data.

Reock sees stream-processing – and, more fundamentally, the shift to streams (as distinct from batch intervals) as the dominant metaphor for thinking about, dealing with, and distributing data – as bound up with a much more ambitious, albeit still-coalescing transformation. The logic of the digital twin – which presupposes the ability to model the world and its events in ways (and at a scale) never before imagined – is an expression of this transformation, which, at its extreme point, aims to model and simulate different types of “ambient experiences” in the world. He used the example of testing a business go-to-market strategy. “Wouldn’t it be great to be able to test … against a digital construct [of the business in] 500 different markets and see which one in the model performs best before … in real life [you] try to run the same routine?” he asked, unpacking the logic of this new thinking.

“You can’t do that the old way,” Reock told DMRadio host Eric Kavanagh. “The net result is that we end up with ambient experiences, [such as] retail-less stores where you just walk in and take something off the shelf and just leave because [in the background] APIs [and] sensors and predictive analytics and imaging … just recognize who you are, what you took, and how you’re going to pay for it. And that’s [enabled by] all of the ambient and reactive technology behind the scenes that you don’t see.”

—————————————————————————————————-

[i] An event is not always a discrete “thing” – that is, a single message generated by an application or a specific string of data – but may instead be a sequence of messages or an accumulation of data that correlates to a pattern.

[ii] Or, as updated by Schitt’s Creek’s Moira Rose, “Keep the carriage in the wake of the mare.”

About Stephen Swoyer

Stephen Swoyer is a technology writer with more than 25 years of experience. His writing has focused on data engineering, data warehousing, and analytics for almost two decades. He also enjoys writing about software development and software architecture – or about technology architecture of any kind, for that matter. He remains fascinated by the people and process issues that combine to confound the best-of-all-possible-worlds expectations of product designers, marketing people, and even many technologists. Swoyer is a recovering philosopher, with an abiding focus on ethics, philosophy of science, and the history of ideas. He venerates Miles Davis’ Agharta as one of the twentieth century’s greatest masterworks, believes that the first Return to Forever album belongs on every turntable platter everywhere, and insists that Sweetheart of the Rodeo is the best damn record the Byrds ever cut.