Get your swimsuit on. If we drink the Kool-Aid at Current 2022 with Confluent (spoiler alert: we kind of have), then we accept the stream-centric proposition put forward by company CEO and co-founder Jay Kreps.
“This is as big of a shift as the invention of electricity or [moving forward to post-millennial times] the birth of mobile ubiquity, cloud and the new age of automation with AI and machine learning,” said Kreps in his keynote address, delivered in Austin, Texas in October 2022.
“We’ve seen this type of growth [as in data streaming] before, where exponential development and scaled expansion looks like a flat line… until it’s not and we really start getting to the hockey stick [upward] part of the curve,” said Kreps.
Beyond basic batch
The Confluent team are predictably upbeat on data streaming, so much so that they often refer to it as streaming data – not that the discipline, principle or practice actually needs renaming. They insist that the progression from batch data to a data streaming platform is a big deal.
As something of an aide-mémoire for those who may need it, Confluent is a full-scale data streaming platform that enables users to access, store and manage data as continuous, real-time streams. Built by the original creators of Apache Kafka, Confluent expands the benefits of Kafka with enterprise-grade features, while removing the software developer burden of Kafka management and monitoring.
Kafka is written in Scala and Java and can essentially be classed as a message broker system that facilitates asynchronous data exchange between processes, applications and servers. Kafka has very low overhead because it does not track consumer behaviour or delete messages that have been read.
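That asynchronous exchange is easier to see in code. Below is a minimal sketch using the standard Apache Kafka Java client; the broker address, the ‘orders’ topic and the consumer group name are illustrative assumptions rather than anything prescribed by Kafka itself.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaSketch {

    // Publish one event. The broker simply appends it to the topic's log;
    // it neither waits for any consumer nor deletes the record once read.
    static void produce(Properties props) {
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-42", "{\"status\":\"shipped\"}"));
        }
    }

    // Read events. Each consumer group tracks its own position (offset) in the
    // log, which is precisely why the broker itself can stay so lightweight.
    static void consume(Properties props) {
        props.put("group.id", "reporting-service");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("offset=%d key=%s value=%s%n", r.offset(), r.key(), r.value());
                }
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        if (args.length > 0 && args[0].equals("produce")) produce(props); else consume(props);
    }
}
```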
Confluent, Inc. itself used its post-pandemic In Real Life (IRL) event in Texas this fall/autumn to announce Stream Designer, a visual interface that enables developers to build and deploy streaming data pipelines in minutes.
Democratizing data streams
This is a point-and-click visual builder that bids to ‘democratize data streams’ so they are accessible to developers beyond specialized Apache Kafka experts. So then, not quite the low-code/no-code democratization spin that we hear so much of, designed to convince us that ‘everyone’ can code, but a more considered (narrower) version of technology democracy, designed to make data streaming more accessible throughout the software engineering cognoscenti.
The theory here is that, with more teams able to rapidly build and iterate on streaming pipelines, organizations can quickly connect more data throughout their business for agile development and better, faster, in-the-moment decision-making.
“We are in the middle of a major technological shift, where data streaming is making real-time the new normal, enabling new business models, better customer experiences, and more efficient operations,” said Kreps. “With Stream Designer, we want to democratize this movement towards data streaming and make real-time the default for all data flow in an organization.”
Confluent’s spokespeople and partners echo the same message, i.e. data streaming has actually existed for a while, but until recently it has been a piecemeal process, cobbled together in bits (as in small pieces, not as in bits and bytes) via workaround software application development systems and processes that were neither dedicated, platform-based nor custom-aligned for data streaming purposes.
But, all that said and done, times have changed. Kreps and team suggest that the streaming technologies that were once at the edges (of the business – and perhaps at the computing ‘edge’ in the Internet of Things too) have become core to critical business functions.
The home truth here comes down to the fact that traditional batch processing can no longer keep pace with the growing number of use cases that depend on sub-millisecond updates across an ever-expanding set of data sources.
Kafka, now de facto?
Organizations are clearly seeking ways to accelerate their data streaming initiatives as more of their business is operating in real time. Confluent goes so far as to suggest that Kafka is the de facto standard for data streaming. It’s an assertion that may well be validated and justified: Kafka today enables over 80% of Fortune 100 companies to handle large volumes and varieties of data in real time.
However, building streaming data pipelines on open source Kafka requires large teams of highly specialized engineering talent and time-consuming development spread across multiple tools. This puts pervasive data streaming out of reach for many organizations and leaves existing legacy pipelines clogged with stale and outdated data.
Amy Machado, research manager for streaming data pipelines at IDC, says that businesses need to add more streaming use cases, but that a lack of developer talent and increasing technical debt stand in the way. She suggests that visual interfaces like Confluent’s Stream Designer are key advancements in overcoming these challenges, making it easier for existing teams and the next generation of developers to build data pipelines.
Flexible point-and-click canvas
Stream Designer provides developers with a flexible point-and-click canvas to build pipelines much more quickly than at any time in the past (in minutes in fact) and describe data flows and business logic easily within the Confluent Cloud UI. It takes a developer-centric approach, where users with different skills and needs can seamlessly switch between the UI, a code editor and a command line interface to declaratively build data flow logic at top speed.
It brings developer-oriented practices to pipelines, making it easier for developers new to Kafka to scale data streaming projects faster.
The promise here is that with Stream Designer, developers and associated software engineers can work considerably faster. Instead of spending days or months managing individual components on open source Kafka, developers can build pipelines with the complete Kafka ecosystem accessible in one visual interface.
According to Confluent, developers using Stream Designer can build, iterate and test before deploying into production in a modular fashion, in keeping with popular agile (and indeed capital-A Agile) development methodologies. Why is that so? Because there is no longer a need to work across multiple discrete components, like Kafka Streams and Kafka Connect, that each require their own boilerplate code.
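To give a flavour of the hand-coding that Stream Designer abstracts away, here is a minimal Kafka Streams topology in Java that filters one topic into another – a single, simple pipeline step, yet one that already needs its own configuration, serialization and lifecycle boilerplate. The application id and topic names are invented for illustration.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class PipelineSketch {
    public static void main(String[] args) {
        // Boilerplate: every Kafka Streams app needs its own id, broker
        // address and default serializers before any logic is written.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "clickstream-filter"); // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // The actual pipeline logic: read, filter, write downstream.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> clicks = builder.stream("raw-clicks"); // hypothetical topic
        clicks.filter((key, value) -> value != null && value.contains("\"bot\":false"))
              .to("clean-clicks"); // filtered records flow to a downstream topic

        // More boilerplate: lifecycle management and clean shutdown.
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```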
After building a pipeline, the next challenge is maintaining and updating it over its lifecycle as business requirements change and tech stacks evolve. Stream Designer provides a unified, end-to-end view to observe, edit and manage pipelines and keep them up to date.
“Pipelines built on Stream Designer can be exported as SQL source code for sharing with other teams, deploying to another environment, or fitting into existing CI/CD workflows. Stream Designer allows multiple users to edit and work on the same pipeline live, enabling seamless collaboration and knowledge transfer,” said Confluent, in a press statement.
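Confluent has not detailed the exact shape of that exported SQL here, but as an illustrative sketch, a pipeline step of this kind could be expressed as a ksqlDB statement and submitted programmatically via Confluent’s ksqlDB Java client – the host, port, stream and column names below are all assumptions made for the example.

```java
import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;

public class DeployPipeline {
    public static void main(String[] args) throws Exception {
        // Connect to a ksqlDB server (host and port assume a local setup).
        ClientOptions options = ClientOptions.create()
                .setHost("localhost")
                .setPort(8088);
        Client client = Client.create(options);

        // A hypothetical pipeline step expressed as SQL, of the kind a
        // visual builder could export for CI/CD or sharing between teams.
        String sql =
            "CREATE STREAM clean_clicks AS " +
            "SELECT user_id, url FROM raw_clicks WHERE bot = false EMIT CHANGES;";

        client.executeStatement(sql).get(); // block until the server accepts the statement
        client.close();
    }
}
```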
Swim in the data stream
Among the most vocal partners at this year’s Confluent Current 2022 is Swim, which describes itself as the creator of the first open core platform for building, managing and operating streaming applications at scale. The founder and CTO of Swim is Chris Sachs.
Sachs used his time at Current 2022 to talk about a process that he called ‘streaming data to humans’. The billing for this session posed the following questions. What role do human operators play in an increasingly automated world? How can mere mortals oversee – and make sense of – what large-scale real-time systems are doing right now? Is it possible to get an intuitive, high-level sense of the real-time state of a hundred million things? Is it useful to do so?
Swim explored these related points and asked how users can explore real-time data with the same specificity they are accustomed to having for events in the past. The analysis here demonstrated real-world attempts at answering the above questions using full stack, end-to-end streaming data applications fed by Kafka.
Full stack streaming applications that push data all the way through to end-users, with comprehensive application logic in the real-time loop, take Kafka topics to the next level. Swim shared lessons learned (and mistakes to avoid) gleaned from a decade’s experience striving to make cross-domain streaming data accessible to people, so that they may feel more comfortable with (and better in charge of) the global automations they most wish to pursue.
Sachs and team also looked at the use of HTTP/2 streaming APIs for full stack real-time applications.
Despite being a multiplexed streaming protocol, HTTP/2 is still primarily used to make one-shot remote procedure calls to stateless web services. Swim used this session to explore how to upgrade REST APIs to provide granular streams of real-time state changes, driven by Kafka events.
“Creating streaming APIs from Kafka topics enables web browsers and other API clients to observe real-time changes to individual entities, without having to consume whole topics. HTTP/2 multiplexing enables applications to dynamically subscribe to the real-time state of many entities at once, over a single connection,” noted Swim.
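As a hedged illustration of the client side of that idea, the sketch below uses the JDK’s built-in java.net.http client (which negotiates HTTP/2 where the server supports it) to subscribe to the live state of a single entity rather than the whole Kafka topic it is derived from. The endpoint URL and event format are hypothetical, not Swim’s actual API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EntityStreamClient {
    public static void main(String[] args) throws Exception {
        // Prefer HTTP/2: the protocol can multiplex many such entity
        // subscriptions over a single connection.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();

        // Hypothetical endpoint exposing granular state changes for one
        // entity, driven behind the scenes by Kafka events.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/vehicles/bus-1234/state"))
                .header("Accept", "text/event-stream")
                .build();

        // Stream the response body line by line as state changes arrive,
        // instead of waiting for a one-shot RPC-style reply.
        client.send(request, HttpResponse.BodyHandlers.ofLines())
              .body()
              .forEach(System.out::println);
    }
}
```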
Large-scale data-intensive streaming
After covering the basics, Swim compared and contrasted streaming APIs with REST APIs, using a simple command line client to illustrate. The presentation also dove deeper into design patterns and best practices for incorporating streaming APIs into large-scale, data-intensive streaming applications.
The company demonstrated real-time maps that dynamically stream the live state of thousands of real-world entities, while only streaming what’s actually visible on screen at any given time. It closed with a whirlwind tour of UX design patterns that showcase how streaming APIs can create live windows into our worlds – both real and virtual.
The key takeaway from Confluent (and indeed Swim for that matter) is that the age of batch data is over (or so it is claimed; clearly there will be many instances where batch jobs live on for many years to come) and that the era of real-time streaming data is here.
About Adrian Bridgwater
Adrian Bridgwater is a freelance journalist and corporate content creation specialist focusing on cross-platform software application development as well as all related aspects of software engineering, project management and technology as a whole. Adrian is a regular writer and blogger with Computer Weekly and others, covering the application development landscape to detail the movers, shakers and start-ups that make the industry the vibrant place that it is. His journalistic creed is to bring forward-thinking, impartial technology editorial to a professional (and hobbyist) software audience around the world. His mission is to objectively inform, educate and challenge - and through this champion better coding capabilities and ultimately better software engineering.