Kafka or Pulsar? A Battle of the Giants Concerning Streaming

Got Apache Pulsar? It’s an open-source cloud-native messaging and stream-processing platform.

If this sounds suspiciously similar to one or more separate Apache projects – Kafka, for example – it should. Kafka, too, is an open-source messaging and stream-processing platform. And several other Apache projects fit this bill, too, including Apex, Beam, Flink, Flume, NiFi, and Storm, among others.

Missing from (or added retrospectively to) the descriptions of these projects is the term “cloud native.”

Pulsar’s project page describes it as a “cloud-native” messaging and stream-processing platform. Cloud-native applications are distributed and loosely coupled. The goal of cloud-native design is to minimize dependencies between programs, which (in developer-speak) are instantiated as “services.”

Fine. But what does this mean and why does it matter? If you are a business or IT decision-maker, why should Pulsar’s (wait for it) “cloud-nativity” matter to you? What’s the business value of Apache Pulsar?

A recent episode of DM Radio, a weekly, data management-themed radio show and podcast, explored these questions. “You want to be cloud-native now, because you know, whether you’re running Kubernetes, or in any of these cloud [environments], you’ve got to keep things separate so you can scale them independently. And Pulsar is unique in that the part that interfaces with all the clients is different from the part that does storage, so you can independently scale those [different components],” Tim Spann, a developer advocate with StreamNative, told DM Radio host Eric Kavanagh.

To describe Pulsar as “cloud-native” is to say next to nothing about its business value. To get a better sense for that, it is necessary to explore what “cloud-native” software is and why it is potentially useful.

Why cloud-native? Because it’s loosely coupled, baby

The simple answer is that cloud-native design is a way of adapting software to take advantage of the unique benefits and features, as well as to work around the constraints, of cloud infrastructure.

But we are getting ahead of ourselves. For now, let’s just say that the business value of cloud-native design is that it produces software that behaves more like businesses expect software to behave.

So, for example, cloud-native software consumes only the resources it needs; businesses pay only for those resources. Organizations can quickly reconfigure cloud-native software to address changing conditions. Because they are scaling relatively granular software functions – instead of hardware resources – organizations can deftly adjust the performance of critical business and IT processes.

And it all starts with loose coupling.

Cloud-native software is loosely coupled in the sense that it consists of programs (“services”) that are not co-dependent on one another. Building on the logic of decomposition, a loosely coupled design takes the basic functions that would otherwise be fused together to form a monolithic application and breaks them down into discrete services. These services communicate with one another via APIs. They are “distributed” in that they do not need to run in the same context. So, for example, the loosely coupled services that comprise a cloud-native credit-processing application may invoke and exchange data with Pulsar instances running in (and managed by) AWS Elastic Kubernetes Service (EKS), a platform-as-a-service (PaaS) data warehouse such as Snowflake, and several generic AWS services.

Lastly, loosely coupled services maintain their states independently of one another. In theory, this makes cloud-native apps resistant to (and resilient in the face of) failure. Thanks to loose coupling, a cloud-native app can tolerate failure in one or more of its services and recover – without crashing. The cloud-native app may lose data specific to one or more “jobs,” but it should not crash or go offline.

Loose coupling has other, related benefits, too. For example, instead of scaling an app by provisioning extra virtual servers – each with its fixed complement of CPU and storage resources – operations personnel can scale a loosely coupled cloud-native app by adding or subtracting instances of its core program functions: e.g., more (or fewer) workers to score credit applications, more (or fewer) workers to run credit checks, more (or fewer) workers to onboard approved applications, and so on, and so forth.
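As a rough illustration of scaling by function rather than by server, the sketch below gives each stage of a hypothetical credit pipeline its own worker pool. The stage functions, the scoring formula, and the worker counts are all invented for illustration; a real deployment would more likely scale containers or pods than threads in a single process.

```python
from concurrent.futures import ThreadPoolExecutor


def score(application):
    """Hypothetical scoring step: derive a made-up score from the id."""
    return {**application, "score": 600 + (application["id"] * 10) % 200}


def credit_check(application):
    """Hypothetical credit check: approve scores of 650 or more."""
    return {**application, "approved": application["score"] >= 650}


def run_stage(stage, items, workers):
    """Run one stage with its own pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(stage, items))


applications = [{"id": i} for i in range(10)]

# Each stage is sized on its own: scoring gets four workers, checks get two.
scored = run_stage(score, applications, workers=4)
checked = run_stage(credit_check, scored, workers=2)
approved_ids = [a["id"] for a in checked if a["approved"]]
```

Changing the `workers` argument for one stage leaves the other stage untouched, which is the point of the example: capacity is adjusted per function, not per machine.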

The business benefits of cloud-native software: Observability and manipulability

Think of it this way: Cloud-native design shifts the emphasis (of design, scaling, maintenance, and health/performance management) from hardware and software to business services.

Software engineers are not just building a software application that (for example) processes, scores, approves, and expedites credit applications; instead, they are designing a business service that is used to score, approve, and expedite – i.e., to process – credit applications. Concomitant with this, they are building different types of instrumentation logic into the programs that comprise this business service. This logic permits site reliability engineers and/or similar experts to observe the health of the programs that constitute the business service, along with, crucially, that of the business service itself.

The two-fold vision is, first, to build resiliency into software, such that loosely coupled services can be started, stopped, paused, or restarted as needed. By “services,” we mean the discrete programs that correspond to a cloud-native app’s constitutive functions. This makes it possible to scale cloud-native apps by adding or subtracting instances of services. Second, and concomitant with this, cloud-native design aims to make business services observable – i.e., susceptible to fine-grained control and manipulation – by humans and machines alike. You are not scaling servers, storage, and network capacity; you are, in effect, adjusting sliders that permit you to manipulate the behavior of the service.

Human beings can do this, manually … but so can machines – automatically, in accordance with predefined rules. As I write in a separate piece (for a different venue) that has not yet been published:

Observability instrumentation makes it easier for operations personnel to provision extra resources in response to an observed service impairment – or, if necessary, to redirect shoppers to resources that engineers have provisioned in a separate cloud region or data/co-location center. True, it is possible to perform these tasks today, but the logic of observability aims to create higher-level abstractions – e.g., an observable customer-onboarding workflow or an observable ecommerce virtual catalog service – that are susceptible to control….

Observability instrumentation … likewise permit IT to anticipate and proactively respond to other potential problems. Lastly, it gives [IT and business] decision makers better insight into the impact of service impairment or disruption on operations, revenues, etc.
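To make the idea of machines adjusting a service "in accordance with predefined rules" a little more concrete, here is a minimal sketch of a rule-driven scaler. Everything in it is an assumption made for illustration: the `ScalingRule` fields, the latency thresholds, and the one-instance-at-a-time policy are hypothetical and are not taken from any particular orchestration product.

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    """Hypothetical predefined rule keyed to one observed metric."""
    scale_up_above: float    # e.g., latency in ms above which we add capacity
    scale_down_below: float  # latency in ms below which we remove capacity
    min_instances: int
    max_instances: int


def desired_instances(current, observed, rule):
    """Return the instance count the rule calls for, one step at a time."""
    if observed > rule.scale_up_above:
        target = current + 1
    elif observed < rule.scale_down_below:
        target = current - 1
    else:
        target = current
    # Never leave the bounds the rule allows.
    return max(rule.min_instances, min(rule.max_instances, target))


rule = ScalingRule(scale_up_above=500.0, scale_down_below=100.0,
                   min_instances=1, max_instances=5)

instances = 2
for latency_ms in [750.0, 800.0, 300.0, 50.0]:  # made-up observations
    instances = desired_instances(instances, latency_ms, rule)
```

The observed metric plays the role of the instrumentation described above, and the rule plays the role of the "slider": a human could call `desired_instances` by hand, or a control loop could call it on every new observation.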

Apache Pulsar as cloud-native messaging and stream-processing bus

This is where Pulsar’s proponents claim it has the edge vis-à-vis Kafka.

First, they say, it provides a dependable, scalable communications system – a bus – that supports messaging and data exchange between producers and consumers in a cloud-native software architecture. Second, they claim, Pulsar is a purer cloud-native player than Kafka. Compared to Kafka’s, Pulsar’s architecture is more loosely coupled: that is, its core functions are decomposed and instantiated as discrete services. (For more on this, check out my explication of the cloud-native data warehouse here.) So, for example, Pulsar implements its own broker service, but depends on two existing Apache projects – BookKeeper and ZooKeeper – to store content and metadata, respectively.

Third, they argue, Pulsar’s lower latency likewise gives it an edge for time-sensitive real- and right-time use cases. In a certain sense, message/data exchange in the context of cloud-native software design is a great example of a real-/right-time-dependent use case. The lower the latency at which message and data exchange occurs among cloud-native services, the more responsive and stable are the composite applications (and business services) that developers can assemble using these services.

So, data and message exchange in cloud-native apps is one good real-/right-time example.

Spann’s colleague David Kjerrumgaard, a developer advocate with StreamNative and author of the book Pulsar in Action, offered a few others. “In the IoT space, we have large volumes of data and you need to act in real-time or as close as you can get to the physical hardware. A really good use case … [is] preventative maintenance … in an oil and gas field where you’re trying to identify, ‘I’m drilling down to the ground – is that drawbar going to explode or something going to blow up down there?’ You need to stop it in real-time and you need to have that data and act on it very quickly,” he said.

“Or just the volume of data coming in, you know, from the connected car is another good example,” Kjerrumgaard continued. “We already have so much data coming in off our vehicles and once we’ve got self-driving cars … my smart car will talk to your smart car and let it know, ‘Hey, I’m going to be hitting the brakes here in a second.’ Okay, let’s tell my car to stop there because of that.”

So what’s it going to be then, eh? Kafka or Pulsar?

The reality is that most customers, especially large ones, will implement both Kafka and Pulsar.

Both are regularly used with other open-source stream-processing technologies, such as Flink, too.

To me, their value propositions seem quite distinct. Today, customers use Kafka primarily as a heavy-duty stream-processing backbone: a kind of high-throughput “pull” data-movement service that (for most use cases) is capable of delivering more data, faster, to more and different types of consumers, than batch or micro-batch ETL alternatives. Some customers, perhaps many, will also make use of Kafka APIs, such as Kafka Streams, to engineer (i.e., perform operations on) this data as it transits the Kafka bus.

Similarly, customers use Pulsar as a reliable communications substrate – a bus – to support low-latency message and data exchange for cloud-native apps. This is one example of a right-time-dependent use case; Pulsar is a strong candidate for use in many real-time-dependent use cases, too. (Bear in mind, however, that the number of such use cases is actually relatively small. In my experience, when people say “real-time,” what they actually mean is “as soon as is practicable.” You know: right-time.) As a messaging and stream-processing bus, Pulsar is useful to organizations that are building latency-sensitive application workflows and data pipelines. Pulsar’s loosely coupled design also gives organizations more (i.e., different, not necessarily better) options for deploying and scaling it.
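The deployment and scaling options that follow from Pulsar’s loosely coupled design can be pictured with a toy model. The sketch below is not Pulsar’s API; `StorageLayer` and `Broker` are hypothetical in-memory classes that only mimic the split described earlier, in which the part that talks to clients is separate from the part that stores data.

```python
class StorageLayer:
    """Toy stand-in for the storage tier (the role BookKeeper plays)."""

    def __init__(self):
        self._topics = {}

    def append(self, topic, payload):
        log = self._topics.setdefault(topic, [])
        log.append(payload)
        return len(log) - 1  # position of the newly stored message

    def read(self, topic, start=0):
        return list(self._topics.get(topic, [])[start:])


class Broker:
    """Toy serving tier: talks to clients, keeps no message data itself."""

    def __init__(self, name, storage):
        self.name = name
        self._storage = storage

    def publish(self, topic, payload):
        return self._storage.append(topic, payload)

    def consume(self, topic, start=0):
        return self._storage.read(topic, start)


storage = StorageLayer()
brokers = [Broker("broker-1", storage)]
brokers[0].publish("credit-apps", b"application-1")

# Scale the serving tier only: the new broker needs no data copied to it.
brokers.append(Broker("broker-2", storage))
brokers[1].publish("credit-apps", b"application-2")
seen_by_new_broker = brokers[1].consume("credit-apps")
```

In this toy, adding `broker-2` changes serving capacity without touching stored data; growing storage would be a separate decision. Real Pulsar clusters involve far more (metadata, replication, topic ownership), none of which is modeled here.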

Again, I see Kafka and Pulsar as two quite distinct offerings with two quite distinct sets of use cases.

That vendors, influencers, analysts, &c. tend to conflate them in their marketing is frustrating, but understandable. Consider, for example, that StreamNative positions itself as the Confluent of Pulsar: i.e., it sees its relationship to Pulsar as analogous to Confluent’s relationship to Kafka. Its employees, too, play lead roles in committing to and maintaining (i.e., directing, evolving) Pulsar development. But just as Confluent is inclined to promote Kafka as a solution for any and every messaging, stream-processing, data integration, etc. use case, so, too, is StreamNative inclined to promote Pulsar.

Redis’ Raja Rao got it right when – commenting on the uncritical use of JSON Web Tokens, especially in apps for which they are dangerous – he observed: “Sometimes, people take technologies that are intended to solve a narrow problem and start applying them broadly. The problem may appear similar, but utilizing unique technologies to solve general issues could create unanticipated consequences.”

Thanks to Mark Madsen for alerting me to Rao’s post. I would not have known about it otherwise.

[i] The title updates Plato’s gigantomakhia peri tēs ousias – that is, “a battle of the giants concerning being” – for the streaming set.

About Stephen Swoyer

Stephen Swoyer is a technology writer with more than 25 years of experience. His writing has focused on data engineering, data warehousing, and analytics for almost two decades. He also enjoys writing about software development and software architecture – or about technology architecture of any kind, for that matter. He remains fascinated by the people and process issues that combine to confound the best-of-all-possible-worlds expectations of product designers, marketing people, and even many technologists. Swoyer is a recovering philosopher, with an abiding focus on ethics, philosophy of science, and the history of ideas. He venerates Miles Davis’ Agharta as one of the twentieth century’s greatest masterworks, believes that the first Return to Forever album belongs on every turntable platter everywhere, and insists that Sweetheart of the Rodeo is the best damn record the Byrds ever cut.
