Pulsar graduates to being an Apache top-level project

The Apache Pulsar open-source, distributed messaging system is destined to be used in many real-time and big data programs.

Written by Steven Vaughan-Nichols, Senior Contributing Editor Sept. 27, 2018 at 10:34 a.m. PT

In Montreal at ApacheCon, the Apache Software Foundation (ASF) announced that Pulsar had graduated to being an Apache top-level project. This pub-sub messaging system boasts a flexible messaging model and an intuitive client application programming interface (API).

Pulsar is a highly scalable, low-latency messaging platform that runs on commodity hardware. It provides simple pub-sub and queue semantics over topics, a lightweight compute framework, automatic cursor management for subscribers, and cross-datacenter replication. It was designed from day one to address gaps in other open-source messaging systems.
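To make "pub-sub and queue semantics" and "cursor management" more concrete, here is a minimal in-memory sketch in Python. It is not Pulsar's client API or its implementation; the `Topic` class and its methods are hypothetical. The sketch assumes a topic is an append-only log and that each named subscription keeps its own cursor into that log. Independent subscriptions therefore each see every message (pub-sub), while consumers pulling from one shared subscription split the messages between them (queue). For simplicity the cursor advances on receive; a real broker would advance it only after a message is acknowledged.

```python
from collections import defaultdict


class Topic:
    """Hypothetical topic: an append-only log plus one cursor per subscription."""

    def __init__(self):
        self.log = []                    # messages in publish order
        self.cursors = defaultdict(int)  # subscription name -> index of next unread message

    def publish(self, msg):
        self.log.append(msg)

    def receive(self, subscription):
        """Return the subscription's next message and advance its cursor, or None if caught up."""
        pos = self.cursors[subscription]
        if pos >= len(self.log):
            return None
        self.cursors[subscription] = pos + 1  # simplification: real systems move on ack
        return self.log[pos]


orders = Topic()
for i in range(4):
    orders.publish(f"order-{i}")

# Pub-sub: two independent subscriptions each read the full stream.
audit = [orders.receive("audit") for _ in range(4)]
billing = [orders.receive("billing") for _ in range(4)]

# Queue: two workers sharing one subscription divide the messages between them.
shared = []
for worker in ("worker-a", "worker-b", "worker-a", "worker-b"):
    shared.append((worker, orders.receive("workers")))
```

Because each subscription's position is tracked by the topic rather than by the consumer, a new subscription can be added at any time without disturbing the others.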

The initial goal for Pulsar was to create a multi-tenant, scalable messaging system that could serve as a unified platform for a wide set of demanding use cases. Since then, its scope has expanded to include lightweight compute and connector frameworks, which let users process data and integrate with external systems from inside Pulsar. That makes it interesting for both real-time and big data applications.

Pulsar's architecture separates the serving and storage layers by using Apache BookKeeper as the persistent storage component, which has proven to be one of its key strengths. This two-layer architecture simplifies cluster operations: operators can easily expand clusters and replace failed nodes, and the system delivers much higher write and read availability.
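Why does splitting serving from storage make node replacement easy? The Python sketch below models the idea under loudly stated assumptions: `Storage` is a hypothetical stand-in for the BookKeeper layer, `Cluster` is a hypothetical serving layer whose brokers own topics but hold no message data, and the least-loaded placement rule is invented for the example rather than taken from Pulsar's actual load balancer. When a broker fails, its topics are simply reassigned to a surviving broker; nothing has to be copied, because the messages already live in the shared storage layer.

```python
class Storage:
    """Hypothetical durable storage layer shared by every broker."""

    def __init__(self):
        self.ledgers = {}  # topic -> list of stored messages

    def append(self, topic, msg):
        self.ledgers.setdefault(topic, []).append(msg)

    def read(self, topic):
        return list(self.ledgers.get(topic, []))


class Cluster:
    """Hypothetical serving layer: brokers own topics but keep no message data."""

    def __init__(self, storage, brokers):
        self.storage = storage
        self.brokers = list(brokers)
        self.owner = {}  # topic -> name of the broker currently serving it

    def owner_of(self, topic):
        # Invented placement rule for this sketch: give new topics to the least-loaded broker.
        if topic not in self.owner:
            load = {b: 0 for b in self.brokers}
            for b in self.owner.values():
                load[b] += 1
            self.owner[topic] = min(self.brokers, key=lambda b: load[b])
        return self.owner[topic]

    def publish(self, topic, msg):
        self.owner_of(topic)             # route the write through the owning broker
        self.storage.append(topic, msg)  # the broker writes through to the storage layer

    def fail_broker(self, name):
        """Remove a broker and hand its topics to survivors; no message data moves."""
        self.brokers.remove(name)
        orphaned = [t for t, b in self.owner.items() if b == name]
        for t in orphaned:
            del self.owner[t]
        for t in orphaned:
            self.owner_of(t)

    def add_broker(self, name):
        """Expanding the serving layer is just adding a name; new topics can land on it."""
        self.brokers.append(name)


storage = Storage()
cluster = Cluster(storage, ["broker-1", "broker-2"])
cluster.publish("t1", "a")
cluster.publish("t2", "b")
cluster.publish("t1", "c")
cluster.fail_broker("broker-1")  # t1 moves to broker-2; its messages stay in storage
cluster.add_broker("broker-3")
cluster.publish("t3", "d")       # least-loaded broker is now broker-3
```

In a single-layer design, where each node stores the data for the topics it serves, replacing a failed node would also mean rebuilding or copying that node's data before service could resume.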

Its other main features include:

Native support for multiple clusters with seamless geo-replication of messages across clusters.
Low publish and end-to-end latency.
Seamless scalability out to over a million topics.
Client API with bindings for Java, Python, and C++.

Does some of that sound familiar? If you're a programmer, it should. Pulsar isn't a duplicate of Apache Kafka, which is usually used for building real-time data pipelines and streaming apps, but the two overlap, and sometimes Pulsar is the better choice. As Jim Jagielski tweeted, "Apache Pulsar is not only more performant that Apache Kafka, but makes it super easy to increase partition size and/or duration. Been bitten by this a few times :)."

He's not the only one who likes Pulsar as a drop-in Kafka replacement. InfoWorld just awarded Pulsar its 2018 Best of Open Source Software award for databases and data analytics. It wrote, "Pulsar offers the potential of faster throughput and lower latency in many situations, along with a compatible API that allows developers to switch from Kafka to Pulsar with relative ease."

"Launching Pulsar at Yahoo in 2015, our goal has always been to make Pulsar widely used and well-integrated with other large-scale open source software," said Joe Francis, Oath's Director of Storage and Messaging. It looks like he'll see his goal realized.
