Building modern data pipelines by unifying Apache Pulsar, Heron and BookKeeper


Data pipelines are hard to build and maintain. This is due to the complexity of the big data open source ecosystem, which comprises numerous software projects, each specializing in solving one piece of the puzzle. In this talk, we will focus on three key open source projects, Apache Pulsar, Apache Heron and Apache BookKeeper, and how they are integrated to make it easy to build data pipelines.


For today’s enterprises, ensuring that data pipelines are available to every corner of the organization is key to building next-generation data-driven applications. In this talk, Karthik Ramasamy of Streamlio will present how to combine three best-of-breed open-source projects into a solid data infrastructure that is easy to develop against and simple to operate at scale in production.

He will provide an overview of the merits of the three open source systems and the benefits they bring when integrated:

Apache Pulsar: unified queuing and streaming

Apache Heron: stream processing

Apache BookKeeper: distributed stream storage


Karthik Ramasamy is the co-founder of Streamlio, which focuses on building next-generation real-time processing engines. Before Streamlio, he was the engineering manager and technical lead for real-time analytics at Twitter, where he co-created Twitter Heron. He has two decades of experience working in parallel databases, big data infrastructure, and networking. Karthik is the author of several publications and patents, and of the book "Network Routing: Algorithms, Protocols and Architectures". He has a Ph.D. in computer science from the University of Wisconsin, Madison, with a focus on big data and databases.

