Messaging, storage, or both: The real-time story of Pulsar & Apache DistributedLog

Boulder/Denver BigData Meetup


We are super excited to welcome back Dave Rusek to present on Apache Pulsar. For those who remember, Dave presented for us on "Big Data at Twitter Scale" back in 2016. He's a very engaging and knowledgeable speaker, and he is returning to talk about Apache Pulsar.

Don't miss it. It will be great.


6:00 – 6:15 – Socialize over food and drinks
6:15 – 6:30 – Welcome, opening remarks, and announcements
6:30 – 8:30 – The real-time story of Pulsar - Dave Rusek
8:30 – 9:00 – Networking


Modern enterprises produce data at increasingly high volume and velocity. To process data in real time, new types of storage systems have been designed, implemented, and deployed. Apache DistributedLog is a replicated log store originally developed at Twitter. It’s been used in production at Twitter for more than four years, supporting several critical services like pub/sub messaging, log replication for distributed databases, and real-time stream computing, delivering more than 1.5 trillion events (or about 17 PB) per day. Pulsar is a distributed pub/sub messaging platform that provides a flexible messaging model. Pulsar was developed at Yahoo and has been used in the Yahoo Cloud Messaging Service to deliver several billion messages per day.

Apache DistributedLog and Pulsar are both built on Apache BookKeeper. They are similar in design and implementation but have different goals. Dave will offer an overview of both systems and share advice on how to better use them.

About the speaker

Dave Rusek is a software engineer at Streamlio, where he works on messaging and storage technologies, currently Pulsar. Previously, he spent close to four years working on streaming systems at Twitter. He is a PMC member for Apache BookKeeper and DistributedLog.