August 2015: Distributed Stream Processing

154 people went

This month's meetup focuses on distributed stream processing. We're kindly being hosted and sponsored by Mind Candy at their office near Old Street.

We've got three talks for the evening:

Samza and the Unix philosophy of distributed systems
Martin Kleppmann - Author of Designing Data-Intensive Applications

One of the big ideas in Unix was to allow small, simple command-line tools to be chained together with pipes. Each of those tools would do one thing and do it well. Even now, nearly 50 years later, Unix tools are one of the most powerful ways of getting things done: a one-liner of grep | awk | sort | uniq is still one of the fastest ways of processing data and analysing logs.
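
To make the pipe idea concrete, here is a minimal Python sketch (not taken from the talk) that mirrors a shell one-liner along the lines of `grep ERROR | awk '{print $2}' | sort | uniq -c`. Each stage is a small function that consumes an iterable of lines and yields another, so stages chain together like pipes. The function names and the sample log lines are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    # Like `grep -F PATTERN`: pass through only lines containing the fixed string.
    return (line for line in lines if pattern in line)


def field(n: int, lines: Iterable[str]) -> Iterator[str]:
    # Like `awk '{print $n}'` with 1-based n; unlike awk, lines with too few
    # fields are skipped rather than printed as empty lines.
    for line in lines:
        parts = line.split()
        if len(parts) >= n:
            yield parts[n - 1]


def sort_lines(lines: Iterable[str]) -> Iterator[str]:
    # Like `sort`: has to read all of its input before emitting anything.
    return iter(sorted(lines))


def uniq_c(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    # Like `uniq -c`: count runs of adjacent identical lines (so sort first).
    for line, run in groupby(lines):
        yield sum(1 for _ in run), line


def error_counts(log: Iterable[str]) -> List[Tuple[int, str]]:
    # The "one-liner": each small stage feeds the next, as with shell pipes.
    return list(uniq_c(sort_lines(field(2, grep("ERROR", log)))))


if __name__ == "__main__":
    sample = ["ERROR disk full", "INFO all good", "ERROR disk slow", "ERROR net down"]
    print(error_counts(sample))  # [(2, 'disk'), (1, 'net')]
```

Because the stages are generators, data flows through lazily, except at `sort_lines`, which, just like `sort` in the shell, must see all of its input first.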

Many modern data systems are monolithic, the very opposite of the Unix philosophy. But Apache Samza is different: it is, in some sense, an attempt to bring the Unix philosophy into 21st-century distributed systems. In this talk, we will explore the design decisions behind Samza, and see how the Unix philosophy can help us build modern systems that are robust, scalable and maintainable.

Stream Processing with Samza at Improve Digital
Garry Turkington - CTO Improve Digital

Samza allows us to write stream processing jobs that consume data from multiple streams and effectively join it together. In addition, its bootstrap streams let a job support the concept of reference data, and its persistent state underpins both of these features. This talk will walk through a job used at Improve Digital to show how these features are used in practice.
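
The join-with-reference-data pattern described above can be sketched in plain Python. This is an illustration under assumptions, not Improve Digital's job and not Samza's Java API: the `Message` type, the stream names `publishers` and `impressions`, and the in-memory dict standing in for Samza's persistent key-value store are all hypothetical. The one Samza behaviour it models is the bootstrap stream: the reference-data stream is consumed up to its current end before any event is processed, so early events are not joined against missing reference data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

Joined = Tuple[str, Optional[str], Optional[str]]  # (key, event value, reference value)


@dataclass(frozen=True)
class Message:
    stream: str            # name of the input stream the message came from
    key: str               # join key (hypothetically, a publisher id)
    value: Optional[str]   # payload; None on the reference stream means "delete"


class ReferenceJoinTask:
    """Joins events against reference data kept in local task state."""

    def __init__(self, reference_stream: str) -> None:
        self.reference_stream = reference_stream
        # Stand-in for a persistent, changelog-backed key-value store.
        self.state: Dict[str, str] = {}

    def process(self, msg: Message) -> Optional[Joined]:
        if msg.stream == self.reference_stream:
            # Reference-data update: upsert or delete in local state, emit nothing.
            if msg.value is None:
                self.state.pop(msg.key, None)
            else:
                self.state[msg.key] = msg.value
            return None
        # Event: enrich it with the reference data currently known for its key.
        return (msg.key, msg.value, self.state.get(msg.key))


def run(task: ReferenceJoinTask, bootstrap: Iterable[Message],
        rest: Iterable[Message]) -> List[Joined]:
    # Bootstrap phase: apply all existing reference data before any event.
    for msg in bootstrap:
        task.process(msg)
    # Normal phase: events and later reference updates arrive interleaved.
    out: List[Joined] = []
    for msg in rest:
        joined = task.process(msg)
        if joined is not None:
            out.append(joined)
    return out
```

For example, if `Message("publishers", "p1", "News Site")` is bootstrapped first, an impression keyed `p1` comes out as `("p1", "imp-1", "News Site")`. A later update on the `publishers` stream changes what subsequent impressions are joined with, and an event whose key has no reference data joins to `None`.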

Change Data Capture and Logs at
Dan Harvey - Data Architect at

In any modern web platform, you end up needing to store different views of your data in many different datastores. I will cover how we have coped with doing this reliably at across a range of different languages, tools and datastores.
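
A common way to keep several datastores in step is to capture every change to the source database as an ordered log and have each derived view apply that log in order. The Python sketch below illustrates this idea under assumptions. It is not the approach presented in the talk, and `Change`, the two view classes and the `name` field are hypothetical. Each view records the last log offset it applied, so if the log is redelivered from an earlier position (at-least-once delivery), replaying it leaves the view unchanged.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Protocol, Set


@dataclass(frozen=True)
class Change:
    offset: int                      # position in the change log, strictly increasing
    key: str                         # primary key of the changed row
    row: Optional[Dict[str, str]]    # new row contents; None means the row was deleted


class View(Protocol):
    def apply(self, change: Change) -> None: ...


class KeyValueView:
    """Hypothetical derived view: a cache of full rows by primary key."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, str]] = {}
        self.applied_offset = -1  # last offset applied; lets replays be skipped

    def apply(self, change: Change) -> None:
        if change.offset <= self.applied_offset:
            return  # already applied: ignore redelivered changes
        if change.row is None:
            self.rows.pop(change.key, None)
        else:
            self.rows[change.key] = dict(change.row)
        self.applied_offset = change.offset


class NameIndexView:
    """Hypothetical derived view: a secondary index from 'name' to keys."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = {}
        self.current: Dict[str, str] = {}  # key -> indexed name, to undo old entries
        self.applied_offset = -1

    def apply(self, change: Change) -> None:
        if change.offset <= self.applied_offset:
            return  # already applied: ignore redelivered changes
        # Remove the key's previous index entry, if any.
        old = self.current.pop(change.key, None)
        if old is not None:
            self.index[old].discard(change.key)
            if not self.index[old]:
                del self.index[old]
        # Add the new entry unless the row was deleted or has no name.
        if change.row is not None and "name" in change.row:
            name = change.row["name"]
            self.current[change.key] = name
            self.index.setdefault(name, set()).add(change.key)
        self.applied_offset = change.offset


def replay(log: Iterable[Change], views: Iterable[View]) -> None:
    # Apply each change, in log order, to every view.
    targets = list(views)
    for change in log:
        for view in targets:
            view.apply(change)
```

Because every view consumes the same ordered log, they all see the same changes in the same order. The offset check only makes replays safe if a real view stores its offset atomically together with its data, which this in-memory sketch does not have to worry about.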