Past Meetup

Apache Spark Streaming + Kafka: An integration Story

18 people went

Details

The BCN Data Engineering meetup is back after the summer holidays, just in time to promote the DataEngConf Barcelona, 25th and 26th of September, which you can read more about here: https://www.dataengconf.com/speakers-bcn18.

As you might have seen as well, our own Pete Soderling is giving a talk on how to successfully create communities. Check it out here: https://www.meetup.com/CTOs-co-Barcelona/events/254243706/

That's it for the promotion of other events. Now to our event on the 17th, where we're glad to announce our speaker: Joan Viladrosa Riera (https://www.linkedin.com/in/joanviladrosa/), Head of Data Engineering at Stuart Delivery. At Stuart, Joan leads the design and building of tools for a more data-centric company. Previously, he worked at Billy Mobile as a Senior Big Data Architect and Tech Lead, where he led the transition to the Hadoop and Spark ecosystem, designing applications that scale up to thousands of events per second.

Joan will tell us an integration story between Apache Spark and Kafka. As Joan puts it: Spark Streaming has supported Kafka since its inception, but a lot has changed since then, on both the Spark and Kafka sides, to make this integration more fault-tolerant and reliable. Apache Kafka 0.10 (actually since 0.9) introduced the new Consumer API, built on top of a new group coordination protocol provided by Kafka itself. So a new Spark Streaming integration comes to the playground, with a similar design to the 0.8 Direct DStream approach. However, there are notable differences in usage, and many exciting new features. In this talk, we will cover the main differences between this new integration and the previous one (for Kafka 0.8), and why Direct DStreams have replaced Receivers for good. We will also see how to achieve different semantics (at least once, at most once, exactly once) with code examples.
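
To give a rough feel for those three semantics before the talk, here is a minimal sketch in plain Python. It is not taken from Joan's slides and uses no Spark or Kafka API: the in-memory list standing in for a topic partition, and the names `run`, `replay` and `Crash`, are hypothetical and only for illustration. The idea it assumes is the usual one: committing an offset before processing a record gives at most once, committing after processing gives at least once, and deduplicating at-least-once output by offset (an idempotent write) is one common way to get effectively exactly once.

```python
class Crash(Exception):
    """Simulated consumer failure between the process and commit steps."""


def run(log, state, sink, commit_first, crash_at=None):
    # state["offset"] is the committed position, i.e. the next record to read.
    while state["offset"] < len(log):
        pos = state["offset"]
        if commit_first:
            state["offset"] = pos + 1        # commit first ...
            if pos == crash_at:
                raise Crash()                # ... crash before processing
            sink.append((pos, log[pos]))
        else:
            sink.append((pos, log[pos]))     # process first ...
            if pos == crash_at:
                raise Crash()                # ... crash before committing
            state["offset"] = pos + 1


def replay(log, commit_first, crash_at):
    # Run once with an injected crash, then restart from the committed offset.
    state, sink = {"offset": 0}, []
    try:
        run(log, state, sink, commit_first, crash_at)
    except Crash:
        pass
    run(log, state, sink, commit_first)
    return sink


log = ["a", "b", "c"]

# Commit before processing: the record at offset 1 is lost.
at_most_once = [value for _, value in replay(log, True, crash_at=1)]

# Process before committing: the record at offset 1 is seen twice.
at_least_once_pairs = replay(log, False, crash_at=1)
at_least_once = [value for _, value in at_least_once_pairs]

# Idempotent sink keyed by offset: the duplicate overwrites itself.
exactly_once = list(dict(at_least_once_pairs).values())

print(at_most_once)   # ['a', 'c']
print(at_least_once)  # ['a', 'b', 'b', 'c']
print(exactly_once)   # ['a', 'b', 'c']
```

In a real Spark Streaming job the same trade-off shows up in where the offsets are stored and when they are committed relative to the output operation; the talk covers how that looks with the actual integration.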

Agenda:
19:00 Doors open & beers/soda

19:10 Brief intro by BCN Data Engineering

19:15 Apache Spark Streaming + Kafka: An integration Story by Joan

19:45 Q/A

20:00 More beers/soda and networking

20:30 Wrap

Join us for another #bcndataeng