Data Intensive processing in practice: Kafka Streams and Spark


Details
Teaser video: https://youtu.be/KXd_rBoep8Q
Big Data has gone mainstream and data-driven organizations are all around us. This time we will hear from two experienced Data Engineers who know all about how the ecosystem has grown over the past years. They will show us the tools they are working with and the real-world applications that they have encountered.
Talk 1 - Apache Spark, Marcin Szymaniuk
Would you like to see Big Data use cases implemented on Spark? Are you already working on Big Data projects and considering introducing Spark to your technology stack? Would you like to know what Spark is good at and what parts of Spark are tricky?
First, Marcin will provide an overview of multiple Spark use cases in various areas. The range of use cases is broad enough that you will likely find similarities to projects you are working on, and see how you can use Spark to solve problems and bring value to your company.
The second part of the presentation will be focused on technical challenges which need to be solved when introducing Spark to your ecosystem. Spark has a nice and relatively intuitive API. It also promises high performance for crunching large datasets. It’s really easy to write an app in Spark. Unfortunately, the nice API might be misleading and make us forget that we are implementing a distributed application. For that reason it’s easy to write one which doesn’t perform the way you would expect or just fails for no obvious reason.
In a nutshell, Marcin will share the lessons he has learned over 3 years of experience with Spark. The talk will give you an overview of what to expect and help you avoid the mistakes typically made by Spark newbies. Marcin will emphasize what you should know about your data in order to write efficient Spark jobs, and which configuration tweaks and optimization techniques come in most handy when implementing Spark-based solutions.
About Marcin
Marcin is a Data developer, Data infrastructure administrator and Consultant at TantusData. He has a lot of hands-on experience with technical problems related to Big Data (clusters with hundreds of nodes) as well as practical knowledge of business data analysis. Companies Marcin has worked for or consulted for include Spotify, Apple, Telia and a few small startups.
Talk 2 - Kafka Streams at the Olympics, Casper Koning
Pyeongchang, South Korea - February 2018. The world is watching the Winter Olympics. Hockey, Skiing, Curling, Figure Skating and Bobsleighing: everything is being streamed online. Gracenote Sports set out to create a Kafka Streams pipeline as a lightning-fast backend for Olympic Data widgets, powering a major Olympic broadcaster’s apps and website. The pipeline combines streams of real-time results, intermediate times, provisional standings, video logs and athletes’ biographies, for a gold medal in viewing experience.
We are going to present an overview of our development using Kafka Streams to enhance our existing rich Olympic model and infrastructure. We will also present some of our experiences and results of running in production during the Olympics.
This talk is a sneak preview of the talk Casper will give at the Kafka Summit in London in April.
About Casper
At Codestar, Casper is our go-to guy when it comes to Kafka. Casper has been developing applications based around Kafka for over two years, and could well be called a veteran on the subject. He has leveraged Kafka to power the search engine and the recommendation engine for one of the largest Dutch webshops and has, more recently, helped Gracenote Sports get ready for the Olympic Games this February.
PROGRAM
8 March 2018
17:30 Doors open and registration
18:00 Food
18:30 Talk 1
19:30 Break
20:00 Talk 2
21:00 Drinks
Because we are always looking for ways to improve our meetups, this meetup will be livestreamed as an experiment on our YouTube channel: https://www.youtube.com/channel/UCqwHhJNEUe7D-HGsX4zvKzQ
