Meetup #9


Details
Hi,
Our next meetup will happen on May 3rd. We will have a presenter from Exacaster and a special guest from Denmark this time.
The talks are as follows:
Personalized workflows with Kafka - Egidijus Pilypas, Exacaster
Exacaster builds extremely personalised experiences for 40 000 000 telco and retail consumers on a daily basis. We will share our experience of how Kafka and a microservices architecture enabled us to build a personalised workflow for each consumer.
Extreme Apache Spark: How in 3 months you can create a pipeline for processing 2.5Bn rows/day - Josef Habdank, Infare Solutions
"Apache Spark is simply awesome," says our speaker Josef Habdank. In this talk he will give you a crash course in designing an extremely scalable data processing pipeline on Apache Spark, using technologies such as Spark Streaming, Scala, Kafka/Kinesis, Snappy, Avro, Parquet, HDFS/S3 and Zeppelin.
It will be the story of 3 crazy developers who, in 3 months, managed to develop and put into production a Spark data pipeline that can crunch through 2.5 billion airfares a day without breaking a sweat. It was an amazing journey in which they had to do everything themselves: take care of the hardware and deploy the platform, research technologies, hack out all the code in Spark/Scala, test scalability, build the monitoring tools and deliver the complete business intelligence product to the customer.
Josef says: "Yes, it is possible, and it is possible in 3 months. If you come to the talk I will share with you the DOs and DON'Ts of such a process, and I will explain which technologies turned out to be right and which were a mistake." You will learn how to use correct message compression and serialization (Avro + Snappy), best practices for in-stream error handling, how to build a successful 50TB+ Parquet-based data warehouse and more, with code samples provided.
About the Speaker: Josef Habdank is a Lead Data Scientist and Data Platform Architect at Infare Solutions, with previous experience at Big Data and Data Science practitioners such as Thomson Reuters and Adform, as well as the Department of Defense. He is an expert in Apache Spark and Spark-enabled technologies. He is a frequent speaker at prominent Big Data conferences such as Spark Summit and High Load Strategy. Additionally, he is a specialist in real-time modelling and non-linear forecasting, and has experience with systems processing tens of billions of data points daily and data warehouses holding hundreds of billions of rows.
The meetup will take place at the Vinted office; the entrance is near the Dviratis Plius bike shop.
