
6th Spark+AI Prague Group Meetup + WINTER DATA PARTY!

Hosted By
Petr P.

Details

Xmas is coming, so how about an Apache Spark, AI, and data winter party? If you join our 6th Spark+AI meetup, you can have it all! This time we have prepared three talks. Here is the agenda of the meetup (please see the talk abstracts below):

==============================
17:30 - 18:00 - Welcome drinks

18:00 - 18:30 - "How to prepare data for fast analytical queries using Spark" - David Vrba (Socialbakers, Data Scientist)

18:30 - 18:50 - 1st break (snacks)

18:50 - 19:20 - "Spark job optimizations 101" - Peter Vasko (Socialbakers, Data Architect)

19:20 - 19:40 - 2nd break (snacks)

19:40 - 20:10 - "Stream the World with Spark" - Jiri Koutny (DataSentics, Data Solutions Architect)

20:10 - 22:00 - Networking (beers, drinks and food)

The meetup is organized by Socialbakers and DataSentics and will be all in English.

Talk abstracts:
"How to prepare data for fast analytical queries using Spark" - David Vrba (Socialbakers, Data Scientist)
Apache Spark is often used for ad hoc data analysis, which is possible thanks to its interactive console and integration with notebook environments such as Jupyter, Apache Zeppelin, and other notebook-based solutions. Even though Spark processes data in parallel, queries can still become quite slow if the data is large and stored in an inconvenient format. In this talk, we show one practical example of how to prepare data in a distributed file system so that Spark can access it as fast as possible, making a data analyst's work more comfortable by reducing query execution time. We will discuss techniques such as data partitioning, bucketing, and sorting.
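To give a flavour of the first two techniques the talk names: partitioning groups rows into directories by a column's value (so filters can skip whole directories), while bucketing hashes a key into a fixed number of files (so joins and aggregations on that key can avoid a shuffle). Below is a minimal plain-Python sketch of that placement logic, not Spark itself; the column names and bucket count are made up for illustration.

```python
# Conceptual sketch of where a row "lands" under partitioning + bucketing.
# Plain Python, not Spark; "country", "user_id" and NUM_BUCKETS are
# illustrative assumptions, not from the talk.
NUM_BUCKETS = 2  # in Spark this would be the n in bucketBy(n, "user_id")

def storage_location(row):
    # Partitioning: rows are grouped into directories by column value,
    # so a filter on "country" can skip entire directories.
    partition_dir = f"country={row['country']}"
    # Bucketing: rows whose key hashes to the same value land in the
    # same file, so a join on "user_id" can avoid a full shuffle.
    bucket_file = f"bucket-{hash(row['user_id']) % NUM_BUCKETS}"
    return f"{partition_dir}/{bucket_file}"

rows = [
    {"country": "CZ", "user_id": 101},
    {"country": "CZ", "user_id": 205},
    {"country": "DE", "user_id": 342},
]
for row in rows:
    print(row, "->", storage_location(row))
```

In real Spark code the equivalent is roughly `df.write.partitionBy("country").bucketBy(n, "user_id")`; the sketch only shows why co-locating rows this way shortens analytical queries.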

"Spark job optimizations 101" - Peter Vasko (Socialbakers, Data Architect)
We would like to show some basic, beginner-friendly optimizations for Spark jobs in general. Each topic targets a specific problem and describes the solution we would typically use in that scenario, addressing a rather broad range of issues that everyone working with Spark might encounter along the way: from lack of parallelism and shuffle spill to ineffective use of resources and the "many small files" issue. The talk is aimed at less experienced users and does not go deep into the internals of Spark, but it touches as many areas as possible, so everybody should be able to find something new.
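One of the issues mentioned, lack of parallelism, often comes from skewed input: a few large files and many tiny ones, so most workers sit idle. Repartitioning spreads the total work evenly across a chosen number of partitions. The snippet below is a plain-Python sketch of that rebalancing arithmetic (not Spark itself; the record counts and partition count are invented for illustration):

```python
# Why repartitioning helps: skewed per-file record counts are rebalanced
# into evenly sized partitions, which is roughly the effect of
# DataFrame.repartition(n). All numbers here are made-up examples.
input_files = [900, 40, 30, 20, 10]  # records per input file: heavily skewed

def rebalance(counts, num_partitions):
    # Spread the total number of records evenly across the partitions;
    # the first `extra` partitions take one additional record each.
    total = sum(counts)
    base, extra = divmod(total, num_partitions)
    return [base + (1 if i < extra else 0) for i in range(num_partitions)]

print(rebalance(input_files, 4))  # four equally loaded partitions
```

The same idea in reverse (coalescing many small partitions into fewer, larger ones before writing) is the usual remedy for the "many small files" problem.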

"Stream the World with Spark" - Jiri Koutny (DataSentics, Data Solutions Architect)
Welcome to the real-time world! Batch is dead, delete all the pipelines that run all night, everything must be real-time now! Is it that simple? What are the meaningful use cases for Spark Streaming? In the past months, we have built a streaming-based data lake for two big enterprises. Come and learn about the main lessons we learned and the traps we ran into.

Spark+AI Prague Meetup
Emplifi
Pernerova 53, Karlín · Praha-Praha 8