Out of the battle of big data frameworks, Apache Spark is emerging as the leading unified open-source platform for scalable data processing (ETL) and machine learning, in both batch and real time. It is also helping to bridge the gap between agile data science and production-level data engineering.
We would like to bring together and expand the Spark community in Prague. We plan to organize this as a meetup held roughly every two months, with a mix of the following topics:
- Introduction to Spark and zoom-ins on its individual aspects, such as data processing, machine learning, streaming and graph analytics
- Presentations of real-life use cases and experience by Spark users (companies such as Socialbakers, Ceska sporitelna, Moneta or Barclays)
- New developments in the Apache Spark project, fresh from the source at Databricks
Come join us if you would like to find out how Spark works and what it's useful for, hear members of the community present their interesting data science and data engineering use cases, and network with like-minded people.
The second event of the Spark+AI Prague Meetup Group will feature one technical deep-dive talk and one Spark + AI use-case talk. The talks are aimed at people who have used Apache Spark before. However, do join us even if you do not have a working knowledge of Spark, as you can network with those who have been using it for a while; they are always happy to share their knowledge!
The agenda is as follows:
18:00 - 18:30 - Welcome drinks
18:30 - 19:15 - Technical deep dive: "Spark SQL under the hood" - David Vrba, Data Scientist, Socialbakers
19:15 - 19:30 - Break (snacks)
19:30 - 20:15 - Spark + AI use cases: "(Practical) Intro to Machine Learning on Spark" - Milan Berka, Spark Architect, DataSentics
20:15 - 21:30 - Community building
The meetup is organized by Socialbakers and DataSentics and sponsored by Microsoft. The talks will be delivered in English (we will switch to Czech if the whole audience understands it).
More about the talks:
Technical deep dive: "Spark SQL under the hood" - David Vrba, Data Scientist, Socialbakers
Spark SQL is the Spark module that provides the Structured APIs: DataFrames, Datasets and SQL tables. These expressive, high-level APIs allow for more optimized execution than using the low-level RDD primitives directly. The performance benefits come from Spark's built-in optimization engines, Catalyst (the query optimizer) and Tungsten (the execution engine). In this talk, we will look under the hood of the DataFrame API and see how these optimizations work and what advantages they offer.
Spark + AI use cases: "(Practical) Intro to Machine Learning on Spark" - Milan Berka, Spark Architect, DataSentics
In this talk, we will discuss how to perform basic machine learning tasks in Spark, such as feature engineering, model training and model evaluation. We want to get very practical, so expect plenty of examples demonstrating how to apply Spark ML methods to real-world problems.
You should leave with the feeling that doing (big) data science with Spark is just easy... no matter whether you belong to team "linear regression in R", "decision trees in Python" or "neural nets in SQL".