Apache Spark: Spark Streaming, Dataframes, Zeppelin and more

Hosted By
Tal S. and Ruthy G.

Details

17:30 - 18:00 - Mingling

18:00 - 18:45 - Richard Grossman (System Architect @ Inneractive) - “How Inneractive succeed to process more than 1 billion events / day”

Richard will explain how Inneractive uses Spark Streaming, Kafka, Parquet DB and other cutting-edge technologies to handle its big data challenges.

18:45 - 19:00 - Beer & Coffee break

19:00 - 19:15 - Ruthy Goldberg, Tal Sliwowicz (Taboola R&D) - "Spark Summit highlights"

Last month's Spark Summit was very interesting. We will take a few minutes to go over the highlights and point out some talks that are worth watching.

19:15 - 19:50 - [Same Presenters, Taboola R&D] - "Using Spark and Cassandra together for data analysis using Data Frames and Zeppelin"

In the previous meetup we told the story of Newsroom, a product used for real-time analytics for home page editors. We use Cassandra to collect all the data for Newsroom. Unfortunately, data in Cassandra is very hard for human analysts to work with. We therefore created a new framework (*) that quickly and efficiently loads any data from Cassandra into Spark Data Frames. Our analysts were given access to it through Apache Zeppelin, and in this talk we will share what we did and our experience with Data Frames and Zeppelin.

(*) We are planning to open-source this framework.

Israel Spark Meetup
Totseret ha-Arets St 7, 5th floor · Tel Aviv-Yafo