Apache Spark: Spark Streaming, DataFrames, Zeppelin and more
Details
This is a technical deep-dive session for people working with modern server-side development technologies. The event is hosted by Taboola at its offices in Tel Aviv.
This time, Inneractive (http://inner-active.com) is going to present the Spark Streaming solutions we recently implemented as part of our big data workflow.
Taboola will provide parking. Please bring your parking ticket to the meetup, and the Taboola team will stamp it.
17:30 - 18:00 - Mingling
18:00 - 18:45 - Richard Grossman, System Architect @Inneractive (http://inner-active.com) - “How Inneractive succeeded in processing more than 1 billion events per day”
Richard will tell us how Inneractive uses Spark Streaming, Kafka, the Parquet columnar storage format and other cutting-edge technologies to handle its big data challenges.
18:45 - 19:00 - Beer & Coffee break
19:00 - 19:15 - Ruthy Goldberg, Tal Sliwowicz (Taboola R&D) - "Spark Summit highlights"
Last month's Spark Summit was very interesting. We will take a few minutes to go over the highlights and point to some talks that are worth watching.
19:15 - 19:50 - [Same Presenters, Taboola R&D] - "Using Spark and Cassandra together for data analysis with DataFrames and Zeppelin"
In the previous meetup we told the story of Newsroom, a product that provides real-time analytics for home page editors. We use Cassandra to collect all the data for Newsroom. Unfortunately, data stored in Cassandra is very hard for human analysts to work with. We therefore created a new framework (*) that quickly and efficiently loads any data from Cassandra into Spark DataFrames. Our analysts were given access to it through Apache Zeppelin. In this talk we will share what we built and our experience with DataFrames and Zeppelin.
(*) We are planning to open-source this framework.
