Shuffling Spark with Kafka, Standalone Spark approach

Hosted By
Tal S. and Ruthy G.

Details

A joint meetup between Israel Spark Meetup and HadoopIsrael Meetup

18:00 - 18:30 - Mingling

18:30 - 19:15 - David Gruzman - “Kafka architecture, place of Kafka Streaming and usage of Kafka as Spark's shuffle engine”

We will get into Kafka architecture and try to understand together what Kafka Streaming is and when it should be used.

In addition, we will share our experience of using Kafka to accelerate our Spark application. I will also say a few words about our system itself, where this acceleration was used.

19:15 - 19:30 - Break

19:30 - 20:15 - Alon Torres - DevOps Engineer, Totango & Romi Kuntsman - Senior Big Data Engineer, Totango - “Standalone Spark for Stability and Performance”

After initially trying AWS EMR and YARN with lackluster results, we decided to move to a manually fine-tuned Spark Standalone setup over AWS EC2.
We'll share our experience with controlling Spark components separately, using Chef, autoscaling groups, log integration, and more.
Since moving to this architecture, the days of cluster instability are long gone, and our server utilization is great.

Israel Spark Meetup
Taboola Offices Rooftop
Totseret ha-Arets St 7, 7th floor, Tel Aviv-Yafo