
Shuffling Spark with Kafka, Standalone Spark approach

Hosted By
David G.

Details

A joint meetup between Israel Spark Meetup and HadoopIsrael Meetup

18:00 - 18:30 - Mingling

18:30 - 19:15 - David Gruzman - “Kafka architecture, place of Kafka Streaming and usage of Kafka as Spark's shuffle engine”

We will dig into Kafka's architecture and work out together what Kafka Streaming is and when it should be used.
In addition, we will share our experience of using Kafka to accelerate our Spark application. I will also say a few words about the system itself, where this acceleration was used.

19:15 - 19:30 - Break

19:30 - 20:15 - Alon Torres - DevOps Engineer, Totango & Romi Kuntsman - Senior Big Data Engineer, Totango - “Standalone Spark for Stability and Performance”

After initially trying AWS EMR and YARN with lackluster results, we decided to move to a manually fine-tuned Spark Standalone setup on AWS EC2.
We'll share our experience with controlling Spark components separately, using Chef, autoscaling groups, log integration, and more.
Since moving to this architecture, the days of cluster instability are long gone, and our server utilization is great.

HadoopIsrael
Taboola offices
7 Totseret Ha'aretz St., 5th fl., Tel Aviv · Tel Aviv-Yafo