Shuffling Spark with Kafka, Standalone Spark approach


Details
A joint meetup between Israel Spark Meetup and HadoopIsrael Meetup
18:00 - 18:30 - Mingling
18:30 - 19:15 - David Gruzman - “Kafka architecture, place of Kafka Streaming and usage of Kafka as Spark's shuffle engine”
We will dig into Kafka's architecture and work out together what Kafka Streaming is and when it should be used.
In addition, I will share our experience of using Kafka to accelerate our Spark application, and say a few words about the system in which this acceleration was used.
19:15 - 19:30 - Break
19:30 - 20:15 - Alon Torres - DevOps Engineer, Totango & Romi Kuntsman - Senior Big Data Engineer, Totango - “Standalone Spark for Stability and Performance”
After initially trying AWS EMR and YARN with lackluster results, we decided to move to a manually fine-tuned Spark Standalone setup over AWS EC2.
We'll share our experience with controlling Spark components separately, using Chef, autoscaling groups, log integration, and more.
Since moving to this architecture, the days of cluster instability are long gone, and our server utilization is great.
