ETL Pipelines with Spark


Details
UPDATED time: We're starting at 5:45 instead of 5:30 so that a training session at Conversant that day has time to finish. As always, we'll network first and start the talk around 6:00.
Imran Rashid from Cloudera is our speaker.
You've seen the basic two-stage example Spark programs, and now you're ready to move on to something larger. I'll go over lessons I've learned about writing efficient Spark programs, from design patterns to debugging tips. My experience is mostly in writing batch ETL pipelines with Spark -- going from prototype to production -- so that is where I'll focus, but hopefully the lessons will apply to other uses of Spark as well. We'll look into some common pitfalls with Spark and see how the Spark UI can help diagnose them. I'll share some surprises I encountered coming from Hadoop MapReduce. Finally, I'll take a brief look "under the hood" of Spark.
