ETL Pipelines with Spark

Name: ETL Pipelines with Spark
Start: 2015-05-20T17:45:00-05:00
End: 2015-05-20T20:45:00-05:00
Location: Conversant

Hosted by Dean W.

Chicago Spark Users

Details

UPDATED time: We're starting at 5:45 instead of 5:30 so that a training session at Conversant that day has time to finish. As always, we'll network first and start the talk around 6:00.

Imran Rashid from Cloudera is our speaker.

You've seen the basic two-stage example Spark programs, and now you're ready to move on to something larger. I'll go over lessons I've learned about writing efficient Spark programs, from design patterns to debugging tips. My experience is mostly in writing batch ETL pipelines with Spark, taking them from prototype to production, so that is where I'll focus, but hopefully the lessons will apply to other uses of Spark as well. We'll look into some common pitfalls with Spark and see how the Spark UI can help diagnose them. I'll also share some surprises I encountered coming from Hadoop MapReduce. Finally, I'll take a brief look "under the hood" of Spark.
