ETL Pipelines with Spark

Hosted by Dean Wampler
Details

UPDATED TIME: We're starting at 5:45 instead of 5:30 so that a training session at Conversant that day has time to finish. As always, we'll network first and start the talk around 6:00.

Imran Rashid from Cloudera is our speaker.

You've seen the basic two-stage example Spark programs, and now you're ready to move on to something larger. I'll go over lessons I've learned about writing efficient Spark programs, from design patterns to debugging tips. My experience is mostly in writing batch ETL pipelines with Spark -- going from prototype to production -- so that is where I'll focus, but hopefully the lessons will apply to other uses of Spark as well. We'll look into some common pitfalls with Spark and see how the Spark UI can help. I'll share some surprises I encountered coming from Hadoop MapReduce. Finally, I'll take a brief look "under the hood" of Spark.

Chicago Spark Users
Conversant
101 North Wacker Drive, 21st Floor · Chicago, IL