
Large Scale ETL with Hadoop

Eric Sammer, Engineering Manager at Cloudera, will explain how to architect an ETL system that scales.

Hadoop is commonly used for processing large swaths of data in batch. While many of the necessary building blocks for data processing exist within the Hadoop ecosystem – HDFS, MapReduce, HBase, Hive, Pig, Oozie and so on – it can be a challenge to assemble and operationalize them as a production ETL platform.

This presentation will cover one approach to data ingest, organization, format selection, process orchestration and external system integration, based on collective experience acquired across many production Hadoop deployments.


Eric Sammer is an Engineering Manager at Cloudera, where he focuses on highly available, efficient, distributed, and parallel back-end systems for data collection, analysis, and reporting. He has a background in software development, systems, networking, and data management systems.


Schedule: We'll start at 7pm with pizza & beer thanks to StumbleUpon, and Eric will go on at 7:30.


  • Robert B.

    An excellent talk, about the right length, the right level and well illustrated with examples.

    April 26, 2013

  • Inna G.

    excellent presentation ! looking forward to hear more from the top engineering teams !

    April 26, 2013

  • Kathy L.

    This was a great presentation - I especially love when people talk about best practices. Learning about many of the tools/services in the space was also helpful. Looking forward to the next meeting!

    April 26, 2013

  • John H.

    Thank you for presenting, Eric! A very informative/engaging presentation. And thank you Pete for arranging. A very good first meeting!

    April 26, 2013

  • Gerald W.

    Wish I could make it -- relevant to our product at

    April 22, 2013

    • Pete S.

      Hey Gerald - sorry you can't make it! However, we'll be video recording the event and will post a link later.

      April 22, 2013

    • Gerald W.

      Thanks Pete -- most appreciated.

      April 22, 2013

39 went

Our Sponsors

  • Hakka Labs

    Growing the largest community of data engineers and data scientists
