Eric Sammer, Engineering Manager at Cloudera, will explain how to architect an ETL system that scales.
Hadoop is commonly used for processing large swaths of data in batch. While many of the necessary building blocks for data processing exist within the Hadoop ecosystem – HDFS, MapReduce, HBase, Hive, Pig, Oozie and so on – it can be a challenge to assemble and operationalize them as a production ETL platform.
This presentation will cover one approach to data ingest, organization, format selection, process orchestration and external system integration, based on collective experience acquired across many production Hadoop deployments.
Eric Sammer is an Engineering Manager at Cloudera, where he focuses on highly available, efficient, distributed, and parallel back-end systems for data collection, analysis, and reporting. He has a background in software development, systems, networking, and data management systems.
Schedule: We'll start at 7pm with pizza & beer, thanks to StumbleUpon, and Eric will go on at 7:30pm.