March 31, 2011 · 6:45 PM
Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating those programs. Pig is probably the easiest way to get started with processing data on Hadoop.
If there is enough interest in this topic, I will put together a short intro and a demo of how Contextweb is planning to use Pig to process hundreds of gigabytes of data per day.
Alex Rovner - Leads the data team at Contextweb. Responsible for managing hundreds of ETL jobs and terabytes of data.
Thejas Nair - Software engineer on the Pig team at Yahoo and a committer on the Apache Pig project. He has been working on solutions for large-scale distributed data processing at Yahoo for the last 8 years.