BIG DATA/BUSINESS Message Board › Jan 25 ACM Data Mining Event - Analytics at Petabyte Scale: Cloudera and Fac
SF Bay Area web site, for this Data Mining Special Interest Group event
Location: LinkedIn, 2027 Stierlin Ct., Mountain View, CA 94043
Time: 6:30 - 8:30 pm
TITLE 1: “Hadoop: Distributed Data Processing”
Hadoop is an open-source distributed platform designed to economically store and process data using clustered commodity hardware. Hadoop is Apache’s implementation of the MapReduce/GFS frameworks popularized by Google. In this talk we will demystify this powerful platform, and describe how it enables you to consolidate many different data storage and processing needs in an economically scalable cloud resource.
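For readers new to the MapReduce model mentioned in the abstract, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to a word count. It is an illustration only: it does not use Hadoop or any Hadoop API, and the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical names chosen for this example. On a real cluster each phase would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["hadoop stores data", "hadoop processes data"])
print(counts)
```

Because the map and reduce functions only ever see one record or one key at a time, the work can be split across machines without changing the logic, which is the property the abstract refers to when it talks about scaling on commodity hardware.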
Dr. Amr Awadallah is Chief Technical Officer and Founder of Cloudera, Inc. Before Cloudera, he was vice president of product intelligence engineering at Yahoo! Inc., where he had worked since June 2000, after Yahoo acquired his first startup (VivaSmart). Dr. Awadallah received his PhD from Stanford University in 2007 and his BS/MS degrees from Cairo University in 1992 and 1995, respectively.
TITLE 2: “Facebook’s Petabyte Scale Data Warehouse Using Hive and Hadoop”
Hive is an open-source, petabyte-scale data warehousing framework built on top of Hadoop that enables scalable analytics on large data sets using SQL and some language extensions. Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook, both engineering and non-engineering. This talk will highlight how Hive and Hadoop allow us at Facebook to offer a cheap, scalable and flexible infrastructure for different kinds of analysis. We will talk about the architecture, applications and capabilities of this infrastructure, which handles close to 8000 jobs a day and stores nearly 2.5PB of compressed data.
Ashish Thusoo has been with Facebook for the last couple of years and, in his most recent role, manages the Facebook data infrastructure team. He started the Hive project at Facebook along with Joydeep and serves as the project lead for Hive at Apache. He is also part of the Hadoop PMC at Apache and has presented Hive at a number of conferences, forums and panels. Ashish has deep expertise in data processing and parallel processing technologies, infrastructure, and the applications built on them. In the past he worked at Oracle in the areas of Parallel Query Execution and XML Databases. At Oracle he built many core data warehousing and query processing features and was recognized as one of the leaders in the Parallel Execution team. These features are regularly used in most Oracle-based data warehouses.