To conclude Big Data Week, we are holding a hands-on introductory Hadoop workshop on Saturday, April 27th. You've heard and read about Hadoop everywhere; now come learn what it is and how to use it. By the end of the workshop, you will have gained a solid understanding of the Hadoop ecosystem, set up a working Hadoop cluster, and run several different types of queries on your data.
The price per attendee is $150.
To maximize value for attendees, this workshop is limited to the first 20 people who RSVP.
What to Bring:
- Your laptop
- Printed copy of your ticket
What You Will Learn:
- What Hadoop is and how it works
- How to run a MapReduce script
- When Hadoop is the right tool for the job
- How to use Pig, Hive, and Mahout
Introduction to Hadoop
- HDFS & MapReduce
- Hadoop History, Adoption, & Maturity
- Hadoop Distributions
The Hadoop Analytics Ecosystem
- Pig - a high-level data-flow language and execution framework for parallel computation.
- Hive - a data warehouse infrastructure that provides data summarization and ad hoc querying.
- Mahout - a machine learning and data mining library.
Setting Up and Running Hadoop
- Setting up clusters
- Running Hadoop on Amazon EC2
- Hadoop Streaming - R, Python, Shell
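To give a flavor of the Hadoop Streaming portion of the agenda: with Streaming, Hadoop pipes input lines to any executable's stdin and reads "key\tvalue" lines back from its stdout, so a MapReduce job can be written in plain Python. Below is a minimal word-count sketch (the script name `wordcount.py` and the local pipeline are illustrative, not part of the workshop materials):

```python
#!/usr/bin/env python
"""Word-count mapper/reducer sketch for Hadoop Streaming.

Hadoop feeds raw input lines to the mapper, sorts the mapper's
tab-separated output by key, and feeds the sorted pairs to the reducer.
"""
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one "word\t1" pair per word in the input."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_pairs(pairs):
    """Sum the counts for each word; input must be sorted by word."""
    split = (pair.split("\t") for pair in pairs)
    for word, group in groupby(split, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


if __name__ == "__main__":
    # Invoked as "wordcount.py map" or "wordcount.py reduce".
    stage = map_lines if (len(sys.argv) < 2 or sys.argv[1] == "map") else reduce_pairs
    for out in stage(line.rstrip("\n") for line in sys.stdin):
        print(out)
```

Because Streaming jobs are just stdin/stdout filters, you can dry-run the whole pipeline locally before touching a cluster, e.g. `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. On a cluster the same script is passed to the `hadoop jar` command with the streaming jar's `-mapper` and `-reducer` options (the jar's exact path varies by distribution).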
Marck Vaisman, Owner & Principal Data Scientist, DataXtract LLC
Marck is a co-founder of Data Community DC, runs the Statistical Programming DC Meetup group, and owns the data science consulting company DataXtract. He holds an MBA from Vanderbilt and an MS in Mechanical Engineering from Boston University.