Intro to Big Data with Hadoop Workshop

To conclude Big Data Week, we are holding a hands-on introductory Hadoop workshop on Saturday, April 27th. You've heard and read about Hadoop everywhere; now come learn what it is and how to use it. By the end of the workshop, you will have gained a solid understanding of the Hadoop ecosystem, successfully set up a Hadoop cluster, and run several different types of queries on the data.

The price per attendee is $150.  

To maximize value to the attendees, this workshop is limited to the first 20 people who RSVP.

What to Bring:

  • Your laptop
  • Printed copy of your ticket


What You Will Learn:

  • What Hadoop is and how it works
  • How to run a MapReduce script
  • Use cases for which Hadoop is well suited
  • How to use Pig, Hive, and Mahout


Introduction to Hadoop

  • HDFS & MapReduce
  • Hadoop History, Adoption, & Maturity
  • Hadoop Distributions
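
As a rough illustration of the MapReduce model named above, the following Python sketch imitates the three phases (map, shuffle, reduce) in memory on a single machine. It does not use Hadoop at all; the function names and the sample temperature readings are made up for this example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record; each call may yield (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical input: "city,temperature" readings.
readings = ["DC,61", "NYC,55", "DC,72", "NYC,58"]


def parse_reading(record):
    city, temperature = record.split(",")
    yield city, int(temperature)


# Maximum temperature per city.
result = run_job(readings, parse_reading, lambda city, temps: max(temps))
```

On a real cluster the records live in HDFS and the map and reduce calls run in parallel on many nodes; the data flow is the same.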

The Hadoop Analytics Ecosystem

  • Pig - high-level data-flow language and execution framework for parallel computation.
  • Hive - data warehouse infrastructure that provides data summarization and ad hoc querying.
  • Mahout - machine learning and data mining library.

Setting Up and Running Hadoop

  • Setting up clusters
  • Running Hadoop on Amazon EC2
  • Hadoop Streaming - R, Python, Shell
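
Hadoop Streaming lets the map and reduce steps be any program that reads lines from standard input and writes tab-separated key/value lines to standard output. The sketch below is a minimal word count in that style, not the workshop's actual script; the names `map_lines` and `reduce_lines` and the `map`/`reduce` command-line argument are assumptions made for this example.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts for each word.

    Assumes the input is already sorted by key, which is what the
    framework guarantees between the map and reduce steps.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Run as a filter only when asked to: "map" or "reduce" as the first argument.
if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Saved under a hypothetical name such as wordcount.py, the script can be tried without a cluster by piping a text file through the map step, then `sort`, then the reduce step, which mimics the sort that Hadoop performs between the two phases.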


Marck Vaisman, Owner & Principal Data Scientist, DataXtract LLC

Marck is a co-founder of Data Community DC, runs the Statistical Programming DC Meetup group, and is the owner of data science consulting company DataXtract. He has an MBA from Vanderbilt and an MS in Mechanical Engineering from Boston University.


  • Naser C.

    Thank you Marck! It was an excellent meetup; thank you for sharing lots of actual field knowledge with the team. Big data and Hadoop are large topics with lots of bits and pieces, but you tried to balance them with real-life examples and explained them with great passion. Great job!

    May 16, 2013

  • Harry F.

    If you would like to check out Logi Analytics's platform we are holding a seminar on May 15th.
    Join us on May 15th for a seminar on embedded analytics best practices and go-to-market strategies for software and SaaS providers. Learn about customer use cases, how to price, package, and promote your analytics offering, and participate in hands-on technical training (you'll walk away with what you create!).

    Key Takeaways:
    • How to create great products with embedded analytics
    • The 5 stages in the embedded analytics maturity model
    • Practical considerations - user requirements, UX, project management and resourcing
    • Best practices for pricing, packaging, and promotion
    Following the presentations and training, network with your peers at the Tech Cocktail Mixer & Startup Showcase (same location, and food & drinks are on us!).

    View agenda and Register at

    Event Location:
    1776 Campus [masked]th St NW
    Washington, DC 20005

    May 8, 2013

  • Marck V.

    I enjoyed teaching the workshop today and I appreciate the feedback. I'm sorry we did not get to the hands-on examples; it was definitely ambitious to cover all of the material in three hours. I spoke with Tony and I'd like to schedule a 1 to 1.5 hour follow-up over the next week or two as a more informal meetup, perhaps an evening, where we can work through real examples using Hive and Pig. I'll send out a separate email to coordinate.

    Your feedback is also very useful for us to be able to plan future events. I think this run will help me re-package the content and work out the kinks for a smoother flow.

    I will also send follow-up materials over the course of next week.

    Thanks again!

    1 · April 27, 2013

    • A former member

      Thank you Marck. Looking forward to it!

      April 27, 2013

    • Ryan N.

      Sounds great, Marck.

      April 29, 2013

  • A former member

    Hi Marck and Tony: Please do add me to your next email update. I was a walk-in and may not be on your original list! Thank you!

    April 29, 2013

  • Ahmed

    Marck: Thanks for the Hadoop introduction today. However, I was expecting more: for example, some hands-on tasks and an introduction to other topics (Mahout, Pig, etc.). I agree with Mohammed and Fletcher that a part-2 session would be great.

    April 27, 2013

  • A former member

    Marck: Thanks for the session today. I really hope you consider having another session since we didn't complete the stated objectives of the course.

    April 27, 2013

    • A former member

      Specifically, we didn't get to successfully set up a Hadoop cluster, and run several different types of queries on the data.

      1 · April 27, 2013

  • Fletcher

    Marck, the thorough background on Hadoop's infrastructure was really helpful in terms of figuring out use cases! However, given how much we all paid for hands-on training, it would be great if possible to set up a part 2 so we can at least see a working Hadoop job and touch on the rest of the agenda - Pig, Mahout, etc...

    1 · April 27, 2013

  • freddie s.

    Assuming no special laptop requirements (RAM/disk/OS/etc.) to use Amazon EMR? I'm running Win 7 (64b) w/ 4GB RAM & 86.3 GB free disk space.

    April 23, 2013

    • Tony O.

      That is correct. No special requirements necessary.

      April 23, 2013

  • Ahmed

    Hi. I am glad that interest in Hadoop and related tools is increasing, and I truly understand that you have limited places. Would you consider adding another hour to the time and including more people (maybe 8am-12pm or 9am-1pm)? Or, if the waiting list increases to 20 or so, would you consider having another workshop for the waiting list (Group B) on Sunday or the following week? Thanks!

    1 · April 15, 2013

    • Tony O.

      If we get enough folks interested, we will definitely consider putting on additional workshops on this topic (and other related topics) in the future.

      3 · April 15, 2013

  • Roger D.

    With so many topics to cover in 3 hours, I am curious how long it takes to set up a Hadoop environment on our laptops.

    1 · April 12, 2013

    • Tony O.

      We will be leveraging Amazon Elastic MapReduce, so you don't need to set up Hadoop on your machine. We want to get you up and running as quickly as possible so that you can focus on working with it instead of administration. If you want to know how to install Hadoop on a single-node cluster, here is a good tutorial - http://www.michael-no...

      April 12, 2013

  • freddie s.

    I'm concerned about quality given such a broad scope of technical topics in just 3 hours. What, if any, hard- or soft-copy takeaways are included in the price?

    April 4, 2013

    • Tony O.

      We will provide you with the slides from the workshop, script files, and links to the data sets so that you can review and reference what was covered. The goal is to give you a well-rounded introduction to the concepts and provide you with a solid foundation from which you can build as you learn more.

      April 5, 2013

24 went

Your organizer's refund policy for Intro to Big Data with Hadoop Workshop

Refunds offered if:

  • the Meetup is cancelled
  • the Meetup is rescheduled
  • you cancel at least 7 days before the Meetup

Payments you make go to the organizer, not to Meetup. You must make refund requests to the organizer.
