
Hadoop hands-on group learning

A suggested meetup for people interested in learning Hadoop to join other Hadoop learners for some hands-on work on our laptops (running a pseudo-cluster in a virtual machine). This will not be as structured as when Adam takes care of it: no presentation, and maybe no supervision. Just people who want to learn and share their bits of knowledge with others. I would like to focus on learning Hadoop streaming (with Python), using the Enron email dataset (http://www.cs.cmu.edu/~enron/), and trying to answer some basic questions (At what times of the day are emails typically sent? What is the average number of emails sent per day by managers? And the standard deviation? ...), but you can focus on your own (Hadoop) topic of interest as you wish.
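To make the first question concrete, here is a minimal sketch of a streaming job that counts emails per hour of the day. It is only a starting point, not a reference solution. It assumes each message in the Enron dataset carries a header line of the form `Date: Mon, 14 May 2001 16:39:00 -0700 (PDT)`, that Hadoop streaming feeds the files to the mapper line by line, and that the reducer receives its input sorted by key. The function names `map_hours` and `reduce_counts` and the `map`/`reduce` command-line argument are made up for this example.

```python
import sys
from email.utils import parsedate_tz
from itertools import groupby


def map_hours(lines):
    """Mapper: emit 'HH<TAB>1' for every line that looks like a Date header.

    Assumption: the hour is taken as written in the header (sender's local
    time). Quoted 'Date:' lines inside forwarded bodies are counted as well.
    """
    for line in lines:
        if not line.startswith("Date:"):
            continue
        parsed = parsedate_tz(line[len("Date:"):].strip())
        if parsed is None:
            continue  # unparseable date, skip it
        yield "%02d\t1" % parsed[3]  # index 3 of the tuple is the hour


def reduce_counts(lines):
    """Reducer: sum the counts for each key; input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (key, sum(int(value) for _, value in group))


def main(argv):
    """Run as mapper ('map') or reducer ('reduce') over standard input."""
    step = map_hours if argv[1] == "map" else reduce_counts
    for out in step(sys.stdin):
        print(out)


# Only run when a mode is given, so importing or pasting the file is harmless.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Saved under a hypothetical name such as `hour_count.py`, this can be tried without a cluster by piping a few message files through `python hour_count.py map`, then `sort`, then `python hour_count.py reduce`, which mimics what the streaming framework does between the two steps. The same mapper/reducer shape should extend to the per-day counts needed for the average and standard deviation questions.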


  • Arthur

    basic tutorial (wordcount) to use hadoop with python (streaming): http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

    November 22, 2012

  • Arthur

    For quick start if you haven't played with hadoop yet: https://ccp.cloudera.com/display/DOC/Hadoop+Tutorial

    November 22, 2012

  • Adam M.

    Running late, be there shortly

    November 22, 2012

  • A former member

    Newbie question: Could you recommend a virtual machine setup tutorial? I'd like to prepare my machine before the meetup and I'm guessing I can just ssh into an EC2 instance and install a cloudera package or something? Is there a particular AMI I should be using? Is it better to set up a virtual machine on something other than an EC2 instance? My laptop is a 32-bit lenovo running Windows 7.

    November 16, 2012

    • Adam M.

      Hello Ed, the current standard configuration for Cloudera is to install on regular commodity hardware. Therefore the cloud instructions are sort of best effort until a more integrated cloud-friendly approach is built into Cloudera Manager. I haven't run through that particular set of instructions myself but I imagine there are some gotchas. We can try and give you a hand with this tonight. Did you give the Hortonworks amazon image a try?

      November 22, 2012

    • Arthur

      meeting room 1 is on the first floor if I remember right. I will put signs from the main entrance or coffee entrance to get there.

      November 22, 2012

  • Adam M.

    Arthur and anyone else: Did you want me to pull up some specific material to focus on for Thursday? If you have your CDH4.1.1 images from pigfest that would work. If not, try the AWS/Cloud options I listed below for Cloudera or Hortonworks. If you are going to set up the cloud option then I recommend doing it beforehand or you will spend your entire time getting a virtual cluster ready.

    November 20, 2012

    • Arthur

      Thank you Adam. That should be fine. I will have the material you gave us at the pigFest meetup (CDH images and datasets) on a USB key in case people need it. I will also have the dataset I suggested (enron emails) just in case too.

      November 22, 2012

  • A former member

    Hi Adam,
    Can I still attend on such short notice?

    kindest regards,
    Haytham,
    [masked]

    November 21, 2012

    • Adam M.

      Yes! There should be room.

      November 22, 2012

  • Arthur

    The venue is confirmed, see the address above. The room can accommodate 20 to 35 people depending on the setup. It will be a bit packed if everybody is coming. It should be comfortable if, as expected, 70% of the people or less are actually coming. For that reason, I closed the registration. Sorry if you were about to RSVP.

    November 20, 2012

  • Adam M.

    Arthur, which CSI? Spadina or Annex? We need to lock it down today.

    November 20, 2012

    • Arthur

      I actually just got the room confirmed. It will be at CSI annex. I will update the meeting details now.

      November 20, 2012

  • Adam M.

    Note, I set the limit to 45 on the RSVP list because I think that's the space. If anyone has a hard number then let me know.

    Thanks

    November 17, 2012

    • Arthur

      Thanks again. Limiting it to 45 sounds about right for the kind of room size we should be able to get.

      November 17, 2012

  • Adam M.

    Arthur, you are now a co-organizer on this particular meetup - root. Let me know if you need some material and I can suggest it.

    November 16, 2012

    • Arthur

      Thanks a lot.

      November 17, 2012

  • A former member

    Hi Adam, is this geared towards beginners?

    November 16, 2012

    • Arthur

      Hi Ram, just answering as I suggested the meeting. It is for anybody wanting to learn more about Hadoop and willing to share with others what they already know. It should be good for beginners (like me). Anybody with more experience is welcome too; they can work on more advanced topics and hopefully get help from people with a similar level of experience.

      November 16, 2012

22 went

Our Sponsors

  • IBM

    Meeting facilities, expert speakers, free product, books and education.

  • Big Data University

    Free on-line courses in Hadoop and big data related technologies.

  • Cloudera

    10% off training for Toronto Hadoop User Group members.

  • Hortonworks

    Food, speakers, beverages

  • T4G

    Hosting Meeting locations and providing relevant speakers

  • Paytm Labs

    Paytm Labs offers a venue for the THUG.
