
Session II: MapReduce Dive Down and Join Optimization

Hi all Data Bingers,

In the first session we mostly covered MapReduce in theory and only just got our first MapReduce example working. The second session will be entirely hands-on and will cover the following topics:

1. Custom sorting (and secondary sorting)

2. Combiner

3. Map Side Join

4. Reduce Side Join
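If you want to poke at two of these ideas before the session, here is a minimal sketch that simulates a combiner and a map-side join in one Python process, with no cluster involved. The data, function names, and record layout are all hypothetical and are not taken from the meetup exercise. `count_ratings` runs map, combine, shuffle/sort, and reduce over two pretend input splits; `map_side_join` shows the "replicated" style of join where every mapper holds the small table in memory and no reducer is needed.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data, not taken from the meetup exercise.
MOVIES = {"1": "Movie A", "2": "Movie B"}  # small table that fits in memory
SPLITS = [  # each inner list is the input one mapper would receive
    [("1", 5), ("1", 4), ("2", 3)],
    [("1", 3), ("2", 2), ("2", 4)],
]


def mapper(split):
    """Emit (movie_id, 1) for every rating record in one input split."""
    for movie_id, _rating in split:
        yield movie_id, 1


def combiner(pairs):
    """Sum one mapper's output locally so fewer pairs cross the network."""
    counts = Counter()
    for movie_id, n in pairs:
        counts[movie_id] += n
    return sorted(counts.items())


def reducer(sorted_pairs):
    """Add up partial counts per movie_id; input must be sorted by key."""
    for movie_id, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield movie_id, sum(n for _, n in group)


def count_ratings(splits):
    """Map -> combine -> shuffle/sort -> reduce, all in one process."""
    combined = [pair for split in splits for pair in combiner(mapper(split))]
    shuffled = sorted(combined)  # stands in for Hadoop's shuffle and sort
    return combined, list(reducer(shuffled))


def map_side_join(split, movies):
    """Replicated join: each mapper holds the small table and joins record by record."""
    for movie_id, rating in split:
        name = movies.get(movie_id)
        if name is not None:  # inner join: drop ratings with no matching movie
            yield movie_id, name, rating


if __name__ == "__main__":
    combined, totals = count_ratings(SPLITS)
    print(len(combined), totals)  # 4 pairs shuffled instead of 6
    print(list(map_side_join(SPLITS[0], MOVIES)))
```

The combiner here is just the reducer's summing logic applied per mapper. That only works because addition is associative and commutative: Hadoop may run a combiner zero, one, or several times, so the final result must not depend on how often it runs.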

 


  • Visakh

    Ritesh, could you please post the instructions for Session 1, the one where we used a local install of Hadoop via Homebrew on a Mac?

    August 6, 2013

  • A former member

    Here is my version of the Movie Ratings exercise. I merged the review-aggregation job and the reduce-side join job into one.

    The mapper:
    - prints to stdout the movie_id, record type and count
    - prints to stdout the movie_id, record type and movie_name

    The reducer receives data partitioned and sorted by movie_id and record type, with the movie-name entry first. Every entry after it is a review count, which is aggregated in the same reducer.
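The approach described above (a reduce-side join plus a secondary sort on record type) can be sketched locally as follows. This is a minimal illustration, not the code in the gist: the tab-separated input formats, the tag values "0" and "1", and the function names are assumptions chosen so that sorting on (movie_id, tag) delivers the movie-name record to the reducer before that movie's review counts.

```python
from itertools import groupby

# Hypothetical record tags: "0" sorts before "1", so the movie-name
# record reaches the reducer ahead of that movie's review counts.
MOVIE_TAG, REVIEW_TAG = "0", "1"


def mapper(source, line):
    """Emit 'movie_id<TAB>tag<TAB>value' lines, as a streaming mapper would print them."""
    fields = line.rstrip("\n").split("\t")
    if source == "movies":  # hypothetical format: movie_id \t movie_name
        movie_id, name = fields
        yield f"{movie_id}\t{MOVIE_TAG}\t{name}"
    else:  # hypothetical format: user_id \t movie_id \t rating
        _user_id, movie_id, _rating = fields
        yield f"{movie_id}\t{REVIEW_TAG}\t1"


def reducer(sorted_lines):
    """Join and aggregate in one streaming pass over lines sorted by (movie_id, tag)."""
    records = (line.split("\t", 2) for line in sorted_lines)
    for movie_id, group in groupby(records, key=lambda r: r[0]):
        _, tag, name = next(group)  # thanks to the sort, the name comes first
        if tag != MOVIE_TAG:  # reviews for an unknown movie: inner join drops them
            continue
        count = sum(int(value) for _, _, value in group)  # the rest are review counts
        yield f"{movie_id}\t{name}\t{count}"


def run_local(movies, reviews):
    """Simulate map, shuffle/sort, and reduce for the merged job."""
    mapped = [out for line in movies for out in mapper("movies", line)]
    mapped += [out for line in reviews for out in mapper("reviews", line)]
    # Hadoop would sort on the key fields; sorting on (movie_id, tag) stands in for that.
    mapped.sort(key=lambda s: tuple(s.split("\t")[:2]))
    return list(reducer(mapped))


if __name__ == "__main__":
    movies = ["1\tMovie A\n", "2\tMovie B\n"]
    reviews = ["u1\t1\t5\n", "u2\t1\t4\n", "u1\t2\t3\n", "u3\t3\t2\n"]
    for row in run_local(movies, reviews):
        print(row)
```

Because of the secondary sort, the reducer never buffers a movie's reviews while waiting for its name. On a real cluster the same ordering typically comes from making the first two fields the sort key while partitioning on the first field only (for example with Hadoop Streaming's KeyFieldBasedPartitioner); check the exact option names for your Hadoop version.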

    Find the code at:
    https://gist.github.com/vaidik/078f8ac5e751427c69b1

    August 4, 2013

    • Ritesh A.

      Hi Vaidik, that's a great solution. Keep it up. You might want to do something more interesting with the movie dataset.

      August 4, 2013

  • A former member

    I want to add my thanks as well. I appreciate the opportunity to learn and collaborate.

    August 4, 2013

  • Vijay B.

    A nice, practical way of learning Hadoop and Big Data. Thanks to the organizers.

    August 4, 2013

  • Nirav

    Great segue into more complicated Map/Reduce jobs. We need to keep increasing the difficulty of the exercises. Awesome job by Ritesh & great facility by NetApp/Prem.

    August 4, 2013

  • Nirav

    Is anyone driving from Marin or the North Bay and looking for a travel partner? I am driving from San Rafael via 580 and 880...

    August 3, 2013

  • Ritesh A.

    Hi all,

    I am still trying to secure a venue for our second meetup, but in the meantime I have created the exercise that we will go over. If you are excited about MapReduce, I suggest starting on the second exercise now. You can find it here:
    https://drive.google.com/folderview?id=0B7BkZNhsiufqSWgxR3B4R3JFSVk&usp=sharing

    July 31, 2013

  • Sastry K.

    Are we still looking for a location? If you are open to SF, I can get a location.

    July 23, 2013

    • Ritesh A.

      Hi Sastry, do you think you might be able to get a conference room for the meetup in SF? I have not been able to find any location in the Palo Alto area.

      July 29, 2013

    • Sastry K.

      Sure. I will talk to my VP and let you know by tomorrow. How many people?

      July 29, 2013

  • Ritesh A.

    Hi all,

    In the last session, I realized that having people on different operating systems made it difficult to coordinate the exercise. Hence I have decided to use the HortonWorks standalone Hadoop setup. (Thanks to Luke Lam for helping me with the setup.) Before the second session, please make sure that you have the HortonWorks standalone setup installed and that you are able to complete the following exercise: https://docs.google.com/document/d/1ewFceJ1-LTjTrPgU2iZMkM4j_yvDWjL35tv6bGvGC6g/edit?usp=sharing

    If you are having problems with the exercise, I recommend posting to the Google group here: https://groups.google.com/forum/#!forum/handsonbigdata

    Enjoy data binging :)

    Ritesh

    July 25, 2013

  • Vijay B.

    I could not make it to the first session. I have a question about the Hadoop configuration: are we using a VM provided by a specific vendor, or did we run it on a plain Ubuntu machine? Is there a link to the instructions from the first session?

    July 20, 2013

    • Ritesh A.

      Hi Vijay. Previously I didn't recommend any particular VM, but for the next meetup I will suggest one. I will post detailed instructions on how to set up the VM and run at least one example from my last meetup.

      July 24, 2013

  • Sreenivasulu T.

    Looking forward to this meetup.

    July 21, 2013

  • Arul R.

    Looking forward to August 4!

    July 20, 2013

  • jayleen

    Looking forward to this.

    July 18, 2013

  • Visakh

    Looking forward to this meetup. Ritesh rocked.

    July 17, 2013

  • Rajesh K

    .

    July 17, 2013
