
Extending Hadoop for Fun and Profit with Milind Bhandarkar

5:30-6:30pm Pizza and Networking
6:30-8:00pm Talk and Q&A
8:00-8:30pm Wind down

The Apache Hadoop project and the Hadoop ecosystem have been designed to be extremely flexible and extensible. HDFS, YARN, and MapReduce combined have more than 1000 configuration parameters that allow users to tune the performance of Hadoop applications and, more importantly, to extend Hadoop with application-specific functionality without having to modify any of the core Hadoop code.

In this talk, I will start with simple extensions, such as writing a new InputFormat to efficiently process video files. I will then present some extensions that boost application performance, such as optimized compression codecs and pluggable shuffle implementations. With the refactoring of the MapReduce framework and the emergence of YARN as a generic resource manager for Hadoop, one can extend Hadoop further by implementing new computation paradigms.

I will discuss one such computation framework, which allows message-passing applications to run in the Hadoop cluster alongside MapReduce. I will conclude by outlining some of our ongoing work that extends HDFS by removing the namespace limitations of the current NameNode implementation.

About the Speaker:

Milind Bhandarkar was the founding member of the team at Yahoo! that took Apache Hadoop from a 20-node prototype to a datacenter-scale production system, and he has been contributing to and working with Hadoop since version 0.1.0.

He started the Yahoo! Grid solutions team, which focused on training, consulting, and supporting hundreds of new migrants to Hadoop. Parallel programming languages and paradigms have been his area of focus for over 20 years, and were his area of specialization for his PhD (Computer Science) from the University of Illinois at Urbana-Champaign.

He has worked at the Center for Development of Advanced Computing (C-DAC), the National Center for Supercomputing Applications (NCSA), the Center for Simulation of Advanced Rockets, Siebel Systems, PathScale Inc. (acquired by QLogic), Yahoo!, and LinkedIn.

Currently, he is the Chief Scientist at Pivotal (formerly Greenplum, a division of EMC).

  • Tamao N.

    Slides and video for Milind's talk are here:

    May 15, 2014

    • Vincent

      Hi Tamao, I've been really looking forward to these videos for a very long time. Thanks for sharing and posting!

      May 16, 2014

  • Gero S.

    Good meetup. Met interesting people & learned about extending MapReduce and HDFS, scalability and performance (shuffle being the slowest part -> disk & network) as well as YARN (Hadoop 2.0) and Hamster (Hadoop And MPI on same cluster). Thanks!

    February 26, 2014

  • Cristina R.

    Hi.. does anyone know of any jsp jobs? Thanks

    February 25, 2014

  • Thermond A.

    Thanks but I shall not be able to make it.

    February 25, 2014

  • Kandarp

    won't be able attend. any plan to record the session, please

    February 24, 2014

  • Joe F

    Wont be able to make it

    February 23, 2014

Our Sponsors

  • eBay

    eBay sponsors venues and food.

  • Pivotal

    Pivotal has sponsored our venues, food, beverages, snacks, and video.

  • Twitter

    Twitter sponsors food and conducts hands-on sessions.

  • Uber

    Uber has provided venues, snacks, and video.
