
38th Bay Area Hadoop User Group (HUG) Monthly Meetup

Agenda:

  • 6:00 - 6:30 - Socialize over food and beer(s)
  • 6:30 - 7:00 - Azkaban: What LinkedIn Uses to Manage Hadoop Workflows
  • 7:00 - 7:30 - Weave: Running YARN apps as simply as running Java threads
  • 7:30 - 8:00 - Finding a Needle in a Stack of Needles: Adding Search to the Hadoop Ecosystem

 

Session I: Azkaban: What LinkedIn Uses to Manage Hadoop Workflows

Every day, LinkedIn updates massive datasets that power our various online features. Thousands of Hadoop jobs need to be executed reliably, in a specific order and on set schedules, to support these updates. For several years, LinkedIn has been using Azkaban to coordinate the execution of these jobs on our production and development clusters.

Azkaban is an open-source workflow management platform that runs all of LinkedIn's Hadoop data products. It is built on a reliable, scalable design and is flexible enough to be extended with new features and to work with different Hadoop components. Azkaban focuses on ease of use, providing a modern, polished web UI as well as highly customizable job executors.

In this talk, we'll go through the war stories and lessons learned from supporting these workloads on Hadoop clusters with over a thousand active users, and how Azkaban has been redesigned over time to achieve our goals.

Speaker: Richard Park, Software Engineer, LinkedIn
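The ordering requirement in the abstract (thousands of jobs, each of which may run only after the jobs it depends on) is essentially a dependency-graph problem. The Java sketch below illustrates it generically with Kahn's topological sort. It is not Azkaban's code. The class name `JobOrder` and the job names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: orders jobs so each one runs after all of its
 * dependencies (Kahn's topological sort). Not Azkaban's scheduler.
 */
public class JobOrder {

    /** deps maps each job name to the names of the jobs it depends on. */
    public static List<String> order(Map<String, List<String>> deps) {
        Map<String, Integer> pending = new LinkedHashMap<>();   // unmet-dependency count per job
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges: job -> jobs waiting on it
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            pending.putIfAbsent(e.getKey(), 0);
            for (String d : e.getValue()) {
                pending.putIfAbsent(d, 0);
                pending.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Start with jobs that have no dependencies.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }

        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            result.add(job);
            // Finishing this job may unblock the jobs that depend on it.
            for (String next : dependents.getOrDefault(job, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (result.size() != pending.size()) {
            throw new IllegalStateException("dependency cycle detected");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("publish", List.of("aggregate"));
        deps.put("aggregate", List.of("ingest-a", "ingest-b"));
        System.out.println(order(deps)); // [ingest-a, ingest-b, aggregate, publish]
    }
}
```

A real workflow scheduler must also handle schedules, retries and running independent jobs in parallel. This sketch leaves all of that out and only shows how a valid execution order falls out of the declared dependencies.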


Session II: Weave: Running YARN apps as simply as running Java threads

Hadoop YARN is the new, powerful and highly flexible resource management framework that lets a cluster's resources be used to run MapReduce jobs as well as other types of applications. However, flexibility comes with complexity, and this can make it challenging to get started with YARN. With Weave, we set out to make YARN more accessible to application developers who are familiar with Java but do not have experience with distributed systems. Weave provides a set of libraries that make writing distributed applications easy through an abstraction layer built over YARN, and it makes running those applications as simple as running threads. With this abstraction, an application can be executed as threads within a single process during development and unit testing, and later be deployed to a YARN cluster without any modification. Weave also has built-in support for real-time collection of application logs and metrics, application lifecycle management and network service discovery, which greatly reduces the pain developers face in developing, debugging, deploying and monitoring applications.

Speaker: Terence Yim, Software Engineer, Continuuity
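The abstract's central idea is that the same application code can run either as local threads or on a cluster. The Java sketch below shows one way to express that with a runner abstraction. All names in it (`DistributedTask`, `TaskRunner`, `ThreadTaskRunner`, `LocalRunDemo`) are hypothetical and are not Weave's actual API. Only the thread-based runner is shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical unit of work, written once and independent of where it runs. */
interface DistributedTask {
    void run(String instanceId);
}

/** Hypothetical launcher; a cluster-backed runner would implement the same interface. */
interface TaskRunner {
    void launch(DistributedTask task, int instances) throws InterruptedException;
}

/** Local runner: each "instance" is a thread in this JVM, convenient for unit tests. */
class ThreadTaskRunner implements TaskRunner {
    @Override
    public void launch(DistributedTask task, int instances) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            String id = "instance-" + i;
            Thread t = new Thread(() -> task.run(id), id);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait for every instance to finish
        }
    }
}

public class LocalRunDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<String> seen = new ConcurrentLinkedQueue<>();
        DistributedTask echo = id -> seen.add(id); // record which instances ran
        TaskRunner runner = new ThreadTaskRunner(); // hypothetically swapped for a cluster runner later
        runner.launch(echo, 3);
        System.out.println(seen.size()); // 3
    }
}
```

The task depends only on the `TaskRunner` interface. In principle, a YARN-backed implementation could therefore replace `ThreadTaskRunner` without any change to the task itself, which is the property the abstract describes.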

 

Session III: Finding a Needle in a Stack of Needles: Adding Search to the Hadoop Ecosystem

Apache Hadoop is enabling organizations to collect larger, more varied data, but once it's collected, how will it be found? Your users expect to be able to search for information using simple text-based queries, regardless of data location, size, and complexity. How do they quickly find information that has just been created, or that has been stored for months or even years?

Cloudera Search team lead Patrick Hunt will present their solution to this problem: what architecture is necessary to search HDFS and HBase? How were Apache Solr, Lucene, Flume and MapReduce integrated to allow for near-real-time and batch indexing of documents? Which problems are solved, and what's still to come? Join us for an exciting discussion on this new technology.

Speaker: Patrick Hunt, PMC member on the Apache ZooKeeper project, Cloudera Search Team Lead
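Text search over Hadoop data rests on an inverted index, the structure Lucene (and therefore Solr) is built around. The Java sketch below is a toy, in-memory version for illustration only. `TinyIndex` is a hypothetical class. It ignores everything that makes real indexing hard, such as analysis chains, relevance scoring, segment merging and distribution across HDFS.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical toy inverted index: maps each lower-cased term to the ids of
 * the documents containing it. Far simpler than Lucene's real index.
 */
public class TinyIndex {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    /** Adds one document; calling this per incoming event loosely resembles incremental indexing. */
    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns ids of documents containing every query term (AND semantics). */
    public Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue;
            Set<Integer> docs = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs); // first term seeds the candidate set
            } else {
                result.retainAll(docs);       // later terms narrow it down
            }
        }
        return result == null ? Set.of() : result;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add(1, "HDFS stores large files");
        index.add(2, "HBase stores rows");
        index.add(3, "Search HDFS and HBase");
        System.out.println(index.search("hdfs"));         // [1, 3]
        System.out.println(index.search("stores hbase")); // [2]
    }
}
```

Adding documents one at a time loosely mirrors near-real-time indexing. Building the map over a whole dataset in one pass mirrors batch indexing. In the system described in the talk, these two paths are handled by the integration of Solr, Lucene, Flume and MapReduce rather than by anything like this sketch.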


Yahoo Campus Map:


 

Location on Wikimapia:

http://www.wikimapia.org/#lat=[masked]&lon=[masked]&z=18&l=0&m=b&search=yahoo

 


  • Mahesh G.

    Very good meeting. Good, informative presentations.

    July 17, 2013

  • Wenfeng W.

    Great presentations

    July 17, 2013

  • Rishabh S.

    Is the meetup still on?

    July 17, 2013

    • Brian P.

      The last speaker, from Cloudera, is just now wrapping up and answering questions.

      July 17, 2013

  • Brian P.

    Anyone looking for experienced Hadoop Cluster Administrators?

    July 15, 2013

    • Brian P.

      We should talk. Are you here at the URL cafe tonight?

      July 17, 2013

  • A former member

    Business travel came up at the last minute.

    July 16, 2013

  • Kunal

    co-founder @unraveldata

    July 15, 2013

  • Chetan K.

    Really curious about the search capability in Hadoop. A friend of mine is building the same thing, and I got a chance to evaluate his app and provide feedback. Want to see how far off I was ;-)

    July 15, 2013

  • Kathy R.

    Lefty, so true...sneaking is much more fun.

    July 15, 2013

  • Yahoo! HUG O.

    Hadoopers who sent notes while on the waiting list, please drop by. We have always had space, so do not worry much about the long waiting list.

    July 15, 2013

    • Lefty

      Awww ... telling us it's okay takes away the thrill of sneaking in!

      July 15, 2013

  • Romain

    Working on Hue, the Hadoop UI: http://gethue.com

    July 11, 2013

  • Romain

    I am working on Hue, the Hadoop UI: http://gethue.com

    July 9, 2013

  • Aaron L.

    We are interested in Azkaban and would like to know more from Richard Park.

    July 9, 2013

  • rajesh v.

    Hope I will get a chance!!

    July 6, 2013

  • Eden

    Brian

    June 24, 2013

  • Laura U.

    We'll buy the pizza! Accenture Tech Labs wants to sponsor and attend the July 17 Meetup. Please contact me to arrange details.

    June 21, 2013

  • Ely K.

    Hi all - If you are interested in learning more about Apache Accumulo, we are having a meetup right before the Hadoop Summit next week. Details here: http://www.meetup.com/Accumulo-Users-DC/events/118370172/

    June 20, 2013

  • Toshiyuki T.

    Looking forward to it

    June 11, 2013

  • Ana M.

    I'm excited to be a part of your group. Can't wait to start meeting new members. Thanks to all. Ana Mullican

    May 26, 2013

Our Sponsors

  • Yahoo

    Free admission, Space, Pizza and Beer
