
35th Bay Area Hadoop User Group (HUG) Monthly Meetup



  • 6:00 - 6:30 - Socialize over food and beer(s), General announcements
  • 6:30 - 7:00 - HIT (Hadoop Integration Testing) for Automated Certification and Deployments
  • 7:00 - 7:30 - A Visual Workbench for Big Data Analytics on Hadoop
  • 7:30 - 8:00 - Large Scale Data Ingest Using Apache Flume


Session I (6:30 - 7:00 PM) - HIT (Hadoop Integration Testing) for Automated Certification and Deployment

HIT, which stands for Hadoop Integration Testing, is a Yahoo! framework for assembling Hadoop components into a full stack and running integration tests to make sure that the components can interoperate with each other. HIT aims to:

  • build a fully automated, modular, scalable, and flexible Hadoop stack deployment and test framework
  • develop integration processes and tools for development, quality engineering, operations, and customers
  • grow participation and evolve into a comprehensive self-service stack deployment and test solution

HIT is designed as an open system into which any type of testing can be plugged. We will also share new developments around HIT and how it can serve as a platform for all testing and automation.

Presenters: Mukund Madhugiri, Director of Quality and Release Engineering, Cloud Engineering Group, Yahoo!; Baljit Deot, Technical Yahoo!, Cloud Engineering Group, Yahoo!


Session II (7:00 - 7:30 PM) - A Visual Workbench for Big Data Analytics on Hadoop

Two of the major barriers to effective Hadoop deployments in the enterprise are the complexity and limited applicability of MapReduce. Software developers with Hadoop and MapReduce experience are in short supply, which slows big data initiatives. Getting faster results across a broad range of analytic scenarios requires working at a higher level of abstraction, supported by new programming paradigms and tools. In this talk we present one such approach, based on our experience developing a visual workbench for big data analytics on Hadoop. This approach enables data scientists and analysts to build and execute complex big data workflows for Hadoop with minimal training and without MapReduce knowledge. Libraries of pre-built operators for data preparation and analytics reduce the time and effort required to develop big data projects on Hadoop. The framework is extensible, allowing new operators to be added as needed. Because the underlying dataflow framework is efficient, run times are shorter, allowing faster iterations of discovery and analysis.

Presenter: Jim Falgout, Chief Technologist, Pervasive Big Data & Analytics

Session III (7:30 - 8:00 PM) - Large Scale Data Ingest Using Apache Flume

Apache Flume is a highly scalable, distributed, fault-tolerant data collection framework for Apache Hadoop and Apache HBase. Flume is designed to transfer massive volumes of event data into HDFS or HBase. Flume is declarative and easy to configure, and it can easily be deployed to a large number of machines using configuration management systems like Puppet or Cloudera Manager. In this talk, we will cover the basic components of Flume and how to configure and deploy it. We will also briefly talk about the metrics Flume exposes and the various ways in which they can be collected.

Apache Flume is a Top Level Project (TLP) at the Apache Software Foundation and has made several releases since entering incubation in June 2011. Flume graduated to become a TLP in July 2012. The current release of Flume is Flume 1.3.1.
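Flume's metrics, mentioned in the abstract above, can be collected in several ways. As one illustration, and assuming an agent started with HTTP JSON monitoring enabled (`-Dflume.monitoring.type=http -Dflume.monitoring.port=34545`), the agent serves a JSON document at `/metrics` whose keys look like `CHANNEL.<name>` or `SINK.<name>` and whose metric values are reported as strings. The Python sketch below is a minimal example under those assumptions, not part of the talk: the host, port, and the helper names `fetch_metrics` and `channel_fill` are hypothetical, and whether HTTP reporting is available depends on the Flume release you run.

```python
import json
import urllib.request
from typing import Dict


def fetch_metrics(host: str, port: int = 34545) -> dict:
    """Fetch the JSON metrics document from a Flume agent.

    Assumes (hypothetically) that the agent was started with
    -Dflume.monitoring.type=http and -Dflume.monitoring.port=<port>.
    """
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def channel_fill(metrics: dict) -> Dict[str, float]:
    """Return ChannelSize / ChannelCapacity for every channel in the report."""
    result = {}
    for key, values in metrics.items():
        # Component keys look like "CHANNEL.c1"; skip sources and sinks.
        if not key.startswith("CHANNEL."):
            continue
        # Metric values arrive as strings, so convert before dividing.
        size = float(values.get("ChannelSize", "0"))
        capacity = float(values.get("ChannelCapacity", "0"))
        name = key.split(".", 1)[1]
        # Guard against a missing or zero capacity.
        result[name] = size / capacity if capacity else 0.0
    return result


if __name__ == "__main__":
    # Illustrative document shaped like Flume's JSON report (values are made up).
    sample = {
        "CHANNEL.c1": {"Type": "CHANNEL", "ChannelSize": "250", "ChannelCapacity": "1000"},
        "SINK.k1": {"Type": "SINK", "EventDrainSuccessCount": "42"},
    }
    print(channel_fill(sample))
```

Polling `fetch_metrics` on a schedule and flagging channels whose fill ratio stays high is one way to notice a sink that cannot keep up; computing the ratio from `ChannelSize` and `ChannelCapacity` avoids relying on derived metrics that a given release may or may not report.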

Presenter: Hari Shreedharan, PMC Member and Committer, Apache Flume, Software Engineer, Cloudera


Yahoo Campus Map:



Location on Wikimapia:[masked]&lon=[masked]&z=18&l=0&m=b&search=yahoo



  • richard v.

    Well, if it's Flume you wanted, there's one next month: Apache Flume Meetup - Palo Alto, on the 6th.

    1 · February 23, 2013

  • Leo

    The Flume presentation was the best of the lot. I would suggest splitting it into several presentations, with use cases first and then a technical deep dive.

    February 22, 2013

  • Joe B.

    The Flume presentation from Cloudera was relevant and very good; it is the right kind of content to make this event relevant. On the other hand, the two previous presentations were more marketing material: not bad, but not why I come to such events. I understand everybody has a voice, but once the core developers stop coming and more product marketing is presented, this meetup will become less interesting to me :)

    Anyway, keep up the good work. I hope we can find a better balance in the presentations so the meetup stays open-source oriented; if I want product presentations, I can go directly to the company website.

    The Flume presentation was awesome; it covered the functionality and capabilities well. It might have been good to add roadmap content for Flume: what is next?

    February 20, 2013

  • Var

    Are you planning to record this event?

    February 20, 2013

  • Dan W.

    Apache Hadoop just committed our new feature patch for additional MapReduce sort/merge functionality and performance, and we have a very smart, unique (and native) approach to Hadoop that takes advantage of map and reduce for data integration functions like joins, aggregations, CDC, mainframe connectivity, etc. These are things our customers tell us can be very tough to do, and we have made them easy. If this sounds of interest, I can set up a time for you to talk further with our technical people at the Strata show. Please let me know the date and time you can come by the booth.


    2 · February 15, 2013

    • Kalyan K.

      I would certainly like to join a meetup session on this topic.

      February 15, 2013

    • Dan W.

      Hi Kalyan Kadiyala, please stop by our booth at Strata. So that I can have technical people available, could you give me a date and time when you can stop by?

      February 18, 2013

  • Dinesh

    Would like to attend this meetup.

    February 15, 2013

  • Russell J.


    February 10, 2013

  • Ash


    February 7, 2013

  • Visakh

    Would be awesome to attend this event

    February 7, 2013

  • Yahoo! HUG O.

    Join Us for the Leading Apache Hadoop Community Conference
    (20% off registration)

    Yahoo! and Hortonworks are pleased to host the first Hadoop Summit Europe conference, to be held March 20-21, 2013 in Amsterdam, Netherlands. This two-day event will feature dozens of sessions dedicated to enabling the next-generation data platform. Industry experts, business leaders, architects, data scientists and Hadoop developers will share use cases, success stories, best practices, tips & tricks, cautionary tales and technology insights.

    As a host of this event, Yahoo! is pleased to offer you 20% off registration. Enter promotional code "yahoospon20" to receive your discount.


    1 · February 4, 2013

  • Bobby S.

    Thank you

    January 18, 2013
