Bay Area Hadoop User Group (HUG) March Meetup

Detailed agenda and summaries:

  • 6:00 - 6:30 - Socialize over food and beer(s)
  • 6:30 - 7:00 - Data driven local commerce @ Groupon
  • 7:00 - 7:30 - Using Apache Hive with HBase and recent improvements
  • 7:30 - 8:00 - JuteRC compiler

 

Data driven local commerce @ Groupon

Groupon started out three years ago as a "deal of the day" company and is rapidly expanding into being one of the largest e-commerce companies on the planet, connecting the worlds of online and offline commerce. In this talk, we give an overview of how Groupon employs a data-driven approach to power local commerce by using Big Data to deliver the right deal to the right consumer at the right time. We'll give a "view from the trenches" on how we've built and grown our relevance technology leveraging Hadoop and other open-source tools. 

Presenters: Shawn Jeffery and Sean O'Brien, Groupon

 

Using Apache Hive with HBase and recent improvements 

Apache Hive and HBase are very popular projects in the Hadoop ecosystem. Using Hive with HBase was made possible by contributions from Facebook around 2010. In this talk, we will go over the details of how the integration works, and talk about recent improvements. Specifically, we will cover the basic architecture, schema and data type mappings, and recent filter pushdown optimizations. We will also go into detail about the security aspects of Hadoop/HBase related to Hive setups.

Presenter: Enis Soztutar, Hortonworks
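
As a rough illustration of the schema mapping mentioned in the abstract above: the Hive HBase storage handler is configured with an `hbase.columns.mapping` property that pairs each Hive column, in order, with either the HBase row key (`:key`) or a `family:qualifier` entry. The sketch below only parses such a string; it is not taken from the talk, and the helper name `parse_columns_mapping`, the `HBaseColumn` type, and the example column names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class HBaseColumn(NamedTuple):
    """Where one Hive column lives in HBase (hypothetical helper type)."""
    family: Optional[str]     # None for the row key
    qualifier: Optional[str]  # None when a whole column family is mapped
    is_row_key: bool


def parse_columns_mapping(mapping: str, hive_columns: List[str]) -> Dict[str, HBaseColumn]:
    """Pair Hive columns, in order, with entries of a mapping string.

    Assumes the comma-separated ":key,family:qualifier" style used by the
    hbase.columns.mapping table property; this is an illustration, not
    the storage handler's own parser.
    """
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("mapping must have one entry per Hive column")

    result: Dict[str, HBaseColumn] = {}
    for hive_col, entry in zip(hive_columns, entries):
        if entry == ":key":
            # The HBase row key has no column family or qualifier.
            result[hive_col] = HBaseColumn(None, None, True)
            continue
        family, sep, qualifier = entry.partition(":")
        if not sep or not family:
            raise ValueError("expected 'family:qualifier', got %r" % entry)
        # An empty qualifier ("cf:") stands for the whole column family.
        result[hive_col] = HBaseColumn(family, qualifier or None, False)
    return result


if __name__ == "__main__":
    mapped = parse_columns_mapping(":key,cf1:val", ["id", "value"])
    for name, target in mapped.items():
        print(name, "->", target)
```

Keeping the mapping positional mirrors how the property is written: the n-th entry describes the n-th Hive column, so a length mismatch is treated as an error here.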

 

JuteRC compiler

Yahoo’s data ETL pipeline processes tens of terabytes of data every day. Finding a storage format that can store and fetch this data efficiently has always been a challenge for the pipeline. A recent study inside Yahoo showed a dramatic reduction in data size from switching from the Sequence File format to the RC File format, so we decided to convert our data to RC File. The most challenging task is manually serializing the data objects. We rely on Jute, the Hadoop Record Compiler, to generate serialization code. However, Jute does not support the RC File format, and RC File does not support native Hadoop Writable objects, so writing serialization code by hand becomes complicated and repetitive. We therefore built the JuteRC compiler, an extension to the Hadoop Record Compiler (Jute). It generates serialization/deserialization code for any user-defined primitive or composite data type. MapReduce programmers can plug the generated code directly into their jobs to produce output files in the RC File storage format. In our experiment on Yahoo audience data, JuteRC showed a 26-28% file size reduction and a 40% read/write performance improvement compared to Sequence File. We are currently in the process of open-sourcing JuteRC.

 

Presenter: Tanping Wang, Yahoo
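
To make the Sequence File versus RC File comparison in the abstract above more concrete: a Sequence File stores whole records one after another, while RC File splits data into row groups and stores each column of a group contiguously, which tends to put similar values next to each other. The sketch below is a simplified illustration of that layout difference and is not JuteRC or RC File code; it uses a single row group, plain text encoding, and synthetic records, all of which are assumptions made for the example.

```python
import zlib
from typing import List, Tuple

Record = Tuple[int, str, str]  # (user_id, country, event); a made-up schema


def make_records(n: int) -> List[Record]:
    """Build synthetic records with low-cardinality string columns."""
    countries = ["US", "IN", "BR", "DE"]
    return [(i, countries[i % 4], "pageview") for i in range(n)]


def row_layout(records: List[Record]) -> bytes:
    """Row-oriented layout: every field of a record is stored together."""
    return "\n".join(",".join(str(f) for f in r) for r in records).encode()


def column_layout(records: List[Record]) -> List[bytes]:
    """Column-oriented layout: all values of one column are stored together."""
    return ["\n".join(str(v) for v in col).encode() for col in zip(*records)]


def rows_from_columns(columns: List[bytes]) -> List[Record]:
    """Rebuild the original records from the three column buffers."""
    ids, countries, events = (c.decode().split("\n") for c in columns)
    return [(int(i), c, e) for i, c, e in zip(ids, countries, events)]


if __name__ == "__main__":
    records = make_records(10000)
    columns = column_layout(records)
    assert rows_from_columns(columns) == records  # layout change is lossless

    row_size = len(zlib.compress(row_layout(records)))
    col_size = sum(len(zlib.compress(c)) for c in columns)
    print("compressed row layout bytes:   ", row_size)
    print("compressed column layout bytes:", col_size)
```

The sizes printed depend entirely on the synthetic data and on zlib, so they say nothing about the 26-28% figure reported in the abstract; the point is only that the same records can be regrouped by column without losing information.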

 

Yahoo Campus Map:

Detail map

 

Location on Wikimapia:

http://www.wikimapia.org/#lat=[masked]&lon=[masked]&z=18&l=0&m=b&search=yahoo

 


  • Gitanjali Gulve Sehgal

    Sorry I missed it. Will the slides be posted somewhere?

    April 17

  • poonam

    Support.com is seeking talented folks for our Data Engineer role based out of Redwood City, CA. Support.com (SPRT) is a leading provider of cloud-based services and software (SaaS) that enable technology support for a connected world, and we are rolling out our next generation of products. Our offices are open, collaborative and, yes, fun!
    Would you be interested in knowing more about the role? Please reach out to me at [masked]. If this is not the right time, please archive my request! Please feel free to ask if you have any questions!

    Thank You!
    Poonam Tiwari
    Technical Recruiter
    Support.com[masked]

    April 17

  • A former member

    good talks, although so much talking in the background it was hard to concentrate.

    March 23, 2012

  • rakesh d.

    There was too much ambient chatter of people. We need to ask them to maintain silence while other people are interested in listening to the presenter. Also, some of the presenters were just literally reading the slides. We should ask the presenters to add some value to the content.

    March 23, 2012

  • Austin C.

    Excellent meeting. Great presentation on Groupon's Hadoop usage. Great presentation on Hadoop scenarios by Hortonworks. I really like the low-latency Hive-HBase-Hadoop scenario / architecture.

    March 22, 2012

  • A former member

    It was good except for THE VERY RUDE PEOPLE TALKING LOUDLY over the speakers in the rear of the room.

    March 22, 2012

  • David L.

    Excellent. The topic is good, and the presenters are knowledgeable.

    March 22, 2012

  • Serge M.

    Too much noise from people speaking in the background. It was hard to hear the presenters.

    March 22, 2012

  • A former member

    Organization and moderation of this event could be done much better

    March 22, 2012

  • A former member

    My first HUG meeting.
    Got a lot of useful information. Some of the details were above my level of understanding at this time, but I think these meetings will get me up to speed sooner, and I hope to get more involved in future meetings.

    March 22, 2012

  • A former member

    There was too much noise from the back and could not hear the speakers well. Otherwise it would have been 5 stars.

    March 22, 2012

  • James D.

    There was a lot of background noise of people speaking in the cafeteria, and the speaker system was very poor. I could not make out what the presenters were saying. The organizers made no effort to deal with the problem. I left early.

    March 22, 2012

  • A former member

    It was great meeting new folks from HUG, it was my first HUG ;)
    Let me know if you are looking for a new opportunity, I have a few in the Bay Area. Thanks!

    March 22, 2012

  • A former member

    The speakers had interesting topics, but all of them except Groupon's were poor public speakers and difficult to understand. Groupon's case study felt more like a sales pitch.

    March 22, 2012

  • Nausher

    Will the slides be made available?

    March 21, 2012

  • Michael C.

    My friend is looking for Hadoop/HBase expertise.

    March 21, 2012

  • A former member

    First HUG for me!

    March 21, 2012

Our Sponsors

  • Yahoo! Inc.

    Meeting space, pizza and drinks are sponsored by the Yahoo! Hadoop team.
