
An overview of Hulu's data platform by Prasan Samtani & Tristan Reid of Hulu

Abstract:

Hulu viewers generate a tremendous amount of data: our users watch over 400 million videos and 2 billion advertisements a month. Processing and analyzing that data is critical to our business, whether for deciding what content to invest in, or for convincing external partners of the superiority of Hulu as an advertising platform. In this presentation we will provide an overview of our entire data platform, from collecting and storing the raw event data, to transforming it into a relational structure and performing analysis. We will describe how, and for what purpose, we use various technologies in the Hadoop ecosystem such as MapReduce, HBase and Hive. The key focus of the talk will be to describe how data flows through our pipeline, and how we have built a powerful toolchain, both on top of and around Hadoop, to suit our business needs. We will also compare and contrast our methods with those we have seen adopted by other companies seeking to perform similar tasks.
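
As a rough illustration of the flow the abstract describes (raw events, then a relational structure, then analysis), here is a minimal in-memory sketch in plain Python. It is not Hulu's pipeline, which the abstract says is built on MapReduce, HBase and Hive. The event format, the field names (`ts`, `user`, `event`, `show`), the show names and the helper functions are all hypothetical, chosen only to make the three stages concrete.

```python
import json
from collections import Counter

# Hypothetical raw event log: one JSON object per line (assumed format).
RAW_EVENTS = """\
{"ts": "2014-03-18T19:00:00Z", "user": "u1", "event": "video_start", "show": "Show A"}
{"ts": "2014-03-18T19:01:00Z", "user": "u1", "event": "ad_impression", "show": "Show A"}
{"ts": "2014-03-18T19:02:00Z", "user": "u2", "event": "video_start", "show": "Show A"}
{"ts": "2014-03-18T19:03:00Z", "user": "u2", "event": "video_start", "show": "Show B"}
{"ts": "2014-03-18T19:04:00Z", "user": "u2", "event": "ad_impression", "show": "Show B"}
{"ts": "2014-03-18T19:05:00Z", "user": "u1", "event": "ad_impression", "show": "Show A"}
"""


def parse_events(raw):
    """Stage 1 -> 2: turn raw JSON lines into flat (ts, user, event, show) rows."""
    rows = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines in the log
        record = json.loads(line)
        rows.append((record["ts"], record["user"], record["event"], record["show"]))
    return rows


def count_by_show(rows, event_type):
    """Stage 3: count events of one type per show (a GROUP BY in SQL terms)."""
    counts = Counter()
    for _ts, _user, event, show in rows:
        if event == event_type:
            counts[show] += 1
    return dict(counts)


rows = parse_events(RAW_EVENTS)
video_starts = count_by_show(rows, "video_start")
ad_impressions = count_by_show(rows, "ad_impression")
print(video_starts)
print(ad_impressions)
```

At the volumes quoted above, the same grouping would be expressed as a distributed job or a Hive query rather than a loop over an in-memory list; the sketch only shows the shape of the data at each stage.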


Bio:

Prasan Samtani is a software developer at Hulu working on the data platform team, which focuses on building components on top of Hadoop to enable data ingestion, processing, job scheduling, and preparation for analysis. Previously, he was at Alelo, a company designing virtual humans for language and culture training. His interests are in distributed systems, high-level languages, and artificial intelligence, and he is a computer history aficionado.


Tristan Reid is a senior software developer at Hulu leading the metrics and reporting tools (MART) team, which focuses on building a toolchain around our data platform to enable easy reporting, ingestion and monitoring. Prior to Hulu, Tristan was VP of Solution Design at Ares Mgmt, leading a team building research tools for investment professionals. He has taught software development courses for IBM, BEA Systems and others in the US, Europe and Asia. Previously, Tristan built risk management and data analysis tools at Capital Research as a Quant Research Associate. Before his career in finance, he participated in a number of start-ups, both as a resource and as principal.


  • raymond h.

    Great presentation. The pipeline is fairly typical. The DSL is good, but I'm not sure whether it is overkill. It would be better to see more examples that justify why the DSL is needed, given that we can use Hive/Pig to do the ETL as well. On the other hand, I see the reporting system is built on top of both an RDBMS and Hive. Again, why go to Hive? Is it because there is too much data for the RDBMS to handle? Have you looked into columnar DB solutions for MySQL, like InfiniDB or Infobright?

    Apart from that, I would like to get some insight into how the data is being used and what kind of intelligence is being derived from it. I notice Hulu is missing a near-realtime pipeline and the team is considering building one. What are the use cases for that?

    thanks
    ray

    1 · March 18, 2014

    • Prasan S.

      With regard to usage of the data, there are many different uses; some examples are listed below:
      - For ourselves, looking at which shows are popular vs which ones aren't - that gives us a better idea of how much money we should be willing to spend on a show and also determines what we are willing to produce ourselves.
      - Looking at device trends also helps us understand where to focus development effort
      - For customers, tracking the success of the various ad campaigns we run - as a hypothetical example, let's say customers interact a lot more with our Microsoft ads when watching Nikita or Arrow (which both feature a lot of tech and slick interfaces) than when watching older movies. That's something both we and they would like to know.

      As for the realtime pipeline, we are considering that, and part of that process will be trying to see if there is really a valid use case which would justify the effort in building it.

      Thanks for attending!
      --Prasan

      March 19, 2014

    • raymond h.

      Nice. It is more like a holistic view of the data. I bet you have clustered the interest groups, identified what makes those groups interact with ads, and used this info to generate more revenue, right? :)

      May 16, 2014

  • Subash D.

    The slides have been uploaded under the files section. I had to split the slides into two due to size restrictions.

    March 21, 2014

  • A former member

    It was OK. Didn't get any complete stories like how that pipeline is helping the Hulu marketing people, examples of interesting queries/aggregations. Also, didn't get any really challenging technical tasks which were solved. When people start to ask questions like "Is this timestamp a UTC timestamp", or "what is the Hadoop version" multiple times, it's kind of a red flag.

    March 19, 2014

    • Les G.

      I second that emotion.

      March 19, 2014

    • Subash D.

      Definitely a great presentation. Thanks again Hulu

      2 · March 20, 2014

  • Hung T.

    that was great - really enjoyed the presentation. thanks to the speakers for being generous with your time.

    March 19, 2014

  • Jimmy K.

    Great presentation

    March 19, 2014

  • A former member

    Awesome presentation.

    March 19, 2014

  • Daniel G.

    Great presentation

    March 18, 2014

  • Les G.

    Great. Good to see how they actually go about dealing with all that info and make use of it. Entertaining as well. thanks much to the Hulu guys.

    March 18, 2014

  • Kunal

    Can somebody please look into the audio issue with the live stream? We cannot hear anything.

    March 18, 2014

  • Chakravarthi(Chaks) A.

    still no sound please.

    March 18, 2014

  • Kunal

    Cannot hear anything on the live stream

    March 18, 2014

  • Krishnakumar B.

    Is there parking available nearby?

    March 18, 2014

    • Subash D.

      Yes, in the building. It's free parking with validation. Just bring your ticket up.

      1 · March 18, 2014

  • Sig N.

    We're driving all the way from Irvine, looking forward to it!

    2 · March 18, 2014

    • Subash D.

      Great, see you at the event.

      March 18, 2014

  • Kuljit

    What time does it end?

    March 18, 2014

  • Eldon T.

    Since some of the past meetups have had pizza, I'd like to ask if there will be any this time.

    March 18, 2014

  • Kuljit

    Looking forward to the live webcast link!

    March 17, 2014

  • A former member

    Will the actual lecture start at 6:30? Usually there's pizza and a meet&greet beforehand, should I try to arrive before 6:30?

    March 14, 2014

    • Subash D.

      No, the talk starts at 7.

      March 14, 2014

  • John C.

    Where is the live stream?

    March 14, 2014

  • Chris H.

    This will be streamed as well as recorded and posted afterwards.

    1 · March 14, 2014

    • Les G.

      thanks, glad to hear it. Due to the date change I can't make it but look forward to watching it on remote or on video.

      March 14, 2014

    • Jimmy K.

      This is great. We always talk about technology but a lot of organizations rarely use it!

      1 · March 14, 2014

  • Jonathan A.

    Just wondering if there will be a stream available?

    1 · March 13, 2014

  • Subash D.

    Please note the change of date: the event is now scheduled for 18th March.

    March 11, 2014
