
45th Bay Area Hadoop User Group (HUG) Monthly Meetup

This Meetup is past

350 people went



Detailed agenda and summaries to follow. General agenda:

6:00 - 6:30 PM - Socialize over food and beer(s)
6:30 - 7:00 PM - Interactive Analytics in Human Time over Hadoop
7:00 - 7:30 PM - Hive on Apache Tez: Benchmarked at Yahoo! Scale
7:30 - 8:00 PM - Continuuity Loom: Modern Cluster Management

Session I (6:30 - 7:00 PM) - Interactive Analytics in Human Time - Lightning Fast Analytics using a Combination of Hadoop and In-memory Computation Engines at Yahoo!

Providing interactive analytics over all of Yahoo!’s advertising data, across the innumerable dimensions and metrics that advertising spans, has been a huge challenge. From returning results from a concurrent system in under a second, to computing non-additive cardinality estimates, to audience segmentation analytics, the problem space is computationally expensive and has led to large systems in the past. We have attempted to solve this problem in many different ways, with systems built on everything from traditional RDBMSs to NoSQL stores to commercially licensed distributed stores. With our current implementation, we look at how we have evolved a data technology stack that combines Hadoop and in-memory technologies. We will detail and contrast the strengths of each of these systems and show how they complement each other for some of the use cases we see in advertising. We will describe how we have customized these technologies to work in interesting patterns that have helped us get to data and analytics more quickly than ever before at Yahoo! The talk will include a couple of use-case deep dives, such as how we compute recursive unique counts on the fly and how we compute segment overlap analytics dynamically.
Speaker: Supreeth Rao, Technical Yahoo, Yahoo!
Supreeth works on the ads and data team at Yahoo!, where his current focus is building large-scale systems for distributed analytics.
Speaker: Sunil Gupta, Technical Yahoo, Yahoo!
Sunil works on the ads and data team at Yahoo!, where his current focus is building large-scale systems for distributed analytics.
Session II (7:00 - 7:30 PM) - Hive on Apache Tez: Benchmarked at Yahoo! Scale
The past year has seen the advent of several “low latency” solutions for querying big data. The basic premise of Shark, Presto and Impala has been that Hive on MapReduce is too slow for interactive queries. At Yahoo, we’d like our low-latency use cases to be handled within the same framework as our larger queries, if viable. We’ve spent several months benchmarking various versions of Hive (including Hive 0.13 on Tez), file formats, compression and query techniques at Yahoo scale. Here, we present our tests, results and conclusions, alongside suggestions for real-world performance tuning.
Speaker: Mithun Radhakrishnan, Programmer, Yahoo!
Mithun Radhakrishnan is a committer on the HCatalog project, and a Hive developer at Yahoo. He’s the author of DistCp on Hadoop 0.23+. He’s an erstwhile firmware developer and is prone to flare-ups from C++ withdrawal.
Session III (7:30 - 8:00 PM) - Continuuity Loom: Modern Cluster Management
Continuuity Loom is open-source cluster management software that provisions, manages, and scales clusters on public and private clouds. Clusters created with Loom use templates of any hardware or software stack, from simple standalone LAMP-stack servers and traditional application servers like JBoss to secure Apache® Hadoop™ clusters consisting of thousands of nodes. Get more information at and register to learn more about Continuuity Loom in the Cloud at try
Speaker: Nitin Motgi, Co-Founder and VP of Engineering, Continuuity
Nitin is a co-founder of Continuuity and an active investor. Prior to Continuuity, Nitin spent 5 years at Yahoo!, where he was an engineering lead on a large-scale content optimization and personalization engine known externally as C.O.R.E. He was one of the founding members of C.O.R.E. at Yahoo! He pioneered the use of HBase in production at Yahoo! and ran one of the largest HBase clusters in the world at the time.
He holds a master's degree in Computer Science from the University of Central Florida and a bachelor's degree in Computer Science from Karnataka University.

Yahoo Campus Map:
Detail map:
Location on Wikimapia: