
42nd Bay Area Hadoop User Group (HUG) Monthly Meetup

Hosted By
Yahoo! HUG O.

Details

Agenda

6:00 - 6:30 - Socialize over food and beer(s)
6:30 - 7:00 - Hadoop and Spark Join Forces at Yahoo!
7:00 - 7:30 - Inside Hunk: Architecture, Analytics and Use Cases
7:30 - 8:00 - Turning the Tables with InfiniDB for Hadoop

Session I (6:30 - 7:00 PM) - Hadoop and Spark Join Forces at Yahoo!

Hadoop plays a central role at Yahoo! in providing personalized experiences for our users and creating value for our advertisers. To address our emerging business needs, Yahoo is expanding our big-data platform to support Spark applications that integrate seamlessly with Hadoop. In this talk, we will explain how Spark is applied in various Yahoo use cases and present our strategy for Spark adoption.

Yahoo has enjoyed our collaboration with the community on Spark enhancements. We have been a leading player on Spark-on-YARN, which allows Spark applications to be deployed on Hadoop NextGen (YARN). In this talk, we will provide a high-level overview of Spark-on-YARN, share customer stories on Spark-on-YARN adoption, and outline areas for future enhancement.

Speaker: Andy Feng (Distinguished Architect, Cloud Services, Yahoo)

Bio:

Andy Feng is a Distinguished Architect at Yahoo!. He is currently leading the architectural design of the next-gen big-data platform. Prior to Yahoo!, Andy served as Chief Architect at Netscape/AOL and as Principal Scientist at Xerox.

Session II (7:00 - 7:30 PM) - Inside Hunk: Architecture, Analytics and Use Cases

Join the two speakers from Splunk to learn about Hunk and how it was built. The talk will cover the architecture, the challenges and the solutions. You will learn how the team was able to achieve interactivity on top of HDFS with true schema-on-read. The talk will explain how Hunk uses MapReduce as an orchestration framework and previews results using a subset of the data while the MapReduce job kicks off, followed by Hunk handling the Reduce phase to deliver streaming results, allowing you to stop, pause or refine queries on the fly.

Speaker: Todd Papaioannou (CTO, Splunk)

Bio:

Todd Papaioannou has served as Splunk's Chief Technology Officer since 2013. Prior to joining Splunk, Mr. Papaioannou was an Entrepreneur-in-Residence at Data Collective Venture Capital. From 2011 to 2013, he served as Chief Executive Officer and Co-Founder of Continuuity, Inc., a software company. Previously, Mr. Papaioannou worked as Vice President, Chief Cloud Architect at Yahoo! Inc. from 2010 to 2011, and Vice President, Architecture and Emerging Technologies at Teradata Corporation from 2005 to 2010. Mr. Papaioannou holds a Ph.D. in artificial intelligence and distributed systems from Loughborough University in England.

Speaker: Brett Sheppard (Director of Big Data Product Marketing, Splunk)

Bio:

Brett Sheppard is director of big data product marketing at Splunk Inc. Mr. Sheppard's career combines roles as a senior analyst at Gartner and strategic marketing director at high-growth technology companies. He is a certified Hadoop system administrator. Based six years overseas, in Europe and Asia, Mr. Sheppard has worked onsite across the U.S. and in 35+ countries to help enterprises and public sector accounts evolve their data architectures to manage and benefit from big data. Mr. Sheppard holds a B.A. from the University of Virginia and an M.A. from the University of Pittsburgh.

Session III (7:30 - 8:00 PM) - Turning the Tables with InfiniDB for Hadoop

Learn about the InfiniDB for Hadoop technology and its place in the Big Data ecosystem.

InfiniDB offers an open-source analytics engine that runs as a non-MapReduce engine on an existing HDP or CDH distribution, reading and writing to HDFS. Engineered to deliver extremely high performance for dimensional analysis and analytic queries at scale, InfiniDB partitions data internally both vertically (by column) and horizontally (by range of rows). The custom distribution of work is written in C++ to allow distribution of primitive operations across all available cores in a Hadoop cluster.

However, the technology is tuned specifically for analytic reporting, and has trade-offs for other workloads. We will discuss appropriate workloads for InfiniDB for Hadoop and related columnar technologies including Parquet.

Speaker: Jim Tommaney (CTO, Calpont)

Bio:

Jim Tommaney is the CTO at Calpont. Jim has been involved in data and analytics since the relational middle ages (Oracle 6), but now focuses on distributed technologies to deliver interactive SQL at scale.

Yahoo Campus Map:

Detail map (http://photos4.meetupstatic.com/photos/event/2/8/e/d/600_21370477.jpeg)

Location on Wikimapia:

http://www.wikimapia.org/#lat=37.4181633&lon=-122.0250607&z=18&l=0&m=b&search=yahoo
