The next meetup will be on August 19, 2015 at Cask Data HQ, Palo Alto.
Mark your calendars; we hope to see you all there!
Topics and Speakers
• Using Apache Kylin for large scale data analytics at eBay: Realtime Cube Updates with Kylin/Kafka Integration - Seshu Adunuthula, Director of Analytics Platform at eBay and Branky Shao, Software Engineer at eBay
Apache Kylin is an open source distributed analytics engine, contributed by eBay, that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop with support for extremely large data sets.
Kylin has traditionally supported end of the day processing of the cubes resulting in large multi-hour cube build times depending on the number of rows added and
In this talk we will introduce the concept of "Cube Segments": the ability to build cubes on micro-batches of data consumed from Kafka topics.
We will also present an internal use case where an SEO Attribution report with a 24+ hour processing window is now available within minutes.
• High Volume Streaming Analytics with CDAP - Jialong Wu from Lotame
In this talk, we’ll present the design of our new data stream processing application at Lotame and describe how we achieve significant reduction in cluster resource utilization while allowing faster updates of client audience data and better ad-hoc query support with the new platform.
We will examine the challenges faced in counting uniques in a high volume stream processing environment, and present a novel approach using time-windowed HyperLogLog aggregates. We’ll also discuss how CDAP enabled us to roll out this new platform quickly and share some valuable lessons and best practices we learned during the development cycle.
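For readers unfamiliar with the idea, the general technique of time-windowed HyperLogLog aggregates can be sketched roughly as follows. This is an illustrative sketch only, not Lotame's implementation and not a CDAP API: the class names `HyperLogLog` and `WindowedUniques`, the 60-second window, the register count, and the SHA-1 based hash are all assumptions chosen for the example. One small sketch is kept per time window, and sketches are merged (register-wise maximum) to estimate uniques over any range of whole windows.

```python
import hashlib
import math
from collections import defaultdict


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (illustrative, not production-grade)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash from SHA-1 (hash choice is an assumption).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                 # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # The union of two sketches is the register-wise maximum.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)    # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


class WindowedUniques:
    """Keeps one HyperLogLog per fixed time window (hypothetical helper)."""

    def __init__(self, window_seconds=60, p=12):
        self.window_seconds = window_seconds
        self.p = p
        self.windows = defaultdict(lambda: HyperLogLog(p))

    def record(self, timestamp, user_id):
        bucket = int(timestamp // self.window_seconds)
        self.windows[bucket].add(user_id)

    def uniques(self, start, end):
        """Estimate distinct ids over every window overlapping [start, end)."""
        merged = HyperLogLog(self.p)
        for bucket, sketch in self.windows.items():
            lo = bucket * self.window_seconds
            hi = lo + self.window_seconds
            if lo < end and hi > start:
                merged = merged.merge(sketch)
        return merged.count()
```

Because merging is a register-wise maximum, an id seen in several windows is counted once in the merged estimate, which is what makes per-window sketches composable for ad-hoc time ranges.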
• Introducing Athena - Yuanchi Ning from Uber
Athena is a stream processing platform for Uber's near real time analytics applications, built using Samza. We will be discussing some of the existing and upcoming use cases and how they impact the Uber partners / riders. The talk will go through the tooling built around Samza for easier user on-boarding - such as deployment manager, integration with typesafe config system, unit test framework, Graphite integration, metric whitelisting and so on. We'll also go over some of the issues observed during this process.
• Seshu Adunuthula is Director of Analytics Platform at eBay, responsible for managing some of the world's largest deployments of Hadoop, Teradata and ETL ingest infrastructure. He is an industry veteran with over 20 years of distributed computing and analytics experience. Most recently he managed the San Jose development team at MapR, responsible for the MapReduce, MapR-DB and MapR Control System teams. Prior to that he held individual contributor and managerial roles at Microsoft, on the SQL Server BI team, and at Oracle, on the BPEL Workflow team.
• Jialong Wu is a big data architect at Lotame, where he works on the core data platform that provides valuable insights into clients' audience data and links users across mobile and desktop devices.
• Yuanchi Ning is a Software Engineer in Data Engineering (Streaming Platform) at Uber Technologies. She is working on near-realtime data analytics services and applications on top of Samza and is also building a generic streaming platform with Samza called Athena. This platform is built to empower other teams at Uber to develop reliable, scalable, and high-performing stream processing applications. Yuanchi received her Master's degree from Carnegie Mellon University.
Food & socializing 6pm-6:30pm