Fast big data analytics with Spark on Tachyon in Baidu


Details
This Tachyon Meetup (https://www.meetup.com/Tachyon) features a chance to interact with other Tachyon (http://tachyon-project.org/) users and the developers, as well as two presentations:
a) Shaoshan Liu from Baidu (http://baidu.com/) will share lessons they learned from Tachyon deployments in production.
b) Haoyuan Li from Tachyon Nexus (http://www.tachyonnexus.com/) will share exciting news related to Tachyon.
Food will be available starting at 6:00 PM, and presentations will begin at 6:30 PM. Special thanks to IBM for hosting this.
Fast big data analytics with Spark on Tachyon in Baidu
Abstract:
In this talk, we will focus on how Tachyon can help improve big data analytics (ad-hoc query) efficiency within Baidu.
Currently within Baidu, we have a production Tachyon cluster with 100 nodes and over 2 PB of storage space; this cluster mainly serves as the cache layer for our Big Data Analytics engine. We first introduce the Big Data Analytics infrastructure within Baidu. Then, we explain why we started using Tachyon a few months ago, as well as the problems we encountered when we started. Next, we delve into the details of how Tachyon helps accelerate our Big Data Analytics pipeline in its current state. At the end, we discuss what new features we want to see and our plan to scale further.
Bio:
Shaoshan Liu is currently a Senior Architect at Baidu U.S.A. working on Big Data Infrastructure. Before Baidu, he worked at LinkedIn and Microsoft. Shaoshan has a Ph.D. from UC Irvine.
Tachyon Recent Development
Abstract:
Tachyon 0.6.4 was recently released, with significant improvements to the overall system. This talk will touch on the features recently developed in Tachyon, including the alpha release of Tiered Storage in 0.6.4.
Bio:
Haoyuan Li is the founder and CEO of Tachyon Nexus (http://www.tachyonnexus.com/). He is also a Computer Science Ph.D. candidate in the AMPLab at UC Berkeley, where he co-created Tachyon, an open source memory-centric distributed storage system. He is also a founding committer of Apache Spark.
Agenda
6:00 – 6:30 Food & Networking
6:30 – 7:30 Talks
7:30 – 7:45 Q&A
7:45 – 8:00 Wind down
Tweet about this! https://twitter.com/TachyonProject/status/598234199305228288
Tachyon Survey: http://goo.gl/forms/dTwa9pRhqB
