Fast Big Data Analytics with Spark on Tachyon in Baidu

Hosted By
IBM Big D.

Details

Join us on May 28th for a chance to interact with IBM and with Tachyon (http://tachyon-project.org/) users and developers, and to hear how Tachyon can help improve big data analytics (ad-hoc query) efficiency within Baidu.

Currently within Baidu, we have a production Tachyon cluster with 100 nodes and over 2 PB of storage space. This cluster mainly serves as the cache layer for our Big Data Analytics engine. In this talk, we first introduce the Big Data Analytics infrastructure within Baidu. Then we explain why we started using Tachyon a few months ago, as well as the problems we encountered when we started using it. Next, we delve into the details of how Tachyon currently helps accelerate our Big Data Analytics pipeline. Finally, we discuss the new features we want to see and our plan to scale further. Hear from Baidu (http://www.baidu.com/), who will share lessons learned from Tachyon deployments in production, and from Tachyon Nexus (http://www.tachyonnexus.com/), who will share exciting news related to Tachyon.

Tachyon recent development: Tachyon 0.6.4 was recently released, with significant improvements to the overall system. This talk will touch on the features recently developed in Tachyon, including the alpha release of Tiered Storage in 0.6.4.

Haoyuan Li is the founder and CEO of Tachyon Nexus. He is also a Computer Science Ph.D. candidate in AMPLab at UC Berkeley, where he co-created Tachyon, an open memory-centric distributed system. He is a founding committer of Apache Spark.

Agenda:

06:00-06:30 - Food and Networking
06:30-07:30 - Presentations
07:30-07:45 - Q&A
07:45-08:00 - Thank you for attending

See you there!

Data, Cloud and AI in Silicon Valley