Deep learning architecture using Tachyon and Spark & Tachyon New Features

Alluxio Bay Area Meetup

This Tachyon Meetup features a chance to interact with other Tachyon users and the developers, as well as three talks.

a) James Peng and Weide Zhang from Baidu will share updates regarding their Tachyon deployments in production.

b) Christopher Nguyen, Vu Pham, and Michael (Bach) Bui from Adatao will share their Deep Learning experience using Tachyon.

c) Calvin Jia and Jiri Simsa from Tachyon Nexus will present and demo features in the Tachyon 0.7 and 0.8 releases.

Food will be available starting at 6:00 PM, and presentations will begin at 6:30 PM. Special thanks to Baidu for hosting this.

Tachyon at Baidu by Dr. James Peng and Weide Zhang

Abstract: Dr. James Peng will first give a short update on how Baidu uses Tachyon in its infrastructure, and then Weide Zhang will give a short presentation on the time-to-live (TTL) feature that Baidu USA recently developed and contributed back to the community. In summary, the TTL feature 1) allows a Tachyon client to optionally set a time-to-expiration value (TTL, in seconds) on the files it creates in Tachyon, and 2) has the Tachyon master periodically purge files whose TTL has expired from both Tachyon and the under storage system (UnderFS).
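To make the two TTL steps concrete, here is a minimal, self-contained Python sketch. It is a toy simulation and not Tachyon's actual API: the `ToyMaster` class, its `create_file` and `purge_expired` methods, and the in-memory `under_storage` set are all hypothetical names introduced for illustration.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class FileEntry:
    created_at: float                     # creation time, in seconds
    ttl_seconds: Optional[float] = None   # None means the file never expires


class ToyMaster:
    """Hypothetical stand-in for a master that tracks file metadata."""

    def __init__(self) -> None:
        self.files: Dict[str, FileEntry] = {}   # files held in the memory tier
        self.under_storage: Set[str] = set()    # paths persisted in the under storage

    def create_file(self, path: str, ttl_seconds: Optional[float] = None,
                    now: Optional[float] = None) -> None:
        """Step 1: the client optionally attaches a TTL when creating a file."""
        now = time.time() if now is None else now
        self.files[path] = FileEntry(created_at=now, ttl_seconds=ttl_seconds)
        self.under_storage.add(path)

    def purge_expired(self, now: Optional[float] = None) -> List[str]:
        """Step 2: remove expired files from both tiers; return the purged paths."""
        now = time.time() if now is None else now
        expired = [
            path for path, entry in self.files.items()
            if entry.ttl_seconds is not None
            and now - entry.created_at >= entry.ttl_seconds
        ]
        for path in expired:
            del self.files[path]
            self.under_storage.discard(path)
        return sorted(expired)
```

In a real deployment the purge would run on a periodic background task in the master; here the caller invokes `purge_expired` explicitly and can pass `now` to keep the behavior deterministic.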

Dr. James Peng is a Principal Architect at Baidu, where he steers the engineering direction for several divisions, including monetization platforms, the infrastructure department, and the data science and big data platform. The projects that he initiated and led have made significant contributions to a wide range of core products. The ads budget-control project that he led won Baidu’s prestigious Highest Award in 2013. Before joining Baidu, James was on the engineering team at Google in Mountain View, where he worked on various projects in the AdWords system. Prior to Google, he was a Research Associate at Stanford University, where his research focused on distributed computing, data modeling, and large-scale databases. James holds a B.S. degree from Tsinghua University, an M.S. degree from the State University of New York at Buffalo, and a Ph.D. degree from Stanford University.
Weide Zhang is a Senior Architect at Baidu Inc. working on big data infrastructure. Before Baidu, he worked for 7 years in various areas of system development, including distributed serving systems, search infrastructure, and machine learning. He holds a Master's degree in Computer Science and System Engineering from the University of Virginia and a Bachelor's degree in Computer Science and Applied Mathematics from Shanghai Jiao Tong University.

First-ever scalable, distributed deep learning architecture using Tachyon & Spark by Christopher Nguyen, Vu Pham, and Michael (Bach) Bui

Abstract: Deep learning algorithms have been widely used in many real-world applications, including computer vision, machine translation, and fraud detection. Unfortunately, deep learning works best only when the model is big and trained on large-scale datasets. Meanwhile, distributed computing platforms like Spark are designed to handle big data and have been used extensively. By having deep learning available on Spark, businesses can take full advantage of deep learning capabilities on their datasets using their existing Spark infrastructure.

In this talk, we present a scalable implementation of predictive deep learning algorithms on Spark and Tachyon, including feedforward neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs). This, to the best of our knowledge, is the first successful implementation of CNNs and RNNs on Spark and Tachyon. To support big model training, we use Tachyon as a common storage layer between the Spark workers. With its in-memory distributed execution model, Tachyon provides a scalable approach even when the model is too big to be handled on a single machine. Our solution also exploits graphics processing units (GPUs) for matrix computation whenever they are available on worker nodes, further improving execution time.
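The abstract does not describe the training algorithm itself, so the following Python sketch only illustrates the general idea of workers sharing model state through a common storage layer. Everything in it is an assumption made for illustration: a plain dict stands in for the shared store, the key `model/w` is made up, and synchronous gradient averaging on a one-parameter linear model is a swapped-in technique, not the speakers' method.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def worker_gradient(store: Dict[str, float], partition: List[Sample]) -> float:
    """Read the current weight from the shared store and return the mean
    gradient of the squared error (w*x - y)^2 over this worker's partition."""
    w = store["model/w"]
    return sum(2.0 * (w * x - y) * x for x, y in partition) / len(partition)


def train(partitions: List[List[Sample]], steps: int = 200, lr: float = 0.05) -> float:
    """Synchronous data-parallel gradient descent over the given partitions."""
    store = {"model/w": 0.0}  # hypothetical stand-in for the shared storage layer
    for _ in range(steps):
        # Each "worker" computes a gradient on its own partition of the data.
        grads = [worker_gradient(store, p) for p in partitions]
        # The averaged update is written back so all workers see the new weight.
        store["model/w"] -= lr * sum(grads) / len(grads)
    return store["model/w"]


# Two partitions of data drawn from y = 3x, one per simulated worker.
partitions = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
learned_w = train(partitions)
```

The point of the sketch is the data flow: workers never exchange parameters directly; they read from and write to the shared store, which is the role the abstract assigns to Tachyon.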

The attendees will learn about deep learning models, the architecture of the system, and how to train and run deep learning models on Tachyon and Spark.

Christopher Nguyen is CEO and co-founder of Adatao, the leader in enterprise big apps. Previously, he served as engineering director of Google Apps and co-founded two successful startups. As a professor, he co-founded the Computer Engineering program at HKUST. He earned his B.S. degree from the University of California, Berkeley, summa cum laude, and a Ph.D. from Stanford, where he created the first standard-encoding Vietnamese software suite, authored RFC 1456, and contributed to Unicode 1.1. He is a co-creator of the open-source Distributed DataFrame project.
Vu Pham is a machine learning software engineer at Adatao, with a focus on deep learning. He helps build Adatao’s deep learning solutions. He is an avid contributor to various open-source projects such as cubgs, Deepnet, and deeplearning4j. Prior to Adatao, he worked in academia and industry, and authored and co-authored several scientific papers.
Michael (Bach) Bui is a co-founder and engineering lead of Adatao. Previously, he worked on Hadoop 2.0 at Yahoo!, having completed his PhD in CS at the University of Illinois, Urbana-Champaign, where he focused on real-time distributed systems engineering. Michael was a lead developer of Adatao's PredictiveEngine and contributed to the early development of Apache Spark.

What's new in Tachyon by Calvin Jia and Jiri Simsa

The Tachyon Project has been rapidly evolving in the past few months. We had a major release, version 0.7, in July and are planning to release version 0.8 in October. These releases include numerous new features and greatly improve the ease of deploying and managing Tachyon. In particular, the latest version features remote write (allowing users to write to Tachyon through remote workers), detailed monitoring of the master and workers, and integration with YARN and Mesos. Further, the latest version contains significant user-facing improvements, including one-click cluster deployment, mounting of multiple under storage systems, and transparent naming.

In the presentation, we will explore several potential industry use cases enabled by the new features. One-click cluster deployment enables users to experiment and prototype with Tachyon on AWS, launching not only Tachyon but also the computation framework and storage system of their choice. Mounting of multiple under storage systems and transparent naming enable more exciting use cases for Tachyon users.
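As a rough illustration of what mounting multiple under storage systems into one namespace can look like, here is a small Python sketch of a mount table that resolves a path by longest-prefix match. It is not Tachyon's implementation or API: the `MountTable` class, its methods, and the example URIs (`hdfs://namenode/tachyon`, `s3://bucket/logs`) are hypothetical.

```python
from typing import Dict, Optional


class MountTable:
    """Hypothetical mapping from namespace paths to under storage URIs."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, namespace_path: str, under_storage_uri: str) -> None:
        """Attach an under storage location at a path in the unified namespace."""
        mount_point = namespace_path.rstrip("/") or "/"
        self._mounts[mount_point] = under_storage_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        """Translate a namespace path into the URI of the backing storage."""
        best: Optional[str] = None
        for mount_point in self._mounts:
            prefix = mount_point if mount_point.endswith("/") else mount_point + "/"
            # Match whole path components only, so "/logs" does not cover "/logsmore".
            if path == mount_point or path.startswith(prefix):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        suffix = path[len(best):].lstrip("/")
        return self._mounts[best] + ("/" + suffix if suffix else "")


table = MountTable()
table.mount("/", "hdfs://namenode/tachyon")   # made-up example URI
table.mount("/logs", "s3://bucket/logs")      # made-up example URI
```

With this table, an application can use one namespace path such as `/logs/2015/a.txt` without knowing which storage system actually holds the file.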

Calvin Jia is a software engineer at Tachyon Nexus. He has been working on the Tachyon project since the beginning and is a top contributor to the project. Prior to Tachyon Nexus, he worked in the Data Infrastructure team at BrightRoll (acquired by Yahoo). Calvin earned his B.S. from the University of California, Berkeley in Electrical Engineering and Computer Science.
Jiri Simsa is a software engineer at Tachyon Nexus. He received a Ph.D. in computer science from Carnegie Mellon University for his work on systematic and scalable testing of concurrent systems. Prior to Tachyon Nexus, he was at Google, where he led the design and implementation of build and test infrastructure for a team working on a distributed applications framework.