Past Meetup

User Stories: Alluxio production use cases with Presto and Hive (NYC)


45 people went


110 5th Ave · New York, NY

How to find us

The event is on the 5th floor. Please check in at the front desk.



The Alluxio meetup is coming to New York City for the first time! Special thanks to Work-Bench for hosting! This event is free, but please RSVP.

This meetup will feature talks by Haoyuan and Bin from Alluxio, Tao and Bing from, and Thai from Bazaarvoice.

6:00-6:30pm - Happy Hour and networking
6:30pm - Intro from Work-Bench
6:40pm - Alluxio overview and new features
7:10pm -’s use case: Using Alluxio as a fault-tolerant pluggable optimization component of's compute frameworks
7:40pm - Bazaarvoice's use case: Hybrid collaborative tiered-storage with Alluxio

Food and drinks will be available starting at 6pm, and presentations will begin at 6:30pm.

Title: Alluxio: An Overview and What's New in 1.8

Alluxio is a memory-speed virtual distributed file system that provides big data analytics stacks with a unified data access layer. As this new layer, Alluxio enables compute frameworks such as Spark, Presto, MapReduce, and Hive to transparently access different persistent storage systems while actively leveraging memory to accelerate data access. As a result, Alluxio helps simplify the development and management of big data and machine learning workloads at lower cost and with better performance. Alluxio originated as “Tachyon”, a research project at UC Berkeley’s AMPLab. The project currently has more than 800 contributors from more than 100 companies and organizations worldwide.
In this talk, Haoyuan and Bin will give an overview of Alluxio’s basic concepts, architecture, and data flow, and of how it interacts with other components of the ecosystem. They will also share production use cases, then cover the new features in the latest 1.8 release and our roadmap for future versions.
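The overview describes Alluxio as a layer that gives compute frameworks one way to reach several storage systems while using memory to speed up access. The Python sketch below is a toy model of that idea only, not Alluxio's implementation or client API: the `UnifiedNamespace` class, its `mount` and `read` methods, and the dict-backed "storage systems" are all hypothetical. It maps path prefixes to backends and keeps data it has read in an in-memory dictionary so repeated reads skip the backend.

```python
from typing import Callable, Dict, Tuple

# Hypothetical toy model for illustration only; this is NOT Alluxio's API.
# It mimics two ideas from the overview: one namespace over several storage
# systems, and a memory tier that serves repeated reads.

Reader = Callable[[str], bytes]


class UnifiedNamespace:
    def __init__(self) -> None:
        self.mounts: Dict[str, Reader] = {}  # mount prefix -> backend reader
        self.cache: Dict[str, bytes] = {}    # in-memory tier, keyed by full path
        self.backend_reads = 0               # reads that had to hit a backend

    def mount(self, prefix: str, reader: Reader) -> None:
        """Attach a storage backend under a path prefix such as '/hdfs'."""
        self.mounts[prefix.rstrip("/")] = reader

    def _resolve(self, path: str) -> Tuple[Reader, str]:
        # Pick the longest mount prefix that covers the path.
        matches = [p for p in self.mounts if path == p or path.startswith(p + "/")]
        if not matches:
            raise FileNotFoundError(path)
        prefix = max(matches, key=len)
        return self.mounts[prefix], path[len(prefix):].lstrip("/")

    def read(self, path: str) -> bytes:
        if path in self.cache:               # memory hit: no backend access
            return self.cache[path]
        reader, relative = self._resolve(path)
        data = reader(relative)              # miss: read from the backend
        self.backend_reads += 1
        self.cache[path] = data              # keep a copy in memory
        return data


# Usage: two dict-backed "storage systems" behind one namespace.
hdfs_store = {"logs/day1": b"hdfs bytes"}
s3_store = {"models/m1": b"s3 bytes"}
ns = UnifiedNamespace()
ns.mount("/hdfs", lambda rel: hdfs_store[rel])
ns.mount("/s3", lambda rel: s3_store[rel])
first = ns.read("/hdfs/logs/day1")
second = ns.read("/hdfs/logs/day1")  # served from the memory tier
model = ns.read("/s3/models/m1")
```

A real system also has to handle eviction, writes, and keeping cached data consistent with the underlying storage; this sketch omits all of these.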

Haoyuan Li is the creator and founder of Alluxio. Before founding the company, Haoyuan was working on his Ph.D. at UC Berkeley’s AMPLab, where he co-created Alluxio. He is also a founding committer of Apache Spark. Previously, he worked at Conviva and Google. Haoyuan holds a Ph.D. from UC Berkeley.
Bin Fan is a founding member of Alluxio Inc. and a PMC member of the Alluxio open source project. Before Alluxio, he worked at Google building next-generation storage infrastructure and won Google's Technical Infrastructure award. Bin earned his Ph.D. in Computer Science from Carnegie Mellon University, where his research focused on the design and implementation of distributed systems and algorithms.

Title: Using Alluxio as a fault-tolerant pluggable optimization component of's compute frameworks

Abstract: is China’s largest online retailer and its biggest overall retailer, as well as the country’s biggest internet company by revenue. Currently,’s BDP platform runs more than 400,000 jobs (15+ PB) daily on a cluster of more than 15,000 nodes with a total capacity of 210 PB. Alluxio has run in’s production environment on 100 nodes for six months. Tao and Bing will explain how uses Alluxio to support ad hoc and real-time stream computing, accessing it through Alluxio-compatible HDFS URLs and deploying it as a pluggable optimization component that achieved an average 10x performance improvement with JDPresto. This work also extended Alluxio and improved synchronization between Alluxio and HDFS to keep the two consistent.
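The abstract does not say how the HDFS-URL compatibility or the fault tolerance is implemented. As a rough illustration under stated assumptions, the Python sketch below rewrites an `hdfs://` URI into an `alluxio://` URI with the same path (assuming HDFS is mounted at the Alluxio root) and falls back to reading HDFS directly when the cache layer fails. The host name `alluxio-master`, the helper names `to_alluxio_uri` and `read_with_fallback`, and the injected reader functions are hypothetical; 19998 is Alluxio's default master RPC port.

```python
from typing import Callable
from urllib.parse import urlparse

# Hypothetical sketch for illustration only; not the speakers' or Alluxio's code.
# Assumption: the HDFS namespace is mounted at the Alluxio root, so a file keeps
# the same path in both systems.
ALLUXIO_MASTER = "alluxio-master:19998"  # assumed host; 19998 is the default master RPC port

Reader = Callable[[str], bytes]


def to_alluxio_uri(hdfs_uri: str) -> str:
    """Rewrite hdfs://namenode:port/path as alluxio://master:port/path."""
    parsed = urlparse(hdfs_uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an HDFS URI: {hdfs_uri}")
    return f"alluxio://{ALLUXIO_MASTER}{parsed.path}"


def read_with_fallback(hdfs_uri: str, read_alluxio: Reader, read_hdfs: Reader) -> bytes:
    """Try the cache layer first; on I/O or connection errors, read HDFS directly."""
    try:
        return read_alluxio(to_alluxio_uri(hdfs_uri))
    except OSError:  # ConnectionError and other I/O failures are OSError subclasses
        return read_hdfs(hdfs_uri)


# Usage: simulate an outage of the cache layer.
def cache_down(uri: str) -> bytes:
    raise ConnectionError("cache layer unavailable")


uri = "hdfs://namenode:8020/warehouse/t1/part-0"
rewritten = to_alluxio_uri(uri)
result = read_with_fallback(uri, cache_down, lambda u: b"rows from HDFS")
cached = read_with_fallback(uri, lambda u: b"rows via " + u.encode(), lambda u: b"unused")
```

Passing the readers in as functions keeps the optimization layer pluggable in this sketch: removing it means calling the HDFS reader directly, and an outage in the cache layer costs performance rather than failing the job.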

Tao Huang is a big data platform development engineer at, where he mainly develops and maintains the company’s big data platform using open source projects such as Hadoop, Spark, Alluxio, and Kubernetes. He focuses on migrating Hadoop to a Kubernetes cluster that runs both long-running services and batch jobs, to improve cluster resource utilization.
Bing Bai is a senior big data platform development engineer at. He focuses on computation and storage frameworks such as Spark, Hive, Presto, Alluxio, and HDFS, and has extensive experience designing and developing architectures that apply these frameworks in production on large-scale clusters.