Two Sigma Open Source Meetup


We’re hosting the quarterly Two Sigma Open Source meetup on Monday, 3/25! TSOS meetups focus on the open source projects that Two Sigma cares most about, from projects we built in-house and then open sourced to large external open source projects that we depend on to do our work. This time, Wenbo Zhao (Two Sigma) and Bin Fan (Alluxio) will present how Two Sigma uses Alluxio to make data-intensive compute independent of the storage beneath.

We’re expecting about 45 minutes of talks in total, followed by chatting and pizza!


Doors open at 5:30pm; the presentations will begin around 6.

You'll have to check in with security using your name and ID. Definitely sign up (with your first and last name) if you’re going to attend. Unfortunately, people whose names aren’t entered into the security system in advance won’t be allowed in. If you get wait-listed here on the meetup site, it's still worth showing up: we've rarely, if ever, had to turn someone away.

Achieving compute and storage independence for data-driven workloads (Bin Fan - Alluxio, and Wenbo Zhao - Two Sigma)

The rise of computation-intensive workloads and the adoption of the cloud have driven organizations to adopt a decoupled architecture for modern workloads, one in which compute scales independently from storage. However, while this decoupling enables elastic scaling, it introduces new problems: how do you co-locate data with compute, how do you unify data across multiple remote data centers, and how do you keep storage and I/O egress costs down, among many others?

In this meetup, Wenbo and Bin will present a new approach to making data-intensive compute independent of the storage beneath using Alluxio, an open source distributed file system that sits between the compute and storage layers. Applications like Apache Spark or TensorFlow can then seamlessly access multiple disparate data sources with consistent performance, without knowing where the data is actually persisted.

Wenbo will explain why Two Sigma needed to disaggregate compute and storage and how the team decided to adopt the Spark + Alluxio + HDFS architecture.

Bin will present a deep dive into the Alluxio open source project, including the reference architecture, the data and metadata paths used to serve compute requests from remote under stores, and the compute APIs supported for accessing data in Alluxio.