This meetup (www.meetup.com/Alluxio) features a chance to interact with Alluxio (www.alluxio.io/) users and developers, as well as other open source tech enthusiasts and data wranglers. Thanks to Data Council (https://www.meetup.com/DataCouncil-AI-NYC-Data-Engineering-Science/) and Meetup.com for jointly hosting this event!
6:00pm: Happy Hour and networking
6:30pm: 1st talk - Financial services case study - embracing hybrid cloud for data-intensive analytic workloads
7:00pm: 2nd talk - Alluxio on AWS EMR: Fast storage access and sharing for Spark
7:30pm: Q&A & Mingle
Talk 1: Hedge fund case study - Embracing hybrid cloud for data-intensive analytic workloads
Innovative organizations such as Uber and Twitter have moved to disaggregated stacks: a separate tier for computational frameworks like Spark and Presto, and a separate tier for storage. The need for more compute flexibility is also pushing users toward hybrid clouds.
In this meetup, Dipti and HY will present a new approach to hybrid analytical workloads using Alluxio, an open source data orchestration layer that sits between the compute and storage layers. Applications like Apache Spark or TensorFlow can then seamlessly access multiple disparate data sources with consistent performance, thanks to the data locality and abstraction that the data orchestration tier brings.
We’ll also present a sneak peek of Alluxio 2.0, the next major release of the project!
Haoyuan Li (H.Y.), Alluxio
Haoyuan is the Founder and CTO of Alluxio. He graduated with a Computer Science Ph.D. from the AMPLab at UC Berkeley. At the AMPLab, he co-created and led Alluxio (formerly Tachyon), an open source virtual distributed file system. Before UC Berkeley, he earned an M.S. from Cornell University and a B.S. from Peking University, both in Computer Science.
Dipti Borkar, Alluxio
Dipti Borkar is the VP of Product & Marketing at Alluxio, with over 15 years of experience in data and database technology across relational and non-relational systems. Prior to Alluxio, Dipti was VP of Product Marketing at Kinetica and Couchbase. Dipti holds an M.S. in Computer Science from UC San Diego and an MBA from the Haas School of Business at UC Berkeley.
Talk 2: Alluxio on AWS EMR: Fast storage access and sharing for Spark
As data processing tools and storage solutions evolve, data ends up stored in different places and processed by different compute engines. Alluxio simplifies access to these different storage systems and accelerates data processing. This talk is an extended version of my original blog post (https://www.alluxio.io/blog/alluxio-on-emr-fast-storage-access-and-sharing-for-spark-jobs/). We will discuss the requirements and the use cases for which Alluxio can be a great fit, then talk about running Alluxio on AWS EMR (5.23) and the deployment process. I will run a live demo on how to enable Alluxio with Spark on AWS EMR.
Chengzhi Zhao, Sr. Data Platform Engineer, Meetup.com
Chengzhi Zhao is a Sr. Data Platform Engineer at Meetup. He mainly works on building scalable, reliable, and maintainable data pipeline infrastructure that enables machine learning engineers and data scientists to build data products.