• Online Office Hour: Running Presto with Alluxio on Amazon EMR

    Starting in 2019, we have launched monthly community office hours online, hosted by PMC maintainers and top contributors to the Alluxio open source project. If you are interested in presenting or hosting a session, please contact [masked]. To join the office hour, RSVP: https://go.alluxio.org/oh-presto-alluxio-emr

    Many organizations leverage EMR to run big data analytics on the public cloud. However, reading and writing data to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud; in this use case it caches data for S3, ensuring high and predictable performance as well as reduced network traffic.

    In this Office Hour I'll go over:
    -How to set up Alluxio with the EMR stack so that Presto jobs can seamlessly read from and write to S3 (see the sketch after this list)
    -A comparison of the performance of Presto on EMR with and without Alluxio
    -Open session for discussion on any topics such as solving the separation of compute and storage problem, and more
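
    The office hour itself walks through the EMR setup; as a rough, hypothetical sketch of the first item (not part of the original announcement), the Python snippet below shells out to the Alluxio CLI on the EMR master to mount an S3 path into the Alluxio namespace. The install path, bucket, credentials, and property names are assumptions and may differ by Alluxio version.

        import subprocess

        ALLUXIO_BIN = "/opt/alluxio/bin/alluxio"  # assumed install location on the EMR master
        S3_URI = "s3a://my-bucket/warehouse"      # placeholder bucket path
        MOUNT_POINT = "/s3"                       # path inside the Alluxio namespace

        # Mount the S3 path into Alluxio so Presto (via Hive tables located under
        # alluxio://<master>:19998/s3/...) reads through the cache instead of S3 directly.
        subprocess.run(
            [ALLUXIO_BIN, "fs", "mount",
             "--option", "aws.accessKeyId=EXAMPLE_KEY_ID",
             "--option", "aws.secretKey=EXAMPLE_SECRET",
             MOUNT_POINT, S3_URI],
            check=True,
        )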

  • Online Office Hour: Enabling Apache Spark for Hybrid Cloud | HDFS & AWS S3 demo

    Starting in 2019, we have launched monthly community office hours online, hosted by PMC maintainers and top contributors to the Alluxio open source project. If you are interested in presenting or hosting a session, please contact [masked]. To join the office hour, RSVP: https://go.alluxio.org/oh-enabling-spark-hybrid-cloud

    Use the following link to join the Google Hangout: meet.google.com/ucy-rmyx-dpy
    Dial-in: (US) [masked] PIN: [masked]#

    For April, the topic is Enabling Apache Spark for Hybrid Cloud | HDFS and AWS S3 demo. Alluxio can help data scientists and data engineers interact with different storage systems in a hybrid cloud environment. Using Alluxio as a data access layer for big data and machine learning applications, data processing pipelines can improve efficiency without explicit data ETL steps and the resulting data duplication across storage systems.

    In this Office Hour you'll learn:
    -How to set up Alluxio so that applications can seamlessly read from and write to different storage systems, including cloud storage like AWS S3 and Azure Blob Store and on-prem storage like HDFS (a sketch follows this list)
    -How to analyze data access patterns and manage the data lifecycle in Alluxio using the Alluxio web UI and shell commands
    -Open session for discussion on any topics such as solving the separation of compute and storage problem, and more
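
    As a hedged illustration of the first item (not part of the original announcement), the PySpark sketch below reads data from an HDFS-backed Alluxio path and writes results to an S3-backed path in the same namespace. Hostnames, mount points, and dataset paths are placeholders, and the Alluxio client jar is assumed to be on the Spark classpath.

        from pyspark.sql import SparkSession

        spark = (SparkSession.builder
                 .appName("alluxio-hybrid-cloud-sketch")
                 .getOrCreate())

        # Read on-prem data through Alluxio (the /hdfs mount is backed by an HDFS cluster)...
        df = spark.read.parquet("alluxio://alluxio-master:19998/hdfs/events/")

        # ...and write derived results to the S3-backed mount through the same namespace.
        (df.groupBy("event_type").count()
           .write.mode("overwrite")
           .parquet("alluxio://alluxio-master:19998/s3/event_counts/"))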

  • Online Office Hour: Running ML Workloads with Tensorflow + Alluxio + AWS S3

    Starting in 2019, we have launched monthly community office hours online, hosted by PMC maintainers and top contributors to the Alluxio open source project. If you are interested in presenting or hosting a session, please contact [masked]. To join the office hour, RSVP and you will receive a confirmation email with the Google Hangout link to join virtually: https://go.alluxio.org/OH-Running-Tensorflow-Alluxio-S3

    For March, the topic is Running Machine Learning Workloads with Tensorflow + Alluxio + AWS S3. The Alluxio POSIX API enables data engineers to access any distributed file system or cloud storage as if accessing a local file system, with an added performance improvement. This reduces the effort and complexity for data engineers to run their machine learning or legacy workloads on new data storage without data migration or data duplication.

    In this Office Hour you'll learn about:
    -How to install and set up the Alluxio POSIX API to enable data access to disparate storage systems, including AWS S3
    -Tensorflow model training using the Alluxio POSIX API to read data from S3 (see the sketch after this list)
    -Open session for discussion on any topics such as solving the separation of compute and storage problem, unifying multiple storage systems, and more
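
    As a hedged sketch of the second item (not part of the original announcement), the snippet below shows how a TensorFlow input pipeline can read training data through the Alluxio POSIX (FUSE) mount as if it were a local directory. The mount point and dataset path are assumptions.

        import tensorflow as tf

        # Assumes alluxio-fuse exposes the Alluxio namespace (with an S3 bucket
        # mounted underneath) at /mnt/alluxio; the path below is a placeholder.
        DATA_DIR = "/mnt/alluxio/s3/training-data"

        # Because the POSIX API looks like a local file system, a standard
        # tf.data pipeline works unchanged.
        files = tf.data.Dataset.list_files(DATA_DIR + "/*.tfrecord")
        dataset = (tf.data.TFRecordDataset(files)
                   .batch(32)
                   .prefetch(tf.data.AUTOTUNE))

        for batch in dataset.take(1):
            print(batch.shape)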

  • Near Real-time Big Data Platform w/ Spark & Alluxio - Vipshop eCommerce Use Case

    Agenda:
    6:30pm - Happy Hour and networking
    7:00pm - Alluxio 2.0 Preview Release Deep Dive - Calvin Jia
    7:30pm - Real-time Data Processing for Sales Attribution Analysis with Alluxio, Spark and Hive at Vipshop - Wanchun Wang, Chief Architect
    7:45pm - Q&A
    Event partner: AICamp

    Talk 1:
    Title: Alluxio 2.0 Preview Release Deep Dive
    Abstract: We are excited to present Alluxio 2.0 to our community. The goal of Alluxio 2.0 is to significantly enhance data accessibility with improved APIs, expand the supported use cases to include active workloads, and improve metadata management and availability to support hyperscale deployments. The Alluxio 2.0 Preview Release is the first major milestone on this path and includes many new features. In this talk, I will give an overview of the motivations and design decisions behind the major changes in the Alluxio 2.0 release. We will touch on the key features:
    - New off-heap metadata storage leveraging embedded RocksDB to scale Alluxio up to handle a billion files
    - Improved Alluxio POSIX API to support legacy and machine-learning workloads
    - A fully contained, distributed embedded journal system based on the Raft consensus algorithm in high availability mode
    - A lightweight distributed compute framework called “Alluxio Job Service” to support Alluxio operations such as active replication, async-persist, cross-mount move/copy, and distributed loading
    - Support for mounting and connecting to any number of HDFS clusters of different versions at the same time
    - Active file system sync between Alluxio and HDFS as under storage
    Bio: Calvin Jia is the top contributor to the Alluxio project. He has been involved as a core maintainer and release manager since the early days when the project was known as Tachyon. Calvin has a B.S. from the University of California, Berkeley.

    Talk 2:
    Title: Real-time Data Processing for Sales Attribution Analysis with Alluxio, Spark and Hive at Vipshop
    Abstract: Vipshop is a leading eCommerce company in China with over 15 million daily active users. Our ETL jobs primarily run against data on HDFS, which can no longer meet the increasing speed and stability demands of certain real-time jobs. In this talk, I will explain how we’ve replaced HDFS with memory plus HDD managed by Alluxio to speed up data access for all our sales attribution applications running on Spark and Hive; this system has been in production for more than two years. As more old-fashioned ETL SQL jobs are converted into real-time jobs, leveraging Alluxio for caching has become one of the most widely considered performance tuning solutions (see the sketch below). I will share our criteria for selecting use cases that can effectively get a boost by switching to Alluxio. Our future work includes using Alluxio as an abstraction layer for the /tmp directory in our main Hadoop clusters, and we are also considering Alluxio to cache the hot data in our 600+ node Presto clusters.
    Bio: Wanchun Wang is the Chief Architect at Vipshop, where he has worked for over 5 years. His interests focus on processing large amounts of data, such as building streaming pipelines, optimizing ETL applications, and designing in-house ML & DL platforms. He is currently managing big data teams that are responsible for batch, real-time, and data warehouse systems.

    Acknowledgment: Our event partner AICamp (http://www.xnextcon.com) is a global online platform for engineers and data scientists to learn and practice AI, ML, DL, and data science, with 80,000+ developers and local study groups in 40+ cities around the world.
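
    As a hedged, hypothetical sketch of the caching pattern described in the second talk (write to Alluxio's memory/HDD tiers first, persist to HDFS asynchronously), the PySpark snippet below sets the Alluxio client write type through Spark's Hadoop configuration. The property name follows the Alluxio client documentation; hostnames, paths, and column names are placeholders.

        from pyspark.sql import SparkSession

        spark = (SparkSession.builder
                 .appName("vipshop-style-async-persist-sketch")
                 # ASYNC_THROUGH writes land in Alluxio storage (memory/HDD tiers)
                 # immediately and are persisted to the under store asynchronously.
                 .config("spark.hadoop.alluxio.user.file.writetype.default", "ASYNC_THROUGH")
                 .getOrCreate())

        orders = spark.read.parquet("alluxio://alluxio-master:19998/warehouse/orders/")
        attribution = orders.groupBy("campaign_id").count()
        (attribution.write.mode("overwrite")
            .parquet("alluxio://alluxio-master:19998/warehouse/attribution/"))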

  • Online Office Hour: Running Spark with Alluxio for Fast Data Analytics

    Starting in 2019, we have launched monthly community office hours online, hosted by PMC maintainers and top contributors to the Alluxio open source project. If you are interested in presenting or hosting a session, please contact [masked]. To join the office hour, RSVP and you will receive a confirmation email with the Google Hangout link to join virtually: https://go.alluxio.org/OH-Running-Apache-Spark-With-Alluxio

    For February, the topic is Running Apache Spark with Alluxio for Fast Data Analytics. We will go over the following topics:
    - Using Alluxio as the input/output for Spark applications
    - Saving and loading Spark RDDs and DataFrames with Alluxio (see the sketch after this list)
    - Open session for discussion on any topics such as solving the separation of compute and storage problem, unifying multiple storage systems, and more

    Please use the following link to join this virtual event:
    Join Hangouts Meet: meet.google.com/ucy-rmyx-dpy
    Join by phone: [masked] PIN: [masked]#
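
    As a hedged illustration of the second topic (not from the original announcement), the PySpark snippet below saves a DataFrame to Alluxio as Parquet and loads it back, so later jobs can reuse it at memory speed instead of recomputing it or re-reading the remote store. Hostnames and paths are placeholders, and the Alluxio client jar is assumed to be on the Spark classpath.

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("alluxio-dataframe-sketch").getOrCreate()

        # Build a small example DataFrame and persist it to Alluxio.
        df = spark.range(0, 1_000_000).withColumnRenamed("id", "value")
        df.write.mode("overwrite").parquet("alluxio://alluxio-master:19998/cache/values.parquet")

        # Any Spark job with access to the Alluxio namespace can now load it back quickly.
        cached = spark.read.parquet("alluxio://alluxio-master:19998/cache/values.parquet")
        print(cached.count())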

  • Alluxio+Presto: An Architecture for Fast SQL in the Cloud

    Presto, an open source distributed SQL engine, is being adopted widely for its high concurrency and native ability to query multiple data sources. But as the adoption of cloud object stores like S3 grows and data-powered application demands increase, engineers are looking for even more acceleration and a high-performance architecture. Kamil from Starburst, and Andrew and Bin of Alluxio, will present how best to leverage Alluxio and Presto for fast SQL in the cloud. You will also learn about real-world use cases at JD.com and NetEase.com.

    Agenda:
    6:00pm: Happy Hour and networking
    6:30pm: Part I - Presto: Fast SQL-on-Anything by Starburst (Kamil Bajda-Pawlikowski)
    6:50pm: Part II - Alluxio Overview (Andrew Audibert)
    7:10pm: Part III - Presto + Alluxio + Object Store: Architecture, Use Case (Bin Fan)
    7:30pm: Q&A

    Part I: Presto: Fast SQL-on-Anything
    Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, Presto has in the last few years experienced unprecedented growth in popularity in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores. We will cover the architecture of Presto, its separation of compute and storage, and its cloud-readiness. In addition, we will discuss some of the best use cases for Presto, recent advancements in the project such as the Cost-Based Optimizer and geospatial functions, as well as the roadmap going forward.

    Part II: Alluxio Overview
    Alluxio is an open-source distributed file system that provides data ecosystems with a unified data access layer at in-memory speed. Alluxio enables compute engines like Spark, Presto, MapReduce, and TensorFlow to transparently access different persistent storage systems (including HDFS and S3) while actively leveraging an in-memory cache to accelerate data access. As a result, Alluxio simplifies the development and management of big data and ML workloads with lower cost and better performance. Alluxio has more than 900 contributors and is used by over 100 companies worldwide. Andrew will give an overview of Alluxio’s core concepts, architecture, data flow, and production use cases.

    Part III: Presto + Alluxio + Object Store: Architecture and Use Case
    Cloud object storage systems have different semantics and performance characteristics compared to HDFS. Applications like Presto cannot benefit from node-level locality or cross-job caching when reading from the cloud. Deploying Alluxio with Presto to access cloud storage solves these problems because data is retrieved and cached in Alluxio instead of being read repeatedly from the underlying object storage (see the sketch below). Bin will present the architecture combining Presto with Alluxio, with use cases from major internet companies like JD.com and NetEase.com and their lessons learned operating this architecture at scale.

    Bios:
    Kamil Bajda-Pawlikowski is a technology leader in the large-scale data warehousing and analytics space. He is CTO of Starburst, the enterprise Presto company. Prior to co-founding Starburst, Kamil was the Chief Architect at the Teradata Center for Hadoop in Boston, focusing on the open source SQL engine Presto. Previously, he was the co-founder and chief software architect of Hadapt, the first SQL-on-Hadoop company, acquired by Teradata in 2014.
    Andrew Audibert is an early member of Alluxio and a top contributor to the Alluxio project. He has been a core maintainer since early 2016. Prior to Alluxio, he worked for Palantir Technologies. Andrew has a B.S. from CMU.
    Bin Fan is a founding member of Alluxio, Inc. and a PMC member of the Alluxio open source project. Prior to Alluxio, he worked for Google. Bin received his Ph.D. in Computer Science from CMU, working on distributed systems.
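
    As a hedged, illustrative sketch of the Part III architecture (not part of the original announcement), the Python snippet below queries Presto over a Hive table whose metastore location is assumed to point at an alluxio:// URI, so Presto workers read through the Alluxio cache rather than directly from the object store. It uses the presto-python-client package; hostnames, catalog, schema, and table names are placeholders.

        import prestodb  # pip install presto-python-client

        conn = prestodb.dbapi.connect(
            host="presto-coordinator", port=8080,
            user="analyst", catalog="hive", schema="default",
        )
        cur = conn.cursor()

        # The 'events' table is assumed to have LOCATION 'alluxio://alluxio-master:19998/s3/events'
        # in the Hive metastore, so this scan is served from the Alluxio cache when warm.
        cur.execute("SELECT event_type, count(*) FROM events GROUP BY event_type")
        for row in cur.fetchall():
            print(row)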

  • User Stories: Alluxio production use cases with Presto and Hive (NYC)

    Alluxio meetup is happening for the first time in New York City! Special thanks to Work-Bench for hosting! This event is free but please RSVP. This meetup will feature talks by Haoyuan and Bin from Alluxio, Tao and Bing from JD.com, and Thai from Bazaarvoice.

    Agenda:
    6:00-6:30pm - Happy Hour and networking
    6:30pm - Intro from Work-Bench
    6:40pm - Alluxio overview and new features
    7:10pm - JD.com’s use case: Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks
    7:40pm - Bazaarvoice's use case: Hybrid collaborative tiered-storage with Alluxio
    Food and drinks will be available starting at 6pm, and presentations will begin at 6:30pm.

    Title: Alluxio: An Overview and What's New in 1.8
    Abstract: Alluxio is a memory-speed virtual distributed file system that provides the big data analytics stack with a unified data access layer. As this new layer, Alluxio enables compute frameworks like Spark, Presto, MapReduce, and Hive to transparently access different persistent storage systems while actively leveraging memory to accelerate data access. As a result, Alluxio helps simplify the development and management of big data and machine learning workloads with lower cost and better performance. Alluxio originated from “Tachyon”, a research project of the AMPLab at UC Berkeley. Currently, the project has more than 800 contributors from more than 100 companies and organizations worldwide. In this talk, Haoyuan and Bin will give an overview of Alluxio's basic concepts, architecture, and data flow, and how it interacts with other components of the ecosystem. They will also share production use cases. Then they will cover the new features in the latest 1.8 release and the roadmap for future versions.
    Bio: Haoyuan Li is the creator and founder of Alluxio. Prior to founding the company, Haoyuan was working on his PhD at UC Berkeley’s AMPLab, where he co-created Alluxio. He is also a founding committer of Apache Spark. Previously, he worked at Conviva and Google. Haoyuan holds a Ph.D. from UC Berkeley.
    Bin Fan is a founding member of Alluxio, Inc. and a PMC member of the Alluxio open source project. Prior to Alluxio, he worked for Google to build the next-generation storage infrastructure and won Google's Technical Infrastructure award. Bin received his Ph.D. in Computer Science from Carnegie Mellon University, working on the design and implementation of distributed systems and algorithms.

    Title: Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks
    Abstract: JD.com is China’s largest online retailer and its biggest overall retailer, as well as the country’s biggest internet company by revenue. Currently, JD.com’s BDP platform runs more than 400,000 jobs (15+ PB) daily, on a system with more than 15,000 cluster nodes and a total capacity of 210 PB. Alluxio has run in JD.com’s production environment on 100 nodes for six months. Tao and Bing will explain how JD.com uses Alluxio to support ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component, achieving a 10x performance improvement on average with JDPresto (see the sketch below). This work has also extended Alluxio and enhanced the syncing between Alluxio and HDFS for consistency.
    Bio: Tao Huang is a big data platform development engineer at JD.com, where he is mainly engaged in the development and maintenance of the company’s big data platform, using open source projects such as Hadoop, Spark, Alluxio, and Kubernetes. He focuses on migrating Hadoop to Kubernetes clusters that run long-running services and batch jobs, to improve cluster resource utilization.
    Bing Bai is a senior big data platform development engineer at JD.com, focused on computation and storage frameworks such as Spark, Hive, Presto, Alluxio, and HDFS. He has extensive experience designing and developing architectures that bring these frameworks into production on large-scale clusters.
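
    As a hedged sketch of the "Alluxio-compatible HDFS URL" idea in the JD.com talk (not part of the original announcement), the PySpark snippet below shows that the only change to an existing job can be the path scheme. Hostnames and paths are placeholders.

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("jd-url-swap-sketch").getOrCreate()

        # The original job read directly from HDFS:
        # df = spark.read.orc("hdfs://namenode:8020/warehouse/clicks/")

        # Swapping the scheme routes the same read through Alluxio, which caches
        # the HDFS-backed data and keeps it in sync with the under store.
        df = spark.read.orc("alluxio://alluxio-master:19998/warehouse/clicks/")
        print(df.count())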

  • Guardant Health: Fast, scalable data processing with Alluxio, Mesos, and Minio

    Alluxio Meetup features a chance to interact with other Alluxio (http://www.alluxio.com/) users and developers, along with two presentations. Registration is required. Adit Madan from Alluxio and Jörg Schad from Mesosphere will co-present the Alluxio and Mesosphere joint solution. Omar from Guardant Health will share his experience leveraging the Alluxio-Mesosphere joint solution to gain faster insights from genomic data.

    Agenda:
    6:30 - 7:00: Happy Hour & Networking
    7:00 - 7:30: First talk + Q&A
    7:30 - 8:00: Second talk + Q&A
    8:00 - 8:30: Open Q&A & Networking
    Food and drinks will be available starting at 6:30pm, presentations will begin at 7:00pm. Special thanks to Mesosphere for hosting this meetup!

    From SMACK to SMAACK: Running Alluxio on DC/OS
    Abstract: Speed is usually a key factor when analyzing large amounts of data. Alluxio enables analytics applications, such as Apache Spark, to retrieve stored data at memory speeds. DC/OS makes it easy to deploy distributed programs (such as Alluxio and Spark) and containers across large clusters. In this talk, we will first discuss the development of the DC/OS Alluxio package, which deploys Alluxio on top of DC/OS, and then demo the deployment of a complete analytics stack, both with and without Alluxio, in order to see the benefits Alluxio provides.
    Speaker Bios: Jörg is a software engineer at Mesosphere in San Francisco. In his previous life he implemented distributed and in-memory databases and conducted research in the Hadoop and cloud areas. His speaking experience includes various meetups, international conferences, and lecture halls. Adit Madan is a software engineer at Alluxio. His experience is in distributed systems, storage systems, and large-scale data analytics. He has an M.S. from Carnegie Mellon University and a B.S. from IIT.

    Scalable Genomics Data Processing Pipeline with Alluxio, Mesos, and Minio
    Abstract: Guardant Health leverages Alluxio, Mesos, and Minio to create an end-to-end processing solution that is performant, scalable, and cost-optimal. We use Alluxio as the unified storage layer to connect disparate storage systems and bring memory performance, with Minio mounted as the under store to Alluxio to keep cold (infrequently accessed) data and to sync data to AWS S3 (see the sketch below). Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines, enabling fault-tolerant and elastic distributed systems to be built easily and run effectively. In this talk I will share our experience using Alluxio, Mesos, and Minio to tame genomic data at Guardant Health.
    Speaker Bio: Omar Sobh is a DevOps Engineer at Guardant Health leading the charge for storage and compute initiatives for the Guardant Health Genomic Processing Pipelines.
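
    As a hedged sketch of mounting a Minio bucket as an Alluxio under store (not from the original announcement), the snippet below shells out to the Alluxio CLI. The property names follow the Alluxio documentation for S3-compatible stores and may differ by version; the install path, endpoint, credentials, bucket, and mount point are placeholders.

        import subprocess

        subprocess.run(
            ["/opt/alluxio/bin/alluxio", "fs", "mount",
             "--option", "alluxio.underfs.s3.endpoint=http://minio.internal:9000",
             "--option", "alluxio.underfs.s3.disable.dns.buckets=true",
             "--option", "aws.accessKeyId=EXAMPLE_KEY_ID",
             "--option", "aws.secretKey=EXAMPLE_SECRET",
             "/cold", "s3://genomics-archive/"],
            check=True,
        )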

  • Crash-Proofing Smartphones with Alluxio

    The Alluxio (formerly Tachyon) Meetup features a chance to interact with other Alluxio (http://www.alluxio.com/) users and developers, along with two presentations. Registration is required. Bin Fan from Alluxio and Chen Tian from Huawei will co-present the need and architecture for a memory-speed distributed storage solution, using Alluxio and Huawei’s Fusion Storage as an example. Lin Ma from Huawei will share their experience leveraging Alluxio to gain faster insights from mobile data.

    Agenda:
    6:30 - 7:00: Food & Networking
    7:00 - 8:00: Presentations
    8:00 - 8:15: Q&A
    8:15 - 8:30: Wind down
    Food will be available starting at 6:30pm, presentations will begin at 7:00pm. Special thanks to Huawei for sponsoring and hosting this meetup!

    Using Alluxio for a Memory-Speed, Scalable Object Storage Solution for Enterprise Big Data Analytics
    Abstract: Enterprises typically store large amounts of data in existing storage systems, which are often separate from big data analytics systems. Therefore, importing petabytes of data into a big data analytics system takes a long time, with large overheads and high costs. Even worse, transferring large amounts of data results in data silos and unnecessary duplication, which creates serious data management problems. Alluxio solves this problem by transparently connecting applications to existing storage, and by accelerating access by keeping the data in memory. We present what an architecture looks like with Alluxio and existing storage solutions. In addition, Alluxio and Huawei work together to provide a memory-speed distributed storage solution with dynamic data tiering, achieving transparent classification of hot and cold data and bringing more efficient data analysis while reducing total storage cost by more than 30% (see the sketch below).
    Bio: Bin Fan is a software engineer at Alluxio, Inc. He is one of the top contributors and a PMC maintainer of the open source Alluxio project. Prior to Alluxio, he worked for Google to build the next-generation storage infrastructure and won Google's Technical Infrastructure award. Bin got his Ph.D. in Computer Science from Carnegie Mellon University in 2013.
    Chen Tian is a director in the Software Lab at Huawei's U.S. R&D Center. He is leading a team responsible for delivering cutting-edge solutions that improve software performance in various domains ranging from Android phones, 4G/5G wireless networks, SDN, NFV, and storage servers to cloud computing and data centers. Mr. Tian is also a researcher with a proven track record of publishing papers at top-tier computer science conferences. His research interests span the areas of distributed and parallel systems, operating systems, compilers, programming languages, software engineering, and computer architecture.

    Crash-Proofing Smartphones with Alluxio
    Abstract: Huawei is one of the worldwide market leaders in mobile smartphones. In order to improve overall quality assurance for handsets, Huawei uses its own internal, distributed program analysis framework to process all the data from mobile devices. These processes are critical to identifying root causes for crashes, failures, and performance issues. Improved analysis of the mobile data can prevent future crashes and increase performance for mobile devices. High-throughput and low-latency program analysis on the data is important for faster discovery of issues, more complete analysis, and better efficiency of resources. Alluxio improves the throughput and latency by storing data in memory, thus resulting in faster insights from the data. We describe how we detect and prevent mobile phone issues, discuss how we use Alluxio in our architecture, and present results from initial experiments.
    Bio: Lin Ma is a Senior Staff Research Scientist at the Huawei America Research Center, affiliated with the Parallel and Distributed Computing Lab. He received his PhD from Washington University in St. Louis. His research interests include parallel architectures and algorithms, performance evaluation and tuning models for multi-threading systems, accelerating application-specific architectures, and high-performance distributed computing over architecturally diverse systems of CPUs and GPUs.
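
    As a hedged, hypothetical sketch of managing hot and cold data in Alluxio (the talks above describe Huawei's own tiering solution; the commands here are just the standard Alluxio shell invoked from Python, and the install location and paths are placeholders):

        import subprocess

        ALLUXIO = "/opt/alluxio/bin/alluxio"  # assumed install location

        # Keep frequently analyzed data resident in Alluxio's fast tier...
        subprocess.run([ALLUXIO, "fs", "pin", "/logs/hot/2019-06"], check=True)

        # ...and evict a cold dataset from Alluxio storage (it remains in the under store).
        subprocess.run([ALLUXIO, "fs", "free", "/logs/2018-01"], check=True)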

  • Alluxio: Unifying APIs, Accelerating ML, & Enabling Cloud Architectures

    The Alluxio (formerly Tachyon) Meetup features a chance to interact with other Alluxio (http://www.alluxio.com/) users and developers, along with three presentations. Registration is required. Eric Anderson from Google will highlight the importance of Alluxio and Apache Beam in the big data ecosystem. Gianmario Spacagna from Pirelli Tyre will share how Alluxio accelerated machine learning pipelines. Calvin Jia from Alluxio (http://www.alluxio.com/) will discuss how Alluxio is a perfect fit for big data processing in the cloud and accessing object storage.

    Agenda:
    6:30 - 7:00: Food & Networking
    7:00 - 8:00: Presentations
    8:00 - 8:15: Q&A
    8:15 - 8:30: Wind down
    Food will be available starting at 6:30pm, presentations will begin at 7:00pm.

    Rise of Intermediate APIs (Beam and Alluxio)
    Abstract: The big data stack started with just two APIs, one for processing (Hadoop/MapReduce) and one for storage (HDFS/GFS), and now there are many more. It seems every year there is a new project to get excited about, but it also means new APIs, re-writing pipelines, and complex storage logic if you want to take advantage of it. The cost of this is driving users toward intermediate APIs that separate the interface from the execution/implementation. In this talk we discuss what these abstractions should look like, how they will impact the industry, and two projects that embody them: Beam and Alluxio. Beam, a job description layer, sits atop popular execution frameworks, including Spark and Flink. Writing a pipeline in Beam means it's portable across these execution frameworks. It unifies batch and streaming, on-premises and cloud, and big and small data processing. Alluxio, a distributed memory-centric virtual file system, accepts data from popular execution frameworks as well as popular storage and file systems. It offers a universal namespace and tiered logic at memory speeds. Using intermediate APIs means developers can learn just one framework and still access features offered by different technologies. It means writing job logic only once and being able to test it easily on a new underlying service with no effort. Not only is modularity a win for users, but it means creators of execution frameworks and storage systems can focus on performance and capability without having to worry about API maintenance.
    Bio: Eric is a Product Manager at Google on Cloud Dataflow. He works closely with Beam committers and is a minor contributor. Previously he was at Amazon Web Services on EC2. He is also on the project management committee for Alluxio and a minor contributor. He studied engineering at the University of Utah and business at Harvard.

    In-Memory Logical Data Warehouse for Accelerating Machine Learning Pipelines on top of Spark and Alluxio
    Abstract: Legacy enterprise architectures still rely on relational data warehouses and require moving and syncing data with the so-called "Data Lake", where raw data is stored and periodically ingested into a distributed file system such as HDFS. Moreover, there are a number of use cases where you might want to avoid storing data on the development cluster disks, such as for regulatory reasons or to reduce latency, in which case Alluxio (previously known as Tachyon) can make this data available in-memory and shared among multiple applications. We propose an Agile workflow combining Spark, Scala, DataFrame (and the recent Dataset API), JDBC, Parquet, Kryo, and Alluxio to create a scalable, in-memory, reactive stack to explore data directly from source and develop high-quality machine learning pipelines that can then be deployed straight into production. In this talk we will:
    * Present how to load raw data from an RDBMS and use Spark to make it available as a Dataset
    * Explain the iterative exploratory process and the advantages of adopting functional programming
    * Offer a critical analysis of the issues faced with the existing methodology
    * Show how to deploy Alluxio and how it greatly improved the existing workflow by providing the desired in-memory solution and by decreasing the loading time from hours to seconds (see the sketch after this description)
    * Discuss some future improvements to the overall architecture
    Bio: Gianmario is a Senior Data Scientist at Pirelli Tyre, processing telemetry data for smart manufacturing and connected vehicle applications. His main expertise is in building production-oriented machine learning systems. He is a co-author of the Professional Manifesto for Data Science (datasciencemanifesto.com), founder of the Data Science Milan meetup group, and a former speaker at Spark Summit Europe 2015. He loves evangelising his passion for best practices and effective methodologies amongst the community. Prior to Pirelli, he worked in financial services (Barclays), cyber security (Cisco), and predictive marketing (AgilOne).

    Alluxio: The Missing Piece of On-Demand Clusters
    Abstract: On-demand compute clusters are often used to save the cost of running and maintaining a continuous cluster for the sake of ad-hoc analysis. Such clusters also provide significant cost savings in storage, since data can be stored in a much cheaper medium, such as object storage. However, one critical downside which prevents on-demand compute clusters from becoming the norm for sporadic data analytics is the lack of high performance. Without co-locating compute and storage, queries and analysis may take unacceptably long periods of time, greatly reducing the value of gathering such insights. To address this limitation, Alluxio is used as a lightweight data access layer on the compute nodes to bring performance up to memory speeds without requiring a long-running cluster. This talk will summarize why Alluxio’s architecture makes it a perfect fit for completing the on-demand cluster puzzle.
    Bio: Calvin Jia is the top contributor to the Alluxio project and one of its earliest contributors. He started on the project as an undergraduate working in UC Berkeley’s AMPLab. He is currently a software engineer at Alluxio. Calvin has a B.S. from the University of California, Berkeley.
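
    As a hedged illustration of the Pirelli workflow above (the original stack used Scala; this sketch is PySpark), the snippet below pulls a table from an RDBMS over JDBC and persists it to Alluxio as Parquet so that subsequent exploration reloads it in seconds. The JDBC URL, credentials, table, and paths are placeholders, and the appropriate JDBC driver jar is assumed to be on the Spark classpath.

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("logical-dwh-sketch").getOrCreate()

        # Pull raw data from the relational warehouse once...
        orders = (spark.read.format("jdbc")
                  .option("url", "jdbc:postgresql://warehouse-db:5432/sales")
                  .option("dbtable", "public.orders")
                  .option("user", "readonly")
                  .option("password", "EXAMPLE")
                  .load())

        # ...then keep it in Alluxio as Parquet so iterative exploration and model
        # training reload it at memory speed instead of hitting the RDBMS again.
        orders.write.mode("overwrite").parquet("alluxio://alluxio-master:19998/dwh/orders/")

        fast = spark.read.parquet("alluxio://alluxio-master:19998/dwh/orders/")
        fast.createOrReplaceTempView("orders")
        print(spark.sql("SELECT count(*) FROM orders").collect())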
