
Intro to Alluxio and Spark

Hosted By
Alex Z.

Details

6:00 PM - 6:30 PM: Drinks, mingling

6:30 PM - 8:30 PM: Intro to Alluxio and Spark

Brief Description:

Alluxio (formerly Tachyon) is a memory-speed virtual distributed storage system that uses memory to manage data across different storage systems. Many deployments pair Alluxio with Spark because Alluxio can further accelerate Spark applications. We discuss how Alluxio helps Spark be more effective and describe different types of production deployments, including Mesos, cloud, and on-premises environments, where Alluxio and Spark work together.

We will demo how Alluxio and Spark improve performance on the Azure Cloud using the HDInsight Spark cluster.

Abstract:

Alluxio, formerly Tachyon, is a memory-speed virtual distributed storage system that uses memory to manage data and accelerate access to it across different storage systems. Alluxio has a rapidly growing open source community of developers and users. Many deployments use Alluxio with Spark, and some of them scale out to petabytes of data.
While Spark is gaining wide adoption in the big data ecosystem, Alluxio enables Spark to be even more effective. Alluxio provides a unified namespace over data from different storage systems, which is convenient for application developers. Alluxio also keeps hot data in memory so that applications can access it quickly. Although Spark has its own in-memory cache, Alluxio's in-memory storage can further improve Spark applications.
In this talk, we introduce Alluxio, discuss how Alluxio helps Spark be more effective, show benchmark results with Spark RDDs and DataFrames, and describe production deployments, including Mesos, cloud, and on-premises environments, where Alluxio and Spark work together. The demo will also include accessing storage platforms such as S3 and Hadoop (HDFS), with Spark jobs executed in Apache Zeppelin, powered by Alluxio.

Speakers:

Ancil McBarnett is the Sales Engineer Lead in the East for Alluxio, which produces a memory-speed virtual distributed file storage system, uniquely positioned to drive next-generation analytics. Prior to Alluxio, he was at Hortonworks as a Security and Hive SME, helping customers kickstart their Hadoop journey. Before that, he was the Architect Manager for a state agency responsible for sharing secure and sensitive data among first responder and justice systems, where security was a priority. Since joining Alluxio, he has worked mainly with financial service providers looking to use Alluxio as a platform to bridge compute frameworks, especially in containerized environments such as Mesos, with multiple storage systems such as S3, HDFS, and Ceph.

Alex Zeltov is a Global Blackbelt TSP in Big Data and Advanced Analytics at Microsoft with over 17 years of industry experience in Information Technology, most recently in Big Data and Predictive Analytics. Prior to joining Microsoft, Alex worked as a Big Data Solutions Architect at Hortonworks.

Philadelphia BigData and Advanced Analytics Meetup
1601 Market Street
19th Floor - WeWork · Philadelphia, PA