Online Office Hour: Enabling Apache Spark for Hybrid Cloud | HDFS & AWS S3 demo

This is a past event

10 people went



Starting in 2019, we have launched monthly online community office hours, hosted by PMC maintainers and top contributors to the Alluxio open source project. If you are interested in presenting or hosting a session, please contact [masked].

To join the office hour, RSVP:

Use the following link to join the Google Hangout:

Dial-in: (US) [masked]
PIN: [masked]#

For April, the topic is Enabling Apache Spark for Hybrid Cloud | HDFS and AWS S3 demo:
Alluxio can help data scientists and data engineers work with different storage systems in a hybrid cloud environment. With Alluxio as the data access layer for big data and machine learning applications, data processing pipelines can run more efficiently, without explicit ETL steps or the data duplication across storage systems that those steps cause.

In this Office Hour you'll learn:
- How to set up Alluxio so that applications can seamlessly read from and write to different storage systems, including cloud storage such as AWS S3 and Azure Blob Storage and on-prem storage such as HDFS
- How to analyze data access patterns and manage the data lifecycle in Alluxio using the Alluxio web UI and shell commands
- An open session for discussion of any topic, such as solving the separation of compute and storage, and more
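The idea behind reading from and writing to several storage systems through one access layer is a unified namespace: each storage system is mounted at a path, and a request for a path is routed to the mount point that covers it. The sketch below is a simplified, hypothetical Python model of that routing using longest-prefix matching. It is not Alluxio's implementation or API; the class name `MountTable`, the mount points, and the URIs (`hdfs://namenode:8020/alluxio`, `s3://my-bucket/data`) are illustrative assumptions. In Alluxio itself, mounts are managed with its shell commands, and applications such as Spark address data with `alluxio://` URIs.

```python
# Hypothetical, simplified model of a unified namespace; NOT Alluxio's actual API.
from typing import Dict, Optional


class MountTable:
    """Maps paths in one logical namespace to URIs in underlying storage systems."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, logical_path: str, ufs_uri: str) -> None:
        # Normalize trailing slashes so "/s3/" and "/s3" name the same mount point.
        point = logical_path.rstrip("/") or "/"
        self._mounts[point] = ufs_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        # Longest-prefix match: the deepest mount point that contains `path` wins.
        best: Optional[str] = None
        for point in self._mounts:
            covers = point == "/" or path == point or path.startswith(point + "/")
            if covers and (best is None or len(point) > len(best)):
                best = point
        if best is None:
            raise KeyError(f"no mount point covers {path!r}")
        # Keep the part of the path below the mount point and append it to the URI.
        suffix = path if best == "/" else path[len(best):]
        return self._mounts[best] + suffix


# Illustrative mounts: an on-prem HDFS root plus an S3 bucket (names are made up).
table = MountTable()
table.mount("/", "hdfs://namenode:8020/alluxio")
table.mount("/s3", "s3://my-bucket/data")

print(table.resolve("/s3/events/2019/04.parquet"))  # s3://my-bucket/data/events/2019/04.parquet
print(table.resolve("/tmp/output"))                  # hdfs://namenode:8020/alluxio/tmp/output
```

Because the application only ever sees logical paths, moving a dataset from HDFS to S3 in this model changes a mount entry rather than every job that reads it.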