
Building Data Orchestration and Notebooks as Functions

Hosted By
Denny L. and Jasmine W.

Details

Happy 2023! We're kicking off the year with our first meetup back in downtown Seattle, at the Common Room offices in Pioneer Square. Come for great technical content, discussions, and food! Please RSVP so we can make sure we have enough for everyone.

Agenda

  • 6pm: Doors open, food, and networking
  • 6:30pm-7:10pm: Building Data Orchestration for Big Data Analytics in the Cloud by Jasmine Wang and Shouwei Chen from Alluxio
  • 7:15pm-7:55pm: Notebooks as Functions by Koushik Krishnan, Site Reliability Engineer at Yugabyte
  • 8:15pm: Wrap up

Session 1: Building Data Orchestration for Big Data Analytics in the Cloud

Abstract:
The cloud has dramatically changed the landscape of data engineering, as well as how data engineers work. Specifically, data storage is migrating from the colocated model (e.g., HDFS) to a more cost-effective and scalable, but often fully disaggregated and remote, data lake model (e.g., AWS S3). This has created a strong need for data orchestration in the cloud, analogous to what Kubernetes does for container-based workloads, so that data can be presented in the right layout at the right location for the applications consuming it.

Originally developed at UC Berkeley's AMPLab as the research project "Tachyon", Alluxio (www.alluxio.io) implements the world’s first open-source data orchestration system in the cloud. Alluxio creates a unified access layer for data-driven applications in big data and ML, enabling Spark, Presto, TensorFlow, and other engines to transparently access different external storage systems while actively leveraging an in-memory cache to accelerate data access.
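To make the "unified access layer plus in-memory cache" idea concrete, here is a minimal Python sketch of the general pattern: several backing stores mounted under path prefixes in one namespace, with a small LRU cache in front so repeated reads skip the slow remote store. It does not use Alluxio's API; the class `UnifiedCachingReader`, its `mount`/`read` methods, and the dictionaries standing in for S3 and HDFS are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Dict


class UnifiedCachingReader:
    """Hypothetical sketch (not Alluxio's API): one path namespace over several
    storage systems, with an in-memory LRU cache in front of slow remote reads."""

    def __init__(self, capacity: int = 128) -> None:
        self.capacity = capacity  # max number of cached objects
        self.mounts: Dict[str, Callable[[str], bytes]] = {}
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.remote_reads = 0  # counts reads that reached a backing store

    def mount(self, prefix: str, reader: Callable[[str], bytes]) -> None:
        """Expose a backing store under a path prefix such as '/s3/'."""
        self.mounts[prefix] = reader

    def read(self, path: str) -> bytes:
        # Cache hit: serve from memory and mark as most recently used.
        if path in self.cache:
            self.cache.move_to_end(path)
            return self.cache[path]
        # Cache miss: route to the mount with the longest matching prefix.
        matches = [p for p in self.mounts if path.startswith(p)]
        if not matches:
            raise FileNotFoundError(path)
        prefix = max(matches, key=len)
        data = self.mounts[prefix](path[len(prefix):])
        self.remote_reads += 1
        # Populate the cache, evicting the least recently used entry if full.
        self.cache[path] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)
        return data


# Usage with in-memory dicts standing in for remote stores (hypothetical data).
fake_s3 = {"sales/2023.csv": b"id,amount\n1,10\n"}
fake_hdfs = {"logs/app.log": b"started\n"}
fs = UnifiedCachingReader(capacity=2)
fs.mount("/s3/", lambda key: fake_s3[key])
fs.mount("/hdfs/", lambda key: fake_hdfs[key])
fs.read("/s3/sales/2023.csv")  # miss: fetched from the "S3" dict
fs.read("/s3/sales/2023.csv")  # hit: served from the in-memory cache
fs.read("/hdfs/logs/app.log")  # miss: fetched from the "HDFS" dict
print(fs.remote_reads)  # 2
```

Longest-prefix matching lets a more specific mount override a broader one, and LRU eviction is just one plausible policy. The sketch deliberately ignores concerns such as keeping the cache consistent with the underlying store and distributing the cache across machines.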

In this talk, the speakers will present:
- New trends and challenges in the data ecosystem in the cloud era
- Effective data engineering in the cloud with data orchestration
- Production use cases of popular stacks such as Presto/Alluxio/S3

Speakers
Jasmine Wang is the Head of Community and DevRel at Alluxio. A former national debate champion turned traveling yoga teacher, she has a strong passion for building teams and serving as a bridge at early-stage startups in Silicon Valley. Previously, she worked as Head of Global Talent Acquisition and Operations. She is currently building the Alluxio open-source community and is responsible for community, developer relations, developer experience, and cross-community collaborations at Alluxio.

Dr. Shouwei Chen is a core maintainer and product manager of open-source Alluxio. Before joining Alluxio, Shouwei received a Ph.D. from Rutgers University. Shouwei’s research focuses on the co-design of memory-centric computing frameworks with in-memory distributed file systems in large-scale environments.

Session 2: Notebooks as Functions

Jupyter notebooks are a wonderful environment for writing code, for beginners and experienced developers alike. The hard part comes when you want to take your notebook to production. That's where Jupyrest comes to the rescue: it turns Jupyter notebooks into HTTP functions, making it a serverless platform for Jupyter notebooks. I created Jupyrest at Microsoft and open-sourced it earlier this year. In this talk I'll demonstrate how to use Jupyrest to productionize your Jupyter notebooks.
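As a rough illustration of what "a notebook as an HTTP function" can mean, the Python sketch below executes a notebook's code cells top to bottom with the request body injected as `inputs`, and returns whatever the notebook assigns to `output` as JSON. This is not Jupyrest's API or implementation; the `inputs`/`output` convention, `run_notebook`, `NotebookHandler`, and `serve` are hypothetical, and only the standard library is used. The notebook dict mimics the nbformat 4 layout, where each cell has a `cell_type` and a `source` that is either a string or a list of strings.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict

# Hypothetical notebook in the nbformat-4 JSON shape: a list of cells,
# each with a "cell_type" and a "source" (string or list of strings).
NOTEBOOK: Dict[str, Any] = {
    "cells": [
        {"cell_type": "markdown", "source": "# Add two numbers"},
        {"cell_type": "code", "source": ["a = inputs['a']\n", "b = inputs['b']\n"]},
        {"cell_type": "code", "source": "output = {'sum': a + b}"},
    ]
}


def run_notebook(notebook: Dict[str, Any], inputs: Dict[str, Any]) -> Any:
    """Execute code cells in order in one shared namespace; return `output`.
    Hypothetical convention: the caller's data arrives as `inputs`."""
    namespace: Dict[str, Any] = {"inputs": inputs}
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # skip markdown and raw cells
        source = cell["source"]
        if isinstance(source, list):
            source = "".join(source)
        exec(source, namespace)  # runs arbitrary code: trusted notebooks only
    return namespace.get("output")


class NotebookHandler(BaseHTTPRequestHandler):
    """Treat each POST body as JSON inputs and reply with the notebook's output."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        inputs = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(run_notebook(NOTEBOOK, inputs)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, answering POST requests on the given local port."""
    HTTPServer(("127.0.0.1", port), NotebookHandler).serve_forever()
```

Because `exec` runs arbitrary code, a sketch like this is only appropriate for trusted notebooks; a production service would also want isolation, input validation, and error handling. After calling `serve()`, running `curl -X POST -d '{"a": 2, "b": 3}' http://127.0.0.1:8000/` should return `{"sum": 5}`.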

Speaker: Koushik Krishnan is a Site Reliability Engineer at Yugabyte. In his words: "I am a site reliability engineer/software engineer and I love Python! My passion is making on-call boring. I build tools and services that drastically reduce the number of pages. I have lived in Seattle for about four years, and in my free time I like to play disc golf, play piano, and watch football."

Seattle Spark+AI Meetup
Common Room Headquarters
83 S King St, Floor 8 · Seattle, WA