What we're about

This is a meetup for Seattle / Eastside users of Spark (www.spark-project.org), the high-speed Scala-based cluster programming framework. We'll be rotating among locations in Seattle and Bellevue.

We'll also discuss other Spark and AI projects including spark-packages (http://spark-packages.org/), MLflow, TensorFlow, Keras, etc.

We will include introductions to the various Spark+AI features, case studies from current users, best practices for deployment and tuning, and future development plans.

Follow us on Twitter at @SparkAISeattle

Upcoming events (2)

Party time!!! PyData Seattle meetup - Re-Launch event!

Needs a location

An important call out to this next PyData Seattle event - RSVP here!

💚💙 🎉 Come and celebrate with us! The Re-LAUNCH PARTY of our PyData Seattle meetup by NumFOCUS!
🌷 🌸 🌼 PyData Seattle meetup is an accessible, community-driven meetup, with novice to advanced level presentations in Data Science/ML/AI/DL
💞 💟 Event program:
6:00 - 6:45 - Food, beverages, and networking. Sponsor: Databricks 🎉
6:45 - 7:00 - Announcements 💙 💚
7:00 - 8:00 - Lightning Talks
8:00 - 8:15 - Raffle for 2 tickets! 💚💙 🎉 to PyData Seattle 2023 💖, a 3-day conference, April 26 - 28, hosted by Microsoft. 1 ticket for a woman and 1 ticket for a man.

Bellevue Place parking after 8:00 p.m. is complimentary – No validation is necessary.

PyData Seattle is looking for speakers. Many of our members are doing amazing data science with Python tools. We want to hear what you are up to! If you have a presentation of between 10 minutes and 1 hour that you would like to share with our group, please submit a short proposal. You can propose a talk, workshop or lightning talk for our monthly meetups and TalkNights hosted in Seattle and Bellevue.

Distributed Data Lakehouse: Are you building one?

Needs a location

A fast-growing data industry has led to fragmented solutions and unprecedented complexity of data platforms. We’ve seen data silos across data centers, regions, and clouds. There’s a strong demand for a simplified solution that can provide unification of data lakes, efficient data access, and management. Alluxio is a large distributed system that is a new layer between compute engines and storage systems. It provides complete virtualization across all data sources to serve data to applications that do not need to care about the location of data.

In this talk, we discuss an approach to architecting an efficient data platform for multiple data pipelines with Spark, Delta Lake, and Alluxio. The platform is portable across environments, whether private or public clouds, for optimal cost and performance.

About The Speakers

Jasmine Wang is the Head of Community and DevRel at Alluxio. She is a former national debate champion who turned into a traveling yoga teacher with a strong passion for building teams and being the bridge at early startups in Silicon Valley. Previously, she worked as the Head of Global Talent Acquisition and Operations. Currently, she is building the Alluxio open source community, responsible for community marketing, developer relations, developer experience, and cross-community collaborations at Alluxio.

David Zhu is a software engineering manager at Alluxio, where he mainly focuses on metadata syncing, job service, and end-to-end performance benchmarking and optimizations. Prior to that, David completed his Ph.D. at UC Berkeley’s AMPLab, with a focus on distributed data management systems and operating systems for the data center. David also holds a Bachelor of Software Engineering from the University of Waterloo.

Past events (95)

Scaling XGBoost for thousands of features with Databricks
