
What we’re about
This is a meetup for Bay Area users of Apache Spark (http://spark.apache.org), a unified analytics engine for large-scale data processing. We rotate hosting meetups among locations in San Francisco, the Peninsula, and the South Bay.
We also discuss Spark-related ecosystem projects, including Spark SQL, MLlib, GraphX, and Structured Streaming. Additionally, we include introductions to the various Spark features, tutorials, case studies from users and community contributors, best practices for deployment and tuning, and updates on future development and releases.
Upcoming events (2)

Apache Spark™ Happy Hour
Databricks Mountain View, 351 E Evelyn Ave, Mountain View, CA, US
In the Bay Area? Join us for the Apache Spark Happy Hour on November 13 from 5:00 to 6:30 PM at the Databricks Mountain View Office, immediately following the Open Lakehouse + AI Mini Summit! 🎊 This event brings together contributors, committers, and maintainers from the Apache Spark community for an evening of networking, conversation, and fun. 🙌
Enjoy free swag, light bites, and drinks while connecting with new and familiar faces. Whether attending the Open Lakehouse + AI Mini Summit or looking to engage with the Spark community, this informal happy hour is a great way to keep the energy going, share ideas, and celebrate what makes the Apache Spark community special. We hope to see you there! 👋
36 attendees
GPU-accelerated Spark data processing & metadata management for GenAI workloads
Online
This talk outlines the architecture and functionality of NVIDIA's GPU-accelerated Data Science Platform, designed to streamline data processing and metadata capture for Generative AI (Gen AI) workloads. The platform provides APIs for ingestion, processing, and retrieval, and uses the RAPIDS Accelerator for Apache Spark™ to GPU-accelerate the pipelines. We use open source technologies like Apache Spark™, RAPIDS, Delta Lake, and Kubeflow.
Delta Lake, an open source storage layer of the Open Lakehouse, establishes a reliable, high-quality medallion architecture, providing the ACID properties necessary for versioning, reproducibility, and concurrent metadata management of the massive datasets that feed Gen AI model training.
Speaker:
Niranjan Nataraja is a Senior Manager - Accelerated Data Processing and ML Platform at NVIDIA. With more than 15 years at NVIDIA, he has worked on numerous projects building big data pipelines for data science tasks and creating mathematical models for data center operations and cloud gaming services. Niranjan has a Master’s degree in Industrial Engineering from Texas A&M University with a primary focus in production economics.
23 attendees
Past events (124)

