
What we’re about
This is a meetup for Bay Area users of Apache Spark (http://spark.apache.org), a unified analytics engine for large-scale data processing. We rotate hosting among locations in San Francisco, the Peninsula, and the South Bay.
We also discuss projects in the Spark ecosystem, including Spark SQL, MLlib, GraphX, and Structured Streaming. Meetups include introductions to Spark features, tutorials, case studies from users and community contributors, best practices for deployment and tuning, and updates on future development and releases.
Upcoming events (1)
• GPU-accelerated Spark data processing & metadata management for GenAI workloads (Online)
This talk outlines the architecture and functionality of NVIDIA's GPU-accelerated Data Science Platform, designed to streamline data processing and metadata capture for Generative AI (Gen AI) workloads. The platform provides APIs for ingestion, processing, and retrieval, leveraging RAPIDS Accelerator for Apache Spark™ compute to GPU-accelerate the pipelines. We use open source technologies like Apache Spark™, RAPIDS, Delta Lake, and Kubeflow.
Delta Lake, an open source storage layer of Open Lakehouse, establishes a reliable, high-quality medallion architecture, providing the ACID properties necessary for versioning, reproducibility, and concurrent metadata management of the massive datasets feeding Gen AI model training.
Speaker:
Niranjan Nataraja is a Senior Manager - Accelerated Data Processing and ML Platform at NVIDIA. With more than 15 years at NVIDIA, he has worked on numerous projects building big data pipelines for data science tasks and creating mathematical models for data center operations and cloud gaming services. Niranjan has a Master’s degree in Industrial Engineering from Texas A&M University with a primary focus in production economics.
25 attendees
Past events (125)

