Apache Spark™ and Lance Spark Connector

Name: Apache Spark™ and Lance Spark Connector
Start: 2025-09-25T09:30:00-07:00
End: 2025-09-25T10:30:00-07:00

Hosted by Carly A. and Jules S. D.

Bay Area Spark Meetup

Details

📅 Date: September 25, 2025
⏰ Time: 9:30 AM - 10:30 AM PDT
📍 Location: online

RSVP HERE 👉 https://lu.ma/76o36xuk 👈

Agenda:

Welcome and Introductions
Talk 1: Scalable Multimodal AI Data Processing on Apache Spark™ with Lance Spark Connector, Jack Ye, LanceDB
Q&A

Abstract:
In this talk, we’ll introduce the Lance Spark Connector and show how it brings Lance’s high-performance, AI-native multimodal storage to Apache Spark™ for large-scale data processing. You’ll learn how Spark can leverage Lance’s unique capabilities—random access, built-in indexing, and native support for vector and blob data types—to work seamlessly with embeddings, images, videos, documents, and more.

We’ll explore how the connector integrates with Spark-compatible catalogs, from Hive Metastore to Unity Catalog, enabling unified governance and discovery. Through real-world examples with Spark, we’ll demonstrate running ingestion, analytics, feature engineering, and retrieval-augmented generation workflows directly on the same multimodal Lance dataset, without costly format conversions, which makes Lance with Spark well suited to a modern multimodal lakehouse.

Bio:
Jack Ye is a software engineer at LanceDB. He is a PMC member of Apache Iceberg and a contributor to various open source projects, including Apache Spark and Trino. Prior to joining LanceDB, Jack was a tech lead at AWS for initiatives including SageMaker Lakehouse, S3 Tables, and EMR and Athena integration with open table formats.
