Apache Spark™ and Lance Spark Connector

Hosted By
Carly A. and Jules S. D.

Details

📅 Date: September 25, 2025
Time: 9:30 AM - 10:30 AM PT
📍 Location: Online

RSVP HERE 👉 https://lu.ma/76o36xuk 👈

Agenda:

  • Welcome and Introductions
  • Talk 1: Scalable Multimodal AI Data Processing on Apache Spark™ with Lance Spark Connector, Jack Ye, LanceDB
  • Q&A

Abstract:
In this talk, we’ll introduce the Lance Spark Connector and show how it brings Lance’s high-performance, AI-native multimodal storage to Apache Spark™ for large-scale data processing. You’ll learn how Spark can leverage Lance’s unique capabilities—random access, built-in indexing, and native support for vector and blob data types—to work seamlessly with embeddings, images, videos, documents, and more.

We’ll explore how the connector integrates with any Spark-compatible catalog, from Hive Metastore to Unity Catalog, enabling unified governance and discovery. Through real-world examples with Spark, we’ll demonstrate running ingestion, analytics, feature engineering, and retrieval-augmented generation workflows directly on the same multimodal Lance dataset, without costly format conversions, which makes this combination a strong fit for a modern multimodal lakehouse.

Bio:
Jack Ye is a software engineer at LanceDB. He is a PMC member of Apache Iceberg and a contributor to various open source projects, including Apache Spark and Trino. Prior to joining LanceDB, Jack was a tech lead at AWS for initiatives including SageMaker Lakehouse, S3 Tables, and EMR and Athena integration with open table formats.

Bay Area Spark Meetup
FREE