Delta Lake w/ Michael Armbrust + Spark @ Salesforce


238 people went

929 108th Ave NE Suite 1800 · Bellevue, WA

How to find us

On-site parking is available for a fee. You can also use street parking, park nearby at The Shops at Bravern (free with shop validation), or park near the Bellevue Library.



Join us for a fun data engineering event focusing on the open-source project Delta Lake and on Salesforce's Spark-powered search ranking.

The Delta Lake session will feature Michael Armbrust, a committer and PMC member of Apache Spark and the original creator of Spark SQL, who leads the team at Databricks that designed and built Structured Streaming and Delta Lake.

Salesforce's search relevance team will present on optimizing search ranking at scale with Spark. Using Spark and Kubernetes to analyze billions of records, Salesforce delivers accurate search results to millions of customers worldwide every day.

6:00pm-6:30pm: Welcome
6:30pm-7:00pm: Spark @ Salesforce: Our Search for Insights
7:00pm-7:45pm: Delta Lake: Open Source Reliability for Data Lakes w/ Apache Spark
7:45pm-8:00pm: Q&A and wrap-up

Session: Delta Lake: Open Source Reliability for Data Lakes w/ Apache Spark

Speaker: Michael Armbrust

Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
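Delta Lake's ACID guarantees rest on an ordered transaction log stored next to the data in a `_delta_log` directory, where each commit is a numbered JSON file listing actions such as adding or removing data files. The following is a minimal, hypothetical Python sketch of that idea, not Delta Lake's code: the helpers `commit` and `snapshot` are invented for illustration, and the sketch assumes the file system offers an atomic put-if-absent operation (emulated here with `os.link`, which fails if the target file already exists). A writer that loses the race for a version number gets `False` and would have to re-read the log and retry.

```python
import json
import os
import tempfile

# Hypothetical toy model of a Delta-style transaction log; NOT Delta Lake's code.
# It only mimics numbered JSON commit files holding "add"/"remove" actions.


def _commit_path(log_dir: str, version: int) -> str:
    # Commit files are named by a zero-padded, 20-digit version number.
    return os.path.join(log_dir, f"{version:020d}.json")


def commit(log_dir: str, version: int, actions: list[dict]) -> bool:
    """Atomically publish commit `version`; return False if it already exists."""
    os.makedirs(log_dir, exist_ok=True)
    # Write the full commit to a private temp file first, so readers never
    # observe a half-written commit.
    fd, tmp = tempfile.mkstemp(dir=log_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    try:
        # os.link raises FileExistsError if the target exists: put-if-absent.
        os.link(tmp, _commit_path(log_dir, version))
        return True
    except FileExistsError:
        return False  # another writer already committed this version
    finally:
        os.remove(tmp)


def snapshot(log_dir: str) -> tuple[int, set[str]]:
    """Replay commits in order; return (latest version, live data files)."""
    files: set[str] = set()
    version = 0
    while os.path.exists(_commit_path(log_dir, version)):
        with open(_commit_path(log_dir, version)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
        version += 1
    return version - 1, files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table:
        log = os.path.join(table, "_delta_log")
        commit(log, 0, [{"add": {"path": "part-0.parquet"}}])
        commit(log, 1, [{"add": {"path": "part-1.parquet"}}])
        # A second writer that also saw version 0 loses the race for version 1.
        print(commit(log, 1, [{"add": {"path": "part-x.parquet"}}]))  # False
        commit(log, 2, [{"remove": {"path": "part-0.parquet"}},
                        {"add": {"path": "part-2.parquet"}}])
        print(snapshot(log))
```

Because a reader reconstructs table state only from fully published commit files, it sees either all of a commit's actions or none of them, which is the intuition behind the atomicity claim above.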

In this talk, we will cover:
- The technical details of Delta Lake's features
- What's coming next
- How to get started using it
- How to contribute

Session: Spark @ Salesforce: Our Search for Insights

Speakers: Rama Raman, Callie Anderson

Spark is an integral tool in developing Salesforce Search, our most used functionality across Salesforce’s platform and products and arguably the largest enterprise search implementation worldwide. Salesforce Search fields millions of queries for thousands of organizations over billions of records every day. Running Spark on top of an in-house Kubernetes cluster gives us the computing efficiency and flexibility to train models at scale and optimize how records are ranked for every query.