Join us for a fun data engineering event focusing on the open-source project Delta Lake and Salesforce's search ranking powered by Apache Spark.
The Delta Lake session will feature Michael Armbrust, a committer and PMC member of Apache Spark and the original creator of Spark SQL. He leads the team at Databricks that designed and built Structured Streaming and Delta Lake.
Salesforce's search relevance team will present on search ranking optimization at scale with Spark. Using Spark and Kubernetes to analyze billions of records, Salesforce delivers accurate search results to millions of customers worldwide every day.
6:30pm-7:00pm: Spark @ Salesforce: Our Search for Insights
7:00pm-7:45pm: Delta Lake: Open Source Reliability for Data Lakes w/ Apache Spark
7:45pm-8:00pm: Q&A and Wrap-up
Session: Delta Lake: Open Source Reliability for Data Lakes w/ Apache Spark
Speaker: Michael Armbrust
Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
In this talk, we will cover:
- The technical aspects of Delta Lake's features
- What’s coming
- How to get started using it
- How to contribute
Session: Spark @ Salesforce: Our Search for Insights
Speakers: Rama Raman, Callie Anderson
Spark is an integral tool in developing Salesforce Search, the most used functionality across Salesforce's platform and products and arguably the largest enterprise search implementation worldwide. Salesforce Search fields millions of queries for thousands of organizations over billions of records every day. The computing efficiency and flexibility of Spark running on top of an in-house Kubernetes cluster enable model training at scale to optimize how records are ranked for every query.