Making Apache Spark Better With Delta Lake

Data Works MD

JHU APL, Building 200

11101 Johns Hopkins Rd · Laurel, MD

How to find us

Building 200 is on the South campus of JHU APL. There is ample free parking that surrounds the building.

Details

Managing data lakes, which are data repositories that store large and varied sets of raw data in its native format, can be challenging. Join us in February to learn about Delta Lake, an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing.
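
For readers new to the idea: Delta Lake's ACID guarantees rest largely on its transaction log. Each commit is a numbered JSON file under the table's `_delta_log/` directory, and a given table version is whatever you get by replaying those commits' `add`/`remove` file actions in order. The sketch below is a toy model of that idea in plain Python, not the delta-spark API. The class `ToyTransactionLog` and its methods are hypothetical, and exclusive file creation stands in (imperfectly) for the atomic put-if-absent primitive a real implementation relies on.

```python
import json
import os
import tempfile


class ToyTransactionLog:
    """Hypothetical toy model of a Delta-style transaction log (not the delta-spark API).

    Each commit is one JSON-lines file named by its zero-padded version number.
    A table version is obtained by replaying the add/remove actions of every
    commit up to that version, in order.
    """

    def __init__(self, table_dir: str) -> None:
        self.log_dir = os.path.join(table_dir, "_delta_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _path(self, version: int) -> str:
        return os.path.join(self.log_dir, f"{version:020d}.json")

    def latest_version(self) -> int:
        """Highest committed version, or -1 for an empty table."""
        versions = [int(n[:-5]) for n in os.listdir(self.log_dir) if n.endswith(".json")]
        return max(versions, default=-1)

    def commit(self, read_version: int, actions: list) -> int:
        """Write `actions` as version read_version + 1, or fail if that version is taken.

        FileExistsError means another writer committed first; under optimistic
        concurrency the caller re-reads the table, checks for conflicts, and retries.
        """
        new_version = read_version + 1
        # Mode "x" is exclusive create: it raises FileExistsError if the file exists.
        # Simplification: a crash mid-write could leave a partial file here, which a
        # real log must prevent with a truly atomic put-if-absent.
        with open(self._path(new_version), "x") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
        return new_version

    def snapshot(self, version=None) -> set:
        """Return the set of live data-file paths as of `version` (default: latest)."""
        if version is None:
            version = self.latest_version()
        live = set()
        for v in range(version + 1):
            with open(self._path(v)) as f:
                for line in f:
                    action = json.loads(line)
                    if "add" in action:
                        live.add(action["add"]["path"])
                    elif "remove" in action:
                        live.discard(action["remove"]["path"])
        return live


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table:
        log = ToyTransactionLog(table)
        v0 = log.commit(-1, [{"add": {"path": "part-000.parquet"}}])
        log.commit(v0, [{"add": {"path": "part-001.parquet"}}])
        # A second writer that also read version 0 loses the race for version 1.
        try:
            log.commit(v0, [{"add": {"path": "part-999.parquet"}}])
        except FileExistsError:
            print("conflict: re-read and retry")
        print(sorted(log.snapshot()))           # ['part-000.parquet', 'part-001.parquet']
        print(sorted(log.snapshot(version=0)))  # ['part-000.parquet']
```

Because a commit either fully exists as a version file or does not exist at all, readers never observe a half-applied change, and reading with `snapshot(version=0)` mirrors, in miniature, how a versioned log makes reading an earlier table version (time travel) possible.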

Agenda
-------------------------------------------------
6:30 PM -- Networking & Food

7:00 PM -- Greetings

7:05 PM -- Making Apache Spark™ Better With Delta Lake - Tim Lortz

8:30 PM -- Closings & Door Prizes

Location
-------------------------------------------------
JHU APL
Building[masked] Johns Hopkins Rd
Laurel, MD 20723

Food and Drinks
-------------------------------------------------
Complimentary food, such as pizza and chips, and non-alcoholic beverages will be provided.

Talks
-------------------------------------------------
Making Apache Spark™ Better With Delta Lake
Apache Spark™ is the dominant processing framework for big data. Delta Lake adds reliability to Spark so your analytics and machine learning initiatives have ready access to high-quality, reliable data. No more ingesting malformed data, struggling to delete data for compliance, or fighting to modify data for change data capture (CDC).
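
Deleting records for compliance is awkward on a plain data lake because data files in object storage are typically written once and not edited in place. In its copy-on-write form, a Delta Lake `DELETE` rewrites only the files that contain matching rows and records the swap as a single commit of `remove` and `add` actions. The toy function below illustrates that bookkeeping in plain Python; `copy_on_write_delete`, the in-memory file map, and the file-naming callback are hypothetical stand-ins, not part of any Delta Lake API.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Files = Dict[str, List[Row]]  # data-file path -> rows stored in that file


def copy_on_write_delete(
    files: Files,
    predicate: Callable[[Row], bool],
    rewrite_name: Callable[[str], str],
) -> Tuple[Files, List[dict]]:
    """Hypothetical sketch of copy-on-write DELETE bookkeeping.

    Files with no matching rows are left untouched. Each file that contains a
    matching row is logically removed and, if any rows survive, replaced by a
    rewritten file holding only the survivors. The returned actions would be
    committed together as one atomic table version.
    """
    new_files = dict(files)  # shallow copy; the input mapping and row lists are not modified
    actions: List[dict] = []
    for path, rows in files.items():
        kept = [row for row in rows if not predicate(row)]
        if len(kept) == len(rows):
            continue  # nothing to delete in this file, so it is not rewritten
        del new_files[path]
        actions.append({"remove": {"path": path}})
        if kept:
            new_path = rewrite_name(path)
            new_files[new_path] = kept
            actions.append({"add": {"path": new_path}})
    return new_files, actions


if __name__ == "__main__":
    table = {
        "part-000.parquet": [{"user_id": 1}, {"user_id": 42}],
        "part-001.parquet": [{"user_id": 7}],
    }
    # Erase every row belonging to user 42, e.g. for a data-deletion request.
    new_table, actions = copy_on_write_delete(
        table,
        lambda row: row["user_id"] == 42,
        lambda p: p.replace(".parquet", ".v1.parquet"),
    )
    print(sorted(new_table))  # ['part-000.v1.parquet', 'part-001.parquet']
    print(actions)  # [{'remove': {'path': 'part-000.parquet'}}, {'add': {'path': 'part-000.v1.parquet'}}]
```

Only `part-000.parquet` is rewritten; `part-001.parquet` is never touched. In Delta Lake, files marked as removed stay on storage (so older versions remain readable) until a `VACUUM` cleans them up, so a compliance delete typically involves both steps.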

This talk will cover the use of Delta Lake to enhance data reliability for Spark environments. We will also demonstrate an end-to-end big data pipeline capped by a machine learning model in Spark.

More info available at https://docs.databricks.com/delta/index.html#

Speakers
-------------------------------------------------
Tim Lortz is a Solutions Architect at Databricks, where he helps Federal customers harness the power of their data through the combination of cloud computing, managed data lakes and machine learning. He is an advocate for highly scalable, open-source technologies such as Apache Spark, Delta Lake and MLflow. Tim can be found on LinkedIn at https://www.linkedin.com/in/tim-lortz-08925937/

Prior to Databricks, Tim spent over a decade working as a data scientist and leading data science teams in the Ft. Meade area, with a focus on infrastructure modeling & forecasting, signals intelligence and cybersecurity. Tim holds an MS and PhD in Industrial & Operations Engineering from the University of Michigan and a BS in Industrial Engineering from the University of Pittsburgh.

Company
-------------------------------------------------
Databricks
As the leader in Unified Data Analytics, Databricks helps organizations make all their data ready for analytics, empower data science and data-driven decisions across the organization, and rapidly adopt machine learning to outpace the competition. By providing data teams with the ability to process massive amounts of data in the Cloud and power AI with that data, Databricks helps organizations innovate faster and tackle challenges like treating chronic disease through faster drug discovery, improving energy efficiency, and protecting financial markets.

Databricks was founded in 2013 and has thousands of global customers including Comcast, Shell, HP, Expedia, and Regeneron. The company also has hundreds of global partners that include Microsoft, Amazon, Tableau, Informatica, Capgemini and Booz Allen Hamilton. Databricks was founded by the original creators of popular open source projects, including Apache Spark, Delta Lake, MLflow and Koalas.

More info available at https://databricks.com/