Fishing for AI-Powered Insights: Lakehouse Technologies


Details
In the past decade, companies have focused on machine-learning-powered insights to drive their businesses forward. Recently, attention has turned to agentic AI with LLMs trained on proprietary company data. To train these models, companies have adopted data lake technologies to address the exponential growth of data and the need for more flexible, scalable data management solutions. Many modern ML algorithms and architectures (neural networks, transformers, backpropagation, LSTMs) have been around for years or even decades, but require massive amounts of data to train. The volume, variety, and velocity of data required for modern ML have outpaced traditional data storage and processing systems. Data lakes offer a compelling solution by providing a centralized repository capable of storing vast amounts of raw, unstructured, and semi-structured data in native formats, which makes them well-suited for machine learning and artificial intelligence tasks.
In this talk, we will discuss data lake technologies. We will cover the history of relational databases, data warehouses, ML algorithms, and data lakes. We will also dive into the technical details of table formats, such as the ACID guarantees of Delta Lake and Apache Iceberg, and of the underlying file formats, such as Apache Parquet, and show how they come together to create the lakehouse for ML and AI.
This talk is intended for a general audience. Food and drinks will be included.
Thank you to Databricks for sponsoring this talk.
www.databricks.com
Thank you to GuidePoint Security for hosting us at their lovely office space.
https://www.guidepointsecurity.com/
Thank you to KongHQ for sponsoring the drinks!
https://konghq.com/
We will be raffling off copies of O'Reilly's Spark books.
