[HYBRID] Data Quality with Azure Databricks & Databricks Data Lakehouse & Delta

Hosted By
dario a.
Details

Hello everyone! I am very happy to announce that Big Data AI Torino is back with an in-person meetup, after two years, on June 14, with a free event focused on Databricks. Emanuele Maffeo, Senior Big Data Engineer at AgileLab, will talk about how to implement data pipelines with Azure Databricks with particular attention to data quality. Afterwards, Mattia Zeni, Solutions Architect Data & AI at Databricks, will give an introduction to the Data Lakehouse Platform and then focus on the technical characteristics of Delta. Finally, we will have refreshments so we can all get to know each other better.

The talks will be in Italian.

For those who cannot attend in person, the event will also be available online. Please register on the announcement for the version of the meetup you want to attend, so that we can organize the event better. ON-LINE

LOCATION
The event will be held at Toolbox Coworking [https://toolboxcoworking.com/](https://toolboxcoworking.com/)
Easily reachable by metro and centrally located
Via Agostino da Montefeltro, 2, 10134 Torino TO

The event will last approximately two hours, plus refreshments:
18:40-19:25 - Data Quality with Azure Databricks (45 mins)
19:25-19:35 - Q&A (10 mins)
19:35-20:20 - Databricks Data Lakehouse & Delta deep dive (45 mins)
20:20-20:30 - Q&A (10 mins)

-------------------------
Data Quality with Azure Databricks

Abstract: With the increasing amount of data and processes that run on data lakes, data quality is more relevant than ever, and the big data community considers it essential. High-quality data is a precondition for analyzing data and guaranteeing its value, so data quality has to be implemented from the beginning of the data lifecycle and then progressively extended to all core processes to get the full benefit from it. Guaranteeing data quality at every step of the data lifecycle is not easy, because it is generally enforced only after the data has been ingested or processed, by which point low-quality data may already have been used and have harmed end users.

Azure Databricks is a cloud service for processing huge amounts of data, in both batch and streaming modes, leveraging Apache Spark. Databricks developed Delta Lake, an open-source data storage layer based on Parquet, with several additional features such as ACID transactions and data time travel.
In this talk we will explore the Azure Databricks architecture and the techniques that allow creating data pipelines with data quality as a first-class citizen.
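
As a rough illustration of what treating data quality as a first-class citizen can mean, the sketch below validates records before they reach downstream consumers and routes failing records to a quarantine list. It is not taken from the talk and does not use Azure Databricks or Delta Lake APIs: it is plain Python, and the names `Rule`, `apply_quality_gate`, `RULES`, and `SAMPLE`, as well as the `id` and `amount` fields, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

# A record is a plain dictionary here; a real pipeline would likely use DataFrame rows.
Record = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    """A named data quality rule (hypothetical helper for this sketch)."""
    name: str
    check: Callable[[Record], bool]


def apply_quality_gate(
    records: Iterable[Record], rules: List[Rule]
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into valid ones and quarantined ones.

    Each quarantined entry keeps the names of the rules it failed,
    so the problem can be traced before the data is used downstream.
    """
    valid: List[Record] = []
    quarantined: List[Tuple[Record, List[str]]] = []
    for record in records:
        failed = [rule.name for rule in rules if not rule.check(record)]
        if failed:
            quarantined.append((record, failed))
        else:
            valid.append(record)
    return valid, quarantined


# Illustrative rules: the field names are assumptions, not from the source.
RULES = [
    Rule("id_present", lambda r: r.get("id") is not None),
    Rule(
        "amount_non_negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]

SAMPLE = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": -2.0},
]

valid, quarantined = apply_quality_gate(SAMPLE, RULES)
print("valid:", valid)
print("quarantined:", quarantined)
```

The design choice shown is to check quality at ingestion time and keep rejected records together with the reason for rejection, rather than discovering bad data after it has been consumed.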

Speaker Emanuele Maffeo https://www.linkedin.com/in/emanuele-maffeo-34815817/
-------------------------

Databricks Data Lakehouse & Delta deep dive

Abstract: Databricks is the Data and AI company. Headquartered in San Francisco, with offices around the world and hundreds of global partners, Databricks is on a mission to simplify and democratize data and AI, helping data teams solve the world’s toughest problems. Today, more than 7,000 organizations worldwide rely on Databricks to enable massive-scale data engineering, collaborative data science, full-lifecycle machine learning and business analytics. As the world’s first and only Lakehouse Platform in the cloud, Databricks combines the best of data warehouses and data lakes to offer an open and unified platform for data and AI. One of the core components of this platform is Delta, an open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs for Scala, Java, Rust, Ruby, and Python. During the meetup we will introduce you to the Lakehouse platform and we will deep dive into the technical characteristics of Delta.

Speaker bio Mattia Zeni (https://www.linkedin.com/in/mzeni/): Mattia Zeni is a Solutions Architect in the Enterprise team of Databricks, helping Italy's biggest companies leverage Data and AI to solve the world's toughest problems. Before this, he was a Data Engineer at the world's leading navigation company, TomTom, working on Big Data collected from 800 million vehicles around the globe. He received a BSc and MSc in Telecommunications Engineering and a PhD in Computer Science from the University of Trento, Italy. He has presented at many conferences and co-authored more than 20 scientific papers and 3 patents on topics such as Distributed Systems, Big Data and Artificial Intelligence.

See you on June 14!

COVID-19 safety measures

Event will be indoors