
Details

Abstract: This is a practical playbook for building a production-grade data lakehouse. It walks through foundational principles (naming conventions, least-privilege access, and automated CI/CD testing) before diving into the medallion architecture. Next, metadata-driven design patterns show how configuration tables and dynamic notebook orchestration eliminate hard-coded pipelines. The deck also covers star schema modeling, guidance on choosing between Spark, Pandas, and SQL, and data quality enforcement using DQX with YAML data contracts. Finally, it addresses security best practices and performance optimizations.

Hosts: Justin Shea, Mehdi Jeddi, Erik Pak, and Sou-Cheng Choi

Talk Format: This is a hybrid event. To attend online, join us on Zoom at 6 pm:
https://iit-edu.zoom.us/j/89379230295?pwd=NdETyE5sdYuSrvsrBZXSBFkUESBVkg.1

Meeting ID: 893 7923 0295
Passcode: 5t5WYn

Sponsors: Adyen, UIC College of Business, and PyData Chicago co-host this event. UIC will provide the meeting site, and Adyen will sponsor pizza and soft drinks for onsite participants.

Address: University of Illinois Chicago, Douglass Hall, Room 220, 705 S Morgan St, Chicago, IL 60607

Logistics: Google Maps recognizes “UIC Douglass Hall” and can guide you through campus. Once you arrive, proceed to Room 220 on the second floor.

Related topics

Events in Chicago, IL
Apache Spark
Big Data
Data Pipelines
Open Source
Software Production Pipeline

Sponsors

Tegus by AlphaSense

Space and Food Sponsor

W W Grainger Inc

Venue and Food Sponsor

Illinois Institute of Technology

Venue and Financial Sponsor

Adyen

Financial Sponsor