Best Practices for Building a Reliable Lakehouse
Details
Abstract: This is a practical playbook for building a production-grade data lakehouse. It walks through foundational principles (naming conventions, least-privilege access, and automated CI/CD testing) before diving into medallion architecture. Metadata-driven design patterns then show how configuration tables and dynamic notebook orchestration eliminate hard-coded pipelines. The deck also covers star schema modeling, guidance on choosing between Spark, Pandas, and SQL, and data quality enforcement using DQX with YAML data contracts. Finally, we turn to security best practices and performance optimizations.
Hosts: Justin Shea, Mehdi Jeddi, Erik Pak, and Sou-Cheng Choi
Talk Format: This is a hybrid event. To attend online, join us on Zoom here at 6pm:
https://iit-edu.zoom.us/j/89379230295?pwd=NdETyE5sdYuSrvsrBZXSBFkUESBVkg.1
Meeting ID: 893 7923 0295
Passcode: 5t5WYn
Sponsor: Adyen, UIC College of Business, and PyData Chicago co-host this event. UIC will provide the meeting site. Adyen will sponsor pizza and soft drinks for the onsite participants.
Address: University of Illinois - Chicago, Douglass Hall, Room 220, 705 S Morgan St, Chicago, IL 60607
Logistics: “UIC Douglass Hall” is recognized on Google Maps, which can guide you through campus. Once you arrive, proceed to Room 220 on the second floor.




