SAM BAIL: Data quality tools in Python
ABOUT TALK
Take a tour of the wonderful world of data quality in Python with Dr. Sam Bail. First, we’ll explore the landscape of open source libraries related to data quality, making brief stops at pydqc, datagristle, bulwark, dvc, and dedupe. In the second half of the presentation, we will take a closer look at Great Expectations, one of the most popular open source Python packages for data validation and documentation. Dr. Bail will demo how to create and run test suites and how to use the profiling feature to automatically create data tests with Great Expectations.
ABOUT SPEAKER
Sam Bail is a data professional with a passion for turning high-quality data into valuable insights. Sam holds a PhD in Computer Science focusing on Knowledge Representation, Automated Reasoning, and the Semantic Web. She has worked for several data-centric startups in recent years, gaining deep experience with real-world healthcare data and data quality infrastructure.
