Join the 54th NLP Zurich tech shindig online! Jiannan Wang (Associate Professor, Simon Fraser University) will talk about:

  • What makes data preparation hard?
  • Why has this problem not been solved?
  • How to solve it in the next 5-10 years?

He will answer all these questions and present DataPrep, an open-source Python library aiming to accelerate every data preparation step for AI development. We are looking forward to welcoming you!

Agenda:
17:55 Join the webinar
18:00 Jiannan Wang (Associate Professor, Simon Fraser University): DataPrep: Accelerate Data Preparation for AI
18:35 Q&A
18:50 Virtual Hugs and Kisses ⊂(◉‿◉)つ

Talk Summary:
Data scientists have been complaining about data preparation (data collection --> data understanding --> data cleaning --> data enrichment --> data integration --> feature engineering) for many years. Although some effort has been devoted to solving this problem, a survey released by Anaconda in 2020 shows that the complaint still holds: "Data preparation and cleansing takes valuable time away from real data science work and has a negative impact on overall job satisfaction." Most recently, Andrew Ng urged the AI community to shift from Model-Centric toward Data-Centric AI development.

In this talk, I will explain what makes data preparation hard to solve, and present DataPrep (http://dataprep.ai), a fast and easy-to-use Python library that addresses these challenges. DataPrep aims to become the "scikit-learn" for data preparation. The library currently contains three components: a data connector component to simplify and accelerate data collection, an exploratory data analysis (EDA) component to enable fast data understanding, and a data cleaning component to clean and standardize data. I will describe their novel design and demonstrate how they can save data scientists a significant amount of time. Finally, I will talk about future project directions.

About the speaker:
Jiannan Wang (https://www.cs.sfu.ca/~jnwang) is an Associate Professor in the School of Computing Science at Simon Fraser University. He has over ten years' research experience in data preparation. His research contributions won him a CS-Can|Info-Can Outstanding Early Career Researcher Award (2020), an IEEE TCDE Rising Star Award (2018), an ACM SIGMOD Best Demonstration Award (2016), a Distinguished Dissertation Award from the China Computer Federation (2013), and a Google Ph.D. Fellowship (2011). He is a General Co-chair for VLDB 2023 and a core PC member for SIGMOD 2019.

NLP Zurich:
Meetup.com: https://www.meetup.com/NLP-Zurich/
LinkedIn: https://www.linkedin.com/company/nlp-zurich
YouTube: https://www.youtube.com/channel/UCLLX-5j9UNYassOwS0nveDQ/featured
Twitter: https://twitter.com/nlp_zurich
