Monitoring Data Quality in Data Science Applications

Hosted by Diana Knezevic

Details

Hi guys,

Let's get together again in May! We have scheduled another "sparky" meetup for you. :-)

In this talk, Frank will discuss the topic of data quality problems when developing data science applications.

Looking forward to seeing you soon.

Have a nice day!

Cheers,

Diana

Abstract:
In real-world scenarios, data comes from different sources, may be transformed by complex ETL processes, and is owned by different stakeholders. Before the data can be used for modeling, it has to be cleaned and preprocessed. Data scientists often build these steps on technical and domain-specific assumptions about the data. You will see how explicitly specifying these assumptions, and monitoring them against the actual data from the first delivery onward, enables an efficient transition from a prototype to a product. Frank will use Apache Spark, Drunken Data Quality (DDQ) (https://github.com/FRosner/drunken-data-quality), Apache Zeppelin, and the ELK stack to give a practical example of this approach.
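
To give a rough idea of what "explicitly specifying assumptions" can look like, here is a minimal sketch in plain Python. It does not use DDQ or Spark and is not the code from the talk. The `Check` class, the `run_checks` function, the column names, and the sample delivery are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Check:
    """One explicit assumption about a data delivery (hypothetical helper)."""
    description: str
    predicate: Callable[[List[Row]], bool]


def run_checks(rows: List[Row], checks: List[Check]) -> Dict[str, bool]:
    """Evaluate every check and return a mapping of description -> passed."""
    return {check.description: check.predicate(rows) for check in checks}


# Hypothetical assumptions about an illustrative delivery with "id" and "age".
checks = [
    Check("delivery is not empty", lambda rows: len(rows) > 0),
    Check("id is never null",
          lambda rows: all(r.get("id") is not None for r in rows)),
    Check("id is unique",
          lambda rows: len({r.get("id") for r in rows}) == len(rows)),
    Check("age is between 0 and 150",
          lambda rows: all(r.get("age") is not None and 0 <= r["age"] <= 150
                           for r in rows)),
]

# Made-up sample data; the last row breaks two of the assumptions.
delivery = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 51},
    {"id": 2, "age": -1},
]

results = run_checks(delivery, checks)
for description, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}: {description}")
```

Running the same checks on every delivery and recording the results over time is one simple way to monitor whether the assumptions behind a prototype still hold once it becomes a product.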

Speaker:
Frank Rosner works as a Data Scientist in the Global Data and Analytics Competence Center of Allianz SE. As a data nerd and open source developer, he contributes and commits to open source projects like Apache Spark, Apache Mahout, Spark Notebook and Apache Zeppelin (incubating). His research interests are in the field of probabilistic topic models and the integration of data science and data architecture. If there is a problem but no tool to solve it, Frank does not hesitate to build one.

AI Performance Engineering Meetup (Munich)
comSysto GmbH
Tumblingerstr. 23 · 80337 München