Interactive Session: Finding Bad Data in Big Data
Bring your laptop (or team up with someone who has one); we're trying something new this month. We are going to investigate a public dataset as a group, looking for "bad data". Everyone can share their tips and methods for finding bad data, and we'll talk about how to triage the cleaning of a large dataset in a short, fixed timeframe.
The dataset (Wikipedia-based) will be large enough to demonstrate specific techniques needed to find problems with "big" datasets, but small enough to be downloaded quickly.
Hope to see you Wednesday!