Thomas Levine has downloaded 100,000 datasets from 100 open data portals, and this is what he learned. http://thomaslevine.com/open-data
He will be talking about all aspects of how he did this, and downloading is, of course, a big part of that. Here are two repositories that you could link to if you like. They lack comprehensible documentation, though.
Playing with computers since he was young, Thomas Levine eventually developed back and wrist pain, so he started studying ergonomics and conducting quantitative ergonomics research. Then he realized that he’d accidentally become a data scientist. And his back and wrists now hurt less. He also has a band called CSV Soundsystem that makes music from spreadsheets.
Browse through his site and find interesting questions to ask Tom.
For the first half of the session, he'll talk about what he did and what he learned.
After that, he'll talk in more detail about how to conduct an analysis like this. The specifics will depend on what interests participants, but topics could include:
* Planning complicated data workflows/pipelines
* Storing data
* Tricks for making things run faster
Event Material (updated Friday, 11/29/2013)
The speaker is planning on doing at least one of the documented exercises during the session, but people can feel free to try one beforehand.
He will also talk a bit about brainstorming and the Six Thinking Hats. Then we'll do a couple of exercises.
1. Choose an open data catalog. Diagram how a person could manually download all of the datasets. Then change the labels in the diagram so that it describes a computer program that downloads the datasets.
2. Select a guideline from one of these lists, and brainstorm ways of testing it.
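For anyone trying exercise 1 beforehand, here is one possible way the relabeled diagram might look as a program. It is only a sketch in Python, not the speaker's actual code: the names `list_dataset_ids`, `download_all`, `fetch_page`, and `fetch_dataset` are hypothetical, and the catalog is a small in-memory stand-in rather than a real portal. The manual steps "go to the next page of results" and "click download" become the two functions that get passed in.

```python
from __future__ import annotations

from typing import Callable, Iterator


def list_dataset_ids(fetch_page: Callable[[int], list[str]]) -> Iterator[str]:
    """Walk the catalog's listing pages until one comes back empty."""
    page = 1
    while True:
        ids = fetch_page(page)
        if not ids:
            return
        yield from ids
        page += 1


def download_all(
    fetch_page: Callable[[int], list[str]],
    fetch_dataset: Callable[[str], bytes],
) -> dict[str, bytes]:
    """Download every listed dataset once, keyed by its identifier."""
    results: dict[str, bytes] = {}
    for dataset_id in list_dataset_ids(fetch_page):
        if dataset_id in results:
            continue  # some catalogs list the same dataset more than once
        results[dataset_id] = fetch_dataset(dataset_id)
    return results


# Stand-in catalog (an assumption for illustration; a real portal would
# be reached over HTTP instead of through these two functions).
FAKE_PAGES = {1: ["a", "b"], 2: ["c"]}


def fake_fetch_page(page: int) -> list[str]:
    return FAKE_PAGES.get(page, [])


def fake_fetch_dataset(dataset_id: str) -> bytes:
    return f"contents of {dataset_id}".encode()


datasets = download_all(fake_fetch_page, fake_fetch_dataset)
```

Keeping the two fetch steps as arguments means the same loop can be pointed at a different catalog by swapping in different functions.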