Data Workshop #7


Details
We'll get together again to drill through interesting data sets. Work alone, in pairs, or in teams. Drop by anytime between 1 p.m. and 4 p.m.!
The mission: explore the data and do interesting analysis. Learn about the tools and techniques other people are using. This isn't a hackathon. No judging and no prizes. Just fun.
Bring your own laptop unless you just want to be an adviser or spectator. We have no tool or stack preferences. Different teams will use R, Python, MATLAB, SAS, Excel, Ruby, etc.
We'll have pizza in case people want to snack.
Scraping Data Three Ways
This event will focus on web scraping. Some of us who are starting out will use import.io (https://www.import.io/). Others will use Ruby or Python tools like Scrapy (http://scrapy.org/). Those looking for fast asynchronous approaches will use Node.js.
Please scrape data responsibly. Do not disrupt websites' operations or scrape from sites that don't allow it.
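One way to check whether a site allows scraping is to read its robots.txt before fetching anything. Here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt text, the "workshop-bot" user agent, and the example.com URLs are made up for illustration; in practice you would load the site's real robots.txt and also read its terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def crawl_delay(robots_txt: str, user_agent: str):
    """Return the Crawl-delay (in seconds) for user_agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(is_allowed(SAMPLE_ROBOTS, "workshop-bot", "https://example.com/listings"))
    print(is_allowed(SAMPLE_ROBOTS, "workshop-bot", "https://example.com/private/data"))
    print(crawl_delay(SAMPLE_ROBOTS, "workshop-bot"))
```

If a crawl delay is set, sleeping for at least that many seconds between requests (for example with time.sleep) is a simple way to avoid disrupting the site.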
Challenges:
Real Estate - Part 1
- What is the median price per square foot of apartments currently listed for rent in every neighborhood in Manhattan? What are the best deals (apartments with the lowest $/square foot in their respective neighborhoods) in Manhattan right now?
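Once the listings are scraped, the analysis for this challenge could look roughly like the sketch below. The listings, field names, and numbers are invented placeholders, not real data. The sketch assumes each scraped listing ends up with a neighborhood, a monthly rent, and a size in square feet.

```python
from collections import defaultdict
from statistics import median

# Made-up sample listings; real ones would come from your scraper.
SAMPLE_LISTINGS = [
    {"neighborhood": "Chelsea", "rent": 3600, "sqft": 600},
    {"neighborhood": "Chelsea", "rent": 4000, "sqft": 500},
    {"neighborhood": "Chelsea", "rent": 3500, "sqft": 700},
    {"neighborhood": "Harlem", "rent": 2000, "sqft": 800},
    {"neighborhood": "Harlem", "rent": 2400, "sqft": 600},
]


def price_per_sqft(listing):
    """Monthly rent divided by square footage."""
    return listing["rent"] / listing["sqft"]


def median_by_neighborhood(listings):
    """Map each neighborhood to its median $/sq ft, skipping listings with no size."""
    groups = defaultdict(list)
    for listing in listings:
        if listing["sqft"] > 0:
            groups[listing["neighborhood"]].append(price_per_sqft(listing))
    return {name: median(values) for name, values in groups.items()}


def best_deals(listings):
    """Map each neighborhood to its listing with the lowest $/sq ft."""
    best = {}
    for listing in listings:
        if listing["sqft"] <= 0:
            continue
        name = listing["neighborhood"]
        if name not in best or price_per_sqft(listing) < price_per_sqft(best[name]):
            best[name] = listing
    return best


if __name__ == "__main__":
    print(median_by_neighborhood(SAMPLE_LISTINGS))
    print(best_deals(SAMPLE_LISTINGS))
```

Many real listings omit the square footage, so deciding how to handle missing sizes will likely matter more than the arithmetic itself.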
Crowdfunding
- What is the typical path that crowdfunding takes over the course of a listing? Can you detect projects that are ahead or behind the curve?
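One simple way to approach this, among many, is to express each project's pledges as a fraction of its goal at fixed checkpoints in the campaign, take the median across past projects as the "typical path," and compare a live project against it. The sketch below assumes that setup. The checkpoints, the sample paths, and the 0.05 tolerance are all invented for illustration.

```python
from statistics import median

# Made-up historical paths: fraction of goal pledged at 25%, 50%, 75%,
# and 100% of each campaign's duration.
SAMPLE_PATHS = [
    [0.30, 0.45, 0.60, 1.10],
    [0.20, 0.35, 0.50, 0.90],
    [0.40, 0.50, 0.70, 1.20],
]


def typical_path(paths):
    """Median fraction of goal at each checkpoint across all paths."""
    return [median(step) for step in zip(*paths)]


def status(path_so_far, typical, tolerance=0.05):
    """Compare a project's latest checkpoint against the typical path."""
    latest = len(path_so_far) - 1
    gap = path_so_far[-1] - typical[latest]
    if gap > tolerance:
        return "ahead"
    if gap < -tolerance:
        return "behind"
    return "on track"


if __name__ == "__main__":
    typical = typical_path(SAMPLE_PATHS)
    print(typical)
    print(status([0.10, 0.20], typical))
    print(status([0.50, 0.70], typical))
```

A median curve hides a lot of variation, so plotting the individual paths alongside it is a reasonable first check on whether a single "typical" path even exists.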
Real Estate - Part 2
- Scrape monthly data on new housing permits from the Department of Housing and Urban Development here (http://www.huduser.org/portal/datasets/socds.html) and compare it to economic indicators. How does today's activity compare to the days before and after the housing crisis?
Shoes!
- Suppose we wanted to develop a website that re-directs to listings of hard-to-find shoe (or other clothing) sizes. Scrape the data we would need to provide those listings.
