To start the new year, we are inviting Todd Schneider, who recently published a fascinating, in-depth analysis of NYC taxi and Uber trips.
About the talk:
Personal computers are really powerful: more than powerful enough to analyze the full dataset of every NYC taxi trip from 2009 to 2015—over 1.1 billion trips! I'll go over my recent work analyzing the full taxi dataset (http://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/) using PostgreSQL, PostGIS, and R, and discuss some underlying tips and techniques that I've found helpful for analyzing medium-sized datasets.
It's not always clear what constitutes "small", "medium", or "big" data, but for the purposes of this talk, I'll consider medium data to mean "a few hundred gigabytes"—small enough to fit on your local hard disk, but big enough that you have to plan ahead and act deliberately if you want to extract meaningful insights within a reasonable amount of time.
Todd Schneider (http://toddwschneider.com) writes software at Genius (http://genius.com). Before that, he spent six years building statistical models to value mortgage-backed securities at Ellington Management Group (http://www.ellington.com), and before that he studied applied math and electrical engineering at Yale. His interests beyond data/math/computers include golf, prediction markets, and action movies. He writes at http://toddwschneider.com .
Pizza starts at 6:30 and the talk at 7, then we head to the bar afterward.