Co-hosted by the NYC Open Data meetup and the NYC Data Science Academy meetup (https://www.meetup.com/NYC-Data-Science-Academy/).

*** Apply for the 12-week full-time Data Science bootcamp program or the online part-time program to become a Data Scientist (http://nycdatascience.com/data-science-bootcamp/). The next offering runs from Jan 9th to Mar 31st, 2017. ***

==================================================

Data Science Hack-Night: Bike Share Stations

Come and do the challenge hands-on together!

Speaker: Erica Dohring, Data Analyst at Facebook, Lover of Exercise and Bad Puns

Imagine you are a statistical consultant hired by Transport for London. For those not familiar, this is the governmental organization that runs London’s public transit, including the Tube (their subway), the buses, and the bike share stations. They have a variety of projects and questions related to bike share stations that they think are worth digging into, so pick one that interests you too.

 Visualization – Take a look at what’s there. Do you see anything weird? Anything you want to dig into?

Summarize your findings to the leadership.

 Metrics – Which metrics do you think are most important for them to optimize: ridership, profit, something else? Explain why.

Exercise 1: Come up with a list of metrics (ridership, profit, the number of boroughs, square miles, etc.) and plot their distributions. Check these distributions over a few time periods, perhaps 1 day and 7 days. How do they differ? At the end, come up with some insights you think are important to highlight, and prototype a dashboard you might build for the TFL to track those key metrics.
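
As a starting point for comparing a metric across time windows, here is a minimal sketch in Python. The trip log, the station names "A" and "B", and the helper names are all made up for illustration; real data would come from the sources listed at the end of this page.

```python
from collections import defaultdict
from statistics import mean, median, pstdev

# Hypothetical trip log: one (day_index, station_id) pair per ride.
# These values are invented for illustration, not real TFL data.
trips = [
    (0, "A"), (0, "A"), (0, "B"),
    (1, "A"), (1, "B"), (1, "B"), (1, "B"),
    (2, "A"),
    (3, "A"), (3, "B"),
    (4, "B"), (4, "B"),
    (5, "A"), (5, "A"), (5, "A"), (5, "B"),
    (6, "A"), (6, "B"),
]

def rides_per_station(trips, first_day, last_day):
    """Count rides per station for days in [first_day, last_day]."""
    counts = defaultdict(int)
    for day, station in trips:
        if first_day <= day <= last_day:
            counts[station] += 1
    return dict(counts)

def summarize(values):
    """Basic summary statistics for one distribution."""
    values = list(values)
    return {"mean": mean(values), "median": median(values), "stdev": pstdev(values)}

one_day = rides_per_station(trips, 0, 0)     # 1-day window
seven_days = rides_per_station(trips, 0, 6)  # 7-day window

print("1 day :", one_day, summarize(one_day.values()))
print("7 days:", seven_days, summarize(seven_days.values()))
```

In this toy data the two stations look different on day 0 but even out over the week, which is the kind of contrast between windows the exercise asks you to look for.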

 Modeling for Interpretability – Once you have decided which metrics are important (say, ridership, if you didn’t do Exercise 1), create a model to understand what the largest drivers of that metric are. Use the publicly available data on London’s Datastore. Hint: most data is aggregated by borough. At the end, come up with a presentation you would show to the leadership about what seems to drive high ridership (or whatever metric you think is key) and what they could do to drive it up.
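
Before fitting a full model, one simple first pass is to rank borough-level features by how strongly they correlate with the metric. The sketch below does that with a hand-written Pearson correlation. The feature names and every number are hypothetical placeholders, and correlation only hints at drivers; it does not establish them.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical borough-level data (one entry per borough), invented for illustration.
ridership = [100, 200, 300, 400]
features = {
    "station_count": [10, 20, 30, 40],
    "median_age": [40, 35, 38, 30],
}

# Rank features by absolute correlation with ridership, strongest first.
ranked = sorted(
    ((name, pearson(values, ridership)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

A ranking like this is easy to explain to leadership, which is the point of the interpretability track; a regression on the same features would be a natural next step.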

 Modeling for Accuracy – It’s my understanding that trucks are the biggest issue (and cost) for a bikeshare program. What if you could predict when a given bikeshare station is going to be empty? Full? For this problem, build a prediction engine that takes in real-time data and tells the TFL when the bikeshare stations will be empty or full.
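
A very simple baseline for this problem is linear extrapolation: take recent snapshots of a station's bike count, estimate the rate of change, and project when it hits zero or capacity. The function name, the snapshot format, and the numbers below are assumptions for illustration; a real solution would read live station data and almost certainly need a better model.

```python
def minutes_until_boundary(snapshots, capacity):
    """Estimate when a station becomes empty or full.

    snapshots: time-ordered list of (minute, bikes_available) readings.
    capacity:  total number of docks at the station.
    Returns ("empty" | "full", minutes_from_last_reading), or None if the
    count is not changing (or there is no time span to measure).
    """
    (t0, b0), (t1, b1) = snapshots[0], snapshots[-1]
    if t1 == t0:
        return None
    rate = (b1 - b0) / (t1 - t0)  # bikes per minute; negative means draining
    if rate < 0:
        return ("empty", b1 / -rate)
    if rate > 0:
        return ("full", (capacity - b1) / rate)
    return None

# Hypothetical readings: a station losing bikes, and one gaining bikes.
draining = [(0, 12), (10, 10), (20, 8)]
filling = [(0, 15), (10, 17)]

print(minutes_until_boundary(draining, capacity=20))
print(minutes_until_boundary(filling, capacity=20))
```

This uses only the first and last reading, so it ignores time-of-day patterns. It is mainly useful as a baseline to beat.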

 Recommendation – Recommend where the TFL should consider opening their next bikeshare station and why. If you want to build an automated solution, build a recommender engine based on real-time data.
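
One naive way to get started is a coverage-gap heuristic: among candidate sites, pick the one farthest from its nearest existing station. This is not a recommender engine, just a sketch of a first pass. The coordinates, site names, and station names below are hypothetical, and a real recommendation would also weigh demand, not just distance.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

# Hypothetical existing stations and candidate sites (lat, lon).
stations = {"s1": (51.50, -0.12), "s2": (51.51, -0.10)}
candidates = {"near": (51.505, -0.11), "far": (51.56, -0.20)}

def nearest_station_km(point):
    """Distance from a point to the closest existing station."""
    return min(haversine_km(point, s) for s in stations.values())

# Recommend the candidate with the largest gap to existing coverage.
best = max(candidates, key=lambda name: nearest_station_km(candidates[name]))
print(best, round(nearest_station_km(candidates[best]), 2), "km from nearest station")
```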

 Systems Analysis: Long-Range Planning – The TFL is the governing body for several means of transit in the city of London. Analyze transportation planning across the different transit options and tell them how to allocate their budget for next year. Things to consider: exercise and healthcare, smog, carbon, happiness, etc.

 Monetization Structure – Currently the bike share uses the following pricing model. Analyze this model and propose a potential iteration of it so the bike share could maximize profit (while also keeping the other metrics they care about in check).

Data to Use: Anything on the London TFL Open Data Store, or the Bike Share API itself.
