Past Meetup

*Special Event* – DataDive on Time-Series Forecasting


30 people went


Details

We are super happy to announce our first special event. A DataDive is essentially a hackathon, and this one will challenge your skills in building time-series forecasting models.

Join us for an intense weekend of hacking, machine learning and jointly deepening our knowledge of how to build robust forecasting models.

*Please read the rest below before RSVP’ing*

What skill-level do you need to enjoy the event?
Anyone is welcome. However, you should be able to prepare features and build, fit and tune a model within hours (not days). If you are a beginner, you probably want to use the time before the event to get acquainted with concepts like training set, test set, hold-out set, overfitting, cross-validation, feature preparation, regression algorithms, model evaluation, etc. (and how to implement them in code).
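For beginners: with time series, training, validation and hold-out data should be separated by date, because a random shuffle lets the model peek at the future. Below is a minimal sketch in plain Python, assuming rows are (date, visitors) pairs. The function names `chronological_split` and `expanding_window_folds` and the toy numbers are illustrative and not part of the event material.

```python
from datetime import date, timedelta

def chronological_split(rows, cutoff):
    """Split (day, value) pairs into rows before the cutoff and rows on/after it."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test

def expanding_window_folds(n_samples, n_folds, min_train):
    """Yield (train_indices, validation_indices); training always precedes validation."""
    fold_size = (n_samples - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        yield range(0, train_end), range(train_end, train_end + fold_size)

# Toy data: 10 consecutive days with made-up visitor counts.
start = date(2013, 1, 1)
rows = [(start + timedelta(days=i), 100 + i) for i in range(10)]

train, test = chronological_split(rows, cutoff=date(2013, 1, 8))
folds = list(expanding_window_folds(n_samples=len(train), n_folds=2, min_train=3))
```

Each fold trains on everything before its validation window, which mimics how the hold-out set lies after the training data.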

In a nutshell:

• Saturday, Oct. 29, 3pm-open end & Sunday, Oct. 30, 10am-1pm

• Hosted by Grünspar GmbH and sponsored by Intuit Inc.

• Goal is to accurately predict the daily visitor numbers of a public swimming pool (Nettebad, Osnabrück)

• 3 datasets: 1 dataset with daily visitor numbers and a few other features from Nettebad Osnabrück (2005–2013), 2 datasets with weather data (see full description of the datasets below)

• Teams are formed at the beginning of the event (max. 4 persons per team, you can come as a team, we don't decide who's in a team)

• Teams compete by submitting predictions to a scoring platform (Kaggle In-Class)

• The team with the best score on the hold-out set wins

• The event is limited to 30 participants (due to room capacity)

• Intuit Inc. sponsors drinks, pizza on Saturday night and prizes for the best 3 teams (value ~600€: iPad, Jawbone speakers, Kindle, etc.)

+++++++++++++

FAQ

+++++++++++++

What will happen with the results? Is there a commercial interest behind the competition?

No. Tobias got the dataset from the general manager of Nettebad in 2014 for a project with his students at the University of Applied Sciences Osnabrück. The general manager is interested in how the weather (or other factors) influences visitor numbers, and Tobias will report the influence (or feature weights) back to the general manager. But there is no money involved. The winning team's solution may be featured in the report to the Nettebad GM.

Is there an expectation on the tools to be used?

Clearly: no. You can use whatever you want to make your predictions: Python, R, Julia, you name it. If someone only uses Excel and comes in first, we will think of a super duper special prize. The only thing that is forbidden is hacking the Kaggle server and cracking the solution.

Why do you call the event “DataDive”?

The term “DataDive” is an invention of the NYC organization DataKind (http://www.datakind.org/). DataKind does a phenomenal job of getting volunteers excited about harnessing the power of data science in the service of humanity, i.e., helping nonprofit and government organizations. DataDives are weekend-long, marathon-style events where volunteers rally together to help 3-4 social change organizations do initial data analysis, exploration, and prototyping. Daniel Kirsch and his organization Data Science for Social Good Berlin (http://dssg-berlin.org/) brought the first DataDive to Germany in 2015 (see here (https://blog.dssg-berlin.org/data-dive-berlin-2015-765f124ad515#.wbbngg4k1)).

Nettebad Osnabrück belongs to the local government, so we are crunching a “nonprofit dataset”, and “DataDive” kind of fits the event.

+++++++++++++

DATASETS

+++++++++++++

***************************

data_nettebad_2005_2013.csv

***************************

date

----

Date

visitors_pool_total

-------------------

number of Nettebad visitors per day (inside pool plus outside pool; excluding sauna visitors, school classes and swimming clubs)

sportbad_closed, freizeitbad_closed, sauna_closed, kursbecken_closed

--------------------------------------------------------------------

dummy variable that is 1 if the respective area is closed for inspection

event

-----

dummy variable that is 1 if an event takes place on that day (e.g., slide competition, swimming contest, pool party)

price_adult_90min, price_adult_max, price_reduced_90min, price_reduced_max

--------------------------------------------------------------------------

admission fees in Euro

*_90min is the minimum fee that is valid for a visit of max. 90 minutes

*_max is the maximum fee you need to pay

reduced fees are for children, handicapped, etc.

Check https://www.nettebad.de/information/preise.html for details

sloop_dummy

-----------

dummy variable that is 1 after the Sloop high-speed slide opened on December 27th, 2011

Check https://www.youtube.com/watch?v=8Lbhm7eanU4 to get an impression of the Sloop

sloop_days_since_opening

------------------------

counts the days since the Sloop opened on December 27th, 2011
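Both Sloop columns follow from the date alone, so they can be rebuilt for any day you want to predict. A minimal sketch, assuming the opening day itself counts as open with a day count of 0 and that the count is 0 before the opening. The description does not state either convention, so check against the file. The helper name `sloop_features` is hypothetical.

```python
from datetime import date

SLOOP_OPENING = date(2011, 12, 27)  # opening date stated in the description

def sloop_features(day):
    """Return (sloop_dummy, sloop_days_since_opening) for a calendar day.

    Assumption: the opening day has dummy 1 and count 0; earlier days get (0, 0).
    """
    days = (day - SLOOP_OPENING).days
    if days < 0:
        return 0, 0
    return 1, days

before = sloop_features(date(2011, 12, 26))
opening = sloop_features(date(2011, 12, 27))
new_year = sloop_features(date(2012, 1, 1))
```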

school_holiday

--------------

0 = no school holiday

1 = school holiday only in Niedersachsen

2 = school holiday only in Nordrhein-Westfalen

3 = school holiday both in Niedersachsen and in Nordrhein-Westfalen

bank_holiday

------------

0 = no bank holiday

1 = bank holiday only in Niedersachsen

2 = bank holiday only in Nordrhein-Westfalen

3 = bank holiday both in Niedersachsen and in Nordrhein-Westfalen
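The codes 0 to 3 in school_holiday and bank_holiday are categories, not quantities, so a regression model may work better with one 0/1 flag per federal state. A minimal sketch of that decoding: the function name `decode_holiday` and the sample row are made up for illustration, and the date format in the row is an assumption.

```python
def decode_holiday(code):
    """Turn a 0-3 holiday code into (niedersachsen, nordrhein_westfalen) flags."""
    code = int(code)
    if code not in (0, 1, 2, 3):
        raise ValueError(f"unexpected holiday code: {code}")
    # Codes 1 and 3 include Niedersachsen; codes 2 and 3 include Nordrhein-Westfalen.
    return code & 1, (code >> 1) & 1

# Made-up row in the shape csv.DictReader would give (all values are strings).
row = {"date": "2012-07-23", "school_holiday": "3", "bank_holiday": "0"}
school_nds, school_nrw = decode_holiday(row["school_holiday"])
bank_nds, bank_nrw = decode_holiday(row["bank_holiday"])
```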

***************

weather_DWD.csv (weather data from Deutscher Wetterdienst)

***************

date

----

Date

air_humidity_DWD

----------------

Daily average air humidity

Date range: 2005–2010

Unit: %

Weather station: Osnabrück (latitude 52.2553, longitude 8.0534)

air_temperature_daily_max_DWD

-----------------------------

Daily maximum temperature

Date range: 2005–2010

Unit: Degrees Celsius

Weather station: Osnabrück (latitude 52.2553, longitude 8.0534)

air_temperature_daily_mean_DWD

------------------------------

Daily mean temperature

Date range: 2005–2010

Unit: Degrees Celsius

Weather station: Osnabrück (latitude 52.2553, longitude 8.0534)

air_temperature_daily_min_DWD

-----------------------------

Daily minimum temperature

Date range: 2005–2010

Unit: Degrees Celsius

Weather station: Osnabrück (latitude 52.2553, longitude 8.0534)

precipitation_DWD

-----------------

Daily precipitation

Date range: 2005–2014

Unit: Millimeters

Weather station: Osnabrück-Haste (latitude 52.304, longitude 8.044)

snow_height_DWD

---------------

Daily snow height

Date range: 2005–2014

Unit: Centimeters

Weather station: Osnabrück-Haste (latitude 52.304, longitude 8.044)

sunshine_hours_DWD

------------------

Daily sunshine hours

Date range: 2005–2010

Unit: Hours

Weather station: Osnabrück (latitude 52.2553, longitude 8.0534)

wind_speed_max_DWD

------------------

Daily maximum wind speed

Date range: 2005–2010

Unit: meter/second

Weather station: Osnabrück (latitude 52.2553, longitude 8.0534)
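Most DWD series end in 2010 while the visitor data run to 2013, so joining the files by date leaves gaps that need explicit handling. A minimal sketch of a left join that keeps every visitor row and marks missing weather values as None: the function name `left_join_by_date`, the sample numbers and the date string format are assumptions for illustration.

```python
def left_join_by_date(visitor_rows, weather_by_date, columns):
    """Attach weather columns to each visitor row; None marks a missing value."""
    joined = []
    for row in visitor_rows:
        weather = weather_by_date.get(row["date"], {})
        merged = dict(row)
        for column in columns:
            merged[column] = weather.get(column)
        joined.append(merged)
    return joined

# Made-up values; the date string format is an assumption.
visitor_rows = [
    {"date": "2010-12-31", "visitors_pool_total": 500},
    {"date": "2011-01-01", "visitors_pool_total": 800},
]
weather_by_date = {
    "2010-12-31": {"air_temperature_daily_max_DWD": -1.5, "precipitation_DWD": 0.0},
    "2011-01-01": {"precipitation_DWD": 2.4},  # temperature series has ended
}
columns = ["air_temperature_daily_max_DWD", "precipitation_DWD"]
joined = left_join_by_date(visitor_rows, weather_by_date, columns)
```

Whether to impute the gaps, drop the column, or switch to the Uni Osnabrück series for later years is a modeling decision left to each team.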

**************************

weather_uni_osnabrueck.csv (weather data from Uni Osnabrück)

**************************

Weather station: latitude 52.285341, longitude 8.024064

Weather station type: https://www.reinhardt-testsystem.de/deutsch/klima_sensoren/wetterstationen/wetterstation_mws_9_5.php

Date range: June 12, 2009–Dec 31, 2014

Raw data contain one measurement every 5 minutes. Daily means are calculated from the measurements between 9:00 and 21:00 (roughly the opening hours of Nettebad)
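The aggregation described above can be sketched as follows. This is an illustration, not the script that produced the file: it assumes readings arrive as (timestamp, value) pairs and that both 9:00 and 21:00 are included in the window. The name `daily_means` and the sample readings are made up.

```python
from collections import defaultdict
from datetime import datetime, time
from statistics import mean

OPEN, CLOSE = time(9, 0), time(21, 0)  # assumed inclusive on both ends

def daily_means(readings):
    """Average the values that fall inside the daily window, per calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        if OPEN <= timestamp.time() <= CLOSE:
            by_day[timestamp.date()].append(value)
    return {day: mean(values) for day, values in by_day.items()}

# Made-up temperature readings for one day.
readings = [
    (datetime(2009, 6, 12, 8, 55), 0.0),    # before the window, ignored
    (datetime(2009, 6, 12, 9, 0), 10.0),
    (datetime(2009, 6, 12, 15, 0), 20.0),
    (datetime(2009, 6, 12, 21, 0), 30.0),
    (datetime(2009, 6, 12, 21, 5), 100.0),  # after the window, ignored
]
means = daily_means(readings)
```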

date

----

Date

air_humidity_UniOS

------------------

Daily mean air humidity

Unit: %

air_pressure_UniOS

------------------

Daily mean air pressure

Unit: Hectopascal

global_solar_radiation_UniOS

----------------------------

Daily mean global solar radiation ("Globalstrahlung")

Unit: Watts per square meter

temperature_UniOS

-----------------

Daily mean temperature

Unit: Degrees Celsius

wind_speed_avg_UniOS

--------------------

Daily mean wind speed

Unit: meter/second

wind_speed_max_UniOS

--------------------

Daily maximum wind speed

Unit: meter/second

wind_direction_category_UniOS

-----------------------------

Daily mode of the wind direction

Categories: N, NW, W, SW, S, SE, E, NE
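A daily mode of categories can be computed as sketched below. The sketch assumes the raw station data report wind direction as a compass bearing in degrees (not confirmed by the description) and bins it into eight 45-degree sectors before taking the most frequent one. The names `degrees_to_sector` and `daily_mode` and the sample bearings are illustrative.

```python
from collections import Counter

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def degrees_to_sector(degrees):
    """Map a compass bearing in degrees to one of eight 45-degree sectors."""
    index = int(((degrees % 360) + 22.5) // 45) % 8
    return SECTORS[index]

def daily_mode(categories):
    """Return the most frequent category among one day's readings."""
    return Counter(categories).most_common(1)[0][0]

# Made-up bearings for one day.
bearings = [350, 10, 20, 200, 44]
sectors = [degrees_to_sector(b) for b in bearings]
mode = daily_mode(sectors)
```

Because the column is categorical, it usually needs one-hot encoding (one 0/1 column per direction) before it goes into a regression model.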