How to store yourself and data -- from .csv to .parquet, from tent to fancy loft

This is a past event

42 people went

Blue Yonder GmbH

Heidenkampsweg 45 · Hamburg

How to find us

5 minute walk from Hammerbrook (S-Bahn) or Berliner Tor (S-Bahn and U-Bahn)



It is time for us to meet again, in a new location and with two new talks: Jonathan will share insights into Hamburg's housing market, and Marco will give us an overview of how to store our data.

* Level: beginner

* Food and drinks: yes


Part 1: Create your own HamburgHousing dataset
by Jonathan Niesel (Data Scientist @ Blue Yonder)

It's all over the media, but there is not (too much) open data available on it: the housing market in Hamburg. So you need to create your own dataset, with Python of course.
The talk is about Jonathan's DataScience@Home project: Create your own HamburgHousing dataset with Python. He will show how he created a dataset of currently up to [masked] individual flats and houses in Hamburg by continuously scraping the web over the past 10 months.
The talk will include basic introductions to setting up an AWS machine and to continuous web scraping and, last but not least, an analysis of the scraped results.

Part 2: Hold My Data -- An introduction into data storages and transfer for Pandas users
by Marco Neumann (Data Architect Int @ Blue Yonder)

Have you ever felt overwhelmed by the seemingly infinite number of data storage possibilities? If so, this talk is perfect for you: sit back, relax, and let Marco guide you through some of the available options.
- good old CSV files
- Excel (¯\_(ツ)_/¯)
- "just use a DB"
- shiny Apache Arrow
- compressed Parquet
- going Big On Blobs
In the end, you will know which type of data storage will suit your next Python project.
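
To give a flavour of why the choice matters, here is a minimal sketch (not taken from the talk) that sends the same rows through a CSV file and through a database. It uses only the Python standard library, so it runs without pandas or pyarrow; the sample rows and the table name `flats` are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows (made up for illustration, not from the talk).
rows = [("Altona", 3, 1250.0), ("Eimsbüttel", 2, 980.5)]

# CSV round trip: write to an in-memory buffer, then read it back.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
buffer.seek(0)
csv_rows = [tuple(record) for record in csv.reader(buffer)]

# SQLite round trip: the declared column types are stored with the data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flats (district TEXT, rooms INTEGER, rent REAL)")
con.executemany("INSERT INTO flats VALUES (?, ?, ?)", rows)
db_rows = con.execute(
    "SELECT district, rooms, rent FROM flats ORDER BY rowid"
).fetchall()
con.close()

print(csv_rows[0])  # every field comes back as a string
print(db_rows[0])   # integers and floats keep their types
```

CSV stores only text, so the reader has to guess or re-declare the types on every load, whereas typed formats such as a database table (or Parquet) carry the schema along with the data.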