How to store yourself and data -- from .csv to .parquet, from tent to fancy loft


Details
It is time for us to meet again, this time at a new location, with two new talks: Jonathan will share insights into Hamburg's housing market, and Marco will give us an overview of how to store our data.
-
Level: beginner
-
Food and drinks: yes
Agenda:
Part 1: Create your own HamburgHousing dataset
by Jonathan Niesel (Data Scientist @ Blue Yonder)
It's all over the media, but there is not (too much) open data available about it: the housing market in Hamburg. So you need to create your own dataset, of course with Python.
The talk will be about his DataScience@Home project: Create your own HamburgHousing dataset with Python. He will show how he built a dataset that currently covers up to 100,000 individual flats and houses in Hamburg, collected through continuous web scraping over the past 10 months.
This talk will include basic introductions to setting up an AWS machine and to continuous web scraping, and, last but not least, an analysis of the scraped results.
Part 2: Hold My Data -- An introduction to data storage and transfer for Pandas users
by Marco Neumann (Data Architect Int @ Blue Yonder)
Have you ever felt overwhelmed by the seemingly infinite number of ways to store data? If so, this talk is perfect for you: sit back, relax, and let Marco guide you through some of the available options.
- good old CSV files
- Excel (¯\_(ツ)_/¯)
- "just use a DB"
- shiny Apache Arrow
- compressed Parquet
- going Big On Blobs
By the end, you will know which type of data storage suits your next Python project.
