Building data science project from scratch [talk + workshop]

![Building data science project from scratch [talk + workshop]](https://secure.meetupstatic.com/photos/event/9/1/5/3/highres_472957203.jpeg?w=750)
Details
In this session we will go back to our usual format: a talk followed by a hands-on session. So bring your laptops and a good mood to learn about data science and Python. We would like to have coaches to support the hands-on session. If you can be one of them, please write to us on Slack.
• Speaker: Jekaterina Kokatjuhha
Jekaterina is a Research Engineer at Zalando, focusing on scalable machine learning for fraud prediction. She obtained a master's degree in bioinformatics from FU Berlin and has worked in various research institutions across Europe, such as the Charité Hospital in Berlin, the Centre for Genomic Regulation in Barcelona and Manchester University. Jekaterina is excited about Machine Learning and Data Science and is involved in a couple of ML side projects.
• Talk
Building data science project from scratch: analysis of Berlin rental prices
This talk is about how to design a good data science project from scratch, based on a real-world dataset. As a showcase project we analyze the rental prices for apartments in Berlin. The talk will guide you through all the steps of a short-term data science project: motivation, extraction of data from the web, cleaning and engineering of features using external APIs, storytelling, and building machine learning models. We will dive into the pitfalls and design patterns of scraping data from the web. The importance of interactive dashboards should not be understated, as they help you find useful insights on your own. We will use human judgment about an apartment's address to engineer new features with the Google API, and use correlated features to impute the feature of interest. In the end, several machine learning models will be used to explore the ideas of bagging and stacked models.
• Workshop
In the workshop we will go through extensive data cleaning, namely exploring how to find "hidden" duplicated records. We will scrape Wikipedia to get the list of Berlin metro stations and use the Google API to encode address features.
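To give a feel for what a "hidden" duplicate can look like, here is a minimal sketch. It assumes that two listings count as the same apartment when their normalized address, area and rent all match. The field names (`listing_id`, `address`, `area_sqm`, `rent_eur`), the normalization rules and the sample rows are made up for illustration and are not taken from the workshop dataset.

```python
import re
from collections import defaultdict


def normalize_address(address):
    """Reduce an address to a canonical form so spelling variants compare equal."""
    text = address.lower().strip()
    text = text.replace("ß", "ss")            # "straße" -> "strasse"
    text = re.sub(r"str\.", "strasse", text)  # "str." -> "strasse"
    text = re.sub(r"[^\w\s]", " ", text)      # drop remaining punctuation
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace


def find_hidden_duplicates(listings):
    """Group listing ids that share normalized address, area and rent."""
    groups = defaultdict(list)
    for listing in listings:
        key = (
            normalize_address(listing["address"]),
            listing["area_sqm"],
            listing["rent_eur"],
        )
        groups[key].append(listing["listing_id"])
    # Only groups with more than one id are duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample rows: the first two describe the same flat,
# posted twice with different ids and address spellings.
listings = [
    {"listing_id": "a1", "address": "Torstraße 12", "area_sqm": 54, "rent_eur": 890},
    {"listing_id": "b7", "address": "Torstr. 12 ", "area_sqm": 54, "rent_eur": 890},
    {"listing_id": "c3", "address": "Kastanienallee 5", "area_sqm": 70, "rent_eur": 1200},
]

duplicate_groups = find_hidden_duplicates(listings)
print(duplicate_groups)  # [['a1', 'b7']]
```

With pandas, the same idea can be expressed by adding a normalized address column and calling `DataFrame.duplicated(subset=[...], keep=False)` on the columns that define "the same apartment".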
• Installation requirements
- To get familiar with HTML and web scraping, read this blog post (https://hackernoon.com/web-scraping-tutorial-with-python-tips-and-tricks-db070e70e071)
- Python 3 installed, together with the following libraries: pandas, numpy, requests, re, bs4. Note that re ships with Python and needs no separate installation. (If you are new to installing packages in Python, read this for Linux or macOS: https://packaging.python.org/tutorials/installing-packages/. For Windows, please ask on Slack.)
- Jupyter Notebook installed and working. Installation instructions: http://jupyter.readthedocs.io/en/latest/install.html#id4. Please make sure that you can open Jupyter Notebook and import pandas.
- If you don't have a Gmail account, please create one.
- Read https://developers.google.com/maps/documentation/distance-matrix/start
- One day before the meetup, download the following repository: https://github.com/jkokatjuhha/workshop_berlin_rental_apartment
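One possible way to check the library requirements above before the meetup is the short script below. It is only a sketch using the standard library: it reports whether each listed module can be found by your Python installation, without importing it. The function name `check_modules` is our own choice for this example.

```python
import importlib.util

# Modules listed in the installation requirements.
REQUIRED_MODULES = ["pandas", "numpy", "requests", "re", "bs4"]


def check_modules(names):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_modules(REQUIRED_MODULES)
for name, available in status.items():
    print(f"{name}: {'ok' if available else 'MISSING'}")
```

You can run it in a Jupyter Notebook cell to confirm that the notebook uses the same Python environment in which you installed the packages. If bs4 is reported as missing, note that on PyPI the package is called beautifulsoup4.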
• Miscellaneous
Snacks and drinks will be available.
• Gender policy [UPDATED]
We believe knowledge is for all, and at the same time our events aim primarily to empower the community of women in tech. We ask non-female attendees to be aware of this and to keep their presence discreet, e.g. by coming with a female plus-one to ensure gender balance and by not speaking more than the rest of the attendees during discussions and question sessions.
• Photography / video consent
We take photos and videos during the event to use for documentation and on social media, such as here in Photo albums, on Facebook, Twitter, etc. By coming to the meetup, you consent to photos and videos of you being taken. If you do not want to give your consent, please let us know at check-in or change your RSVP to 'Not Going'.
• Contact
Interested in speaking at one of our events? Have a good idea for a Meetup? Get in touch with us at berlinpyladies@gmail.com
You can also find us on Slack:
Invite: https://pyladies-berlin.herokuapp.com/
Slack: https://pyladies-berlin.slack.com
