Data Wrangling with DSS: From Scraping HTML To Unsupervised Learning in 1h

NYC Open Data


Co-hosted with the Dataiku DSS NYC User Group Meetup (


Please RSVP via


Join us for pizza, beer and a data wrangling demo! During this talk, Henri will demonstrate how to use Data Science Studio (DSS) to build a complete workflow, from raw data to trained models, in 1h.

We will start by scraping data-science-related job listings in New York. Then we will download all of the reviews for those companies and try to work out which is the best place to work by cleaning and parsing the raw HTML and, ultimately, performing unsupervised learning to see which topics come up!

Finally, we will use DSS's insight tool to create a web app with Flask, HTML and JavaScript to explore the results.

Our Speaker

Henri Dwyer is a data scientist and engineer working at Dataiku on building the best platform for data scientists. He received an MSc in Engineering from Columbia University in New York City, and a BS and an MS in Engineering from Ecole Polytechnique in Paris. He now lives in Brooklyn and is always keen to discover new data science problems to solve.

*No prior knowledge of Dataiku Data Science Studio is required*

6:00-6:15: Food & Mingling

6:15-7:15: Henri's Talk

7:15-8:00: Q&A & Mingling