Easy Data Wrangling with DSS: From Scraping HTML To Unsupervised Learning in 1h

Hosted by Ryan B. Harvey

Details

Dataiku's Data Science Studio (DSS) (https://www.dataiku.com/dss/?ref=dwdc-meetup) makes data wrangling easy. During this talk, Henri will demonstrate how to use DSS's tools to build a complete workflow, from raw data to trained models, in one hour.

We will start by scraping data-science-related job listings in Washington, DC. Then we will download all of the company reviews and try to work out which is the best place to work by cleaning and parsing the raw HTML, and ultimately performing unsupervised learning to see what topics come up!
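
To give a feel for the cleaning and parsing step, here is a minimal sketch in plain Python. It is not the DSS workflow from the talk: the review snippet, the stopword list, and the names `TextExtractor`, `html_to_text`, and `top_terms` are all made up for illustration. It strips tags from raw HTML and counts the most frequent words, which is only a rough first look at the text and not the unsupervised topic modeling itself.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(raw_html):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "and", "is", "are", "to", "of", "but"}


def top_terms(text, n=3):
    """Return the n most frequent non-stopword terms as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(n)


# Invented review snippet, standing in for a scraped page.
SAMPLE_REVIEW = (
    "<div><h2>Great team</h2>"
    "<p>Smart team, flexible hours.</p>"
    "<script>track()</script></div>"
)

if __name__ == "__main__":
    text = html_to_text(SAMPLE_REVIEW)
    print(text)
    print(top_terms(text, n=1))
```

On real scraped pages, a cleaned-text column like this would be the input to whatever clustering or topic model is used downstream.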

Finally, we will use DSS's insight tool to create a web app with Flask, HTML, and JavaScript to explore the results.

Our Speaker

Henri Dwyer is a data scientist and engineer at Dataiku, where he works on building the best platform for data scientists. He received an MSc in Engineering from Columbia University in New York City, and a BS and an MS in Engineering from Ecole Polytechnique in Paris. He now lives in Brooklyn and is always keen to discover new data science problems to solve.

Data Engineers DC
GWU, Funger Hall, Room 103
2201 G St. NW · Washington, DC