A Scalable Scraping Architecture [Eddie Bell, Lyst.com]

Details
Web scraping is an integral part of data acquisition at Lyst (http://www.lyst.com/). Almost all fashion products sold on our site come from scraping. We run hundreds of spiders in parallel via a distributed scheduling platform and scrape millions of pages each day. One of our main problems is that scraped data is not reliable. In this talk I will explain how we built a robust and scalable scraping architecture with the help of machine learning and crowdsourcing.
Dr Edward John Lancaster Bell III (https://twitter.com/ejlbell) (Eddie) is an ex-finance PhD who saw the light and joined a start-up. He is the lead data scientist (aka "The Fashematician") at Lyst and he solves fashion data problems using NLP, ML and image processing. He likes describing himself in the third person and long walks on the beach.
