June 7, 2013 · 12:30 PM
Speaker: Shane Lewin with John Clark
The field of machine learning has advanced to the point that out-of-the-box solutions are sufficient to solve 99% of big data problems. Yet many solutions never see the light of day due to the high cost of acquiring quality data in volume. A number of products have emerged to meet this demand, ranging from algorithmic solutions to crowdsourcing to hybrid models, each with distinct advantages and limitations. In this talk, I will survey some of the advances in data collection and discuss their strengths and limitations. Finally, I will argue that smart systems, which combine data-handling algorithms and machine learning in a self-reinforcing system, represent the future and the biggest opportunity in data science in the next decade.
Shane has architected half a dozen smart systems, ranging from automated customer feedback to the web-scale machine-learned ranker that powers all of the text you see on Bing. He currently manages a Data Science and Engineering team at Netflix, where he continues to be neurotically obsessed with data quality. http://www.linkedin.com/pub/shane-john-lewin/1/a82/bb9