This month we have two presenters each talking about their papers from KDD. Troy Raeder will be presenting "Design Principles of Massive, Robust Prediction Systems" and Claudia Perlich will be presenting "Bid Optimizing and Inventory Scoring in Targeted Online Advertising". Abstracts and bios below.
Design Principles of Massive, Robust Prediction Systems
Most data mining research is concerned with building high-quality classification models in isolation. In massive production systems, however, the ability to monitor and maintain performance over time while growing in size and scope is equally important. Many external factors may degrade classification performance including changes in data distribution, noise or bias in the source data, and the evolution of the system itself. A well-functioning system must gracefully handle all of these. This paper lays out a set of design principles for large-scale autonomous data mining systems and then demonstrates our application of these principles within the m6d automated ad targeting system. We demonstrate a comprehensive set of quality control processes that allow us to monitor and maintain thousands of distinct classification models automatically, and to add new models, take on new data, and correct poorly performing models without manual intervention or system disruption.
Bid Optimizing and Inventory Scoring in Targeted Online Advertising
Billions of online display advertising spots are purchased on a daily basis through real-time bidding exchanges (RTBs). Advertising companies bid for these spots on behalf of a company or brand in order to display banner advertisements. These bidding decisions must be made in fractions of a second after the potential purchaser is informed of what location (Internet site) has a spot available and who would see the advertisement. The entire transaction must be completed in near real-time to avoid delays in loading the page and to maintain a good user experience. This paper presents a bid-optimization approach that is implemented in production at Media6Degrees for bidding on these advertising opportunities at an appropriate price. The approach combines several supervised learning algorithms, as well as second-price auction theory, to determine the correct price to ensure that the right message is delivered to the right person, at the right time.
Troy Raeder Bio:
Troy Raeder has earned B.S., M.S., and Ph.D. degrees in Computer Science from the University of Notre Dame and is currently a Data Scientist at M6D. He has published academic articles in a number of venues including Pattern Recognition and the Journal of Machine Learning Research. His current research interests include machine learning for online advertising, learning under shifting distributions, and the development of large-scale machine learning algorithms and systems.
Claudia Perlich Bio:
Claudia Perlich has held the position of chief scientist at Media6Degrees, a startup that specializes in targeted online display advertising, since 2010. Claudia received her Ph.D. in Information Systems from Stern School of Business, New York University in 2005 and holds additional graduate degrees in Computer Science. Claudia joined the Data Analytics Research group at the IBM T.J. Watson Research Center in 2004 and continued her research on data analytics and machine learning for complex real-world domains and applications. She is the author of 50+ scientific publications, holds multiple patents in the area of machine learning, has won various data mining competitions and best paper awards, and speaks regularly at conferences and other public events.