Streaming Recommendations and Model Factories
Rather overdue, but here it is: the next Applied Machine Learning Meetup!
This event is kindly hosted by GoDataDriven (http://www.godatadriven.com/careers.html); food and drinks are sponsored by KPN (https://www.kpn.com/).
• 18.00: Arrive, socialise, have a drink and eat
• 18.50: Short introduction by your humble organizers
• 19.00: Talk 1, by Anne Schuth
• 19.45: Short break
• 20.00: Talk 2, by Chris Molanus and Maria Vechtomova
Every morning at Blendle, we face a huge cold-start problem when over 6,000 new articles from the latest newspapers arrive in our system. Virtually no one has read these articles yet when we are tasked with sending out personalised newsletters to many of our users. We therefore cannot rely on collaborative-filtering-style recommendations, nor can we use the popularity of the articles as a clue to what our users might want to read. We overcome our cold-start problem with a mix of curation by our editorial team and automated analysis of the content of these articles. We extract named entities, semantic links, authors, the language and plenty of stylometrics. Much of our content-analysis setup is implemented in Spark, as a (mini) batch process. And the "batch" part is (or rather, was) a problem. Our editorial team gets up at around 5am and finishes reading and recommending their selection of articles around 8am, which is also when we would ideally send out the newsletter. Starting our batch process only then would mean a prohibitively long delay. We therefore started switching to a combination of Spark and a streaming infrastructure with Kafka at its core. In this talk I will outline both our batch processing setup and our streaming setup, and how the two work together.
Anne Schuth is a data scientist at Blendle, where you can read all newspapers and magazines and only pay for what you read. Anne recently obtained his PhD from the University of Amsterdam (UvA). His PhD research focused on online learning to rank: optimizing search engine algorithms based on interactions with users. Anne was previously an intern at Microsoft Research in Cambridge and at Yandex in Moscow.
Model Factory: a production platform for running and monitoring models.
One of the main missing concepts in a data-driven business is a stable platform for production models. Model Factory provides a framework in which production models can be easily maintained and monitored. Because the platform was built with the data scientist in mind, it removes the hurdles that commonly occur when moving a model from development to production. The platform is built using open source software (e.g., Git, Jenkins, R). We will explain how to combine these technologies to build your own model factory.
Chris Molanus/Maria Vechtomova:
Chris is a data scientist at KPN with 3 years of experience in data science and a background in computer science and network engineering. Maria is a data scientist at KPN with 1 year of experience in data science, 2 years of experience as a business analyst, and a background in econometrics.