Searching to be entertained and improving word embedding compositionality


This Friday we'll have two talks followed by drinks. Our industrial speaker is Daan Odijk, the lead data scientist at RTL. He will talk about searching to be entertained at RTL. Our academic speaker is Thijs Scheepers, a recent MSc AI graduate of the University of Amsterdam. He will talk about improving word embedding compositionality using lexicographic definitions.


16:00-16:30 Daan Odijk

16:30-17:00 Thijs Scheepers

17:00-18:00 Drinks and snacks


Daan Odijk - Searching to be entertained

As the largest commercial broadcaster in a declining Dutch TV market, RTL is making the transition from a traditional TV company to a consumer-focused media company. RTL is embracing a closer relationship and more direct interaction with its viewers, followers and visitors, via every conceivable means. The new growth strategy centers on consumers’ needs and motivations and brings investment in data analytics, technology and creation. RTL intends to genuinely touch and inspire people both on and off screen and thereby build a loyal fan base for its brands. In this talk, I will share how we use search technology to help our users find the right content for them, ranging from the 1 million daily visitors to our news website to the more than 2 billion video plays we had in 2017, most of them on our rapidly growing video-on-demand platform Videoland.

Bio: Daan Odijk is the lead data scientist at RTL. In 2016, he obtained his PhD from the University of Amsterdam, researching search algorithms for news. He then joined the journalism start-up Blendle to lead its personalization team. At RTL since 2018, Daan leads a team of a dozen data scientists and engineers, delivering data-powered products across RTL, including personalization for RTL Nieuws and Videoland.


Thijs Scheepers - Improving Word Embedding Compositionality using Lexicographic Definitions

We present an in-depth analysis of four popular word embeddings (Word2Vec, GloVe, fastText and Paragram) in terms of their semantic compositionality. In addition, we propose a method to tune these embeddings towards better compositionality. We find that training the existing embeddings to compose lexicographic definitions significantly improves their performance on this task, while achieving similar or better performance on both word similarity and sentence embedding evaluations. Our method tunes word embeddings using a simple neural network architecture with definitions and lemmas from WordNet. Since dictionary definitions are semantically similar to their associated lemmas, they are ideal candidates both for our tuning method and for evaluating compositionality. Our architecture allows the embeddings to be composed using simple arithmetic operations, which makes them particularly suitable for production applications such as web search and data mining. We also explore more elaborate compositional models. In our analysis, we evaluate both the original and the tuned embeddings using existing word similarity and sentence embedding evaluation methods. In addition to these evaluation methods from related work, we evaluate embeddings using a ranking method that tests composed vectors using the lexicographic definitions described above. Unlike other evaluation methods, ours is not invariant to the magnitude of the embedding vector, which we show is important for composition. We consider this new evaluation method, called CompVecEval, to be a key contribution.
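To make the two core ideas of the abstract concrete (composing word vectors with simple arithmetic, and ranking lemmas by how close they are to a composed definition), here is a minimal sketch. It assumes summation as the composition operator, Euclidean distance for ranking, and a hypothetical toy vocabulary (`EMBEDDINGS`) with hypothetical helper names (`compose`, `rank_of_lemma`, `cosine_distance`). It is not CompVecEval itself; the talk's exact ranking procedure is not described here.

```python
import math

# Hypothetical toy 3-d vectors, invented purely for illustration (not real embeddings).
EMBEDDINGS = {
    "young": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "cat": [0.0, 0.9, 0.3],
    "puppy": [1.0, 1.0, 0.0],
    "kitten": [1.0, 0.9, 0.3],
}


def compose(words, embeddings):
    """Compose a definition vector by summing its word vectors (assumed operator)."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for word in words:
        for i, value in enumerate(embeddings[word]):
            total[i] += value
    return total


def euclidean(u, v):
    """Euclidean distance: sensitive to vector magnitude."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity: ignores vector magnitude entirely."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms if norms else 1.0


def rank_of_lemma(lemma, definition, embeddings, distance=euclidean):
    """Hypothetical ranking check: 1-based rank of the true lemma among all
    vocabulary words, ordered by distance to the composed definition vector."""
    composed = compose(definition, embeddings)
    ranking = sorted(embeddings, key=lambda w: distance(composed, embeddings[w]))
    return ranking.index(lemma) + 1


if __name__ == "__main__":
    print(rank_of_lemma("puppy", ["young", "dog"], EMBEDDINGS))  # -> 1
    print(rank_of_lemma("kitten", ["young", "cat"], EMBEDDINGS))  # -> 1

    # Scaling a vector changes its Euclidean distance but not its cosine distance.
    v = compose(["young", "dog"], EMBEDDINGS)
    doubled = [2 * x for x in v]
    print(round(cosine_distance(v, doubled), 6))  # -> 0.0
    print(round(euclidean(v, doubled), 6))  # -> 1.414214
```

The last two lines illustrate why magnitude matters for this kind of evaluation: doubling a composed vector leaves its cosine distance unchanged, so a cosine-based ranking cannot tell whether the composed vector has the right length, while a Euclidean ranking can.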

Bio: Thijs is an artificial intelligence and software geek based in Amsterdam, the Netherlands. Back in 2010, he co-founded a company that would become Label305, a digital product consultancy. In 2017 he graduated from the UvA with an MSc in Artificial Intelligence. One of his current responsibilities is product management for Keeping, Label305's internally developed SaaS product.