Incorporating New Knowledge Into LMs & Building a Domain-Specific Search Engine
On September 29th, we will hold our first hybrid meetup, which you can join remotely via Zoom or in person in Berlin at the AI Campus! Two great speakers have agreed to give talks on "Incorporating New Knowledge Into Language Models" and "Building a Domain-Specific Search Engine": Nils Reimers from co:here and Matthias Richter from ML6! 🎉
The talks will start at 7pm. Afterwards, we will open small Zoom breakout rooms where you can connect and discuss. To coordinate registration, there are two separate meetup pages: this page is for remote participants, and a separate page is for on-site participants. For on-site participants, admission begins at 6:30pm.
Save the date and register now, either here to join remotely via Zoom or on the other page to join in person in Berlin. We look forward to meeting you!
Incorporating New Knowledge Into Language Models
by Nils Reimers from co:here
Language models work well for many NLP tasks, but they have one big weakness: with each day that passes after they were pre-trained or fine-tuned, their knowledge becomes more outdated. For example, BERT still thinks that Barack Obama is the current US president. This is especially problematic in semantic search, where we often search for the most recent events. In this talk, I will give an overview of how to incorporate new knowledge into language models like BERT, with a special focus on search. I will then present Generative Pseudo Labeling (GPL), an efficient method for adapting semantic search models to new domains and datasets.
Building a Domain-Specific Search Engine
by Matthias Richter from ML6
Semantic search engines are attracting more and more attention, and at ML6 we work with many different domains and datasets. In this talk, I will share insights into practical use cases where semantic search beats classical lexical search engines. The latest version of the Haystack framework already includes an implementation of Generative Pseudo Labeling (GPL). I will demonstrate how you can easily use GPL to adapt a dense retriever to any domain-specific dataset and build a semantic search engine on top of it. To illustrate this, I will show a small demo that compares the results of different search approaches.