
Details

The April edition of SEA will cover the use of LLMs for data augmentation to improve and evaluate search engines. We will host two amazing speakers: Arthur Câmara, a research engineer at Zeta Alpha, and Arian Askari, a Ph.D. candidate at Leiden University. SEA is a hybrid event; the in-person event takes place at Lab42, Science Park, room L3.36.

***
IMPORTANT: You can view the Zoom link once you 'attend' the meetup on this page.
***

Speaker: Arthur Câmara (Zeta Alpha)
Title: Pushing the R in RAG: LLMs for finetuning with synthetic data, evaluation and agentic search
Time: 17:00
Abstract: LLMs are infiltrating every aspect of the traditional search pipeline, with RAG being the most prominent use case. With InPars, we have created a toolkit that uses LLMs to create synthetic data for fine-tuning the neural retrieval component of RAG to specific domains and customer datasets. With RAGElo, we extend this idea to the use of LLMs for the evaluation of search results. In recent work, we push the R in RAG to the extreme, building high-recall search agents that can dynamically evaluate retrieved results in real time and refine their search strategies.
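The InPars-style idea described above can be sketched as a small pipeline: prompt an LLM with few-shot examples to invent a relevant query for each in-domain document, lightly filter the output, and keep the resulting (query, document) pairs as fine-tuning data for a retriever. This is an illustrative sketch, not the actual InPars toolkit; `call_llm` is a hypothetical stand-in for any LLM completion call.

```python
# Sketch of InPars-style synthetic query generation for retriever fine-tuning.
# NOTE: `call_llm` is a hypothetical stub; a real pipeline would call an LLM API.

FEW_SHOT_PROMPT = """Example 1:
Document: The Eiffel Tower was completed in 1889 for the World's Fair.
Relevant query: when was the eiffel tower built

Example 2:
Document: Photosynthesis converts sunlight, water, and CO2 into glucose.
Relevant query: how does photosynthesis work

Document: {document}
Relevant query:"""

def call_llm(prompt: str) -> str:
    # Stand-in for an LLM completion endpoint. Here we fake a query by
    # echoing the first few keywords of the document so the sketch runs.
    doc = prompt.rsplit("Document: ", 1)[-1].split("\n")[0]
    return " ".join(doc.lower().split()[:4])

def generate_training_pairs(documents, min_query_words=2):
    """Produce (synthetic_query, document) pairs for retriever fine-tuning."""
    pairs = []
    for doc in documents:
        query = call_llm(FEW_SHOT_PROMPT.format(document=doc)).strip()
        # Simple quality filter: drop degenerate (too-short) queries.
        if len(query.split()) >= min_query_words:
            pairs.append((query, doc))
    return pairs

docs = ["Zeta Alpha builds neural discovery tools for R&D teams."]
pairs = generate_training_pairs(docs)
```

In a real setting, the filtering step is usually stronger (e.g. keeping only queries for which the source document ranks highly under an existing retriever), which is what makes the synthetic pairs useful for domain adaptation.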

SEA Talk #266

Speaker: Arian Askari (Leiden University)
Title: Synthetic Document Generation for Passage Reranking using Large Language Models
Time: 17:30
Abstract: Generating synthetic training data for ranking models with large language models (LLMs) has recently gained attention. While previous studies have used LLMs to build pseudo query-document pairs, our approach takes a new perspective: we harness the capabilities of LLMs to generate synthetic documents from queries. In this talk, I will present our recent work on data augmentation for passage reranking, including "A Test Collection of Synthetic Documents for Training Rankers: ChatGPT vs. Human Experts," published at CIKM 2023, and "Expand, Highlight, Generate: RL-driven Document Generation for Passage Reranking," published in the main track of EMNLP 2023. Our exploration encompasses both commercial LLMs, such as ChatGPT, and open-source counterparts. Our RL-guided few-shot synthetic document generation method outperforms state-of-the-art data augmentation methods.
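The query-to-document direction described in the abstract can be sketched as follows: for each query, ask an LLM to write a passage that answers it, then pair that synthetic positive with a sampled negative passage to form a training triple for a reranker. This is an illustrative sketch of the general idea, not the papers' exact method; `call_llm` and the naive negative sampling are hypothetical placeholders.

```python
# Sketch of query-to-document synthetic data generation for reranker training.
# NOTE: `call_llm` is a hypothetical stub; a real system would prompt ChatGPT
# or an open-source LLM to write a passage answering the query.

def call_llm(prompt: str) -> str:
    # Stand-in for an LLM call: fabricate a passage mentioning the query.
    query = prompt.rsplit("Query: ", 1)[-1].split("\n")[0]
    return f"This passage discusses {query} in detail."

def build_reranker_examples(queries, negative_pool):
    """For each query, create a (query, synthetic_positive, negative) triple."""
    prompt_template = (
        "Write a short passage that answers the query.\nQuery: {q}\nPassage:"
    )
    triples = []
    for i, q in enumerate(queries):
        positive = call_llm(prompt_template.format(q=q)).strip()
        # Naive negative sampling from a pool of unrelated passages.
        negative = negative_pool[i % len(negative_pool)]
        triples.append((q, positive, negative))
    return triples

queries = ["what is passage reranking"]
negatives = ["Unrelated passage about cooking pasta."]
triples = build_reranker_examples(queries, negatives)
```

The resulting triples would then be used to train a cross-encoder reranker to score the synthetic positive above the negative for each query.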

SEA Talk #267

Related topics

Information Architecture
Science
Apache Solr
Elasticsearch
Technology