[PDG 445] NV-Embed: Techniques for Training LLMs as Generalist Embedding Models

![[PDG 445] NV-Embed: Techniques for Training LLMs as Generalist Embedding Models](https://secure.meetupstatic.com/photos/event/c/6/4/6/highres_529130758.webp?w=750)
Details
Link to article: https://openreview.net/pdf?id=lgsyLSsDRe
Title: NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Content: NV-Embed is a new embedding model based on decoder-only large language models that outperforms traditional BERT and T5-based embedding models for text embedding tasks. The model introduces a latent attention layer for creating pooled embeddings, which performs better than standard approaches like mean pooling or using the last token. The training uses a two-stage method: first focusing on retrieval tasks with contrastive learning, then incorporating non-retrieval tasks to improve overall performance. Key training innovations include removing the causal attention mask during contrastive training and using hard negative examples and synthetic data to enhance learning. NV-Embed achieved the top position on the MTEB benchmark (covering 56 embedding tasks) and excelled on the AIR benchmark, demonstrating its effectiveness across diverse embedding applications.
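The latent attention pooling mentioned above can be sketched roughly as follows. This is an illustrative single-head NumPy toy, not the paper's implementation (NV-Embed uses multi-head cross-attention followed by an MLP); the array shapes and names here are assumptions for demonstration. The token hidden states act as queries against a small trainable latent array serving as keys and values, and the attended outputs are then mean-pooled into one embedding:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def latent_attention_pool(H, K_latent, V_latent):
    """Cross-attend token hidden states (queries) to a trainable
    latent array (keys/values), then mean-pool into one vector.
    H: (seq_len, d), K_latent/V_latent: (num_latents, d)."""
    d = H.shape[-1]
    scores = H @ K_latent.T / np.sqrt(d)   # (seq_len, num_latents)
    attn = softmax(scores, axis=-1)        # each token attends over latents
    attended = attn @ V_latent             # (seq_len, d)
    return attended.mean(axis=0)           # pooled embedding, (d,)

# toy example: 5 tokens, hidden dim 8, 4 latent vectors (sizes are arbitrary)
rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))
K_latent = rng.standard_normal((4, 8))
V_latent = rng.standard_normal((4, 8))
emb = latent_attention_pool(H, K_latent, V_latent)
print(emb.shape)  # (8,)
```

In contrast, mean pooling would simply average `H` over the sequence axis, and last-token pooling would take `H[-1]`; the latent array gives the model a learned, content-dependent weighting instead.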
Slack link: ml-ka.slack.com, channel: #pdg. Please join us; if you cannot join, message us here or at mlpaperdiscussiongroupka@gmail.com.
In the Paper Discussion Group (PDG) we discuss recent and fundamental papers in the area of machine learning on a weekly basis. If you are interested, please read the paper beforehand and join us for the discussion. If you have not fully understood the paper, you can still participate – everyone is welcome! You can join the discussion or simply listen in. The discussion is in German or English depending on the participants.