[PDG 445] NV-Embed: Techniques for Training LLMs as Generalist Embedding Models

Hosted By
DavidFarago
Details

Link to article: https://openreview.net/pdf?id=lgsyLSsDRe
Title: NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Content: NV-Embed is a new embedding model based on decoder-only large language models that outperforms traditional BERT and T5-based embedding models for text embedding tasks. The model introduces a latent attention layer for creating pooled embeddings, which performs better than standard approaches like mean pooling or using the last token. The training uses a two-stage method: first focusing on retrieval tasks with contrastive learning, then incorporating non-retrieval tasks to improve overall performance. Key training innovations include removing the causal attention mask during contrastive training and using hard negative examples and synthetic data to enhance learning. NV-Embed achieved the top position on the MTEB benchmark (covering 56 embedding tasks) and excelled on the AIR benchmark, demonstrating its effectiveness across diverse embedding applications.
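The latent attention pooling described above can be sketched in a few lines. This is a minimal, illustrative NumPy version, not the paper's implementation: it shows the core idea of cross-attending the decoder's token hidden states (queries) to a trainable latent array (keys/values) and pooling the result, while the actual model uses multi-head attention followed by an MLP, and all shapes and names here are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def latent_attention_pool(hidden, latents):
    """Pool token embeddings via cross-attention to a latent array.

    hidden:  (seq_len, d)      token hidden states from the decoder (queries)
    latents: (num_latents, d)  trainable latent array (keys and values)
    Returns a single (d,) pooled embedding.

    Simplification: single-head attention, latents serve as both keys and
    values, and the paper's follow-up MLP is omitted.
    """
    d = hidden.shape[-1]
    scores = hidden @ latents.T / np.sqrt(d)   # (seq_len, num_latents)
    attn = softmax(scores, axis=-1)            # attention over latents
    attended = attn @ latents                  # (seq_len, d)
    return attended.mean(axis=0)               # mean pool over tokens -> (d,)

# Toy usage with random data and hypothetical sizes.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(10, 8))    # 10 tokens, hidden size 8
latents = rng.normal(size=(4, 8))    # 4 latent vectors
embedding = latent_attention_pool(hidden, latents)
print(embedding.shape)  # (8,)
```

Compared with plain mean pooling, the latent array lets the model learn which aspects of the token states to emphasize before averaging, which the paper reports as the source of its gains over mean or last-token pooling.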
Slack link: ml-ka.slack.com, channel: #pdg. Please join us -- if you cannot join, please message us here or at mlpaperdiscussiongroupka@gmail.com.

In the Paper Discussion Group (PDG) we discuss recent and fundamental papers in the area of machine learning on a weekly basis. If you are interested, please read the paper beforehand and join us for the discussion. If you have not fully understood the paper, you can still participate – everyone is welcome! You can join the discussion or simply listen in. The discussion is in German or English depending on the participants.

AI Paper Discussion Group Karlsruhe