Improving performance of NLP Encoder Models
Details
Vladimir Ageev
ML Engineer with a focus on Natural Language Processing
www.linkedin.com/in/vladimir-ageev-ds
This talk explores techniques for accelerating the inference of NLP encoder models. It is aimed at specialists working on retrieval-related tasks, such as text search, recommendations, or Retrieval-Augmented Generation (RAG), who want to optimize inference speed on GPUs or CPUs.
