
From research to production: scaling a state-of-the-art machine learning system

Hosted By
Kristian A.

Details

There is a significant knowledge gap in the machine learning industry between research and bringing applications to production. Going from a Jupyter notebook to serving live traffic is not trivial for more complex applications. In this talk, we introduce an application designed to answer questions given in plain text. As is typical of many research systems, it initially consists of multiple independent Python scripts, tools, and models. We’ll implement a production-ready application on Vespa that can scale to virtually any desired level.

During the talk, we’ll give a high-level overview of how such a retrieval-based question-answering system works. This includes classic information retrieval (BM25), modern retrieval using approximate nearest neighbor (ANN) search, and natural language understanding models based on Transformers, such as BERT. We’ll introduce Vespa, the open-source big data serving engine, and show how all of this can be implemented and scaled using techniques such as distillation and quantization.
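To make the classic retrieval step concrete, here is a minimal sketch of BM25 scoring over a toy tokenized corpus. This is an illustration only, not Vespa’s implementation; the `k1` and `b` values are the commonly used defaults, and the corpus is invented for the example.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency of each query term
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores

docs = [
    "vespa serves machine learning models at scale".split(),
    "bm25 is a classic ranking function".split(),
    "transformers changed natural language understanding".split(),
]
print(bm25_scores(["ranking", "bm25"], docs))
```

Only the second document contains the query terms, so it receives the only non-zero score; real engines compute the same formula over an inverted index rather than by scanning all documents.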

After this talk, you will have learned what it takes to build a real-time serving system using the latest AI models in production.

Agenda:
18:00 - 19:00 Background: information retrieval, approximate nearest neighbors, representation learning, transformers.

19:15 - 20:00 Implementation: introduction to Vespa, inference in production vs training, distillation, quantization. Demonstration.

---

Search, or information retrieval, is going through a paradigm shift; some have even called it the “BERT revolution”. The introduction of pre-trained language models like BERT has led to significant advances in the state of the art in search and document ranking. These are objective advances, measured on large information retrieval benchmarks, and large organizations like Google and Microsoft have confirmed that they use BERT for search and question answering.

The BERT revolution has created a significant knowledge and technology gap among industry practitioners. The difference between how large organizations deploy search and information retrieval systems and what smaller organizations, without the knowledge or access to the technology, can achieve has undoubtedly grown since Transformer models arrived.

The meetup will cover:

  • An introduction to the open-source Vespa serving engine and the scale at which we operate Vespa at Yahoo.

  • A quick introduction to baseline search and classic information retrieval techniques, which are also available in popular search engines like Apache Solr and Elasticsearch, both built on the Apache Lucene search library.

  • Pre-trained language models of the Transformer family: a discussion of inference runtime complexity, training versus inference (serving), and scaling model inference with techniques like distillation and quantization.

  • Three ways to represent and use Transformer models for search and document ranking with Vespa, and, most importantly, how not to use pre-trained language models: there are pitfalls we commonly observe in the industry.

  • Scaling production systems that use Transformers: handling query traffic, data volume, indexing freshness, and end-to-end serving latency.

  • Approximate nearest neighbor search in dense vector spaces using Vespa, which enables new and interesting AI serving use cases: not only unstructured text, but also image, video, audio, and multi-modal models that combine multiple data formats into a shared dense vector embedding space.

  • Finally, two demos of applications that are released and open-sourced as Vespa sample applications:

    • State-of-the-art text ranking on the MS MARCO passage ranking dataset, the largest information retrieval dataset on which different retrieval and ranking algorithms can be objectively compared.

    • State-of-the-art open-domain question answering: from ten blue links to answering questions directly.
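As a rough illustration of why quantization helps serving (a toy sketch, not Vespa’s actual code), the snippet below symmetrically quantizes float vectors to int8 and shows that an int8 dot product, rescaled, closely approximates the full-precision result while using a quarter of the memory:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric linear quantization of a float vector to int8."""
    scale = np.abs(x).max() / 127.0  # map the largest magnitude to 127
    q = np.round(x / scale).astype(np.int8)
    return q, scale

rng = np.random.default_rng(1)
a = rng.normal(size=256).astype(np.float32)
b = rng.normal(size=256).astype(np.float32)

qa, sa = quantize_int8(a)
qb, sb = quantize_int8(b)

# int8 dot product, accumulated in int32, rescaled back to float
approx = np.dot(qa.astype(np.int32), qb.astype(np.int32)) * sa * sb
exact = float(np.dot(a, b))
print(exact, approx)
```

In practice, quantized inference trades a small amount of accuracy for substantially faster, cheaper model serving; distillation is complementary, shrinking the model itself before quantizing it.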
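The dense-retrieval idea behind the ANN bullet can be sketched as brute-force cosine search over embedding vectors. Production systems like Vespa use an approximate index (HNSW) instead of scanning everything, but the interface is the same: given a query vector, return the closest document vectors. A toy NumPy version, with invented data:

```python
import numpy as np

def cosine_top_k(query, vectors, k=2):
    """Return indices of the k vectors most similar to query by cosine."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q                      # cosine similarity to every vector
    return np.argsort(-sims)[:k]      # indices of the k best matches

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 8)).astype(np.float32)   # 100 fake embeddings
# a query that is a slightly perturbed copy of embedding 42
query = emb[42] + 0.01 * rng.normal(size=8).astype(np.float32)
print(cosine_top_k(query, emb))
```

This brute-force scan is O(n) per query; approximate indexes like HNSW make the same lookup sub-linear at the cost of occasionally missing the true nearest neighbor.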

Trondheim Big Data
Yahoo! Technologies Norway AS
Prinsens gt. 49 · Trondheim