SEA: Speech

Name: SEA: Speech
Start: 2023-04-21T17:00:00+02:00
End: 2023-04-21T18:00:00+02:00

Organized by ali v.

SEA: Search Engines Amsterdam

Details

In this edition of SEA (brought forward one week because of the holidays) we will discuss speech recognition and synthesis. We have two amazing speakers lined up: Tom Kenter (Google) and Maurits Bleeker (University of Amsterdam).
This will be a hybrid event. The in-person part will take place at Lab42, Science Park, room L3.33.
***
IMPORTANT: You will be able to view the Zoom link once you 'attend' the meetup on this page.
***
17.00: Tom Kenter (Google)
Title: Improving Speech Synthesis by Leveraging Pretrained Language Models
Abstract: When automatically generating speech audio from textual input, it is important to get the prosody right — where, by prosody, we mean phenomena such as intonation, timing, and stress. Syntactic and semantic information about the text to be synthesized can help text-to-speech models to generate speech with natural prosody. In this talk, we will discuss a way of leveraging BERT models to provide speech synthesis models with syntactic and semantic information present in text-only pre-training data.

***
17.30: Maurits Bleeker (University of Amsterdam)
Title: Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition
Abstract: In this talk, I will present an extension for training end-to-end Context-Aware Transformer Transducer (CATT) models, using a simple yet efficient method of mining hard negative phrases from the latent space of the context encoder.
During training, given a reference query, we mine a number of similar phrases using approximate nearest neighbour search. These sampled phrases are then used as negative examples in the context list alongside random and ground truth contextual information. By including approximate nearest neighbour phrases (ANN-P) in the context list, we encourage the learned representation to disambiguate between similar, but not identical, biasing phrases.
This improves biasing accuracy when there are several similar phrases in the biasing inventory. We carry out experiments in a large-scale data regime, obtaining up to 7% relative word error rate reductions for the contextual portion of the test data. We also extend and evaluate the CATT approach in streaming applications.
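
To make the mining step in the abstract concrete, here is a minimal sketch of how a context list could be assembled for one training example. It is not the speaker's implementation: the function name, parameters, and toy 2-D embeddings are hypothetical, and an exact brute-force cosine search stands in for a real approximate nearest neighbour index.

```python
import numpy as np


def build_context_list(query_idx, embeddings, phrases,
                       num_hard=2, num_random=2, rng=None):
    """Assemble [ground truth] + hard negatives + random negatives.

    Hypothetical helper for illustration only. `embeddings` plays the role
    of context-encoder outputs, one row per phrase in `phrases`.
    """
    rng = rng or np.random.default_rng(0)
    emb = np.asarray(embeddings, dtype=float)

    # Cosine similarity of every phrase to the reference phrase.
    # ASSUMPTION: exact search replaces the approximate index here.
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = normed @ normed[query_idx]
    sims[query_idx] = -np.inf  # never mine the ground truth as a negative

    # Hard negatives: the phrases closest to the reference in latent space.
    hard_idx = np.argsort(-sims)[:num_hard].tolist()

    # Random negatives: drawn from whatever has not been used yet.
    used = {query_idx, *hard_idx}
    remaining = [i for i in range(len(phrases)) if i not in used]
    rand_idx = rng.choice(remaining,
                          size=min(num_random, len(remaining)),
                          replace=False)

    return ([phrases[query_idx]]
            + [phrases[i] for i in hard_idx]
            + [phrases[i] for i in rand_idx])


# Toy biasing inventory with made-up 2-D embeddings.
phrases = ["Maurits", "Maurice", "Morris", "Amsterdam", "Google", "Zoom"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3],
              [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

context = build_context_list(0, embeddings, phrases)
print(context)
```

In this toy inventory the near-duplicates "Maurice" and "Morris" are picked as hard negatives for "Maurits", which is the kind of confusion the training signal is meant to target. At scale, the brute-force similarity step would be replaced by an approximate nearest neighbour index.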

Just keep counting: SEA talks #240 and #241.

Related topics

Information Architecture

Science

Apache Solr

Elasticsearch

Technology
