Speech Recognition Israel 5 - ASR Systems in practice
Details
After a long break, in two weeks we are going to have our 5th Meetup!
UPDATE: We are moving the meetup online because of rising Covid cases in Tel Aviv. Please join via this Zoom link at the time of the event:
https://us02web.zoom.us/j/87220269572?pwd=YnBuNTliQ2JKQTdxRG1LSXZDNkJhdz09
While previous meetups focused on the latest ASR research, the goal of this meetup is to show how those systems are being built in practice.
We will be hosting 3 lectures.
Location: Yossef Karo 19, Tel Aviv. Hosted by Chorus.ai
Time: July 5th, 2021. 18:00-20:00
Agenda:
18:00 - 18:15 - Opening
18:15 - 18:30 - Yoav Ramon - A short overview of new ASR libraries emerging today
18:30 - 19:15 - Raphael Cohen - Distant conversational speech recognition
19:15 - 19:20 - Break
19:20 - 20:00 - Tal Rosenwein - End-2-End ASR systems 101
Yoav Ramon, AI Lead @ Hi Auto - "A short overview of new ASR libraries emerging today"
While Kaldi remains the dominant ASR infrastructure among small and medium-sized companies, a shift toward fully Pythonic infrastructure for ASR and speech processing is underway. For now, this shift is mostly confined to the research community and a few large corporations.
Yoav will give a brief overview of the pros and cons of each of the new solutions.
Raphael Cohen, VP Research @ Chorus.ai - "Distant conversational speech recognition"
Natural business conversations recorded from video conferencing with a distant microphone are at the heart of Chorus.ai’s business, as we are set to record and analyze 60M hours of customer-facing conversations this year. Users expect a high-quality transcript that shows who said what; a failure in any part of this task will be perceived as a “bad transcript”. The task comprises multiple sub-tasks: speaker diarization with an unknown number of speakers, speaker identification, conversational speech recognition, and punctuation.
In this talk, Raphael will describe our approach to these problems, how we maintain bleeding-edge quality over time, and some of our current work in this area.
Tal Rosenwein, VP R&D @OrCam - "End-2-End ASR systems 101"
Traditional ASR systems train the acoustic, pronunciation, and language models separately, each with a different objective. These systems are complicated, require a significant amount of task-specific knowledge, and rely on highly accurate human annotations. E2E models have received much attention in recent years due to their SOTA performance while keeping the system relatively simple. Moreover, these models require only audio-transcript pairs, which are much easier to obtain than the annotations their traditional counterparts depend on.
In this talk, Tal will go through the end-to-end training pipeline, list the pros and cons of such systems, and share insights gained over years of R&D at OrCam after deploying SOTA E2E ASR models trained at large scale (using >500K hours of raw training data).
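To give a flavor of why E2E training needs only audio-transcript pairs, here is a minimal sketch of a CTC-style training step. It assumes PyTorch as a stand-in framework and made-up tensor sizes; it is not the speakers' actual code, just an illustration of how the loss aligns frames to tokens without frame-level annotation:

```python
import torch
import torch.nn as nn

# Hypothetical sizes (not from the talk): 50 acoustic frames, batch of 2,
# 20-symbol vocabulary with index 0 reserved as the CTC blank.
T, N, C = 50, 2, 20

# Stand-in for an acoustic model's output; in a real system these logits
# would come from a neural encoder running over the audio.
logits = torch.randn(T, N, C, requires_grad=True)
log_probs = logits.log_softmax(dim=-1)

# Transcript token ids -- the only supervision this objective needs.
targets = torch.randint(1, C, (N, 10))
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

# CTC marginalizes over all frame-to-token alignments internally,
# so no frame-level human annotation is required.
criterion = nn.CTCLoss(blank=0)
loss = criterion(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # gradients flow end-to-end into the acoustic model
```

In a full pipeline this single loss replaces the separately trained acoustic, pronunciation, and language model objectives, which is where the "relatively simple" claim above comes from.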
The meetup recording will also be available after the event at our Facebook group: https://www.facebook.com/groups/461707137729175
