Voices In, Voices Out - ASR, TTS and How Voice AI is Becoming Accessible.

Name: Voices In, Voices Out - ASR, TTS and How Voice AI is Becoming Accessible.
Start: 2023-01-19T19:00:00+08:00
End: 2023-01-19T20:30:00+08:00
Location: Google Developers Space, Singapore

Hosted by Sam W. and Martin A.

Machine Learning Singapore

Details

Over the next few months we plan to run a series of events covering some of the latest announcements, models and research. This month's in-person event kicks it off with some of the most recent advances in models for voices, specifically Automatic Speech Recognition (ASR) and Text-to-Speech generation (TTS). Looking forward to seeing everyone just before Chinese New Year!

Talks:

New Frontiers in TTS - Martin Andrews

Microsoft's VALL-E is a new entrant in the race to get TTS systems across the uncanny valley. Martin will briefly outline how TTS systems have been constructed in the past, and then explain how the newest transformer-based systems (including Google's AudioLM) are tackling the problem by including unsupervised learning.

OpenAI's Whisper and adding to it for a production ASR system - Sam Witteveen

OpenAI's Whisper model has been out for a few months now and has proven to be a winner for cheap, high-quality ASR. Sam will talk about how the model works, additions that can be made such as speaker diarization and generating accurate timestamps, and serving the model in a production app.

Riffusion lightning talk - Rishabh Anand

Stable Diffusion has taken the world by storm, making it easier to generate images (now, video!) through text. What if you could do the same for audio? By first formulating audio generation as image generation and then "decoding" this audio image, Riffusion pieces together an audio sample representative of the text prompts simply through finetuning. Rishabh will briefly explain how Stable Diffusion is repurposed for audio generation, and the different settings in which you can use Riffusion off the shelf.

Talks will start at 7:00pm and end at around 8:30pm, at which point people normally come up to the front for a bit of a chat with each other and the speakers.

***

As always, we're actively looking for more speakers, both for 30-minute long-form talks and for lightning talks. For the lightning talks, we welcome folks to come and talk about something cool they've done with TensorFlow, PyTorch, JAX and/or Deep Learning for 5-10 mins (so, if you have slides, then #max=10). We believe that the key ingredient for the success of a Lightning Talk is simply the cool/interesting factor. It doesn't matter whether you're an expert or an enthusiastic beginner: given the responses we have had, we're sure there are lots of people who would be interested to hear what you've been playing with. Please suggest yourself here:
https://www.meetup.com/Machine-Learning-Singapore/suggestion/
