Machine Learning Advances in Speech Synthesis

Welcome back from summer! Join us for the first meetup of the fall to discuss recent advances in speech synthesis (the artificial generation of human speech) using machine learning. I'll lead an interactive exploration of the most popular speech and audio synthesis technologies and approaches, with demos of them in action and details on how they work and how they compare to each other and to previous work in the field.
Some of the projects we'll review are:
- Apple's Siri deep learning voice generation (https://machinelearning.apple.com/2017/08/06/siri-voices.html)
- Google Tacotron (https://google.github.io/tacotron/)
- DeepMind WaveNet (https://deepmind.com/blog/wavenet-generative-model-raw-audio/)
- Baidu Deep Voice (http://research.baidu.com/deep-voice-production-quality-text-speech-system-constructed-entirely-deep-neural-networks/)
- Baidu Deep Voice 2 (http://research.baidu.com/deep-voice-2-multi-speaker-neural-text-speech/)
We'll also take a brief look at how to use TensorFlow and PyTorch to explore these approaches.
As time allows, we will also share experiences with commercial offerings such as the Amazon Polly text-to-speech service (https://aws.amazon.com/polly/) and the IBM Watson Text to Speech service (https://www.ibm.com/watson/services/text-to-speech/).
We will also spend a few minutes discussing plans for future meetups and gathering everyone's input on the focus areas they would most like to see.
We thank Matthew Lean and OneHudson Ventures for letting us use the Estuari coworking space.

