Musical Scene Analysis and Generative Modeling | Ethan Manilow


Details
London Audio & Music AI Meetup (virtual) - 23 Mar 2022 @ 18.30 (UK time)
We would like to invite you to our Audio & Music AI Meetup.
Featuring Ethan Manilow, PhD candidate at Northwestern University, presenting "Musical Scene Analysis and Generative Modeling".
*Agenda
- 18:25: Virtual doors open
- 18:30: Talk
- 19:15: Q&A
- 19:30: Networking
- 20:30: Close
*Abstract
In this talk, I will provide an overview of some of my recent work on the automatic analysis of musical scenes and generative modeling for musical instrument sounds. I will argue that these two areas, Musical Scene Analysis and Generative Modeling for musical audio, are mutually beneficial, and I will demonstrate this by showcasing projects that blur the line between the two domains. First, I will discuss MIDI-DDSP, an interpretable hierarchical model that enables realistic neural audio generation from MIDI input. Then, turning to Automatic Music Transcription, I will show how adopting the Transformer architecture leads to more flexible and performant multi-instrument transcription systems. Finally, I will present two approaches to source separation that lean heavily on ideas from generative modeling. The first is TagBox, a system that combines OpenAI’s JukeBox and a pre-trained Music Tagger to perform source separation without retraining either model. The second is a separation method that models the relationship between sources autoregressively, built on the assumption that musical sources are interdependent. I hope these projects will highlight the role that Musical Scene Analysis and Generative Modeling can play as we develop the next generation of tools for musical expression.
Relevant papers:
- MIDI-DDSP: https://arxiv.org/abs/2112.09312
- MT3: https://arxiv.org/abs/2111.03017
- TagBox: https://interactiveaudiolab.github.io/assets/papers/tagbox_icassp_V2-1.pdf
- NADE Separation: https://interactiveaudiolab.github.io/assets/papers/genss_icassp2022_cr.pdf
*Bio
Ethan Manilow is a final-year PhD candidate in the Interactive Audio Lab at Northwestern University, studying under Prof. Bryan Pardo. He also currently works as a Student Researcher on the Magenta team at Google Brain. His research centers on building Machine Listening systems that identify and locate the key aspects of music. Previously, he was a research intern in the Speech and Audio Group at Mitsubishi Electric Research Labs (MERL) in Cambridge, MA. He obtained a BS in Physics and a BFA in Jazz Studies (Guitar) from the University of Michigan. These days he lives in Chicago, IL, where he fingerpicks his acoustic guitar and smiles at every dog he passes on the sidewalk.
*Follow Ethan
https://twitter.com/ethanmanilow
*Host
Kobalt Music: https://www.kobaltmusic.com/
*Sponsors
IEEE Signal Processing Society. http://www.signalprocessingsociety.org/