Join us for another virtual BISH Bash meetup, hosted by Professor Bryan Pardo and the Interactive Audio Lab from Northwestern University. We will have 4 talks by 4 PhD students from the Interactive Audio Lab, followed by Q&A and some networking over Zoom video conferencing. We hope to see you all there!
Agenda (Pacific Daylight Time, UTC−7)
6:00 - 6:10pm - Welcome & introduction
6:10 - 6:30pm - Talk 1: Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet
6:30 - 6:50pm - Talk 2: Adversarial Attacks in the Audio Domain with Adaptive Filtering
6:50 - 7:10pm - Talk 3: Deep Learning Tools for Audacity: Helping Researchers Expand the Artist’s Toolkit
7:10 - 7:30pm - Talk 4: Combining Pretrained Music Models for Unsupervised Source Separation and Style Transfer
7:30 - 7:35pm - Wrap-up & shoutouts
7:35 - 8:00pm - Breakout sessions with speakers & networking
Title: Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet
Speaker: Max Morrison
Abstract: Editing the pitch and timing of speech is a fundamental operation for applications such as audio-visual synchronization and speech prosody editing. To date, neural networks have been unable to produce pitch-shifting or time-stretching of speech with subjective quality exceeding digital signal processing (DSP)-based methods. In this talk, we describe Controllable LPCNet, a modified LPCNet vocoder that can perform pitch-shifting and time-stretching of speech from unseen speakers and datasets with subjective quality as good as or better than existing DSP-based methods.
Title: Adversarial Attacks in the Audio Domain with Adaptive Filtering
Speaker: Patrick O’Reilly
Abstract: While deep neural networks achieve state-of-the-art performance on many audio classification tasks, they are known to be vulnerable to adversarial examples: artificially generated perturbations of natural instances that cause a network to make incorrect predictions. In this talk, I will discuss a novel audio-domain adversarial attack that modifies benign audio using an interpretable and differentiable parametric transformation, adaptive filtering. Unlike existing state-of-the-art attacks, this method does not require a complex optimization procedure or generative model, relying only on a simple variant of gradient descent to tune filter parameters. The talk will include a review of existing audio-domain adversarial attacks, focusing on ways in which attackers can use domain-specific techniques and the properties of human auditory perception to their advantage.
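To give a flavor of the idea (this is a toy illustration, not the speaker's actual method or models): the sketch below uses gradient descent on a single coefficient of a one-pole low-pass filter so that a made-up "detector" based on high-frequency energy lowers its score, while the audio is only filtered, never arbitrarily perturbed. The filter, detector, and step sizes are all invented for illustration.

```python
import math

def lowpass(x, a):
    """One-pole low-pass filter: y[n] = a*x[n] + (1 - a)*y[n-1]."""
    y, prev = [], 0.0
    for s in x:
        prev = a * s + (1.0 - a) * prev
        y.append(prev)
    return y

def detector_score(x):
    """Toy 'classifier': average high-frequency energy of the signal."""
    return sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x))) / (len(x) - 1)

def attack(x, steps=200, lr=0.5, eps=1e-4):
    """Gradient descent on the filter coefficient to lower the score."""
    a = 0.9  # start near a mild, almost-transparent filter
    for _ in range(steps):
        # Finite-difference estimate of d(score)/d(a)
        grad = (detector_score(lowpass(x, a + eps))
                - detector_score(lowpass(x, a - eps))) / (2 * eps)
        a = min(max(a - lr * grad, 0.05), 1.0)  # keep a in a valid range
    return a

# A test signal with strong high-frequency content
x = [math.sin(2.5 * n) for n in range(256)]
a = attack(x)
before = detector_score(x)
after = detector_score(lowpass(x, a))
```

Because the perturbation is constrained to a parametric filter, the attacked audio remains a plausibly filtered version of the original, which is one intuition behind using interpretable transformations rather than unconstrained noise.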
Title: Deep Learning Tools for Audacity: Helping Researchers Expand the Artist’s Toolkit
Speaker: Hugo Flores Garcia
Abstract: In this talk, I present a software framework that lets deep learning practitioners easily integrate their own PyTorch models into Audacity, a free and open-source DAW. This creates a pipeline for ML audio researchers and developers to put tools in the hands of artistic creators without the need to do DAW-specific development work, without having to learn how to create a VST plugin, and without having to maintain a server to deploy their models. We hope that this work fosters a new level of interactivity between deep learning practitioners and end-users.
Title: Combining Pretrained Music Models for Unsupervised Source Separation and Style Transfer
Speaker: Ethan Manilow
Abstract: This talk will focus on an unsupervised method that uses large, pretrained music models for audio-to-audio tasks, like source separation and style transfer, all without any retraining. Inspired by the popular VQGAN+CLIP combination for making generative visual art, I accomplish audio tasks by pairing OpenAI's Jukebox with a pretrained music tagger in a similar manner. I will showcase some fun and interesting results, contextualize this method within the rest of the literature, and discuss my excitement about the vast potential that lies relatively untapped in these large pretrained models.