Building AI models for sound recognition with less annotation effort


Details
Agenda
6:30 – 6:40 Intro
6:40 – 7:40 Building AI models for sound recognition with less annotation effort
7:40 – 7:55 Q&A
7:55 – 8:00 Adjourn
Abstract
Sound is one of the most important media for understanding the environment around us. Identifying a sound event (such as a police siren, a dog bark, or a creaking door) leads to a better understanding of the context in which that event occurred. An AI system that automatically recognizes sound events can understand our environment through sound the way humans do. However, building a sound recognition model typically requires a large amount of carefully labeled audio data: a human annotator needs to listen to a lengthy audio recording and add text labels along with temporal information (the onset and offset of each event) within the recording.
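To make the two labeling styles concrete, here is a minimal sketch (in Python, with hypothetical names; not material from the talk) contrasting a strongly labeled event, which carries onset and offset times, with a weak clip-level label that only says which events occur somewhere in the clip:

```python
from dataclasses import dataclass

@dataclass
class StrongLabel:
    """One annotated event within a recording (strong label)."""
    event: str      # e.g., "dog_bark"
    onset: float    # event start time in seconds
    offset: float   # event end time in seconds

# Strong labeling: every event is localized in time,
# which is slow and tedious for a human annotator.
strong_annotations = [
    StrongLabel("police_siren", onset=3.2, offset=9.8),
    StrongLabel("dog_bark", onset=14.5, offset=15.1),
]

# Weak labeling: only which events occur somewhere in the clip,
# which is far faster for a human to produce.
weak_annotation = {"police_siren", "dog_bark"}
```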
In this talk, Bongjun will introduce his past research on ways to reduce this human annotation effort. First, he will talk about deep learning models for sound event detection and classification that can be trained on less precisely annotated data, which takes much less time to collect. Second, he will introduce a human-in-the-loop system for sound event annotation that helps human annotators collect sound events of interest quickly.
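One common approach to learning from such weak labels (a general technique from the literature, not necessarily the specific models in this talk) is multiple-instance-style training: the model predicts event probabilities per frame, pools them over time into a clip-level prediction, and is supervised only by clip-level labels. A minimal PyTorch sketch, with hypothetical shapes and layer sizes:

```python
import torch
import torch.nn as nn

class WeaklySupervisedDetector(nn.Module):
    """Frame-level event detector trainable with clip-level labels only."""

    def __init__(self, n_mels=64, n_classes=10):
        super().__init__()
        # Per-frame classifier over log-mel features
        # (a stand-in for a real CNN/RNN front end).
        self.frame_model = nn.Sequential(
            nn.Linear(n_mels, 128), nn.ReLU(), nn.Linear(128, n_classes)
        )

    def forward(self, mel):                          # mel: (batch, time, n_mels)
        frame_logits = self.frame_model(mel)         # (batch, time, n_classes)
        frame_probs = torch.sigmoid(frame_logits)
        # Max pooling over time: the clip contains an event iff some frame does.
        clip_probs = frame_probs.max(dim=1).values   # (batch, n_classes)
        return frame_probs, clip_probs

model = WeaklySupervisedDetector()
mel = torch.randn(8, 200, 64)                    # a batch of short clips (hypothetical)
targets = torch.randint(0, 2, (8, 10)).float()   # clip-level multi-hot labels
_, clip_probs = model(mel)
loss = nn.functional.binary_cross_entropy(clip_probs, targets)
loss.backward()
```

At inference time, the per-frame probabilities can be thresholded to recover approximate onsets and offsets even though none were ever annotated.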
Bio:
Bongjun is a data scientist and a member of the AI Lab at 3M in Minnesota. He completed his PhD in Computer Science at Northwestern University as a member of the Interactive Audio Lab. His research interests include machine learning, audio signal processing (e.g., sound event recognition), intelligent interactive systems, multimedia information retrieval, and human-in-the-loop interfaces. He also enjoys working on musical interfaces and interactive media art.
Website: https://www.bongjunkim.com/
