Understanding urban scenes through sound and video | Magdalena Fuentes

Hosted By
Sertan S. and 3 others
Details

London Audio & Music AI Meetup (virtual) - 4 May 2022 @ 18.30 (BST)

We would like to invite you to our Audio & Music AI Meetup.
Featuring Magdalena Fuentes (Postdoctoral Faculty Fellow at New York University), presenting "Understanding audio-visual urban scenes through sound and video."

Agenda:

  • 18:25: Virtual doors open
  • 18:30: Talk
  • 19:15: Q&A
  • 19:30: Networking
  • 20:30: Close

Abstract
I will talk about my recent work on automatically understanding audio-visual urban scenes through sound and video. I will start by discussing the challenges of this area as well as its many potential applications, such as assistive devices for the hearing-impaired, the quantification of traffic for policy making, or autonomous driving. Then, I will argue that audio-visual information is beneficial for the understanding of real-world scenes, as the visual and acoustic modalities provide complementary information: images help identify sources and understand their motion, while audio helps determine the presence of relevant off-screen and occluded sounding objects. I will present our recent audio-visual dataset Urbansas, which addresses some of the current gaps in urban audio-visual research, and I will discuss our audio benchmark for vehicle sound event detection and localization, which uses video to indicate the vehicles' positions. Finally, I will present RCGrad, our zero-shot model for the localization of sounding objects in images, and I will show some examples of what these models actually learn in different audio-visual domains. I will conclude with some of my thoughts on open problems in visual sound localization and urban scene analysis moving forward.

Relevant papers:

  • Urban Sound & Sight: Dataset and Benchmark for Audio-Visual Urban Scene Understanding, IEEE Xplore
  • How to Listen? Rethinking Visual Sound Localization, arXiv

Bio
Magdalena Fuentes is a Postdoctoral Faculty Fellow at New York University (NYU), with appointments in the Center for Urban Science and Progress (CUSP) at the Tandon School of Engineering and the Music and Audio Research Laboratory (MARL) at the Steinhardt School of Culture, Education, and Human Development. She works in machine listening; her research interests include representation learning and self-supervised learning for audio-visual data, music information retrieval, environmental sound analysis, sound source localization, and human-centered machine listening. Before joining NYU, she completed her Ph.D. at Université Paris-Saclay in France and her B.Eng. in Electrical Engineering at Universidad de la República in Uruguay. Magdalena is a member of the IEEE Audio and Acoustic Signal Processing Technical Committee; she has been involved in the organization of ISMIR and has served as Area Chair for ICASSP and Program Chair for DCASE, among other roles.

*Follow Magdalena
https://twitter.com/mfu3ntes

*Host
Kobalt Music: https://www.kobaltmusic.com/

*Sponsors
IEEE Signal Processing Society. http://www.signalprocessingsociety.org/
