Multimodal & Temporal Dynamics using Deep Graphical Models

Hosted By
Rahel J.

Details

I am pleased to host Mohamed Amer and Ajay Divakaran, both from SRI International, an independent, nonprofit research center located in Princeton, NJ.

Abstract: Graphical models are an extremely versatile tool for machine learning. One class of models we focus on is Restricted Boltzmann Machines (RBMs). RBMs are intrinsically modular and can be composed into more sophisticated models. Unfortunately, software implementations often fail to exploit this modularity, which makes writing a new model a lengthy and error-prone task. In this talk, we present our modular implementation of graphical models and its applications in a variety of temporal, multimodal, and multitask domains. These applications include multimodal gesture recognition and body affect classification from motion capture and audio, face attribute classification from imagery and facial landmarks, and GPU event prediction from a history of instructions.

Bios: Mohamed Amer is a senior computer scientist at the Center for Vision Technologies, SRI International, Princeton. His research interests include deep generative models for computer vision, interpretable deep learning, deep temporal models and their applications, and joint computer-human storytelling. He has several publications to his credit and has organized a number of workshops at venues including CVPR, ICCV, and ACM MM.

He received his Ph.D. in 2014 from Oregon State University.

Ajay Divakaran is the technical director of the Vision and Learning Laboratory in SRI International’s Center for Vision Technologies. In this role, he is responsible for the proposal and execution of contract research projects in computer vision as well as multi-sensor systems that combine various modalities. His work includes social multimedia (video-audio-text) analytics, multimodal modeling and analysis of affective, cognitive, and physiological aspects of human behavior, interactive virtual reality-based training, applied machine learning, tracking of individuals in dense crowds and multi-camera tracking, and audio analysis for event detection in open-source video. Over the course of his career, he has developed several innovative technologies for multimodal systems in both commercial and government programs. Prior to joining SRI, he worked at Mitsubishi Electric Research Labs for 10 years, where he was the lead inventor of the world’s first sports highlights playback-enabled DVR. He was named a Fellow of the IEEE in 2011 for his contributions to multimedia content analysis.

He received his M.S. and Ph.D. degrees in electrical engineering from Rensselaer Polytechnic Institute. His B.E. in electronics and communication engineering is from the University of Jodhpur in India.

Economics and Big Data
NYU Courant Institute
251 Mercer St., Room 109 · New York, NY