March 26 - Advances in AI at Northeastern University

Start: 2026-03-26T18:00:00+02:00
End: 2026-03-26T20:00:00+02:00

Hosted by Jimmy G.

Computer Vision Israel Meetup

Details

Join us to hear about the latest advances in AI at Northeastern University!

Date, Time and Location

March 26, 2026
9 - 11 AM Pacific
Online. Register for the Zoom!

Scalable and Efficient Deep Learning: From Understanding to Generation

In an era where model complexity and deployment constraints increasingly collide, achieving both scalability and efficiency in deep learning has become essential. Scalable and efficient deep learning ensures that powerful models can be trained, deployed, and adapted under limited computational and data resources, enabling broader accessibility and practical application. From understanding to generation, this talk unifies methods that cut costs while preserving capability.

About the Speaker

Yitian Zhang is a fifth-year PhD student at Northeastern University, advised by Prof. Yun Raymond Fu. His research centers on Efficient and Scalable AI, spanning Generative Models, Multimodal Large Language Models, and Foundation Models.

Grounding Visual AI Models in Real-World Physics

Generative video models have made rapid progress in visual realism, yet they frequently violate basic physical laws, producing implausible motion and incorrect cause-effect relationships. This talk presents MoReGen, a physics-grounded, agentic text-to-video generation framework that integrates Newtonian physics directly into the generation process via executable physics-engine code.

By coupling vision–language models with trajectory-based physical evaluation and iterative feedback, MoReGen produces videos that are both visually coherent and physically consistent. We further introduce MoRe Metrics and MoReSet, a benchmark and dataset designed to evaluate physics fidelity beyond appearance-based metrics such as FID and FVD. Together, this work demonstrates a path toward visual AI systems that reason about motion, interaction, and causality in the real world rather than hallucinating them.

About the Speakers

Professor Sarah Ostadabbas is an Associate Professor of Electrical and Computer Engineering at Northeastern University, where she directs the Augmented Cognition Lab (ACLab) and serves as Director of Women in Engineering. Her research focuses on computer vision and machine learning, with an emphasis on motion-centric representation learning, small-data AI, and applications in healthcare, defense, and behavior understanding under privacy and data constraints. She has authored over 130 peer-reviewed publications and received numerous honors, including the NSF CAREER Award, Sony Faculty Innovation Award, and the Cade Prize for Inventivity, along with multiple industry and federal research awards.

Xiangyu Bai is a third-year PhD student in the ACLab and leads the lab's work on physics-aware visual intelligence, with several publications in top-tier computer vision and robotics conferences.

WorldFormer: Diffusion Transformer World Models with Mixture-of-Experts for Embodied Physical Intelligence

World models have emerged as a foundational paradigm for enabling agents to simulate, predict, and reason about complex environments. Recent advances driven by diffusion transformer (DiT) architectures have dramatically expanded the fidelity, scalability, and physical plausibility of learned world models. In this work, we present a world model framework built upon the diffusion transformer paradigm, following the design philosophy of state-of-the-art systems such as NVIDIA Cosmos. Our approach comprises three core components: (1) a spatiotemporal variational autoencoder (VAE) that compresses high-resolution video into a compact continuous latent space with strong temporal causality, enabling efficient encoding and decoding of long-horizon video sequences; (2) a transformer-based diffusion backbone that operates on 3D-patchified latent tokens, leveraging self-attention and cross-attention with text embeddings to iteratively denoise Gaussian noise into physically coherent future video states using a flow matching objective; and (3) a scalable pre-training and post-training pipeline that first trains a generalist world foundation model on large-scale, diverse video data and then specializes it to target physical AI domains — such as robotic manipulation, autonomous driving, or embodied navigation — through task-specific fine-tuning.

Our model supports both text-to-world and video-to-world generation, enabling action-conditioned future state prediction for downstream planning and policy learning. We discuss implications for synthetic data generation, sim-to-real transfer, and the integration of world models into vision-language-action (VLA) pipelines for physical AI.
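The flow matching objective named in the abstract can be illustrated with a minimal sketch. It assumes a linear (rectified-flow style) path between a Gaussian noise sample and a data sample, which is one common choice and not necessarily the one used in WorldFormer. The function names `flow_matching_pair` and `flow_matching_loss` are hypothetical, and plain Python lists stand in for the video latent tokens.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def flow_matching_pair(data: Vector, noise: Vector, t: float) -> Tuple[Vector, Vector]:
    """Build one training pair for a linear noise-to-data path (an assumption).

    x_t      = (1 - t) * noise + t * data   (point on the path at time t)
    velocity = data - noise                 (constant target velocity)
    """
    x_t = [(1.0 - t) * n + t * d for n, d in zip(noise, data)]
    velocity = [d - n for n, d in zip(noise, data)]
    return x_t, velocity


def flow_matching_loss(
    predict_velocity: Callable[[Vector, float], Vector],
    data: Vector,
    noise: Vector,
    t: float,
) -> float:
    """Mean squared error between predicted and target velocity at time t."""
    x_t, target = flow_matching_pair(data, noise, t)
    predicted = predict_velocity(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(predicted, target)) / len(target)
```

In a real system, `predict_velocity` would be the transformer backbone conditioned on text embeddings, and `t` and `noise` would be sampled fresh for every training step. At generation time, the learned velocity field is integrated from noise toward data over several steps.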

About the Speaker

Yanzhi Wang joined the Electrical & Computer Engineering department in August 2018 as an Assistant Professor. He earned his PhD at the University of Southern California. His research interests include energy-efficient and high-performance implementations of deep learning and artificial intelligence systems; neuromorphic computing and non-von Neumann computing paradigms; cyber-security in deep learning systems; and emerging deep learning algorithms and systems such as Bayesian neural networks, generative adversarial networks (GANs), and deep reinforcement learning.

Physical AI Research (PAIR) Center: Foundational Pairing of Digital Intelligence & Physical World Deployment at Northeastern University and Beyond

The Physical AI Research (PAIR) initiative advances the next frontier of artificial intelligence: enabling systems that can perceive, reason, and act reliably in the physical world. By uniting expertise across engineering, computer science, health sciences, and the social sciences, PAIR develops safe, transparent, and human-aligned AI that bridges digital models with real-world dynamics. The initiative is organized around three intellectual pillars: Learning and Modeling the World, through physics-informed multimodal learning, realistic simulations, and digital twins; Reasoning in the World, by integrating multimodal evidence to support grounded decision-making under uncertainty; and Acting in the World, by ensuring AI systems are verifiable, explainable, energy-efficient, and trustworthy. Together, these efforts position Physical AI as a foundational science driving innovation in health, sustainability, and security.

About the Speaker

Edmund Yeh is the Department Chair of Electrical and Computer Engineering at Northeastern University.

Sponsors

Versatile

Cloudinary

Healthy.io

LEO pharma
