Details

Join the Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

Register to reserve your spot

Date, Time and Location

Apr 22, 2026
5:30 - 8:30 PM

Impact Hub Munich
Gotzinger Str. 8
München, Germany

Learning Disentangled Motion Representations for Open-World Motion Transfer

Recent progress in image- and text-to-video generation has made it possible to synthesize visually compelling videos, yet these models typically lack an explicit, reusable notion of motion. In this talk, I will present recent work on learning high-level, content-independent motion representations directly from open-world video data, with a focus on our NeurIPS spotlight paper introducing DisMo.

By disentangling motion semantics from appearance and object identity, such representations enable open-world motion transfer across semantically unrelated entities and provide a flexible interface for adapting and fine-tuning modern video generation models. Beyond generation, I will discuss how abstract motion representations support downstream motion understanding tasks and why they offer a promising direction for more controllable, general, and future-proof video models. The talk will conclude with a broader perspective on the opportunities and challenges of motion-centric representations in computer vision and video learning.
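For intuition only, here is a hypothetical sketch of the kind of interface such a disentangled representation enables. This is not DisMo's actual API; every name below is made up. The idea is that a content-independent motion code extracted from one video can be recombined with the appearance of a semantically unrelated target.

    # Hypothetical interface sketch -- not DisMo's actual API.
    # A content-independent motion code extracted from a source video
    # is recombined with the appearance of an unrelated target.
    import torch

    def transfer_motion(motion_encoder, video_generator, source_video, target_image):
        """Transfer motion from source_video onto target_image.

        motion_encoder : maps a video tensor (T, C, H, W) to a motion code
                         that carries motion semantics but no appearance.
        video_generator: synthesizes a video from an appearance reference
                         plus a motion code.
        """
        with torch.no_grad():
            motion_code = motion_encoder(source_video)  # e.g. (T, D), appearance-free
        return video_generator(appearance=target_image, motion=motion_code)

Because the motion code carries no appearance information, the same code can in principle drive generators for very different entities, which is what makes the representation a reusable control interface rather than a per-model conditioning trick.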

About the Speaker

Thomas Ressler-Antal is a PhD student at the Computer Vision & Learning Lab at LMU Munich, advised by Björn Ommer. His research focuses on representation learning from large-scale, open-world video data, with an emphasis on disentangling motion from appearance. He is particularly interested in motion understanding, video generation, and transferable representations that enable controllable and general-purpose video models. His work on learning abstract motion representations from raw video was published as a spotlight paper at NeurIPS.

Towards Generating Fully Navigable 3D Scenes

3D world generation is a longstanding goal of computer vision with applications in VR, gaming, film, robotics, and digital twins. Recent progress in generative models, in particular image and video diffusion models, enables automatic generation of photorealistic 3D environments. This talk describes a simple yet effective framework for exploiting these models for 3D scene generation. We'll briefly cover early approaches (Text2Room, ViewDiff) and then dive deep into our recent state-of-the-art approach, WorldExplorer.

About the Speaker

Lukas Höllein is a PhD student at the Visual Computing & Artificial Intelligence Lab at the Technical University of Munich, supervised by Prof. Dr. Matthias Nießner. His research lies at the intersection of computer vision/graphics and machine learning, focusing mostly on 3D reconstruction and generation. He is especially interested in creating fully navigable 3D worlds with the help of generative AI.

Finding Motion in Commotion: Estimating and Anticipating Motion in Everyday Visual Scenes

Motion is an intrinsic property of video data. How do we harness motion from the abundance of videos to advance vision foundation models? This talk will examine key challenges and emerging opportunities in motion estimation and motion-aware representation learning at scale. Drawing on our latest results from NeurIPS and ICCV, the talk will show how motion-centric learning can enable more versatile and generalisable vision foundation models.

About the Speaker

Nikita Araslanov is a postdoctoral researcher in the Computer Vision Group at TU Munich. His research focuses on semantic and 3D visual inference from video data, with the goal of bridging visual perception and reasoning about complex phenomena. He earned his PhD in Computer Science from TU Darmstadt (2022) and was a visiting researcher at Google (2024–2025).

Small Models, Big Intelligence: How vLLM Semantic Router Uses Sub-2B Language Models for Production-Scale Routing

The vLLM Semantic Router introduces a groundbreaking approach to intelligent LLM request routing through its MoM (Mixture of Models) family, a collection of specialized small language models that make split-second routing decisions for production systems. This system operates between users and models, capturing signals from requests, responses, and context to make intelligent routing decisions, including model selection, safety filtering (jailbreak, PII), semantic caching, and hallucination detection.

In this talk, we'll explore how the router leverages tiny but powerful models like ModernBERT (encoder-based) and Qwen3 (0.6B-1.7B parameter decoder models) to achieve sub-10ms latency classification at over 10,000 queries per second. We'll dive into the technical architecture showing how these small models handle domain classification, jailbreak detection, PII protection, and hallucination detection, proving that for routing intelligence, size isn't everything.
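To make the routing idea concrete, here is a minimal, hypothetical sketch of encoder-based request routing. This is not the vLLM Semantic Router's actual code; the checkpoint name and backend model names are placeholders standing in for a fine-tuned classifier and deployed models.

    # Minimal sketch of encoder-based routing -- illustrative only, not the
    # vLLM Semantic Router codebase. A small fine-tuned encoder classifies
    # the request domain, and the label selects a backend model.
    from transformers import pipeline

    classifier = pipeline(
        "text-classification",
        model="path/to/finetuned-modernbert-domain-classifier",  # placeholder checkpoint
    )

    # Hypothetical mapping from predicted domain to backend model.
    ROUTES = {
        "code": "backend-code-model",
        "math": "backend-math-model",
        "general": "backend-general-model",
    }

    def route(prompt: str) -> str:
        label = classifier(prompt)[0]["label"]  # single forward pass, milliseconds
        return ROUTES.get(label, ROUTES["general"])

    print(route("Write a binary search in Rust"))

The key design point is that a single forward pass through a sub-2B encoder is cheap enough to sit on the hot path of every request, which is what allows classification-driven routing at the latencies and throughputs the talk describes.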

About the Speaker

Peter Bouda is an AI Engineer and tech leader with over 20 years of experience building cutting-edge solutions across AI/ML, NLP, and full-stack development. Currently architecting AI platforms at EY, he previously led the AI Lab at Apiax, where he built sophisticated NLP stacks for regulatory compliance in the financial services industry, successfully raised over €1.5 million in EU R&D funding, and deployed production-grade microservices on Azure Kubernetes. Outside of work, he regularly embarks on long-distance cycling trips, discovering new roads and trails in Portugal and Spain.

Data Foundations for Vision-Language-Action Models

Model architectures get the papers, but data decides whether robots actually work. This talk introduces VLAs from a data-centric perspective: what makes robot datasets fundamentally different from image classification or video understanding, how the field is organizing its data (Open X-Embodiment, LeRobot, RLDS), and what evaluation benchmarks actually measure. We'll examine unique challenges such as temporal structure, proprioceptive signals, and embodiment heterogeneity, and discuss why addressing them matters more than the next architectural innovation.
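As a rough illustration of those challenges, a robot episode bundles time-ordered observations, proprioception, and actions for a specific embodiment. The field names below are hypothetical, loosely inspired by formats such as RLDS and LeRobot, and only meant to show how this differs from an unordered image dataset.

    # Illustrative sketch of robot episode structure -- field names are
    # hypothetical, loosely modeled on formats like RLDS / LeRobot.
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Step:
        image: np.ndarray            # camera observation, e.g. (H, W, 3)
        joint_positions: np.ndarray  # proprioceptive signal, e.g. (7,)
        action: np.ndarray           # commanded action, e.g. (7,)
        is_terminal: bool            # episode boundary marker

    @dataclass
    class Episode:
        steps: list[Step]            # temporal structure: order matters
        embodiment: str              # e.g. "franka_panda" -- varies across datasets
        language_instruction: str    # task specification paired with the trajectory

Unlike an image classification sample, none of these fields stands alone: shuffling steps destroys the temporal signal, and mixing embodiments changes the meaning of both proprioception and actions, which is exactly the heterogeneity problem the talk addresses.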

About the Speaker

Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in VLMs, Visual Agents, Document AI, and Physical AI.
