
About us
BayNode is a community-focused Node.js meetup in Mountain View. We meet for Node Nights (talks, food & drinks) and Beer.node gatherings (informal socializing).
Each Node Night features 2-3 talks relevant to the Node.js ecosystem. When possible, we prioritize talks from our own members over outside speakers, regardless of topic or expertise level.
If you want to help, we are always looking for contributors.
Upcoming events
May 11 - Best of 3DV 2026
Welcome to the Best of 3DV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.
Date, Time and Location
May 11, 2026
9AM Pacific
Online. Register for Zoom!
Navigating a 3D Vision Conference with VLMs and Embeddings
Attending the 3D Vision Conference means confronting 177 accepted papers across 3.5 days, far more than any one person can absorb. Skimming titles the night before isn't enough.
This talk shows how to build a systematic, interactive map of an entire conference using modern open-source tools. We load all 177 papers from 3DV 2026 (full PDF page images plus metadata) into a FiftyOne grouped dataset. We then run three annotation passes using Qwen3.5-9B on each cover page: topic classification, author affiliation extraction, and project page detection. Document embeddings from Jina v4 are computed across all 3,019 page images, pooled to paper-level vectors, and fed into FiftyOne Brain for UMAP visualization, similarity search, representativeness scoring, and uniqueness scoring.
The result is an interactive dataset you can query, filter, and explore in the FiftyOne App. Sort by uniqueness to find distinctive work, filter by topic and sort by representativeness to understand each research area, and cross-reference with scheduling metadata to build a personal agenda.
I demonstrate the end-to-end pipeline and discuss design decisions regarding grouped datasets, reasoning model output parsing, and embedding pooling strategies.
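For readers who want to try this pattern themselves, here is a minimal sketch of the FiftyOne Brain portion of such a pipeline. It is illustrative, not the speaker's actual code: it assumes a pre-built papers list (hypothetical) carrying cover-page image paths, a topic label from the VLM pass, and per-page document embeddings, and it mean-pools pages into paper-level vectors before indexing.

import numpy as np
import fiftyone as fo
import fiftyone.brain as fob

# papers: assumed pre-built list of dicts (hypothetical), one per paper,
# with a cover image path, a topic label, and per-page embeddings
dataset = fo.Dataset("3dv-2026-papers")
for paper in papers:
    sample = fo.Sample(filepath=paper["cover_image"])
    sample["topic"] = paper["topic"]
    # Mean-pool per-page embeddings into a single paper-level vector
    sample["embedding"] = np.mean(paper["page_embeddings"], axis=0).tolist()
    dataset.add_sample(sample)

embeddings = np.array(dataset.values("embedding"))

# UMAP map, similarity index, and uniqueness/representativeness scores
fob.compute_visualization(dataset, embeddings=embeddings, method="umap", brain_key="umap")
fob.compute_similarity(dataset, embeddings=embeddings, brain_key="sim")
fob.compute_uniqueness(dataset, embeddings=embeddings)
fob.compute_representativeness(dataset, embeddings=embeddings)

# Browse the most distinctive papers first in the App
session = fo.launch_app(dataset.sort_by("uniqueness", reverse=True))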
About the Speaker
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in VLMs, Visual Agents, Document AI, and Physical AI.
Seeing Through Clutter: Structured 3D Scene Reconstruction via Iterative Object Removal
We present SeeingThroughClutter, a method for reconstructing structured 3D representations from single images by segmenting and modeling objects individually. Prior approaches rely on intermediate tasks such as semantic segmentation and depth estimation, which often underperform in complex scenes, particularly in the presence of occlusion and clutter.
We address this by introducing an iterative object removal and reconstruction pipeline that decomposes complex scenes into a sequence of simpler subtasks. Using VLMs as orchestrators, foreground objects are removed one at a time via detection, segmentation, object removal, and 3D fitting. We show that removing objects allows for cleaner segmentations of subsequent objects, even in highly occluded scenes. Our method requires no task-specific training and benefits directly from ongoing advances in foundation models. We demonstrate state-of-the-art robustness on 3D-Front and ADE20K datasets.
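As a rough sketch of the loop the abstract describes (not the authors' code; detect_next_object, segment_object, fit_3d_model, and inpaint_remove are hypothetical stand-ins for the VLM-orchestrated foundation models):

def reconstruct_scene(image):
    # Hypothetical sketch of iterative object removal; each helper is a
    # placeholder for a foundation model chosen by the VLM orchestrator
    objects_3d = []
    current = image
    while True:
        obj = detect_next_object(current)  # pick the frontmost remaining object
        if obj is None:
            break
        mask = segment_object(current, obj)  # masks get cleaner as occluders vanish
        objects_3d.append(fit_3d_model(current, mask, obj))
        current = inpaint_remove(current, mask)  # reveal previously occluded regions
    return objects_3d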
About the Speaker
Rio Aguina-Kang is currently a Machine Learning Engineer at Drafted AI, a startup focused on generative architecture. He has previously worked at Adobe Research, Brown Visual Computing, and the Stanford Institute for Human-Centered Artificial Intelligence. He is broadly interested in building systems that let users generate and control visual content through structured representations that reflect their intent.
Physically Realistic 4D Generation
Generating dynamic 3D content that moves and deforms over time is a key frontier in visual computing, with applications in VR/AR, robotics, and digital humans. In this talk, I present our series of works on physically realistic 4D generation: from neural surface deformation with explicit velocity fields (ICLR 2025) to our 4Deform framework for robust shape interpolation (CVPR 2025). Both methods use implicit neural representations with physically constrained velocity fields that enforce volume preservation, spatial smoothness, and geometric consistency. I will also introduce TwoSquared (3DV 2026, oral), which achieves full 4D generation from just two 2D image pairs — demonstrating a practical path toward controllable, physically plausible 4D content creation.
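For intuition, the "physically constrained velocity field" idea can be written in its generic textbook form (not necessarily the exact losses used in these papers): a deformation driven by a velocity field preserves volume exactly when the field is divergence-free, with smoothness enforced by a gradient penalty.

\frac{\partial x}{\partial t} = v(x, t),
\qquad
\nabla \cdot v(x, t) = 0,
\qquad
\mathcal{L}_{\mathrm{smooth}} = \lVert \nabla v \rVert_2^2

The first equation advects points along the learned field, the divergence-free condition is the incompressibility (volume-preservation) constraint, and the last term regularizes the field spatially.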
About the Speaker
Lu Sang is a PhD researcher in Computer Vision at TU Munich (Prof. Daniel Cremers), specializing in 3D/4D reconstruction, neural implicit surfaces, and inverse rendering, with several publications at top venues including CVPR, ICLR, and ECCV. She is currently a research intern at Google XR in Zurich. With a strong mathematical foundation and a track record spanning photometric stereo to 4D generation, she brings both theoretical depth and hands-on engineering to cutting-edge visual computing research.
Finding NeMO: A Geometry-Aware Representation of Template Views for Few-Shot Perception
How can we perceive and use objects given only a few images, without training a new model? We present NeMO, a novel object representation that enables 6DoF object pose estimation, detection, and segmentation given only a handful of RGB images of an unknown object.
About the Speaker
Sebastian Jung studied physics at LMU Munich. He started his PhD in Computer Science at the German Aerospace Center (DLR) in 2025, where he works on object-centric few-shot perception for robotic applications. Additionally, he is a student researcher at Google, focusing on computer vision algorithms for XR.

May 12 - Best of 3DV 2026
Date, Time and Location
May 12, 2026
9AM Pacific
Online. Register for Zoom!
Precise lighting control in diffusion models by drawing shadows
Diffusion models can now be used as powerful neural rendering engines that can be leveraged for realistically inserting virtual objects into images. However, unlike traditional 3D rendering engines (e.g., Blender), they lack precise control over the lighting, an essential requirement in an artistic workflow. We demonstrate that fine-grained lighting control can be achieved for object relighting simply by specifying the desired shadow of the object and injecting it into the diffusion denoising process. The model then produces a realistic relighting of the object consistent with the input shadow direction.
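A hedged sketch of what "injecting a shadow into the denoising process" could look like in code; the function names, the scheduler interface, and the blending step are placeholder assumptions, not the paper's actual method.

def relight_with_shadow(x_t, shadow_mask, model, scheduler):
    # Hypothetical guidance loop: blend the user-drawn shadow into each
    # denoising step so the model resolves the image consistently with it
    for t in scheduler.timesteps:
        noise_pred = model(x_t, t)                # standard noise prediction
        x_t = scheduler.step(noise_pred, t, x_t)  # standard denoising update
        x_t = blend_shadow(x_t, shadow_mask, t)   # placeholder injection step
    return x_t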
About the Speaker
Frédéric Fortier-Chouinard is a PhD student at Laval University, advised by Prof. Jean-François Lalonde. His research focuses on adding precise physical controls to diffusion-based image and video generation methods, in particular lighting and camera control.
SmokeSeer: 3D Gaussian Splatting for Smoke Removal and Scene Reconstruction
Smoke in real-world scenes can severely degrade image quality and hamper visibility. Recent image restoration methods either rely on data-driven priors that are susceptible to hallucinations, or are limited to static low-density smoke. We introduce SmokeSeer, a method for simultaneous 3D scene reconstruction and smoke removal from multi-view video sequences. Our method uses thermal and RGB images, leveraging the reduced scattering in thermal images to see through smoke. We build upon 3D Gaussian splatting to fuse information from the two image modalities, and decompose the scene into smoke and non-smoke components. Unlike prior work, SmokeSeer handles a broad range of smoke densities and adapts to temporally varying smoke. We validate our method on synthetic data and a new real-world smoke dataset with RGB and thermal images.
About the Speaker
Neham Jain is a Research Scientist at Meshy AI focused on 3D generative models and multimodal learning. He holds an M.S. in Robotics from Carnegie Mellon University and works at the intersection of 3D vision, neural rendering, and scalable AI systems.
Online Video Depth Anything: Temporally-Consistent Depth Prediction with Low Memory Consumption
Depth estimation from monocular video has become a key component of many real-world computer vision systems. Recently, Video Depth Anything (VDA) has demonstrated strong performance on long video sequences. However, it relies on batch processing, which prohibits its use in an online setting. In this work, we overcome this limitation and introduce online VDA (oVDA). The key innovation is to employ techniques from Large Language Models (LLMs), namely caching latent features during inference and masking frames at training. Our oVDA method outperforms all competing online video depth estimation methods in both accuracy and VRAM usage. Low VRAM usage is particularly important for deployment on edge devices. We demonstrate that oVDA runs at 42 FPS on an NVIDIA A100 and at 20 FPS on an NVIDIA Jetson edge device.
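The abstract doesn't spell out the architecture, but the "cache latents during inference" idea borrowed from LLM serving can be sketched generically (all names here are illustrative assumptions, not oVDA's implementation):

def online_depth(frames, encoder, decoder, max_context=8):
    # Generic sketch of online inference with a bounded latent cache,
    # analogous to an LLM KV cache
    cache = []
    for frame in frames:
        latents = encoder(frame)
        depth = decoder(latents, context=cache)  # attend over cached frames only
        cache.append(latents)
        if len(cache) > max_context:  # bound VRAM for edge deployment
            cache.pop(0)
        yield depth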
About the Speaker
Johann-Friedrich Feiden is a PhD student at Universität Heidelberg specializing in computer vision and machine learning. During his bachelor's he focused on self-supervised representations, while during his master's he shifted his focus towards computer vision.
Broadening View Synthesis of Dynamic Scenes from Constrained Monocular Videos
Novel view synthesis of dynamic scenes from monocular video tends to break down once the camera deviates far from the training trajectory, leaving applications in mixed reality, autonomous driving, and immersive media without reliable wide-angle renderings. We present ExpanDyNeRF, a dynamic NeRF that broadens the reliable synthesis range to large-angle rotations by leveraging Gaussian splatting priors as pseudo ground truth to jointly refine density and color at novel viewpoints. To benchmark side-view fidelity, an axis largely missing from prior datasets, we introduce SynDM, the first synthetic dynamic multi-view dataset with paired primary and rotated views, built on a custom GTA V pipeline. Across SynDM, DyNeRF, and NVIDIA, ExpanDyNeRF substantially outperforms prior dynamic NeRF and Gaussian methods under extreme viewpoint shifts.
We close by previewing PanoWorld, our follow-up that pushes view expansion to its natural limit, namely geometry-consistent 360° panoramic video generation from a single image and text prompt.
About the Speaker
Le Jiang is a Ph.D. student in the Augmented Cognition Lab (ACLab) at Northeastern University, advised by Prof. Sarah Ostadabbas. His research centers on 3D scene reconstruction and novel view synthesis for dynamic scenes, with recent work extending dynamic NeRFs to large-angle viewpoints and pushing view synthesis toward geometry-consistent 360° panoramic video world models.

May 13 - Best of 3DV 2026
Date, Time and Location
May 13, 2026
9AM Pacific
Online. Register for Zoom!
Material selection in 2D and beyond - methods, tricks and applications
In this talk, we'll explore image understanding through the lens of materials. Materials distinguish themselves by their response to light, which is governed and modelled through physical properties like roughness or gloss; however, understanding such properties is a non-trivial task for current models and network architectures. We'll see how we can select materials similar to a given query material, significantly improve selection fidelity, and eventually even venture beyond 2D to enable selection in the 3D domain.
About the Speaker
Michael Fischer is a research scientist at Adobe Research London. He obtained his PhD from University College London (UCL), advised by Niloy Mitra and Tobias Ritschel. Michael has authored several top-tier publications (CVPR, ICCV, SIGGRAPH, ...) and is a recipient of both the Meta PhD scholarship and the Rabin Ezra scholarship. His research interests focus on image and scene understanding; material perception, selection, and editing; and efficient optimization.
Look Around and Pay Attention: Multi-camera Point Tracking Reimagined with Transformers
This paper presents LAPA (Look Around and Pay Attention), a novel end-to-end transformer-based architecture for multi-camera point tracking that integrates appearance-based matching with geometric constraints. Traditional pipelines decouple detection, association, and tracking, leading to error propagation and temporal inconsistency in challenging scenarios. LAPA addresses these limitations by leveraging attention mechanisms to jointly reason across views and time, establishing soft correspondences through a cross-view attention mechanism enhanced with geometric priors. Instead of relying on classical triangulation, we construct 3D point representations via attention-weighted aggregation, inherently accommodating uncertainty and partial observations.
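In generic form (schematic, not the paper's exact equations), attention-weighted aggregation replaces hard triangulation by softly combining per-view evidence:

\hat{p} = \sum_{v=1}^{V} \alpha_v \, f_v,
\qquad
\alpha_v = \frac{\exp(s_v)}{\sum_{u=1}^{V} \exp(s_u)}

Here s_v is a learned cross-view attention score and f_v the evidence from view v, so occluded or uncertain views simply receive low weight instead of corrupting a hard triangulated estimate.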
About the Speaker
Bishoy Galoaa is a PhD student at Northeastern University.
Gaussian Wardrobe: Compositional 3D Gaussian Avatars for Free-Form Virtual Try-On
We introduce Gaussian Wardrobe, a novel framework to digitize compositional 3D neural avatars from multi-view videos. Existing methods for 3D neural avatars typically treat the human body and clothing as an inseparable entity, which fails to capture the dynamics of complex free-form garments and limits the reuse of clothing across different subjects. To overcome these problems, our method decomposes neural avatars into bodies and layers of shape-agnostic neural garments. Our framework learns the geometry and deformations of each garment layer from multi-view videos and normalizes them into a shape-independent space using 3D Gaussians. We demonstrate that these compositional garments contribute to a versatile digital wardrobe, enabling a practical 3D virtual try-on application where clothing can be freely transferred to new subjects.
About the Speaker
Hsuan-I Ho obtained his doctoral degree from ETH Zurich, supervised by Prof. Otmar Hilliges and Prof. Marc Pollefeys. His research focuses on human-centric machine perception, including 3D human reconstruction, human modeling, and pose estimation, with the goal of pushing the boundary of human-machine interaction toward future human-centric reasoning and physical AI.
Consistency Models for 3D Point Cloud Generation
ConTiCoM-3D is a new method for creating 3D point clouds. It works directly with 3D points and can generate shapes very quickly in only one or two steps. Unlike many older methods, it does not need a separate teacher model or a complex latent space. Tests on typical benchmarks show that it can produce high-quality 3D shapes while being faster than many existing approaches.
About the Speaker
Sebastian Eilermann is a PhD student specialising in 3D generative AI. His research focuses on developing advanced methods for creating and understanding three-dimensional content, exploring the intersection of machine learning, computer vision, and generative modelling to enable the generation of more realistic and efficient 3D assets.

May 14 - AI, ML and Computer Vision Meetup
Join our virtual meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Date, Time and Location
May 14, 2026
9:00-11:00 AM Pacific
Online. Register for Zoom!
Concept-Aware Batch Sampling Improves Language-Image Pretraining
What data should a vision-language model be trained on, and who gets to decide what “good data” even means? Most existing curation pipelines are limited because they are offline (they produce a static dataset from a set of predetermined filtering criteria) and concept-agnostic (they rely on model-based scores that can silently introduce new biases in what concepts the model sees). In this talk, I will discuss our new work CABS that tackles both these problems with large-scale sample-level concept annotations and flexible online batch sampling.
First, we construct DataConcept, a 128M web-crawled image–text collection annotated with fine-grained concept composition, and show how this enables Concept-Aware Batch Sampling (CABS), a simple online method that constructs training batches on-the-fly to match target concept distributions. We develop two variants, CABS-DM for maximizing concept coverage and CABS-FM for prioritizing high object multiplicity, and demonstrate consistent gains for CLIP/SigLIP-style models across 28 benchmarks.
Finally, I’ll show that these improvements translate into strong vision encoders for training generative multimodal models, including autoregressive systems like LLaVA, where the encoder quality materially affects downstream capability.
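To make the "online batch sampling" idea concrete, here is a minimal sketch of drawing a batch to match a target concept distribution. It is a toy illustration under assumed inputs, not the CABS reference implementation.

import random
from collections import defaultdict

def sample_batch(pool, target_dist, batch_size):
    """pool: list of (sample, concept_list); target_dist: concept -> fraction."""
    by_concept = defaultdict(list)
    for item, concepts in pool:
        for c in concepts:
            by_concept[c].append(item)
    batch = []
    for concept, frac in target_dist.items():
        k = int(round(frac * batch_size))
        candidates = by_concept[concept]
        batch.extend(random.sample(candidates, min(k, len(candidates))))
    random.shuffle(batch)  # avoid concept-ordered batches
    return batch[:batch_size]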
About the Speaker
Adhiraj Ghosh is a first-year ELLIS PhD student working with Matthias Bethge at the University of Tübingen. He completed his undergraduate degree in Electrical and Electronics Engineering jointly at the Manipal Institute of Technology and SMU Singapore from 2016 to 2020, and his master's in Machine Learning at the University of Tübingen from 2022 to 2024.
Do Your Agents Actually Work? Measuring Skills and MCP in Practice
This talk shows how to evaluate agent performance in real scenarios using FiftyOne Skills and MCP. We will cover practical ways to design scenarios, run agents, and measure how they use tools, including signals like latency, token usage, and output quality. The goal is to move beyond final outputs and better understand agent behavior, helping teams build more reliable and measurable agent systems.
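The FiftyOne Skills API itself isn't shown here; the sketch below just illustrates the kind of per-scenario signals the talk mentions (latency, token usage, output quality), with run_agent as a hypothetical callable.

import time

def evaluate_agent(scenarios, run_agent):
    # Hypothetical harness: run_agent(prompt) is assumed to return a dict
    # with the agent's text output and token usage
    results = []
    for scenario in scenarios:
        start = time.perf_counter()
        output = run_agent(scenario["prompt"])
        results.append({
            "scenario": scenario["name"],
            "latency_s": time.perf_counter() - start,
            "tokens": output.get("total_tokens"),
            "answer": output.get("text"),
        })
    return results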
About the Speaker
Adonai Vera is a Machine Learning Engineer & DevRel at Voxel51, with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV.
The last mile of OCR [in 2026]
OCR is nailing it on benchmarks, but the real work lies in the long tail of IDP: large tables, old scans, mixed-language docs, handwriting, and complex layouts are where most enterprise and real-world document work happens, and where the best-benchmarked models still struggle. In this talk, we will go through how LandingAI's Agentic Document Extraction (ADE) goes beyond OCR and parsing to enable real-world document AI use cases and workloads.
We'll cover:
- The pillars of Agentic Document Extraction
- Building document processing pipelines with ADE API/SDK
- Using Skills to have Coding Agents build for you
- How ADE gives LLMs the last mile: analysing LLM performance on large tables, scanned docs, and complex layouts, and enabling them with the structured output from ADE (see the sketch after this list)
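A rough sketch of the pattern in the last bullet, where structured extraction output becomes LLM context. The endpoint URL, response shape, and llm callable are placeholder assumptions, not the actual ADE API or SDK.

import requests

def extract_then_ask(pdf_path, question, llm):
    # Hypothetical document-extraction call; the URL and JSON shape are
    # placeholders standing in for a real ADE request
    with open(pdf_path, "rb") as f:
        resp = requests.post("https://ade.example.invalid/parse", files={"document": f})
    chunks = resp.json().get("chunks", [])  # assumed layout-aware chunks
    context = "\n".join(c.get("text", "") for c in chunks)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")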
About the Speaker
Ankit Khare has been building the Developer Relations function at high-growth startups like Rockset (a world-class retrieval system, later acquired by OpenAI), Twelve Labs (a video intelligence startup backed by Index Ventures, Radical Ventures, and NEA), and Abacus.AI (an AI Super Assistant backed by Index Ventures, Eric Schmidt, and Ram Shriram). Before that, he was an AI engineer at third insight and an AI researcher at the LEARN Lab at UT-Arlington, working on visual scene understanding and image captioning agents.
The Energy Layer of AI: Powering the Next Wave of Inference
The talk explores how inference cost is fundamentally tied to energy at scale, especially as the AI industry shifts toward always-on, agent-driven workloads and the focus moves from training to inference economics. Medi will share lessons and observations from his team's R&D efforts in making AI workloads grid-aware, energy-intelligent, and dynamically optimized in real time.
About the Speaker
Medi Naseri is the Founder and CEO of LōD Technologies, where he leads the development of energy-intelligent infrastructure for flexible data centers and the broader compute ecosystem.
With a PhD in Electrical Engineering specializing in control and power systems, Medi brings deep technical expertise to the challenge of scaling AI within real-time grid constraints.