
About us

BayNode is a community-focused Node.js meetup in Mountain View. We meet for a talk night (food & drinks) and a Beer.node (informal socializing).

Each Node Night features 2-3 talks relevant to the Node.js ecosystem. Whenever possible, we prioritize speakers and topics from our members rather than targeting specific topics or expertise levels.

If you want to help, we are always looking for contributors. 

Sponsors

StrongLoop

An IBM company that helps build Node.js apps and APIs for the cloud.

Upcoming events

  • Network event
    Feb 11 - Visual AI for Video Use Cases
    Online · 237 attendees from 47 groups

    Join our virtual Meetup to hear talks from experts on cutting-edge topics at the intersection of Visual AI and video use cases.

    Time and Location

    Feb 11, 2026
    9 - 11 AM Pacific
    Online.
    Register for the Zoom!

    VIDEOP2R: Video Understanding from Perception to Reasoning

    Reinforcement fine-tuning (RFT), a two-stage framework consisting of supervised fine-tuning (SFT) and reinforcement learning (RL), has shown promising results in improving the reasoning ability of large language models (LLMs). Yet extending RFT to large video language models (LVLMs) remains challenging. We propose VideoP2R, a novel process-aware video RFT framework that enhances video reasoning by modeling perception and reasoning as distinct processes. In the SFT stage, we develop a three-step pipeline to generate VideoP2R-CoT-162K, a high-quality, process-aware chain-of-thought (CoT) dataset for perception and reasoning.

    In the RL stage, we introduce a novel process-aware group relative policy optimization (PA-GRPO) algorithm that supplies separate rewards for perception and reasoning. Extensive experiments show that VideoP2R achieves state-of-the-art (SotA) performance on six out of seven video reasoning and understanding benchmarks. Ablation studies further confirm the effectiveness of our process-aware modeling and PA-GRPO and demonstrate that the model's perception output is information-sufficient for downstream reasoning.
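    To make the abstract's "separate rewards for perception and reasoning" idea concrete, here is a minimal, illustrative sketch built on a GRPO-style group-relative advantage. It is not the authors' implementation: the reward functions, the 50/50 combination, and the rollout fields are all hypothetical placeholders.

        # Illustrative sketch only; not VideoP2R's actual code.
        import numpy as np

        def group_relative_advantages(rewards):
            """GRPO-style advantage: normalize each reward against its sampled group."""
            rewards = np.asarray(rewards, dtype=float)
            return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

        def process_aware_advantages(rollouts, perception_reward, reasoning_reward):
            """Score the perception and reasoning segments of each rollout separately,
            then combine their group-relative advantages (placeholder 50/50 mix)."""
            r_perc = [perception_reward(r["perception"]) for r in rollouts]
            r_reas = [reasoning_reward(r["reasoning"], r["answer"]) for r in rollouts]
            adv_perc = group_relative_advantages(r_perc)
            adv_reas = group_relative_advantages(r_reas)
            return 0.5 * (adv_perc + adv_reas)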

    About the Speaker

    Yifan Jiang is a third-year Ph.D. student in the Information Sciences Institute at the University of Southern California (USC-ISI), advised by Dr. Jay Pujara, focusing on natural language processing, commonsense reasoning, and multimodal large language models.

    Layer-Aware Video Composition via Split-then-Merge

    Split-then-Merge (StM) is a novel generative framework that overcomes data scarcity in video composition by splitting unlabeled videos into separate foreground and background layers for self-supervised learning. By utilizing a transformation-aware training pipeline with multi-layer fusion, the model learns to realistically compose dynamic subjects into diverse scenes without relying on expensive annotated datasets. This presentation will cover the problem of video composition and the details of StM, an approach that tackles this problem from a generative AI perspective. We will conclude by demonstrating how StM works and how it outperforms state-of-the-art methods in both quantitative benchmarks and qualitative evaluations.

    About the Speaker

    Ozgur Kara is a fourth-year Computer Science Ph.D. student at the University of Illinois Urbana-Champaign (UIUC), advised by Founder Professor James M. Rehg. His research builds the next generation of video AI by tackling three core challenges: efficiency, controllability, and safety.

    Video-native VLMs and control

    We show how image-native vision–language models can be extended to support native video understanding, structured reasoning, tool use, and robotics. Our approach focuses on designing data, modeling, and training recipes to optimize for multimodal input and interaction patterns, treating vision and perception as first-class citizens. We discuss lessons learned from scaling these methods in an open-source model family and their implications for building flexible multimodal systems.

    About the Speaker

    Akshat Shrivastava is the CTO and co-founder of Perceptron, previously leading AR On-Device at Meta and conducting research at UW.

    Video Intelligence Is Going Agentic

    Video content has become ubiquitous in our digital world, yet the tools for working with video have remained largely unchanged for decades. This talk explores how the convergence of foundation models and agent architectures is fundamentally transforming video interaction and creation. We'll examine how video-native foundation models, multimodal interfaces, and agent transparency are reshaping enterprise media workflows through a deep dive into Jockey, a pioneering video agent system.

    About the Speaker

    James Le currently leads the developer experience function at TwelveLabs - a startup building foundation models for video understanding. He previously operated in the MLOps space and ran a blog/podcast on the Data & AI infrastructure ecosystem.

    3 attendees from this group
  • Network event
    Feb 18 - Feedback-Driven Annotation Pipelines for End-to-End ML Workflows
    Online · 127 attendees from 47 groups

    In this technical workshop, we'll show how to build a feedback-driven annotation pipeline for perception models using FiftyOne. We'll explore real model failures and data gaps and turn them into focused annotation tasks that are then routed through a repeatable workflow for labeling and QA. The result is an end-to-end pipeline that keeps annotators, tools, and models aligned and closes the loop from annotation and curation back to model training and evaluation.

    Time and Location

    Feb 18, 2026
    10 - 11 AM PST
    Online. Register for the Zoom!

    What you'll learn

    • How to label the data that matters most, saving annotation time and cost
    • How to structure human-in-the-loop workflows that find and fix model errors and data gaps through targeted relabeling instead of bulk labeling
    • How to combine auto-labeling and human review in a single, feedback-driven pipeline for perception models
    • How to use label schemas and metadata as “data contracts” that enforce consistency between annotators, models, and tools, especially for multimodal data
    • How to detect and manage schema drift and tie schema versions to dataset and model versions for reproducibility
    • QA and review steps that surface label issues early and tie changes back to model behavior
    • An annotation architecture that accommodates new perception tasks and feedback signals without rebuilding your entire data stack
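
    To make the workflow above concrete, here is a minimal sketch of one loop iteration using FiftyOne's public Python API: evaluate a detection model, surface likely label issues, and tag them for focused relabeling. The dataset and field names ("my_dataset", "predictions", "ground_truth") are placeholders, and the workshop's actual pipeline may differ.

        # A minimal sketch, not the workshop's exact code.
        import fiftyone as fo
        from fiftyone import ViewField as F

        dataset = fo.load_dataset("my_dataset")  # assumed to already contain predictions

        # Compare predictions against the current labels; per-detection results
        # (tp / fp / fn) are written to an "eval" attribute on each prediction.
        dataset.evaluate_detections("predictions", gt_field="ground_truth", eval_key="eval")

        # High-confidence false positives often point at missing or wrong ground-truth
        # boxes, so they are good targets for focused relabeling instead of bulk labeling.
        suspect_view = dataset.filter_labels(
            "predictions", (F("eval") == "fp") & (F("confidence") > 0.8)
        )
        suspect_view.tag_samples("needs_review")

        # The tagged slice can then be routed to a labeling tool and, once fixed,
        # fed back into training and re-evaluation to close the loop.
        print(len(dataset.match_tags("needs_review")), "samples queued for review")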
  • Network event
    March 5 - AI, ML and Computer Vision Meetup
    Online · 124 attendees from 47 groups

    Join our virtual meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

    Date and Location

    Mar 5, 2026
    9 - 11 AM Pacific
    Online.
    Register for the Zoom!

    MOSPA: Human Motion Generation Driven by Spatial Audio

    Enabling virtual humans to dynamically and realistically respond to diverse auditory stimuli remains a key challenge in character animation, demanding the integration of perceptual modeling and motion synthesis. Despite its significance, this task remains largely unexplored. Most previous works have primarily focused on mapping modalities like speech, audio, and music to generate human motion, yet these models typically overlook the impact of the spatial features encoded in spatial audio signals on human motion.

    To bridge this gap and enable high-quality modeling of human movements in response to spatial audio, we introduce the first comprehensive Spatial Audio-Driven Human Motion (SAM) dataset, which contains diverse and high-quality spatial audio and motion data. For benchmarking, we develop a simple yet effective diffusion-based generative framework for human MOtion generation driven by SPatial Audio, termed MOSPA, which faithfully captures the relationship between body motion and spatial audio through an effective fusion mechanism. Once trained, MOSPA can generate diverse, realistic human motions conditioned on varying spatial audio inputs. We perform a thorough investigation of the proposed dataset and conduct extensive experiments for benchmarking, where our method achieves state-of-the-art performance on this task.
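    As a purely illustrative aside (MOSPA's actual architecture and fusion mechanism are not detailed in this abstract), the snippet below shows a generic DDPM-style sampling loop for a motion sequence conditioned on a spatial-audio embedding, which is the general shape of diffusion-based, audio-conditioned motion generation. The denoiser, audio features, and default dimensions are hypothetical placeholders.

        # Generic conditional-diffusion sampling sketch; not MOSPA's code.
        import torch

        @torch.no_grad()
        def sample_motion(denoiser, audio_feats, num_frames=120, motion_dim=66, steps=1000):
            betas = torch.linspace(1e-4, 0.02, steps)
            alphas = 1.0 - betas
            alpha_bars = torch.cumprod(alphas, dim=0)

            x = torch.randn(1, num_frames, motion_dim)  # start from pure noise, x_T
            for t in reversed(range(steps)):
                # The denoiser predicts the noise in x_t, conditioned on the audio features.
                eps = denoiser(x, t, audio_feats)
                mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
                noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
                x = mean + torch.sqrt(betas[t]) * noise
            return x  # generated motion of shape (1, num_frames, motion_dim)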

    About the Speaker

    Zhiyang (Frank) Dou is a Ph.D. student at MIT CSAIL, advised by Prof. Wojciech Matusik. He works with the Computational Design and Fabrication Group and the Computer Graphics Group.

    Securing the Autonomous Future: Navigating the Intersection of Agentic AI, Connected Devices, and Cyber Resilience

    With billions of devices now embedded in our infrastructure and increasingly acting as autonomous AI agents, we face a very real question: how can we create intelligent systems that are both secure and trusted? This talk will explore the intersection of agentic AI and IoT and demonstrate how the same AI systems can provide robust defense mechanisms. At its core, however, this is a challenge about trusting people with technology, ensuring their safety, and providing accountability. It therefore requires a new way of thinking, one in which security is built in, autonomous action has oversight, and, ultimately, innovation leads to greater human well-being.

    About the Speaker

    Samaresh Kumar Singh is an engineering principal at HP Inc. with more than 21 years of experience in designing and implementing large-scale distributed systems, cloud-native platform systems, and edge AI/ML systems. His expertise includes agentic AI systems, GenAI/LLMs, edge AI, federated and privacy-preserving learning, and secure hybrid cloud/edge computing.

    Plugins as Products: Bringing Visual AI Research into Real-World Workflows with FiftyOne

    Visual AI research often introduces new datasets, models, and analysis methods, but integrating these advances into everyday workflows can be challenging. FiftyOne is a data-centric platform designed to help teams explore, evaluate, and improve visual AI, and its plugin ecosystem is how the platform scales beyond the core. In this talk, we explore the FiftyOne plugin ecosystem from both perspectives: how users apply plugins to accelerate data-centric workflows, and how researchers and engineers can package their work as plugins to make it easier to share, reproduce, and build upon. Through practical examples, we show how plugins turn research artifacts into reusable components that integrate naturally into real-world visual AI workflows.
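    For readers curious what a plugin looks like in code, here is a minimal sketch of a Python operator, the building block of FiftyOne plugins, assuming the fiftyone.operators API; the operator name and logic are arbitrary examples, and a real plugin also needs a fiftyone.yml manifest.

        # Minimal example operator; names and behavior are illustrative only.
        import fiftyone.operators as foo

        class CountSelected(foo.Operator):
            @property
            def config(self):
                return foo.OperatorConfig(
                    name="count_selected",
                    label="Count selected samples",
                )

            def execute(self, ctx):
                # ctx.selected holds the IDs of samples selected in the App;
                # fall back to the size of the current view if nothing is selected.
                return {"count": len(ctx.selected) or len(ctx.view)}

        def register(p):
            p.register(CountSelected)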

    About the Speaker

    Adonai Vera is a Machine Learning Engineer & DevRel at Voxel51, with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV.

    Transforming Business with Agentic AI

    Agentic AI is reshaping business operations by employing autonomous systems that learn, adapt, and optimize processes independently of human input. This session examines the essential differences between traditional AI agents and Agentic AI, emphasizing their significance for project professionals overseeing digital transformation initiatives. Real-world examples from eCommerce, insurance, and healthcare illustrate how autonomous AI achieves measurable outcomes across industries. The session addresses practical orchestration patterns in which specialized AI agents collaborate to resolve complex business challenges and enhance operational efficiency. Attendees will receive a practical framework for identifying high-impact use cases, developing infrastructure, establishing governance, and scaling Agentic AI within their organizations.

    About the Speaker

    Joyjit Roy is a senior technology and program management leader with over 21 years of experience delivering enterprise digital transformation, cloud modernization, and applied AI programs across insurance, financial services, and global eCommerce.

  • Network event
    March 11 - Strategies for Validating World Models and Action-Conditioned Video
    Online · 78 attendees from 47 groups

    Join us for a one-hour, hands-on workshop where we will explore emerging challenges in developing and validating world foundation models and video-generation AI systems for robotics and autonomous vehicles.

    Time and Location

    Mar 11, 2026
    10 - 11 AM PST
    Online. Register for the Zoom!

    Industries from robotics to autonomous vehicles are converging on world foundation models (WFMs) and action-conditioned video generation, where the challenge is predicting physics, causality, and intent. But this shift has created a massive new bottleneck: validation.

    How do you debug a model that imagines the future? How do you curate petabyte-scale video datasets to capture the "long tail" of rare events without drowning in storage costs? And how do you ensure temporal consistency when your training data lives in scattered data lakes?

    In this session, we explore technical workflows for the next generation of Visual AI. We will dissect the "Video Data Monster," demonstrating how to build feedback loops that bridge the gap between generative imagination and physical reality. Learn how leading teams are using federated data strategies and collaborative evaluation to turn video from a storage burden into a structured, queryable asset for embodied intelligence.

    About the Speaker

    Nick Lotz is a chemical process engineer turned developer who is currently a Technical Marketing Engineer at Voxel51. He is particularly interested in bringing observability and security to all layers of the AI stack.

    1 attendee from this group

Members

2,079