
What we’re about
This group is for data scientists, machine learning engineers, and open source enthusiasts.
Every month we’ll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.
Upcoming events (4+)
- July 17 - Virtual AI, ML and Computer Vision Meetup
When and Where
July 17, 2025 | 10:00 – 11:30 AM Pacific
Using VLMs to Navigate the Sea of Data
At SEA.AI, we aim to make ocean navigation safer by enhancing situational awareness with AI. To develop our technology, we process huge amounts of maritime video from onboard cameras. In this talk, we'll show how we use Vision-Language Models (VLMs) to streamline our data workflows: from semantic search using embeddings to automatically surfacing rare or high-interest events like whale spouts or drifting containers. The goal: smarter data curation with minimal manual effort.
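The underlying idea: embed video frames and text queries into a shared space, then rank frames by similarity. Below is a minimal sketch using an open-source CLIP checkpoint via sentence-transformers; the model name, frame paths, and query are illustrative assumptions, not SEA.AI's production pipeline.

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # joint image/text embedding space

# Placeholder frame paths; in practice these would be sampled video frames
frame_paths = ["frame_0001.jpg", "frame_0002.jpg"]
frame_embs = model.encode([Image.open(p) for p in frame_paths])

# A text query lands in the same space, so cosine similarity ranks frames
query_emb = model.encode("a whale spout near the horizon")
scores = util.cos_sim(query_emb, frame_embs)[0]
print(frame_paths[int(scores.argmax())], float(scores.max()))
```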
About the Speaker
Daniel Fortunato, an AI Researcher at SEA.AI, is dedicated to enhancing efficiency through data workflow optimizations. Daniel’s background includes a Master’s degree in Electrical Engineering, providing a robust framework for developing innovative AI solutions. Beyond the lab, he is an enthusiastic amateur padel player and surfer.
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Referring Video Object Segmentation (RVOS) involves segmenting objects in video based on natural language descriptions. SAMWISE builds on Segment Anything 2 (SAM2) to support RVOS in streaming settings, without fine-tuning and without relying on external large Vision-Language Models. We introduce a novel adapter that injects temporal cues and multi-modal reasoning directly into the feature extraction process, enabling both language understanding and motion modeling. We also unveil a phenomenon we denote tracking bias, where SAM2 may persistently follow an object that only loosely matches the query, and propose a learnable module to mitigate it. SAMWISE achieves state-of-the-art performance across multiple benchmarks with less than 5M additional parameters.
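For intuition only, here is a toy PyTorch sketch of the general adapter pattern the abstract describes, fusing text tokens into visual features via cross-attention. It is not the SAMWISE architecture; all dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class CrossModalAdapter(nn.Module):
    """Toy adapter: fuses language tokens into visual features via cross-attention."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, visual_tokens, text_tokens):
        # Visual tokens attend to the query text; the residual connection
        # preserves the frozen backbone's original features
        fused, _ = self.attn(visual_tokens, text_tokens, text_tokens)
        return visual_tokens + self.proj(fused)

adapter = CrossModalAdapter()
v = torch.randn(1, 1024, 256)  # e.g. flattened frame features
t = torch.randn(1, 12, 256)    # e.g. an encoded referring expression
print(adapter(v, t).shape)     # torch.Size([1, 1024, 256])
```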
About the Speaker
Claudia Cuttano is a PhD student at Politecnico di Torino (VANDAL Lab), currently on a research visit at TU Darmstadt, where she works with Prof. Stefan Roth in the Visual Inference Lab. Her research focuses on semantic segmentation, with particular emphasis on multi-modal understanding and the use of foundation models for pixel-level tasks.
Building Efficient and Reliable Workflows for Object Detection
Training complex AI models at scale requires orchestrating multiple steps into a reproducible workflow and understanding how to optimize resource utilization for efficient pipelines. Modern MLOps practices help streamline these processes, making AI pipelines more efficient and reliable.
About the Speaker
Sage Elliott is an AI Engineer with a background in computer vision, LLM evaluation, MLOps, IoT, and Robotics. He’s taught thousands of people at live workshops. You can usually find him in Seattle biking around to parks or reading in cafes, catching up on the latest read for AI Book Club.
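To make the orchestration idea from this talk concrete, below is a minimal plain-Python sketch of a reproducible pipeline run; the step bodies, paths, and metric are placeholder assumptions, not the stack covered in the session.

```python
import json
import time
from pathlib import Path

def prepare_data(cfg):
    return {"split": cfg["dataset"], "num_images": 10_000}  # placeholder step

def train_model(cfg, data):
    return {"weights": "runs/model.pt", "epochs": cfg["epochs"]}  # placeholder step

def evaluate(model):
    return {"mAP50": 0.62}  # placeholder metric

def run_pipeline(cfg):
    data = prepare_data(cfg)
    model = train_model(cfg, data)
    metrics = evaluate(model)
    # Persist config + outputs so any run can be audited and repeated
    Path("runs").mkdir(exist_ok=True)
    manifest = {"config": cfg, "metrics": metrics, "timestamp": time.time()}
    Path("runs/manifest.json").write_text(json.dumps(manifest, indent=2))
    return metrics

print(run_pipeline({"dataset": "coco-subset", "epochs": 10}))
```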
Your Data Is Lying to You: How Semantic Search Helps You Find the Truth in Visual Datasets
High-performing models start with high-quality data—but finding noisy, mislabeled, or edge-case samples across massive datasets remains a significant bottleneck. In this session, we’ll explore a scalable approach to curating and refining large-scale visual datasets using semantic search powered by transformer-based embeddings. By leveraging similarity search and multimodal representation learning, you’ll learn to surface hidden patterns, detect inconsistencies, and uncover edge cases. We’ll also discuss how these techniques can be integrated into data lakes and large-scale pipelines to streamline model debugging, dataset optimization, and the development of more robust foundation models in computer vision. Join us to discover how semantic search reshapes how we build and refine AI systems.
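As a hedged sketch of what this can look like in practice with the open-source FiftyOne Brain API (the dataset here is a stand-in zoo dataset, not a real production data lake):

```python
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")  # stand-in for your own data

# Build a CLIP-based similarity index over the dataset
fob.compute_similarity(dataset, model="clip-vit-base32-torch", brain_key="img_sim")

# Rank samples against a natural-language description of an edge case
view = dataset.sort_by_similarity(
    "a blurry photo taken at night", k=25, brain_key="img_sim"
)
session = fo.launch_app(view)  # inspect the surfaced samples
```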
About the Speaker
Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. During her PhD and postdoc research, she deployed multiple low-cost, smart edge and IoT computing technologies that can be operated by users without expertise in computer vision systems, such as farmers. The central objective of Paula's research has been to develop intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry.
- July 23 - Getting Started with FiftyOne for Healthcare Use Cases
When
Jul 23, 2025 at 9:00 - 10:30 AM Pacific
Where
Online. Register for the Zoom!
About the Workshop
Visual AI is revolutionizing healthcare by enabling more accurate diagnoses, streamlining medical workflows, and uncovering valuable insights across various imaging modalities. Yet, building trustworthy AI in healthcare demands more than powerful models — it requires clean, curated data, strong visualizations, and human-in-the-loop understanding.
Join us for a free, 90-minute, hands-on workshop built for healthcare researchers, medical data scientists, and AI engineers working with real-world imaging data. Whether you're analyzing CT scans, radiology images, or multi-modal patient datasets, this session will equip you with the tools to design robust, transparent, and insight-driven computer vision pipelines — powered by FiftyOne, the open-source platform for Visual AI.
By the end of the workshop, you'll be able to (a short code sketch follows this list):
- Load and organize complex medical datasets (e.g., ARCADE, DeepLesion) with FiftyOne.
- Explore medical imaging data using embeddings, patches, and metadata filters.
- Curate balanced datasets and fine-tune models using Ultralytics YOLOv8 for tasks like stenosis detection.
- Analyze and segment CT scans using MedSAM2.
- Analyze results from VLMs and foundation models like MedGEMMA, NVIDIA VISTA, and NVIDIA CRADIO.
- Evaluate model predictions and uncover failure cases using real-world clinical examples.
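As promised above, here is a minimal, hedged sketch of the first two bullets: loading an image directory into FiftyOne and computing an embeddings visualization. The directory path and dataset name are placeholders; the workshop itself uses dedicated loaders for datasets like ARCADE and DeepLesion.

```python
import fiftyone as fo
import fiftyone.brain as fob

# Load a plain directory of images (placeholder path)
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/medical-images",
    dataset_type=fo.types.ImageDirectory,
    name="healthcare-demo",
)

# Compute a 2D embeddings map for exploring scan similarity and outliers
fob.compute_visualization(dataset, brain_key="img_viz")

session = fo.launch_app(dataset)  # explore embeddings, patches, and filters
```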
Why Attend?
This healthcare edition of our "Getting Started with FiftyOne" workshop connects foundational tools with real-world impact. Through curated datasets and clinical use cases, you'll see how to harness Visual AI responsibly, building data-centric pipelines that promote accuracy, interpretability, and trust in medical AI systems.
Prerequisites
Basic knowledge of Python and computer vision is recommended. No prior experience in healthcare is required — just curiosity and a commitment to building meaningful AI.
All participants will receive access to workshop notebooks, code examples, and extended resources to continue their journey in healthcare AI.
- July 24 - Women in AI
Hear talks from experts on cutting-edge topics in AI, ML, and computer vision!
When
Jul 24, 2025 at 9 - 11 AM Pacific
Where
Online. Register for the Zoom
Exploring Vision-Language-Action (VLA) Models: From LLMs to Embodied AI
This talk will explore the evolution of foundation models, highlighting the shift from large language models (LLMs) to vision-language models (VLMs), and now to vision-language-action (VLA) models. We'll dive into the emerging field of robot instruction following—what it means, and how recent research is shaping its future. I will present insights from my 2024 work on natural language-based robot instruction following and connect it to more recent advancements driving progress in this domain.
About the Speaker
Shreya Sharma is a Research Engineer at Reality Labs, Meta, where she works on photorealistic human avatars for AR/VR applications. She holds a bachelor’s degree in Computer Science from IIT Delhi and a master’s in Robotics from Carnegie Mellon University. Shreya is also a member of the inaugural 2023 cohort of the Quad Fellowship. Her research interests lie at the intersection of robotics and vision foundation models.
Farming with CLIP: Foundation Models for Biodiversity and Agriculture
Using open-source tools, we will explore the power and limitations of foundation models in agriculture and biodiversity applications. Leveraging the BIOTROVE dataset, the largest publicly accessible biodiversity dataset curated from iNaturalist, we will showcase real-world use cases powered by vision-language models trained on 40 million captioned images. We focus on understanding zero-shot capabilities, taxonomy-aware evaluation, and data-centric curation workflows.
We will demonstrate how to visualize, filter, evaluate, and augment data at scale. This session includes practical walkthroughs on embedding visualization with CLIP, dataset slicing by taxonomic hierarchy, identification of model failure modes, and building fine-tuned pest and crop monitoring models. Attendees will gain insights into how to apply multi-modal foundation models for critical challenges in agriculture, like ecosystem monitoring in farming.
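To ground the zero-shot discussion, here is a minimal sketch of CLIP-style zero-shot classification using the Hugging Face transformers API; the checkpoint, image path, and species prompts are illustrative assumptions, not the BIOTROVE-trained models from the talk.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a honeybee", "a photo of an aphid", "a photo of a ladybug"]
image = Image.open("insect.jpg")  # placeholder field image

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarity scores
probs = logits.softmax(dim=-1)[0]
print({label: round(float(p), 3) for label, p in zip(labels, probs)})
```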
About the Speaker
Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia.
Multi-modal AI in Medical Edge and Client Device Computing
In this live demo, we explore the transformative potential of multi-modal AI in medical edge and client device computing, focusing on real-time inference on a local AI PC. Attendees will witness how users can upload medical images, such as X-rays, and ask questions about the images to the AI model. Inference is executed locally on Intel's integrated GPU and NPU using OpenVINO, enabling developers without deep AI experience to create generative AI applications.
About the Speaker
Helena Klosterman is an AI Engineer at Intel based in the Netherlands. She enables organizations to unlock the potential of AI with OpenVINO, Intel's AI inference runtime. She is passionate about democratizing AI, developer experience, and bridging the gap between complex AI technology and practical applications.
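As a rough illustration of the kind of local inference the demo shows, here is a minimal OpenVINO sketch; the model file, device string, and input shape are assumptions, not the actual demo application.

```python
import numpy as np
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU'] on an AI PC

# Compile an already-converted model, preferring the integrated GPU and
# falling back to CPU if it is unavailable
compiled = core.compile_model("model.xml", device_name="AUTO:GPU,CPU")

dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input
result = compiled(dummy)
print(list(result.values())[0].shape)
```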
The Business of AI
The talk will focus on the importance of clearly defining a specific problem and use case, quantifying the potential benefits of an AI solution in measurable outcomes, evaluating the technical feasibility, challenges, and limitations of implementing an AI solution, and envisioning the future of enterprise AI.
About the Speaker
Milica Cvetkovic is an AI engineer and consultant driving the development and deployment of production-ready AI systems for diverse organizations. Her expertise spans custom machine learning, generative AI, and AI operationalization. With degrees in mathematics and statistics, she has a decade of experience in education and edtech, including curriculum design and machine learning instruction for technical and non-technical audiences. Before joining Google, Milica held a data scientist role in biotechnology, and she has a proven track record of advising startups, demonstrating a deep understanding of AI's practical application.
- July 31 - In-Person: Tokyo AI, Machine Learning and Computer Vision Meetup (Grand Hyatt Tokyo, Tokyo)
Join us on July 31 for the Tokyo AI, Machine Learning and Computer Vision Meetup at the Grand Hyatt Tokyo!
When and Where
In person
July 31, 2025
17:00 – 20:00
Grand Hyatt Tokyo
Coriander Room
6-10-3 Roppongi, Minato-ku, Tokyo
Reinventing Autonomous Driving: Creativity and Transformation with GenAI
This session introduces Microsoft's cutting-edge approach to revolutionizing autonomous driving development with generative AI.
Microsoft's AvOps platform integrates AI into every phase of the development lifecycle, from scenario generation to data processing, machine-learning-based evaluation, and extraction of next actions, turning workflows that once took weeks into tasks completed in hours.
We will dig into the concept of hyper-velocity engineering, in which generative AI accelerates requirements definition, sprint planning, implementation, and testing to deliver unprecedented development speed. We will also introduce agentic AI, a framework in which multiple generative AI agents collaborate to strengthen and streamline engineering workflows.
Through real-world case studies, we will explore the concrete impact of generative AI on autonomous driving development and discuss its transformative potential for the future of mobility.
About the Speaker
Hideo Yoshimi is a Principal Technical Program Manager on Microsoft's Industry Solutions Engineering team. He leads co-development in cutting-edge areas such as autonomous driving and software-defined vehicles, working closely with industry-leading partners. He provides Japanese automotive OEMs and Tier-1 suppliers with advanced global AI development practices, supporting the delivery of new value and contributing to innovative solutions in the mobility space.
Advancing Autonomous Vehicle Development with Neural Reconstruction and World Foundation Models
See how state-of-the-art neural reconstruction and world foundation models are transforming autonomous vehicle development by streamlining simulation workflows and addressing key challenges in developing, testing, and validating AV models. This session showcases the breakthrough technologies of NVIDIA NuRec and Cosmos, together with Voxel51 integrations that enable scalable simulation pipelines for next-generation AV systems.
About the Speaker
Mate Szarvas currently leads NVIDIA's Japan DRIVE Solutions software team, focusing on helping the Japanese automotive industry transition into the AI-first era of development. His past experience includes systems software engineering at a real-time OS company, bringing convolutional neural network-based computer vision to the ADAS industry at a Tier-1 supplier, and natural language modeling research in academia.
The World of GenAI Evaluation
Generative AI models are evolving rapidly and now support a wide range of tasks. Large-scale evaluation efforts are underway to understand their capabilities. At Weights & Biases Japan, we run one of Japan's largest public LLM leaderboards as well as a vision-language model leaderboard, and we recently published a whitepaper on evaluating end-to-end generative AI systems. In this talk, I will share findings from these efforts and introduce the broader community's work on evaluating models and systems.
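For flavor, here is a minimal sketch of logging evaluation scores with the wandb client; the project name, model id, and metrics are placeholders, not the actual leaderboard pipeline described in the talk.

```python
import wandb

# One run per evaluated model; config records what was evaluated
run = wandb.init(project="genai-eval-demo", config={"model": "my-llm-v1"})

# Scores would come from a real evaluation harness; hardcoded here
scores = {"mmlu_ja": 0.61, "vqa_accuracy": 0.54, "toxicity_rate": 0.02}
run.log(scores)  # logged metrics can back a leaderboard table
run.finish()
```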
About the Speaker
Keisuke Kamata is a manager and AI solutions engineer at Weights & Biases.
Modern Driving Datasets
We believe nothing is advancing physical AI faster than autonomous vehicles. This talk walks through how to build state-of-the-art AV datasets using synthetic data, NeRFs, smarter curation, and vector search. Expect a session powered by NVIDIA Omniverse, the latest models, and best-in-class tooling.
About the Speaker
Daniel Gural is a seasoned machine learning engineer at Voxel51 with a strong passion for empowering data scientists and ML engineers to unlock the full potential of their data.