This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.

Upcoming events (4+)

Network event
379 attendees from 37 groups hosting
Thu, Jul 24, 2025, 4:00 PM UTC
July 24 - Women in AI
Hear talks from experts on cutting-edge topics in AI, ML, and computer vision!

When

Jul 24, 2025 at 9 - 11 AM Pacific

Where

Online. Register for the Zoom

Exploring Vision-Language-Action (VLA) Models: From LLMs to Embodied AI

This talk will explore the evolution of foundation models, highlighting the shift from large language models (LLMs) to vision-language models (VLMs), and now to vision-language-action (VLA) models. We'll dive into the emerging field of robot instruction following—what it means, and how recent research is shaping its future. I will present insights from my 2024 work on natural language-based robot instruction following and connect it to more recent advancements driving progress in this domain.

About the Speaker

Shreya Sharma is a Research Engineer at Reality Labs, Meta, where she works on photorealistic human avatars for AR/VR applications. She holds a bachelor’s degree in Computer Science from IIT Delhi and a master’s in Robotics from Carnegie Mellon University. Shreya is also a member of the inaugural 2023 cohort of the Quad Fellowship. Her research interests lie at the intersection of robotics and vision foundation models.

Farming with CLIP: Foundation Models for Biodiversity and Agriculture

Using open-source tools, we will explore the power and limitations of foundation models in agriculture and biodiversity applications. Leveraging the BIOTROVE dataset, the largest publicly accessible biodiversity dataset curated from iNaturalist, we will showcase real-world use cases powered by vision-language models trained on 40 million captioned images. We focus on understanding zero-shot capabilities, taxonomy-aware evaluation, and data-centric curation workflows.

We will demonstrate how to visualize, filter, evaluate, and augment data at scale. This session includes practical walkthroughs on embedding visualization with CLIP, dataset slicing by taxonomic hierarchy, identification of model failure modes, and building fine-tuned pest and crop monitoring models. Attendees will gain insights into how to apply multi-modal foundation models for critical challenges in agriculture, like ecosystem monitoring in farming.
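As a rough illustration of the zero-shot and taxonomy-aware ideas above (a minimal sketch, not the speaker's code): a CLIP-style model scores an image embedding against text embeddings of candidate labels and picks the closest one. The toy three-dimensional vectors, the label set, and the helper names `zero_shot_label` and `same_genus` are hypothetical stand-ins for real model output.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def zero_shot_label(image_emb, text_embs):
    """Return the label whose text embedding is most similar to the image embedding."""
    return max(text_embs, key=lambda label: cosine(image_emb, text_embs[label]))

def same_genus(predicted, truth):
    """Taxonomy-aware check: two species binomials agree at genus level
    when their first word matches (assumes 'Genus species' naming)."""
    return predicted.split()[0] == truth.split()[0]

# Hypothetical toy embeddings; a real pipeline would obtain these from a model.
TEXT_EMBS = {
    "Apis mellifera": [0.9, 0.1, 0.0],
    "Apis cerana": [0.8, 0.3, 0.1],
    "Coffea arabica": [0.0, 0.2, 0.9],
}

image_emb = [0.9, 0.1, 0.05]
prediction = zero_shot_label(image_emb, TEXT_EMBS)
print(prediction, "| genus-level match with Apis cerana:", same_genus(prediction, "Apis cerana"))
```

Scoring at genus level as well as species level is one simple way to distinguish near misses from outright failures when inspecting model errors.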

About the Speaker

Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. During her PhD and Postdoc research, she deployed multiple low-cost, smart edge & IoT computing technologies that users such as farmers can operate without expertise in computer vision systems. The central objective of Paula’s research has been to develop intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry.

Multi-modal AI in Medical Edge and Client Device Computing

In this live demo, we explore the transformative potential of multi-modal AI in medical edge and client device computing, focusing on real-time inference on a local AI PC. Attendees will witness how users can upload medical images, such as X-Rays, and ask questions about the images to the AI model. Inference is executed locally on Intel's integrated GPU and NPU using OpenVINO, enabling developers without deep AI experience to create generative AI applications.

About the Speaker

Helena Klosterman is an AI Engineer at Intel, based in the Netherlands. She enables organizations to unlock the potential of AI with OpenVINO, Intel's AI inference runtime. She is passionate about democratizing AI, developer experience, and bridging the gap between complex AI technology and practical applications.

The Business of AI

The talk will focus on the importance of clearly defining a specific problem and use case, quantifying the potential benefits of an AI solution in terms of measurable outcomes, evaluating technical feasibility given the challenges and limitations of implementing an AI solution, and envisioning the future of enterprise AI.

About the Speaker

Milica Cvetkovic is an AI engineer and consultant driving the development and deployment of production-ready AI systems for diverse organizations. Her expertise spans custom machine learning, generative AI, and AI operationalization. With degrees in mathematics and statistics, she possesses a decade of experience in education and edtech, including curriculum design and machine learning instruction for technical and non-technical audiences. Prior to Google, Milica held a data scientist role in biotechnology and has a proven track record of advising startups, demonstrating a deep understanding of AI's practical application.
19 attendees from this group
Network event
284 attendees from 39 groups hosting
Thu, Aug 7, 2025, 4:00 PM UTC
August 7 - Understanding Visual Agents
Join us for a virtual event to hear talks from experts on the current state of visual agents.

When

Aug 7, 2025 at 9 AM Pacific

Where

Virtual. Register for the Zoom.

Foundational capabilities and models for generalist agents for computers

As we move toward a future where language agents can operate software, browse the web, and automate tasks across digital environments, a pressing challenge emerges: how do we build foundational models that can act as generalist agents for computers? In this talk, we explore the design of such agents, ones that combine vision, language, and action to understand complex interfaces and carry out user intent accurately.

We present OmniACT as a case study, a benchmark that grounds this vision by pairing natural language prompts with UI screenshots and executable scripts for both desktop and web environments. Through OmniACT, we examine the performance of today’s top language and multimodal models, highlight the limitations in current agent behavior, and discuss research directions needed to close the gap toward truly capable, general-purpose digital agents.

About the Speaker

Raghav Kapoor works in machine learning at Adobe on the Brand Services team, contributing to cutting-edge projects in brand intelligence. His work blends research with machine learning, reflecting his deep expertise in both areas. Prior to joining Adobe, Raghav earned his Master’s degree from Carnegie Mellon University, where his research focused on multimodal machine learning and web-based agents. He also brings industry experience from his time as a strategist at Goldman Sachs India.

BEARCUBS: Evaluating Web Agents' Real-World Information-Seeking Abilities

The talk focuses on the challenges of evaluating AI agents in dynamic web settings, the design and implementation of the BEARCUBS benchmark, and insights gained from human and agent performance comparisons. In the talk, we will discuss the significant performance gap between human users and current state-of-the-art agents, highlighting areas for future improvement in AI web navigation and information retrieval capabilities.

About the Speaker

Yixiao Song is a Ph.D. candidate in Computer Science at the University of Massachusetts Amherst. Her research focuses on enhancing the evaluation of natural language processing systems, particularly in assessing factuality and reliability in AI-generated content. Her work encompasses the development of tools and benchmarks such as VeriScore, an automatic metric for evaluating the factuality of long-form text generation, and BEARCUBS, a benchmark for assessing AI agents' ability to identify factual information from web content.

Visual Agents: What it takes to build an agent that can navigate GUIs like humans

We’ll examine conceptual frameworks, potential applications, and future directions of technologies that can “see” and “act” with increasing independence. The discussion will touch on both current limitations and promising horizons in this evolving field.

About the Speaker

Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI.

Implementing a Practical Vision-Based Android AI Agent

In this talk I will share with you practical details of designing and implementing Android AI agents, using deki.

From theory we will move to practice and the usage of these agents in industry/production.

For end users, this means remote usage of Android phones or automation of standard tasks, such as:

"Write my friend 'some_name' in WhatsApp that I'll be 15 minutes late"

"Open Twitter in the browser and write a post about 'something'"

"Read my latest notifications and say if there are any important ones"

"Write a linkedin post about 'something'"

For professionals, these agents enable agentic testing, a new type of testing that only became possible with the popularization of LLMs and the AI agents that use them as a reasoning core.

About the Speaker

Rasul Osmanbayli is a senior Android developer at Kapital Bank, the largest private bank in Azerbaijan, in Baku. He created deki, an image description model that served as the foundation for an Android AI agent that achieved high results on two benchmarks: Android World and Android Control.

He previously worked in Istanbul, Türkiye, for various companies as an Android and backend developer. He is also an MS student at Istanbul Aydin University in Istanbul, Türkiye.
10 attendees from this group
Network event
181 attendees from 44 groups hosting
Fri, Aug 15, 2025, 4:00 PM UTC
Aug 15 - Visual Agent Workshop Part 1: Navigating the GUI Agent Landscape
Welcome to the three-part Visual Agents Workshop virtual series, your hands-on opportunity to learn about visual agents: how they work, how to develop them, and how to fine-tune them.

Date and Time

Aug 15, 2025 at 9 AM Pacific

Register for the Zoom

Part 1: Navigating the GUI Agent Landscape

Understanding the Foundation Before Building

The GUI agent field is evolving rapidly, but success requires an understanding of what came before. In this opening session, we'll map the terrain of GUI agent research—from the early days of MiniWoB's simplified environments to today's complex, multimodal systems tackling real-world applications. You'll discover why standard vision models fail catastrophically on GUI tasks, explore the annotation bottlenecks that make GUI datasets so expensive to create, and understand the platform fragmentation that makes "click a button" mean twenty different things across datasets.

We'll dissect the most influential datasets (Mind2Web, AITW, Rico) and models that have shaped the field, examining their strengths, limitations, and the research gaps they reveal. By the end, you'll have a clear picture of where GUI agents excel, where they struggle, and, most importantly, where the opportunities lie for your own contributions.
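To make the platform-fragmentation point concrete, here is a minimal sketch (not workshop material) of mapping differently named actions onto one canonical vocabulary. The alias table and the `normalize_action` helper are hypothetical; the names are illustrative and are not taken from the actual schemas of Mind2Web, AITW, or Rico.

```python
# Hypothetical aliases: different datasets may name the same interaction differently.
ACTION_ALIASES = {
    "click": "click",
    "tap": "click",
    "press": "click",
    "left_click": "click",
    "type": "type",
    "input_text": "type",
    "scroll": "scroll",
    "swipe": "scroll",
}

def normalize_action(raw):
    """Map a dataset-specific action name to a canonical action.

    Raises ValueError for names missing from the alias table so that
    unmapped actions are surfaced instead of silently dropped.
    """
    key = raw.strip().lower().replace(" ", "_").replace("-", "_")
    if key not in ACTION_ALIASES:
        raise ValueError(f"unmapped action: {raw!r}")
    return ACTION_ALIASES[key]

# Three spellings of the same interaction collapse to a single canonical action.
print({normalize_action(name) for name in ["Tap", "left-click", "PRESS"]})
```

Failing loudly on unknown names is a deliberate choice here: when merging datasets, an unmapped action usually signals a gap in the alias table rather than something safe to ignore.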

About the Instructor

Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI.
12 attendees from this group
Network event
46 attendees from 26 groups hosting
Thu, Aug 21, 2025, 4:00 PM UTC
Aug 21 - AI, ML and Computer Vision Meetup en Español
Hear talks in Spanish from experts on cutting-edge topics in AI, ML, and computer vision.

Date and Time

Aug 21, 2025 at 9 AM Pacific

Location

Virtual. Register for the Zoom

I Want to Be Part of the AI World: How Do I Get There?

In this talk, I will share my personal journey into the world of artificial intelligence (AI), starting with my training as an electronics engineer and my PhD in neuroinformatics. I will highlight how my award-winning thesis on realistic volumetric models for accurate EEG source localization opened doors to opportunities in digital processing and 3D vision. Drawing on my teaching experience at the Universidad Nacional de Colombia and certifications in machine learning and deep learning, I will discuss how these milestones led me to work as a curriculum developer for DeepLearning.AI, offering valuable lessons for anyone who wants to follow a similar path.

About the Speaker

Ernesto Cuartas is an electronics engineer with a PhD in neuroinformatics. His PhD thesis, “Forward volumetric modeling framework for realistic head models towards accurate EEG source localization”, received a laureate distinction. He is an associate professor at the Universidad Nacional de Colombia and an expert in implementing and developing projects in digital signal processing, digital image processing, 3D vision, computer graphics, computational geometry, photogrammetry, and artificial intelligence. He holds professional certifications in machine learning, deep learning, and data engineering, and currently works as a curriculum developer/engineer for DeepLearning.AI.

Master Your Medical Data: From Curation to Clinical Impact

High-quality data is the foundation of effective machine learning in healthcare. This talk presents practical strategies and emerging techniques for managing medical imaging datasets, from synthetic data generation and curation to evaluation and deployment.

We will begin with real case studies from researchers and practitioners who are transforming their medical imaging workflows through data-centric practices. We will then move on to a hands-on tutorial using FiftyOne, the open-source platform for visual dataset inspection and model evaluation. Attendees will learn how to load, visualize, curate, and evaluate medical datasets across different imaging types.

Whether you are a researcher, clinician, or ML engineer, this talk will give you practical tools and ideas to improve the quality of your data, the reliability of your models, and their clinical impact.

About the Speaker

Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. Since the early 2000s in Colombia, she has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture.

Multi-Source and Embedded AI Agents

I will demonstrate how to build context-aware AI agents capable of answering questions and taking actions across multiple private systems, and how to implement semantic RAG across disparate data sources, embedded in existing systems, all without the need for complex MLOps infrastructure.

About the Speaker

Kevin Blanco is a Senior DevRel Advocate and international speaker with more than 15 years in technology leadership. He has designed AI strategies at IBM Watson and developed solutions for Google, Microsoft, and Nintendo.

Beyond the Model: Methodology and Best Practices for Leading Successful AI Projects with CPMAI

The success of AI projects depends not only on the model or the data, but also on how they are managed from the start. In this talk we will explore the CPMAI (Cognitive Project Management for AI) methodology, endorsed by the Project Management Institute (PMI), a structured framework that allows AI teams to align their initiatives with business objectives, manage ethical risks, and improve outcomes. We will share best practices that technical professionals can adapt to improve value delivery in each phase of the project and implement ethical and responsible AI solutions.

About the Speaker

Ivonne Mejía B. is a specialist in technology project management, with more than 20 years of international experience in the private and academic sectors in Mexico, Canada, and the United States. She is certified in CPMAI™, PMP®, and Prosci®, and holds a diploma in Technology Leadership from UC Berkeley. She enjoys collaborating, learning in community, and sharing her experience to help organizations define AI transformation strategies and lead ethical and responsible solutions.
4 attendees from this group