What we're about
This virtual group is for data scientists, machine learning engineers, and open source enthusiasts who want to expand their knowledge of computer vision and complementary technologies. Every month we'll bring you two diverse speakers working at the cutting edge of computer vision.
What's computer vision? It's how systems can derive meaningful information from digital images, videos and other visual inputs — and how they can take actions or make recommendations based on that information.
Use cases for computer vision include: autonomous vehicles, facial recognition, inventory management, medical imaging and more.
Are you interested in speaking at a future Meetup?
Is your company interested in sponsoring a Meetup?
Contact the Meetup organizers!
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more about FiftyOne, visit the project page on GitHub: https://github.com/voxel51/fiftyone
Upcoming events (4+)
* Redefining State-of-the-Art with YOLOv5 and YOLOv8 - Glenn Jocher (Ultralytics)
* Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation - Narek Tumanyan & Michal Geyer (Weizmann Institute of Science)
* Re-annotating MS COCO, An Exploration of Pixel Tolerance - Jerome Pasquero & Eric Zimmermann (Sama)
* Closing Remarks
Redefining State-of-the-Art with YOLOv5 and YOLOv8
In recent years, object detection has been one of the most challenging and demanding tasks in computer vision. YOLO (You Only Look Once) has become one of the most popular and widely used algorithms for object detection due to its fast speed and high accuracy. YOLOv5 and YOLOv8 are the latest versions of this algorithm released by Ultralytics, which redefine what "state-of-the-art" means in object detection. In this talk, we will discuss the new features of YOLOv5 and YOLOv8, which include a new backbone network, a new anchor-free detection head, and a new loss function. These new features enable faster and more accurate object detection, segmentation, and classification in real-world scenarios. We will also discuss the results of the latest benchmarks and show how YOLOv8 outperforms the previous versions of YOLO and other state-of-the-art object detection algorithms. Finally, we will discuss the potential for this technology to "do good" in real-world scenarios and across various fields, such as autonomous driving, surveillance, and robotics.
Glenn Jocher is founder and CEO of Ultralytics. In 2014 Glenn founded Ultralytics to lead the United States National Geospatial-Intelligence Agency (NGA) antineutrino analysis efforts, culminating in the miniTimeCube experiment and the world's first-ever Global Antineutrino Map published in Nature. Today he's driven to build the world's best vision AI as a building block to a future AGI, and YOLOv5, YOLOv8, and Ultralytics HUB are the spearheads of this obsession.
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
Large-scale text-to-image generative models have been a revolutionary breakthrough in the evolution of generative AI, allowing us to synthesize diverse images that convey highly complex visual concepts. However, a pivotal challenge in leveraging such models for real-world content creation tasks is providing users with control over the generated content. In this paper, we present a new framework that takes text-to-image synthesis to the realm of image-to-image translation -- given a guidance image and a target text prompt, our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text, while preserving the semantic layout of the source image. Specifically, we observe and empirically demonstrate that fine-grained control over the generated structure can be achieved by manipulating spatial features and their self-attention inside the model.
Michal Geyer and Narek Tumanyan are Master's students at the Weizmann Institute of Science in the Computer Vision department.
Re-annotating MS COCO, An Exploration of Pixel Tolerance
The release of the COCO dataset has served as a foundation for many computer vision tasks including object and people detection. In this session, we’ll introduce the Sama-Coco dataset, a re-annotated version of COCO focused on fine-grained annotations. We’ll also cover interesting insights and learnings during the annotation phase, illustrative examples, and results of some of our experiments on annotation quality as well as how changes in labels affect model performance and prediction style.
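One common way to quantify how much a re-annotated label differs from the original is intersection over union (IoU). The sketch below is illustrative only and is not taken from the Sama-Coco work: the function name `box_iou`, the box format, and the example coordinates are all assumptions. It shows how a shift of just two pixels changes the overlap between two otherwise identical boxes.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical example: a 40x40 box and the same box shifted by 2 pixels.
original = (10, 10, 50, 50)
reannotated = (12, 12, 52, 52)
iou = box_iou(original, reannotated)
print(f"IoU after a 2-pixel shift: {iou:.3f}")
```

For small objects the same pixel shift costs proportionally more overlap, which is one reason annotation tolerance can matter for evaluation metrics.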
Jerome Pasquero is Principal Product Manager at Sama. Jerome holds a Ph.D. in electrical engineering, is listed as inventor on more than 120 US patents, and has published over 10 peer-reviewed journal and conference articles. Eric Zimmermann is an Applied Scientist at Sama helping to redefine annotation quality guidelines. He is also responsible for building internal curation tools that aim to improve how clients and annotators interact with their data.
About the Workshop
Want greater visibility into the quality of your computer vision datasets and models? Then join Jacob Marks, PhD, of Voxel51 for this free 90-minute, hands-on workshop to learn how to leverage the open source FiftyOne computer vision toolset.
In the first part of the workshop we’ll cover:
- FiftyOne Basics (terms, architecture, installation, and general usage)
- An overview of useful workflows to explore, understand, and curate your data
- How FiftyOne represents and semantically slices unstructured computer vision data
The second half will be a hands-on introduction to FiftyOne, where you will learn how to:
- Load datasets from the FiftyOne Dataset Zoo
- Navigate the FiftyOne App
- Programmatically inspect attributes of a dataset
- Add new sample and custom attributes to a dataset
- Generate and evaluate model predictions
- Save insightful views into the data
Prerequisites: a working knowledge of Python and basic computer vision. All attendees will get access to the tutorials, videos, and code examples used in the workshop.
Unleashing the Potential of Visual Data: Vector Databases in Computer Vision
Discover the game-changing role of vector databases in computer vision applications. These specialized databases excel at handling unstructured visual data, thanks to their robust support for embeddings and lightning-fast similarity search. Join us as we explore advanced indexing algorithms and showcase real-world examples in healthcare, retail, finance, and more using the FiftyOne engine combined with the Milvus vector database. See how vector databases unlock the full potential of your visual data.
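The core operation behind similarity search can be sketched in a few lines. The example below is a minimal brute-force version with made-up three-dimensional embeddings and hypothetical file names. It does not use the Milvus or FiftyOne APIs; production vector databases typically replace this linear scan with approximate nearest-neighbor indexes so that queries stay fast at scale.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query, index, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical image embeddings; real ones would come from a vision model
# and have hundreds of dimensions.
index = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "truck_1.jpg": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
print(results)
```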
Filip Haltmayer is a Software Engineer at Zilliz working in both software and community development.
Computer Vision Applications at Scale with Vector Databases
Vector Databases enable semantic search at scale over hundreds of millions of unstructured data objects. In this talk I will introduce how you can use multi-modal encoders with the Weaviate vector database to semantically search over images and text. This will include demos across multiple domains including e-commerce and healthcare.
Zain Hasan is a senior developer advocate at Weaviate, an open source vector database.
Reverse Image Search for Ecommerce Without Going Crazy
Traditional full-text-based search engines have been on the market for a while and we are all currently trying to extend them with semantic search. Still, it might be more beneficial for some ecommerce businesses to introduce reverse image search capabilities instead of relying on text only. However, both semantic search and reverse image search can and should coexist! You may encounter common pitfalls while implementing both, so why don't we discuss the best practices? Let's learn how to extend your existing search system with reverse image search, without getting lost in the process!
Kacper Łukawski is a Developer Advocate at Qdrant - an open-source neural search engine.
Fast and Flexible Data Discovery & Mining for Computer Vision at Petabyte Scale
Improving model performance requires methods to discover computer vision data, sometimes from large repositories, whether it's similar examples to errors previously seen, new examples/scenarios, or more advanced techniques such as active learning and RLHF. LanceDB makes this fast and flexible for multi-modal data, with support for vector search, SQL, Pandas, Polars, Arrow and a growing ecosystem of tools that you're familiar with. We'll walk through some common search examples and show how you can find needles in a haystack to improve your metrics!
Jai Chopra is Head of Product at LanceDB.
How-To Build Scalable Image and Text Search for Computer Vision Data using Pinecone and Qdrant
Have you ever wanted to find the images most similar to an image in your dataset? What if you haven’t picked out an illustrative image yet, but you can describe what you are looking for using natural language? And what if your dataset contains millions, or tens of millions, of images? In this talk Jacob will show you step-by-step how to integrate all the technology required to enable search for similar images, search with natural language, plus scaling the searches with Pinecone and Qdrant. He’ll dive deep into the tech and show you a variety of practical examples that can help transform the way you manage your image data.
Jacob Marks is a Machine Learning Engineer and Developer Evangelist at Voxel51.
MARLIN: Masked Autoencoder for Facial Video Representation LearnINg
This talk proposes a self-supervised approach to learn universal facial representations from videos that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS). Our proposed framework, named MARLIN, is a facial video masked autoencoder that learns highly robust and generic facial embeddings from abundantly available non-annotated web-crawled facial videos. As a challenging auxiliary task, MARLIN reconstructs the spatio-temporal details of the face from the densely masked facial regions, which mainly include eyes, nose, mouth, lips, and skin, to capture local and global aspects that in turn help in encoding generic and transferable features. Through a variety of experiments on diverse downstream tasks, we demonstrate MARLIN to be an excellent facial video encoder as well as feature extractor that performs consistently well across a variety of downstream tasks including FAR (1.13% gain over supervised benchmark), FER (2.64% gain over unsupervised benchmark), DFD (1.86% gain over unsupervised benchmark), LS (29.36% gain for Frechet Inception Distance), and even in the low-data regime.
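To make the idea of dense masking concrete, here is a minimal sketch of how a video masked autoencoder might choose which spatio-temporal patches to hide. It is a simplification and not MARLIN's method: the abstract describes masking guided by facial regions, whereas this sketch masks patches uniformly at random. The function name `random_patch_mask`, the 8x14x14 patch grid, and the 0.9 mask ratio are all assumptions for illustration.

```python
import random


def random_patch_mask(num_frames, grid_h, grid_w, mask_ratio=0.9, seed=0):
    """Split a (frames x rows x cols) patch grid into masked and visible patches."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    patches = [
        (t, r, c)
        for t in range(num_frames)
        for r in range(grid_h)
        for c in range(grid_w)
    ]
    num_masked = int(round(len(patches) * mask_ratio))
    masked = set(rng.sample(patches, num_masked))
    # The encoder would see only the visible patches; the decoder would be
    # trained to reconstruct the masked ones.
    visible = [p for p in patches if p not in masked]
    return masked, visible


masked, visible = random_patch_mask(num_frames=8, grid_h=14, grid_w=14)
print(len(masked), "masked patches,", len(visible), "visible patches")
```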
Zhixi Cai is a Ph.D. student in the Data Science and Artificial Intelligence Department of Monash University IT Faculty, supervised by Dr. Munawar Hayat, Dr. Kalin Stefanov, and Dr. Abhinav Dhall. Cai's research interests include computer vision, deepfakes, and affective computing.