Details

Zoom Link

https://voxel51.com/computer-vision-events/june-computer-vision-meetup-2023/

Agenda

* Housekeeping
* Redefining State-of-the-Art with YOLOv5 and YOLOv8 - Glenn Jocher (Ultralytics)
* Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation - Narek Tumanyan & Michal Geyer (Weizmann Institute of Science)
* Re-annotating MS COCO, An Exploration of Pixel Tolerance - Jerome Pasquero & Eric Zimmermann (Sama)
* Closing Remarks

Redefining State-of-the-Art with YOLOv5 and YOLOv8

In recent years, object detection has been one of the most challenging and demanding tasks in computer vision. YOLO (You Only Look Once) has become one of the most popular and widely used algorithms for object detection due to its fast speed and high accuracy. YOLOv5 and YOLOv8 are the latest versions of this algorithm released by Ultralytics, which redefine what "state-of-the-art" means in object detection. In this talk, we will discuss the new features of YOLOv5 and YOLOv8, which include a new backbone network, a new anchor-free detection head, and a new loss function. These new features enable faster and more accurate object detection, segmentation, and classification in real-world scenarios. We will also discuss the results of the latest benchmarks and show how YOLOv8 outperforms the previous versions of YOLO and other state-of-the-art object detection algorithms. Finally, we will discuss the potential for this technology to "do good" in real-world scenarios and across various fields, such as autonomous driving, surveillance, and robotics.

Speaker

Glenn Jocher is founder and CEO of Ultralytics. In 2014 Glenn founded Ultralytics to lead the United States National Geospatial-Intelligence Agency (NGA) antineutrino analysis efforts, culminating in the miniTimeCube experiment and the world's first-ever Global Antineutrino Map published in Nature. Today he's driven to build the world's best vision AI as a building block to a future AGI, and YOLOv5, YOLOv8, and Ultralytics HUB are the spearheads of this obsession.

Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation

Large-scale text-to-image generative models have been a revolutionary breakthrough in the evolution of generative AI, allowing us to synthesize diverse images that convey highly complex visual concepts. However, a pivotal challenge in leveraging such models for real-world content creation tasks is providing users with control over the generated content. In this paper, we present a new framework that takes text-to-image synthesis to the realm of image-to-image translation -- given a guidance image and a target text prompt, our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text, while preserving the semantic layout of the source image. Specifically, we observe and empirically demonstrate that fine-grained control over the generated structure can be achieved by manipulating spatial features and their self-attention inside the model.

Speakers

Michal Geyer and Narek Tumanyan are Master's students at the Weizmann Institute of Science in the Computer Vision department.

Re-annotating MS COCO, An Exploration of Pixel Tolerance

The release of the COCO dataset has served as a foundation for many computer vision tasks including object and people detection. In this session, we’ll introduce the Sama-Coco dataset, a re-annotated version of COCO focused on fine-grained annotations. We’ll also cover interesting insights and learnings during the annotation phase, illustrative examples, and results of some of our experiments on annotation quality as well as how changes in labels affect model performance and prediction style.

Speakers

Jerome Pasquero is Principal Product Manager at Sama. Jerome holds a Ph.D. in electrical engineering, is listed as an inventor on more than 120 US patents, and has published more than 10 peer-reviewed journal and conference articles. Eric Zimmermann is an Applied Scientist at Sama helping to redefine annotation quality guidelines. He is also responsible for building internal curation tools that aim to improve how clients and annotators interact with their data.

Artificial Intelligence
Computer Vision
Machine Learning
Data Science
Open Source
