Oct 15 - Visual AI in Agriculture (Day 1)

Details
Join us for day one of a series of virtual events featuring talks from experts on the latest developments at the intersection of visual AI and agriculture.
Date and Time
Oct 15 at 9 AM Pacific
Location
Virtual. Register for the Zoom.
Paved2Paradise: Scalable LiDAR Simulation for Real-World Perception
Training robust perception models for robotics and autonomy often requires massive, diverse 3D datasets. But collecting and annotating real-world LiDAR point clouds at scale is both expensive and time-consuming, especially when high-quality labels are needed. Paved2Paradise introduces a cost-effective alternative: a scalable LiDAR simulation pipeline that generates realistic, fully annotated datasets with minimal human labeling effort.
The key idea is to “factor the real world” by separately capturing background scans (e.g., fields, roads, construction sites) and object scans (e.g., vehicles, people, machinery). By intelligently combining these two sources, Paved2Paradise can synthesize a combinatorially large set of diverse training scenes. The pipeline involves four steps: (1) collecting extensive background LiDAR scans, (2) recording high-resolution scans of target objects under controlled conditions, (3) inserting objects into backgrounds with physically consistent placement and occlusion, and (4) simulating LiDAR geometry to ensure realism.
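The insertion and occlusion steps (3) and (4) above can be sketched in a few lines. The sketch below is a minimal illustration of the idea only, not the paper's implementation: the function names, the local-ground-height placement, and the single-return-per-angular-bin occlusion model are simplifying assumptions introduced here.

```python
import numpy as np

def ground_height(background, xy, radius=1.0):
    """Estimate local ground height near xy from background points (N x 3)."""
    d = np.linalg.norm(background[:, :2] - xy, axis=1)
    near = background[d < radius]
    return near[:, 2].min() if len(near) else 0.0

def place_object(obj, background, xy):
    """Translate an object scan so it rests on the ground at xy."""
    shifted = obj.copy()
    shifted[:, :2] += xy - obj[:, :2].mean(axis=0)
    shifted[:, 2] += ground_height(background, xy) - obj[:, 2].min()
    return shifted

def simulate_occlusion(points, labels, az_bins=720, el_bins=64):
    """Keep only the nearest return per angular bin (sensor at the origin),
    a crude stand-in for LiDAR line-of-sight geometry."""
    r = np.linalg.norm(points, axis=1)
    az = np.arctan2(points[:, 1], points[:, 0])
    el = np.arcsin(np.clip(points[:, 2] / np.maximum(r, 1e-9), -1.0, 1.0))
    ai = ((az + np.pi) / (2 * np.pi) * az_bins).astype(int) % az_bins
    ei = np.clip(((el + np.pi / 2) / np.pi * el_bins).astype(int), 0, el_bins - 1)
    bins = ai * el_bins + ei
    order = np.argsort(r)                      # nearest points first
    _, first = np.unique(bins[order], return_index=True)
    keep = order[first]                        # nearest return in each bin
    return points[keep], labels[keep]
```

Because placement and occlusion are independent of the particular background or object scan, any object can be dropped into any background at any position, which is what makes the combinatorial scene diversity cheap to generate.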
Experiments show that models trained on Paved2Paradise-generated data transfer effectively to the real world, achieving strong detection performance with far less manual annotation than conventional dataset collection requires. The approach is not only cost-efficient but also flexible: practitioners can expand to new object classes or domains simply by swapping in new background or object scans.
For ML practitioners working in robotics, autonomous vehicles, or safety-critical perception, Paved2Paradise highlights a practical path toward scaling training data without scaling costs. It bridges the gap between simulation and real-world performance, enabling faster iteration and more reliable deployment of perception models.
About the Speaker
Michael A. Alcorn is a Senior Machine Learning Engineer at John Deere, where he develops deep learning models for LiDAR and RGB perception in safety-critical, real-time systems. He earned his Ph.D. in Computer Science from Auburn University, with a dissertation on improving computer vision and spatiotemporal deep neural networks, and also holds a Graduate Minor in Mathematics. Michael’s research has been cited by researchers at DeepMind, Google, Meta, Microsoft, and OpenAI, among others, and his (batter|pitcher)2vec paper was a prize-winner at the 2018 MIT Sloan Sports Analytics Conference. He has also contributed machine learning code to scikit-learn and Apache Solr, and his GitHub repositories—which have collectively received over 2,100 stars—have served as starting points for research and production code at many different organizations.
MothBox: An Inexpensive, Open-Source, Automated Insect Monitor
Dr. Andy Quitmeyer will talk about the design of an exciting new open-source science tool, the Mothbox. The Mothbox is an award-winning project for broad-scale monitoring of insect biodiversity. It is a low-cost device, developed in the harsh jungles of Panama, that captures very high-resolution photos and automatically identifies levels of biodiversity in forests and agricultural land. After thousands of insect observations and hundreds of deployments in Panama, Peru, Mexico, Ecuador, and the US, the team is now developing a new, manufacturable version to share this important tool worldwide. We will discuss the development of this device in the jungles of Panama and its importance to studying biodiversity worldwide.
About the Speaker
Dr. Andy Quitmeyer designs new ways to interact with the natural world. He has worked with large organizations like Cartoon Network, IDEO, and the Smithsonian, taught as a tenure-track professor at the National University of Singapore, and even had his research turned into a (silly) television series called “Hacking the Wild,” distributed by Discovery Networks.
Now, he spends most of his time volunteering with smaller organizations, and recently founded the field-station makerspace, Digital Naturalism Laboratories. In the rainforest of Gamboa, Panama, Dinalab blends biological fieldwork and technological crafting with a community of local and international scientists, artists, engineers, and animal rehabilitators. He currently also advises students as an affiliate professor at the University of Washington.
Foundation Models for Visual AI in Agriculture
Foundation models enable a new way to address tasks by exploiting emergent capabilities in a zero-shot manner. In this talk I will discuss recent research on enabling visual AI both zero-shot and via fine-tuning. Specifically, I will discuss joint work on RELOCATE, a simple training-free baseline designed to perform the challenging task of visual query localization in long videos.
To eliminate the need for task-specific training and to handle long videos efficiently, RELOCATE leverages a region-based representation derived from pretrained vision models. I will also discuss joint work on enabling multi-modal large language models (MLLMs) to correctly answer prompts that require holistic spatio-temporal understanding. MLLMs struggle with prompts that refer both to (1) the entirety of the environment in which an MLLM-equipped agent operates and (2) recent actions that just happened and are encoded in a video clip.
Such holistic spatio-temporal understanding is nonetheless important for agents operating in the real world. Our solution involves a dedicated data-collection pipeline and fine-tuning of an MLLM equipped with projectors, improving both spatial understanding of an environment and temporal understanding of recent observations.
About the Speaker
Alex Schwing is an Associate Professor at the University of Illinois at Urbana-Champaign, working with talented students on artificial intelligence, generative AI, and computer vision. He received his B.S. and diploma in Electrical Engineering and Information Technology from the Technical University of Munich in 2006 and 2008, respectively, and obtained a PhD in Computer Science from ETH Zurich in 2014. Afterwards, he was a postdoctoral fellow at the University of Toronto until 2016.
His research interests are in the area of artificial intelligence, generative AI, and computer vision, where he has co-authored numerous papers on topics in scene understanding, inference and learning algorithms, deep learning, image and language processing, and generative modeling. His PhD thesis was awarded an ETH medal and his team’s research was awarded an NSF CAREER award.
Beyond the Lab: Real-World Anomaly Detection for Agricultural Computer Vision
Anomaly detection is transforming manufacturing and surveillance, but what about agriculture? Can AI actually detect plant diseases and pest damage early enough to make a difference? This talk demonstrates how anomaly detection identifies and localizes crop problems using coffee leaf health as our primary example. We'll start with the foundational theory, then examine how these models detect rust and miner damage in leaf imagery.
The session includes a comprehensive hands-on workflow using the open-source FiftyOne computer vision toolkit, covering dataset curation, patch extraction, model training, and result visualization. You'll gain both theoretical understanding of anomaly detection in computer vision and practical experience applying these techniques to agricultural challenges and other domains.
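As a rough illustration of the patch-extraction step in such a workflow (the FiftyOne-specific curation and visualization steps are omitted here), the sketch below slices a leaf image into overlapping patches and scores each patch by its distance to the mean healthy patch. The distance-to-mean scoring is a toy stand-in for a trained anomaly-detection model and is purely illustrative, as are the function names:

```python
import numpy as np

def extract_patches(image, size=8, stride=4):
    """Slice an H x W (x C) image into overlapping square patches.

    Returns the flattened patches and the (row, col) of each patch's
    top-left corner.
    """
    h, w = image.shape[:2]
    patches, coords = [], []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(image[y:y + size, x:x + size].ravel())
            coords.append((y, x))
    return np.asarray(patches, dtype=float), coords

def anomaly_scores(healthy_patches, test_patches):
    """Toy score: Euclidean distance of each test patch to the mean
    healthy patch. A real pipeline would use a learned model instead."""
    mu = healthy_patches.mean(axis=0)
    return np.linalg.norm(test_patches - mu, axis=1)
```

Because each score is tied to a patch coordinate, the highest-scoring patches localize the anomaly (e.g., a rust lesion) within the leaf image, which is exactly the identify-and-localize behavior the talk describes.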
About the Speaker
Paula Ramos holds a PhD in Computer Vision and Machine Learning and has more than 20 years of experience in technology. Since the early 2000s, she has been developing novel integrated engineering technologies in Colombia, mainly in computer vision, robotics, and machine learning applied to agriculture.
