Creating a 3D World from 2D Images
Details
Have you ever tried Google’s Live View feature, where you just point your phone at a building and it instantly figures out exactly where you are? Or maybe you’ve wondered how action movies pull off those cool 3D scenes of the places the characters are about to rob? Ever stopped to think about how virtual tours and street views come to life?
In this talk, we will explore the algorithms used to create vivid, comprehensive 3D scenes from just a handful of images collected from the internet. The talk will be divided into three key sections:
Artifact Mitigation
Images collected from the internet are not always perfect. Some may have memes or text overlaid on them, while others may have been compressed repeatedly for efficient storage, leaving them pixelated. Some images might be blurry, overexposed, or taken at night. There can also be moving objects, like people and cars, that aren’t part of the scene and can obstruct the 3D reconstruction.
In this section, we’ll learn about the deep learning algorithms used to remove these kinds of artifacts and transient objects from images.
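The talk doesn’t specify which models it will cover, but one common illustration of the idea is to use an off-the-shelf instance-segmentation network to flag pixels belonging to transient classes such as people and cars, so that a reconstruction pipeline can ignore them. Here is a minimal sketch using torchvision’s pretrained Mask R-CNN; the score threshold and class choices are assumptions, not the speaker’s actual pipeline:

```python
# Sketch: mask out transient objects (people, cars) before reconstruction.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pretrained Mask R-CNN; in COCO, class 1 = person, class 3 = car.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

TRANSIENT_CLASSES = {1, 3}  # COCO ids for "person" and "car" (an assumption)

def transient_mask(image_path, score_thresh=0.7):
    """Return a boolean (H, W) mask that is True on pixels to ignore."""
    img = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        pred = model([img])[0]
    mask = torch.zeros(img.shape[1:], dtype=torch.bool)
    for label, score, m in zip(pred["labels"], pred["scores"], pred["masks"]):
        if label.item() in TRANSIENT_CLASSES and score >= score_thresh:
            mask |= m[0] > 0.5  # per-instance soft mask -> binary
    return mask
```

A downstream SfM or rendering step can then simply skip any feature or pixel that falls inside the returned mask.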
Image Registration and Geo-localization
Once the images have been pre-processed, the next step is to determine their relative pose with respect to each other. Imagine trying to figure out whether a photo was taken from the left side of the Eiffel Tower or the right, or whether the person with the camera was 100 meters away or just 50 meters. Sometimes, the images might even come from a drone! So how do we place all these different frames into a common reference frame? The more viewpoints we have of an area, the more complete our 3D models will be.
In this section, we’ll learn how Structure-from-Motion (SfM) is used to assign poses to these images. We’ll explore techniques for using background details to determine pose, especially when the object of interest looks the same from every angle. Finally, we’ll briefly discuss how these images can be geo-localized, meaning their latitude and longitude can be estimated even when no GPS information is available.
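The announcement doesn’t name a specific tool, but COLMAP is the de facto standard open-source SfM pipeline, and its Python bindings make the pose-assignment step concrete. A minimal sketch, assuming pycolmap is installed and the paths below (which are hypothetical) exist:

```python
# Sketch: incremental Structure-from-Motion with COLMAP's Python bindings.
import pycolmap

database_path = "scene.db"   # local features and matches are stored here
image_dir = "images/"        # the cleaned-up internet photos
output_dir = "sparse/"       # reconstructed cameras and 3D points

pycolmap.extract_features(database_path, image_dir)  # detect local features
pycolmap.match_exhaustive(database_path)             # match every image pair
maps = pycolmap.incremental_mapping(database_path, image_dir, output_dir)

# Each reconstruction stores an estimated camera pose per registered image.
# (Attribute names vary across pycolmap versions; older ones use qvec/tvec.)
for image in maps[0].images.values():
    print(image.name, image.cam_from_world)
```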
3D Reconstruction
Now that our images are cleaned up and their poses are known, we can dive into the techniques used to turn them into a 3D scene. We’ll discuss, at a high level, some traditional 3D reconstruction methods along with more recent AI-based approaches such as neural rendering and Gaussian splatting.
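To make the connection between these newer approaches concrete: both NeRF-style neural rendering and Gaussian splatting ultimately form each pixel by blending colors front-to-back along a viewing ray, weighting each sample by its opacity and by the transmittance left over from everything in front of it. A toy sketch of that shared compositing rule (the sample values are made up):

```python
# Toy sketch: front-to-back alpha compositing along one camera ray.
import numpy as np

def composite(colors, alphas):
    """colors: (N, 3) RGB per sample, near to far; alphas: (N,) opacities."""
    transmittance = 1.0                 # how much light still gets through
    pixel = np.zeros(3)
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * c  # this sample's contribution
        transmittance *= (1.0 - a)      # it occludes everything behind it
    return pixel

# Two semi-transparent samples: red in front, blue behind.
print(composite(np.array([[1.0, 0, 0], [0, 0, 1.0]]), np.array([0.6, 0.9])))
# -> mostly red with a little blue: [0.6, 0.0, 0.36]
```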
The applications of the computer vision and deep learning techniques we’ve talked about are widespread, and I hope you’ll start noticing them all around you: in mobile phones, self-driving technology, biometric scanners, delivery robots, security cameras, and many other places.
Speaker Bio:
Kshitij Minhas works at the intersection of computer vision, robotics, and AI. In his current role at SRI, he works on multi-faceted projects tackling novel problems: algorithms for autonomous drones, warehouse robot navigation, 3D scene reconstruction, precise human pose estimation for security, GPS-free robot localization, augmented reality mentoring systems, and automation for heavy construction equipment, among others. Previously, he worked at computer vision-based hardware startups, where he focused more on hardware design and deployment.
Kshitij brings end-to-end expertise spanning algorithm development, system integration, and hardware design. He holds degrees in Electrical and Computer Engineering as well as Mechanical Engineering, has published work at premier IEEE conferences, and is a co-inventor on two patents. Through this talk, he hopes to share with a general audience some of what he has learned about computer vision and the latest trends in 3D reconstruction and AI.
LinkedIn: https://www.linkedin.com/in/kshitijminhas
Agenda:
We will start our meeting at 7:00 pm. For the next 10 minutes or so, we will introduce ourselves, handle any LICN business and do a little networking. We will then start our presentation. After the presentation, feel free to stick around and chat with others to network or to further discuss our lecture topic.
NOTES
There is no cost to attend this meeting; however, if you are a NYS Professional Engineer and would like to receive Professional Development Hours (PDHs) of continuing education credit, payment of a $15 fee is required. You will also have to properly fill out an Evaluation Form to confirm that you attended the lecture.
Click here to open the form. Simply fill it out and click the “Submit” button. PDHs will be granted based on the actual length of the lecture, including Q&A. You must stay to the end to receive credit.
We accept electronic payment via Zelle. Zelle is a bank-to-bank transfer mechanism supported by most banks, without a fee, as part of their normal online banking capabilities. There is also a Zelle app available for your smartphone.
When you use Zelle with your bank, it will ask for the following information: 1) the amount to send (enter $15.00), 2) the account to pull the money from (select whichever account you want to use), and 3) the phone number or email of the recipient (enter ieeelicn@gmail.com; don’t worry if you see the name of our Treasurer, David Rost, pop up). If it asks for a memo field, we suggest entering “yymmdd LICN PDH”, where yymmdd are the year, month, and day of the lecture.
While we prefer that your payment and evaluation form are received by the day of the lecture, they must be received by the first Monday after the lecture.
If paying by Zelle is a problem for you, then please contact Ed Gellender at edgellender@gmail.com for an alternate payment method.