San Diego ACM SIGGRAPH Message Board

An ACM article: Modeling People and Places with Internet Photo Collections

Sandro A.
sandroalberti
Group Organizer
Del Mar, CA
Post #: 85
Over the last couple of years, methods for making 3D models out of photographs have grown from mere hypothesis to automated matching of entire online photo collections. In the future, we might even be able to compare recreated 3D models over time.

Modeling People and Places with Internet Photo Collections
David Crandall; June 2012


Geotagged photographs can be used to identify the most photographed places on Earth... At a local scale, we can build detailed three-dimensional models of a scene by combining information from thousands of two-dimensional photographs taken by different people and from different vantage points...



While users of [social media] are primarily motivated by a desire to share photos with family and friends, collectively they are generating vast repositories of online information about the world and its people. Each of these photos is a visual observation of what a small part of the world looked like at a particular point in time and space...

Algorithms for analyzing the world at a global scale can automatically create annotated world maps by finding the most photographed cities and landmarks, inferring place names from text tags, and analyzing the images themselves to identify "canonical" images to summarize each place...

At a more local level, we can use automatic techniques from computer vision to produce strikingly accurate 3D models of a landmark...

[Involved are] two key research problems in computer science: extracting meaningful semantics from the raw data; and doing so efficiently...

[The figure below] presents an example of a visual connectivity network for a set of images of Trafalgar Square. We compute a measure of visual similarity between every pair of images and connect those above a threshold...
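As a concrete illustration of how such a network might be assembled (this is a rough sketch, not the authors' code), the Python below connects every image pair whose visual similarity exceeds a threshold. The function name, the "similarity" callback, and the threshold value are all assumptions for illustration; the similarity score could be, for example, a count of matched local features between the two images.

import itertools
import networkx as nx

def build_visual_network(image_ids, similarity, threshold=20):
    """Connect image pairs whose pairwise similarity score exceeds a threshold."""
    graph = nx.Graph()
    graph.add_nodes_from(image_ids)
    for a, b in itertools.combinations(image_ids, 2):
        score = similarity(a, b)              # e.g., number of matched features
        if score >= threshold:
            graph.add_edge(a, b, weight=score)  # edge = visual overlap
    return graph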



Mapping the World
In addition to the images themselves, modern photo-sharing sites such as Flickr collect a rich assortment of nonvisual information about photos... (what a photo contains [text tags], as well as where [geotag], when [timestamp], and how [camera metadata such as exposure settings] the photo was taken)... Geotags record the latitude and longitude of the point on Earth where a photo was taken...
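The geotag itself is just a pair of coordinates stored in the photo's EXIF metadata. A minimal sketch of reading it with Pillow is below; the tag numbers follow the EXIF standard, and the file name is hypothetical.

from PIL import Image

def read_geotag(path):
    """Return (latitude, longitude) from a photo's EXIF GPS block, or None."""
    exif = Image.open(path).getexif()
    gps = exif.get_ifd(0x8825)                # 0x8825 = GPSInfo IFD
    if not gps:
        return None
    def to_degrees(dms, ref):
        deg = float(dms[0]) + float(dms[1]) / 60 + float(dms[2]) / 3600
        return -deg if ref in ("S", "W") else deg
    lat = to_degrees(gps[2], gps[1])          # GPSLatitude, GPSLatitudeRef
    lon = to_degrees(gps[4], gps[3])          # GPSLongitude, GPSLongitudeRef
    return lat, lon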

As one might expect, more photographs are taken in some locations than others... Photo-taking is dense in urban areas and quite sparse in most rural areas... Continental boundaries are quite sharp, because beaches are such popular locales to take photos. Also... roads are visible... because people take photos as they travel...
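A toy version of this density analysis can be written by binning geotags into a coarse latitude/longitude grid and ranking the cells by photo count. The article's actual clustering is more sophisticated; the grid size and names below are made up for illustration.

from collections import Counter

def most_photographed_cells(geotags, cell_degrees=0.01, top=10):
    """geotags: iterable of (lat, lon) pairs; returns the densest grid cells."""
    counts = Counter(
        (round(lat / cell_degrees), round(lon / cell_degrees))
        for lat, lon in geotags
    )
    # map each winning cell back to approximate coordinates
    return [((i * cell_degrees, j * cell_degrees), n)
            for (i, j), n in counts.most_common(top)]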

We can find place names by looking across the photos of millions of users and finding tags that are used frequently in a particular place and infrequently outside of it. We can also generate a visual description of each place by finding a representative image that summarizes that place well... We connect pairs of photos having a high degree of visual similarity. Then we apply a graph-clustering algorithm to find... groups of nodes that are connected to many other nodes within the group but not to many nodes outside the group... To decide which nodes to connect, we measure visual similarity using an automated technique called SIFT (scale-invariant feature transform)... (our two eyes view a scene from slightly different perspectives, and from these two views the brain can infer the depth of each scene point based on the difference between where the point appears in the two images)... We first perform SIFT feature matching between pairs of images to build an image network. Unrelated images, such as a closeup of a pigeon, are automatically discarded...
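The tag-based naming step can be approximated very simply: score each tag by how concentrated its usage is at a given place. The statistic below is only an illustration; the excerpt does not give the authors' exact formula, and the function and threshold names are assumptions.

from collections import Counter

def place_name_scores(tags_in_place, tags_elsewhere, min_count=5):
    """Rank candidate place-name tags: common at the place, rare elsewhere."""
    inside = Counter(tags_in_place)       # tags of photos geotagged at the place
    outside = Counter(tags_elsewhere)     # tags of all other photos
    scores = {}
    for tag, n in inside.items():
        if n >= min_count:
            # fraction of the tag's total uses that occur at this place
            scores[tag] = n / (n + outside.get(tag, 0))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)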

Given an input photo... SIFT extracts a number of features, consisting of salient locations and scales in the image (below), as well as a high-dimensional descriptor summarizing the appearance of each feature. A subset of detected feature locations, depicted as yellow circles, is superimposed on the image... The image is shown again on the bottom next to an image from a similar viewpoint...
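For concreteness, this is roughly what SIFT extraction and pairwise matching look like with OpenCV (cv2.SIFT_create is available in recent OpenCV releases; the file paths are placeholders). The number of matches surviving Lowe's ratio test can serve as the similarity score used to build the image network above.

import cv2

def match_sift(path1, path2, ratio=0.75):
    """Extract SIFT features from two images and return the good matches."""
    img1 = cv2.imread(path1, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(path2, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    kp1, desc1 = sift.detectAndCompute(img1, None)   # keypoints + 128-d descriptors
    kp2, desc2 = sift.detectAndCompute(img2, None)
    good = []
    for pair in cv2.BFMatcher().knnMatch(desc1, desc2, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])                      # passes Lowe's ratio test
    return kp1, kp2, good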

Once we have the network of visual connectivity between images, we need to estimate the precise position and orientation of the cameras...

Information in the visual network, as well as absolute location information from geotags, can help with this reconstruction task. Consider a pair of visually overlapping images... We can determine the geometric relationship between these two images: that image 2 is taken to the left of image 1 and rotated slightly clockwise... By computing many such relationships, we can build up a network of information on top of a set of images...
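One standard way to recover such a pairwise relationship (not necessarily the authors' exact pipeline) is to estimate the essential matrix from matched keypoints and decompose it into a relative rotation and translation direction. In the sketch below, the 3x3 intrinsics matrix K is assumed to be known, for example from EXIF focal-length metadata, and the variable names are illustrative.

import numpy as np
import cv2

def relative_pose(kp1, kp2, matches, K):
    """Estimate how camera 2 is rotated and translated relative to camera 1."""
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t   # rotation and unit-scale translation direction of camera 2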

Unfortunately, geotags are very noisy and can at times be hundreds of meters away from a photo's true location. On the other hand, some geotags are quite accurate. If we knew which were good, we could propagate locations from those photos to their neighbors in the network...

To overcome this problem we have developed a new technique that uses the image network to... "average out" errors in the noisy observations... Each image repeatedly updates its position based on information from its neighbors...
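The excerpt does not spell out the authors' algorithm, but the idea of repeatedly updating each image's position from its neighbors can be illustrated with a simple iterative averaging scheme over the image network built earlier. Everything here (weights, iteration count, the assumption that every image has a noisy geotag) is a stand-in for illustration.

def refine_positions(graph, geotags, iterations=50, prior_weight=1.0):
    """graph: networkx Graph of images; geotags: dict image -> (lat, lon)."""
    pos = dict(geotags)                                  # current estimates
    for _ in range(iterations):
        new_pos = {}
        for node in graph.nodes:
            neighbors = list(graph.neighbors(node))
            if not neighbors:
                new_pos[node] = pos[node]
                continue
            # blend the noisy geotag (prior) with the neighbors' current estimates
            lat = prior_weight * geotags[node][0] + sum(pos[n][0] for n in neighbors)
            lon = prior_weight * geotags[node][1] + sum(pos[n][1] for n in neighbors)
            total = prior_weight + len(neighbors)
            new_pos[node] = (lat / total, lon / total)
        pos = new_pos
    return pos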

While photo-sharing sites such as Flickr and Facebook continue to grow at a breathtaking pace, they still do not have enough images to reach our eventual goal of reconstructing the entire world in 3D. The main problem is that the geospatial distribution of photographs is highly nonuniform... (there are hundreds of thousands of photos of Notre Dame but virtually none of the café across the street). One solution to this problem is to entice people to take photos of under-represented places through gamification. This is the idea behind PhotoCity... In PhotoCity, teams of players compete against one another by taking photos at specific points in space... Through this game, we collected more than 100,000 photos of the Cornell and University of Washington campuses over a period of a few weeks. We used these photos to reconstruct... areas that otherwise did not have much photographic coverage on sites such as Flickr...

Future Work
Imagine all of the world's photos as coming from a "distributed camera," continually capturing images all around the world. Can this camera be calibrated to estimate the place and time each of these photos was taken?... This would, for example, allow a scientist to find all images of Central Park over time in order to study changes in flowering times from year to year, or would allow an engineer to find all available photos of a particular bridge online to determine why it collapsed...


See more ACM articles online
Sandro A.
sandroalberti
Group Organizer
Del Mar, CA
Post #: 217
This is a great new article for you, fresh from the Communications of the ACM!
(August 27th, 2012)
Sandro A.
sandroalberti
Group Organizer
Del Mar, CA
Post #: 226
3D photo-stitching tech could also be quite... malicious:

PlaceRaider malware takes secret pics from your phone and uploads them to a server as a 3D space for evildoers to examine! http://bit.ly/SQpSwA...