Indoor Scene Understanding via Parse Graphs
Abstract:
The aim of this project is to explore a new approach to scene understanding from a single image. Rather than attempting to extract the precise 3D structure of the scene, the goal is to identify contextual and spatial relationships between the objects in it. From these relationships, we can estimate an approximate geometric layout that is sufficient for most applications, particularly navigation.
Many existing techniques use a pipeline: image segmentation, then classification of the individual segments, and finally derivation of some understanding of the scene. Instead of a pipeline in which each step runs independently of the others, we will use an iterative algorithm in which segmentation, classification, and scene understanding all assist one another.
To accomplish this, we will use a variation of the k-means clustering algorithm in which clusters are labeled at each iteration, and these labels influence how pixels are assigned to clusters on subsequent iterations. In addition, we will construct a parse graph whose nodes are all possible labels and whose edges are all possible relationships between labels. At each iteration, we improve our understanding of the scene by setting the likelihood of each label and relationship, eventually simplifying the graph to the one that best fits the scene. Using the simplified graph, we can approximate the depths and positions of objects relative to one another.
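As a rough illustration of the label-influenced clustering idea, here is a minimal sketch in plain Python. It is not the project's implementation: the abstract does not say how labels feed back into the assignment step, so this sketch simply assumes that a label adds a penalty to the usual squared distance. The names `label_aware_kmeans`, `label_cluster`, `label_penalty`, and `weight`, the two toy labels, and the (brightness, vertical position) pixel features are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_of(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def label_aware_kmeans(
    points: Sequence[Point],
    init_centroids: Sequence[Point],
    label_cluster: Callable[[Point], str],        # hypothetical: centroid -> label
    label_penalty: Callable[[Point, str], float],  # hypothetical: (pixel, label) -> cost
    weight: float = 1.0,
    max_iter: int = 20,
) -> Tuple[List[int], List[Point], List[str]]:
    centroids = list(init_centroids)
    assignment: List[int] = []
    for _ in range(max_iter):
        # 1. Label every cluster from its current centroid.
        labels = [label_cluster(c) for c in centroids]
        # 2. Assign each pixel to the cluster with the lowest combined cost:
        #    squared distance plus a penalty for implausible labels (assumption).
        new_assignment = [
            min(
                range(len(centroids)),
                key=lambda k: sq_dist(p, centroids[k])
                + weight * label_penalty(p, labels[k]),
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
        # 3. Standard k-means update; empty clusters keep their old centroid.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = centroid_of(members)
    return assignment, centroids, [label_cluster(c) for c in centroids]


# Toy pixels: (brightness, vertical position), with 0.0 at the top of the image.
pixels = [(0.9, 0.1), (0.8, 0.2), (0.15, 0.3), (0.2, 0.8), (0.1, 0.9)]
start = [(0.9, 0.1), (0.1, 0.9)]


def toy_label(c: Point) -> str:
    # Hypothetical rule: clusters centred in the top half are "ceiling".
    return "ceiling" if c[1] < 0.5 else "floor"


def toy_penalty(p: Point, label: str) -> float:
    # Hypothetical rule: "floor" is unlikely in the top half, "ceiling" in the bottom.
    return max(0.0, 0.5 - p[1]) if label == "floor" else max(0.0, p[1] - 0.5)


plain, _, _ = label_aware_kmeans(pixels, start, toy_label, toy_penalty, weight=0.0)
guided, _, guided_labels = label_aware_kmeans(pixels, start, toy_label, toy_penalty, weight=2.0)
```

With `weight=0.0` this reduces to ordinary k-means, and the dark pixel near the top of the image is grouped with the dark "floor" cluster. With `weight=2.0` the label penalty moves it into the "ceiling" cluster, which is the kind of feedback between labeling and segmentation described above.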
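The parse graph described above could be represented along the following lines. This is only a sketch under stated assumptions: the class name `ParseGraph`, the relation names, the uniform prior, and the idea of simplifying the graph by dropping anything below a likelihood threshold are illustrative choices, not details given in the abstract, which does not specify how likelihoods are computed or how the graph is simplified.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str, str]  # (label_a, relation, label_b)


@dataclass
class ParseGraph:
    node_likelihood: Dict[str, float] = field(default_factory=dict)
    edge_likelihood: Dict[Edge, float] = field(default_factory=dict)

    @classmethod
    def complete(
        cls, labels: Iterable[str], relations: Iterable[str], prior: float = 0.5
    ) -> "ParseGraph":
        """Start with every label as a node and every relationship as an edge."""
        labels, relations = list(labels), list(relations)
        nodes = {label: prior for label in labels}
        edges = {
            (a, r, b): prior
            for a, b in permutations(labels, 2)  # ordered pairs of distinct labels
            for r in relations
        }
        return cls(nodes, edges)

    def set_likelihoods(
        self, nodes: Dict[str, float], edges: Dict[Edge, float]
    ) -> None:
        """Record the likelihoods estimated during one iteration."""
        self.node_likelihood.update(nodes)
        self.edge_likelihood.update(edges)

    def simplified(self, threshold: float) -> "ParseGraph":
        """Keep only labels and relationships at or above the threshold (assumption)."""
        nodes = {l: p for l, p in self.node_likelihood.items() if p >= threshold}
        edges = {
            e: p
            for e, p in self.edge_likelihood.items()
            if p >= threshold and e[0] in nodes and e[2] in nodes
        }
        return ParseGraph(nodes, edges)


# Hypothetical labels and relations for an indoor scene.
graph = ParseGraph.complete(
    ["floor", "wall", "table", "ceiling"], ["supports", "in_front_of"]
)
# Made-up likelihoods standing in for the output of one iteration.
graph.set_likelihoods(
    nodes={"floor": 0.9, "wall": 0.8, "table": 0.7, "ceiling": 0.1},
    edges={("floor", "supports", "table"): 0.9,
           ("table", "in_front_of", "wall"): 0.8},
)
scene = graph.simplified(threshold=0.6)
```

In this toy run the simplified graph keeps three labels and two relationships, such as the table being in front of the wall, which is the kind of relative position and depth information the approach aims to recover.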