Today I read a paper titled “Labeling 3D scenes for Personal Assistant Robots”.
The abstract is:
Inexpensive RGB-D cameras that give an RGB image together with depth data have become widely available. We use this data to build 3D point clouds of a full scene. In this paper, we address the task of labeling objects in this 3D point cloud of a complete indoor scene such as an office. We propose a graphical model that captures various features and contextual relations, including the local visual appearance and shape cues, object co-occurrence relationships, and geometric relationships. With a large number of object classes and relations, the model’s parsimony becomes important, and we address that by using multiple types of edge potentials. The model admits efficient approximate inference, and we train it using a maximum-margin learning approach. In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views, having 2495 segments labeled with 27 object classes), we get a performance of 84.06% in labeling 17 object classes for offices, and 73.38% in labeling 17 object classes for home scenes. Finally, we applied these algorithms successfully on a mobile robot for the task of finding an object in a large cluttered room.
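To make the graphical-model part concrete for myself: the score of a candidate labeling decomposes into node potentials (the local appearance and shape cues of each segment) plus edge potentials (the co-occurrence and geometric relations between pairs of segments). Below is a minimal sketch of that decomposition. All names, feature shapes, and the weight layout are my own illustrative assumptions, not the paper's actual code.

    import numpy as np

    def labeling_score(node_feats, edge_feats, labels, w_node, w_edge):
        # Score one candidate assignment of object classes to scene segments.
        # node_feats: {segment_id: feature vector of local appearance/shape cues}
        # edge_feats: {(seg_i, seg_j): feature vector for the pair, e.g.
        #              relative geometry; the paper also uses co-occurrence}
        # labels:     {segment_id: class index}
        # w_node:     array of shape (n_classes, n_node_features)
        # w_edge:     array of shape (n_classes, n_classes, n_edge_features)
        score = 0.0
        for seg, phi in node_feats.items():
            # Node potential: how well this segment's features fit its label.
            score += float(w_node[labels[seg]] @ phi)
        for (i, j), phi in edge_feats.items():
            # Edge potential: compatibility of the two labels given the
            # contextual features of this segment pair.
            score += float(w_edge[labels[i], labels[j]] @ phi)
        return score

Labeling a scene then means jointly finding the labels that maximize this score, which is intractable to do exactly but, per the abstract, admits efficient approximate inference.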
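The abstract also mentions maximum-margin learning. I assume this is the standard structured max-margin (structured SVM) formulation, though the paper may differ in details: learn weights $w$ so that the true labeling $y_n$ of each training scene $x_n$ outscores every other labeling $y$ by a margin that grows with the loss $\Delta(y_n, y)$:

\min_{w,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_n \xi_n
\quad \text{s.t.}\quad w^\top\!\left[\Phi(x_n, y_n) - \Phi(x_n, y)\right] \ge \Delta(y_n, y) - \xi_n \quad \forall n,\ \forall y \ne y_n,

where $\Phi$ stacks the node and edge features of a labeling, so $w^\top \Phi(x, y)$ is exactly the score sketched above.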