Today I read a paper titled “3D Hand Pose Detection in Egocentric RGB-D Images”.
The abstract is:
We focus on the task of everyday hand pose estimation from egocentric viewpoints. For this task, we show that depth sensors are particularly informative for extracting near-field interactions of the camera wearer with his/her environment. Despite the recent advances in full-body pose estimation using Kinect-like sensors, reliable monocular hand pose estimation in RGB-D images is still an unsolved problem. The problem is considerably exacerbated when analyzing hands performing daily activities from a first-person viewpoint, due to severe occlusions arising from object manipulations and a limited field-of-view. Our system addresses these difficulties by exploiting strong priors over viewpoint and pose in a discriminative tracking-by-detection framework. Our priors are operationalized through a photorealistic synthetic model of egocentric scenes, which is used to generate training data for learning depth-based pose classifiers. We evaluate our approach on an annotated dataset of real egocentric object manipulation scenes and compare to both commercial and academic approaches. Our method provides state-of-the-art performance for both hand detection and pose estimation in egocentric RGB-D images.
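
The part I wanted to pin down for myself was what "learning depth-based pose classifiers" from synthetic training data could look like in practice. Below is a minimal sketch of that general recipe, not the authors' pipeline: I'm assuming a small quantized set of hand pose classes, fixed-size depth patches, and a generic random-forest classifier standing in for whatever classifier the paper actually uses, and the "synthetic data" here is placeholder noise rather than photorealistic egocentric renders.

```python
# Rough sketch: train a pose classifier on (fake) synthetic depth patches.
# Everything below (pose-class count, patch size, feature choice, classifier)
# is my own assumption for illustration, not taken from the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

N_POSE_CLASSES = 20   # assumed: discrete set of quantized hand poses
PATCH = 32            # assumed: depth patch size (pixels) centered on the hand

def make_synthetic_depth_patches(n_samples):
    """Placeholder for a synthetic renderer: in the real system, egocentric
    scenes would be rendered with a hand model and each patch labeled with
    its ground-truth pose class."""
    labels = rng.integers(0, N_POSE_CLASSES, size=n_samples)
    xs = np.linspace(0.0, 1.0, PATCH)
    # Give each fake pose class a different depth gradient across the patch,
    # so there is a spatial pattern for the classifier to learn.
    gradient = labels[:, None, None] * 0.05 * xs[None, None, :]
    patches = 0.5 + gradient + rng.normal(0.0, 0.05, size=(n_samples, PATCH, PATCH))
    return patches.astype(np.float32), labels

depth_patches, pose_labels = make_synthetic_depth_patches(2000)

# Simple depth features: flattened patch with the per-patch mean removed,
# so the classifier is roughly invariant to absolute distance from the camera.
features = depth_patches.reshape(len(depth_patches), -1)
features -= features.mean(axis=1, keepdims=True)

X_train, X_test, y_train, y_test = train_test_split(
    features, pose_labels, test_size=0.2, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

print("held-out accuracy on (fake) synthetic data:", clf.score(X_test, y_test))
```

In the paper's setting, the classifier trained on synthetic renders would then be applied to real egocentric RGB-D frames inside the tracking-by-detection loop; the sketch above only covers the training half of that idea.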