Recovering dense 3D motion and shape information from RGB-D data

3D motion and 3D shape information are essential to many research fields, including computer vision, computer graphics, and augmented reality, which makes 3D motion estimation and 3D shape recovery important topics in these communities. RGB-D cameras have become more accessible in recent years and are popular for their mobility, low cost, and high frame rate. However, they produce low-resolution, low-accuracy depth images because of chip size limitations and ambient illumination perturbation, so obtaining high-resolution, high-accuracy 3D information from RGB-D data is an important task. This research investigates 3D motion estimation and 3D shape recovery solutions for RGB-D cameras. The thesis develops and presents methods that address three research challenges: fusing passive stereo vision with active depth acquisition; 3D motion estimation based on RGB-D data; and depth super-resolution based on RGB-D video with large displacement 3D motion.

In Chapter 3, a framework is presented that acquires depth images by fusing active depth acquisition and passive stereo vision. Each approach has its own limitations, but their range-sensing characteristics are complementary, so combining them can produce more accurate results than using either one alone. Unlike previous fusion methods, the noisy depth observation from active depth acquisition is initially taken as prior knowledge of the scene structure, which improves the accuracy of the fused depth images.

Chapter 4 details a method for 3D scene flow estimation based on RGB-D data. The accuracy of scene flow estimation is limited by two issues: occlusions and large displacement motions. To handle occlusions, the occlusion status is modelled, and the scene flow and occluded regions are jointly estimated. To deal with large displacement motions, an over-parameterised scene flow representation is employed to model both the rotation and translation components of the scene flow.

In Chapter 5, a depth super-resolution framework is presented for RGB-D video sequences with large 3D motion. To handle large 3D motion, the framework has two stages: motion compensation and fusion. A superpixel-based motion estimation approach is proposed for efficient motion compensation. The fusion task is modelled as a regression problem, and a dedicated deep convolutional neural network (CNN) is designed to learn the mapping function between depth image observations and the fused depth image, given a large amount of training data.
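
The idea of describing scene flow through rotation and translation components, as summarised for Chapter 4, can be illustrated with a minimal sketch. It assumes a common rigid-motion parameterisation in which a 3D point X moves to R X + t, so its scene flow is R X + t - X, with the rotation given as an axis-angle vector. The function names `rotation_from_axis_angle` and `scene_flow_from_rigid` and the axis-angle choice are illustrative assumptions; the abstract does not state the exact parameterisation used in the thesis.

```python
import numpy as np


def rotation_from_axis_angle(omega):
    """Convert an axis-angle 3-vector to a 3x3 rotation matrix (Rodrigues' formula)."""
    omega = np.asarray(omega, dtype=float)
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        # No rotation: return the identity.
        return np.eye(3)
    k = omega / theta
    # Skew-symmetric cross-product matrix of the unit rotation axis.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def scene_flow_from_rigid(points, omega, t):
    """Scene flow of 3D points (N, 3) under an assumed rigid motion: R X + t - X."""
    points = np.asarray(points, dtype=float)
    R = rotation_from_axis_angle(omega)
    # Row-vector convention: points @ R.T applies R to every point.
    return points @ R.T + np.asarray(t, dtype=float) - points


# Hypothetical usage: one point rotated 90 degrees about the z-axis, then shifted in z.
flow = scene_flow_from_rigid([[1.0, 0.0, 0.0]],
                             omega=[0.0, 0.0, np.pi / 2],
                             t=[0.0, 0.0, 0.5])
```

Under this assumed parameterisation, six constant parameters (three for rotation, three for translation) yield a flow vector that varies from point to point, which is one reason such representations are used when motions are large and include rotation.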