High-quality depth maps acquisition for RGB-D data

With the development of computer vision, high-quality depth map acquisition has become a pressing problem. Methods for dense depth map acquisition generally fall into two categories: passive and active. Passive methods based on stereo matching typically compute the matching cost volume pixel by pixel, which is time-consuming. This thesis first proposes a local depth estimation method that uses an adaptive matching scheme. Furthermore, affine-invariant features improve matching performance in textureless regions. Experimental results show that the proposed method achieves performance better than or comparable to that of the state-of-the-art local methods, with less running time. In addition, because the depth map is estimated frame by frame, temporal consistency cannot be guaranteed. This thesis proposes a method that enhances temporal consistency by applying adaptive temporal filtering, which explicitly considers the reliability of the depth values and the motion attributes of regions. Experiments demonstrate that the proposed algorithm generates more stable depth sequences and effectively suppresses transient depth errors when rendering virtual images. Because of the inherent drawbacks of stereo matching, depth maps captured by sensors are more robust, especially in textureless regions. However, such depth maps either suffer from low resolution or contain holes. Active methods aim to solve these problems. Because a low-quality depth map is usually captured together with a high-quality color or intensity image, and the two can be registered in the same coordinate system, the low-quality depth map can be refined using guidance from the high-quality color/intensity image. This type of active method is called guided depth map enhancement. For clarity of expression, this thesis uses "color image" to stand for "color/intensity image" in the rest of the thesis.
The intended meaning follows from the context. Methods for guided depth map enhancement can be classified into categories according to whether external training data is used. Methods that do not rely on external datasets explicitly exploit the co-occurrence of edges in the depth map and in the corresponding color image. However, because this co-occurrence assumption does not always hold, it leads to texture-copy artifacts and blurred depth edges. Markov-Random-Field-based (MRF-based) methods are popular in guided depth map enhancement. The state-of-the-art solutions adjust the affinities of the regularization term in the MRF energy function. However, these existing methods lack an explicit evaluation model that quantitatively measures the inconsistency between depth edges and the corresponding color edges, so they cannot adaptively control how strongly the color image guides depth enhancement. In addition, the widely used scheme for computing the affinities of the regularization term is based on depth and color differences between neighbouring pixels, which ignores the local structure of the depth map. In this thesis, three algorithms are proposed to address these problems. The first aims to mitigate artifacts caused by edge misalignment between the depth map and the color image via pixel-by-pixel hard-decision inconsistency checking. The second uses a structural, quantitative measurement of edge inconsistency, which is a soft-decision method and is more accurate than its hard-decision counterpart. The third combines this soft-decision edge inconsistency measurement with the local structure of the depth map, modeled with Minimum Spanning Trees (Forest), to obtain a more robust depth map. These methods are tested on the Middlebury, ToF-Mark and NYU datasets, and the results show progressive improvements.
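As a rough illustration of the color-difference affinity that the abstract describes as widely used (it is not one of the three algorithms proposed in the thesis), the sketch below fills holes in a depth map by minimizing a simple MRF-style energy with Jacobi iterations. The function names `shift` and `mrf_refine`, the parameters `lam` and `sigma_c`, the 4-neighbourhood, and the use of color differences alone (without the depth term) are all assumptions made for this example.

```python
import numpy as np

def shift(a, dy, dx):
    """Return b with b[y, x] = a[y + dy, x + dx], replicating border pixels."""
    p = np.pad(a, 1, mode="edge")
    h, w = a.shape
    return p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]

def mrf_refine(depth, valid, guide, lam=1.0, sigma_c=0.1, iters=200):
    """Approximately minimize
        E(D) = sum_{i valid} (D_i - d_i)^2 + lam * sum_{i~j} w_ij (D_i - D_j)^2,
    where w_ij = exp(-(g_i - g_j)^2 / (2 sigma_c^2)) comes from the guide image g.
    (Hypothetical baseline: color-only affinity, 4-connected neighbours.)"""
    guide = np.asarray(guide, dtype=float)
    valid = np.asarray(valid, dtype=bool)
    depth = np.where(valid, np.asarray(depth, dtype=float), 0.0)
    data_w = valid.astype(float)
    # Start holes at the mean of the observed depth values.
    est = np.where(valid, depth, depth[valid].mean())
    offsets = ((0, 1), (0, -1), (1, 0), (-1, 0))
    # Affinities are small across strong color edges, close to 1 in flat regions.
    weights = [lam * np.exp(-(guide - shift(guide, dy, dx)) ** 2 / (2.0 * sigma_c ** 2))
               for dy, dx in offsets]
    for _ in range(iters):
        num = data_w * depth
        den = data_w.copy()
        for (dy, dx), w in zip(offsets, weights):
            num = num + w * shift(est, dy, dx)
            den = den + w
        est = num / den  # Jacobi update: weighted mean of data and neighbours
    return est

# Toy example: a vertical color edge aligned with a depth step, plus two holes.
guide = np.zeros((6, 6))
guide[:, 3:] = 1.0
depth = np.where(guide > 0.5, 2.0, 1.0)
valid = np.ones((6, 6), dtype=bool)
valid[2, 2] = False
valid[3, 3] = False
depth[~valid] = 0.0
refined = mrf_refine(depth, valid, guide)
# The hole left of the color edge is pulled toward 1.0, the one right of it toward 2.0.
```

Because the affinity depends only on the guide image, this baseline also copies color edges that have no depth counterpart, which is the texture-copy problem that the inconsistency measurements in the thesis are meant to address.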
In addition to handcrafted models for depth map enhancement, data-driven models are expected to learn such guidance implicitly and achieve superior performance. In this thesis, an end-to-end training method based on a convolutional neural network is proposed, which borrows concepts from existing models, e.g., batch normalization and residual learning. It upsamples the low-resolution depth map progressively, and a residual network is constructed to learn the high-frequency components at multiple scales. This coarse-to-fine scheme reconstructs the high-resolution depth map via multi-frequency synthesis. Experimental results show improvements in both subjective and objective evaluations compared with state-of-the-art methods.
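The coarse-to-fine idea can be sketched without a network. In the minimal example below, the per-scale high-frequency residuals are computed directly from a known high-resolution map and stand in for what a trained residual network would predict; the thesis's actual architecture is not reproduced here. The function names, the 2x average pooling, and the nearest-neighbour upsampling are assumptions chosen for illustration.

```python
import numpy as np

def downsample2(x):
    """2x average pooling (height and width must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """2x nearest-neighbour upsampling."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def build_residuals(hi_res, levels):
    """Return the coarsest map and the per-scale residuals, ordered coarse to fine.
    In a learned model these residuals would be predicted, not computed."""
    residuals = []
    cur = hi_res
    for _ in range(levels):
        low = downsample2(cur)
        residuals.append(cur - upsample2(low))  # high-frequency detail at this scale
        cur = low
    return cur, residuals[::-1]

def synthesize(low_res, residuals):
    """Progressively upsample and add the high-frequency residual at each scale."""
    est = low_res
    for r in residuals:
        est = upsample2(est) + r
    return est

hi = np.arange(64.0).reshape(8, 8)
low, res = build_residuals(hi, levels=2)
rec = synthesize(low, res)
```

With exact residuals the synthesis recovers the original map; a learned model would only approximate them, so the reconstruction quality depends on how well the residuals are predicted at each scale.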