High-quality depth maps acquisition for RGB-D data

Zuo, Yifan

Publication Type: Thesis
Issue Date: 2018

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zuo, Yifan
dc.date.accessioned	2018-10-01T22:53:47Z
dc.date.available	2018-10-01T22:53:47Z
dc.date.issued	2018
dc.identifier.uri	http://hdl.handle.net/10453/127890
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_AU
dc.description.abstract	With the development of computer vision, high-quality depth map acquisition has become a pressing problem. Methods for dense depth map acquisition generally fall into two categories: passive and active. Passive methods based on stereo matching compute the matching cost volume pixel by pixel, which is time-consuming. This thesis first proposes a local depth estimation method that uses an adaptive matching scheme. Furthermore, with the help of an affine-invariant feature, matching performance in textureless regions is improved. Experimental results show that the proposed method achieves performance better than or comparable to state-of-the-art local methods, with less running time. In addition, because the depth map is estimated frame by frame, temporal consistency cannot be guaranteed. This thesis proposes a method that enhances temporal consistency by applying adaptive temporal filtering, which explicitly considers the reliability of the depth and the motion attributes of regions. Experiments demonstrate that the proposed algorithm generates more stable depth sequences and effectively suppresses transient depth errors when rendering virtual images. Because of the inherent drawbacks of stereo matching, depth maps captured by sensors are more robust, especially in textureless regions. However, such a depth map either suffers from low resolution or contains holes. Active methods aim to solve these problems. Because a low-quality depth map is typically captured together with a high-quality color or intensity image, and the two can be registered in the same coordinate system, the low-quality depth map can be refined using guidance from the high-quality color/intensity image. This type of active method is called guided depth map enhancement. For clarity, the rest of the thesis uses "color image" to stand for "color/intensity image"; the intended meaning follows from the context.
Methods for guided depth map enhancement can be classified according to whether external training data is used. Methods that do not rely on external datasets explicitly exploit the co-occurrence of edges in the depth map and the corresponding color image. However, because this co-occurrence assumption does not always hold, it leads to texture-copy artifacts and blurred depth edges. Markov-Random-Field-based (MRF-based) methods are popular in guided depth map enhancement. State-of-the-art solutions adjust the affinities of the regularization term in the MRF energy function. However, these existing methods lack an explicit evaluation model that quantitatively measures the inconsistency between depth edges and the corresponding color edges, so they cannot adaptively control the strength of the guidance from the color image during depth enhancement. In addition, the widely used scheme for computing the affinities of the regularization term is based on depth and color differences between neighbouring pixels, which ignores local structure in the depth map. This thesis proposes three algorithms to address these problems. The first mitigates artifacts caused by edge misalignment between the depth map and the color image through pixel-by-pixel hard-decision inconsistency checking. The second uses a structural, quantitative measurement of edge inconsistency, which is a soft-decision method and is more accurate than its hard-decision counterpart. The third combines this soft-decision edge inconsistency measurement with the local structure of the depth map, modeled on Minimum Spanning Trees (Forest), to obtain a more robust depth map. These methods are tested on the Middlebury, ToF-Mark and NYU datasets, and the results show progressive improvements.
In addition to handcrafted models for depth map enhancement, data-driven models are expected to learn such guidance implicitly and obtain superior performance. This thesis proposes an end-to-end training method based on a convolutional neural network, which borrows concepts from existing models, e.g., batch normalization and residual learning. It upsamples the low-resolution depth map progressively, and a residual network is constructed to learn the high-frequency components at multiple scales. This coarse-to-fine scheme reconstructs high-resolution depth via multi-frequency synthesis. Experimental results show improvements in both subjective and objective evaluation compared with state-of-the-art methods.	en_AU
dc.format	Thesis (PhD)
dc.language.iso	en_AU	en_AU
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/127890/2/02whole.pdf
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	au.edu.uts.lib/ppc
dc.subject	Dense depth map acquisition.	en_AU
dc.subject	Temporal filtering.	en_AU
dc.subject	Depth map enhancement.	en_AU
dc.title	High-quality depth maps acquisition for RGB-D data	en_AU
dc.type	Thesis	en_AU
utslib.copyright.status	open_access
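
Illustrative sketch (not taken from the thesis): the abstract describes guided depth map enhancement, in which a registered color image guides the refinement of a low-quality depth map. The Python example below shows that general idea with a plain joint bilateral filter, a standard technique swapped in here for illustration; it is not the MRF-based or Minimum-Spanning-Tree-based methods the thesis proposes. The function name, parameters and toy data are all assumptions. The filter relies on the same edge co-occurrence assumption that the abstract says does not always hold, so it would copy color texture into the depth map wherever color edges have no matching depth edge.

```python
import numpy as np

def joint_bilateral_filter(depth, guide, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Smooth `depth` with weights taken from a registered `guide` image.

    Hypothetical helper for illustration: each output pixel is a weighted
    mean of its neighbours, where the weight is high for neighbours that
    are spatially close AND look similar in the guide image.
    """
    depth = np.asarray(depth, dtype=np.float64)
    guide = np.asarray(guide, dtype=np.float64)
    h, w = depth.shape
    pad_d = np.pad(depth, radius, mode="edge")
    pad_g = np.pad(guide, radius, mode="edge")
    num = np.zeros_like(depth)
    den = np.zeros_like(depth)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ys, xs = radius + dy, radius + dx
            d_shift = pad_d[ys:ys + h, xs:xs + w]   # neighbour depth values
            g_shift = pad_g[ys:ys + h, xs:xs + w]   # neighbour guide values
            w_s = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            w_r = np.exp(-((guide - g_shift) ** 2) / (2.0 * sigma_r ** 2))
            num += w_s * w_r * d_shift
            den += w_s * w_r
    return num / den  # den >= 1 because the centre pixel has weight 1

# Toy data (assumed): a vertical edge shared by the guide and the depth map,
# plus one erroneous depth sample on the left side.
guide = np.zeros((6, 8))
guide[:, 4:] = 1.0
clean = np.where(guide > 0.5, 5.0, 1.0)
noisy = clean.copy()
noisy[2, 1] = 1.6
refined = joint_bilateral_filter(noisy, guide)
```

In this toy case the outlier is pulled toward its neighbours while the depth edge stays sharp, because the guide image has an edge at the same location.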

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/127890