Enhancing image quality is a classical image processing problem that has received considerable attention over the past several decades. Many vision tasks expect a high-quality input image, so degradations such as noise, low resolution, and blur must be removed. While conventional techniques for this task have made great progress, deep models, the recent top performers, substantially outperform them. This success stems from the high representational capacity and strong nonlinearity of deep models. In this thesis, we explore the development of advanced deep models for image quality enhancement by investigating several fundamental issues, each with a different motivation.
In particular, we are first motivated by a pivotal property of the human perceptual system: similar visual cues can stimulate the same neuron and induce similar neurological signals. Image degradations, however, can cause similar local structures in an image to yield dissimilar observations. Because conventional neural networks do not take this property into account, we develop the (stacked) non-local auto-encoder, which exploits self-similar information in natural images to enhance the stability of signal propagation in the network. The expectation is that similar structures should induce similar network propagation. This is achieved by constraining the difference between the hidden representations of non-local similar image blocks during training. When applying the proposed model to image restoration, we further develop a “collaborative stabilisation” step to rectify forward propagation.
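As a minimal sketch of the constraint on hidden representations described above, the training objective can be read as a reconstruction term plus a penalty on the distance between the hidden codes of matched similar patches. The single-layer sigmoid encoder, the linear decoder, the weight `lam`, and the pre-computed `pairs` of similar patches are all assumptions made for illustration, not details taken from the thesis.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nonlocal_ae_loss(patches, pairs, W_enc, b_enc, W_dec, b_dec, lam=0.1):
    """Reconstruction loss plus a penalty on hidden-code differences
    between patches matched as non-local similar pairs (hypothetical form)."""
    hidden = sigmoid(patches @ W_enc + b_enc)   # (n_patches, n_hidden)
    recon = hidden @ W_dec + b_dec              # (n_patches, patch_dim)
    recon_loss = np.mean(np.sum((recon - patches) ** 2, axis=1))
    i, j = pairs[:, 0], pairs[:, 1]
    # Similar patches should produce similar hidden representations.
    stab_loss = np.mean(np.sum((hidden[i] - hidden[j]) ** 2, axis=1))
    return recon_loss + lam * stab_loss, recon_loss, stab_loss

rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 16))              # six flattened 4x4 patches
pairs = np.array([[0, 1], [2, 3], [4, 5]])      # assumed similar-patch matches
W_enc = rng.normal(scale=0.1, size=(16, 8))
b_enc = np.zeros(8)
W_dec = rng.normal(scale=0.1, size=(8, 16))
b_dec = np.zeros(16)

total, recon_loss, stab_loss = nonlocal_ae_loss(
    patches, pairs, W_enc, b_enc, W_dec, b_dec
)
```

The penalty vanishes when paired patches map to identical hidden codes, which is the sense in which similar structures are encouraged to induce similar propagation.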
When applying deep models to image quality enhancement tasks, a natural question is which factor, receptive field size or model depth, is more critical. To answer it, we focus on the single image super-resolution task and propose a strategy based on dilated convolution to investigate how the two factors affect performance. Findings from our exhaustive investigations suggest that single image super-resolution is more sensitive to changes in receptive field size than to variations in model depth, and that the model depth must be congruent with the receptive field size to improve performance. These findings inspire us to design a shallower architecture that saves computational and memory cost while remaining comparably effective to a much deeper architecture.
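To illustrate why dilated convolution lets receptive field size and depth be varied independently, the sketch below computes the receptive field of a stack of stride-1 convolutions, where each layer with kernel size k and dilation d adds (k - 1) * d pixels. The layer configurations are illustrative assumptions, not the architectures studied in the thesis.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (in pixels, along one axis) of stacked stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Six plain 3x3 layers versus three 3x3 layers with dilation 2.
deep_plain = receptive_field([3] * 6, [1] * 6)
shallow_dilated = receptive_field([3] * 3, [2] * 3)
print(deep_plain, shallow_dilated)
```

Both configurations cover the same 13-pixel extent, so a shallower dilated stack can match the receptive field of a network twice as deep.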
Finally, we study the general non-blind image deconvolution problem. We observe in practice that, with existing deconvolution techniques, the residual between the sharp image and the estimate depends strongly on both the sharp image and the noise. These techniques require a separate restoration model for each blur kernel and noise condition, which leads to low computational efficiency or highly redundant model parameters. For general-purpose use, we instead propose a very deep convolutional neural network that handles different kernels and noise conditions while remaining both effective and efficient. Rather than directly outputting the deconvolved result, the model predicts the residual between a pre-deconvolved image and the corresponding sharp image, which makes training easier and yields restored images with suppressed artifacts.
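The residual formulation can be sketched as follows. The pre-deconvolution step here is a simple regularised inverse filter in the Fourier domain, chosen only for illustration; the thesis does not specify this choice in the abstract. The sign convention (sharp minus pre-deconvolved), the regulariser `reg`, and the placeholder `predict_residual` standing in for the trained network are likewise assumptions.

```python
import numpy as np

def blur_fft(image, kernel_fft):
    """Circular convolution of an image with a kernel given by its FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * kernel_fft))

def pre_deconvolve(blurred, kernel_fft, reg=1e-2):
    """Regularised inverse filter (Wiener-style), an assumed pre-deconvolution."""
    num = np.conj(kernel_fft) * np.fft.fft2(blurred)
    den = np.abs(kernel_fft) ** 2 + reg
    return np.real(np.fft.ifft2(num / den))

def restore(pre_deconvolved, predict_residual):
    """Add the predicted residual back onto the pre-deconvolved image."""
    return pre_deconvolved + predict_residual(pre_deconvolved)

rng = np.random.default_rng(0)
sharp = rng.random((32, 32))
kernel = np.zeros((32, 32))
kernel[:3, :3] = 1.0 / 9.0                      # 3x3 box blur, zero-padded
kernel_fft = np.fft.fft2(kernel)

blurred = blur_fft(sharp, kernel_fft) + 0.01 * rng.normal(size=sharp.shape)
pre = pre_deconvolve(blurred, kernel_fft)

# Training target for the network under the assumed sign convention.
residual_target = sharp - pre

# An oracle predictor stands in for the trained network in this sketch.
restored = restore(pre, lambda x: residual_target)
```

Because the kernel only enters through the pre-deconvolution step, a single residual predictor can in principle be shared across kernels and noise conditions, which is the motivation stated above.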