Data-efficient Visual Understanding via Deep Neural Networks

Publication Type:
Thesis
Issue Date:
2022
Abstract:
Despite their empirical successes in computer vision, deep neural networks often require large-scale annotated training datasets. When applied to complex visual understanding problems in the real world, their performance is limited, since both data and annotations can be notoriously costly to collect or may exist in various noisy or imperfect forms. Furthermore, data annotation in such applications is tedious to scale up: it often demands highly skilled professionals, which makes cost-effective solutions such as crowdsourcing difficult to apply. Even worse, additional annotated data is required whenever trained models must be adapted to dynamically changing environments. Both the academic and industrial communities are therefore calling for data-efficient deep learning algorithms. In this thesis, we address the grand challenge of data-efficient and label-efficient visual understanding in realistic, imperfect real-world environments. To this end, we investigate deep learning approaches that leverage low-quantity training data and low-quality, imperfect annotations. We propose a comprehensive suite of state-of-the-art approaches that tackle data-efficient visual understanding from three directions: (1) applying low-shot learning paradigms that are intrinsically data-efficient, e.g., few-shot learning and zero-shot learning; (2) exploiting imperfectly labeled data to enable learning with noise; and (3) transferring prior knowledge from data-abundant domains to data-hungry ones. To demonstrate the effectiveness and efficiency of the proposed approaches on representative computer vision applications, extensive experiments are conducted on several dense prediction tasks, e.g., human parsing, scene parsing, semantic segmentation, and face super-resolution.