Data-efficient Visual Understanding via Deep Neural Networks

Publication Type:
Thesis
Issue Date:
2022
Abstract:
Despite their empirical successes in computer vision, deep neural networks often require large-scale annotated training datasets. When applied to complex visual understanding problems in the real world, their performance is limited, since both data and annotations can be notoriously costly to collect or may exist in various noisy or imperfect forms. Furthermore, data annotation in such applications is tedious to scale up: it often demands highly skilled professionals, which makes cost-effective solutions such as crowdsourcing difficult to apply. Even worse, additional annotated data is required whenever trained models must be adapted to dynamically changing environments. Both the academic and industrial communities are therefore calling for data-efficient deep learning algorithms. In this thesis, we address the grand challenge of data-efficient and label-efficient visual understanding in realistic, imperfect real-world environments. To this end, we investigate deep learning approaches that leverage low-quantity training data and low-quality, imperfect annotations. We propose a comprehensive suite of state-of-the-art approaches that tackle data-efficient visual understanding from three directions: (1) applying low-shot learning paradigms that are intrinsically data-efficient, e.g., few-shot learning and zero-shot learning; (2) exploiting imperfectly labeled data to enable learning with noise; and (3) transferring prior knowledge from data-abundant domains to data-hungry ones. To demonstrate the effectiveness and efficiency of the proposed approaches on representative computer vision applications, extensive experiments are conducted on several dense prediction tasks, e.g., human parsing, scene parsing, semantic segmentation, and face super-resolution.