Learning for Object Localization with Imperfect Data

Zhang, Xiaolin

Publication Type: Thesis
Issue Date: 2021

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, Xiaolin
dc.date.accessioned	2021-10-05T03:20:34Z
dc.date.available	2021-10-05T03:20:34Z
dc.date.issued	2021
dc.identifier.uri	http://hdl.handle.net/10453/150838
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_US.UTF-8
dc.description.abstract	Deep learning has achieved remarkable successes in recent years. Training deep neural networks usually requires a large number of well-labeled examples, which are expensive to collect. A feasible way to reduce this cost is to learn from imperfect data, e.g., noisy data, synthetic data, weak labels, and datasets with few annotated examples. This thesis is dedicated to weakly supervised learning and few-shot learning. The first task is to address the challenging object localization problem using weak annotations as supervision. Objects in images are expected to be precisely located with only image-level labels, i.e., category information. However, convolutional networks tend to find only the most discriminative object regions, which leads to unsatisfactory bounding-box predictions. This thesis tackles this problem from three perspectives: 1) forcing the networks to mine more object areas by erasing the discovered object pixels; 2) learning pixel correlations within images under the supervision of self-produced object masks; 3) communicating across different images to obtain more consistent features and, therefore, activating target objects more accurately. The second task is to predict the semantic masks of objects in a few-shot setting. Finding every pixel of the target objects can also be considered the most fine-grained localization problem. In the few-shot regime, only a few annotated examples are available for an unseen class, and networks are required to determine the semantic category of each pixel with minimal information. This thesis presents two approaches to improve the quality of predicted object masks. Notably, a similarity-guided network is proposed to endow the segmentation process with rough position cues for locating the object pixels. To enhance the guidance process and improve robustness, we further enrich the guidance embeddings and propose to employ multiple diverse support vectors to generate the similarity maps. In addition, each of the proposed methods is comprehensively verified and analyzed through various experiments.	en_US.UTF-8
dc.format	Thesis (PhD)
dc.language.iso	en_US	en_US.UTF-8
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/150838/2/02whole.pdf
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	au.edu.uts.lib/ppc
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Learning for Object Localization with Imperfect Data	en_US.UTF-8
dc.type	Thesis
utslib.copyright.status	open_access	*

Abstract:

Deep learning has achieved remarkable successes in recent years. Training deep neural networks usually requires a large number of well-labeled examples, which are expensive to collect. A feasible way to reduce this cost is to learn from imperfect data, e.g., noisy data, synthetic data, weak labels, and datasets with few annotated examples. This thesis is dedicated to weakly supervised learning and few-shot learning. The first task is to address the challenging object localization problem using weak annotations as supervision. Objects in images are expected to be precisely located with only image-level labels, i.e., category information. However, convolutional networks tend to find only the most discriminative object regions, which leads to unsatisfactory bounding-box predictions. This thesis tackles this problem from three perspectives: 1) forcing the networks to mine more object areas by erasing the discovered object pixels; 2) learning pixel correlations within images under the supervision of self-produced object masks; 3) communicating across different images to obtain more consistent features and, therefore, activating target objects more accurately. The second task is to predict the semantic masks of objects in a few-shot setting. Finding every pixel of the target objects can also be considered the most fine-grained localization problem. In the few-shot regime, only a few annotated examples are available for an unseen class, and networks are required to determine the semantic category of each pixel with minimal information. This thesis presents two approaches to improve the quality of predicted object masks. Notably, a similarity-guided network is proposed to endow the segmentation process with rough position cues for locating the object pixels. To enhance the guidance process and improve robustness, we further enrich the guidance embeddings and propose to employ multiple diverse support vectors to generate the similarity maps. In addition, each of the proposed methods is comprehensively verified and analyzed through various experiments.
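
The similarity guidance mentioned in the abstract can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions, not the thesis implementation: it assumes the support vector is obtained by averaging support features inside the annotated mask, and that the position cue is a per-pixel cosine similarity between that vector and the query features. The function names, array shapes, and toy values are hypothetical.

```python
import numpy as np


def masked_average_pooling(features, mask):
    """Average (C, H, W) features over the pixels where the (H, W) mask is 1.

    Assumption: this is how a single support vector is formed here.
    """
    mask = mask.astype(features.dtype)
    total = (features * mask[None]).sum(axis=(1, 2))  # (C,)
    return total / max(mask.sum(), 1e-8)


def cosine_similarity_map(query_features, support_vector, eps=1e-8):
    """Cosine similarity between every query pixel feature and the support vector."""
    query_norm = np.linalg.norm(query_features, axis=0) + eps  # (H, W)
    support_norm = np.linalg.norm(support_vector) + eps
    # Contract the channel axis: (C,) with (C, H, W) gives (H, W).
    dots = np.tensordot(support_vector, query_features, axes=([0], [0]))
    return dots / (query_norm * support_norm)


# Toy data (hypothetical): 2 channels on a 2x2 grid.
# Channel 0 fires on object pixels, channel 1 on background pixels.
support_features = np.array([[[1.0, 1.0], [0.0, 0.0]],
                             [[0.0, 0.0], [1.0, 1.0]]])
support_mask = np.array([[1, 1], [0, 0]])
query_features = np.array([[[1.0, 0.0], [0.0, 1.0]],
                           [[0.0, 1.0], [1.0, 0.0]]])

support_vector = masked_average_pooling(support_features, support_mask)
similarity = cosine_similarity_map(query_features, support_vector)
print(similarity)
```

In this toy setup the similarity map is high on the query pixels whose features resemble the support object and near zero elsewhere, which is the kind of rough position cue a segmentation branch could consume. Using several support vectors, as the abstract proposes, would amount to calling `cosine_similarity_map` once per vector and stacking the resulting maps.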

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/150838