Learning Object Detection with Weak Supervision

Publication Type: Thesis
Issue Date: 2023
Abstract:
Deep learning has achieved astonishing success in many computer vision applications. However, training deep models typically requires large-scale datasets with elaborate annotations, and collecting and annotating such datasets is laborious, especially for challenging vision tasks such as object detection. A promising way to reduce this cost is to train models with weak supervision, which offers a good trade-off between model performance and annotation efficiency. This thesis is dedicated to weakly supervised learning in two object-centered application scenarios: general object detection and RGB-D salient object detection.

The first task is to predict the category of an object and its location in a given image using only image-level weak supervision. A pyramidal multiple instance detection network is first introduced to reduce the exposure of locally discriminative proposal regions, alleviating the local-optimum issue that arises when training detectors with only image-level annotations. Beyond learning detectors from image-level supervision alone, two more practical scenarios in weakly supervised object detection are considered. Given a well-annotated object detection dataset, this thesis further investigates how to scale detectors to novel domains or categories using weak supervision. Concretely, a holistic and hierarchical feature alignment R-CNN is presented to perform coarse-to-fine alignment in step with the detection pipeline, effectively reducing the discrepancy between domains under weak supervision. A cyclic self-training framework with a proposal weight modulation module is then introduced to compensate for the missing instance-level supervision of novel classes and to adaptively adjust loss weights for the training samples.

The second task is to predict pixel-level masks for foreground objects in paired RGB-D inputs (i.e., images and depth maps) with scribble-based weak supervision. This thesis explores annotator-friendly scribble annotations for training models. A dual-modal edge-guided network and a prediction consistency training method are developed to take full advantage of the complementary information from the two modalities and to exploit the information residing in unlabeled pixels, respectively.

Extensive experiments are conducted and analyzed to evaluate the effectiveness of the proposed approaches under weak supervision. Competitive performance on commonly used benchmarks verifies their effectiveness and generality.