Generative and Discriminative Learning for Visual Matching

Zheng, Zhedong

Publication Type: Thesis
Issue Date: 2021

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zheng, Zhedong
dc.date.accessioned	2021-10-05T02:32:27Z
dc.date.available	2021-10-05T02:32:27Z
dc.date.issued	2021
dc.identifier.uri	http://hdl.handle.net/10453/150837
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_US.UTF-8
dc.description.abstract	Visual matching aims to establish image correspondences across viewpoints. Given a query image, a visual matching system retrieves images containing the object of interest from non-overlapping viewpoints, ranked by similarity score. The task remains challenging because objects captured from different viewpoints often exhibit significant intra-class variations caused by background, viewpoint, object pose, etc. In this thesis, I present my research on combining generative learning with discriminative learning to build a robust visual matching system. First, because there is insufficient data to build robustness against input variations, generative learning lets the model potentially “see” these variations (particularly intra-class variations) during training. With recent progress in generative adversarial networks (GANs), generative models have become an appealing way to introduce additional augmented data for free. Second, discriminative learning formulates visual matching as a metric learning problem and adopts a discriminative optimization objective to learn the distance. These objectives motivate us to train a Convolutional Neural Network (CNN) to learn a mapping function that discriminates between different objects. In this thesis, I investigate two scientific problems in combining the two learning strategies: 1) How to obtain high-quality generated data for subsequent training? 2) How to leverage the generated data to promote discriminative learning? To study these two problems, I explore improving learned visual representations by better leveraging data in three respects. First, we present a semi-supervised pipeline that integrates GAN-generated images into discriminative learning. Second, we observe that generative pipelines are typically presented as standalone models, largely separate from the discriminative learning models. To get the best of both worlds, we propose a learning framework that couples discriminative and generative learning. Third, we investigate different discriminative learning approaches on various data sources. Specifically, we study the feasibility of borrowing knowledge from real-world vehicle images collected on the web and propose a two-stage learning strategy to minimize the domain gap between the web data and the real-world data. Furthermore, we explore the possibility of learning from synthetic data simulated by 3D engines. In summary, this thesis studies and addresses the critical challenges of data limitation and robust representation learning in visual matching. We show the benefits of leveraging both generative and discriminative learning in deep learning, which achieves better performance than previous methods.	en_US.UTF-8
dc.format	Thesis (PhD)
dc.language.iso	en_US	en_US.UTF-8
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/150837/2/02whole.pdf
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	au.edu.uts.lib/ppc
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Generative and Discriminative Learning for Visual Matching	en_US.UTF-8
dc.type	Thesis
utslib.copyright.status	open_access	*

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/150837