Statistical Methods for Out-of-distribution Detection

Zhao, Zhilin

Publication Type: Thesis
Issue Date: 2023

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhao, Zhilin
dc.date.accessioned	2023-09-06T03:35:04Z
dc.date.available	2023-09-06T03:35:04Z
dc.date.issued	2023
dc.identifier.uri	http://hdl.handle.net/10453/171946
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_US.UTF-8
dc.format	Thesis (PhD)
dc.language.iso	en_US	en_US.UTF-8
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/171946/1/thesis.pdf
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	© 2023 Zhilin Zhao
dc.rights	au.edu.uts.lib/cph
dc.title	Statistical Methods for Out-of-distribution Detection	en_US.UTF-8
dc.type	Thesis
utslib.copyright.status	open_access	*

Abstract:

For a network trained on in-distribution (ID) samples, test samples may be out-of-distribution (OOD), that is, drawn from distributions different from that of the ID samples. OOD detection therefore aims to identify OOD samples at test time. The main challenge is that a network can produce high-confidence predictions for OOD samples, which means that it cannot distinguish ID from OOD samples. The main causes of this high-confidence issue are the limited number of ID samples and the unavailability of OOD samples during training. One strategy for improving a network's detection performance is to make its outputs more sensitive to OOD samples, i.e., to have the network produce high-confidence predictions for ID samples and low-confidence predictions for OOD samples. Improving the OOD sensitivity of a network requires addressing a series of important problems and challenges: (1) Penalizing OOD samples that receive high-confidence predictions can improve OOD sensitivity; accordingly, how can OOD samples specific to a network be generated? (2) If some OOD samples are observed, how can they be included in the retraining process so as to balance ID generalization and OOD detection? (3) If OOD samples are unavailable, how can a network be fine-tuned with augmented ID samples to improve its OOD sensitivity? (4) If modifying the network is not allowed, how can an auxiliary network be learned to capture OOD-sensitive information for the network? This thesis systematically studies how to solve these problems effectively, with experimental and theoretical support. Because ID and OOD samples differ significantly, it is essential to consider the data characteristics and data correlations that statistical methods can model. Accordingly, this thesis incorporates statistical methods into deep neural networks to improve OOD sensitivity. Specifically, it proposes four novel methods to address these problems.
The main ideas are: inferring an implicit generator based on the Shannon entropy to generate high-confidence OOD samples; constructing adaptive supervision information for OOD samples to minimize the disruption to learning to classify ID samples; exploring the data space around ID samples to construct vicinity distributions for OOD samples; and using an auxiliary network to explore the OOD-sensitive information in ID samples that, according to information bottleneck theory, is discarded.
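
To make the notions of prediction confidence and Shannon entropy used in the abstract concrete, the following is a minimal sketch of two common confidence-based OOD scores computed from a classifier's logits. It is not one of the four methods proposed in the thesis. The function names, the example logits, and the threshold of 0.5 are all assumptions chosen for illustration, and a rule like this only separates ID from OOD samples when the network is already OOD-sensitive.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_confidence(probs):
    """Confidence of a prediction: the largest class probability."""
    return max(probs)


def shannon_entropy(probs):
    """Shannon entropy in nats; higher means a less confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def flag_ood(logits, threshold=0.5):
    """Hypothetical decision rule: flag a sample as OOD when the
    maximum softmax probability falls below an assumed threshold."""
    return max_confidence(softmax(logits)) < threshold


# Illustrative logits (made up): a peaked prediction and a flat one.
peaked = softmax([8.0, 0.5, 0.2])
flat = softmax([1.0, 1.0, 1.0])

print(max_confidence(peaked), shannon_entropy(peaked))
print(max_confidence(flat), shannon_entropy(flat))
print(flag_ood([8.0, 0.5, 0.2]), flag_ood([1.0, 1.0, 1.0]))
```

A peaked output has high confidence and low entropy, while a flat output has low confidence and entropy close to the maximum, log K for K classes. The high-confidence issue described above corresponds to a network producing peaked outputs for OOD inputs, in which case a threshold on either score fails.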

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/171946