Characterizing Submanifold Region for Out-of-Distribution Detection

Li, X; Fang, Z; Zhang, Y; Ma, N; Bu, J; Han, B; Wang, H

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Transactions on Knowledge and Data Engineering, 2024, 37, (1), pp. 130-147
Issue Date: 2024-01-01

Closed Access

Filename	Description	Size
1757547.pdf	Published version	20.28 MB

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Li, X
dc.contributor.author	Fang, Z https://orcid.org/0000-0003-0602-6255
dc.contributor.author	Zhang, Y
dc.contributor.author	Ma, N
dc.contributor.author	Bu, J
dc.contributor.author	Han, B
dc.contributor.author	Wang, H
dc.date.accessioned	2025-01-03T02:15:22Z
dc.date.available	2025-01-03T02:15:22Z
dc.date.issued	2024-01-01
dc.identifier.citation	IEEE Transactions on Knowledge and Data Engineering, 2024, 37, (1), pp. 130-147
dc.identifier.issn	1041-4347
dc.identifier.issn	1558-2191
dc.identifier.uri	http://hdl.handle.net/10453/182898
dc.description.abstract	Detecting out-of-distribution (OOD) samples poses a significant safety challenge when deploying models in open-world scenarios. Recent work assumes that OOD and in-distribution (ID) samples exhibit a distribution discrepancy, which points to an encouraging direction: estimating uncertainty from embedding features or predicted outputs. Besides incorporating auxiliary outliers to shape the decision boundary, quantifying a 'meaningful distance' in embedding space as an uncertainty measurement is a promising strategy. However, these distance-based approaches overlook the data structure and rely heavily on the high-dimensional features learned by deep neural networks, which makes the distances unreliable due to the 'curse of dimensionality'. In this work, we propose a data structure-aware approach to mitigate the sensitivity of distances to the 'curse of dimensionality', in which high-dimensional features are mapped to the manifold of ID samples, leveraging the well-known manifold assumption. Specifically, we present a novel distance, termed tangent distance, which tackles the issue of generalizing the meaningfulness of distances to testing samples in order to detect OOD inputs. Our approach is inspired by manifold learning for adversarial examples, where the probability density of the adversarial region lies close to the direction orthogonal to the manifold, and by the observation that OOD and adversarial samples share a common characteristic: imperceptible perturbations with a shifted distribution. We therefore propose that OOD samples are relatively far away from the ID manifold. Tangent distance directly computes the Euclidean distance between a sample and the nearest submanifold space, instantiated as the linear approximation of a local region on the manifold. We provide empirical and theoretical insights to demonstrate the effectiveness of OOD uncertainty measurements on the low-dimensional subspace. Extensive experiments show that the tangent distance performs competitively with other post hoc OOD detection baselines on common and large-scale benchmarks, and the theoretical analysis supports our claim that ID samples are likely to reside in high-density regions, explaining the effectiveness of internal connections among ID data.
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	IEEE Transactions on Knowledge and Data Engineering
dc.relation.isbasedon	10.1109/TKDE.2024.3468629
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	08 Information and Computing Sciences
dc.subject.classification	Information Systems
dc.subject.classification	46 Information and computing sciences
dc.title	Characterizing Submanifold Region for Out-of-Distribution Detection
dc.type	Journal Article
utslib.citation.volume	37
utslib.for	08 Information and Computing Sciences
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/UTS Groups
pubs.organisational-group	University of Technology Sydney/UTS Groups/Australian Artificial Intelligence Institute (AAII)
utslib.copyright.status	closed_access	*
dc.date.updated	2025-01-03T02:15:07Z
pubs.issue	1
pubs.publication-status	Published
pubs.volume	37
utslib.citation.issue	1
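
Illustrative sketch (not part of the repository record): the abstract describes tangent distance as the Euclidean distance from a sample to the nearest submanifold space, taken as a linear approximation of a local region of the ID manifold. The Python snippet below is one possible reading of that idea, not the authors' implementation. It assumes the local region is the k nearest ID features and that its linear approximation comes from local PCA; the function name `tangent_distance` and the parameters `k` and `dim` are hypothetical.

```python
import numpy as np


def tangent_distance(x, id_features, k=10, dim=2):
    """Distance from x to a local affine approximation of the ID manifold.

    Assumption: the local region is the k nearest ID features and its linear
    approximation is the affine subspace spanned by their top `dim` principal
    directions. Both choices are illustrative, not taken from the paper.
    """
    # k nearest ID features under the plain Euclidean metric.
    dists = np.linalg.norm(id_features - x, axis=1)
    neighbors = id_features[np.argsort(dists)[:k]]

    # Local PCA: rows of vt are orthonormal principal directions.
    center = neighbors.mean(axis=0)
    _, _, vt = np.linalg.svd(neighbors - center, full_matrices=False)
    basis = vt[:dim]  # shape (dim, n_features)

    # Remove the component lying in the tangent subspace; the norm of the
    # residual is the Euclidean distance to the affine subspace.
    offset = x - center
    residual = offset - basis.T @ (basis @ offset)
    return float(np.linalg.norm(residual))


# Toy check: ID features lie on the plane z = 0 inside a 3-D feature space.
rng = np.random.default_rng(0)
id_features = np.column_stack([rng.uniform(-1, 1, (200, 2)), np.zeros(200)])

on_manifold = np.array([0.1, -0.2, 0.0])
off_manifold = np.array([0.1, -0.2, 0.5])

score_id = tangent_distance(on_manifold, id_features)
score_ood = tangent_distance(off_manifold, id_features)
```

In this toy setup the point on the plane scores close to 0 and the point lifted off the plane scores close to 0.5, so a larger score marks a sample as further from the ID manifold. With real network features, `k`, `dim`, and the detection threshold would have to be chosen on held-out ID data.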

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/182898