Loss Decomposition and Centroid Estimation for Positive and Unlabeled Learning.

Gong, C; Shi, H; Liu, T; Zhang, C; Yang, J; Tao, D

Publisher: Institute of Electrical and Electronics Engineers
Publication Type: Journal Article
Citation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43, (3), pp. 918-932
Issue Date: 2021

Closed Access

	Filename	Description	Size	Format
	Loss_Decomposition_and_Centroid_Estimation_for_Positive_and_Unlabeled_Learning.pdf	Published version	3.04 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Gong, C
dc.contributor.author	Shi, H
dc.contributor.author	Liu, T
dc.contributor.author	Zhang, C
dc.contributor.author	Yang, J
dc.contributor.author	Tao, D https://orcid.org/0000-0001-7225-5449
dc.date.accessioned	2022-05-08T01:46:54Z
dc.date.available	2022-05-08T01:46:54Z
dc.date.issued	2021
dc.identifier.citation	IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43, (3), pp. 918-932
dc.identifier.issn	0162-8828
dc.identifier.issn	1939-3539
dc.identifier.uri	http://hdl.handle.net/10453/157127
dc.format	Print-Electronic
dc.language	eng
dc.publisher	Institute of Electrical and Electronics Engineers
dc.relation	http://purl.org/au-research/grants/arc/DP180103424
dc.relation.ispartof	IEEE Transactions on Pattern Analysis and Machine Intelligence
dc.relation.isbasedon	10.1109/tpami.2019.2941684
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	0801 Artificial Intelligence and Image Processing, 0806 Information Systems, 0906 Electrical and Electronic Engineering
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	Loss Decomposition and Centroid Estimation for Positive and Unlabeled Learning.
dc.type	Journal Article
utslib.citation.volume	43
utslib.location.activity	United States
utslib.for	0801 Artificial Intelligence and Image Processing
utslib.for	0806 Information Systems
utslib.for	0906 Electrical and Electronic Engineering
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access	*
pubs.consider-herdc	true
dc.date.updated	2022-05-08T01:46:27Z
pubs.issue	3
pubs.publication-status	Published
pubs.volume	43
utslib.citation.issue	3

Abstract:

This paper studies Positive and Unlabeled learning (PU learning), whose goal is to build a binary classifier when only positive data and unlabeled data are available for training. To deal with the absence of negative training data, we first regard all unlabeled data as negative examples, some of which carry false negative labels, and then convert PU learning into a risk minimization problem in the presence of such one-sided label noise. Specifically, we propose a novel PU learning algorithm dubbed "Loss Decomposition and Centroid Estimation" (LDCE). By decomposing the loss function of corrupted negative examples into two parts, we show that only the second part is affected by the noisy labels. This allows us to estimate the centroid of the corrupted negative set in an unbiased way and thereby reduce the adverse impact of such label noise. Furthermore, we propose the "Kernelized LDCE" (KLDCE) by introducing the kernel trick, and show that KLDCE can be easily solved by combining Alternative Convex Search (ACS) and Sequential Minimal Optimization (SMO). Theoretically, we derive a generalization error bound which suggests that the generalization risk of our model converges to the empirical risk at the rate $O(1/\sqrt{k} + 1/\sqrt{n-k} + 1/\sqrt{n})$, where $n$ and $k$ are the numbers of training examples and positive examples, respectively. Experimentally, we conduct extensive experiments on a synthetic dataset, UCI benchmark datasets, and real-world datasets, and the results demonstrate that our approaches (LDCE and KLDCE) achieve top-level performance when compared with both classic and state-of-the-art PU learning methods.
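The abstract says that decomposing the loss isolates a single label-dependent part, which can then be corrected through a centroid estimate. The Python sketch below illustrates that general idea under assumptions the abstract does not specify: a linear scorer $f(x) = w^\top x$, the logistic loss (chosen only because it satisfies $\ell(z) - \ell(-z) = -z$, which gives $\ell(yz) = \tfrac12[\ell(z) + \ell(-z)] - \tfrac12 yz$), and a known class prior `prior` ($\pi$) of positives inside the unlabeled set. Under the usual mixture assumption $p_U = \pi p_+ + (1-\pi) p_-$, the centroid of the true negatives is $(\bar{x}_U - \pi\bar{x}_P)/(1-\pi)$. All function names are hypothetical, and this generic mean-operator construction is not necessarily the exact LDCE estimator or loss used in the paper.

```python
import numpy as np


def logistic_loss(z):
    """Logistic loss l(z) = log(1 + exp(-z)); it satisfies l(z) - l(-z) = -z."""
    return np.logaddexp(0.0, -z)


def label_free_term(w, X):
    """First part of the decomposition: averaged over all points, needs no labels."""
    scores = X @ w
    return np.mean(0.5 * (logistic_loss(scores) + logistic_loss(-scores)))


def risk_from_mean_operator(w, X, mu):
    """Empirical logistic risk; labels enter only through mu = mean(y_i * x_i)."""
    return label_free_term(w, X) - 0.5 * w @ mu


def negative_centroid_estimate(X_pos, X_unl, prior):
    """Estimate the centroid of the true negatives hidden in the unlabeled set.

    Assumes (hypothetically) p_U = prior * p_+ + (1 - prior) * p_- and that the
    labeled positives are a sample from p_+.
    """
    return (X_unl.mean(axis=0) - prior * X_pos.mean(axis=0)) / (1.0 - prior)


def pu_mean_operator(X_pos, X_unl, prior):
    """Plug-in estimate of mean(y_i * x_i) over P and U under the true labels."""
    k, m = len(X_pos), len(X_unl)
    mean_pos = X_pos.mean(axis=0)
    # Unlabeled positives contribute about m * prior * mean_pos (sign +1);
    # unlabeled negatives contribute about m * (1 - prior) * negative centroid (sign -1).
    neg_sum = m * (1.0 - prior) * negative_centroid_estimate(X_pos, X_unl, prior)
    total = k * mean_pos + m * prior * mean_pos - neg_sum
    return total / (k + m)


if __name__ == "__main__":
    # Toy data built so the unlabeled set is an exact 1/3 : 2/3 mixture.
    X_pos = np.array([[1.0, 1.0], [3.0, 3.0]])
    X_unl = np.array([[1.0, 1.0], [3.0, 3.0], [-2.0, 0.0],
                      [-4.0, 0.0], [-2.0, 0.0], [-4.0, 0.0]])
    y_unl_true = np.array([1.0, 1.0, -1.0, -1.0, -1.0, -1.0])  # hidden in practice

    X_all = np.vstack([X_pos, X_unl])
    y_all = np.concatenate([np.ones(len(X_pos)), y_unl_true])
    mu_true = np.mean(y_all[:, None] * X_all, axis=0)
    mu_hat = pu_mean_operator(X_pos, X_unl, prior=1 / 3)
    print(mu_true, mu_hat)  # both approximately [2.5, 1.0]

    w = np.array([0.5, -0.25])
    print(risk_from_mean_operator(w, X_all, mu_hat))
```

On real data the mixture identity holds only in expectation, so the centroid estimate is unbiased rather than exact; the toy data is constructed so that the estimated and true mean operators coincide.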

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/157127