Positive-unlabeled learning in bioinformatics and computational biology: a brief review.

Li, F; Dong, S; Leier, A; Han, M; Guo, X; Xu, J; Wang, X; Pan, S; Jia, C; Zhang, Y; Webb, GI; Coin, LJM; Li, C; Song, J

Publisher: Oxford University Press (OUP)
Publication Type: Journal Article
Citation: Brief Bioinform, 2022, 23, (1), pp. bbab461
Issue Date: 2022-01-17

Closed Access

	Filename	Description	Size	Format
	Positive-unlabeled learning in bioinformatics and computational biology A brief review.pdf	Published version	603.19 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value
dc.contributor.author	Li, F
dc.contributor.author	Dong, S
dc.contributor.author	Leier, A
dc.contributor.author	Han, M
dc.contributor.author	Guo, X
dc.contributor.author	Xu, J
dc.contributor.author	Wang, X
dc.contributor.author	Pan, S https://orcid.org/0000-0003-0794-527X
dc.contributor.author	Jia, C
dc.contributor.author	Zhang, Y
dc.contributor.author	Webb, GI
dc.contributor.author	Coin, LJM
dc.contributor.author	Li, C
dc.contributor.author	Song, J
dc.date.accessioned	2023-04-18T04:40:46Z
dc.date.available	2021-10-07
dc.date.available	2023-04-18T04:40:46Z
dc.date.issued	2022-01-17
dc.identifier.citation	Brief Bioinform, 2022, 23, (1), pp. bbab461
dc.identifier.issn	1467-5463
dc.identifier.issn	1477-4054
dc.identifier.uri	http://hdl.handle.net/10453/169952
dc.description.abstract	Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and negative samples may be mislabeled due to the limited sensitivity of the experimental equipment. The positive-unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from a limited number of positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we review 29 state-of-the-art PU learning applications in bioinformatics that address a range of biological questions. We discuss several important aspects in detail, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on existing issues in PU learning and offer our perspectives on the future development of PU learning applications. We anticipate that this work will serve as a useful guide to understanding the PU learning framework in bioinformatics and to developing next-generation PU learning frameworks for critical biological applications.
dc.format	Print
dc.language	eng
dc.publisher	Oxford University Press (OUP)
dc.relation.ispartof	Brief Bioinform
dc.relation.isbasedon	10.1093/bib/bbab461
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	0601 Biochemistry and Cell Biology, 0802 Computation Theory and Mathematics, 0899 Other Information and Computing Sciences
dc.subject.classification	Bioinformatics
dc.subject.mesh	Algorithms
dc.subject.mesh	Computational Biology
dc.subject.mesh	Supervised Machine Learning
dc.title	Positive-unlabeled learning in bioinformatics and computational biology: a brief review.
dc.type	Journal Article
utslib.citation.volume	23
utslib.location.activity	England
utslib.for	0601 Biochemistry and Cell Biology
utslib.for	0802 Computation Theory and Mathematics
utslib.for	0899 Other Information and Computing Sciences
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access
dc.date.updated	2023-04-18T04:40:45Z
pubs.issue	1
pubs.publication-status	Published
pubs.volume	23
utslib.citation.issue	1

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/169952