Feature prioritisation on big genomic data for analysing gene-gene interactions

Aloqaily, AA; Tafavogh, S; Harvey, BL; Catchpoole, DR; Kennedy, PJ

Publisher: Inderscience Publishers
Publication Type: Journal Article
Citation: International Journal of Bioinformatics Research and Applications, 2021, 17, (2), pp. 158-177
Issue Date: 2021-01-01

Closed Access

Filename	Description	Size	Format
ijbra.2021.114420.pdf	Published version	1.65 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Aloqaily, AA
dc.contributor.author	Tafavogh, S
dc.contributor.author	Harvey, BL
dc.contributor.author	Catchpoole, DR
dc.contributor.author	Kennedy, PJ
dc.date.accessioned	2021-10-02T22:42:42Z
dc.date.available	2021-10-02T22:42:42Z
dc.date.issued	2021-01-01
dc.identifier.citation	International Journal of Bioinformatics Research and Applications, 2021, 17, (2), pp. 158-177
dc.identifier.issn	1744-5485
dc.identifier.issn	1744-5493
dc.identifier.uri	http://hdl.handle.net/10453/150804
dc.description.abstract	Complex diseases are not caused by single genes but result from intricate non-linear interactions among them. There is a critical need to implement approaches that take into account non-linear gene-gene interactions in searching for markers that jointly cause diseases. Determining the interaction between more than two single nucleotide polymorphisms (SNP) within the whole genome data is computationally expensive and often infeasible. In this paper, we develop an approach to classify patients with Acute Lymphoblastic Leukaemia by analysing multiple SNP interactions. A novel feature prioritisation algorithm called interaction effect quantity (IEQ) selects SNPs with high potential of interaction by analysing their distribution throughout the genomic data and enables deeper analysis of non-linear interactions within large datasets. We show that IEQ enables analyses of interactions between up to four SNPs, with F-measure for classification greater than 89% obtained. Such an analysis is typically much more computationally challenging if IEQ is not implemented.
dc.language	en
dc.publisher	Inderscience Publishers
dc.relation.ispartof	International Journal of Bioinformatics Research and Applications
dc.relation.isbasedon	10.1504/IJBRA.2021.114420
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	01 Mathematical Sciences, 06 Biological Sciences, 08 Information and Computing Sciences
dc.subject.classification	Bioinformatics
dc.title	Feature prioritisation on big genomic data for analysing gene-gene interactions
dc.type	Journal Article
utslib.citation.volume	17
utslib.for	01 Mathematical Sciences
utslib.for	06 Biological Sciences
utslib.for	08 Information and Computing Sciences
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Centre for Health Technologies (CHT)
utslib.copyright.status	closed_access	*
dc.date.updated	2021-10-02T22:42:40Z
pubs.issue	2
pubs.publication-status	Published
pubs.volume	17
utslib.citation.issue	2

Abstract:

Complex diseases are not caused by single genes but result from intricate non-linear interactions among them. There is a critical need for approaches that account for non-linear gene-gene interactions when searching for markers that jointly cause disease. Determining the interactions among more than two single nucleotide polymorphisms (SNPs) in whole-genome data is computationally expensive and often infeasible. In this paper, we develop an approach to classify patients with Acute Lymphoblastic Leukaemia by analysing multiple SNP interactions. A novel feature prioritisation algorithm called interaction effect quantity (IEQ) selects SNPs with high potential for interaction by analysing their distribution throughout the genomic data, enabling deeper analysis of non-linear interactions within large datasets. We show that IEQ enables analysis of interactions among up to four SNPs, achieving a classification F-measure greater than 89%. Without IEQ, such an analysis is typically much more computationally challenging.
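
To illustrate why the abstract calls exhaustive analysis of higher-order SNP interactions "often infeasible", the following minimal Python sketch counts the candidate SNP combinations at each interaction order, before and after a prioritisation step shrinks the pool. It does not implement IEQ, whose scoring is not described in this record. The pool sizes (500,000 SNPs genome-wide, 1,000 after prioritisation) and the function names are hypothetical, chosen only for illustration.

```python
from math import comb


def interaction_count(n_snps: int, order: int) -> int:
    """Number of distinct unordered SNP combinations of the given order."""
    return comb(n_snps, order)


def search_space(n_snps: int, max_order: int = 4) -> dict[int, int]:
    """Candidate combinations for every interaction order from 2 to max_order."""
    return {k: interaction_count(n_snps, k) for k in range(2, max_order + 1)}


if __name__ == "__main__":
    # Hypothetical pool sizes, not taken from the paper.
    genome_wide = search_space(500_000)
    prioritised = search_space(1_000)
    for order in genome_wide:
        print(
            f"order {order}: "
            f"{genome_wide[order]:.3e} combinations genome-wide, "
            f"{prioritised[order]:.3e} after prioritisation"
        )
```

Because the number of combinations grows roughly as the pool size raised to the interaction order, reducing the pool before enumeration is what makes three-way and four-way analysis tractable under these assumptions.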

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/150804