Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data

Li, Y; Li, J

Publication Type: Journal Article
Citation: BMC Genomics, 2012, 13
Issue Date: 2012-01-01

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Li, Y	en_US
dc.contributor.author	Li, J https://orcid.org/0000-0003-1833-7413	en_US
dc.date.issued	2012-01-01	en_US
dc.identifier.citation	BMC Genomics, 2012, 13	en_US
dc.identifier.uri	http://hdl.handle.net/10453/23088
dc.description.abstract	© 2012 Li et al. Background: High-throughput experiments have produced many genomic datasets and hundreds of candidate disease genes. To discover the real disease genes from a set of candidate genes, computational methods have been proposed that work on various types of genomic data sources. As a single source of genomic data is prone to bias, incompleteness and noise, integration of different genomic data sources is needed for reliable disease gene identification. Results: In contrast to the commonly adopted data integration approach, which integrates separate lists of candidate genes derived from each single data source, we merge various genomic networks into a multigraph, which can hold multiple edges between a pair of nodes. This novel approach provides a data platform with strong noise tolerance to prioritize the disease genes. A new form of random walk is then developed to work on multigraphs, using a modified step to calculate the transition matrix. Our method is further enhanced to deal with heterogeneous data types by allowing cross-walk between phenotype and gene networks. Compared on benchmark datasets, our method is shown to be more accurate than the state-of-the-art methods in disease gene identification. We also conducted a case study to identify disease genes for Insulin-Dependent Diabetes Mellitus. Some of the newly identified disease genes are supported by recently published literature. Conclusions: The proposed RWRM (Random Walk with Restart on Multigraphs) model and CHN (Complex Heterogeneous Network) model are effective in data integration for candidate gene prioritization.	en_US
dc.relation.ispartof	BMC Genomics	en_US
dc.relation.isbasedon	10.1186/1471-2164-13-S7-S27	en_US
dc.subject.classification	Bioinformatics	en_US
dc.subject.mesh	Humans	en_US
dc.subject.mesh	Diabetes Mellitus, Type 1	en_US
dc.subject.mesh	ROC Curve	en_US
dc.subject.mesh	Protein Interaction Mapping	en_US
dc.subject.mesh	Genomics	en_US
dc.subject.mesh	Phenotype	en_US
dc.subject.mesh	Algorithms	en_US
dc.subject.mesh	Databases, Factual	en_US
dc.title	Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data	en_US
dc.type	Journal Article
utslib.citation.volume	13	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	06 Biological Sciences	en_US
utslib.for	08 Information and Computing Sciences	en_US
utslib.for	11 Medical and Health Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	13	en_US

Abstract:

© 2012 Li et al.

Background: High-throughput experiments have produced many genomic datasets and hundreds of candidate disease genes. To discover the real disease genes from a set of candidate genes, computational methods have been proposed that work on various types of genomic data sources. As a single source of genomic data is prone to bias, incompleteness and noise, integration of different genomic data sources is needed for reliable disease gene identification.

Results: In contrast to the commonly adopted data integration approach, which integrates separate lists of candidate genes derived from each single data source, we merge various genomic networks into a multigraph, which can hold multiple edges between a pair of nodes. This novel approach provides a data platform with strong noise tolerance to prioritize the disease genes. A new form of random walk is then developed to work on multigraphs, using a modified step to calculate the transition matrix. Our method is further enhanced to deal with heterogeneous data types by allowing cross-walk between phenotype and gene networks. Compared on benchmark datasets, our method is shown to be more accurate than the state-of-the-art methods in disease gene identification. We also conducted a case study to identify disease genes for Insulin-Dependent Diabetes Mellitus. Some of the newly identified disease genes are supported by recently published literature.

Conclusions: The proposed RWRM (Random Walk with Restart on Multigraphs) model and CHN (Complex Heterogeneous Network) model are effective in data integration for candidate gene prioritization.
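To make the idea of a random walk with restart on a multigraph concrete, the following is a minimal sketch and not the authors' RWRM implementation. The abstract does not say how the modified transition matrix is computed, so the sketch assumes that parallel edges between two genes (one per data source) simply add up before column normalisation. The function names `build_transition` and `rwr`, the restart value 0.7, and the toy gene labels are all hypothetical.

```python
from collections import defaultdict


def build_transition(edges, nodes):
    """Column-normalised transition weights for an undirected multigraph.

    edges: iterable of (u, v) pairs; the same pair may appear several times,
    once per data source that supports the link.
    Assumption: parallel edges add up. The paper's modified step may differ.
    """
    weight = defaultdict(float)
    for u, v in edges:
        weight[(u, v)] += 1.0
        weight[(v, u)] += 1.0
    out = {n: 0.0 for n in nodes}
    for (u, _v), w in weight.items():
        out[u] += w
    # trans[v][u] is the probability of stepping from u to v.
    trans = {v: {} for v in nodes}
    for (u, v), w in weight.items():
        trans[v][u] = w / out[u]
    return trans


def rwr(trans, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until it stabilises."""
    nodes = list(trans)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {
            v: (1.0 - restart) * sum(w * p[u] for u, w in trans[v].items())
            + restart * p0[v]
            for v in nodes
        }
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy multigraph: A-B is supported by two hypothetical sources, so it
# appears twice; A-C and C-D are supported by one source each.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "B"), ("A", "C"), ("C", "D")]
seeds = {"A"}  # known disease gene used as the restart node

scores = rwr(build_transition(edges, nodes), seeds)
ranked = sorted((n for n in nodes if n not in seeds), key=scores.get, reverse=True)
print(ranked)  # ['B', 'C', 'D']
```

In this toy example gene B outranks gene C because the link A-B is supported by two sources, which is the kind of evidence pooling that a multigraph permits. The cross-walk between phenotype and gene networks used by the CHN model is not covered by this sketch.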

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/23088