Identification of lung cancer gene markers through kernel maximum mean discrepancy and information entropy

Zhao, Z; Peng, H; Zhang, X; Zheng, Y; Chen, F; Fang, L; Li, J

Publication Type: Journal Article
Citation: BMC Medical Genomics, 2019, 12
Issue Date: 2019-12-20

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhao, Z https://orcid.org/0000-0001-5544-4504	en_US
dc.contributor.author	Peng, H https://orcid.org/0000-0002-4379-8097	en_US
dc.contributor.author	Zhang, X https://orcid.org/0000-0002-3783-6560	en_US
dc.contributor.author	Zheng, Y	en_US
dc.contributor.author	Chen, F	en_US
dc.contributor.author	Fang, L	en_US
dc.contributor.author	Li, J https://orcid.org/0000-0003-1833-7413	en_US
dc.date.accessioned	2020-04-06T06:26:09Z
dc.date.available	2019-11-18	en_US
dc.date.available	2020-04-06T06:26:09Z
dc.date.issued	2019-12-20	en_US
dc.identifier.citation	BMC Medical Genomics, 2019, 12	en_US
dc.identifier.uri	http://hdl.handle.net/10453/139844
dc.relation.ispartof	BMC Medical Genomics	en_US
dc.relation.isbasedon	10.1186/s12920-019-0630-4	en_US
dc.subject.classification	Genetics & Heredity	en_US
dc.title	Identification of lung cancer gene markers through kernel maximum mean discrepancy and information entropy	en_US
dc.type	Journal Article
utslib.citation.volume	12	en_US
utslib.for	0604 Genetics	en_US
utslib.for	1101 Medical Biochemistry and Metabolomics	en_US
utslib.for	1112 Oncology and Carcinogenesis	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US
pubs.volume	12	en_US

Abstract:

© 2019 The Author(s).

Background: Early diagnosis of lung cancer has long been a critical problem in clinical practice, and identifying differentially expressed genes as disease markers is a promising solution. However, most existing gene differential expression analysis (DEA) methods have two main drawbacks. First, they are based on fixed statistical hypotheses and are not always effective. Second, they cannot identify a definite expression level boundary when there is no obvious expression level gap between the control and experiment groups.

Methods: This paper proposes a novel approach to identify marker genes and gene expression level boundaries for lung cancer. By calculating a kernel maximum mean discrepancy, our method can evaluate the expression differences between normal, normal adjacent to tumor (NAT), and tumor samples. For the potential marker genes, the expression level boundaries between the groups are defined with the information entropy method.

Results: Compared with two conventional methods, the t-test and fold change, the top average-ranked genes selected by our method achieve better performance under all metrics in 10-fold cross-validation. GO and KEGG enrichment analyses are then conducted to explore the biological functions of the top 100 ranked genes. Finally, we choose the top 10 average-ranked genes as lung cancer markers, and their expression boundaries are calculated and reported.

Conclusion: The proposed approach is effective for identifying gene markers for lung cancer diagnosis. It is not only more accurate than conventional DEA methods but also provides a reliable way to identify gene expression level boundaries.
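
The two ingredients named in the Methods can be illustrated with a small sketch for a single gene. This is not the authors' implementation: the abstract does not state the kernel, the bandwidth, the estimator, or the exact entropy criterion. The sketch assumes a Gaussian kernel with a median-heuristic bandwidth, the biased estimate of squared MMD, and a decision-tree-style split that minimises the weighted class entropy of the two sides. The function names `mmd2` and `entropy_boundary` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, sigma):
    """Gaussian kernel matrix between two 1-D expression vectors."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff ** 2 / (2.0 * sigma ** 2))


def mmd2(x, y, sigma=None):
    """Biased estimate of the squared kernel MMD between samples x and y.

    Assumption: if sigma is not given, it is set by the median heuristic
    on the pooled pairwise distances (falling back to 1.0 if all values
    are identical).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if sigma is None:
        pooled = np.concatenate([x, y])
        dists = np.abs(pooled[:, None] - pooled[None, :])
        positive = dists[dists > 0]
        sigma = float(np.median(positive)) if positive.size else 1.0
    k_xx = rbf_kernel(x, x, sigma).mean()
    k_yy = rbf_kernel(y, y, sigma).mean()
    k_xy = rbf_kernel(x, y, sigma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


def class_entropy(labels):
    """Shannon entropy (in bits) of non-negative integer class labels."""
    if labels.size == 0:
        return 0.0
    p = np.bincount(labels) / labels.size
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def entropy_boundary(values, labels):
    """Threshold that minimises the weighted class entropy of the two sides.

    Assumption: candidate thresholds are midpoints between consecutive
    distinct expression values. Returns (threshold, weighted_entropy);
    the threshold is None if all values are identical.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(values)
    v, lab = values[order], labels[order]
    best_t, best_h = None, np.inf
    for i in range(1, v.size):
        if v[i] == v[i - 1]:
            continue  # no threshold can separate equal values
        h = (i * class_entropy(lab[:i])
             + (v.size - i) * class_entropy(lab[i:])) / v.size
        if h < best_h:
            best_t, best_h = (v[i] + v[i - 1]) / 2.0, h
    return best_t, best_h
```

Under these assumptions, genes could be ranked by `mmd2` computed for each pair of groups (normal, NAT, tumor), and `entropy_boundary` could then be applied to the expression values of a top-ranked gene, with the labels encoding group membership as integers starting at 0.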

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/139844