Analysis of high-dimensional genomic data using MapReduce based probabilistic neural network.

Baliarsingh, SK; Vipsita, S; Gandomi, AH; Panda, A; Bakshi, S; Ramasubbareddy, S

Publisher: ELSEVIER IRELAND LTD
Publication Type: Journal Article
Citation: Computer methods and programs in biomedicine, 2020, 195, pp. 105625
Issue Date: 2020-10

Closed Access

	Filename	Description	Size	Format
	1-s2.0-S0169260720314589-main.pdf	Published version	1.06 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Baliarsingh, SK
dc.contributor.author	Vipsita, S
dc.contributor.author	Gandomi, AH
dc.contributor.author	Panda, A
dc.contributor.author	Bakshi, S
dc.contributor.author	Ramasubbareddy, S
dc.date.accessioned	2020-10-31T20:17:45Z
dc.date.available	2020-06-19
dc.date.available	2020-10-31T20:17:45Z
dc.date.issued	2020-10
dc.identifier.citation	Computer methods and programs in biomedicine, 2020, 195, pp. 105625
dc.identifier.issn	0169-2607
dc.identifier.issn	1872-7565
dc.identifier.uri	http://hdl.handle.net/10453/143647
dc.description.abstract	BACKGROUND: The size of genomics data has been growing rapidly over the last decade. However, conventional data analysis techniques are incapable of processing this huge amount of data. For the efficient processing of high-dimensional datasets, it is essential to develop new parallel methods. METHODS: In this work, a novel distributed method is presented using a Map-Reduce (MR)-based approach. The proposed algorithm consists of MR-based Fisher score (mrFScore), MR-based ReliefF (mrReliefF), and MR-based probabilistic neural network (mrPNN) using a weighted chaotic grey wolf optimization technique (WCGWO). Here, the mrFScore and mrReliefF methods are introduced for feature selection (FS), and mrPNN is implemented as an effective method for microarray classification. The choice of smoothing parameter (σ) plays a major role in the prediction ability of the PNN; this is addressed using a novel technique, namely WCGWO, which is used to select the optimal value of σ in the PNN. RESULTS: These algorithms have been successfully implemented using the Hadoop framework. The proposed model is tested using three large microarray datasets and one small microarray dataset, and a comparative analysis is carried out against existing FS and classification techniques. The results suggest that WCGWO-mrPNN can outperform other methods for high-dimensional microarray classification. CONCLUSION: The effectiveness of the proposed methods is compared with that of other existing schemes. Experimental results reveal that the proposed scheme is accurate and robust. Hence, the suggested scheme is considered to be a reliable framework for microarray data analysis. SIGNIFICANCE: Such a method promotes the application of parallel programming using a Hadoop cluster for the analysis of large-scale genomics data, particularly when the dataset is of high dimension.
dc.format	Print-Electronic
dc.language	eng
dc.publisher	ELSEVIER IRELAND LTD
dc.relation.ispartof	Computer methods and programs in biomedicine
dc.relation.isbasedon	10.1016/j.cmpb.2020.105625
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	0801 Artificial Intelligence and Image Processing, 0903 Biomedical Engineering, 0906 Electrical and Electronic Engineering
dc.subject.classification	Medical Informatics
dc.title	Analysis of high-dimensional genomic data using MapReduce based probabilistic neural network.
dc.type	Journal Article
utslib.citation.volume	195
utslib.location.activity	Ireland
utslib.for	0801 Artificial Intelligence and Image Processing
utslib.for	0903 Biomedical Engineering
utslib.for	0906 Electrical and Electronic Engineering
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/A/DRsch The Data Science Institute
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	closed_access	*
dc.date.updated	2020-10-31T20:17:29Z
pubs.publication-status	Published
pubs.volume	195

Abstract:

BACKGROUND: The size of genomics data has been growing rapidly over the last decade. However, conventional data analysis techniques are incapable of processing this huge amount of data. For the efficient processing of high-dimensional datasets, it is essential to develop new parallel methods. METHODS: In this work, a novel distributed method is presented using a Map-Reduce (MR)-based approach. The proposed algorithm consists of MR-based Fisher score (mrFScore), MR-based ReliefF (mrReliefF), and MR-based probabilistic neural network (mrPNN) using a weighted chaotic grey wolf optimization technique (WCGWO). Here, the mrFScore and mrReliefF methods are introduced for feature selection (FS), and mrPNN is implemented as an effective method for microarray classification. The choice of smoothing parameter (σ) plays a major role in the prediction ability of the PNN; this is addressed using a novel technique, namely WCGWO, which is used to select the optimal value of σ in the PNN. RESULTS: These algorithms have been successfully implemented using the Hadoop framework. The proposed model is tested using three large microarray datasets and one small microarray dataset, and a comparative analysis is carried out against existing FS and classification techniques. The results suggest that WCGWO-mrPNN can outperform other methods for high-dimensional microarray classification. CONCLUSION: The effectiveness of the proposed methods is compared with that of other existing schemes. Experimental results reveal that the proposed scheme is accurate and robust. Hence, the suggested scheme is considered to be a reliable framework for microarray data analysis. SIGNIFICANCE: Such a method promotes the application of parallel programming using a Hadoop cluster for the analysis of large-scale genomics data, particularly when the dataset is of high dimension.
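
The abstract does not reproduce the algorithms themselves, so the following is only an illustrative sketch of the general pattern it describes, not the authors' mrFScore or mrPNN code. It assumes the commonly used Fisher score definition (between-class scatter divided by within-class scatter) and a Gaussian-kernel PNN, and it uses in-memory Python lists as a stand-in for Hadoop partitions. All function names and the toy data are hypothetical, and the ReliefF step and the WCGWO search for σ are omitted; σ is simply passed in by hand.

```python
import math
from functools import reduce

def map_partition(rows):
    """Map step: per class, accumulate count, sum, and sum of squares per feature."""
    stats = {}
    for features, label in rows:
        zeros = [0.0] * len(features)
        n, s, sq = stats.get(label, (0, zeros, zeros))
        stats[label] = (
            n + 1,
            [a + x for a, x in zip(s, features)],
            [a + x * x for a, x in zip(sq, features)],
        )
    return stats

def reduce_stats(left, right):
    """Reduce step: merge the per-class statistics of two partitions."""
    merged = dict(left)
    for label, (n, s, sq) in right.items():
        if label in merged:
            m, t, tq = merged[label]
            merged[label] = (
                m + n,
                [a + b for a, b in zip(t, s)],
                [a + b for a, b in zip(tq, sq)],
            )
        else:
            merged[label] = (n, s, sq)
    return merged

def fisher_scores(stats):
    """Fisher score per feature from the merged statistics (assumed definition)."""
    total = sum(n for n, _, _ in stats.values())
    dims = len(next(iter(stats.values()))[1])
    scores = []
    for j in range(dims):
        overall_mean = sum(s[j] for _, s, _ in stats.values()) / total
        between = within = 0.0
        for n, s, sq in stats.values():
            mean = s[j] / n
            variance = sq[j] / n - mean * mean
            between += n * (mean - overall_mean) ** 2
            within += n * variance
        scores.append(between / (within + 1e-12))  # epsilon avoids division by zero
    return scores

def pnn_predict(train, x, sigma):
    """Gaussian-kernel PNN: pick the class with the largest mean kernel response."""
    sums, counts = {}, {}
    for features, label in train:
        d2 = sum((a - b) ** 2 for a, b in zip(features, x))
        sums[label] = sums.get(label, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[label] = counts.get(label, 0) + 1
    return max(sums, key=lambda c: sums[c] / counts[c])

# Hypothetical toy data: two partitions, two features, two classes.
partitions = [
    [([1.0, 5.0], "A"), ([1.2, 3.0], "A"), ([3.0, 4.8], "B")],
    [([0.8, 4.0], "A"), ([3.2, 3.1], "B"), ([2.9, 4.1], "B")],
]

stats = reduce(reduce_stats, map(map_partition, partitions))
scores = fisher_scores(stats)
best = max(range(len(scores)), key=lambda j: scores[j])

# Keep only the top-scoring feature, then classify a new sample with a fixed sigma.
train_1d = [([f[best]], label) for part in partitions for f, label in part]
print(scores, best, pnn_predict(train_1d, [1.1], sigma=0.5))
```

Because the map step emits only counts, sums, and sums of squares, partitions can be processed independently and merged in any order, which is what makes the score computation suitable for a MapReduce setting. In the sketch, σ controls how quickly a training sample's influence decays with distance; the abstract states that the authors tune this value with WCGWO rather than fixing it.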

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/143647