Feature selection of imbalanced gene expression microarray data

Anaissi, A; Kennedy, PJ; Goyal, M


Publication Type: Conference Proceeding
Citation: Proceedings - 2011 12th ACIS International Conference on Software Engineering, Artificial Intelligence Networking and Parallel Distributed Computing, SNPD 2011, 2011, pp. 73-78
Issue Date: 2011-11-21

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Anaissi, A	en_US
dc.contributor.author	Kennedy, PJ https://orcid.org/0000-0001-7837-3171	en_US
dc.contributor.author	Goyal, M https://orcid.org/0000-0003-2853-9393	en_US
dc.date.issued	2011-11-21	en_US
dc.identifier.citation	Proceedings - 2011 12th ACIS International Conference on Software Engineering, Artificial Intelligence Networking and Parallel Distributed Computing, SNPD 2011, 2011, pp. 73 - 78	en_US
dc.identifier.isbn	9780769544755	en_US
dc.identifier.uri	http://hdl.handle.net/10453/19112
dc.description.abstract	Gene expression data sets are highly complex, characterised by a very large number of features but a small number of observations. However, only a small number of these features are relevant to an outcome of interest. With this kind of data set, feature selection becomes a real prerequisite. This paper proposes a methodology for feature selection on an imbalanced leukaemia gene expression data set based on the random forest algorithm. It presents the importance of feature selection in terms of reducing the number of features, enhancing the quality of machine learning and providing better understanding for biologists in diagnosis and prediction. Algorithms are presented to show the methodology and strategy for feature selection, taking care to avoid overfitting. Moreover, experiments are conducted using imbalanced leukaemia gene expression data, and a special measurement is used to evaluate the quality of feature selection and the performance of classification. © 2011 IEEE.	en_US
dc.relation.ispartof	Proceedings - 2011 12th ACIS International Conference on Software Engineering, Artificial Intelligence Networking and Parallel Distributed Computing, SNPD 2011	en_US
dc.relation.isbasedon	10.1109/SNPD.2011.12	en_US
dc.title	Feature selection of imbalanced gene expression microarray data	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
dc.location.activity	Sydney	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US

Abstract:

Gene expression data sets are highly complex, characterised by a very large number of features but a small number of observations. However, only a small number of these features are relevant to an outcome of interest. With this kind of data set, feature selection becomes a real prerequisite. This paper proposes a methodology for feature selection on an imbalanced leukaemia gene expression data set based on the random forest algorithm. It presents the importance of feature selection in terms of reducing the number of features, enhancing the quality of machine learning and providing better understanding for biologists in diagnosis and prediction. Algorithms are presented to show the methodology and strategy for feature selection, taking care to avoid overfitting. Moreover, experiments are conducted using imbalanced leukaemia gene expression data, and a special measurement is used to evaluate the quality of feature selection and the performance of classification. © 2011 IEEE.
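
The abstract does not name the "special measurement" it uses. As an illustration only, the sketch below assumes a geometric mean of sensitivity and specificity (G-mean), one common choice for imbalanced classes, and shows why plain accuracy can mislead on such data. The function name `imbalance_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
from math import sqrt


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity and G-mean for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty classes so the sketch never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": sqrt(sensitivity * specificity),
    }


# Hypothetical toy labels: 18 majority-class samples (0) and 2 minority (1).
y_true = [0] * 18 + [1] * 2

# A classifier that always predicts the majority class.
majority_only = [0] * 20

# A classifier that finds both minority samples but mislabels 3 majority ones.
minority_aware = [0] * 15 + [1] * 3 + [1] * 2

m_majority = imbalance_metrics(y_true, majority_only)
m_aware = imbalance_metrics(y_true, minority_aware)

print(m_majority)
print(m_aware)
```

On these toy labels the majority-only classifier has the higher accuracy, yet its G-mean is zero because it never detects the minority class. This is the kind of effect an imbalance-aware measure is meant to expose.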

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/19112