Classification models combined with Boruta feature selection for heart disease prediction

Manikandan, G; Pragadeesh, B; Manojkumar, V; Karthikeyan, AL; Manikandan, R; Gandomi, AH

Publisher:: Elsevier
Publication Type:: Journal Article
Citation:: Informatics in Medicine Unlocked, 2024, 44, pp. 101442
Issue Date:: 2024-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Manikandan, G
dc.contributor.author	Pragadeesh, B
dc.contributor.author	Manojkumar, V
dc.contributor.author	Karthikeyan, AL
dc.contributor.author	Manikandan, R
dc.contributor.author	Gandomi, AH
dc.date.accessioned	2025-03-20T02:57:21Z
dc.date.available	2025-03-20T02:57:21Z
dc.date.issued	2024-01-01
dc.identifier.citation	Informatics in Medicine Unlocked, 2024, 44, pp. 101442
dc.identifier.issn	2352-9148
dc.identifier.uri	http://hdl.handle.net/10453/186034
dc.description.abstract	Cardiovascular disease (CVD), generally called heart illness, is a collective term for various ailments that affect the heart and blood vessels. Heart disease is a primary cause of fatality and morbidity in people worldwide, resulting in 18 million deaths per year. By identifying those who are most vulnerable to heart diseases and ensuring they receive the appropriate care, premature demise can be prevented. Machine learning algorithms are now crucial in the medical field, especially when using medical databases to diagnose diseases. Such efficient algorithms and data processing techniques are applied to predict various diseases and offer much potential for accurate heart disease prognosis. Therefore, this study compares the performance of logistic regression, decision tree, and support vector machine (SVM) methods with and without Boruta feature selection. The Cleveland Clinic Heart Disease Dataset acquired from Kaggle, which consists of 14 features and 303 instances, was used for the investigation. It was found that the Boruta feature selection algorithm, which selects six of the most relevant features, improved the results of the algorithms. Among these classification algorithms, logistic regression produced the most efficient result, with an accuracy of 88.52 %.
dc.language	en
dc.publisher	Elsevier
dc.relation.ispartof	Informatics in Medicine Unlocked
dc.relation.isbasedon	10.1016/j.imu.2023.101442
dc.rights	info:eu-repo/semantics/openAccess
dc.subject.classification	4203 Health services and systems
dc.title	Classification models combined with Boruta feature selection for heart disease prediction
dc.type	Journal Article
utslib.citation.volume	44
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/UTS Groups
pubs.organisational-group	University of Technology Sydney/UTS Groups/Data Science Institute (DSI)
utslib.copyright.status	open_access	*
dc.rights.license	This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/
dc.date.updated	2025-03-20T02:57:19Z
pubs.publication-status	Published
pubs.volume	44
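The abstract names Boruta and the three classifiers but not how they were configured. The following is a minimal sketch, not the authors' code, of the with/without comparison it describes, under stated assumptions: synthetic data from scikit-learn's make_classification stands in for the Kaggle Cleveland file (303 rows and 13 predictors, assuming the 14 features include the target column); the 80/20 stratified split, forest size, number of rounds and all hyperparameters are illustrative guesses; and the helper names boruta_select, binom_upper_tail and compare_models are hypothetical. The selection step is a simplified Boruta: each real feature competes against randomly permuted "shadow" copies in a random forest, and features that beat the best shadow significantly more often than chance (one-sided binomial test, Bonferroni-corrected) are kept. It omits the iterative removal of rejected features and the "tentative" category of the full algorithm.

```python
# Simplified Boruta-style selection plus a with/without-selection comparison.
# ASSUMPTION: synthetic data stands in for the Cleveland dataset.
import math

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def binom_upper_tail(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n


def boruta_select(X, y, n_iter=20, alpha=0.05, seed=0):
    """Return indices of features whose random-forest importance beats the
    best shadow feature significantly more often than chance.

    Simplified (hypothetical helper): fixed number of rounds, one
    Bonferroni-corrected binomial test at the end, no iterative removal.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for it in range(n_iter):
        # Shadow features: every column shuffled independently, which keeps
        # its distribution but destroys any relationship with y.
        shadow = np.column_stack([rng.permutation(col) for col in X.T])
        forest = RandomForestClassifier(n_estimators=100, random_state=seed + it)
        forest.fit(np.hstack([X, shadow]), y)
        importances = forest.feature_importances_
        # A "hit" means the real feature beat the strongest shadow this round.
        hits += importances[:n_features] > importances[n_features:].max()
    p_values = np.array([binom_upper_tail(int(h), n_iter) for h in hits])
    return np.flatnonzero(p_values < alpha / n_features)


def compare_models(seed=42):
    """Test-set accuracy of each classifier on all features vs selected ones."""
    # 303 rows and 13 predictors, mirroring the dataset's shape only.
    X, y = make_classification(n_samples=303, n_features=13, n_informative=5,
                               n_redundant=0, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    # Select features on the training split only, to avoid test-set leakage.
    selected = boruta_select(X_train, y_train)
    models = {
        "logistic regression": make_pipeline(StandardScaler(),
                                             LogisticRegression(max_iter=1000)),
        "decision tree": DecisionTreeClassifier(random_state=seed),
        "SVM": make_pipeline(StandardScaler(), SVC()),
    }
    results = {}
    for name, model in models.items():
        acc_all = model.fit(X_train, y_train).score(X_test, y_test)
        acc_sel = model.fit(X_train[:, selected], y_train).score(
            X_test[:, selected], y_test)
        results[name] = (acc_all, acc_sel)
    return selected, results


if __name__ == "__main__":
    selected, results = compare_models()
    print("selected feature indices:", selected.tolist())
    for name, (acc_all, acc_sel) in results.items():
        print(f"{name}: all={acc_all:.4f}  selected={acc_sel:.4f}")
```

Accuracies printed by this sketch come from synthetic data and say nothing about the 88.52 % reported for the real dataset. Running selection inside the training split is a deliberate choice: selecting features on all 303 rows before splitting would let test labels influence which features are kept. A more complete Boruta implementation is available in the third-party BorutaPy package.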

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/186034