Stacking Ensemble Model for Liver Stiffness Classification with Imbalanced Data

Wang, Mingjian

Publication Type: Thesis
Issue Date: 2021

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Wang, Mingjian
dc.date.accessioned	2021-05-21T01:20:18Z
dc.date.available	2021-05-21T01:20:18Z
dc.date.issued	2021
dc.identifier.uri	http://hdl.handle.net/10453/149025
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_US.UTF-8
dc.format	Thesis (ME)
dc.language.iso	en_US	en_US.UTF-8
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/149025/2/02whole.pdf
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	au.edu.uts.lib/ppc
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Stacking Ensemble Model for Liver Stiffness Classification with Imbalanced Data	en_US.UTF-8
dc.type	Thesis
utslib.copyright.status	open_access	*

Abstract:

Liver cirrhosis is a serious threat to human health; once the liver reaches the final stage of cirrhosis, there is no cure. Detecting cirrhosis at an early stage is therefore one of the effective ways to reduce its mortality rate. Besides early detection, raising the rate of correct cirrhosis diagnoses also helps patients avoid delayed treatment. This thesis develops an automatic diagnosis approach that predicts doctors’ opinions for patients from the liver stiffness measurements of FibroScan tests. A model based on the stacking ensemble method is presented to build a classifier for an imbalanced liver stiffness measurement data set. The data set was collected from 13,418 Chinese patients who underwent FibroScan tests for liver cirrhosis. It records 30 sets of features together with the professional doctors’ opinions, written in Chinese. To convert the Chinese text into numeric labels, we applied the Jieba module in Python, a natural language processing tool for Chinese text segmentation, to create 6 classification labels, each representing one doctor’s opinion. Because the data set is highly imbalanced, sampling methods, both under-sampling and oversampling, are applied to address the imbalance. To identify the most suitable model for the classification, we studied 7 supervised learning algorithms: Logistic Regression (LR), Decision Tree (DT), Naive Bayes (NB), K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Random Forest (RF) and AdaBoost; we also evaluated stacking models built from these algorithms. The results show that the Synthetic Minority Oversampling Technique (SMOTE) was effective in handling the imbalanced liver data set, and that the best-fitting model was a stacking model with DT as the meta-classifier and four base classifiers (KNN, RF, DT, SVM).
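The oversampling step named in the abstract can be illustrated with a minimal sketch of the SMOTE idea: a synthetic minority-class sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. This is a simplified illustration only, not the thesis code. The function name `smote_oversample`, its parameters, and the toy feature values are all hypothetical, and the thesis may well have relied on an existing library implementation instead.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified SMOTE sketch (hypothetical helper, not the thesis code):
    pick a random minority sample, pick one of its k nearest minority
    neighbours, and place the new point on the segment joining the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a sample has at most len - 1 neighbours
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of all other minority samples, nearest (Euclidean) first.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two numeric features for four minority-class rows.
minority_rows = [[7.1, 250.0], [7.4, 262.0], [6.9, 241.0], [7.8, 270.0]]
new_rows = smote_oversample(minority_rows, n_new=6, k=2)
print(len(new_rows))  # prints 6
```

Because every synthetic row is an interpolation between two real minority rows, it always stays within the range of values already observed for that class. In a classification study such as this one, oversampling would normally be applied to the training split only, so that the test data keep the original class distribution.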

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/149025