An under-sampling method based on fuzzy logic for large imbalanced dataset

Wong, GY; Leung, FHF; Ling, SH


Publication Type: Conference Proceeding
Citation: IEEE International Conference on Fuzzy Systems, 2014, pp. 1248-1252
Issue Date: 2014-01-01


This item is open access.


Full metadata record

Field	Value	Language
dc.contributor.author	Wong, GY	en_US
dc.contributor.author	Leung, FHF	en_US
dc.contributor.author	Ling, SH https://orcid.org/0000-0003-0849-5098	en_US
dc.date.issued	2014-01-01	en_US
dc.identifier.citation	IEEE International Conference on Fuzzy Systems, 2014, pp. 1248 - 1252	en_US
dc.identifier.isbn	9781479920723	en_US
dc.identifier.issn	1098-7584	en_US
dc.identifier.uri	http://hdl.handle.net/10453/37516
dc.identifier.uri	http://hdl.handle.net/10453/33123
dc.description.abstract	© 2014 IEEE. Large imbalanced datasets have introduced difficulties to classification problems. They cause a high error rate of the minority class samples and a long training time of the classification model. Therefore, re-sampling and data size reduction have become important steps to pre-process the data. In this paper, a sampling strategy over a large imbalanced dataset is proposed, in which the samples of the larger class are selected based on fuzzy logic. To further reduce the data size, the evolutionary computational method of CHC is employed. The evaluation is done by applying a Support Vector Machine (SVM) to train a classification model from the re-sampled training sets. From experimental results, it can be seen that our proposed method improves both the F-measure and AUC. The complexity of the classification model is also compared. It is found that our proposed method is superior to all other compared methods.	en_US
dc.relation.ispartof	IEEE International Conference on Fuzzy Systems	en_US
dc.relation.isbasedon	10.1109/FUZZ-IEEE.2014.6891771	en_US
dc.title	An under-sampling method based on fuzzy logic for large imbalanced dataset	en_US
dc.type	Conference Proceeding
utslib.for	080108 Neural, Evolutionary and Fuzzy Computation	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Biomedical Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US

Abstract:

© 2014 IEEE. Large imbalanced datasets have introduced difficulties to classification problems. They cause a high error rate of the minority class samples and a long training time of the classification model. Therefore, re-sampling and data size reduction have become important steps to pre-process the data. In this paper, a sampling strategy over a large imbalanced dataset is proposed, in which the samples of the larger class are selected based on fuzzy logic. To further reduce the data size, the evolutionary computational method of CHC is employed. The evaluation is done by applying a Support Vector Machine (SVM) to train a classification model from the re-sampled training sets. From experimental results, it can be seen that our proposed method improves both the F-measure and AUC. The complexity of the classification model is also compared. It is found that our proposed method is superior to all other compared methods.
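The abstract reports results in terms of F-measure and AUC, not plain accuracy. As a rough illustration of why those two metrics suit imbalanced data, the sketch below computes them in plain Python using their standard definitions. It is not taken from the paper: the function names `f_measure` and `auc` and the toy labels are assumptions made for illustration only.

```python
def f_measure(y_true, y_pred, positive=1):
    """F-measure (F1) of the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correctly detected positive sample: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via the pairwise ranking definition.

    AUC equals the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one
    (ties count as one half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 95 majority samples (label 0), 5 minority samples (label 1).
labels = [0] * 95 + [1] * 5
always_majority = [0] * 100  # a classifier that ignores the minority class

accuracy = sum(t == p for t, p in zip(labels, always_majority)) / len(labels)
print("accuracy :", accuracy)
print("F-measure:", f_measure(labels, always_majority))
```

In this toy case a classifier that never predicts the minority class still scores high accuracy while its minority-class F-measure is zero, which is the kind of failure the abstract describes as a high error rate on minority class samples.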

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/33123