Large-Scale Fuzzy Least Squares Twin SVMs for Class Imbalance Learning

Publisher:
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Publication Type:
Journal Article
Citation:
IEEE Transactions on Fuzzy Systems, 2022, 30, (11), pp. 4815-4827
Issue Date:
2022-11-01
Filename:
Large-Scale_Fuzzy_Least_Squares_Twin_SVMs_for_Class_Imbalance_Learning.pdf
Description:
Published version
Size:
1.9 MB (Adobe PDF)
Twin support vector machines (TSVMs) have been successfully employed for binary classification problems. With the advent of machine learning algorithms, data have proliferated, and there is a need to process large-scale data. TSVMs do not handle large-scale data well, for the following reasons: 1) solving the TSVM optimization problem requires computing large matrix inverses, which makes it an inefficient choice for large-scale problems; 2) the TSVM employs the empirical risk minimization principle and, hence, may suffer from overfitting; and 3) the Wolfe dual of the TSVM formulation involves positive-semidefinite matrices, and hence, singularity issues need to be resolved manually. To address these shortcomings, we propose in this article a novel large-scale fuzzy least squares TSVM for class imbalance learning (LS-FLSTSVM-CIL). We formulate the LS-FLSTSVM-CIL such that: 1) no matrix inversion is involved, which makes it an efficient choice for large-scale problems; 2) the structural risk minimization principle is implemented, which avoids overfitting issues and results in better performance; and 3) the Wolfe dual formulation of the proposed model involves positive-definite matrices. In addition, to address class imbalance, we assign fuzzy weights in the proposed LS-FLSTSVM-CIL so that the samples of the dominating class do not bias the classifier. To make the model more feasible for large-scale problems, we use an iterative procedure known as the sequential minimization principle to optimize the objective function of the proposed LS-FLSTSVM-CIL model. Experimental results show that the proposed LS-FLSTSVM-CIL outperforms the baseline classifiers.
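The distinction between positive-semidefinite and positive-definite matrices in point 3) can be illustrated with a short, generic calculation. This is only a sketch under assumed notation: the symbols H (an arbitrary real matrix), δ (a positive constant), and x (a nonzero vector) are not taken from the abstract, and the article's actual formulation may differ.

```latex
% Assumed notation (not from the abstract): H is any real matrix,
% \delta > 0 is a constant, and x is any nonzero vector.
\begin{align*}
x^{\top} H^{\top} H x &= \lVert Hx \rVert^{2} \ge 0
  && \text{(positive semidefinite; zero whenever } Hx = 0\text{)} \\
x^{\top}\left(H^{\top} H + \delta I\right) x &= \lVert Hx \rVert^{2} + \delta \lVert x \rVert^{2} > 0
  && \text{(positive definite for } x \neq 0\text{)}
\end{align*}
```

A matrix of the first kind can be singular, which is why such issues would have to be handled manually; a matrix of the second kind is always invertible.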
To demonstrate the feasibility of the proposed LS-FLSTSVM-CIL on large-scale classification problems, we evaluate the classification models on the large-scale normally distributed clustered (NDC) dataset. To demonstrate its practical applications, we evaluate the proposed model for the diagnosis of Alzheimer's disease and breast cancer. The evaluation on NDC datasets shows that the proposed LS-FLSTSVM-CIL is feasible for large-scale problems, as it is faster than the baseline classifiers.
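The fuzzy weighting mentioned in the abstract can be pictured with a small sketch. The abstract does not specify the membership function used in the article, so the rule below is an assumption: a common imbalance-ratio heuristic in which minority-class samples receive weight 1 and majority-class samples receive the ratio of minority to majority counts. The function name `imbalance_fuzzy_weights` is hypothetical.

```python
from collections import Counter


def imbalance_fuzzy_weights(labels):
    """Return one weight in (0, 1] per sample for a two-class problem.

    Assumed heuristic (not the article's membership function):
    minority-class samples get 1.0, majority-class samples get
    n_minority / n_majority, so the larger class is down-weighted.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    n_minority = min(counts.values())
    # For the minority class this ratio is 1.0; for the majority class it is < 1.
    return [n_minority / counts[y] for y in labels]


# One positive sample against four negative samples.
weights = imbalance_fuzzy_weights([1, -1, -1, -1, -1])
print(weights)
```

Under this assumed rule, a balanced dataset yields weight 1.0 for every sample, so the weighting only takes effect when the classes differ in size.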