GBK-means clustering algorithm: An improvement to the K-means algorithm based on the bargaining game

Publication Type:
Journal Article
Knowledge-Based Systems, 2021, 213
Issue Date:
Filename Description Size
1-s2.0-S0950705120308017-main.pdfPublished version2.19 MB
Adobe PDF
Full metadata record
Due to its simplicity, versatility and the diversity of applications to which it can be applied, k-means is one of the well-known algorithms for clustering data. The foundation of this algorithm is based on the distance measure. However, the traditional k-means has some weaknesses that appear in some data sets related to real applications, the most important of which is to consider only the distance criterion for clustering. Various studies have been conducted to address each of these weaknesses to achieve a balance between quality and efficiency. In this paper, a novel k-means variant of the original algorithm is proposed. This approach leverages the power of bargaining game modelling in the k-means algorithm for clustering data. In this novel setting, cluster centres compete with each other to attract the largest number of similar objectives or entities to their cluster. Thus, the centres keep changing their positions so that they have smaller distances with the maximum possible data than other cluster centres. We name this new algorithm the game-based k-means (GBK-means) algorithm. To show the superiority and efficiency of GBK-means over conventional clustering algorithms, namely, k-means and fuzzy k-means, we use the following syntactic and real-world data sets: (1) a series of two-dimensional syntactic data sets; and (2) ten benchmark data sets that are widely used in different clustering studies. The evaluation criteria show GBK-means is able to cluster data more accurately than classical algorithms based on eight evaluation metrics, namely F-measure, the Dunn index (DI), the rand index (RI), the Jaccard index (JI), normalized mutual information (NMI), normalized variation of information (NVI), the measure of concordance and error rate (ER).
Please use this identifier to cite or link to this item: