Extension of similarity measures in VSM: From orthogonal coordinate system to affine coordinate system

Publication Type:
Conference Proceeding
Proceedings of the International Joint Conference on Neural Networks, 2014, pp. 4084 - 4091
© 2014 IEEE. Similarity measures underpin many research areas, e.g. information retrieval, recommender systems and machine learning. Driven by these applications, a number of similarity measures have been proposed, and new ones continue to appear. Most state-of-the-art measures adopt the vector-based representation of the Vector Space Model (VSM), in which an object is represented as a vector composed of its features. The similarity between two objects is then evaluated by operations on the two corresponding vectors, such as cosine, extended Jaccard, extended Dice and so on. However, these measures assume that the features are independent of each other. This assumption is unrealistic: features are normally related, e.g. through the co-occurrence relations between keywords in text mining. In this paper, a space geometry-based method is proposed to extend the VSM from the orthogonal coordinate system (OVSM) to the affine coordinate system (AVSM), and OVSM is proved to be a special case of AVSM. The unit coordinate vectors of AVSM are inferred from the relations between features, which are treated as angles between these unit coordinate vectors. Finally, five different similarity measures are extended from OVSM to AVSM using the unit coordinate vectors of AVSM. Among the many application fields of similarity measures, text clustering is selected as the evaluation task. Documents are represented as vectors in OVSM and AVSM, respectively. The clustering results show that AVSM outperforms OVSM.
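To make the idea of non-orthogonal unit coordinate vectors concrete, the following is a minimal sketch of a cosine measure in an affine coordinate system. It assumes the standard geometric formulation in which the inner product of two coordinate vectors is taken through a Gram matrix whose entry (i, j) is the cosine of the angle between unit coordinate vectors i and j; the abstract does not give the paper's exact formulas, so this may differ in detail from the authors' definition. The function name `affine_cosine`, the parameter `gram` and the example values are hypothetical.

```python
import numpy as np


def affine_cosine(x, y, gram):
    """Cosine similarity of x and y under a Gram matrix of unit basis vectors.

    Assumption: gram[i][j] = cos(angle between unit coordinate vectors i and j),
    so gram has ones on its diagonal. With the identity matrix this reduces to
    the ordinary (orthogonal) cosine similarity.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    gram = np.asarray(gram, dtype=float)

    inner = x @ gram @ y              # inner product in affine coordinates
    norm_x = np.sqrt(x @ gram @ x)    # vector lengths under the same metric
    norm_y = np.sqrt(y @ gram @ y)
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0                    # convention chosen here for zero vectors
    return float(inner / (norm_x * norm_y))


# Two documents that share no keywords (illustrative data only).
doc_a = [1.0, 0.0]
doc_b = [0.0, 1.0]

# Orthogonal axes: features treated as independent.
orthogonal = np.eye(2)

# Affine axes: the two features are related, angle of 60 degrees (cos = 0.5).
related = np.array([[1.0, 0.5],
                    [0.5, 1.0]])

print(affine_cosine(doc_a, doc_b, orthogonal))  # 0.0
print(affine_cosine(doc_a, doc_b, related))     # 0.5
```

In this sketch, two documents with disjoint keywords score zero in the orthogonal setting but receive a non-zero similarity once the relation between the two features is encoded as an angle, which illustrates why the orthogonal case appears as the special case where the Gram matrix is the identity.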