A new factor for computing the relevance of a document to a query
- Publication Type: Conference Proceeding
- 2010 IEEE World Congress on Computational Intelligence, WCCI 2010, 2010
In this paper we propose a method for semantic text representation and term weighting. It relies on a semantic resource, WordNet, which provides meaning information and relations between the terms of a document. The core of the method is the way the concepts (terms) of a document are clustered and weighted. More precisely, we introduce two notions: the "centrality" of a term and its "specificity". The centrality of a term is the number of terms of the document that are directly related to it within the same conceptual cluster. The specificity of a term is the depth of its concept in WordNet. These parameters differ from the usual term frequency (tf) and inverse document frequency (idf) used in classical information retrieval. The method has two steps: 1) matching document terms with WordNet concepts in order to select the most appropriate ones; 2) for each concept, computing its centrality from the existing WordNet semantic relations, and its specificity. Preliminary experiments on TREC collections show the value of these parameters.
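The two notions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the taxonomy `HYPERNYM` is a small hypothetical stand-in for WordNet, the rule in `directly_related` (parent/child or sibling concepts) is an assumed reading of "directly related", and the term-to-concept matching step is taken as already done. The abstract does not state how the two values are combined into a weight, so the sketch only computes them separately.

```python
# Hypothetical toy taxonomy standing in for WordNet: concept -> direct hypernym.
HYPERNYM = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "puppy": "dog",
    "artifact": "entity",
    "vehicle": "artifact",
    "car": "vehicle",
}


def specificity(concept):
    """Depth of the concept in the toy taxonomy (the root has depth 0)."""
    depth = 0
    while HYPERNYM[concept] is not None:
        concept = HYPERNYM[concept]
        depth += 1
    return depth


def directly_related(a, b):
    """Assumed relation: one concept is the other's direct hypernym,
    or both share the same direct hypernym (siblings)."""
    parent_a, parent_b = HYPERNYM[a], HYPERNYM[b]
    return parent_a == b or parent_b == a or (
        parent_a is not None and parent_a == parent_b
    )


def centrality(concepts):
    """For each document concept, count the other document concepts
    that are directly related to it."""
    unique = sorted(set(concepts))
    return {
        c: sum(1 for other in unique if other != c and directly_related(c, other))
        for c in unique
    }


# Concepts assumed to have been matched from a document's terms (step 1).
doc_concepts = ["dog", "cat", "puppy", "animal", "car"]

cent = centrality(doc_concepts)
spec = {c: specificity(c) for c in cent}

for c in cent:
    print(f"{c}: centrality={cent[c]}, specificity={spec[c]}")
```

In this toy document, "dog" is the most central concept (related to "animal", "cat" and "puppy"), while "car" is highly specific but has centrality 0 because no other document concept is related to it. With a real WordNet interface, the depth and relation lookups would replace the hand-written dictionary.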