Choosing LSI dimensions by document linear association analysis

Publication Type:
Conference Proceeding
Proceedings of the International Conference on Information and Knowledge Engineering, 2003, 2, pp. 615-621
Latent Semantic Indexing (LSI) has proven to be a valuable analysis tool with a wide range of applications. However, the crucial question of how to choose an appropriate number of dimensions for LSI remains unsolved. This paper describes a new method that addresses this problem. The method rests on the finding that, for a given dataset, the sum of the dot products between all document vectors reaches its maximum value at a specific number of dimensions, and that LSI achieves its best performance at this reduced number of dimensions. Performance evaluations demonstrate that the method can choose an appropriate number of dimensions for LSI and effectively detect the data structure of a dataset.
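The selection criterion described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how document vectors are scaled or whether self-products are counted, so this sketch assumes that document vectors in the rank-k space are the rows of V_k Σ_k, that they are length-normalized, and that only pairs of distinct documents are summed. The function names `dot_product_sums` and `choose_dimensions` are hypothetical.

```python
import numpy as np


def dot_product_sums(term_doc):
    """For each k = 1..rank, sum the dot products between all pairs of
    distinct document vectors in the rank-k LSI space.

    term_doc is a terms x documents matrix. Length-normalizing the
    document vectors is an assumption of this sketch, not a detail
    taken from the paper.
    """
    _, s, vt = np.linalg.svd(np.asarray(term_doc, dtype=float),
                             full_matrices=False)
    if s.size == 0 or s[0] == 0:
        raise ValueError("term-document matrix has no nonzero entries")
    rank = int(np.sum(s > 1e-10 * s[0]))  # drop numerically zero directions

    sums = []
    for k in range(1, rank + 1):
        docs = vt[:k].T * s[:k]  # one row per document, k columns
        norms = np.linalg.norm(docs, axis=1, keepdims=True)
        docs = docs / np.where(norms > 0, norms, 1.0)
        total = docs.sum(axis=0)
        # sum over i != j of d_i . d_j = ||sum_i d_i||^2 - sum_i ||d_i||^2
        sums.append(float(total @ total - np.sum(docs * docs)))
    return sums


def choose_dimensions(term_doc):
    """Return (k, sums), where k maximizes the pairwise dot-product sum."""
    sums = dot_product_sums(term_doc)
    return int(np.argmax(sums)) + 1, sums
```

Calling `choose_dimensions(A)` on a terms x documents matrix `A` returns the candidate number of dimensions together with the full curve of sums, which can be plotted against k to check whether the maximum is sharp or flat for a particular dataset.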