Hot topic extraction and public opinion classification of Tibetan texts

Publication Type:
Journal Article
Journal of Digital Information Management, 2016, 14 (3), pp. 160 - 167
Issue Date:
The growing volume of Tibetan-language information has made Tibetan text processing increasingly important. In this study, Tibetan hot topic extraction and public opinion classification were investigated to advance Tibetan information processing. Hot topic extraction proceeded in four steps. First, the Tibetan text was segmented into words. Second, feature selection based on term frequency and on document frequency was applied to reduce the feature dimensionality. Third, a vector space model was used to represent the texts. Finally, a statistical method was used to extract hot topics. For public opinion classification, 18 classes were selected by field, and domain experts constructed the keyword table of public opinion that the classification requires. Classification was then performed with the proposed similarity computation method. An application system was developed on the basis of the proposed approaches and used to carry out the experiments. The experiments show that the proposed method extracts topics effectively and classifies public opinion rapidly. This research is useful for text classification, information retrieval, and the construction of high-quality corpora.
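The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the segmenter, the frequency thresholds, the statistical hot-topic method, or the proposed similarity measure. The sketch therefore assumes already-segmented token lists, uses a hypothetical `min_df` threshold, ranks surviving terms by term frequency as a stand-in for topic extraction, and substitutes plain cosine similarity for the paper's similarity computation. All function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def select_features(docs, min_df=2, top_k=1000):
    """Keep terms found in at least min_df documents, ranked by total term frequency.

    docs is a list of token lists (word segmentation is assumed to be done already).
    """
    tf = Counter()  # total occurrences of each term
    df = Counter()  # number of documents containing each term
    for tokens in docs:
        tf.update(tokens)
        df.update(set(tokens))
    kept = [term for term in tf if df[term] >= min_df]
    kept.sort(key=lambda term: (-tf[term], term))
    return kept[:top_k]


def to_vector(tokens, vocabulary):
    """Vector space model: one term-count dimension per vocabulary entry."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity (an assumed stand-in for the paper's similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(tokens, keyword_table):
    """Return the class whose keyword list is most similar to the document."""
    vocabulary = sorted({k for kws in keyword_table.values() for k in kws})
    doc_vec = to_vector(tokens, vocabulary)
    scores = {
        label: cosine(doc_vec, to_vector(kws, vocabulary))
        for label, kws in keyword_table.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical, pre-segmented sample data (English placeholders for Tibetan words).
docs = [
    ["snow", "festival", "lhasa"],
    ["festival", "lhasa", "tourism"],
    ["school", "exam"],
]
keyword_table = {
    "education": ["school", "exam", "teacher"],
    "tourism": ["festival", "tourism", "hotel"],
}

features = select_features(docs, min_df=2)
label, scores = classify(["school", "exam", "teacher"], keyword_table)
print(features, label)
```

In this sketch the document-frequency filter removes terms confined to a single document before the term-frequency ranking is applied, which is one simple way to combine the two selection criteria the abstract mentions. A real keyword table would hold the 18 expert-defined classes rather than the two placeholder classes used here.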