Vote-Based LELC for Positive and Unlabeled Textual Data Streams

IEEE Computer Society Conference Publishing Services (CPS)
Publication Type: Conference Proceeding
2010 IEEE International Conference on Data Mining Workshops (ICDMW), 2010, pp. 951 - 958
In this paper, we extend the LELC (PU Learning by Extracting Likely Positive and Negative Micro-Clusters) method to cope with positive and unlabeled data streams. Our approach, called vote-based LELC, works in three steps. In the first step, we extract representative documents from the unlabeled data and assign a vote score to each document. The vote score reflects the degree to which an example belongs to its corresponding class. In the second step, the extracted representative examples, together with their vote scores, are incorporated into a learning phase to build an SVM-based classifier. In the third step, we propose using an ensemble classifier to cope with the concept drift that arises in textual data stream environments. The approach aims to improve the performance of LELC by letting examples contribute differently to the construction of the classifier according to their vote scores. Extensive experiments on textual data streams demonstrate that vote-based LELC outperforms the original LELC method.
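The second and third steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the vote score acts as a per-example weight on the hinge loss of a linear SVM trained by subgradient descent, and that the ensemble is a plain majority vote over one classifier per stream chunk. The abstract specifies none of these details. The first step (micro-cluster extraction) is omitted, and the feature vectors, vote scores, chunk layout and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]  # (weight vector, bias)


def train_weighted_linear_svm(
    X: List[Vector],
    y: List[int],
    votes: List[float],
    epochs: int = 200,
    lr: float = 0.1,
    reg: float = 0.01,
) -> Model:
    """Subgradient descent on a vote-weighted hinge loss (linear SVM).

    Assumption: the vote score of an example scales its loss term, so
    examples with higher scores pull the decision boundary harder.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi, vi in zip(X, y, votes):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 regularisation shrinks the weights on every step.
            w = [wj * (1.0 - lr * reg) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move towards the example,
                # scaled by its (hypothetical) vote score.
                w = [wj + lr * vi * yi * xj for wj, xj in zip(w, xi)]
                b += lr * vi * yi
    return w, b


def predict(model: Model, x: Vector) -> int:
    """Return +1 (positive) or -1 (negative) for one feature vector."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def ensemble_predict(models: List[Model], x: Vector) -> int:
    """Majority vote over the classifiers built from the stream chunks."""
    total = sum(predict(m, x) for m in models)
    return 1 if total >= 0 else -1


# Hypothetical stream of three chunks. Each chunk holds representative
# examples as (features, label, vote score); real inputs would be document
# vectors produced by the extraction step.
chunks = [
    [((2.0, 2.0), 1, 1.0), ((3.0, 3.0), 1, 0.8),
     ((-2.0, -2.0), -1, 1.0), ((-3.0, -1.0), -1, 0.6)],
    [((2.0, 1.0), 1, 1.0), ((3.0, 2.0), 1, 0.8),
     ((-1.0, -2.0), -1, 1.0), ((-2.0, -3.0), -1, 0.6)],
    [((1.0, 3.0), 1, 1.0), ((2.0, 2.0), 1, 0.7),
     ((-3.0, -2.0), -1, 1.0), ((-2.0, -1.0), -1, 0.5)],
]

# One classifier per chunk; the ensemble combines them at prediction time.
models = [
    train_weighted_linear_svm(
        [x for x, _, _ in chunk],
        [label for _, label, _ in chunk],
        [vote for _, _, vote in chunk],
    )
    for chunk in chunks
]

label_far_positive = ensemble_predict(models, (3.0, 3.0))
label_far_negative = ensemble_predict(models, (-3.0, -3.0))
```

In a streaming setting, one common way to handle concept drift is to drop the oldest classifier as each new chunk arrives, so that the vote tracks recent data. Whether the paper's ensemble works this way is not stated in the abstract.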