Enabling fast prediction for ensemble models on data streams

Publication Type:
Conference Proceeding
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 2011, pp. 177 - 185
Issue Date:
Filename Description Size
Thumbnail2013005164OK.pdf3.63 MB
Adobe PDF
Full metadata record
Ensemble learning has become a common tool for data stream classification, being able to handle large volumes of stream data and concept drifting. Previous studies focus on building accurate prediction models from stream data. However, a linear scan of a large number of base classifiers in the ensemble during prediction incurs significant costs in response time, preventing ensemble learning from being practical for many real world time-critical data stream applications, such as Web traffic stream monitoring, spam detection, and intrusion detection. In these applications, data streams usually arrive at a speed of GB/second, and it is necessary to classify each stream record in a timely manner. To address this problem, we propose a novel Ensemble-tree (E-tree for short) indexing structure to organize all base classifiers in an ensemble for fast prediction. On one hand, E-trees treat ensembles as spatial databases and employ an R-tree like height-balanced structure to reduce the expected prediction time from linear to sub-linear complexity. On the other hand, E-trees can automatically update themselves by continuously integrating new classifiers and discarding outdated ones, well adapting to new trends and patterns underneath data streams. Experiments on both synthetic and real-world data streams demonstrate the performance of our approach.
Please use this identifier to cite or link to this item: