Enabling fast prediction for ensemble models on data streams

Zhang, P; Li, J; Wang, P; Gao, BJ; Zhu, X; Guo, L

Publisher:: ACM
Publication Type:: Conference Proceeding
Citation:: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 2011, pp. 177 - 185
Issue Date:: 2011-01

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, P https://orcid.org/0000-0001-7973-2746	en_US
dc.contributor.author	Li, J	en_US
dc.contributor.author	Wang, P	en_US
dc.contributor.author	Gao, BJ	en_US
dc.contributor.author	Zhu, X	en_US
dc.contributor.author	Guo, L	en_US
dc.contributor.editor	NA	en_US
dc.date	2011-08-21	en_US
dc.date.issued	2011-01	en_US
dc.identifier.citation	Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 2011, pp. 177 - 185	en_US
dc.identifier.isbn	978-1-4503-0813-7	en_US
dc.identifier.uri	http://hdl.handle.net/10453/28932
dc.description.abstract	Ensemble learning has become a common tool for data stream classification, as it can handle large volumes of stream data as well as concept drift. Previous studies focus on building accurate prediction models from stream data. However, a linear scan of a large number of base classifiers in the ensemble during prediction incurs significant costs in response time, preventing ensemble learning from being practical for many real-world, time-critical data stream applications, such as Web traffic stream monitoring, spam detection, and intrusion detection. In these applications, data streams usually arrive at a speed of GB/second, and each stream record must be classified in a timely manner. To address this problem, we propose a novel Ensemble-tree (E-tree for short) indexing structure that organizes all base classifiers in an ensemble for fast prediction. On the one hand, E-trees treat ensembles as spatial databases and employ an R-tree-like, height-balanced structure to reduce the expected prediction time from linear to sub-linear complexity. On the other hand, E-trees can automatically update themselves by continuously integrating new classifiers and discarding outdated ones, adapting well to new trends and patterns in data streams. Experiments on both synthetic and real-world data streams demonstrate the performance of our approach.	en_US
dc.publisher	ACM	en_US
dc.relation.ispartof	Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining	en_US
dc.relation.ispartof	ACM SIGKDD international conference on Knowledge discovery and data mining	en_US
dc.relation.isbasedon	10.1145/2020408.2020442	en_US
dc.rights	© ACM 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 2011, pp. 177 - 185 http://doi.acm.org/10.1145/2020408.2020442	en_US
dc.title	Enabling fast prediction for ensemble models on data streams	en_US
dc.type	Conference Proceeding
utslib.location	USA	en_US
utslib.location.activity	San Diego, USA	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
dc.location.activity	San Diego, USA	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
utslib.copyright.status	closed_access
pubs.consider-herdc	false	en_US
pubs.place-of-publication	USA	en_US
pubs.start-date	2011-08-21	en_US
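
The abstract in the record above describes replacing a linear scan over every base classifier with an R-tree-like, height-balanced index that is updated as new classifiers arrive and outdated ones are discarded. The Python sketch below illustrates that general idea only; it is not the authors' E-tree. It assumes, for illustration, that each base classifier can be reduced to a weighted vote for one label inside an axis-aligned box of feature space. All names (Rule, ETreeSketch, born, max_entries, discard_older_than) are hypothetical, and the insertion rule (least box enlargement) and split rule (sort by centre along the widest axis, then halve) are simple textbook R-tree heuristics rather than the paper's own. A query descends only into nodes whose bounding box contains the record, so classifiers that cannot cover it are skipped; in the worst case every box contains the record and the cost is still linear.

```python
import math
from dataclasses import dataclass, field
from functools import reduce
from typing import Optional

def contains(box: tuple, x) -> bool:  # box = (low corner, high corner)
    return all(lo <= v <= hi for lo, v, hi in zip(box[0], x, box[1]))

def union(a: tuple, b: tuple) -> tuple:  # smallest box covering both a and b
    return tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1]))

def area(box: tuple) -> float:
    return math.prod(hi - lo for lo, hi in zip(*box))

@dataclass
class Rule:  # hypothetical base classifier: votes `label` with `weight` inside `box`
    box: tuple
    label: int
    weight: float
    born: int  # stream time step at which this classifier was trained

@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)  # Rules in a leaf, Nodes otherwise
    box: Optional[tuple] = None  # minimum bounding box of all entries

    def refresh(self) -> None:
        boxes = [e.box for e in self.entries]
        self.box = reduce(union, boxes) if boxes else None

class ETreeSketch:  # hypothetical height-balanced, R-tree-style index over classifiers
    def __init__(self, max_entries: int = 4):
        self.max_entries = max_entries  # node capacity before a split
        self.root = Node(leaf=True)

    def insert(self, rule: Rule) -> None:
        sibling = self._insert(self.root, rule)
        if sibling is not None:  # the root split, so the tree grows by one level
            self.root = Node(leaf=False, entries=[self.root, sibling])
            self.root.refresh()

    def _insert(self, node: Node, rule: Rule) -> Optional[Node]:
        if node.leaf:
            node.entries.append(rule)
        else:  # descend into the child whose box needs the least enlargement
            child = min(node.entries,
                        key=lambda c: area(union(c.box, rule.box)) - area(c.box))
            sibling = self._insert(child, rule)
            if sibling is not None:
                node.entries.append(sibling)
        if len(node.entries) > self.max_entries:
            return self._split(node)  # overflow: hand a new sibling to the parent
        node.refresh()
        return None

    def _split(self, node: Node) -> Node:
        # Sort entries by box centre along the node's widest axis, then halve.
        lo, hi = reduce(union, [e.box for e in node.entries])
        axis = max(range(len(lo)), key=lambda d: hi[d] - lo[d])
        node.entries.sort(key=lambda e: (e.box[0][axis] + e.box[1][axis]) / 2)
        half = len(node.entries) // 2
        sibling = Node(leaf=node.leaf, entries=node.entries[half:])
        node.entries = node.entries[:half]
        node.refresh()
        sibling.refresh()
        return sibling

    def votes(self, x) -> dict:
        totals, stack = {}, [self.root]
        while stack:
            node = stack.pop()
            if node.box is None or not contains(node.box, x):
                continue  # no classifier under this node can cover x
            for e in node.entries:
                if not node.leaf:
                    stack.append(e)
                elif contains(e.box, x):  # only classifiers covering x vote
                    totals[e.label] = totals.get(e.label, 0.0) + e.weight
        return totals

    def predict(self, x, default: int = 0) -> int:
        totals = self.votes(x)  # weighted vote; `default` if nothing covers x
        return max(totals, key=totals.get) if totals else default

    def discard_older_than(self, t: int) -> None:
        self._prune(self.root, t)  # drop classifiers trained before step t
        while not self.root.leaf and len(self.root.entries) <= 1:  # shorten the tree
            self.root = self.root.entries[0] if self.root.entries else Node(leaf=True)

    def _prune(self, node: Node, t: int) -> None:
        if node.leaf:
            node.entries = [r for r in node.entries if r.born >= t]
        else:  # prune children first, then drop any that became empty
            for child in node.entries:
                self._prune(child, t)
            node.entries = [c for c in node.entries if c.entries]
        node.refresh()
```

For example, with two hypothetical rules, Rule(((0.0, 0.0), (1.0, 1.0)), label=1, weight=0.9, born=0) and Rule(((0.5, 0.5), (2.0, 2.0)), label=0, weight=0.4, born=1), predict((0.7, 0.7)) returns 1 because both rules cover the point and label 1 carries more weight; after discard_older_than(1) removes the older rule, the same call returns 0. Deletion in this sketch only drops expired entries, removes emptied nodes, and shrinks bounding boxes. Unlike a full R-tree, it does not reinsert entries from underfull nodes, so all leaves stay at the same depth but the tree can become less compact over time.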

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/28932