An aggregate ensemble for mining concept drifting data streams with noise

Zhang, P; Zhu, X; Shi, Y; Wu, X

An aggregate ensemble for mining concept drifting data streams with noise

Zhang, P

Zhu, X Shi, Y Wu, X

Permalink

Publication Type:: Journal Article
Citation:: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009, 5476 LNAI pp. 1021 - 1029
Issue Date:: 2009-07-23

Closed Access

	Filename	Description	Size
	2013005162OK.pdf		445.53 kB	Adobe PDF	View/Open

Copyright Clearance Process

Recently Added
In Progress
Closed Access

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, P https://orcid.org/0000-0001-7973-2746	en_US
dc.contributor.author	Zhu, X	en_US
dc.contributor.author	Shi, Y	en_US
dc.contributor.author	Wu, X	en_US
dc.date.issued	2009-07-23	en_US
dc.identifier.citation	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009, 5476 LNAI pp. 1021 - 1029	en_US
dc.identifier.isbn	9783642013065	en_US
dc.identifier.issn	0302-9743	en_US
dc.identifier.uri	http://hdl.handle.net/10453/28911
dc.description.abstract	Recent years have witnessed a large body of research work on mining concept drifting data streams, where a primary assumption is that the up-to-date data chunk and the yet-to-come data chunk share identical distributions, so classifiers with good performance on the up-to-date chunk would also have a good prediction accuracy on the yet-to-come data chunk. This "stationary assumption", however, does not capture the concept drifting reality in data streams. More recently, a "learnable assumption" has been proposed and allows the distribution of each data chunk to evolve randomly. Although this assumption is capable of describing the concept drifting in data streams, it is still inadequate to represent realworld data streams which usually suffer from noisy data as well as the drifting concepts. In this paper, we propose a Realistic Assumption which asserts that the difficulties of mining data streams are mainly caused by both concept drifting and noisy data chunks. Consequently, we present a new Aggregate Ensemble (AE) framework, which trains base classifiers using different learning algorithms on different data chunks. All the base classifiers are then combined to form a classifier ensemble through model averaging. Experimental results on synthetic and real-world data show that AE is superior to other ensemble methods under our new realistic assumption for noisy data streams. © Springer-Verlag Berlin Heidelberg 2009.	en_US
dc.relation.ispartof	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en_US
dc.relation.isbasedon	10.1007/978-3-642-01307-2_109	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	An aggregate ensemble for mining concept drifting data streams with noise	en_US
dc.type	Journal Article
utslib.citation.volume	5476 LNAI	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	5476 LNAI	en_US

Abstract:

Recent years have witnessed a large body of research work on mining concept drifting data streams, where a primary assumption is that the up-to-date data chunk and the yet-to-come data chunk share identical distributions, so classifiers with good performance on the up-to-date chunk would also have a good prediction accuracy on the yet-to-come data chunk. This "stationary assumption", however, does not capture the concept drifting reality in data streams. More recently, a "learnable assumption" has been proposed and allows the distribution of each data chunk to evolve randomly. Although this assumption is capable of describing the concept drifting in data streams, it is still inadequate to represent realworld data streams which usually suffer from noisy data as well as the drifting concepts. In this paper, we propose a Realistic Assumption which asserts that the difficulties of mining data streams are mainly caused by both concept drifting and noisy data chunks. Consequently, we present a new Aggregate Ensemble (AE) framework, which trains base classifiers using different learning algorithms on different data chunks. All the base classifiers are then combined to form a classifier ensemble through model averaging. Experimental results on synthetic and real-world data show that AE is superior to other ensemble methods under our new realistic assumption for noisy data streams. © Springer-Verlag Berlin Heidelberg 2009.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/28911