Mining data streams with labeled and unlabeled training examples

Zhang, P; Zhu, X; Guo, L

Publication Type: Conference Proceeding
Citation: Proceedings - IEEE International Conference on Data Mining, ICDM, 2009, pp. 627 - 636
Issue Date: 2009-12-01

Closed Access

Filename	Description	Size	Format
2009001668OK.pdf		887.17 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, P https://orcid.org/0000-0001-7973-2746	en_US
dc.contributor.author	Zhu, X	en_US
dc.contributor.author	Guo, L	en_US
dc.date.issued	2009-12-01	en_US
dc.identifier.citation	Proceedings - IEEE International Conference on Data Mining, ICDM, 2009, pp. 627 - 636	en_US
dc.identifier.isbn	9780769538952	en_US
dc.identifier.issn	1550-4786	en_US
dc.identifier.uri	http://hdl.handle.net/10453/10773
dc.description.abstract	In this paper, we propose a framework for building prediction models from data streams that contain both labeled and unlabeled examples. We argue that, because data collection capacity keeps growing while labeling resources remain limited, the stream data at hand may have only a small number of labeled examples, whereas a large portion of the data remains unlabeled but can still benefit learning. Unleashing the full potential of the unlabeled instances for stream data mining is, however, a significant challenge, considering that even fully labeled data streams may suffer from concept drift, and inappropriate use of the unlabeled samples may make the problem even worse. To build prediction models, we first divide the stream data into four categories, each corresponding to a situation in which concept drift may or may not exist in the labeled and unlabeled data. We then propose a relational k-means based transfer semi-supervised SVM learning framework (RK-TS3VM), which leverages labeled and unlabeled samples to build prediction models. Experimental results and comparisons on both synthetic and real-world data streams demonstrate that the proposed framework helps build prediction models that are more accurate than those offered by other simple approaches. © 2009 Crown Copyright.	en_US
dc.relation.ispartof	Proceedings - IEEE International Conference on Data Mining, ICDM	en_US
dc.relation.isbasedon	10.1109/ICDM.2009.76	en_US
dc.title	Mining data streams with labeled and unlabeled training examples	en_US
dc.type	Conference Proceeding
utslib.for	0806 Information Systems	en_US
dc.location.activity	Miami, Florida	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

In this paper, we propose a framework for building prediction models from data streams that contain both labeled and unlabeled examples. We argue that, because data collection capacity keeps growing while labeling resources remain limited, the stream data at hand may have only a small number of labeled examples, whereas a large portion of the data remains unlabeled but can still benefit learning. Unleashing the full potential of the unlabeled instances for stream data mining is, however, a significant challenge, considering that even fully labeled data streams may suffer from concept drift, and inappropriate use of the unlabeled samples may make the problem even worse. To build prediction models, we first divide the stream data into four categories, each corresponding to a situation in which concept drift may or may not exist in the labeled and unlabeled data. We then propose a relational k-means based transfer semi-supervised SVM learning framework (RK-TS3VM), which leverages labeled and unlabeled samples to build prediction models. Experimental results and comparisons on both synthetic and real-world data streams demonstrate that the proposed framework helps build prediction models that are more accurate than those offered by other simple approaches. © 2009 Crown Copyright.
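
The two ideas in the abstract, the four drift categories and the use of clustering to combine labeled and unlabeled examples, can be illustrated with a minimal sketch. This is not RK-TS3VM: the record does not describe the algorithm's details, so the sketch swaps in plain k-means followed by a majority vote of the labeled points in each cluster, with no relational clustering, no transfer step, and no SVM. All names (`DRIFT_CASES`, `kmeans`, `nearest`, `cluster_then_label`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical enumeration of the four cases mentioned in the abstract:
# concept drift may or may not exist in the labeled and in the unlabeled data.
DRIFT_CASES = [
    {"labeled_drifts": lab, "unlabeled_drifts": unlab}
    for lab, unlab in product((False, True), repeat=2)
]


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def cluster_then_label(labeled, unlabeled, k):
    """Cluster all points of one chunk, then label each unlabeled point with
    the majority label of the labeled points in its cluster."""
    points = [x for x, _ in labeled] + list(unlabeled)
    centroids = kmeans(points, k)
    votes = [Counter() for _ in range(k)]
    for x, y in labeled:
        votes[nearest(x, centroids)][y] += 1
    # Clusters without any labeled point fall back to the overall majority label.
    fallback = Counter(y for _, y in labeled).most_common(1)[0][0]
    cluster_label = [v.most_common(1)[0][0] if v else fallback for v in votes]
    return [cluster_label[nearest(x, centroids)] for x in unlabeled]


# Toy chunk (made-up data): two labeled points and four unlabeled points.
labeled_chunk = [((0.0, 0.0), "a"), ((5.0, 5.0), "b")]
unlabeled_chunk = [(0.2, 0.1), (4.8, 5.1), (0.1, 0.3), (5.2, 4.9)]
print(cluster_then_label(labeled_chunk, unlabeled_chunk, k=2))
# prints ['a', 'b', 'a', 'b']
```

In a streaming setting one would presumably apply such a step chunk by chunk and decide, per chunk, which of the four drift cases applies before reusing older data; how the paper makes that decision is not stated in this record.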

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/10773