Duplicate-insensitive order statistics computation over data streams

Zhang, Y; Lin, X; Yuan, Y; Kitsuregawa, M; Zhou, X; Yu, JW


Publication Type: Journal Article
Citation: IEEE Transactions on Knowledge and Data Engineering, 2010, 22 (4), pp. 493 - 507
Issue Date: 2010-04-01

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, Y https://orcid.org/0000-0002-2674-1638	en_US
dc.contributor.author	Lin, X	en_US
dc.contributor.author	Yuan, Y	en_US
dc.contributor.author	Kitsuregawa, M	en_US
dc.contributor.author	Zhou, X	en_US
dc.contributor.author	Yu, JW https://orcid.org/0000-0002-9738-827X	en_US
dc.date.issued	2010-04-01	en_US
dc.identifier.citation	IEEE Transactions on Knowledge and Data Engineering, 2010, 22 (4), pp. 493 - 507	en_US
dc.identifier.issn	1041-4347	en_US
dc.identifier.uri	http://hdl.handle.net/10453/28921
dc.description.abstract	Duplicates in data streams may often be observed by the projection on a subspace and/or multiple recordings of objects. Without the uniqueness assumption on observed data elements, many conventional aggregates computation problems need to be further investigated due to their duplication-sensitive nature. In this paper, we present novel, space-efficient, one-scan algorithms to continuously maintain duplicate-insensitive order sketches so that rank-based queries can be approximately processed with a relative rank error guarantee ε in the presence of data duplicates. Besides the space efficiency, the proposed algorithms are time-efficient and highly accurate. Moreover, our techniques may be immediately applied to the heavy hitter problem against distinct elements and to the existing fault-tolerant distributed communication techniques. A comprehensive performance study demonstrates that our algorithms can support real-time computation against high-speed data streams. © 2010 IEEE.	en_US
dc.relation.ispartof	IEEE Transactions on Knowledge and Data Engineering	en_US
dc.relation.isbasedon	10.1109/TKDE.2009.68	en_US
dc.subject.classification	Information Systems	en_US
dc.title	Duplicate-insensitive order statistics computation over data streams	en_US
dc.type	Journal Article
utslib.citation.volume	4	en_US
utslib.citation.volume	22	en_US
utslib.for	0806 Information Systems	en_US
utslib.for	08 Information and Computing Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.issue	4	en_US
pubs.publication-status	Published	en_US
pubs.volume	22	en_US
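
The abstract above describes rank queries that must ignore repeated copies of the same element, so the rank of x counts distinct values no greater than x. The record does not describe the authors' algorithm, so the following is only a minimal illustrative sketch of the problem, not the paper's method. It uses a swapped-in, generic technique, a bottom-k ("k minimum values") hash sample, which is duplicate-insensitive because every copy of a value hashes identically. It carries no ε relative-rank guarantee. The names _hash01, exact_distinct_rank and BottomKRankSketch, and the choice of SHA-1 as the hash, are hypothetical assumptions for illustration.

```python
import hashlib
import heapq


def _hash01(value):
    # Hypothetical helper (assumption): map a value to a pseudo-uniform float
    # in [0, 1) using SHA-1. Every duplicate of a value gets the same hash.
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def exact_distinct_rank(stream, x):
    # Exact duplicate-insensitive rank: number of DISTINCT values <= x.
    return sum(1 for v in set(stream) if v <= x)


class BottomKRankSketch:
    """Hypothetical bottom-k (KMV) sample: keeps the k distinct values with
    the smallest hashes. Not the paper's method; no epsilon guarantee."""

    def __init__(self, k):
        self.k = k
        self._heap = []        # max-heap on hash via negation: (-hash, value)
        self._members = set()  # values currently held in the sample

    def add(self, value):
        if value in self._members:
            return  # repeated copy of a sampled value: no state change
        h = _hash01(value)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (-h, value))
            self._members.add(value)
        elif h < -self._heap[0][0]:
            # New hash beats the current k-th smallest: evict the largest.
            _, evicted = heapq.heapreplace(self._heap, (-h, value))
            self._members.discard(evicted)
            self._members.add(value)
        # Otherwise h is not among the k smallest hashes seen so far; a value
        # that was rejected or evicted earlier also lands here when it repeats.

    def sample(self):
        return sorted(self._members)

    def distinct_estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct values: exact
        kth_smallest_hash = -self._heap[0][0]
        return (self.k - 1) / kth_smallest_hash  # standard KMV estimator

    def rank_estimate(self, x):
        # Scale the fraction of sampled values <= x by the distinct count.
        if not self._heap:
            return 0.0
        below = sum(1 for v in self._members if v <= x)
        return below * self.distinct_estimate() / len(self._members)


if __name__ == "__main__":
    stream = [5, 3, 5, 9, 1, 3, 3, 7]
    sketch = BottomKRankSketch(k=16)
    for v in stream:
        sketch.add(v)
    print(exact_distinct_rank(stream, 5))  # 3 (distinct values 1, 3, 5)
    print(sketch.rank_estimate(5))         # 3.0 (fewer than k values: exact)
```

Because the sample is always the k smallest hashes among the distinct values seen, feeding the same values in any order and with any number of repeats yields the same sketch. With more than k distinct values, the rank becomes an estimate whose accuracy depends on k.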

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/28921