Duplicate-Insensitive Order Statistics Computation over Data Streams

Publication Type:
Journal Article
IEEE Transactions on Knowledge and Data Engineering, 2010, 22 (4), pp. 493-507
Duplicates in data streams often arise from projection onto a subspace and/or from multiple recordings of the same object. Without the assumption that observed data elements are unique, many conventional aggregate computation problems need further investigation because they are sensitive to duplication. In this paper, we present novel, space-efficient, one-scan algorithms that continuously maintain duplicate-insensitive order sketches, so that rank-based queries can be processed approximately with a relative rank error guarantee epsilon in the presence of duplicates. Besides being space-efficient, the proposed algorithms are time-efficient and highly accurate. Moreover, our techniques can be applied directly to the heavy hitter problem over distinct elements and to existing fault-tolerant distributed communication techniques. A comprehensive performance study demonstrates that our algorithms can support real-time computation over high-speed data streams.
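To make the notion of a duplicate-insensitive rank query concrete, the following minimal sketch computes exact answers by storing every distinct value. It is a naive baseline for illustration only, not the space-efficient sketch the paper proposes. The function names are hypothetical, and the error check assumes that "relative rank error epsilon" means a rank deviation of at most epsilon times the number of distinct elements.

```python
import bisect
import math


def distinct_rank_query(stream, phi):
    """Element at relative rank phi among the DISTINCT values (duplicates ignored)."""
    distinct = sorted(set(stream))
    if not distinct:
        raise ValueError("empty stream")
    target = max(1, math.ceil(phi * len(distinct)))  # 1-based target rank
    return distinct[target - 1]


def plain_rank_query(stream, phi):
    """Same query, but every repeated observation counts (duplicate-sensitive)."""
    ordered = sorted(stream)
    if not ordered:
        raise ValueError("empty stream")
    target = max(1, math.ceil(phi * len(ordered)))
    return ordered[target - 1]


def within_rank_error(stream, answer, phi, eps):
    """Check an answer against the ASSUMED guarantee |rank - target| <= eps * n,
    where n is the number of distinct values in the stream."""
    distinct = sorted(set(stream))
    n = len(distinct)
    target = max(1, math.ceil(phi * n))
    rank = bisect.bisect_right(distinct, answer)  # distinct values <= answer
    return abs(rank - target) <= eps * n


# The value 1 is recorded five times; the distinct values are 1, 3, 5, 7, 9.
stream = [5, 1, 1, 1, 1, 1, 9, 3, 7]
median_distinct = distinct_rank_query(stream, 0.5)
median_plain = plain_rank_query(stream, 0.5)
```

Here the duplicate-sensitive median is pulled toward the repeated value, while the duplicate-insensitive median reflects only which values occurred. Storing all distinct values costs memory linear in their number, which is the cost a sketch-based approach aims to avoid.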