Probabilistic Ranking over Relations

Publisher:
ACM
Publication Type:
Conference Proceeding
Citation:
Proceedings of the 13th International Conference on Extending Database Technology, 2010, pp. 477 - 488
Issue Date:
2010-01
Full metadata record
Files in This Item:
Filename Description Size
Thumbnail2013002381OK.pdf662.42 kB
Adobe PDF
Probabilistic top-k ranking queries have been extensively studied due to the fact that data obtained can be uncertain in many real applications. A probabilistic top-k ranking query ranks objects by the interplay of score and probability, with an implicit assumption that both scores based on which objects are ranked and probabilities of the existence of the objects are stored in the same relation. We observe that in general scores and probabilities are highly possible to be stored in different relations, for example, in column-oriented DBMSs and in data warehouses. In this paper we study probabilistic top-k ranking queries when scores and probabilities are stored in different relations. We focus on reducing the join cost in probabilistic top-k ranking. We investigate two probabilistic score functions, discuss the upper/lower bounds in random access and sequential access, and provide insights on the advantages and disadvantages of random/sequential access in terms of upper/lower bounds. We also propose random, sequential, and hybrid algorithms to conduct probabilistic top-k ranking. We conducted extensive performance studies using real and synthetic datasets, and report our findings in this paper.
Please use this identifier to cite or link to this item: