Efficient Sensitivity Analysis for Inequality Queries in Probabilistic Databases

Publication Type:
Journal Article
IEEE Transactions on Knowledge and Data Engineering, 2017, 29 (1), pp. 86 - 99
Issue Date:
Full metadata record
Files in This Item:
Filename Description Size
07576647.pdfPublished Version654.15 kB
Adobe PDF
© 1989-2012 IEEE. In this paper, we study inequality query (IQ query) processing in tuple independent probabilistic databases, where IQ queries can be categorized into IQ-path, IQ-tree, and IQ-graph queries. We focus on two related issues for IQ queries. One issue is to efficiently compute their probabilities, with the observation that the time complexity of the state-of-the-art algorithm to process IQ-graph queries is high. The other issue is to efficiently perform their sensitivity analysis, which has not been studied before. Here, sensitivity analysis is to identify input tuples that have high influence on the probability of an answer tuple, and the influence of an input tuple is defined as the difference between the output probabilities obtained in two cases, where we assume that the tuple exists in one case and does not exist in the other one. In this paper, we compile the inequality conditions of an IQ query q into a compilation tree T, which encodes the Shannon expansion order. Moreover, we split q into a set of subqueries and each contains only one inequality condition. Using compilation tree and decomposition, we introduce a dynamic programming algorithm called Dec to process an IQ query q in time O(IΦI), where Φ is the lineage of q. An IQ query can be processed by our Dec if and only if its inequality conditions can be compiled into a compilation tree T and the inequality conditions from any node to all of its child nodes must be the same in T. We conduct extensive experiments using real and synthetic datasets to demonstrate the efficiency of our algorithm for computing the probabilities and influences of IQ queries.
Please use this identifier to cite or link to this item: