Logical query optimization for Cloudera Impala system

Publisher:
Elsevier
Publication Type:
Journal Article
Citation:
Journal of Systems and Software, 2017, 125, pp. 35-46
Issue Date:
2017-03-01
Cloudera Impala, an analytic database system for Apache Hadoop, has a severe limitation in query plan generation: it can only generate query plans in left-deep tree form, which restricts parallel execution. In this paper, we present a logical query optimization scheme for the Impala system. First, we propose an improved McCHyp (MinCutConservative Hypergraph) logical query plan generation algorithm for Impala, which reduces plan generation time by introducing a pruning strategy. Second, we propose a new cost model that takes the characteristics of the Impala system into account. Finally, we extend Impala to support query plans in bushy tree form by integrating the plan generation algorithm. We evaluated our scheme using the TPC-DS test suite. Experimental results show that the extended Impala system generally performs better than the original system, and that the improved plan generation algorithm takes less time to execute than McCHyp. In addition, our cost model is a better fit for the extended Impala system, which supports query plans in bushy tree form.
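
To illustrate why the shape of the plan tree matters for parallelism, here is a minimal sketch in Python. It is not the paper's algorithm or Impala's planner: the `Scan` and `Join` classes, the helper functions, and the choice of four tables are all hypothetical, and the balanced split ignores join predicates, which a real optimizer such as McCHyp must respect. It only compares the depth of a left-deep tree with that of a bushy tree over the same inputs, since depth bounds the longest chain of joins that must run one after another.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Scan:
    """Leaf node: a scan of one base table (hypothetical plan node)."""
    table: str


@dataclass(frozen=True)
class Join:
    """Inner node: a binary join of two sub-plans (hypothetical plan node)."""
    left: "Plan"
    right: "Plan"


Plan = Union[Scan, Join]


def left_deep(tables: Sequence[str]) -> Plan:
    """Build a left-deep tree: every join's right input is a base table."""
    plan: Plan = Scan(tables[0])
    for name in tables[1:]:
        plan = Join(plan, Scan(name))
    return plan


def bushy(tables: Sequence[str]) -> Plan:
    """Build a balanced bushy tree by splitting the table list in half.

    Assumption: any split is allowed. A real optimizer only considers
    splits that are connected by join predicates.
    """
    if len(tables) == 1:
        return Scan(tables[0])
    mid = len(tables) // 2
    return Join(bushy(tables[:mid]), bushy(tables[mid:]))


def depth(plan: Plan) -> int:
    """Number of joins on the longest root-to-leaf path."""
    if isinstance(plan, Scan):
        return 0
    return 1 + max(depth(plan.left), depth(plan.right))


# Table names are illustrative only.
tables = ["store_sales", "date_dim", "item", "customer"]
print("left-deep depth:", depth(left_deep(tables)))
print("bushy depth:", depth(bushy(tables)))
```

With four tables, the left-deep tree chains three joins in sequence, while the balanced bushy tree has depth two because its two lower joins are independent of each other and could, in principle, run concurrently.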