Ensemble pruning via individual contribution ordering

Publication Type:
Conference Proceeding
Citation:
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010, pp. 871-880
Issue Date:
2010-09-07
An ensemble is a set of learned models that make decisions collectively. Although an ensemble is usually more accurate than a single learner, existing ensemble methods tend to construct unnecessarily large ensembles, which increases memory consumption and computational cost. Ensemble pruning tackles this problem by selecting a subset of ensemble members to form subensembles that consume fewer resources and respond faster, with accuracy similar to or better than that of the original ensemble. In this paper, we analyze the accuracy/diversity trade-off and prove that classifiers that are more accurate and make more predictions in the minority group are more important for subensemble construction. Based on these insights, we propose a heuristic metric that considers both accuracy and diversity to explicitly evaluate each individual classifier's contribution to the whole ensemble. By incorporating ensemble members in decreasing order of their contributions, we form subensembles such that users can select the top p percent of ensemble members for prediction, depending on their available resources and tolerable waiting time. Experimental results on 26 UCI data sets show that subensembles formed by the proposed EPIC (Ensemble Pruning via Individual Contribution ordering) algorithm outperform both the original ensemble and a state-of-the-art ensemble pruning method, Orientation Ordering (OO) [16]. © 2010 ACM.
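The abstract does not give the exact EPIC contribution metric, so the following is only a minimal Python sketch of the ordering-and-pruning workflow it describes, under a simplified, hypothetical scoring rule. Each member earns credit for correct predictions on a validation set (accuracy), plus a bonus when it is correct while disagreeing with the ensemble's plurality vote, i.e. when it is in the minority group (diversity). Members are then sorted by decreasing score, and the top p percent form the subensemble. All function names (`contribution_scores`, `order_by_contribution`, `top_p_percent`, `predict`) and the bonus formula are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from typing import List, Sequence


def plurality_vote(votes: Sequence[int]) -> int:
    """Most common label; ties go to the smallest label so results are deterministic."""
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


def contribution_scores(preds: List[List[int]], y: Sequence[int]) -> List[float]:
    """Score each member on a validation set.

    preds[i][j] is member i's predicted label for sample j. This is a
    simplified, hypothetical rule (NOT the paper's exact EPIC metric):
      - wrong prediction: +0
      - correct and agrees with the ensemble's plurality vote: +1
      - correct but in the minority: +1 plus the fraction of members
        that voted differently (rewarding accuracy and diversity)
    """
    n = len(preds)
    scores = [0.0] * n
    for j, label in enumerate(y):
        votes = [preds[i][j] for i in range(n)]
        counts = Counter(votes)
        majority = plurality_vote(votes)
        for i in range(n):
            p = preds[i][j]
            if p != label:
                continue  # incorrect predictions earn nothing in this sketch
            scores[i] += 1.0
            if p != majority:
                # minority-group bonus: larger when fewer members share this vote
                scores[i] += 1.0 - counts[p] / n
    return scores


def order_by_contribution(scores: Sequence[float]) -> List[int]:
    """Member indices in decreasing order of score (ties keep index order)."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def top_p_percent(order: Sequence[int], p: float) -> List[int]:
    """Keep the first p percent of the ordering, always at least one member."""
    k = max(1, math.ceil(len(order) * p / 100))
    return list(order[:k])


def predict(preds_for_x: Sequence[int], members: Sequence[int]) -> int:
    """Plurality vote of the selected subensemble for a single input."""
    return plurality_vote([preds_for_x[i] for i in members])


if __name__ == "__main__":
    y = [0, 1, 1]
    preds = [
        [0, 0, 0],  # member 0
        [0, 1, 0],  # member 1
        [1, 1, 1],  # member 2: the only correct member on sample 2
    ]
    scores = contribution_scores(preds, y)
    order = order_by_contribution(scores)
    print(scores, order, top_p_percent(order, 50))
```

Because the ordering is computed once, a user can change p at prediction time without rescoring; in this toy example member 2 ranks first because its single minority-group correct vote earns the diversity bonus.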