Automatic Composition and Optimization of Multicomponent Predictive Systems with an Extended Auto-WEKA

Publication Type: Journal Article
IEEE Transactions on Automation Science and Engineering, 2019, 16 (2), pp. 946 - 959
© 2004-2012 IEEE. Composition and parameterization of multicomponent predictive systems (MCPSs) consisting of chains of data transformation steps is a challenging task. Auto-WEKA is a tool that automates the solution of the combined algorithm selection and hyperparameter optimization (CASH) problem. In this paper, we extend the CASH problem and Auto-WEKA to support MCPSs, including preprocessing steps for both classification and regression tasks. We define the optimization problem in which the search space consists of suitably parameterized Petri nets forming the sought MCPS solutions. In the experimental analysis, we focus on examining the impact of considerably extending the search space (from approximately 22,000 to 812 billion possible combinations of methods and categorical hyperparameters). In a range of extensive experiments, three different optimization strategies are used to automatically compose MCPSs for 21 publicly available data sets. The diversity of the composed MCPSs is an indication that fully and automatically exploiting different combinations of data cleaning and preprocessing techniques is possible and highly beneficial for different predictive models. We also present the results on seven data sets from real chemical production processes. Our findings can have a major impact on the development of high-quality predictive models as well as their maintenance and scalability aspects needed in modern applications and deployment scenarios.
Note to Practitioners - The extension of Auto-WEKA to compose and optimize multicomponent predictive systems (MCPSs) developed as part of this paper is freely available on GitHub under the GPL license, and we encourage practitioners to use it on a broad variety of classification and regression problems. The software can be used either as a black box, where the search space consists of all possible WEKA filters, predictors, and metapredictors (e.g., ensembles), or as an optimization tool on a subset of preselected machine learning methods. The application has a graphical user interface, but it can also be run from the command line and can be embedded in any project as a Java library. There are three main outputs once an Auto-WEKA run has finished: 1) the trained MCPS ready to make predictions on unseen data; 2) the WEKA configuration (i.e., parameterized components); and 3) the Petri net in Petri Net Markup Language format, which can be analyzed using any tool supporting this standard language. There are, however, some practical considerations affecting the quality of the results that must be taken into account, such as the CPU time budget or the search starting point. These are extensively discussed in this paper.
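The abstract describes the extended CASH problem as a search over joint choices of preprocessing steps, predictors, and their categorical hyperparameters. A minimal Python sketch of that idea follows, under stated assumptions: the toy search space, the component names, and `toy_loss` are hypothetical, and plain random search is used only as a simple stand-in. It does not reproduce Auto-WEKA, its WEKA components, or the optimization strategies evaluated in the paper.

```python
import random

# Hypothetical toy search space (an assumption for illustration, not the
# WEKA components from the paper). Each pipeline stage offers alternative
# components, and each component has its own categorical hyperparameters.
SEARCH_SPACE = {
    "missing_values": {"none": {}, "mean_impute": {}},
    "scaling": {"none": {}, "standardize": {}, "minmax": {}},
    "predictor": {
        "knn": {"k": [1, 3, 5]},
        "threshold": {"direction": ["above", "below"]},
    },
}


def count_configurations(space):
    """Count distinct pipelines: sum over components per stage, product over stages."""
    total = 1
    for components in space.values():
        stage_total = 0
        for params in components.values():
            n = 1
            for values in params.values():
                n *= len(values)
            stage_total += n
        total *= stage_total
    return total


def sample_configuration(space, rng):
    """Pick one component per stage, then one value per hyperparameter."""
    config = {}
    for stage, components in space.items():
        name = rng.choice(sorted(components))
        params = {p: rng.choice(v) for p, v in sorted(components[name].items())}
        config[stage] = (name, params)
    return config


def random_search(space, loss, budget, seed=0):
    """Evaluate `budget` random pipelines and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(budget):
        config = sample_configuration(space, rng)
        value = loss(config)
        if value < best_loss:
            best_config, best_loss = config, value
    return best_config, best_loss


def toy_loss(config):
    """Stand-in for a cross-validated error; a real run would train and validate."""
    penalty = 0.0
    if config["missing_values"][0] == "none":
        penalty += 0.2
    if config["scaling"][0] != "standardize":
        penalty += 0.1
    name, params = config["predictor"]
    penalty += abs(params["k"] - 3) * 0.05 if name == "knn" else 0.3
    return penalty


if __name__ == "__main__":
    print("configurations:", count_configurations(SEARCH_SPACE))  # 2 * 3 * (3 + 2) = 30
    print(random_search(SEARCH_SPACE, toy_loss, budget=50, seed=0))
```

Even this three-stage toy space multiplies out to 30 pipelines, which hints at how adding preprocessing stages and more components inflates the number of combinations. In a real setting, the `budget` argument plays the role of the CPU time budget mentioned in the note, and `toy_loss` would be replaced by an actual training and validation run.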