AutoWeka4MCPS-AVATAR: Accelerating automated machine learning pipeline composition and optimisation

Nguyen, T-D; Musial, K; Gabrys, B

Publisher: Elsevier
Publication Type: Journal Article
Citation: Expert Systems with Applications, 2021, 185
Issue Date: 2021-07-01

Closed Access

Filename	Description	Size	Format
1-s2.0-S0957417421010368-main.pdf	Published version	1.47 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Nguyen, T-D
dc.contributor.author	Musial, K
dc.contributor.author	Gabrys, B https://orcid.org/0000-0002-0790-2846
dc.date.accessioned	2022-03-19T22:47:01Z
dc.date.available	2022-03-19T22:47:01Z
dc.date.issued	2021-07-01
dc.identifier.citation	Expert Systems with Applications, 2021, 185
dc.identifier.issn	0957-4174
dc.identifier.issn	1873-6793
dc.identifier.uri	http://hdl.handle.net/10453/155371
dc.description.abstract	Automated machine learning (ML) pipeline composition and optimisation aim to automate the process of finding the most promising ML pipelines within allocated resources (i.e., time, CPU and memory). Existing methods, such as Bayesian-based and genetic-based optimisation, which are implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them. Therefore, the pipeline composition and optimisation of these methods frequently require a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid in the first place, and attempting to execute them is a waste of time and resources. To address this issue, we propose a novel method to evaluate the validity of ML pipelines, without their execution, using a surrogate model (AVATAR). The AVATAR generates a knowledge base by automatically learning the capabilities and effects of ML algorithms on datasets’ characteristics. This knowledge base is used for a simplified mapping from an original ML pipeline to a surrogate model, which is a Petri net-based pipeline. Instead of executing the original ML pipeline to evaluate its validity, the AVATAR evaluates its surrogate model, constructed from the capabilities and effects of the ML pipeline components and simplified input/output mappings. Evaluating this surrogate model is less resource-intensive than executing the original pipeline. As a result, the AVATAR enables pipeline composition and optimisation methods to evaluate more pipelines by quickly rejecting invalid ones. We integrate the AVATAR into the sequential model-based algorithm configuration (SMAC). Our experiments show that when SMAC employs the AVATAR, it finds better solutions than on its own. This is because the AVATAR allows more pipelines to be evaluated within the same time budget and allocated resources.
dc.language	English
dc.publisher	Elsevier
dc.relation.ispartof	Expert Systems with Applications
dc.relation.isbasedon	10.1016/j.eswa.2021.115643
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	01 Mathematical Sciences, 08 Information and Computing Sciences, 09 Engineering
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	AutoWeka4MCPS-AVATAR: Accelerating automated machine learning pipeline composition and optimisation
dc.type	Journal Article
utslib.citation.volume	185
utslib.for	01 Mathematical Sciences
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Centre for Health Technologies (CHT)
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2022-03-19T22:46:59Z
pubs.publication-status	Published
pubs.volume	185

Abstract:

Automated machine learning (ML) pipeline composition and optimisation aim to automate the process of finding the most promising ML pipelines within allocated resources (i.e., time, CPU and memory). Existing methods, such as Bayesian-based and genetic-based optimisation, which are implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them. Therefore, the pipeline composition and optimisation of these methods frequently require a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid in the first place, and attempting to execute them is a waste of time and resources. To address this issue, we propose a novel method to evaluate the validity of ML pipelines, without their execution, using a surrogate model (AVATAR). The AVATAR generates a knowledge base by automatically learning the capabilities and effects of ML algorithms on datasets’ characteristics. This knowledge base is used for a simplified mapping from an original ML pipeline to a surrogate model, which is a Petri net-based pipeline. Instead of executing the original ML pipeline to evaluate its validity, the AVATAR evaluates its surrogate model, constructed from the capabilities and effects of the ML pipeline components and simplified input/output mappings. Evaluating this surrogate model is less resource-intensive than executing the original pipeline. As a result, the AVATAR enables pipeline composition and optimisation methods to evaluate more pipelines by quickly rejecting invalid ones. We integrate the AVATAR into the sequential model-based algorithm configuration (SMAC). Our experiments show that when SMAC employs the AVATAR, it finds better solutions than on its own. This is because the AVATAR allows more pipelines to be evaluated within the same time budget and allocated resources.
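
The idea of checking validity from capabilities and effects, rather than by execution, can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not model a Petri net; it only propagates a set of dataset characteristics through a sequence of components. The `Component` class, the `is_valid` function, the characteristic names and the three example components are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical dataset characteristics (illustrative names, not from the paper).
MISSING_VALUES = "missing_values"
NOMINAL_ATTRIBUTES = "nominal_attributes"
NUMERIC_ATTRIBUTES = "numeric_attributes"


@dataclass(frozen=True)
class Component:
    """A pipeline step described only by what it accepts and how it changes the data."""
    name: str
    capabilities: frozenset           # characteristics the step can handle
    adds: frozenset = frozenset()     # characteristics the step introduces
    removes: frozenset = frozenset()  # characteristics the step eliminates


def is_valid(pipeline, characteristics):
    """Propagate dataset characteristics through the steps without executing them."""
    state = set(characteristics)
    for step in pipeline:
        unsupported = state - step.capabilities
        if unsupported:
            return False, f"{step.name} cannot handle {sorted(unsupported)}"
        # Apply the step's assumed effect on the dataset characteristics.
        state = (state - step.removes) | step.adds
    return True, "ok"


# A tiny hand-written knowledge base (assumed values, for illustration only).
imputer = Component(
    "imputer",
    capabilities=frozenset({MISSING_VALUES, NOMINAL_ATTRIBUTES, NUMERIC_ATTRIBUTES}),
    removes=frozenset({MISSING_VALUES}),
)
nominal_to_binary = Component(
    "nominal_to_binary",
    capabilities=frozenset({NOMINAL_ATTRIBUTES, NUMERIC_ATTRIBUTES}),
    adds=frozenset({NUMERIC_ATTRIBUTES}),
    removes=frozenset({NOMINAL_ATTRIBUTES}),
)
numeric_only_classifier = Component(
    "numeric_only_classifier",
    capabilities=frozenset({NUMERIC_ATTRIBUTES}),
)

dataset = {MISSING_VALUES, NOMINAL_ATTRIBUTES, NUMERIC_ATTRIBUTES}

print(is_valid([imputer, nominal_to_binary, numeric_only_classifier], dataset))
print(is_valid([nominal_to_binary, numeric_only_classifier], dataset))
```

In this sketch the second pipeline is rejected at its first step because nothing removes the missing values beforehand, so an optimiser could discard it without training any model. In the approach described in the abstract, the knowledge base is learned automatically rather than written by hand as it is here.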

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/155371