Counterfactuals and causability in explainable artificial intelligence: Theory, algorithms, and applications

Publisher: Elsevier
Publication Type: Journal Article
Citation: Information Fusion, 2022, 81, pp. 59-83
Issue Date: 2022-05-01
Deep learning models have achieved high performance across domains such as medical decision-making, autonomous vehicles, and decision support systems. Despite this success, these models remain opaque: their internal representations are too complex for a human to understand, which makes it hard to grasp how or why they arrive at their predictions. There has been growing interest in model-agnostic methods that make deep learning models more transparent and explainable to humans. Some researchers have recently argued that for a machine to achieve human-level explainability, it needs to provide explanations that humans can causally understand, a property known as causability. A specific class of algorithms with the potential to provide causability is counterfactuals. This paper presents an in-depth systematic review of the diverse existing literature on counterfactuals and causability for explainable artificial intelligence (AI). We performed a latent Dirichlet allocation (LDA) topic modelling analysis under the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to find the most relevant articles. This analysis yielded a novel taxonomy that considers the grounding theories of the surveyed algorithms, together with their underlying properties and applications to real-world data. Our research suggests that current model-agnostic counterfactual algorithms for explainable AI are not grounded in a causal theoretical formalism and, consequently, cannot promote causability to a human decision-maker. Furthermore, our findings suggest that the explanations derived from popular algorithms in the literature reflect spurious correlations rather than cause/effect relationships, leading to sub-optimal, erroneous, or even biased explanations. This paper therefore also advances the literature with new directions and challenges for promoting causability in model-agnostic approaches to explainable AI.
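To make concrete what the abstract means by a model-agnostic counterfactual algorithm, the sketch below illustrates one representative formulation from the literature (a Wachter-style gradient search for the nearest input that flips the model's prediction); it is not one of the algorithms surveyed in the paper, and the logistic-regression "black box" and all parameter values are illustrative assumptions.

```python
# Minimal sketch of a Wachter-style model-agnostic counterfactual search:
# find a small perturbation x' of an input x such that the model's
# prediction moves to a target class, trading prediction loss against
# proximity to the original input. Toy model and parameters are assumed.
import numpy as np

# Toy "black-box" classifier: logistic regression with fixed weights.
w = np.array([1.5, -2.0, 0.5])
b = 0.1

def predict_proba(x):
    """Probability of the positive class for input x."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def counterfactual(x, target=1.0, lam=10.0, lr=0.05, steps=500):
    """Gradient descent on  lam * (f(x') - target)^2 + ||x' - x||_1."""
    x_cf = x.copy()
    for _ in range(steps):
        p = predict_proba(x_cf)
        # Gradient of the squared prediction loss through the sigmoid.
        grad_pred = 2.0 * lam * (p - target) * p * (1.0 - p) * w
        # Subgradient of the L1 proximity term.
        grad_dist = np.sign(x_cf - x)
        x_cf -= lr * (grad_pred + grad_dist)
    return x_cf

x = np.array([-1.0, 1.0, 0.0])   # factual input, predicted class 0
x_cf = counterfactual(x)
print(f"f(x)    = {predict_proba(x):.3f}")
print(f"f(x_cf) = {predict_proba(x_cf):.3f}")
print("counterfactual:", np.round(x_cf, 3))
```

Note how this search perturbs each feature independently of the others: nothing in the objective encodes causal relationships between features, which is exactly the limitation the abstract points to when it argues that such explanations reflect spurious correlations rather than cause/effect relationships.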