A Pareto-smoothing method for causal inference using generalized Pareto distribution

Zhu, F; Lu, J; Lin, A; Zhang, G

Publisher: Elsevier
Publication Type: Journal Article
Citation: Neurocomputing, 2020, 378, pp. 142-152
Issue Date: 2020-02-22

This item is open access.

The embargo period expires on 23 Oct 2021

Full metadata record

Field	Value	Language
dc.contributor.author	Zhu, F https://orcid.org/0000-0001-8089-4769
dc.contributor.author	Lu, J https://orcid.org/0000-0003-0690-4732
dc.contributor.author	Lin, A
dc.contributor.author	Zhang, G https://orcid.org/0000-0003-3960-0583
dc.date.accessioned	2020-10-19T19:11:22Z
dc.date.available	2020-10-19T19:11:22Z
dc.date.issued	2020-02-22
dc.identifier.citation	Neurocomputing, 2020, 378, pp. 142-152
dc.identifier.issn	0925-2312
dc.identifier.issn	1872-8286
dc.identifier.uri	http://hdl.handle.net/10453/143376
dc.description.abstract	© 2019 Elsevier B.V. Causal inference aims to estimate the treatment effect of an intervention on the target outcome variable and has received great attention across fields ranging from economics and statistics to machine learning. Observational causal inference is challenging because the pre-treatment variables may influence both the treatment and the outcome, resulting in confounding bias. The classic inverse propensity weighting (IPW) estimator is theoretically able to eliminate the confounding bias. However, in observational studies, the propensity scores used in the IPW estimator must be estimated from finite observational data and may be subject to extreme values, leading to the problem of highly variable importance weights, which consequently makes the estimated causal effect unstable or even misleading. In this paper, by reframing the IPW estimator in the importance sampling framework, we propose a Pareto-smoothing method to tackle this problem. The generalized Pareto distribution (GPD) from extreme value theory is used to fit the upper tail of the estimated importance weights and to replace them using the order statistics of the fitted GPD. To validate the performance of the new method, we conducted extensive experiments on simulated and semi-simulated datasets. Compared with two existing methods for importance weight stabilization, i.e., weight truncation and self-normalization, the proposed method generally achieves better performance in settings with a small sample size and high-dimensional covariates. Its application to a real-world health dataset indicates its utility in estimating causal effects for program evaluation.
dc.language	English
dc.publisher	Elsevier
dc.relation	http://purl.org/au-research/grants/arc/DP170101632
dc.relation.ispartof	Neurocomputing
dc.relation.isbasedon	10.1016/j.neucom.2019.09.095
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.subject	08 Information and Computing Sciences, 09 Engineering, 17 Psychology and Cognitive Sciences
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	A Pareto-smoothing method for causal inference using generalized Pareto distribution
dc.type	Journal Article
utslib.citation.volume	378
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
utslib.for	17 Psychology and Cognitive Sciences
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
pubs.consider-herdc	true
utslib.copyright.embargo	2021-10-23T00:00:00+1000Z
dc.date.updated	2020-10-19T19:11:14Z
pubs.publication-status	In preparation
pubs.volume	378

Abstract:

© 2019 Elsevier B.V. Causal inference aims to estimate the treatment effect of an intervention on the target outcome variable and has received great attention across fields ranging from economics and statistics to machine learning. Observational causal inference is challenging because the pre-treatment variables may influence both the treatment and the outcome, resulting in confounding bias. The classic inverse propensity weighting (IPW) estimator is theoretically able to eliminate the confounding bias. However, in observational studies, the propensity scores used in the IPW estimator must be estimated from finite observational data and may be subject to extreme values, leading to the problem of highly variable importance weights, which consequently makes the estimated causal effect unstable or even misleading. In this paper, by reframing the IPW estimator in the importance sampling framework, we propose a Pareto-smoothing method to tackle this problem. The generalized Pareto distribution (GPD) from extreme value theory is used to fit the upper tail of the estimated importance weights and to replace them using the order statistics of the fitted GPD. To validate the performance of the new method, we conducted extensive experiments on simulated and semi-simulated datasets. Compared with two existing methods for importance weight stabilization, i.e., weight truncation and self-normalization, the proposed method generally achieves better performance in settings with a small sample size and high-dimensional covariates. Its application to a real-world health dataset indicates its utility in estimating causal effects for program evaluation.
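
As a rough illustration of the ideas named in the abstract, the following Python sketch shows an IPW estimate of the average treatment effect together with the three weight-stabilization strategies mentioned: truncation, self-normalization, and Pareto smoothing. It is not the authors' implementation. All function names are hypothetical, the GPD is fitted by a simple method-of-moments estimator (an assumption made for brevity; the paper may use a different fitting procedure), and the tail size is left to the caller because the record above does not state how it is chosen.

```python
import math
import statistics


def ipw_weights(t, e):
    """Importance weights: 1/e for treated units, 1/(1 - e) for controls."""
    return [1.0 / ei if ti == 1 else 1.0 / (1.0 - ei) for ti, ei in zip(t, e)]


def truncate(weights, cap):
    """Weight truncation: clip every weight at a fixed upper bound."""
    return [min(w, cap) for w in weights]


def ipw_ate(y, t, w, self_normalize=False):
    """IPW estimate of the average treatment effect.

    With self_normalize=True each arm is divided by the sum of its own
    weights instead of by the sample size.
    """
    treated = [(wi, yi) for yi, ti, wi in zip(y, t, w) if ti == 1]
    control = [(wi, yi) for yi, ti, wi in zip(y, t, w) if ti == 0]
    if self_normalize:
        d1 = sum(wi for wi, _ in treated)
        d0 = sum(wi for wi, _ in control)
    else:
        d1 = d0 = len(y)
    mean1 = sum(wi * yi for wi, yi in treated) / d1
    mean0 = sum(wi * yi for wi, yi in control) / d0
    return mean1 - mean0


def gpd_fit_moments(exceedances):
    """Method-of-moments GPD fit (assumed here); returns (shape, scale)."""
    m = statistics.fmean(exceedances)
    s2 = statistics.variance(exceedances)
    ratio = m * m / s2
    return 0.5 * (1.0 - ratio), 0.5 * m * (ratio + 1.0)


def gpd_quantile(p, u, xi, sigma):
    """Inverse CDF of a GPD with location u, shape xi, and scale sigma."""
    if abs(xi) < 1e-12:
        return u - sigma * math.log1p(-p)
    return u + sigma / xi * ((1.0 - p) ** (-xi) - 1.0)


def pareto_smooth(weights, tail_size):
    """Replace the tail_size largest weights with fitted-GPD quantiles.

    The threshold u is the largest weight outside the tail. The tail weight
    with rank z (1 = smallest) is replaced by the GPD quantile at
    (z - 0.5) / tail_size, so the rank order of the weights is preserved.
    """
    if not 2 <= tail_size < len(weights):
        raise ValueError("tail_size must satisfy 2 <= tail_size < len(weights)")
    order = sorted(range(len(weights)), key=weights.__getitem__)
    tail = order[-tail_size:]
    u = weights[order[-tail_size - 1]]
    xi, sigma = gpd_fit_moments([weights[i] - u for i in tail])
    smoothed = list(weights)
    for rank, i in enumerate(tail, start=1):
        smoothed[i] = gpd_quantile((rank - 0.5) / tail_size, u, xi, sigma)
    return smoothed
```

A typical call chain under these assumptions would be `w = ipw_weights(t, e)`, then `ipw_ate(y, t, pareto_smooth(w, tail_size))`, where `e` holds estimated propensity scores. Whether smoothing is applied to all weights together or separately within each treatment arm is a design choice this sketch leaves open.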

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/143376