Two wrongs make a right: Addressing underreporting in binary data from multiple sources

Cook, SJ; Blas, B; Carroll, RJ; Sinha, S

Publication Type: Journal Article
Citation: Political Analysis, 2017, 25 (2), pp. 223-240
Issue Date: 2017-04-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Cook, SJ	en_US
dc.contributor.author	Blas, B	en_US
dc.contributor.author	Carroll, RJ https://orcid.org/0000-0002-5465-9682	en_US
dc.contributor.author	Sinha, S	en_US
dc.date.available	2020-05-25T19:07:05Z
dc.date.issued	2017-04-01	en_US
dc.identifier.citation	Political Analysis, 2017, 25 (2), pp. 223 - 240	en_US
dc.identifier.issn	1047-1987	en_US
dc.identifier.uri	http://hdl.handle.net/10453/126962
dc.description.abstract	© The Author(s) 2017. Media-based event data (i.e., data comprised from reporting by media outlets) are widely used in political science research. However, events of interest (e.g., strikes, protests, conflict) are often underreported by these primary and secondary sources, producing incomplete data that risks inconsistency and bias in subsequent analysis. While general strategies exist to help ameliorate this bias, these methods do not make full use of the information often available to researchers. Specifically, much of the event data used in the social sciences is drawn from multiple, overlapping news sources (e.g., Agence France-Presse, Reuters). Therefore, we propose a novel maximum likelihood estimator that corrects for misclassification in data arising from multiple sources. In the most general formulation of our estimator, researchers can specify separate sets of predictors for the true-event model and each of the misclassification models characterizing whether a source fails to report on an event. As such, researchers are able to accurately test theories on both the causes of and reporting on an event of interest. Simulations evidence that our technique regularly outperforms current strategies that either neglect misclassification, the unique features of the data-generating process, or both. We also illustrate the utility of this method with a model of repression using the Social Conflict in Africa Database.	en_US
dc.relation.ispartof	Political Analysis	en_US
dc.relation.isbasedon	10.1017/pan.2016.13	en_US
dc.subject.classification	Political Science & Public Administration	en_US
dc.title	Two wrongs make a right: Addressing underreporting in binary data from multiple sources	en_US
dc.type	Journal Article
utslib.citation.volume	2	en_US
utslib.citation.volume	25	en_US
utslib.for	0104 Statistics	en_US
utslib.for	1606 Political Science	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Science/School of Mathematical and Physical Sciences
utslib.copyright.status	open_access
pubs.issue	2	en_US
pubs.publication-status	Published	en_US
pubs.volume	25	en_US

Abstract:

© The Author(s) 2017. Media-based event data (i.e., data comprised from reporting by media outlets) are widely used in political science research. However, events of interest (e.g., strikes, protests, conflict) are often underreported by these primary and secondary sources, producing incomplete data that risks inconsistency and bias in subsequent analysis. While general strategies exist to help ameliorate this bias, these methods do not make full use of the information often available to researchers. Specifically, much of the event data used in the social sciences is drawn from multiple, overlapping news sources (e.g., Agence France-Presse, Reuters). Therefore, we propose a novel maximum likelihood estimator that corrects for misclassification in data arising from multiple sources. In the most general formulation of our estimator, researchers can specify separate sets of predictors for the true-event model and each of the misclassification models characterizing whether a source fails to report on an event. As such, researchers are able to accurately test theories on both the causes of and reporting on an event of interest. Simulations evidence that our technique regularly outperforms current strategies that either neglect misclassification, the unique features of the data-generating process, or both. We also illustrate the utility of this method with a model of repression using the Social Conflict in Africa Database.
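
Illustrative sketch (not part of the repository record): the abstract describes a likelihood with one model for the true event and a separate misclassification model for each source. The Python below shows what such a likelihood could look like for two sources. It is not the article's specification. The logit links, the absence of false positives, the independence of the two sources given a true event, and all names (`sigmoid`, `cell_probabilities`, `neg_log_likelihood`, `X`, `Z1`, `Z2`) are assumptions made here for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used here as an assumed link for all three models."""
    return 1.0 / (1.0 + np.exp(-z))


def cell_probabilities(params, X, Z1, Z2):
    """Probabilities of the four (source 1, source 2) report patterns.

    Assumptions (illustrative only): no source reports an event that did not
    happen, and sources miss events independently given that one occurred.
    """
    kx, k1 = X.shape[1], Z1.shape[1]
    beta = params[:kx]               # true-event coefficients
    gamma1 = params[kx:kx + k1]      # source 1 non-reporting coefficients
    gamma2 = params[kx + k1:]        # source 2 non-reporting coefficients

    pi = sigmoid(X @ beta)           # Pr(event truly occurred)
    a1 = sigmoid(Z1 @ gamma1)        # Pr(source 1 misses it | event)
    a2 = sigmoid(Z2 @ gamma2)        # Pr(source 2 misses it | event)

    p11 = pi * (1 - a1) * (1 - a2)   # both sources report
    p10 = pi * (1 - a1) * a2         # only source 1 reports
    p01 = pi * a1 * (1 - a2)         # only source 2 reports
    p00 = (1 - pi) + pi * a1 * a2    # no event, or both sources miss it
    return p11, p10, p01, p00


def neg_log_likelihood(params, X, Z1, Z2, y1, y2):
    """Negative log-likelihood of the observed report patterns (y1, y2)."""
    p11, p10, p01, p00 = cell_probabilities(params, X, Z1, Z2)
    lik = np.where(
        y1 == 1,
        np.where(y2 == 1, p11, p10),
        np.where(y2 == 1, p01, p00),
    )
    # Clip to avoid log(0) for extreme parameter values.
    return -np.sum(np.log(np.clip(lik, 1e-12, None)))
```

Under these assumptions, the function could be passed to a general-purpose numerical optimizer to obtain estimates. The key point is that an observation with no report from either source contributes `(1 - pi) + pi * a1 * a2`, which is how disagreement between sources carries information about underreporting.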

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/126962