Appropriate Statistics for Determining Chance-Removed Interpractitioner Agreement

Popplewell, M; Reizes, J; Zaslawski, C

Publication Type: Journal Article
Citation: Journal of Alternative and Complementary Medicine, 2019, 25 (11), pp. 1115 - 1120
Issue Date: 2019-11-01

Closed Access

Filename	Description	Size
acm.2017.0297.pdf	Submitted Version	221.08 kB

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Popplewell, M	en_US
dc.contributor.author	Reizes, J	en_US
dc.contributor.author	Zaslawski, C https://orcid.org/0000-0001-5618-5161	en_US
dc.date.issued	2019-11-01	en_US
dc.identifier.citation	Journal of Alternative and Complementary Medicine, 2019, 25 (11), pp. 1115 - 1120	en_US
dc.identifier.issn	1075-5535	en_US
dc.identifier.uri	http://hdl.handle.net/10453/137716
dc.description.abstract	© Copyright 2019, Mary Ann Liebert, Inc. Objectives: Fleiss' Kappa (FK) has been commonly, but incorrectly, employed as the "standard" for evaluating chance-removed inter-rater agreement with ordinal data. This practice may lead to misleading conclusions in inter-rater agreement research. An example is presented that demonstrates the conditions where FK produces inappropriate results, compared with Gwet's AC2, which is proposed as a more appropriate statistic. A novel format for recording Chinese Medical (CM) diagnoses, called the Diagnostic System of Oriental Medicine (DSOM), was used to record and compare patient diagnostic data. Unlike the contemporary CM diagnostic format, DSOM allows agreement by chance to be considered when evaluating patient data obtained with unrestricted diagnostic options available to diagnosticians. Design: Five CM practitioners diagnosed 42 subjects drawn from an open population. Subjects' diagnoses were recorded using the DSOM format. All the available data were initially used to evaluate agreement. Then, the subjects were sorted into three groups to demonstrate the effects of differing data marginality on the calculated chance-removed agreement. Outcome measures: Agreement between the practitioners for each subject was evaluated with linearly weighted simple agreement, FK and Gwet's AC2. Results and Conclusions: In all cases, overall agreement was much lower with FK than with Gwet's AC2. Larger differences occurred when the data were more free marginal. Inter-rater agreement determined with FK statistics is unlikely to be correct unless it can be shown that the data from which agreement is determined are, in fact, fixed marginal. It follows that results obtained on agreement between practitioners with FK are probably incorrect. It is shown that inter-rater agreement evaluated with the AC2 statistic is an appropriate measure when fixed marginal data are neither expected nor guaranteed. The AC2 statistic should be used as the standard statistical approach for determining agreement between practitioners.	en_US
dc.relation.ispartof	Journal of Alternative and Complementary Medicine	en_US
dc.relation.isbasedon	10.1089/acm.2017.0297	en_US
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject.classification	Complementary & Alternative Medicine	en_US
dc.subject.mesh	Humans	en_US
dc.subject.mesh	Diagnosis, Differential	en_US
dc.subject.mesh	Observer Variation	en_US
dc.subject.mesh	Medicine, Chinese Traditional	en_US
dc.subject.mesh	Models, Statistical	en_US
dc.subject.mesh	Reproducibility of Results	en_US
dc.title	Appropriate Statistics for Determining Chance-Removed Interpractitioner Agreement	en_US
dc.type	Journal Article
utslib.citation.volume	11	en_US
utslib.citation.volume	25	en_US
utslib.for	1104 Complementary and Alternative Medicine	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Faculty of Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Science/School of Life Sciences
utslib.copyright.status	closed_access	*
pubs.issue	11	en_US
pubs.publication-status	Published	en_US
pubs.volume	25	en_US

Abstract:

© Copyright 2019, Mary Ann Liebert, Inc.

Objectives: Fleiss' Kappa (FK) has been commonly, but incorrectly, employed as the "standard" for evaluating chance-removed inter-rater agreement with ordinal data. This practice may lead to misleading conclusions in inter-rater agreement research. An example is presented that demonstrates the conditions where FK produces inappropriate results, compared with Gwet's AC2, which is proposed as a more appropriate statistic. A novel format for recording Chinese Medical (CM) diagnoses, called the Diagnostic System of Oriental Medicine (DSOM), was used to record and compare patient diagnostic data. Unlike the contemporary CM diagnostic format, DSOM allows agreement by chance to be considered when evaluating patient data obtained with unrestricted diagnostic options available to diagnosticians.

Design: Five CM practitioners diagnosed 42 subjects drawn from an open population. Subjects' diagnoses were recorded using the DSOM format. All the available data were initially used to evaluate agreement. Then, the subjects were sorted into three groups to demonstrate the effects of differing data marginality on the calculated chance-removed agreement.

Outcome measures: Agreement between the practitioners for each subject was evaluated with linearly weighted simple agreement, FK and Gwet's AC2.

Results and Conclusions: In all cases, overall agreement was much lower with FK than with Gwet's AC2. Larger differences occurred when the data were more free marginal. Inter-rater agreement determined with FK statistics is unlikely to be correct unless it can be shown that the data from which agreement is determined are, in fact, fixed marginal. It follows that results obtained on agreement between practitioners with FK are probably incorrect. It is shown that inter-rater agreement evaluated with the AC2 statistic is an appropriate measure when fixed marginal data are neither expected nor guaranteed. The AC2 statistic should be used as the standard statistical approach for determining agreement between practitioners.
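
The contrast the abstract draws between FK and AC2 can be illustrated with a minimal sketch. It is not the authors' code, and the counts below are invented toy data, not the study's 42 subjects. It assumes the commonly published multi-rater forms of the three statistics: linearly weighted simple agreement, a weighted Fleiss-type kappa whose chance term is built from the observed category proportions, and Gwet's AC2. The function names and the dictionary keys are hypothetical, and every subject is assumed to be rated by at least two raters.

```python
def linear_weights(q):
    """Linear ordinal weights: 1 on the diagonal, falling to 0 at maximum distance."""
    return [[1.0 - abs(k - l) / (q - 1) for l in range(q)] for k in range(q)]


def chance_removed_agreement(counts, weights):
    """counts[i][k] = number of raters who placed subject i in category k."""
    n = len(counts)
    q = len(weights)
    pa = 0.0            # weighted simple (observed) agreement
    pi = [0.0] * q      # average share of ratings falling in each category
    for row in counts:
        r = sum(row)    # raters for this subject (assumed >= 2)
        # Ratings weighted by their closeness to category k.
        r_star = [sum(weights[k][l] * row[l] for l in range(q)) for k in range(q)]
        pa += sum(row[k] * (r_star[k] - 1.0) for k in range(q)) / (r * (r - 1))
        for k in range(q):
            pi[k] += row[k] / r
    pa /= n
    pi = [p / n for p in pi]

    # Fleiss-type chance agreement: depends on the observed marginals.
    pe_fleiss = sum(weights[k][l] * pi[k] * pi[l] for k in range(q) for l in range(q))
    # Gwet-type chance agreement: scaled by the total weight, small when one category dominates.
    total_weight = sum(sum(w_row) for w_row in weights)
    pe_gwet = total_weight / (q * (q - 1)) * sum(p * (1.0 - p) for p in pi)

    return {
        "simple_agreement": pa,
        "fleiss_kappa": (pa - pe_fleiss) / (1.0 - pe_fleiss),
        "gwet_ac2": (pa - pe_gwet) / (1.0 - pe_gwet),
    }


# Hypothetical toy data: 10 subjects, 5 raters, 4 ordinal categories.
# Raters almost always choose the first category (strongly skewed marginals).
toy_counts = [[5, 0, 0, 0]] * 8 + [[4, 1, 0, 0]] * 2
result = chance_removed_agreement(toy_counts, linear_weights(4))
print(result)
```

On this toy input the simple agreement is about 0.97, yet the Fleiss-type kappa comes out slightly negative while AC2 stays close to the simple agreement. That mirrors, on made-up numbers only, the kind of divergence the abstract describes.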

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/137716