Appropriate Statistics for Determining Chance-Removed Interpractitioner Agreement

Publication Type:
Journal Article
Journal of Alternative and Complementary Medicine, 2019, 25 (11), pp. 1115 - 1120
Issue Date:
File: acm.2017.0297.pdf (Submitted Version, Adobe PDF, 221.08 kB)
© Copyright 2019, Mary Ann Liebert, Inc.

Objectives: Fleiss' Kappa (FK) has been commonly, but incorrectly, employed as the "standard" for evaluating chance-removed inter-rater agreement with ordinal data. This practice may lead to misleading conclusions in inter-rater agreement research. An example is presented that demonstrates the conditions under which FK produces inappropriate results, compared with Gwet's AC2, which is proposed as a more appropriate statistic. A novel format for recording Chinese Medical (CM) diagnoses, called the Diagnostic System of Oriental Medicine (DSOM), was used to record and compare patient diagnostic data. Unlike the contemporary CM diagnostic format, the DSOM allows agreement by chance to be considered when evaluating patient data obtained with unrestricted diagnostic options available to diagnosticians.

Design: Five CM practitioners diagnosed 42 subjects drawn from an open population. Subjects' diagnoses were recorded using the DSOM format. All the available data were initially used to evaluate agreement. Then, the subjects were sorted into three groups to demonstrate the effects of differing data marginality on the calculated chance-removed agreement.

Outcome measures: Agreement between the practitioners for each subject was evaluated with linearly weighted simple agreement, FK, and Gwet's AC2.

Results and Conclusions: In all cases, overall agreement was much lower with FK than with Gwet's AC2. Larger differences occurred when the data were more free marginal. Inter-rater agreement determined with FK statistics is unlikely to be correct unless it can be shown that the data from which agreement is determined are, in fact, fixed marginal. It follows that results obtained on agreement between practitioners with FK are probably incorrect. It is shown that inter-rater agreement evaluated with the AC2 statistic is an appropriate measure when fixed marginal data are neither expected nor guaranteed. The AC2 statistic should be used as the standard statistical approach for determining agreement between practitioners.
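
To illustrate the kind of divergence the abstract describes, the following is a minimal sketch, not the authors' analysis code. It assumes the commonly cited multi-rater formulas: both coefficients share the same weighted observed agreement and differ only in the chance-agreement term. The function names, the `toy_counts` data, and the returned dictionary keys are all hypothetical, and the data are invented purely for demonstration. With identity weights the second coefficient reduces to Gwet's AC1; with linear weights on more than two ordered categories it corresponds to AC2.

```python
def linear_weights(q):
    """Linear agreement weights for q ordered categories (1 on the diagonal)."""
    return [[1.0 - abs(k - l) / (q - 1) for l in range(q)] for k in range(q)]


def agreement_coefficients(counts, weights):
    """Weighted simple agreement, Fleiss' kappa, and Gwet's AC from a count table.

    counts[i][k] = number of raters who placed subject i in category k.
    Every subject needs at least two ratings. Assumed formulas, see lead-in.
    """
    n = len(counts)
    q = len(weights)
    p_a = 0.0          # weighted observed (simple) agreement
    pi = [0.0] * q     # average proportion of ratings per category
    for row in counts:
        r_i = sum(row)
        for k in range(q):
            weighted_k = sum(weights[k][l] * row[l] for l in range(q))
            p_a += row[k] * (weighted_k - 1.0) / (r_i * (r_i - 1))
            pi[k] += row[k] / r_i
    p_a /= n
    pi = [p / n for p in pi]

    # Fleiss-type chance agreement: driven by the observed marginals.
    p_e_fleiss = sum(weights[k][l] * pi[k] * pi[l]
                     for k in range(q) for l in range(q))
    # Gwet-type chance agreement: shrinks as the marginals become skewed.
    t_w = sum(sum(w_row) for w_row in weights)
    p_e_gwet = t_w / (q * (q - 1)) * sum(p * (1.0 - p) for p in pi)

    def chance_removed(p_e):
        # Undefined when chance agreement is already perfect.
        return float("nan") if p_e == 1.0 else (p_a - p_e) / (1.0 - p_e)

    return {
        "simple_agreement": p_a,
        "fleiss_kappa": chance_removed(p_e_fleiss),
        "gwet_ac": chance_removed(p_e_gwet),
    }


# Invented toy data: 2 raters, 2 categories, 4 subjects, heavily skewed marginals.
toy_counts = [[2, 0], [2, 0], [2, 0], [1, 1]]
result = agreement_coefficients(toy_counts, linear_weights(2))
print(result)
```

In this toy table the raters agree on three of four subjects, yet the Fleiss-type coefficient comes out negative while the Gwet-type coefficient stays well above zero, because the skewed marginals inflate the Fleiss-type chance term.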