Background: This author's previous MSc research concluded that traditional Chinese medicine (TCM) practitioners had poor levels of inter-rater reliability (IRR) when observing tongue characteristics. That study suggested that IRR could be improved by developing operational definitions for the different tongue characteristics and a standardised protocol for observing the tongue. To this end, this thesis documents the development and evaluation of a reliable tongue inspection tool and protocol.
Aim: To develop a reliable tongue inspection reference tool and protocol by examining how different groups interpret tongue inspection.
Method: In addition to a reanalysis of the data from the previous MSc study (Study I), five studies were undertaken to obtain IRR levels for different groups of subjects while the inspection tool and protocol were developed and refined. In three within-subjects experimental studies (Studies II, V and VI), IRR levels were measured before and after a formal training session in tongue inspection. In the other two studies (Studies III and IV), IRR levels were compared between groups before (Study III) and after (Study IV) a training session using the newly developed tongue inspection tool. In all data collection sessions, subjects viewed a variety of digital tongue images on a data projector or computer screen and completed a tongue inspection form.

Study II measured the IRR levels of 28 Year 2 TCM students before and after the standard TCM curriculum lecture on tongue inspection. Study III compared the IRR levels of 22 Year 2 students who had not undertaken any formal training in tongue inspection with those of 23 Year 4 students who had received formal instruction in tongue inspection in their second year of TCM education. Study IV evaluated this same cohort of students some five weeks later, following training in the improved inspection method using operational definitions, colour charts and a standardised protocol. Study V, a within-group repeated measures design, measured IRR levels for 21 inexperienced Year 2 TCM students, while Study VI used the same design with 33 Year 4 TCM students. Both groups had their IRR levels evaluated at baseline and again after instruction in the newly developed tongue inspection tool.

In the statistical analysis of IRR, an agreement level of ≥80% was used as the criterion for an acceptable level of reliability. In addition, t-tests and Pearson product-moment correlation coefficients were computed where applicable.
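The abstract states the ≥80% agreement criterion but not the exact agreement formula. As a minimal sketch, assuming that agreement for one tongue image is the proportion of raters who chose the modal (most common) category, and that a characteristic's IRR is the mean of these proportions across images, the criterion check could look like the following. The function names and the ratings are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from fractions import Fraction

# Acceptability threshold stated in the Method: agreement of at least 80%.
CRITERION = Fraction(4, 5)


def image_agreement(ratings):
    """Share of raters who chose the modal (most common) category for one image.

    Assumption: agreement is operationalised as modal percentage agreement.
    """
    if not ratings:
        raise ValueError("each image needs at least one rating")
    modal_count = Counter(ratings).most_common(1)[0][1]
    return Fraction(modal_count, len(ratings))


def characteristic_irr(ratings_by_image):
    """Mean modal agreement across all images for one tongue characteristic."""
    if not ratings_by_image:
        raise ValueError("at least one image is required")
    scores = [image_agreement(r) for r in ratings_by_image]
    return sum(scores, Fraction(0)) / len(scores)


def meets_criterion(irr, criterion=CRITERION):
    """True when the agreement level reaches the >= 80% criterion."""
    return irr >= criterion


if __name__ == "__main__":
    # Hypothetical 'coat presence' ratings by five raters for three images.
    coat_presence = [
        ["present"] * 5,
        ["present"] * 4 + ["absent"],
        ["absent"] * 3 + ["present"] * 2,
    ]
    irr = characteristic_irr(coat_presence)
    print(float(irr), meets_criterion(irr))  # 0.8 True
```

Exact fractions are used so that a characteristic sitting exactly on the 80% boundary is not misclassified by floating-point rounding; with plain floats, the mean of 1.0, 0.8 and 0.6 can come out fractionally below 0.8.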
Results: Study I showed that, using the new scoring method, the practitioners performed poorly at choosing the 'correct' answers and had a low level of agreement. Their performance did not change across the two sessions.
For Study II, two characteristics (coat presence and body deviation) achieved the criterion at both sessions, and one characteristic (body enlargement) showed a statistically significant increase in mean IRR value. Within each of sessions 1 and 2, the mean IRR values for the initial and repeat viewings of a given tongue example showed little change and were generally low. The mean percentage scores for the group of 20 subjects did not increase significantly from session 1 to session 2.
Study III demonstrated statistically significant between-group differences for two of the 14 tongue characteristics (coat presence and coat thickness), with coat presence achieving the criterion IRR in both groups. Both groups achieved overall IRRs ≥0.8 for coat presence, coat colour, coat peeling, body indent, crack presence and body deviation. In general, the two groups' IRR profiles were quite similar.
Study IV revealed that the two year groups achieved the IRR criterion for a similar number of characteristics (ten for Year 2 and eight for Year 4), with coat texture, body colour, body enlargement and crack type failing to reach the criterion in either group. No statistically significant differences were found between the two groups for any characteristic.
Study V demonstrated statistically significant changes from session 1 to session 2 for seven of the 14 characteristics (coat colour, coat moisture, coat texture, body enlargement, body thorn, crack presence and crack type). This was reflected in a statistically significant 7% increase in the group's mean score.
For Study VI, there were statistically significant increases for six of the 14 characteristics: coat peeling, coat texture, body enlargement, body indent, body thorns and body length. The group's mean percentage score increased significantly, by 8%, from session 1 to session 2.
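Studies V and VI compare the same subjects' mean percentage scores at session 1 and session 2. The abstract reports t-tests without specifying their form; for a within-subjects design a paired t-test is the usual choice, and the sketch below assumes that. It computes only the t statistic and degrees of freedom; the p-value would then be read from a t distribution (for example with a statistics package), which is omitted here to keep the example dependency-free. The function name and scores are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired (dependent-samples) t statistic and degrees of freedom.

    Position i in `before` and `after` must refer to the same subject.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    if len(before) < 2:
        raise ValueError("need at least two subjects")
    # Work on each subject's session 2 minus session 1 difference.
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical percentage scores for four subjects at sessions 1 and 2.
    session_1 = [60, 55, 70, 65]
    session_2 = [68, 62, 75, 74]
    t, df = paired_t(session_1, session_2)
    print(f"t = {t:.2f}, df = {df}")  # t = 8.49, df = 3
```

Testing the within-subject differences, rather than comparing the two session means as independent groups, removes between-subject variation in baseline ability, which is the point of a repeated measures design.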
Conclusion: This series of studies on the reliability of tongue inspection has demonstrated that, with appropriate instruction, concrete operational definitions for the tongue characteristics and colour charts, significant improvements can be achieved in the reliability of identifying diagnostic characteristics. The use of this tongue inspection tool should advance education, clinical practice and research. TCM teaching institutions and individual practitioners should be encouraged to incorporate the tool into everyday practice. In addition, further research can now assess whether the tongue diagnosis method is valid for specific TCM clinical conditions.