Data cleaning for clinician researchers: Application and explanation of a data-quality framework.
- Publisher: ELSEVIER SCIENCE INC
- Publication Type: Journal Article
- Citation: Aust Crit Care, 2024, 37, (5), pp. 827-833
- Issue Date: 2024-09
Closed Access
Filename | Description | Size |
---|---|---|
1-s2.0-S1036731424000584-main.pdf | Published version | 567.01 kB |
This item is closed access and not available.
BACKGROUND: Data cleaning is the series of procedures performed before a formal statistical analysis, with the aim of reducing the number of error values in a dataset and improving the overall quality of subsequent analyses. Several study-reporting guidelines recommend the inclusion of data-cleaning procedures; however, little practical guidance exists for how to conduct these procedures.

OBJECTIVES: This paper aimed to provide practical guidance for how to perform and report rigorous data-cleaning procedures.

METHODS: A previously proposed data-quality framework was identified and used to facilitate the description and explanation of data-cleaning procedures. The broader data-cleaning process was broken down into discrete tasks to create a data-cleaning checklist. Examples of how the various tasks had been undertaken for a previous study using data from the Australian and New Zealand Intensive Care Society Adult Patient Database were also provided.

RESULTS: Data-cleaning tasks were described and grouped according to four data-quality domains described in the framework: data integrity, consistency, completeness, and accuracy. Tasks described include creation of a data dictionary, checking consistency of values across multiple variables, quantifying and managing missing data, and the identification and management of outlier values. The data-cleaning task checklist provides a practical summary of the various aspects of the data-cleaning process and will assist clinician researchers in performing this process in the future.

CONCLUSIONS: Data cleaning is an integral part of any statistical analysis and helps ensure that study results are valid and reproducible. Use of the data-cleaning task checklist will facilitate the conduct of rigorous data-cleaning processes, with the aim of improving the quality of future research.
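
The four data-quality domains named in the abstract (integrity, consistency, completeness, and accuracy) can be illustrated with a minimal Python sketch. This is not the paper's checklist or code. The variable names, the allowed ranges in the data dictionary, the sample records, and the use of Tukey's fences to flag outliers are all assumptions chosen for illustration, and the example deliberately only flags suspect values rather than changing them.

```python
from statistics import quantiles

# Hypothetical data dictionary: variable -> (type, minimum, maximum).
DATA_DICTIONARY = {
    "age": (int, 16, 120),
    "admit_day": (int, 0, 365),
    "discharge_day": (int, 0, 365),
    "heart_rate": (int, 0, 300),
}

# Made-up records; None marks a missing value.
records = [
    {"age": 54, "admit_day": 10, "discharge_day": 14, "heart_rate": 82},
    {"age": 67, "admit_day": 20, "discharge_day": 18, "heart_rate": 90},
    {"age": 230, "admit_day": 30, "discharge_day": 33, "heart_rate": None},
    {"age": 45, "admit_day": 40, "discharge_day": 41, "heart_rate": 78},
    {"age": 71, "admit_day": 50, "discharge_day": 55, "heart_rate": 88},
    {"age": 60, "admit_day": 60, "discharge_day": 62, "heart_rate": 250},
]


def integrity_errors(rows):
    """Integrity: values whose type or range violates the data dictionary."""
    errors = []
    for i, row in enumerate(rows):
        for var, (typ, lo, hi) in DATA_DICTIONARY.items():
            value = row.get(var)
            if value is None:
                continue  # missing values are counted separately
            if not isinstance(value, typ) or not lo <= value <= hi:
                errors.append((i, var, value))
    return errors


def consistency_errors(rows):
    """Consistency: discharge must not precede admission."""
    return [
        i for i, row in enumerate(rows)
        if row.get("admit_day") is not None
        and row.get("discharge_day") is not None
        and row["discharge_day"] < row["admit_day"]
    ]


def missing_proportions(rows):
    """Completeness: proportion of missing values per variable."""
    return {
        var: sum(row.get(var) is None for row in rows) / len(rows)
        for var in DATA_DICTIONARY
    }


def outlier_rows(rows, var, k=1.5):
    """Accuracy: rows outside Tukey's fences (an assumed outlier rule)."""
    values = [row[var] for row in rows if row.get(var) is not None]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [
        i for i, row in enumerate(rows)
        if row.get(var) is not None and not low <= row[var] <= high
    ]


print(integrity_errors(records))
print(consistency_errors(records))
print(missing_proportions(records))
print(outlier_rows(records, "heart_rate"))
```

Each function returns row indices or summaries instead of editing the data, so that decisions about correcting, excluding, or imputing values can be made and reported separately.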