A data driven learning approach for the assessment of data quality

2021, Vol 21 (1)
Author(s): Erik Tute, Nagarajan Ganapathy, Antje Wulff

Abstract. Background: Data quality assessment is important but complex and task-dependent. Identifying suitable measurement methods, and reference ranges for assessing their results, is challenging. Both manual inspection of measurement results and current data-driven approaches for learning which results indicate data quality issues have considerable limitations, e.g. in identifying task-dependent thresholds for measurement results that indicate data quality issues. Objectives: To explore the applicability and potential benefits of a data-driven approach to learning task-dependent knowledge about suitable measurement methods and the assessment of their results. Such knowledge could help others determine whether a local data stock is suitable for a given task. Methods: We started by creating artificial data with previously defined data quality issues and applied a set of generic measurement methods to these data (e.g. a method that counts the number of values in a variable, or one that computes the mean of those values). We trained decision trees on the exported results of the measurement methods and the corresponding outcome data (data indicating the dataset’s suitability for a use case). For evaluation, we derived rules for potential measurement methods and reference values from the decision trees and compared them with regard to their coverage of the true data quality issues artificially introduced into the dataset. Three researchers derived these rules independently: one with knowledge of the data quality issues present and two without. Results: Our self-trained decision trees indicated rules for 12 of the 19 previously defined data quality issues. The knowledge learned about measurement methods and their assessment was complementary to manual interpretation of the measurement methods’ results. Conclusions: Our data-driven approach derives sensible knowledge for task-dependent data quality assessment and complements other current approaches.
Based on labeled measurement methods’ results as training data, our approach successfully suggested applicable rules for checking the data quality characteristics that determine whether a dataset is suitable for a given task.
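The abstract does not describe the implementation, so the following is only a minimal sketch of the general idea, under stated assumptions: two of the generic measurement methods it mentions (count of values, mean of values) are applied to made-up labeled toy datasets, and a small hand-written CART-style decision tree with Gini impurity stands in for whatever tooling the authors actually used. The names `DATASETS`, `measure`, `build_tree` and `rules`, the data, and the depth limit are all hypothetical. The point is how rules with reference thresholds can be read off the tree paths that end in "unsuitable".

```python
from statistics import mean

# Hypothetical toy data (not from the study): each dataset holds one numeric
# variable, None marks a missing value, and the boolean is the outcome label
# saying whether the dataset was suitable for an assumed use case.
DATASETS = [
    ([5.1, 4.9, 5.3, 5.0, 5.2], True),
    ([5.0, None, 5.1, 4.8, 5.2], True),
    ([5.2, None, None, None, 5.0], False),     # many missing values
    ([None, None, 4.9, None, None], False),    # many missing values
    ([50.0, 49.0, 51.0, 52.0, 48.0], False),   # assumed unit error (x10)
    ([4.8, 5.1, 5.0, 4.9, None], True),
]


def measure(values):
    """Two generic measurement methods: count of present values and their mean."""
    present = [v for v in values if v is not None]
    return {"count": len(present), "mean": mean(present) if present else 0.0}


def gini(labels):
    """Gini impurity of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) ** 2


def majority(labels):
    """Majority label of a node (ties count as 'suitable')."""
    return sum(labels) * 2 >= len(labels)


def best_split(rows, labels):
    """Return (score, feature, threshold) minimising weighted Gini, or None."""
    best = None
    for feature in rows[0]:
        values = sorted({r[feature] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [l for r, l in zip(rows, labels) if r[feature] <= t]
            right = [l for r, l in zip(rows, labels) if r[feature] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=2):
    """Grow a small binary tree; a leaf is a bool meaning 'suitable'."""
    split = best_split(rows, labels)
    if depth == max_depth or gini(labels) == 0.0 or split is None:
        return majority(labels)
    _, feature, t = split
    left = [(r, l) for r, l in zip(rows, labels) if r[feature] <= t]
    right = [(r, l) for r, l in zip(rows, labels) if r[feature] > t]
    return (feature, t,
            build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))


def rules(tree, path=()):
    """Turn every root-to-leaf path that predicts 'unsuitable' into a rule."""
    if isinstance(tree, bool):
        return [] if tree else [" and ".join(path) or "always"]
    feature, t, left, right = tree
    return (rules(left, path + (f"{feature} <= {t:g}",))
            + rules(right, path + (f"{feature} > {t:g}",)))


rows = [measure(values) for values, _ in DATASETS]
labels = [label for _, label in DATASETS]
tree = build_tree(rows, labels)
for rule in rules(tree):
    print("data quality issue if:", rule)
```

On this toy data the tree yields two rules, `count <= 3` (too many missing values) and `count > 3 and mean > 27.55` (the assumed unit error), i.e. a measurement method paired with a learned reference threshold. A production version would more likely use a library such as scikit-learn; the hand-written tree is used here only to keep the sketch self-contained.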

Author(s): Syed Mustafa Ali, Farah Naureen, Arif Noor, Maged Kamel N. Boulos, Javariya Aamir, ...

Background: Increasingly, healthcare organizations are using technology for the efficient management of data. The aim of this study was to compare the data quality of digital records with the quality of the corresponding paper-based records by using a data quality assessment framework. Methodology: We conducted a desk review of paper-based and digital records over the study duration from April 2016 to July 2016 at six enrolled tuberculosis (TB) clinics. We entered all data fields of the patient treatment (TB01) card into a spreadsheet-based template to undertake a field-to-field comparison of the shared fields between TB01 and digital data. Findings: A total of 117 TB01 cards were prepared at the six enrolled sites, of which only 50% (n=59; 59 out of 117 TB01 cards) were digitized. There were 1,239 comparable data fields, of which 65% (n=803) were correctly matched between paper-based and digital records. However, 35% of the data fields (n=436) had anomalies, either in paper-based records or in digital records. The calculated number of data quality issues was 1.9 per digital patient record, compared with 2.1 per paper-based record. Based on the analysis of valid data quality issues, there were more data quality issues in paper-based records (n=123) than in digital records (n=110). Conclusion: There were fewer data quality issues in digital records than in the corresponding paper-based records. Greater use of mobile data capture and continued use of the data quality assessment framework can deliver more meaningful information for decision making.


Data, 2018, Vol 3 (3), pp. 27
Author(s): Syed Ali, Farah Naureen, Arif Noor, Maged Kamel Boulos, Javariya Aamir, ...

Background: The cornerstone of the public health function is to identify healthcare needs, to influence policy development, and to inform change in practice. Current data management practices with paper-based recording systems are prone to data quality defects. Increasingly, healthcare organizations are using technology for the efficient management of data. The aim of this study was to compare the data quality of digital records with the quality of the corresponding paper-based records using a data quality assessment framework. Methodology: We conducted a desk review of paper-based and digital records over the study duration from April 2016 to July 2016 at six enrolled tuberculosis (TB) clinics. We entered all data fields of the patient treatment (TB01) card into a spreadsheet-based template to undertake a field-to-field comparison of the shared fields between TB01 and digital data. Findings: A total of 117 TB01 cards were prepared at the six enrolled sites, of which only 50% (n = 59; 59 out of 117 TB01 cards) were digitized. There were 1239 comparable data fields, of which 65% (n = 803) were correctly matched between paper-based and digital records. However, 35% of the data fields (n = 436) had anomalies, either in paper-based records or in digital records. The calculated number of data quality issues per digital patient record was 1.9, whereas it was 2.1 issues per record for paper-based records. Based on the analysis of valid data quality issues, there were more data quality issues in paper-based records (n = 123) than in digital records (n = 110). Conclusion: There were fewer data quality issues in digital records than in the corresponding paper-based records of tuberculosis patients. Greater use of mobile data capture and continued data quality assessment can deliver more meaningful information for decision making.
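The headline figures can be cross-checked with a few lines of arithmetic. This is a small sketch that uses only the counts reported in the abstract; the one assumption (not stated explicitly in the abstract) is that both per-record rates divide by the 59 records that exist in both paper and digital form. Under that assumption the reported 1.9 and 2.1 are the rounded values of 110/59 and 123/59. The constant names and `summary` are illustrative.

```python
# Counts as reported in the abstract.
TOTAL_CARDS = 117
DIGITIZED = 59
COMPARABLE_FIELDS = 1239
MATCHED_FIELDS = 803
ANOMALOUS_FIELDS = 436
ISSUES_PAPER = 123
ISSUES_DIGITAL = 110


def summary():
    """Recompute the reported shares and per-record issue rates."""
    # Every comparable field is either matched or anomalous.
    assert MATCHED_FIELDS + ANOMALOUS_FIELDS == COMPARABLE_FIELDS
    return {
        "digitized_pct": round(100 * DIGITIZED / TOTAL_CARDS, 1),
        "matched_pct": round(100 * MATCHED_FIELDS / COMPARABLE_FIELDS, 1),
        "anomalous_pct": round(100 * ANOMALOUS_FIELDS / COMPARABLE_FIELDS, 1),
        # Assumption: both rates use the 59 paired records as denominator.
        "issues_per_digital_record": round(ISSUES_DIGITAL / DIGITIZED, 1),
        "issues_per_paper_record": round(ISSUES_PAPER / DIGITIZED, 1),
    }


for name, value in summary().items():
    print(f"{name}: {value}")
```

The recomputed shares (50.4%, 64.8%, 35.2%) agree with the rounded 50%, 65% and 35% in the text, and the matched and anomalous fields add up to the 1239 comparable fields.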


Author(s): Nemanja Igić, Branko Terzić, Milan Matić, Vladimir Ivančević, Ivan Luković

2018, Vol 7 (4), pp. e000353
Author(s): Luke A Turcotte, Jake Tran, Joshua Moralejo, Nancy Curtin-Telegdi, Leslie Eckel, ...

Background: Health information systems with applications in patient care planning and decision support depend on high-quality data. A postacute care hospital in Ontario, Canada, conducted a data quality assessment and focus group interviews to guide the development of a cross-disciplinary training programme to reimplement the Resident Assessment Instrument–Minimum Data Set (RAI-MDS) 2.0 comprehensive health assessment in the hospital’s clinical workflows. Methods: A hospital-level data quality assessment framework, based on time series comparisons against an aggregate of Ontario postacute care hospitals, was used to identify areas of concern. Focus groups were used to evaluate assessment practices and the use of health information in care planning and clinical decision support. The data quality assessment and focus groups were repeated to evaluate the effectiveness of the training programme. Results: The initial data quality assessment and focus groups indicated that knowledge, practice and cultural barriers prevented both the collection and the use of high-quality clinical data. Following the implementation of the training, both data quality and the culture surrounding the RAI-MDS 2.0 assessment improved. Conclusions: It is important for facilities to evaluate the quality of their health information to ensure that it is suitable for decision-making purposes. This study demonstrates the use of a data quality assessment framework that can be applied for quality improvement planning.
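The abstract only says that the framework compares the hospital's time series against an aggregate of Ontario postacute care hospitals. One simple way to operationalise such a comparison is sketched below, purely as an assumption: for each quarter, the hospital's value of an indicator is converted to a z-score relative to the peer hospitals, and quarters beyond a hypothetical limit of 2 are flagged as areas of concern. The indicator, the data, `flag_quarters` and `z_limit` are all illustrative and not taken from the study.

```python
from statistics import mean, stdev

# Hypothetical quarterly values of one data quality indicator for the
# hospital under review and for four peer hospitals (illustrative only).
HOSPITAL = [0.42, 0.40, 0.18, 0.20, 0.39]
PEERS = [
    [0.41, 0.43, 0.40, 0.42, 0.41],
    [0.38, 0.39, 0.41, 0.40, 0.39],
    [0.44, 0.42, 0.43, 0.45, 0.44],
    [0.40, 0.41, 0.39, 0.40, 0.42],
]


def flag_quarters(hospital, peers, z_limit=2.0):
    """Return (quarter index, z-score) for quarters where the hospital
    deviates from the peer aggregate by more than z_limit standard deviations."""
    flags = []
    for q, value in enumerate(hospital):
        peer_values = [p[q] for p in peers]
        mu = mean(peer_values)
        sd = stdev(peer_values)  # sample SD; needs at least two peers
        z = (value - mu) / sd if sd > 0 else 0.0
        if abs(z) > z_limit:
            flags.append((q, round(z, 1)))
    return flags


print([q for q, _ in flag_quarters(HOSPITAL, PEERS)])  # quarters 2 and 3 (0-based)
```

Re-running the same check on data collected after an intervention (such as the training programme) and seeing fewer flagged quarters would be one way to express the repeated assessment described in the Methods; a real framework would likely use more peers, longer series and indicator-specific limits.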

