Assessing the quality of clinical and administrative data extracted from hospitals: The General Medicine Inpatient Initiative (GEMINI) experience

Author(s):  
Sachin V. Pasricha ◽  
Hae Young Jung ◽  
Vladyslav Kushnir ◽  
Denise Mak ◽  
Radha Koppula ◽  
...  

Abstract Objective Large clinical databases are increasingly being used for research and quality improvement, but there remains uncertainty about how computational and manual approaches can be used together to assess and improve the quality of extracted data. The General Medicine Inpatient Initiative (GEMINI) database extracts and standardizes a broad range of data from clinical and administrative hospital data systems, including information about attending physicians, room transfers, laboratory tests, diagnostic imaging reports, and outcomes such as in-hospital death. We describe the computational data quality assessment and manual data validation techniques that were used for GEMINI. Methods The GEMINI database currently contains 245,559 General Internal Medicine patient admissions at 7 hospital sites in Ontario, Canada from 2010 to 2017. We performed 7 computational data quality checks followed by manual validation of 23,419 selected data points on a sample of 7,488 patients across participating hospitals. After iteratively re-extracting data as needed based on the computational data quality checks, we manually validated GEMINI data against the data that could be obtained using the hospital's electronic medical record (i.e., the data clinicians would see when providing care), which we considered the gold standard. We calculated the accuracy, sensitivity, specificity, and positive and negative predictive values of GEMINI data. Results Computational checks identified multiple data quality issues: for example, the inclusion of cancelled radiology tests, a time shift of transfusion data, and mistakenly processing the symbol for sodium, "Na", as a missing value. Manual data validation revealed that GEMINI data were ultimately highly reliable compared to the gold standard across nearly all data tables. Manual validation identified one important data quality issue that computational checks had not detected: the dates and times of blood transfusion data at one site were unreliable. This resulted in low sensitivity (66%) and positive predictive value (75%) for blood transfusion data at that site. Apart from this single issue, GEMINI data were highly reliable across all data tables, with high overall accuracy (ranging from 98-100%), sensitivity (95-100%), specificity (99-100%), positive predictive value (93-100%), and negative predictive value (99-100%) compared to the gold standard. Discussion and Conclusion Iterative assessment and improvement of data quality based primarily on computational checks permitted highly reliable extraction of multisite clinical and administrative data. Computational checks identified nearly all of the data quality issues in this initiative, but one critical quality issue was only identified during manual validation. Combining computational checks and manual validation may be the optimal method for assessing and improving the quality of large multi-site clinical databases.

Author(s):  
Amol A Verma ◽  
Sachin V Pasricha ◽  
Hae Young Jung ◽  
Vladyslav Kushnir ◽  
Denise Y F Mak ◽  
...  

Abstract Objective Large clinical databases are increasingly used for research and quality improvement. We describe an approach to data quality assessment from the General Medicine Inpatient Initiative (GEMINI), which collects and standardizes administrative and clinical data from hospitals. Methods The GEMINI database contained 245 559 patient admissions at 7 hospitals in Ontario, Canada from 2010 to 2017. We performed 7 computational data quality checks and iteratively re-extracted data from hospitals to correct problems. Thereafter, GEMINI data were compared to data that were manually abstracted from the hospital’s electronic medical record for 23 419 selected data points on a sample of 7488 patients. Results Computational checks flagged 103 potential data quality issues, which were either corrected or documented to inform future analysis. For example, we identified the inclusion of canceled radiology tests, a time shift of transfusion data, and mistakenly processing the chemical symbol for sodium (“Na”) as a missing value. Manual validation identified 1 important data quality issue that was not detected by computational checks: transfusion dates and times at 1 site were unreliable. Apart from that single issue, across all data tables, GEMINI data had high overall accuracy (ranging from 98%–100%), sensitivity (95%–100%), specificity (99%–100%), positive predictive value (93%–100%), and negative predictive value (99%–100%) compared to the gold standard. Discussion and Conclusion Computational data quality checks with iterative re-extraction facilitated reliable data collection from hospitals but missed 1 critical quality issue. Combining computational and manual approaches may be optimal for assessing the quality of large multisite clinical databases.
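
To make the GEMINI approach concrete, the sketch below illustrates, in Python, the two pieces the abstract describes: a computational check of the kind that caught the sodium ("Na") coercion and cancelled-imaging issues, and the validation metrics (accuracy, sensitivity, specificity, PPV, NPV) computed against a manually abstracted gold standard. The column names and rules are assumptions for illustration, not the GEMINI pipeline itself.

```python
# Illustrative sketch only. Neither the GEMINI extraction pipeline nor its field
# names are shown in the abstract, so the column names and rules below are
# assumptions used to demonstrate the general approach.
import pandas as pd


def check_lab_test_names(labs: pd.DataFrame) -> pd.DataFrame:
    """Flag laboratory rows whose test name ended up missing.

    A symbol such as "Na" (sodium) can be coerced to a missing value by some
    import settings; rows flagged here would prompt re-extraction.
    """
    return labs[labs["test_name"].isna()]


def check_cancelled_imaging(imaging: pd.DataFrame) -> pd.DataFrame:
    """Flag radiology orders whose status indicates they were cancelled."""
    return imaging[imaging["order_status"].str.lower().eq("cancelled")]


def validation_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity, specificity, PPV, and NPV of extracted data
    points compared against a manually abstracted gold standard."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


if __name__ == "__main__":
    labs = pd.DataFrame({"test_name": ["Na", None, "K"], "value": [140.0, 139.0, 4.1]})
    imaging = pd.DataFrame({"order_status": ["Completed", "Cancelled"], "modality": ["CT", "XR"]})
    print(check_lab_test_names(labs))        # row whose name was lost to missing-value coercion
    print(check_cancelled_imaging(imaging))  # cancelled order that should be excluded
    print(validation_metrics(tp=95, fp=2, tn=900, fn=3))
```

In the workflow the abstract describes, rows flagged by checks like these would trigger re-extraction from the hospital before the manual validation step is run.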


2021 ◽  
Author(s):  
Clair Blacketer ◽  
Frank J Defalco ◽  
Patrick B Ryan ◽  
Peter R Rijnbeek

Advances in the standardization of observational healthcare data have enabled methodological breakthroughs, rapid global collaboration, and the generation of real-world evidence to improve patient outcomes. Standardization of data structure, such as the use of a Common Data Model (CDM), needs to be coupled with standardized approaches for data quality assessment. To ensure confidence in real-world evidence generated from the analysis of real-world data, one must first have confidence in the data itself. The Data Quality Dashboard is an open-source R package that reports potential quality issues in an OMOP CDM instance through the systematic execution and summarization of over 3,300 configurable data quality checks. We describe the implementation of check types across a data quality framework of conformance, completeness, and plausibility, each assessed in both verification and validation contexts. We illustrate how data quality checks, paired with decision thresholds, can be configured to customize data quality reporting across a range of observational health data sources. We discuss how data quality reporting can become part of the overall real-world evidence generation and dissemination process to promote transparency and build confidence in the resulting output. Transparently communicating how well CDM-standardized databases adhere to a set of quality measures adds a crucial piece that is currently missing from observational research. Assessing and improving the quality of our data will inherently improve the quality of the evidence we generate.
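
The Data Quality Dashboard itself is an R package that runs SQL checks against an OMOP CDM database; the short Python sketch below only illustrates the check-plus-decision-threshold pattern the abstract describes. The table name, field, and threshold value are assumptions for illustration, not the package's own configuration.

```python
# Simplified illustration of the check-plus-threshold pattern described above.
# The actual Data Quality Dashboard is an R package that runs SQL checks against
# an OMOP CDM database; the table, field, and threshold values here are assumptions.
from dataclasses import dataclass

import pandas as pd


@dataclass
class CheckResult:
    check_name: str
    num_violated: int
    num_rows: int
    threshold_pct: float

    @property
    def pct_violated(self) -> float:
        return 100.0 * self.num_violated / self.num_rows if self.num_rows else 0.0

    @property
    def failed(self) -> bool:
        # A check fails only when the violation rate exceeds its configured
        # threshold, so sites can tolerate known, documented imperfections.
        return self.pct_violated > self.threshold_pct


def plausible_value_check(df: pd.DataFrame, field: str, low: float, high: float,
                          threshold_pct: float) -> CheckResult:
    """Plausibility-style check: count values outside an expected range."""
    violated = int(((df[field] < low) | (df[field] > high)).sum())
    return CheckResult(f"plausibleValueRange:{field}", violated, len(df), threshold_pct)


if __name__ == "__main__":
    person = pd.DataFrame({"year_of_birth": [1950, 1987, 2025, 1850]})
    result = plausible_value_check(person, "year_of_birth", low=1900, high=2023, threshold_pct=5.0)
    print(result.check_name, f"{result.pct_violated:.1f}% violated",
          "FAIL" if result.failed else "PASS")
```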


2019 ◽  
Vol 1 ◽  
pp. ed1
Author(s):  
Shaun Yon-Seng Khoo

Almost every open access neuroscience journal is pay-to-publish. This leaves neuroscientists with a choice between submitting to journals that not all of our colleagues can legitimately access and paying large sums of money to publish open access. Neuroanatomy and Behaviour is a new platinum open access journal published by a non-profit association of scientists. Since we do not charge fees, we will focus entirely on the quality of submitted articles and encourage the adoption of reproducibility-enhancing practices, such as open data, preregistration, and data quality checks. We hope that our colleagues will join us in this endeavour so that we can support good neuroscience no matter where it comes from.


2007 ◽  
Vol 28 (4) ◽  
pp. 486-488 ◽  
Author(s):  
Noleen J. Bennett ◽  
Ann L. Bull ◽  
David R. Dunt ◽  
Michael J. Richards ◽  
Philip L. Russo ◽  
...  

This data quality study assessed the accuracy of data collected as part of a pilot smaller-hospital surveillance program for methicillin-resistant Staphylococcus aureus (MRSA) infection and bloodstream infection (BSI). For reported MRSA infection, estimated values were as follows: sensitivity, 40%; specificity, 99.9%; and positive predictive value, 33.3%. For reported BSI, estimated values were as follows: sensitivity, 42.9%; specificity, 99.8%; and positive predictive value, 37.5%.


2017 ◽  
Vol 5 (1) ◽  
pp. 60
Author(s):  
Siti Malikhatin ◽  
Lucia Yovita Hendrati

Plague, a quarantinable zoonotic disease, still occurs in Pasuruan Regency, and plague suspects were still being found as of 2013. The plague surveillance that remains actively conducted in Pasuruan Regency is probably the only plague surveillance in Indonesia; it consists of human and rodent surveillance. Evaluation of a surveillance system is needed to improve its quality, efficiency, and usefulness. This research aimed to assess the quality of the plague surveillance system in Pasuruan Regency in 2014 based on the attributes of simplicity, flexibility, acceptability, data quality, sensitivity, positive predictive value, representativeness, timeliness, and stability. The research design was an evaluation study, and the subject was the plague surveillance system in Pasuruan Regency in 2014. Data were collected through interviews, observations, and document review. The obtained data and information were compared to guidelines and recent theories and then presented in narratives, tables, and figures. The research showed that the surveillance system was simple and flexible, had poor data quality and acceptability, had sensitivity and positive predictive value that could not be measured, had low representativeness and timeliness, and had high stability. This research concluded that the quality of the plague surveillance system in Pasuruan Regency, based on its attributes, was not good enough. The suggestions given are to conduct training, supply a sufficient budget, perform monitoring and evaluation periodically, disseminate information to other programs and sectors as well as to the community, send surveillance reports by e-mail, use spreadsheet software for rodent surveillance reporting, and improve the report by including information about damaged serum, serum counts lower than the total number of captured rodents, and missing traps. Keywords: plague, human surveillance, rodent surveillance, surveillance attributes, evaluation


Author(s):  
Muhammad A Elmessary ◽  
Daniel Thayer ◽  
Sarah Rees ◽  
Leticia ReesKemp ◽  
Arfon Rees

Introduction When datasets are collected mainly for administrative rather than research purposes, data quality checks are necessary to ensure robust findings and to avoid biased results due to incomplete or inaccurate data. When done manually, data quality checks are time-consuming. We introduced automation to speed up the process and save effort. Objectives and Approach We have devised a set of automated generic quality checks and reporting, which can be run on any dataset in a relational database without any dataset-specific knowledge or configuration. The code is written in Python. Checks include: linkage quality, agreement with a population data source, comparison with the previous data version, duplication checks, null counts, value distribution and range, etc. Where dataset metadata is available, checks for validity against lookup tables are included, and the output report includes documentation on data contents. An HTML report with dynamic datatables and interactive graphs, allowing easy exploration of the results, is produced using RMarkdown. Results The automation of the generic data quality checks provides an easy and quick way to report on data issues with minimal effort. It allows comparison with reference tables, lookups, and previous versions of the same table to highlight differences. Moreover, this tool can be provided to researchers as a means to gain a more detailed understanding of their data. While other research data quality tools exist, this tool is distinguished by its features specific to linked data research, as well as its implementation in a relational database environment. It has been successfully tested on datasets of over two billion rows. The tool was designed for use within the SAIL Databank, but could easily be adapted and used in other settings. Conclusion/Implications The effort spent on automating generic testing and reporting on the data quality of research datasets is more than compensated by its outputs. Benefits include quick detection and scrutiny of many sources of invalid and incomplete data. This process can easily be expanded to accommodate more standard tests.
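
As a rough illustration of the dataset-agnostic checks described above (row counts, duplicate keys, null counts, distinct values), the sketch below runs a few generic checks against an in-memory SQLite table so it is self-contained. The SAIL tool itself targets their relational database environment and produces an RMarkdown HTML report; none of that is reproduced here, and the table and column names are assumptions.

```python
# Minimal sketch of generic, dataset-agnostic quality checks of the kind described
# above. Uses an in-memory SQLite table so the example runs anywhere; the SAIL tool
# itself is not shown, and all identifiers below are assumed for illustration.
import sqlite3


def generic_checks(conn: sqlite3.Connection, table: str, key_columns: list[str]) -> dict:
    cur = conn.cursor()
    report = {}
    # Row count (useful for comparison against a previous version of the data).
    report["row_count"] = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # Duplicate check on the supplied key columns.
    keys = ", ".join(key_columns)
    dup_sql = (f"SELECT COUNT(*) FROM "
               f"(SELECT {keys} FROM {table} GROUP BY {keys} HAVING COUNT(*) > 1)")
    report["duplicate_keys"] = cur.execute(dup_sql).fetchone()[0]
    # Null count and distinct-value count per column.
    columns = [row[1] for row in cur.execute(f"PRAGMA table_info({table})")]
    for col in columns:
        report[f"{col}_nulls"] = cur.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL").fetchone()[0]
        report[f"{col}_distinct"] = cur.execute(
            f"SELECT COUNT(DISTINCT {col}) FROM {table}").fetchone()[0]
    return report


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE admissions (patient_id INTEGER, admit_date TEXT)")
    conn.executemany("INSERT INTO admissions VALUES (?, ?)",
                     [(1, "2020-01-01"), (1, "2020-01-01"), (2, None)])
    print(generic_checks(conn, "admissions", ["patient_id", "admit_date"]))
```

Because the checks only rely on the table's own metadata, the same function can be pointed at any table without dataset-specific configuration, which is the property the abstract emphasizes.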


2021 ◽  
Vol 6 ◽  
pp. 251
Author(s):  
◽  
Luigi Pisani ◽  
Thalha Rashan ◽  
Maryam Shamal ◽  
Aniruddha Ghose ◽  
...  

Background: The value of medical registries strongly depends on the quality of the data collected. This must be objectively measured before large clinical databases can be promoted for observational research, quality improvement, and clinical trials. We aimed to evaluate the quality of a multinational intensive care unit (ICU) network of registries of critically ill patients established in seven Asian low- and middle-income countries (LMICs). Methods: The Critical Care Asia federated registry platform enables ICUs to collect clinical, outcome, and process data for aggregate and unit-level analysis. The evaluation used the standardised criteria of the Directory of Clinical Databases (DoCDat) and a framework for data quality assurance in medical registries. Six reviewers assessed the structure, coverage, reliability, and validity of the ICU registry data. Case mix and process measures on patient episodes from June to December 2020 were analysed. Results: Data on 20,507 consecutive patient episodes from 97 ICUs in Afghanistan, Bangladesh, India, Malaysia, Nepal, Pakistan, and Vietnam were included. The quality level achieved according to the ten prespecified DoCDat criteria was high (average score 3.4 out of 4), as was structural and organizational performance, which was comparable to that of ICU registries in high-income countries. Identified strengths were the types of variables included, reliability of coding, data completeness, and validation. Potential improvements included extension of national coverage, optimization of recruitment completeness validation in all centers, and the use of interobserver reliability checks. Conclusions: The Critical Care Asia platform evaluates well against standardised frameworks for data quality, and comparably to registries in resource-rich settings.


2020 ◽  
Vol 5 (3) ◽  
pp. 129-160 ◽  
Author(s):  
Cinzia Daraio ◽  
Renato Bruni ◽  
Giuseppe Catalano ◽  
Alessandro Daraio ◽  
Giorgio Matteucci ◽  
...  

Abstract Purpose This paper concerns the definition of data quality procedures for knowledge organizations such as Higher Education Institutions. The main purpose is to present the flexible approach developed for monitoring the data quality of the European Tertiary Education Register (ETER) database, illustrating its functioning and highlighting the main challenges that still have to be faced in this domain. Design/methodology/approach The proposed data quality methodology is based on two kinds of checks, one to assess the consistency of cross-sectional data and the other to evaluate the stability of multiannual data. This methodology has an operational and empirical orientation, meaning that the proposed checks do not assume any theoretical distribution for determining the threshold parameters that identify potential outliers, inconsistencies, and errors in the data. Findings We show that the proposed cross-sectional and multiannual checks are helpful for identifying outliers and extreme observations and for detecting ontological inconsistencies not described in the available metadata. For this reason, they may be a useful complement to the processing of the available information. Research limitations The coverage of the study is limited to European Higher Education Institutions. The cross-sectional and multiannual checks are not yet completely integrated. Practical implications Considering the quality of the available data and information is important for data quality-aware empirical investigations, highlighting problems and areas where investment is needed to improve the coverage and interoperability of data in future data collection initiatives. Originality/value The data-driven quality checks proposed in this paper may be useful as a reference for building and monitoring the data quality of new databases, or of existing databases for other countries or systems characterized by high heterogeneity and complexity of the units of analysis, without relying on pre-specified theoretical distributions.
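
The sketch below illustrates, under assumed variable names and cut-offs, the two kinds of checks described above: a cross-sectional consistency check and a multiannual stability check, both using empirically chosen thresholds rather than a theoretical distribution. It is an illustration of the general idea, not the ETER methodology itself.

```python
# Illustrative sketch of cross-sectional and multiannual checks with empirical
# (distribution-free) thresholds. Variable names and cut-offs are assumptions,
# not those used for the ETER database.
import pandas as pd


def cross_sectional_check(df: pd.DataFrame, numerator: str, denominator: str,
                          low: float, high: float) -> pd.DataFrame:
    """Flag records whose ratio of two same-year variables falls outside an
    empirically chosen plausibility band (e.g., students per academic staff)."""
    ratio = df[numerator] / df[denominator]
    return df[(ratio < low) | (ratio > high)]


def multiannual_check(df: pd.DataFrame, variable: str, max_rel_change: float) -> pd.DataFrame:
    """Flag institution-year records whose value changes by more than a chosen
    relative amount from the previous year, suggesting an error or a break."""
    df = df.sort_values(["institution_id", "year"])
    rel_change = df.groupby("institution_id")[variable].pct_change().abs()
    return df[rel_change > max_rel_change]


if __name__ == "__main__":
    data = pd.DataFrame({
        "institution_id": ["A", "A", "B", "B"],
        "year": [2015, 2016, 2015, 2016],
        "students": [10000, 10500, 8000, 20000],
        "academic_staff": [500, 510, 20, 950],
    })
    print(cross_sectional_check(data, "students", "academic_staff", low=5, high=50))
    print(multiannual_check(data, "students", max_rel_change=0.5))
```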

