Data Quality in an Information-Rich Environment: Canada as an Example

2005 ◽  
Vol 24 (S1) ◽  
pp. 153-170 ◽  
Author(s):  
Leslie L. Roos ◽  
Sumit Gupta ◽  
Ruth-Ann Soodeen ◽  
Laurel Jebamani

ABSTRACT: This review evaluates the quality of available administrative data in the Canadian provinces, emphasizing the information needed to create integrated systems. We explicitly compare approaches to quality measurement, indicating where record linkage can and cannot substitute for more expensive record re-abstraction. Forty-nine original studies evaluating Canadian administrative data (registries, hospital abstracts, physician claims, and prescription drugs) are summarized in a structured manner. Registries, hospital abstracts, and physician files appear to be generally of satisfactory quality, though much work remains to be done. Data quality did not vary systematically among provinces. Primary data collection to check place of residence and longitudinal follow-up in provincial registries is needed. Promising initial checks of pharmaceutical data should be expanded. Because record linkage studies were “conservative” in reporting reliability, the reduction of time-consuming record re-abstraction appears feasible in many cases. Finally, expanding the scope of administrative data to study health, as well as health care, seems possible for some chronic conditions. The research potential of the information-rich environments being created highlights the importance of data quality.

2015 ◽  
Vol 31 (2) ◽  
pp. 231-247 ◽  
Author(s):  
Matthias Schnetzer ◽  
Franz Astleithner ◽  
Predrag Cetkovic ◽  
Stefan Humer ◽  
Manuela Lenk ◽  
...  

Abstract: This article contributes a framework for the quality assessment of imputations within a broader structure for evaluating the quality of register-based data. Four quality-related hyperdimensions examine data processing from the raw-data level to the final statistics. Our focus lies on the quality assessment of different imputation steps and their influence on overall data quality. We suggest classification rates as a measure of imputation accuracy and derive several computational approaches.
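To make the classification-rate idea concrete, here is a minimal Python sketch (not the authors' implementation; the register variable, the masking step, and the mode-imputation rule are illustrative assumptions): known values are masked, re-imputed, and the share of imputed entries that recover the original category is reported.

```python
# Minimal sketch: classification rate as an accuracy measure for categorical
# imputation. The data, the masking scheme and the mode-imputation rule are
# illustrative assumptions, not the authors' register-based procedure.
def classification_rate(true_values, imputed_values):
    """Share of imputed entries that match the known true category."""
    matches = sum(t == i for t, i in zip(true_values, imputed_values))
    return matches / len(true_values)

# Mask a handful of observed register values and re-impute them with the
# modal category, then score the agreement.
observed = ["employed", "employed", "unemployed", "employed", "inactive"]
mode_value = max(set(observed), key=observed.count)
imputed = [mode_value] * len(observed)

print(f"classification rate: {classification_rate(observed, imputed):.2f}")  # 0.60
```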


2016 ◽  
Vol 12 (3) ◽  
pp. 111-133 ◽  
Author(s):  
Ahmad Assaf ◽  
Aline Senart ◽  
Raphaël Troncy

Ensuring data quality in Linked Open Data is a complex process, as the data consist of structured information supported by models, ontologies and vocabularies and are exposed through queryable endpoints and links. In this paper, the authors first propose an objective assessment framework for Linked Data quality. The authors build upon previous efforts that have identified potential quality issues, but focus only on objective quality indicators that can be measured regardless of the underlying use case. Secondly, the authors present an extensible quality measurement tool that helps data owners to rate the quality of their datasets, on the one hand, and data consumers to choose their data sources from a ranked set, on the other. The authors evaluate this tool by measuring the quality of the LOD cloud. The results demonstrate that the general state of the datasets needs attention, as they mostly have low completeness, provenance, licensing and comprehensibility quality scores.
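As a sketch of what such objective, use-case-independent indicators can look like (an illustration only, not the authors' tool; the metadata fields and the toy catalogue are assumptions), the following Python snippet scores datasets by the completeness of their descriptive metadata and ranks them:

```python
# Illustrative only: score datasets on the presence of a few objective
# metadata fields (completeness, licensing, provenance) and rank them.
# The field list and the toy catalogue are assumptions.
EXPECTED_FIELDS = ["title", "description", "license", "creator", "issued"]

def objective_score(dataset: dict) -> float:
    """Fraction of expected metadata fields that are present and non-empty."""
    present = sum(bool(dataset.get(field)) for field in EXPECTED_FIELDS)
    return present / len(EXPECTED_FIELDS)

catalogue = {
    "dataset-a": {"title": "A", "description": "...", "license": "CC-BY"},
    "dataset-b": {"title": "B", "license": "CC0", "creator": "Org", "issued": "2016"},
}

for name in sorted(catalogue, key=lambda d: objective_score(catalogue[d]), reverse=True):
    print(f"{name}: {objective_score(catalogue[name]):.2f}")
```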


Author(s):  
Sarah Rees ◽  
Arfon Rees

ABSTRACT Objectives: The SAIL databank brings together a range of datasets gathered primarily for administrative rather than research purposes. These datasets contain information regarding different aspects of an individual’s contact with services which, when combined, form a detailed health record for individuals living (or deceased) in Wales. Understanding the quality of data in SAIL supports the research process by providing a level of assurance about the robustness of the data, identifying and describing potential sources of bias due to invalid, incomplete, inconsistent or inaccurate data, and thereby helping to increase the accuracy of research using these data. Designing processes to investigate and report on data quality within and between multiple datasets can be a time-consuming task; it requires a high degree of effort to ensure it is genuinely meaningful and useful to SAIL users and may require a range of different approaches. Approach: Data quality tests for each dataset were written, considering a range of data quality dimensions including validity, consistency, accuracy and completeness. Tests were designed to capture not just the quality of data within each dataset, but also to assess the consistency of data items between datasets. SQL scripts were written to test each of these aspects; to minimise repetition, automated processes were implemented where appropriate. Batch automation was used to call SQL stored procedures, which utilise metadata to generate dynamic SQL. The metadata (created as part of the data quality process) describe each dataset and the measurement parameters used to assess each field within the dataset. However, automation on its own is insufficient, and data quality process outputs require scrutiny and oversight to ensure they actually capture what they set out to capture. SAIL users were consulted on the development of the data quality reports to ensure usability and appropriateness to support data utilisation for research. Results: The data quality reporting process is beneficial to the SAIL databank as it provides additional information to support the research process and in some cases may act as a diagnostic tool, detecting problems with data which can then be rectified. Conclusion: The development of data quality processes in SAIL is ongoing, and changes or developments in each dataset lead to new requirements for data quality measurement and reporting. A vital component of the process is the production of output that is genuinely meaningful and useful.
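The metadata-driven approach described in the Approach section can be sketched roughly as follows (a Python illustration only, not SAIL's actual stored procedures; the table names, field names, and check types are assumptions): a metadata table describes each field and its expected properties, and the test SQL is generated from it.

```python
# Rough sketch of metadata-driven data quality checks: each metadata entry
# describes a field and the check to apply, and the SQL is generated from it.
# Table names, field names and check types are illustrative assumptions.
metadata = [
    {"table": "hospital_episodes", "field": "admission_date", "check": "not_null"},
    {"table": "hospital_episodes", "field": "age_at_admission",
     "check": "range", "min": 0, "max": 120},
]

def build_check_sql(rule: dict) -> str:
    """Generate a count-of-failures query for one metadata rule."""
    if rule["check"] == "not_null":
        return (f"SELECT COUNT(*) AS failures FROM {rule['table']} "
                f"WHERE {rule['field']} IS NULL;")
    if rule["check"] == "range":
        return (f"SELECT COUNT(*) AS failures FROM {rule['table']} "
                f"WHERE {rule['field']} NOT BETWEEN {rule['min']} AND {rule['max']};")
    raise ValueError(f"unknown check type: {rule['check']}")

for rule in metadata:
    print(build_check_sql(rule))
```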


Jurnal NERS ◽  
2018 ◽  
Vol 13 (1) ◽  
pp. 114 ◽  
Author(s):  
Putu Dian Prima Kusuma Dewi ◽  
Gede Budi Widiarta

Introduction: The proportion of deaths among HIV/AIDS patients after starting therapy in Bali is the seventh highest in Indonesia. Loss to follow-up (LTFU) increases the risk of death in people living with HIV/AIDS (PLHA), given the treatment fatigue (saturation) experienced by people with HIV taking long-term medication. Consistency of treatment is very important for maintaining the resilience and quality of life of people living with HIV. This study aims to determine the incidence rate, median time and predictors of death among LTFU patients, based on their sociodemographic and clinical characteristics. Methods: This study used a longitudinal analytical approach with retrospective secondary data analysis of a cohort of HIV-positive patients receiving ARV therapy at the Buleleng District Hospital in the period 2006-2015. The study used the survival analysis routines available in the STATA SE 12 software. Results: The incidence rate of death in LTFU patients was 65.9 per 100 persons, with a median time to death of 0.2 years (2.53 months). The NNRTI-class antiretroviral efavirenz was shown to increase the risk of death in LTFU patients 3.92 times relative to the nevirapine group (HR 3.92; p = 0.007; CI 1.46-10.51). Each 1 kg increase in body weight decreased the risk of death in LTFU patients by 6% (HR 0.94; p = 0.035; CI 0.89-0.99). Conclusion: Evaluation and monitoring of LTFU patient tracking should be undertaken to improve treatment sustainability. Furthermore, follow-up of LTFU patients' final condition using primary data and qualitative research is needed to explore more deeply the reasons behind LTFU.
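The hazard ratios above come from survival models fitted in Stata SE 12; a comparable Cox proportional hazards model can be sketched in Python with the lifelines package (toy data only; the column names and values below are assumptions, not the study's records):

```python
# Illustrative Cox proportional hazards sketch with lifelines; the data frame
# below is invented toy data, not the Buleleng cohort.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "time_years":  [0.1, 0.3, 0.2, 1.0, 0.5, 2.0, 0.4, 1.5],  # follow-up after LTFU
    "died":        [1,   1,   1,   0,   1,   0,   0,   1],    # 1 = death observed
    "efavirenz":   [1,   1,   0,   0,   1,   0,   1,   0],    # 1 = EFV, 0 = NVP regimen
    "baseline_kg": [45,  60,  55,  50,  48,  62,  58,  52],   # body weight at baseline
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time_years", event_col="died")
print(cph.summary[["exp(coef)", "p"]])  # hazard ratios and p-values per covariate
```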


Author(s):  
Catherine Eastwood ◽  
Keith Denny ◽  
Maureen Kelly ◽  
Hude Quan

Theme: Data and Linkage Quality
Objectives:
- To define health data quality from clinical, data science, and health system perspectives.
- To describe some of the international best practices related to quality and how they are being applied to Canada’s administrative health data.
- To compare methods for health data quality assessment and improvement in Canada (automated logical checks, chart quality indicators, reabstraction studies, coding manager perspectives).
- To highlight how data linkage can be used to provide new insights into the quality of original data sources.
- To highlight current international initiatives for improving coded data quality, including results from current ICD-11 field trials.
Dr. Keith Denny: Director of Clinical Data Standards and Quality, Canadian Institute for Health Information (CIHI); Adjunct Research Professor, Carleton University, Ottawa, ON. He provides leadership for CIHI’s information quality initiatives and for the development and application of clinical classifications and terminology standards.
Maureen Kelly: Manager of Information Quality at CIHI, Ottawa, ON. She leads CIHI’s corporate quality program, which is focused on enhancing the quality of CIHI’s data sources and information products and on fostering CIHI’s quality culture.
Dr. Cathy Eastwood: Scientific Manager, Associate Director of Alberta SPOR Methods & Development Platform, Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB. She has expertise in clinical data collection, evaluation of local and systemic data quality issues, and disease classification coding with ICD-10 and ICD-11.
Dr. Hude Quan: Professor, Community Health Sciences, Cumming School of Medicine, University of Calgary; Director, Alberta SPOR Methods Platform; Co-Chair of Hypertension Canada; Co-Chair of the Person to Population Health Collaborative of the Libin Cardiovascular Institute, Calgary, AB. He has expertise in assessing, validating, and linking administrative data sources for data science research, including artificial intelligence methods for evaluating and improving data quality.
Intended Outcomes: “What is quality health data?” The panel of experts will address this common question by discussing how to define high-quality health data and the measures being taken to ensure that such data are available in Canada. Optimizing the quality of clinical-administrative data, and their use-value, first requires an understanding of the processes used to create the data. Subsequently, we can address the limitations in data collection and use these data for diverse applications. Current advances in digital data collection are providing more solutions to improve health data quality at lower cost. This panel will describe a number of quality assessment and improvement initiatives aimed at ensuring that health data are fit for a range of secondary uses, including data linkage. It will also discuss how the need for the linkage and integration of data sources can influence views of a data source’s fitness for use.
CIHI content will include:
- Methods for optimizing the value of clinical-administrative data
- CIHI Information Quality Framework
- Reabstraction studies (e.g. physician documentation/coders’ experiences)
- Linkage analytics for data quality
University of Calgary content will include:
- Defining/measuring health data quality
- Automated methods for quality assessment and improvement
- ICD-11 features and coding practices
- Electronic health record initiatives


2022 ◽  
Vol 10 (01) ◽  
pp. 508-518
Author(s):  
Richmond Nsiah ◽  
Wisdom Takramah ◽  
Solomon Anum-Doku ◽  
Richard Avagu ◽  
Dominic Nyarko

Background: Stillbirths and neonatal deaths, when poorly documented or collated, negatively affect the quality of decisions and interventions. This study sought to assess the quality of routine neonatal mortality and stillbirth records in health facilities and to propose interventions to address the data quality gaps. Method: A descriptive cross-sectional study was employed. This study was carried out at three (3) purposively selected health facilities in Offinso North district. Stillbirths and neonatal deaths recorded in registers from 2015 to 2017 were recounted and compared with monthly aggregated data and District Health Information Management System 2 (DHIMS 2) data using a self-developed Excel Data Quality Assessment Tool (DQS). An observational checklist was used to collect primary data on completeness and availability. Accuracy ratio (verification factor), discrepancy rate, percentage availability and completeness of stillbirth and neonatal mortality data were computed using the DQS tool. Findings: The results showed a high discrepancy rate between stillbirth data recorded in registers and monthly aggregated reports (12.5%), and between monthly aggregated reports and DHIMS 2 (13.5%). Neonatal mortality data were under-reported in monthly aggregated reports, but over-reported in DHIMS 2. Overall data completeness was about 84.6%, but only 68.5% of submitted reports were supervised by facility in-charges. Availability of delivery and admission registers was 100% and 83.3%, respectively. Conclusion: The quality of stillbirth and neonatal mortality data in the district is generally encouraging, but the data are not yet reliable for decision-making. Routine data quality audit is needed to reduce the high discrepancies in stillbirth and neonatal mortality data in the district.
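As a rough illustration of the checks such a DQS tool performs (the exact formulas used in the self-developed Excel tool are not given in the abstract; the definitions below are common conventions and the counts are invented), the accuracy ratio compares recounted register totals with reported totals, and the discrepancy rate is their relative difference:

```python
# Hypothetical DQS-style calculations: verification factor (accuracy ratio)
# and discrepancy rate between reporting stages. Counts are invented.
facilities = {
    # facility: (recounted in register, monthly aggregated report, DHIMS 2)
    "facility_a": (24, 21, 23),
    "facility_b": (15, 17, 16),
}

def verification_factor(recounted: int, reported: int) -> float:
    """Accuracy ratio: recounted register value over the reported value."""
    return recounted / reported if reported else float("inf")

def discrepancy_rate(reference: int, comparison: int) -> float:
    """Relative difference between two reporting stages, in percent."""
    return abs(reference - comparison) / reference * 100 if reference else 0.0

for name, (register, report, dhims2) in facilities.items():
    print(name,
          f"verification factor (register vs report): {verification_factor(register, report):.2f},",
          f"discrepancy (report vs DHIMS 2): {discrepancy_rate(report, dhims2):.1f}%")
```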


2017 ◽  
Vol 9 (1) ◽  
Author(s):  
Sophia Crossen

Objective: To explore the quality of data submitted once a facility is moved into an ongoing submission status and to address the importance of continuing data quality assessments.
Introduction: Once a facility meets data quality standards and is approved for production, an assumption is made that the quality of data received remains at the same level. When looking at production data quality reports from various states generated using a SAS data quality program, a need for production data quality assessment was identified. By implementing a periodic data quality update on all production facilities, data quality has improved for production data as a whole and for individual facility data. Through this activity several root causes of data quality degradation have been identified, allowing processes to be implemented in order to mitigate the impact on data quality.
Methods: Many jurisdictions work with facilities during the onboarding process to improve data quality. Once a certain level of data quality is achieved, the facility is moved into production. At this point the jurisdiction generally assumes that the quality of the data being submitted will remain fairly constant. To check this assumption in Kansas, a SAS Production Report program was developed specifically to look at production data quality. A legacy data set is downloaded from BioSense production servers by Earliest Date in order to capture all records for visits which occurred within a specified time frame. This data set is then run through a SAS data quality program which checks specific fields for completeness and validity and prints a report on counts and percentages of null and invalid values, outdated records, and timeliness of record submission, as well as examples of records from visits containing these errors. A report is created for the state as a whole, each facility, EHR vendor, and HIE sending data to the production servers, with examples provided only by facility. The facility, vendor, and HIE reports include state percentages of errors for comparison. The Production Report was initially run on Kansas data for the first quarter of 2016, followed by consultations with facilities on the findings. Monthly checks were made of data quality before and after facilities implemented changes. An examination of Kansas’ results showed a marked decrease in data quality for many facilities. Every facility had at least one area in need of improvement. The data quality reports and examples were sent to every facility sending production data during the first quarter, attached to an email requesting a 30-60 minute call with each to go over the report. This call was deemed crucial to the process since it had been over a year, and in a few cases over two years, since some of the facilities had looked at data quality and would need a review of the findings and all requirements, new and old. Ultimately, over half of all production facilities scheduled a follow-up call. While some facilities expressed some degree of trepidation, most facilities were open to revisiting data quality and to making requested improvements. Reasons for data quality degradation included updates to EHR products, change of EHR product, workflow issues, engine updates, new requirements, and personnel turnover. A request was made of other jurisdictions (including Arizona, Nevada, and Illinois) to look at their production data using the same program and compare quality. Data were pulled for at least one week of July 2016 by Earliest Date.
Results: Monthly reports have been run on Kansas production data both before and after the consultation meetings, and they indicate a marked improvement in both completeness of required fields and validity of values in those fields. Data for these monthly reports were again selected by Earliest Date.
Conclusions: In order to ensure production data continue to be of value for syndromic surveillance purposes, periodic data quality assessments should continue after a facility reaches ongoing submission status. Alterations in process include a review of production data at least twice per year, with a follow-up data review one month later to confirm adjustments have been correctly implemented.
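A minimal sketch of the kind of completeness and validity report described in the Methods (not the Kansas SAS program; the field names, valid code sets, and sample records below are assumptions) might look like this in Python:

```python
# Minimal sketch of a production data quality report: count null or invalid
# values in selected fields per facility. Field names, valid code sets and
# the sample records are illustrative assumptions.
from collections import defaultdict

REQUIRED_FIELDS = {
    "patient_class": {"E", "I", "O"},   # valid codes (assumed)
    "chief_complaint": None,            # None = only checked for presence
}

records = [
    {"facility": "A", "patient_class": "E", "chief_complaint": "chest pain"},
    {"facility": "A", "patient_class": "X", "chief_complaint": ""},
    {"facility": "B", "patient_class": None, "chief_complaint": "fever"},
]

errors = defaultdict(lambda: defaultdict(int))
totals = defaultdict(int)
for rec in records:
    totals[rec["facility"]] += 1
    for field, valid_codes in REQUIRED_FIELDS.items():
        value = rec.get(field)
        if not value or (valid_codes is not None and value not in valid_codes):
            errors[rec["facility"]][field] += 1

for facility, counts in errors.items():
    for field, n in counts.items():
        print(f"facility {facility}: {field} null/invalid in "
              f"{n / totals[facility]:.0%} of visits")
```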


Soil Research ◽  
2017 ◽  
Vol 55 (4) ◽  
pp. 309 ◽  
Author(s):  
Andrew J. W. Biggs ◽  
Ross Searle

The development and implementation of a national data schema for soil data in Australia over the last two decades, coupled with advances in information technology, have led to the realisation of more comprehensive state and national soil databases. This has facilitated increased access to soil data for many purposes, including the creation of many digital soil-mapping products, such as the Soil and Landscape Grid of Australia. Consequently, users of soil data have a growing need for clarity concerning the quality of the data; many new users have little understanding of its varying quality. To date, statements about the quality of primary soil data have typically been qualitative and/or judgemental rather than explicit. The consequences of poor-quality primary data, and of the lack of a coding system for data quality, are growing with increased usage and with demand for soil data at the regional to national scale. Pillar 4 of the Global Soil Partnership and the National Soil Research, Development and Extension Strategy both identify the need to improve the quality of soil data. Various international standards exist with respect to the quality of soil data, but these tend to focus on general principles and quality-assurance frameworks rather than the detail of describing data quality. The aim of this paper is to stimulate a discussion in the Australian soil science community on how to quantify and describe the quality of primary soil data. We provide examples of data quality issues and propose a framework for structured data-quality checking procedures and quality coding of soil morphological and analytical data in Australia.
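As a purely hypothetical illustration of what structured quality checking and quality coding of analytical soil data could look like (the thresholds, codes, and analyte names below are invented and do not represent any proposed Australian standard):

```python
# Hypothetical sketch: attach a coarse quality code to a soil analytical value
# based on a plausibility range and on whether the lab method was recorded.
# Thresholds, codes and analyte names are invented for illustration.
PLAUSIBLE_RANGES = {
    "ph_cacl2": (2.5, 11.0),
    "organic_carbon_pct": (0.0, 60.0),
}

def quality_code(analyte: str, value: float, method_recorded: bool) -> str:
    """Return 'A' (plausible, method known), 'B' (method missing) or 'C' (implausible)."""
    low, high = PLAUSIBLE_RANGES[analyte]
    if not (low <= value <= high):
        return "C"
    return "A" if method_recorded else "B"

samples = [
    ("ph_cacl2", 5.6, True),
    ("ph_cacl2", 13.2, True),
    ("organic_carbon_pct", 2.1, False),
]
for analyte, value, has_method in samples:
    print(f"{analyte}={value} -> quality code {quality_code(analyte, value, has_method)}")
```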

