ETL Best Practices for Data Quality Checks in RIS Databases

Informatics, 2019, Vol. 6 (1), pp. 10
Author(s): Otmane Azeroual, Gunter Saake, Mohammad Abuosba

The topic of data integration from external data sources or independent IT systems has recently received increasing attention in IT departments as well as at management level, in particular concerning data integration in federated database systems. An example of the latter are commercial research information systems (RIS), which regularly import, cleanse, transform, and prepare institutions' research information from a variety of databases for analysis. All of these steps must be carried out at an assured level of quality. As several internal and external data sources are loaded for integration into the RIS, ensuring information quality is becoming increasingly challenging for research institutions. Before research information is transferred to a RIS, it must be checked and cleaned up. Data quality is therefore always an important factor for successful data integration. The removal of data errors (such as duplicates, inconsistent data, and outdated data) and the harmonization of the data structure are essential tasks of data integration using extract, transform, and load (ETL) processes: data is extracted from the source systems, transformed, and loaded into the RIS. At this point, conflicts between different data sources are detected and resolved, and data quality issues that arise during integration are eliminated. Against this background, our paper presents the process of data transformation in the context of RIS and gives an overview of the quality of research information in an institution's internal and external data sources during its integration into the RIS. In addition, we address the question of how to control and improve quality issues during the integration process in RIS.
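To make the ETL quality steps described above concrete, the following is a minimal sketch in Python (pandas assumed) of an extract-transform-load flow that removes duplicates, harmonizes the data structure, and routes records with missing or implausible values to a review queue instead of the RIS. The column names (doi, title, year) and the target table names are illustrative assumptions, not taken from the paper.

```python
import pandas as pd

def extract(sources):
    """Read research information from several source files (CSV assumed here)."""
    return pd.concat([pd.read_csv(path) for path in sources], ignore_index=True)

def transform(df):
    """Apply basic data quality rules before loading into the RIS."""
    # Harmonize the data structure: normalize column names and trim whitespace.
    df.columns = [c.strip().lower() for c in df.columns]
    df["title"] = df["title"].str.strip()

    # Remove duplicates, e.g. the same publication reported by two source systems.
    df = df.drop_duplicates(subset=["doi"])

    # Flag missing or implausible values instead of silently loading them.
    df["quality_ok"] = df["doi"].notna() & df["year"].between(1900, 2030)
    return df

def load(df, connection):
    """Load only records that passed the checks; route the rest to a review queue."""
    df[df["quality_ok"]].to_sql("ris_publications", connection, if_exists="append")
    df[~df["quality_ok"]].to_sql("ris_review_queue", connection, if_exists="append")
```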

Data, 2020, Vol. 5 (2), pp. 30
Author(s): Otmane Azeroual, Joachim Schöpfel, Dragan Ivanovic

With the steady increase in the number of data sources to be stored and processed by higher education and research institutions, it has become necessary to develop Research Information Systems (RIS), which store this research information in the long term and make it accessible for further use, such as reporting and evaluation processes, institutional decision making, and the presentation of research performance. In order to retain control while integrating research information from heterogeneous internal and external data sources and disparate interfaces into a RIS, and to maximize the benefits of the research information, ensuring data quality in RIS is critical. To facilitate a common understanding of the research information collected and to harmonize data collection processes, various standardization initiatives have emerged in recent decades. These standards support the use of research information in RIS and enable compatibility and interoperability between different information systems. This paper examines the process of securing data quality in RIS and the impact of research information standards on data quality in RIS. We focus on the recently developed German Research Core Dataset standard as an application case.
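As a rough illustration of how a research information standard can support data quality checks during collection, here is a hypothetical validation sketch in Python. The required fields are invented for the example and do not reproduce the actual German Research Core Dataset specification.

```python
# Hypothetical required fields for a publication record; the real German
# Research Core Dataset specification defines its own elements.
REQUIRED_FIELDS = {"title", "publication_year", "publication_type", "authors"}

def validate_record(record: dict) -> list[str]:
    """Return a list of data quality problems found in one record."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    year = record.get("publication_year")
    if year is not None and not (1900 <= int(year) <= 2030):
        problems.append(f"implausible publication year: {year}")
    if not record.get("authors"):
        problems.append("no authors listed")
    return problems

# Example: a record exchanged between two systems that share the same schema.
print(validate_record({"title": "ETL Best Practices", "publication_year": 2019}))
```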


Publications, 2019, Vol. 7 (1), pp. 14
Author(s): Otmane Azeroual, Joachim Schöpfel

Collecting, integrating, storing, and analyzing data in a database system is nothing new in itself. Introducing a current research information system (CRIS) means that scientific institutions must provide the required information on their research activities and research results at high quality. A one-time cleanup is not sufficient; data must be continuously curated and maintained. Some data errors (such as missing values, spelling errors, inaccurate data, incorrect formatting, inconsistencies, etc.) can be traced across different data sources and are difficult to find. Small mistakes can make data unusable, and corrupted data can have serious consequences. The sooner quality issues are identified and remedied, the better. For this reason, new techniques and methods of data cleansing and data monitoring are required to ensure data quality and its measurability in the long term. This paper examines data quality issues in current research information systems and introduces new techniques and methods of data cleansing and data monitoring with which organizations can guarantee the quality of their data.
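As a sketch of the kind of data monitoring the abstract refers to, the following Python example (pandas assumed) profiles a table of research records and reports missing values, badly formatted identifiers, and duplicates. The column names and the simplified DOI pattern are assumptions for illustration, not the paper's method.

```python
import pandas as pd

DOI_PATTERN = r"^10\.\d{4,9}/\S+$"  # simplified DOI format check

def monitor(df: pd.DataFrame) -> dict:
    """Compute simple data quality indicators for continuous monitoring."""
    return {
        "rows": len(df),
        "missing_titles": int(df["title"].isna().sum()),
        "badly_formatted_dois": int(
            (~df["doi"].fillna("").str.match(DOI_PATTERN)).sum()
        ),
        "duplicate_dois": int(df["doi"].duplicated(keep=False).sum()),
    }

# A monitoring job could run this after every load and alert when thresholds are exceeded.
sample = pd.DataFrame({
    "title": ["Data Quality in CRIS", None],
    "doi": ["10.1234/example.5678", "not-a-doi"],
})
print(monitor(sample))
```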


Author(s): Catherine Eastwood, Keith Denny, Maureen Kelly, Hude Quan

Theme: Data and Linkage Quality

Objectives:
- To define health data quality from clinical, data science, and health system perspectives
- To describe some of the international best practices related to quality and how they are being applied to Canada's administrative health data
- To compare methods for health data quality assessment and improvement in Canada (automated logical checks, chart quality indicators, reabstraction studies, coding manager perspectives)
- To highlight how data linkage can be used to provide new insights into the quality of original data sources
- To highlight current international initiatives for improving coded data quality, including results from current ICD-11 field trials

Dr. Keith Denny: Director of Clinical Data Standards and Quality, Canadian Institute for Health Information (CIHI), Adjunct Research Professor, Carleton University, Ottawa, ON. He provides leadership for CIHI's information quality initiatives and for the development and application of clinical classifications and terminology standards.

Maureen Kelly: Manager of Information Quality at CIHI, Ottawa, ON. She leads CIHI's corporate quality program, which is focused on enhancing the quality of CIHI's data sources and information products and on fostering CIHI's quality culture.

Dr. Cathy Eastwood: Scientific Manager, Associate Director of the Alberta SPOR Methods & Development Platform, Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB. She has expertise in clinical data collection, evaluation of local and systemic data quality issues, and disease classification coding with ICD-10 and ICD-11.

Dr. Hude Quan: Professor, Community Health Sciences, Cumming School of Medicine, University of Calgary; Director, Alberta SPOR Methods Platform; Co-Chair of Hypertension Canada; Co-Chair of the Person to Population Health Collaborative of the Libin Cardiovascular Institute in Calgary, AB. He has expertise in assessing, validating, and linking administrative data sources for data science research, including artificial intelligence methods for evaluating and improving data quality.

Intended Outcomes: "What is quality health data?" The panel of experts will address this common question by discussing how to define high-quality health data and the measures being taken to ensure that such data are available in Canada. Optimizing the quality of clinical-administrative data, and their use-value, first requires an understanding of the processes used to create the data. Subsequently, we can address the limitations in data collection and use these data for diverse applications. Current advances in digital data collection are providing more solutions to improve health data quality at lower cost. This panel will describe a number of quality assessment and improvement initiatives aimed at ensuring that health data are fit for a range of secondary uses, including data linkage. It will also discuss how the need for the linkage and integration of data sources can influence views of a data source's fitness for use.

CIHI content will include:
- Methods for optimizing the value of clinical-administrative data
- CIHI Information Quality Framework
- Reabstraction studies (e.g., physician documentation / coders' experiences)
- Linkage analytics for data quality

University of Calgary content will include:
- Defining/measuring health data quality
- Automated methods for quality assessment and improvement (a minimal illustration follows below)
- ICD-11 features and coding practices
- Electronic health record initiatives
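As a minimal, hypothetical illustration of the automated logical checks mentioned above (not CIHI's or the panelists' actual methods), the following Python sketch flags internally inconsistent values in a discharge record; the field names are invented for the example.

```python
from datetime import date

def logical_checks(record: dict) -> list[str]:
    """Flag internally inconsistent values in one hospital discharge record."""
    issues = []
    if record["discharge_date"] < record["admission_date"]:
        issues.append("discharge precedes admission")
    if not (0 <= record["age"] <= 120):
        issues.append("implausible age")
    # ICD-10 chapter O codes cover pregnancy and childbirth.
    if record["sex"] == "M" and record.get("diagnosis_code", "").startswith("O"):
        issues.append("obstetric diagnosis code recorded for a male patient")
    return issues

print(logical_checks({
    "admission_date": date(2021, 3, 5),
    "discharge_date": date(2021, 3, 2),
    "age": 47,
    "sex": "M",
    "diagnosis_code": "O80",
}))
```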


Author(s): Tom Breur

Business Intelligence (BI) projects that involve substantial data integration have often proven failure-prone and difficult to plan. Data quality issues trigger rework, which makes it difficult to accurately schedule deliverables. Two things can bring improvement. Firstly, one should deliver information products in the smallest possible chunks, without adding prohibitive overhead by breaking the work into tiny increments. This will increase the frequency and improve the timeliness of feedback on the suitability of information products, and hence make planning and progress more predictable. Secondly, BI teams need to provide better stewardship when they facilitate discussions between departments whose data cannot easily be integrated. Many so-called data quality errors do not stem from inaccurate source data, but rather from incorrect interpretation of data. This is mostly caused by differing interpretations of essentially the same underlying source system facts across departments with misaligned performance objectives. Such problems require prudent stakeholder management and informed negotiation to resolve. In this chapter, the authors suggest an innovation to data warehouse architecture to help accomplish these objectives.


Author(s): Chrissy Willemse

The Canadian Institute for Health Information (CIHI) provides essential information on Canada's health systems and the health of Canadians. This presentation discusses information quality's role in the integration and utilization of CIHI's complex, multi-sector and multi-jurisdictional data.

Introduction: CIHI's Data and Information Quality Program is recognized internationally for its comprehensiveness and high standards. As the need for linked-data research increases, the requirements on quality continue to grow. Canada's multi-sector, multi-jurisdictional healthcare system, and the varying health policies, care delivery models, and data collection practices that go with it, pose challenges for researchers as they try to pull the data together in a comprehensive way. CIHI's Information Quality Framework forms the foundation for addressing these challenges and ensuring data are fit for integration and are properly utilized.

Objectives and Approach: In 2019, a connected data quality project was initiated to improve the usability of CIHI's analytical data. Information quality framework concepts were applied across CIHI data sources to better understand data linkage challenges, measure inconsistencies across data sources, identify opportunities to improve data and standards, and develop resources to support users.

Results: Findings from the project identified key connected data quality activities for the organization to operationalize. These focus on quality assessment and reporting; harmonization of data standards; expanded documentation and analytical resources; data classification and profiling tools to support descriptive analysis; and new source-of-truth and pre-linked datasets. Quality activities were prioritized based on need and complexity, and "connected data teams" were established to carry out the work.

Conclusion / Implications: Expansion of CIHI's quality framework across data sources facilitates its data linkage capabilities and "connected data" use. It enables the evolution of CIHI's analytical environments and information products from being database-specific to integrated-data-driven, and facilitates the use of CIHI's analytical data for research.
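As a rough illustration of what measuring inconsistencies across data sources can look like in practice, here is a hypothetical Python sketch (pandas assumed) that compares attributes of records linked across two sources and counts disagreements per field. The field names and linkage key are assumptions for the example, not CIHI's actual holdings.

```python
import pandas as pd

def cross_source_inconsistencies(a, b, key, fields):
    """For records present in both sources, count how often each field disagrees."""
    merged = a.merge(b, on=key, suffixes=("_a", "_b"))
    return pd.Series(
        {f: int((merged[f + "_a"] != merged[f + "_b"]).sum()) for f in fields}
    )

# Illustrative records linked on a study identifier.
hospital = pd.DataFrame({"id": [1, 2], "birth_year": [1980, 1975], "sex": ["F", "M"]})
registry = pd.DataFrame({"id": [1, 2], "birth_year": [1980, 1957], "sex": ["F", "M"]})
print(cross_source_inconsistencies(hospital, registry, "id", ["birth_year", "sex"]))
```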


2021, pp. 55-60
Author(s): Christoph Schröer, Jonas Frischkorn
