Improving a Secondary Use Health Data Warehouse: Proposing a Multi-Level Data Quality Framework

Author(s):  
Sandra Henley-Smith ◽  
Douglas Boyle ◽  
Kathleen Gray
Author(s):  
Larry Svenson

Background: The Province of Alberta, Canada, maintains a mature data environment with linkable administrative and clinical data dating back up to 30 years. Alberta has a single-payer, publicly funded and administered, universal health system, which maintains multiple administrative data sets. Main Aim: The main aim of the strategy is to maximize the data assets in the province to drive health system innovation, with a focus on improving health outcomes and quality of life. Methods/Approach: The Alberta Ministry of Health has created the Secondary Use Data Access (SUDA) initiative to leverage its administrative health data. SUDA envisions strengthening partnerships between the public and private sectors through two main data access approaches. The first is direct access to de-identified data held within the Alberta Health data warehouse by key health system stakeholders (e.g. academic institutions, professional associations, regulatory colleges). The second is indirect access for private and not-for-profit organizations, using a data access safe haven (DASH) approach. Indirect access is achieved through private sector investments to a trusted third party that hires analysts placed within the Ministry of Health offices. Results: Staffing agreements and privacy impact assessments are in place. Indirect access includes a multi-stakeholder steering committee to vet and prioritize projects. Private and not-for-profit stakeholders do not have access to raw data, but rather receive access to aggregated data and statistical models. All data disclosures are done by Ministry staff to ensure compliance with Alberta's Health Information Act. Direct access has been established for one professional organization and one academic institution, with access restricted to de-identified data. Conclusion: The Secondary Use Data Access initiative uses a safe haven approach to leveraging data, which makes data access more secure. It reduces the need to provision data outside of the data warehouse while improving timely access to data. The approach provides assurances that people's health information is kept secure, while also being used to create health system improvements.


Author(s):  
Jeffrey Brown

Introduction: Several large health data networks such as FDA Sentinel, PCORnet, and the Canadian Network of Observational Drug Effect Studies (CNODES) facilitate multi-site research using real-world electronic health data such as administrative claims data, electronic health record data and registries. Experience in the operation of multiple health data networks will be described. Objectives and Approach: Over the past 15 years substantial progress has been made in developing the optimal network operational design, governance, and technical architecture to facilitate the creation and operation of large-scale distributed health data networks. The design, architecture, and operation of a sustainable health data network require balancing the needs of the network stakeholders such as funders, data sources, investigators, and regulatory bodies while enabling rapid and efficient use of data to support evidence generation and decision making. Important topics include protection of patient privacy, security, data autonomy, distributed analytics, data quality, and protection of confidential information. Results: The design and architecture of existing distributed health data networks provide guidance regarding the potential operational model for new networks and identify areas of research to improve network functionality and capabilities. Most health data networks adopt a common data model approach to facilitate multi-site querying and data quality assessment. This approach is coupled with distributed querying in which data partners maintain physical and operational control of their data. This design maximizes protection of confidential and proprietary information and minimizes the need to share patient-level data. Privacy-preserving distributed regression approaches and methods that obviate the need to share person-level data while generating robust results help to ensure network participation. Strong security and governance structures are also necessary for effective operation of a distributed network. Conclusion/Implications: Distributed health data networks offer the opportunity to use real-world data for public health surveillance and comparative safety and effectiveness research across large populations. The operational design, technical and analytic architecture, and governance models of networks drive their acceptance and success.
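The distributed querying pattern described in this abstract can be illustrated with a minimal sketch. It assumes a toy common data model of `(patient_id, exposure, outcome)` tuples; the site data, the function names `run_local_query` and `combine`, and the exposure labels are all hypothetical and are not taken from Sentinel, PCORnet, or CNODES. The point is only that each data partner evaluates the query locally and returns aggregate counts, so no patient-level rows leave the site.

```python
from collections import Counter

# Hypothetical site data in a shared common data model:
# (patient_id, exposure, outcome). Illustrative values only.
SITE_A = [("a1", "drug_x", True), ("a2", "drug_x", False), ("a3", "drug_y", True)]
SITE_B = [("b1", "drug_x", True), ("b2", "drug_y", False)]


def run_local_query(records, exposure):
    """Runs inside a data partner's environment and returns aggregate counts only."""
    counts = Counter()
    for _patient_id, exp, outcome in records:
        if exp == exposure:
            counts["exposed"] += 1
            if outcome:
                counts["events"] += 1
    return {"exposed": counts["exposed"], "events": counts["events"]}


def combine(site_results):
    """Runs at the coordinating center, which only ever sees per-site aggregates."""
    total = {"exposed": 0, "events": 0}
    for result in site_results:
        total["exposed"] += result["exposed"]
        total["events"] += result["events"]
    return total


# The same query is sent to every site; only the summaries are pooled.
pooled = combine([run_local_query(site, "drug_x") for site in (SITE_A, SITE_B)])
print(pooled)
```

Real networks layer governance, review of outgoing results, and more sophisticated privacy-preserving analytics on top of this basic shape.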


Author(s):  
Larry Svenson ◽  
Kimberley Simmonds ◽  
Alexa Perry ◽  
Justin Riemer

Introduction: The Province of Alberta maintains a mature data ecosystem with linkable data dating back over 30 years. The population-based nature of the data makes this a valuable asset for driving analytics to support health system innovation, with a focus on improving health outcomes and quality of life. Objectives and Approach: Alberta Health has created the Secondary Use Data Access (SUDA) initiative to leverage its administrative health data. SUDA envisions strengthening partnerships between the public and private sectors with two main access approaches. The first is direct access to de-identified data held within the Alberta Health data warehouse by key health system stakeholders (e.g. academic institutions, Health Quality Council of Alberta, regulatory colleges). The second is indirect access for private and not-for-profit stakeholders, using a safe haven approach. Indirect access is achieved through private sector investments to a trusted third party that hires analysts to be placed within Alberta Health. Results: Staffing agreements and privacy impact assessments have been drafted to support the work. The indirect access route includes a multi-stakeholder steering committee to vet and prioritize projects. Private and not-for-profit stakeholders do not have access to the data, but rather receive access to aggregate data and statistical models. All disclosures are done by Alberta Health staff to ensure compliance with Alberta's Health Information Act. Direct access has been established for the Alberta Medical Association as part of a long-standing data sharing agreement, with access restricted to de-identified data only. To date, seven industry proposals for analytics have been received and are currently being actioned. Conclusion/Implications: The Secondary Use Data Access initiative uses a safe haven approach to leveraging data. It reduces the need to provision data outside of the data warehouse and allows for better monitoring of access and use of data. The approach provides assurances that people's health information is secure.


2015 ◽  
Vol 7 (3) ◽  
Author(s):  
Julia Eaton ◽  
Ian Painter ◽  
Donald Olson ◽  
William Lober

Secondary use of clinical health data for near real-time public health surveillance presents challenges surrounding its utility due to data quality issues. Data used for real-time surveillance must be timely, accurate and complete if they are to be useful; if incomplete data are used for surveillance, understanding the structure of the incompleteness is necessary. Such data are commonly aggregated due to privacy concerns. The Distribute project was a near real-time influenza-like-illness (ILI) surveillance system that relied on aggregated secondary clinical health data. The goal of this work is to disseminate the data quality tools developed to gain insight into the data quality problems associated with these data. These tools apply in general to any system where aggregate data are accrued over time and were created through the end-user-as-developer paradigm. Each tool was developed during the exploratory analysis to gain insight into structural aspects of data quality. Our key finding is that data quality of partially accruing data must be studied in the context of accrual lag, defined as the difference between the time an event occurs and the time data for that event are received, i.e. the time at which data become available to the surveillance system. Our visualization methods therefore revolve around visualizing dimensions of data quality affected by accrual lag, in particular the tradeoff between timeliness and completeness, and the effects of accrual lag on accuracy. Accounting for accrual lag in partially accruing data is necessary to avoid misleading or biased conclusions about trends in indicator values and data quality.
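As a rough illustration of accrual lag as defined above, the sketch below computes the lag in days for each record and the share of eventually received records that had arrived within a given lag. The record layout, the sample dates, and the function names `accrual_lag_days` and `completeness_at_lag` are assumptions made for this example; they are not the Distribute project's tools, and the sample uses record-level dates for simplicity even though the project worked with aggregated data.

```python
from datetime import date

# Hypothetical (event_date, received_date) pairs; illustrative values only.
records = [
    (date(2015, 1, 1), date(2015, 1, 1)),
    (date(2015, 1, 1), date(2015, 1, 2)),
    (date(2015, 1, 1), date(2015, 1, 4)),
    (date(2015, 1, 2), date(2015, 1, 2)),
    (date(2015, 1, 2), date(2015, 1, 5)),
]


def accrual_lag_days(event_date, received_date):
    """Accrual lag: time between the event and receipt of its data, in days."""
    return (received_date - event_date).days


def completeness_at_lag(rows, max_lag_days):
    """Fraction of eventually received records that had arrived within max_lag_days."""
    if not rows:
        return 0.0
    arrived = sum(1 for ev, rc in rows if accrual_lag_days(ev, rc) <= max_lag_days)
    return arrived / len(rows)


# Waiting longer raises completeness at the cost of timeliness.
for lag in (0, 1, 3):
    print(lag, completeness_at_lag(records, lag))
```

Plotting completeness against the allowed lag gives one simple view of the timeliness versus completeness tradeoff the abstract refers to.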


Author(s):  
Annabelle Cumyn ◽  
Roxanne Dault ◽  
Adrien Barton ◽  
Anne-Marie Cloutier ◽  
Jean-François Ethier

A survey was conducted to assess the attitudes of citizens, research ethics committee members, and researchers toward information and consent for the secondary use of health data for research within learning health systems (LHSs). Results show that the reuse of health data for research to advance knowledge and improve care is valued by all parties; that consent regarding health data reuse for research is of fundamental importance, particularly to citizens; and that all respondents considered a secure website supporting the information and consent processes to be important. This survey was part of a larger project that aims to explore public perspectives on alternatives to the current consent models for health data reuse, taking into consideration the unique features of LHSs. The revised model will need to ensure that citizens are given the opportunity to be better informed about upcoming research and to have their say, when possible, in the use of their data.


2014 ◽  
Vol 668-669 ◽  
pp. 1374-1377 ◽  
Author(s):  
Wei Jun Wen

ETL refers to the process of data extraction, transformation and loading, and is regarded as a critical step in ensuring the quality, specification and standardization of marine environmental data. Marine data, due to their complexity, diversity of fields and huge volume, remain decentralized, multi-source and heterogeneous, with differing semantics, and hence are far from being able to provide effective data sources for decision making. ETL enables the construction of a marine environmental data warehouse through the cleaning, transformation, integration, loading and periodic updating of basic marine data. The paper presents research on rules for the cleaning, transformation and integration of marine data, on the basis of which an original ETL system for a marine environmental data warehouse is designed and developed. The system further supports data quality and correctness in future analysis and decision making based on marine environmental data.
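The extract, transform and load stages described above can be sketched in a few lines. Everything specific here is an assumption for illustration: the raw field names, the Fahrenheit-to-Celsius conversion, the plausibility range of -5 to 45 °C, and the `sea_temp` table are hypothetical and are not the cleaning rules or schema from the paper. The sketch uses Python's standard `sqlite3` module as a stand-in for the warehouse.

```python
import sqlite3

# Hypothetical raw rows from heterogeneous sources (mixed units, untidy names).
RAW = [
    {"station": " ST01 ", "temp": "18.5", "unit": "C"},
    {"station": "st02", "temp": "68.0", "unit": "F"},
    {"station": "ST03", "temp": "", "unit": "C"},     # missing value
    {"station": "ST04", "temp": "999", "unit": "C"},  # implausible value
]


def extract():
    """Extract: read raw rows from the source (an in-memory list here)."""
    return list(RAW)


def transform(rows):
    """Clean and standardize: drop bad rows, convert to Celsius, tidy station names."""
    cleaned = []
    for row in rows:
        if not row["temp"].strip():
            continue  # cleaning rule (assumed): skip missing measurements
        temp = float(row["temp"])
        if row["unit"].upper() == "F":
            temp = (temp - 32.0) * 5.0 / 9.0
        if not -5.0 <= temp <= 45.0:
            continue  # cleaning rule (assumed): plausible sea temperature range
        cleaned.append((row["station"].strip().upper(), round(temp, 2)))
    return cleaned


def load(rows, conn):
    """Load: write standardized rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sea_temp (station TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO sea_temp VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
loaded = conn.execute("SELECT station, temp_c FROM sea_temp ORDER BY station").fetchall()
print(loaded)
```

Keeping the rules inside `transform` makes them easy to review and extend, which matters when sources differ in semantics as the abstract describes.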

