Using Record Linkage to Conform the Patient Dimension of an Antiretroviral Therapy Data Warehouse

Author(s):  
E. Kotzé ◽  
T. McDonald
Author(s):  
William E. Winkler

Fayyad and Uthurusamy (2002) have stated that the majority of the work (representing months or years) in creating a data warehouse is in cleaning up duplicates and resolving other anomalies. This paper provides an overview of two methods for improving data quality. The first is record linkage, which finds duplicates within files or across files. The second is edit/imputation, which maintains business rules and fills in missing data. The fastest record linkage methods are suitable for files with hundreds of millions of records (Winkler, 2004a, 2008). The fastest edit/imputation methods are suitable for files with millions of records (Winkler, 2004b, 2007a).
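As a rough illustration of the first method, the sketch below finds likely duplicates within a single file. It is not the method described in the cited Winkler papers: the field names (`id`, `name`, `birth_year`), the blocking rule, and the 0.85 threshold are all assumptions chosen for the example, and the similarity score comes from Python's standard `difflib` rather than from a record linkage model.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(records, threshold=0.85):
    """Return pairs of record ids that look like duplicates.

    Records are first grouped ("blocked") by birth year so that only
    records within the same block are compared, which avoids comparing
    every record with every other record.
    """
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec["birth_year"]].append(rec)

    pairs = []
    for block in blocks.values():
        for left, right in combinations(block, 2):
            if similarity(left["name"], right["name"]) >= threshold:
                pairs.append((left["id"], right["id"]))
    return pairs


# Hypothetical records, invented for this example.
records = [
    {"id": 1, "name": "Jon Smith", "birth_year": 1980},
    {"id": 2, "name": "John Smith", "birth_year": 1980},
    {"id": 3, "name": "Mary Jones", "birth_year": 1980},
    {"id": 4, "name": "John Smith", "birth_year": 1975},
]
duplicates = find_duplicates(records)
print(duplicates)
```

Records 1 and 2 are flagged because their names are nearly identical and they share a block. Record 4 has the same name as record 2 but is never compared with it, which shows the usual trade-off of blocking: fewer comparisons at the cost of missing matches whose blocking key disagrees.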


2021 ◽  
Author(s):  
Aurélie Bannay ◽  
Mathilde Bories ◽  
Pascal Le Corre ◽  
Christine Riou ◽  
Pierre Lemordant ◽  
...  

BACKGROUND: Linking different sources of medical data is a promising approach to analyse care trajectories. The aim of the INSHARE project was to provide the blueprint of a technological platform that facilitates integration, sharing and reuse of data from two sources: the eHOP clinical data warehouse (CDW) of Rennes academic hospital, and a dataset extracted from the French national claim data warehouse (SNDS).
OBJECTIVE: Using a pharmacovigilance use case based on statin consumption and statin-drug interactions, the present work demonstrates how the INSHARE platform can support big data analytical tasks in the health field.
METHODS: A Spark distributed cluster-computing framework was used for the record linkage procedure and all the analyses. A semi-deterministic record-linkage method based on the variables common to the two data sources was developed to identify all patients discharged after at least one hospital stay at Rennes academic hospital between 2015 and 2017. The use case study focused on a cohort of patients treated with statins prescribed by their general practitioner and/or during their hospital stay.
RESULTS: The whole process (record-linkage procedure and use case analyses) required 88 minutes. Among the 161,532 and 164,316 patients from the SNDS dataset and eHOP CDW, respectively, 159,495 patients were successfully linked (98.7% and 97.0% of patients from SNDS and eHOP CDW, respectively). Among the 16,806 patients with at least one statin delivery, 8,293 started statin consumption before the hospital stay and continued during it, 6,382 stopped statin consumption at hospital admission, and 2,131 started taking statins in hospital. Statin-drug interactions occurred more frequently during hospitalization than in the community (36.4% and 22.2%, respectively). Only 121 patients had the most severe level of statin-drug interaction. Hospital stay burden (length of stay and in-hospital mortality) was more severe in patients with statin-drug interactions during hospitalization.
CONCLUSIONS: This study demonstrates the added value of combining and re-using clinical and claim data to provide large-scale measures of drug-drug interaction prevalence and care pathways outside hospitals. It paves the way for moving the current healthcare system towards a Learning Health System using knowledge generated from research on real-world health data.
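The abstract does not list the linkage variables or the matching rules, so the following is only a minimal sketch of what a semi-deterministic linkage could look like. It assumes hypothetical common variables (`sex`, `birth_month`, `admission`, `discharge`), uses plain Python instead of Spark, and applies two passes: an exact match on all variables, then a relaxed match on a subset for the records still unlinked. In both passes a link is accepted only when the key is unique on each side.

```python
def index_by(rows, keys):
    """Group rows by the tuple of values they hold for the given keys."""
    index = {}
    for row in rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    return index


def link(source_a, source_b, key_sets):
    """Link records of source_a to source_b, one key set per pass.

    key_sets is ordered from strictest to most relaxed. A pair is linked
    only if its key identifies exactly one unlinked record in each source.
    """
    links = {}
    linked_b = set()
    for keys in key_sets:
        remaining_a = [r for r in source_a if r["id"] not in links]
        remaining_b = [r for r in source_b if r["id"] not in linked_b]
        index_b = index_by(remaining_b, keys)
        for key, rows_a in index_by(remaining_a, keys).items():
            rows_b = index_b.get(key, [])
            if len(rows_a) == 1 and len(rows_b) == 1:
                links[rows_a[0]["id"]] = rows_b[0]["id"]
                linked_b.add(rows_b[0]["id"])
    return links


# Hypothetical records, invented for this example.
claims = [
    {"id": "s1", "sex": "F", "birth_month": "1950-03",
     "admission": "2016-01-10", "discharge": "2016-01-15"},
    {"id": "s2", "sex": "M", "birth_month": "1948-07",
     "admission": "2015-06-01", "discharge": "2015-06-04"},
    {"id": "s3", "sex": "F", "birth_month": "1960-11",
     "admission": "2017-02-20", "discharge": "2017-02-22"},
]
hospital = [
    {"id": "h1", "sex": "F", "birth_month": "1950-03",
     "admission": "2016-01-10", "discharge": "2016-01-15"},
    {"id": "h2", "sex": "M", "birth_month": "1948-07",
     "admission": "2015-06-01", "discharge": "2015-06-05"},
    {"id": "h3", "sex": "F", "birth_month": "1961-11",
     "admission": "2017-02-20", "discharge": "2017-02-22"},
]

strict = ["sex", "birth_month", "admission", "discharge"]
relaxed = ["sex", "birth_month", "admission"]
links = link(claims, hospital, [strict, relaxed])
link_rate = len(links) / len(claims)
print(links, link_rate)
```

In this toy data, `s1` links in the strict pass, `s2` links only in the relaxed pass because its discharge date differs by one day, and `s3` stays unlinked because its birth month disagrees. The link rate is computed the same way as the percentages reported in RESULTS: linked patients divided by the patients in one source.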


2005 ◽  
Author(s):  
Pascal Niamba ◽  
Souleymane A. G. Aboubacrine ◽  
Catherine Boileau ◽  
Maria-Victoria Zunzunegui ◽  
Vinh-Kim Nguyen ◽  
...  
