probabilistic record linkage
Recently Published Documents


TOTAL DOCUMENTS

69
(FIVE YEARS 13)

H-INDEX

15
(FIVE YEARS 1)

2021 ◽  
Author(s):  
Julia Bohlius ◽  
Lina Bartels ◽  
Frédérique Chammartin ◽  
Victor Olago ◽  
Adrian Spoerri ◽  
...  

Background: Privacy-preserving probabilistic record linkage (PPPRL) methods were developed and applied in high-income countries to link records within and between organizations under strict privacy protections. PPPRL has not yet been used in African settings. Methods: We used HIV-related laboratory records from National Health Laboratory Services (NHLS) in South Africa to construct a cohort of HIV-positive patients and link them to the National Cancer Registry (NCR) with PPPRL. The study was restricted to Gauteng province from 2004 to 2014. We used records with national IDs (gold standard) to determine precision, recall, and f-measure of the linkages. We included all patients with ≥ 2 HIV-related lab records measured in the cohort and assessed the number of cancers diagnosed in people living with HIV (PLWH). Results: We included 11,480,118 HIV-related laboratory records and 664,869 cancer records in the linkage. We included 1,173,908 persons in the HIV cohort; 66.6% were female and median age at first HIV-related lab test was 33.9 years (IQR 27.4-41.3). Of the patients in the cohort, 26,348 were diagnosed with at least one cancer and 8,329 of these cancers were diagnosed before or on the date of the patient’s first HIV-related record; 18,019 were diagnosed after their first HIV-related record. For all linkages, precision, recall, and f-measures were high. Conclusion: Our study showed it is feasible to use PPPRL in an African setting to link routinely collected health records from different data sources and create a longitudinal HIV cohort with cancer outcomes while strictly protecting patient privacy. This work served as the foundation to create a nationwide population-based cohort including all South African provinces which will be used to inform cancer control programs.
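
The precision, recall, and f-measure evaluation against a gold standard mentioned in this abstract can be illustrated with a minimal sketch. It assumes that both the linkage output and the gold standard are available as sets of record-ID pairs; the function name `linkage_quality` and the toy IDs are hypothetical and are not taken from the study.

```python
def linkage_quality(predicted_links, gold_links):
    """Return (precision, recall, f_measure) for two collections of ID pairs."""
    predicted = set(predicted_links)
    gold = set(gold_links)
    true_positives = len(predicted & gold)  # links confirmed by the gold standard

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy example (hypothetical IDs): four true links, three predicted, two correct.
gold = {("h1", "c1"), ("h2", "c2"), ("h3", "c3"), ("h4", "c4")}
predicted = {("h1", "c1"), ("h2", "c2"), ("h3", "c9")}
precision, recall, f_measure = linkage_quality(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f={f_measure:.3f}")
```

In this toy case two of the three predicted links are correct and two of the four true links are found, so precision is 2/3 and recall is 1/2.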


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Holly Tibble ◽  
James Lay-Flurrie ◽  
Aziz Sheikh ◽  
Rob Horne ◽  
...  

Background: Records of medication prescriptions can be used in conjunction with pharmacy dispensing records to investigate the incidence of adherence, which is defined as observing the treatment plans agreed between a patient and their clinician. Using prescribing records alone fails to identify primary non-adherence, that is, medications not being collected from the dispensary. Using dispensing records alone means that cases of conditions that resolve and/or treatments that are discontinued will be unaccounted for. While using a linked prescribing and dispensing dataset to measure medication non-adherence is optimal, this linkage is not routinely conducted. Furthermore, without a unique common event identifier, linkage between these two datasets is not straightforward. Methods: We undertook a secondary analysis of the Salford Lung Study dataset. A novel probabilistic record linkage methodology was developed to match asthma medication pharmacy dispensing records with primary care prescribing records, using semantic (meaning) and syntactic (structure) harmonization, domain knowledge integration, and natural language feature extraction. Cox survival analysis was conducted to assess factors associated with the time to medication dispensing after the prescription was written. Finally, as a naïve benchmark for our proposed methodology, we used a simplified record linkage algorithm in which only identical records were matched. Results: We matched 83% of pharmacy dispensing records to primary care prescribing records. Missing data were prevalent in the dispensing records that were not matched: approximately 60% for both medication strength and quantity. The naïve benchmarking approach, which required perfect matching, identified one-quarter as many matching prescribing records as our methodology. Factors associated with delay (or failure) to collect the prescribed medication from a pharmacy included season, quantity of medication prescribed, previous dispensing history and class of medication. Our findings indicate that over 30% of prescriptions issued were not collected from a dispensary (primary non-adherence). Conclusions: We have developed a probabilistic record linkage methodology that matches a large percentage of pharmacy dispensing records with primary care prescribing records for asthma medications. This will allow researchers to link datasets in order to extract information about asthma medication non-adherence.
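
To illustrate why a benchmark that requires identical records finds fewer matches than one that harmonizes the text first, here is a small sketch. It is not the authors' methodology: it only shows simple syntactic harmonization followed by deterministic matching, and the `harmonise` rules, the `count_matches` helper, and the medication strings are all made up for illustration.

```python
import re


def harmonise(text):
    """Lower-case, collapse whitespace, and unify a few unit spellings."""
    text = re.sub(r"\s+", " ", text.lower().strip())
    text = re.sub(r"\bmicrograms?\b", "mcg", text)          # "micrograms" -> "mcg"
    text = re.sub(r"(\d)\s+(mcg|mg|ml)\b", r"\1\2", text)   # "100 mcg" -> "100mcg"
    return text


def count_matches(dispensed, prescribed, key=lambda s: s):
    """Count dispensing records whose key equals the key of any prescribing record."""
    prescribed_keys = {key(p) for p in prescribed}
    return sum(1 for d in dispensed if key(d) in prescribed_keys)


# Hypothetical free-text medication descriptions.
prescribed = ["Salbutamol 100 micrograms  Inhaler", "Beclometasone 50mcg inhaler"]
dispensed = [
    "salbutamol 100mcg inhaler",
    "Beclometasone 50mcg inhaler",
    "Montelukast 10 mg tablets",
]

naive = count_matches(dispensed, prescribed)                      # identical strings only
harmonised = count_matches(dispensed, prescribed, key=harmonise)  # after harmonization
print(naive, harmonised)
```

On this toy data the identical-string comparison finds one match and the harmonized comparison finds two, because the salbutamol records differ only in case, spacing, and unit spelling.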


Author(s):  
Jana Asher ◽  
Dean Resnick ◽  
Jennifer Brite ◽  
Robert Brackbill ◽  
James Cone

Since its post-World War II inception, the science of record linkage has grown exponentially and is used across industrial, governmental, and academic agencies. The academic fields that rely on record linkage are diverse, ranging from history to public health to demography. In this paper, we introduce the different types of data linkage and give a historical context to their development. We then introduce the three types of underlying models for probabilistic record linkage: Fellegi-Sunter-based methods, machine learning methods, and Bayesian methods. Practical considerations, such as data standardization and privacy concerns, are then discussed. Finally, recommendations are given for organizations developing or maintaining record linkage programs, with an emphasis on organizations measuring long-term complications of disasters, such as 9/11.
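
As a rough illustration of the Fellegi-Sunter family of methods named above, the sketch below scores a record pair by summing per-field log-likelihood-ratio weights and classifies it with two thresholds. The m and u probabilities, the threshold values, and the field names are hypothetical choices for the example; in practice they would be estimated from data (for instance with an EM algorithm) rather than set by hand.

```python
import math


def field_weights(m, u):
    """Agreement and disagreement weights (log base 2) for one field.

    m = P(field agrees | the two records are a true match)
    u = P(field agrees | the two records are not a match)
    """
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree


def pair_score(comparison, params):
    """Sum the field weights for a dict mapping field name -> agrees (bool)."""
    total = 0.0
    for field, agrees in comparison.items():
        agree_w, disagree_w = field_weights(*params[field])
        total += agree_w if agrees else disagree_w
    return total


def classify(score, lower, upper):
    """Three-way decision: link, non-link, or possible link for clerical review."""
    if score >= upper:
        return "link"
    if score <= lower:
        return "non-link"
    return "possible link"


# Hypothetical (m, u) values and thresholds, for illustration only.
PARAMS = {"surname": (0.95, 0.01), "birth_year": (0.98, 0.02), "sex": (0.99, 0.5)}
LOWER, UPPER = 0.0, 8.0

all_agree = {"surname": True, "birth_year": True, "sex": True}
surname_differs = {"surname": False, "birth_year": True, "sex": True}
for comparison in (all_agree, surname_differs):
    score = pair_score(comparison, PARAMS)
    print(round(score, 2), classify(score, LOWER, UPPER))
```

Fields on which non-matching records rarely agree by chance (a small u, as for surname here) receive large agreement weights, while a field such as sex contributes little when it agrees but counts heavily against a pair when it disagrees.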


Author(s):  
Yinghao Zhang ◽  
Senlin Xu ◽  
Mingfan Zheng ◽  
Xinran Li

Record linkage is the task of identifying which records refer to the same entity. When records in different data sources do not have a common key and contain typographical errors in their identifier fields, the extended Fellegi–Sunter probabilistic record linkage method proposed by Winkler, which takes field similarity into account, is to our knowledge one of the most effective methods for performing record linkage. However, this method cannot efficiently handle missing values in the fields: an inappropriate weight is assigned to record pairs containing missing data. Therefore, to improve the performance of Winkler’s probabilistic record linkage method in the presence of missing values, we propose a solution that adjusts a record pair’s weight when missing data occur, which enhances the accuracy of Winkler’s record linkage decisions without substantially increasing computation time.
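
The idea of a similarity-aware field weight with special handling of missing values can be sketched as follows. This is not the adjustment proposed in the paper, which the abstract does not specify. The sketch assumes a simplified linear interpolation between the disagreement and agreement weights, uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a string comparator such as Jaro-Winkler, and assigns a neutral weight of zero to missing values, which is one common convention.

```python
import math
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for a Jaro-Winkler comparator."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def field_weight(a, b, m, u):
    """Similarity-scaled Fellegi-Sunter weight for one field of a record pair.

    m and u are the usual agreement probabilities among matches and non-matches.
    """
    if not a or not b:
        # Assumption: a missing value is treated as uninformative (weight 0)
        # instead of being counted as a disagreement.
        return 0.0
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    s = similarity(a, b)
    # Simplified linear interpolation: s = 1 gives the full agreement weight,
    # s = 0 gives the full disagreement weight.
    return disagree + s * (agree - disagree)


# Hypothetical m and u for a surname field.
M, U = 0.95, 0.01
for left, right in [("Smith", "Smith"), ("Smith", "Smyth"), ("Smith", "Jones"), ("Smith", None)]:
    print(left, right, round(field_weight(left, right, M, U), 2))
```

Under the zero-weight convention, a pair with a missing surname is neither rewarded nor penalised for that field, so the decision rests on the remaining fields.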


2019 ◽  
Vol 82 (S 02) ◽  
pp. S131-S138
Author(s):  
Sebastian Bartholomäus ◽  
Yannik Siegert ◽  
Hans Werner Hense ◽  
Oliver Heidinger

Background: The evaluation of population-based screening programs, like the German Mammography Screening Program (MSP), requires collecting and linking data from population-based cancer registries and other sources of the healthcare system on a case-specific level. To link such sensitive data, we developed a method that is compliant with German data protection regulations and does not require written individual consent. Methods: Our method combines a probabilistic record linkage on encrypted identifying data with ‘blinded anonymisation’. It ensures that all data either are encrypted or have a defined and measurable degree of anonymity. The data sources use software to transform plain-text identifying data into a set of irreversibly encrypted person cryptograms, while the evaluation attributes are aggregated in multiple stages and are reversibly encrypted. A pseudonymisation service encrypts the person cryptograms into record assignment numbers, and a downstream data-collecting centre uses them to perform the probabilistic record linkage. The blinded anonymisation solves the problem of quasi-identifiers within the evaluation data. It allows selecting a specific set of the encrypted aggregations to produce a data export with ensured k-anonymity, without any plain-text information. These data are finally transferred to an evaluation centre where they are decrypted and analysed. Our approach allows creating several such generalisations with different resulting suppression rates, so that information depth can be balanced dynamically against privacy protection, and it also highlights how this affects data analysability. Results: German data protection authorities approved our concept for the evaluation of the impact of the German MSP on breast cancer mortality. We implemented a prototype and tested it with 1.5 million simulated records containing realistically distributed identifying data, and calculated different generalisations and the respective suppression rates. Here, we also discuss limitations for large data sets in the cancer registry domain, as well as approaches for further improvements, such as l-diversity, and how to reduce the amount of manual post-processing. Conclusion: Our approach enables secure linking of data from population-based cancer registries and other sources of the healthcare system. Despite some limitations, it enables evaluation of the German MSP and can be generalised to be applicable to other projects.
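
The trade-off between generalisation and suppression rate under k-anonymity can be illustrated with a minimal plain-text sketch. It does not reproduce the blinded anonymisation described above, which operates on encrypted aggregations; the record layout, the age-band generalisation, and the function names are hypothetical.

```python
from collections import Counter


def suppress_for_k_anonymity(records, quasi_identifiers, k):
    """Keep only records whose quasi-identifier combination occurs at least k times.

    Returns the kept records and the suppression rate (share of records dropped).
    """
    def key(rec):
        return tuple(rec[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    kept = [r for r in records if counts[key(r)] >= k]
    rate = 1 - len(kept) / len(records) if records else 0.0
    return kept, rate


def generalise_age(rec, width):
    """Replace the exact age with an age band of the given width."""
    low = (rec["age"] // width) * width
    return {**rec, "age": f"{low}-{low + width - 1}"}


# Toy evaluation data with two quasi-identifiers (hypothetical values).
records = [
    {"age": 51, "region": "A"}, {"age": 53, "region": "A"}, {"age": 58, "region": "A"},
    {"age": 62, "region": "A"}, {"age": 67, "region": "B"}, {"age": 74, "region": "B"},
]

# Coarser age bands lose detail but require fewer records to be suppressed.
for width in (5, 10, 20):
    generalised = [generalise_age(r, width) for r in records]
    kept, rate = suppress_for_k_anonymity(generalised, ["age", "region"], k=2)
    print(width, len(kept), round(rate, 2))
```

On this toy data, 5-year bands keep 2 of 6 records, 10-year bands keep 3, and 20-year bands keep 5, which mirrors the balance between information depth and privacy protection that the abstract describes.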

