Privacy Preserving Probabilistic Record Linkage Without Trusted Third Party

Author(s):  
Ibrahim Lazrig ◽  
Toan C. Ong ◽  
Indrajit Ray ◽  
Indrakshi Ray ◽  
Xiaoqian Jiang ◽  
...  
Author(s):  
Sebastian Stammler ◽  
Tobias Kussel ◽  
Phillipp Schoppmann ◽  
Florian Stampe ◽  
Galina Tremper ◽  
...  

Abstract Motivation Record Linkage has versatile applications in real-world data analysis contexts, where several data sets need to be linked on the record level in the absence of any exact identifier connecting related records. An example are medical databases of patients, spread across institutions, that have to be linked on personally identifiable entries like name, date of birth or ZIP code. At the same time, privacy laws may prohibit the exchange of this personally identifiable information (PII) across institutional boundaries, ruling out the outsourcing of the record linkage task to a trusted third party. We propose to employ privacy-preserving record linkage (PPRL) techniques that prevent, to various degrees, the leakage of PII while still allowing for the linkage of related records. Results We develop a framework for fault-tolerant PPRL using secure multi-party computation with the medical record keeping software Mainzelliste as the data source. Our solution does not rely on any trusted third party and all PII is guaranteed to not leak under common cryptographic security assumptions. Benchmarks show the feasibility of our approach in realistic networking settings: linkage of a patient record against a database of 10.000 records can be done in 48s over a heavily delayed (100ms) network connection, or 3.9s with a low-latency connection. Availability and implementation The source code of the sMPC node is freely available on Github at https://github.com/medicalinformatics/SecureEpilinker subject to the AGPLv3 license. The source code of the modified Mainzelliste is available at https://github.com/medicalinformatics/MainzellisteSEL.


2021 ◽  
Author(s):  
Christopher Hampf ◽  
Martin Bialke ◽  
Hauke Hund ◽  
Christian Fegeler ◽  
Stefan Lang ◽  
...  

Abstract BackgroundThe Federal Ministry of Research and Education funded the Network of University Medicine for establishing an infrastructure for pandemic research. This includes the development of a COVID-19 Data Exchange Platform (CODEX) that provides standardised and harmonised data sets for COVID-19 research. Nearly all university hospitals in Germany are part of the project and transmit medical data from the local data integration centres to the CODEX platform. The medical data on a person that has been collected at several sites is to be made available on the CODEX platform in a merged form. To enable this, a federated trusted third party (fTTP) will be established, which will allow the pseudonymised merging of the medical data. The fTTP implements privacy preserving record linkage based on Bloom filters and assigns pseudonyms to enable re-pseudonymisation during data transfer to the CODEX platform.ResultsThe fTTP was implemented conceptually and technically. For this purpose, the processes that are necessary for data delivery were modelled. The resulting communication relationships were identified and corresponding interfaces were specified. These were developed according to the specifications in FHIR and validated with the help of external partners. Existing tools such as the identity management system E-PIX® were further developed accordingly so that sites can generate Bloom filters based on person identifying information. An extension for the comparison of Bloom filters was implemented for the federated trust third party. The correct implementation was shown in the form of a demonstrator and the connection of two data integration centres.ConclusionsThis article describes how the fTTP was modelled and implemented. In a first expansion stage, the fTTP was exemplarily connected through two sites and its functionality was demonstrated. Further expansion stages, which are already planned, have been technically specified and will be implemented in the future in order to also handle cases in which the privacy preserving record linkage achieves ambiguous results. The first expansion stage of the fTTP is available in the University Medicine network and will be connected by all participating sites in the ongoing test phase.


2021 ◽  
Author(s):  
Julia Bohlius ◽  
Lina Bartels ◽  
Frédérique Chammartin ◽  
Victor Olago ◽  
Adrian Spoerri ◽  
...  

Background: Privacy-preserving probabilistic record linkage (PPPRL) methods were developed and applied in high-income countries to link records within and between organizations under strict privacy protections. PPPRL has not yet been used in African settings.Methods: We used HIV-related laboratory records from National Health Laboratory Services (NHLS) in South Africa to construct a cohort of HIV-positive patients and link them to the National Cancer Registry (NCR) with PPPRL. The study was restricted to Gauteng province from 2004 to 2014. We used records with national IDs (gold standard) to determine precision, recall, and f-measure of the linkages. We included all patients with ≥ 2 HIV-related lab records measured in the cohort and assessed the number of cancers diagnosed in people living with HIV (PLWH).Results: We included 11,480,118 HIV-related laboratory records and 664,869 cancer records in the linkage. We included 1,173,908 persons in the HIV cohort; 66.6% were female and median age at first HIV-related lab test was 33.9 years (IQR 27.4-41.3). Of the patients in the cohort, 26,348 were diagnosed with at least one cancer and 8,329 of these cancers were diagnosed before or on the date of the patient’s first HIV-related record; 18,019 were diagnosed after their first HIV-related record. For all linkages, precision, recall, and f-measures were high.Conclusion: Our study showed it is feasible to use PPPRL in an African setting to link routinely collected health records from different data sources and create a longitudinal HIV cohort with cancer outcomes while strictly protecting patient privacy. This work served as the foundation to create a nationwide population-based cohort including all South African provinces which will be used to inform cancer control programs.


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Florens Rohde ◽  
Martin Franke ◽  
Ziad Sehili ◽  
Martin Lablans ◽  
Erhard Rahm

Abstract Background Data analysis for biomedical research often requires a record linkage step to identify records from multiple data sources referring to the same person. Due to the lack of unique personal identifiers across these sources, record linkage relies on the similarity of personal data such as first and last names or birth dates. However, the exchange of such identifying data with a third party, as is the case in record linkage, is generally subject to strict privacy requirements. This problem is addressed by privacy-preserving record linkage (PPRL) and pseudonymization services. Mainzelliste is an open-source record linkage and pseudonymization service used to carry out PPRL processes in real-world use cases. Methods We evaluate the linkage quality and performance of the linkage process using several real and near-real datasets with different properties w.r.t. size and error-rate of matching records. We conduct a comparison between (plaintext) record linkage and PPRL based on encoded records (Bloom filters). Furthermore, since the Mainzelliste software offers no blocking mechanism, we extend it by phonetic blocking as well as novel blocking schemes based on locality-sensitive hashing (LSH) to improve runtime for both standard and privacy-preserving record linkage. Results The Mainzelliste achieves high linkage quality for PPRL using field-level Bloom filters due to the use of an error-tolerant matching algorithm that can handle variances in names, in particular missing or transposed name compounds. However, due to the absence of blocking, the runtimes are unacceptable for real use cases with larger datasets. The newly implemented blocking approaches improve runtimes by orders of magnitude while retaining high linkage quality. Conclusion We conduct the first comprehensive evaluation of the record linkage facilities of the Mainzelliste software and extend it with blocking methods to improve its runtime. We observed a very high linkage quality for both plaintext as well as encoded data even in the presence of errors. The provided blocking methods provide order of magnitude improvements regarding runtime performance thus facilitating the use in research projects with large datasets and many participants.


Sign in / Sign up

Export Citation Format

Share Document