Federated Trusted Third Party as an Approach for Privacy Preserving Record Linkage in a Large Network of University Medicines in Pandemic Research

Abstract Background The Federal Ministry of Education and Research funded the Network of University Medicine to establish an infrastructure for pandemic research. This includes the development of a COVID-19 Data Exchange Platform (CODEX) that provides standardised and harmonised data sets for COVID-19 research. Nearly all university hospitals in Germany are part of the project and transmit medical data from their local data integration centres to the CODEX platform. Medical data on a person that have been collected at several sites are to be made available on the CODEX platform in merged form. To enable this, a federated trusted third party (fTTP) will be established, which will allow the pseudonymised merging of the medical data. The fTTP implements privacy preserving record linkage based on Bloom filters and assigns pseudonyms to enable re-pseudonymisation during data transfer to the CODEX platform. Results The fTTP was implemented conceptually and technically. For this purpose, the processes necessary for data delivery were modelled. The resulting communication relationships were identified and corresponding interfaces were specified. These interfaces were developed according to the FHIR specification and validated with the help of external partners. Existing tools such as the identity management system E-PIX® were extended so that sites can generate Bloom filters from person-identifying information. An extension for the comparison of Bloom filters was implemented for the federated trusted third party. The correct implementation was shown with a demonstrator and by connecting two data integration centres. Conclusions This article describes how the fTTP was modelled and implemented. In a first expansion stage, the fTTP was connected to two sites as an example, and its functionality was demonstrated.
Further expansion stages, which are already planned, have been technically specified and will be implemented in the future to also handle cases in which privacy preserving record linkage yields ambiguous results. The first expansion stage of the fTTP is available in the University Medicine network, and all participating sites will connect to it during the ongoing test phase.
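
The abstract states that sites generate Bloom filters from person-identifying information and that the fTTP compares them. A minimal sketch of that idea follows, assuming the widely used bigram encoding (padded character bigrams, keyed double hashing with HMAC-SHA1 and HMAC-MD5) and the Dice coefficient as similarity. The parameters m = 500 and k = 20, the shared secret, and all function names are illustrative assumptions, not the E-PIX® or fTTP implementation.

```python
from __future__ import annotations

import hashlib
import hmac


def bigrams(value: str) -> set[str]:
    """Normalise a value and split it into padded character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode_field(value: str, secret: bytes, m: int = 500, k: int = 20) -> set[int]:
    """Return the positions set to 1 in an m-bit field-level Bloom filter.

    Each bigram is mapped to k positions by double hashing,
    g_i = (h1 + i * h2) mod m, with h1 and h2 from keyed HMAC-SHA1 and HMAC-MD5.
    """
    positions: set[int] = set()
    for gram in bigrams(value):
        msg = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(secret, msg, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(secret, msg, hashlib.md5).digest(), "big")
        for i in range(k):
            positions.add((h1 + i * h2) % m)
    return positions


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient: 2 * |intersection| / (|A| + |B|) of two Bloom filters."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical usage: both sites share the secret; only the encodings reach the fTTP.
SECRET = b"hypothetical-shared-secret"
site_a = encode_field("Meier", SECRET)
site_b = encode_field("Meyer", SECRET)
print(round(dice(site_a, site_b), 2))
```

Because similar names share most bigrams, their filters share most set bits, so the fTTP can score "Meier" against "Meyer" as highly similar without ever seeing either name.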

Mainzelliste SecureEpiLinker (MainSEL): Privacy-Preserving Record Linkage using Secure Multi-Party Computation

Bioinformatics, 2020. DOI: 10.1093/bioinformatics/btaa764
Author(s): Sebastian Stammler, Tobias Kussel, Phillipp Schoppmann, Florian Stampe, Galina Tremper, ...
Keyword(s): Record Linkage, Fault Tolerant, Source Code, Privacy Preserving, Third Party, Data Sets, Real World Data, Trusted Third Party, Record Keeping, Network Connection

Abstract Motivation Record linkage has versatile applications in real-world data analysis, where several data sets need to be linked at the record level in the absence of any exact identifier connecting related records. One example is patient databases spread across institutions that have to be linked on personally identifiable entries such as name, date of birth or ZIP code. At the same time, privacy laws may prohibit the exchange of this personally identifiable information (PII) across institutional boundaries, ruling out outsourcing the record linkage task to a trusted third party. We propose to employ privacy-preserving record linkage (PPRL) techniques that prevent, to various degrees, the leakage of PII while still allowing related records to be linked. Results We develop a framework for fault-tolerant PPRL using secure multi-party computation (sMPC), with the medical record-keeping software Mainzelliste as the data source. Our solution does not rely on any trusted third party, and all PII is guaranteed not to leak under common cryptographic security assumptions. Benchmarks show the feasibility of our approach in realistic network settings: linking a patient record against a database of 10,000 records takes 48 s over a heavily delayed (100 ms) network connection, or 3.9 s over a low-latency connection. Availability and implementation The source code of the sMPC node is freely available on GitHub at https://github.com/medicalinformatics/SecureEpilinker under the AGPLv3 license. The source code of the modified Mainzelliste is available at https://github.com/medicalinformatics/MainzellisteSEL.
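
The abstract does not detail the sMPC protocol. As a hedged illustration of one basic building block used for arithmetic computation in many sMPC frameworks, the sketch below shows two-party additive secret sharing over the ring of integers modulo 2^32: each share alone looks random, yet linear steps such as a weighted sum of field similarities with public weights can be computed locally on the shares. The scaled similarities, the weights and all names are hypothetical, and the similarities are assumed to be already secret-shared; the threshold comparison that decides a match would need an interactive protocol that is not shown.

```python
from __future__ import annotations

import secrets

MOD = 2 ** 32  # shares live in the ring of integers modulo 2^32; arithmetic wraps


def share(x: int) -> tuple[int, int]:
    """Split x into two random-looking shares with (s0 + s1) mod MOD == x."""
    s0 = secrets.randbelow(MOD)
    return s0, (x - s0) % MOD


def reconstruct(s0: int, s1: int) -> int:
    """Recombine two shares; only possible when both parties cooperate."""
    return (s0 + s1) % MOD


def local_weighted_sum(my_shares: list[int], weights: list[int]) -> int:
    """Linear step done by each party on its own shares, without communication."""
    return sum(w * s for w, s in zip(weights, my_shares)) % MOD


# Hypothetical field similarities scaled to integers (1000 = exact match),
# assumed to be already secret-shared between the two parties.
similarities = [1000, 750, 0]
weights = [3, 2, 1]  # public, hypothetical field weights
pairs = [share(s) for s in similarities]
party0 = local_weighted_sum([p[0] for p in pairs], weights)
party1 = local_weighted_sum([p[1] for p in pairs], weights)
print(reconstruct(party0, party1))  # prints 4500 (3*1000 + 2*750 + 1*0)
```

Neither party learns an individual similarity from its own shares; only the recombined weighted sum is revealed, and only if both parties agree to reconstruct it.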

Optimization of the Mainzelliste software for fast privacy-preserving record linkage

Journal of Translational Medicine, Vol 19 (1), 2021. DOI: 10.1186/s12967-020-02678-1
Author(s): Florens Rohde, Martin Franke, Ziad Sehili, Martin Lablans, Erhard Rahm
Keyword(s): Record Linkage, Comprehensive Evaluation, Personal Data, Privacy Preserving, Use Cases, Locality Sensitive Hashing, Third Party, Bloom Filters, Linkage Quality, Order Of Magnitude

Abstract Background Data analysis for biomedical research often requires a record linkage step to identify records from multiple data sources referring to the same person. Due to the lack of unique personal identifiers across these sources, record linkage relies on the similarity of personal data such as first and last names or birth dates. However, the exchange of such identifying data with a third party, as is the case in record linkage, is generally subject to strict privacy requirements. This problem is addressed by privacy-preserving record linkage (PPRL) and pseudonymization services. Mainzelliste is an open-source record linkage and pseudonymization service used to carry out PPRL processes in real-world use cases. Methods We evaluate the linkage quality and performance of the linkage process using several real and near-real datasets with different properties w.r.t. size and error-rate of matching records. We conduct a comparison between (plaintext) record linkage and PPRL based on encoded records (Bloom filters). Furthermore, since the Mainzelliste software offers no blocking mechanism, we extend it by phonetic blocking as well as novel blocking schemes based on locality-sensitive hashing (LSH) to improve runtime for both standard and privacy-preserving record linkage. Results The Mainzelliste achieves high linkage quality for PPRL using field-level Bloom filters due to the use of an error-tolerant matching algorithm that can handle variances in names, in particular missing or transposed name compounds. However, due to the absence of blocking, the runtimes are unacceptable for real use cases with larger datasets. The newly implemented blocking approaches improve runtimes by orders of magnitude while retaining high linkage quality. Conclusion We conduct the first comprehensive evaluation of the record linkage facilities of the Mainzelliste software and extend it with blocking methods to improve its runtime. 
We observed very high linkage quality for both plaintext and encoded data, even in the presence of errors. The blocking methods provided improve runtime performance by orders of magnitude, thus facilitating use in research projects with large datasets and many participants.
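
The abstract mentions blocking schemes based on locality-sensitive hashing (LSH) for Bloom filters but not their exact design. A minimal sketch, assuming Hamming-distance LSH by bit sampling (one common LSH family for bit vectors): each table samples a fixed random subset of bit positions, and two encoded records become a candidate pair if they agree on all sampled positions in at least one table. The parameters and function names are hypothetical, not those evaluated in the paper.

```python
from __future__ import annotations

import random
from collections import defaultdict


def make_tables(m: int, num_tables: int, bits_per_key: int, seed: int = 42) -> list[list[int]]:
    """Draw, per LSH table, a random subset of Bloom filter bit positions."""
    rng = random.Random(seed)
    return [rng.sample(range(m), bits_per_key) for _ in range(num_tables)]


def blocking_keys(bits: set[int], tables: list[list[int]]) -> list[tuple]:
    """One key per table: the table id plus the filter's values at the sampled positions."""
    return [(t, tuple(int(p in bits) for p in positions))
            for t, positions in enumerate(tables)]


def candidate_pairs(source_a: dict, source_b: dict, tables: list[list[int]]) -> set[tuple]:
    """Return (id_a, id_b) pairs that share at least one blocking key."""
    index: dict = defaultdict(set)
    for rid, bits in source_b.items():
        for key in blocking_keys(bits, tables):
            index[key].add(rid)
    pairs = set()
    for rid, bits in source_a.items():
        for key in blocking_keys(bits, tables):
            pairs.update((rid, other) for other in index.get(key, ()))
    return pairs


# Hypothetical usage with Bloom filters given as sets of 1-bit positions.
tables = make_tables(m=500, num_tables=10, bits_per_key=16)
a = {"a1": {3, 17, 42, 99}}
b = {"b1": {3, 17, 42, 99}, "b2": set(range(200, 260))}
print(candidate_pairs(a, b, tables))
```

Only candidate pairs are then compared in full. More sampled bits per key make blocks more selective (fewer comparisons), while more tables raise the chance that truly similar records share at least one key.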

Designing and piloting a generic research architecture and workflows to unlock German primary care data for secondary use

Journal of Translational Medicine, Vol 18 (1), 2020. DOI: 10.1186/s12967-020-02547-x
Author(s): Thomas Bahls, Johannes Pung, Stephanie Heinemann, Johannes Hauswaldt, Iris Demmer, ...
Keyword(s): Primary Care, Data Integration, Record Linkage, Patient Reported Outcomes, Medical Data, Third Party, Family Doctors, Analysis Phase, Patient Reported, Research Architecture

Abstract Background Medical data from family doctors are of great importance to health care researchers but seem to be locked in German practices and are thus underused in research. The RADAR project (Routine Anonymized Data for Advanced Health Services Research) aims to design, implement and pilot a generic research architecture, technical software solutions, and procedures and workflows to unlock data from family doctors’ practices. A long-term medical data repository for research that takes legal requirements into account is established. Thereby, RADAR helps to close the gap to other European countries and to contribute primary care data from Germany. Methods The RADAR project comprises three phases: (1) analysis phase, (2) design phase, and (3) pilot. First, interdisciplinary workshops were held to list prerequisites and requirements. Second, an architecture diagram with building blocks and functions, and an ordered list of process steps (workflow) for data capture and storage were designed. Third, technical components and workflows were piloted. The pilot was extended by a data integration workflow using patient-reported outcomes (paper-based questionnaires). Results The analysis phase resulted in a list of 17 essential prerequisites and guiding requirements for data management compliant with the General Data Protection Regulation (GDPR). Based on this list, existing approaches to fulfil the RADAR tasks were evaluated, for example re-using the BDT interface for data exchange and a Trusted Third Party approach for consent management and record linkage. Consented data sets of 100 patients were successfully exported, separated into person-identifying and medical data, pseudonymised and saved. Record linkage and data integration workflows for patient-reported outcomes in the RADAR research database were successfully piloted for 63 responders.
Conclusion The RADAR project successfully developed a generic architecture together with a technical framework of tools, interfaces, and workflows for a complete infrastructure for practicable and secure processing of patient data from family doctors. All technical components and workflows can be reused for further research projects. Additionally, a Trusted Third Party approach can be used as a core element to implement data privacy protection in such heterogeneous family doctors’ settings. Identified optimisations include fully electronic consent recording on tablet computers, which is part of the project’s extension phase.
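
The results describe records being separated into person-identifying and medical data and then pseudonymised by a Trusted Third Party. A minimal sketch of that workflow, assuming a hypothetical set of identifying fields (IDAT) and a hypothetical TTP class that alone holds the identity-to-pseudonym mapping and issues random pseudonyms. It matches identities exactly after normalisation, whereas a real TTP would use error-tolerant record linkage; all field names and example values are invented for illustration.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field

# Hypothetical set of person-identifying attributes (IDAT); everything else is MDAT.
IDAT_FIELDS = {"first_name", "last_name", "date_of_birth", "zip_code"}


def split_record(record: dict) -> tuple[dict, dict]:
    """Separate person-identifying data (IDAT) from medical data (MDAT)."""
    idat = {k: v for k, v in record.items() if k in IDAT_FIELDS}
    mdat = {k: v for k, v in record.items() if k not in IDAT_FIELDS}
    return idat, mdat


@dataclass
class TrustedThirdParty:
    """Hypothetical TTP: the only party holding the identity-to-pseudonym mapping."""
    _pseudonyms: dict = field(default_factory=dict)

    def pseudonymise(self, idat: dict) -> str:
        # Exact matching on normalised IDAT; a real TTP would use error-tolerant linkage.
        key = tuple(sorted((k, str(v).strip().lower()) for k, v in idat.items()))
        if key not in self._pseudonyms:
            self._pseudonyms[key] = "PSN-" + secrets.token_hex(8)
        return self._pseudonyms[key]


ttp = TrustedThirdParty()
record = {"first_name": "Anna", "last_name": "Schmidt", "date_of_birth": "1980-01-02",
          "zip_code": "37073", "diagnosis": "J45.9"}
idat, mdat = split_record(record)
research_row = {"psn": ttp.pseudonymise(idat), **mdat}  # no IDAT reaches the research database
print(research_row)
```

A later questionnaire from the same person yields the same pseudonym, which is what allows patient-reported outcomes to be linked to the exported practice data in the research database.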

A Federated Record Linkage Algorithm for Secure Medical Data Sharing

German Medical Data Sciences: Bringing Data to Life - Studies in Health Technology and Informatics, 2021. DOI: 10.3233/shti210062
Author(s): Christian M. Heidt, Hauke Hund, Christian Fegeler
Keyword(s): Data Sharing, Record Linkage, Medical Records, Data Privacy, Medical Data, Third Party, Bloom Filters, Data Set, Optimal Weights, Linkage Algorithm

The process of consolidating medical records from multiple institutions into one data set makes privacy-preserving record linkage (PPRL) a necessity. Most PPRL approaches, however, are designed to link records from only two institutions, and existing multi-party approaches tend to discard non-matching records, leading to incomplete result sets. In this paper, we propose a new algorithm for federated record linkage between multiple parties by a trusted third party, using record-level Bloom filters to preserve patient data privacy. We conduct a study to find optimal weights for linkage-relevant data fields and achieve 99.5% linkage accuracy on the Febrl record linkage dataset. This approach is integrated into an end-to-end pseudonymization framework for medical data sharing.
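
The abstract does not say how the field weights enter the record-level Bloom filter or how clusters are formed. A minimal sketch under stated assumptions: weights are expressed as a hypothetical number of hash functions per field (more hash functions give a field more influence on the Dice score), and the trusted third party greedily assigns each incoming record to the most similar cluster above a hypothetical threshold, opening a new cluster otherwise, so non-matching records are kept as singletons rather than discarded. Each source is assumed to be free of internal duplicates.

```python
from __future__ import annotations

import hashlib
import hmac

M = 1000  # hypothetical filter length in bits
SECRET = b"hypothetical-shared-secret"
# Hypothetical weights: more hash functions give a field more influence on the Dice score.
HASHES_PER_FIELD = {"first_name": 10, "last_name": 15, "date_of_birth": 20}


def bigrams(value: str) -> set[str]:
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode_record(record: dict) -> set[int]:
    """Record-level Bloom filter: all fields are hashed into one M-bit filter.

    The field name is part of the hashed message, so equal bigrams in
    different fields map to different positions.
    """
    bits: set[int] = set()
    for name, k in HASHES_PER_FIELD.items():
        value = str(record.get(name, "")).strip()
        if not value:
            continue  # missing values set no bits
        for gram in bigrams(value):
            for i in range(k):
                msg = f"{name}|{gram}|{i}".encode("utf-8")
                digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
                bits.add(int.from_bytes(digest[:8], "big") % M)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0


def link(parties: dict[str, list[set[int]]], threshold: float = 0.8) -> list[list[tuple[str, int]]]:
    """Greedy multi-party clustering at the trusted third party.

    A record joins the most similar cluster scoring at least `threshold` that has
    no record from the same party yet; otherwise it opens a new cluster.
    """
    clusters: list[dict] = []
    for party, filters in parties.items():
        for idx, bits in enumerate(filters):
            best, best_score = None, threshold
            for cluster in clusters:
                if any(p == party for p, _ in cluster["members"]):
                    continue
                score = dice(bits, cluster["representative"])
                if score >= best_score:
                    best, best_score = cluster, score
            if best is None:
                clusters.append({"representative": bits, "members": [(party, idx)]})
            else:
                best["members"].append((party, idx))
    return [cluster["members"] for cluster in clusters]


parties = {
    "A": [encode_record({"first_name": "Anna", "last_name": "Schmidt", "date_of_birth": "1980-01-02"})],
    "B": [encode_record({"first_name": "Anna", "last_name": "Schmit", "date_of_birth": "1980-01-02"})],
    "C": [encode_record({"first_name": "Peter", "last_name": "Weber", "date_of_birth": "1975-06-30"})],
}
print(link(parties))
```

Here the misspelt "Schmit" from party B should join party A's cluster, while party C's unrelated record stays in its own cluster instead of being dropped from the result set.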

Privacy Preserving Probabilistic Record Linkage Without Trusted Third Party

2018 16th Annual Conference on Privacy, Security and Trust (PST), 2018. DOI: 10.1109/pst.2018.8514192. Cited by ~1
Author(s): Ibrahim Lazrig, Toan C. Ong, Indrajit Ray, Indrakshi Ray, Xiaoqian Jiang, ...
Keyword(s): Record Linkage, Privacy Preserving, Third Party, Trusted Third Party, Probabilistic Record Linkage

Securing Bloom Filters for Privacy-preserving Record Linkage

Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020. DOI: 10.1145/3340531.3412105
Author(s): Thilina Ranbaduge, Rainer Schnell
Keyword(s): Record Linkage, Privacy Preserving, Bloom Filters

Encoding Hierarchical Classification Codes for Privacy-Preserving Record Linkage Using Bloom Filters

Machine Learning and Knowledge Discovery in Databases - Communications in Computer and Information Science, 2020, pp. 142-156. DOI: 10.1007/978-3-030-43887-6_12
Author(s): Rainer Schnell, Christian Borgs
Keyword(s): Record Linkage, Hierarchical Classification, Privacy Preserving, Bloom Filters

Efficient Cryptanalysis of Bloom Filters for Privacy-Preserving Record Linkage

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science, 2017, pp. 628-640. DOI: 10.1007/978-3-319-57454-7_49. Cited by ~11
Author(s): Peter Christen, Rainer Schnell, Dinusha Vatsalan, Thilina Ranbaduge
Keyword(s): Record Linkage, Privacy Preserving, Bloom Filters

Secure Channel for Sharing Datasets by using Privacy-Preserving Integration Method

Journal of Computer Based Parallel Programming, Vol 6 (2), 2021. DOI: 10.46610/jocpp.2021.v06i02.006
Author(s): G Sriman Narayana, Kuruva Arjun Kumar
Keyword(s): Data Integration, Integration Method, Privacy Preserving, The Other, Third Party, Traffic Data, Structure Preserving, Efficient Approach, The Third, Secure Channel

In privacy-enhancing technology, it has been inherently challenging to strike a balance between privacy, efficiency and usability (utility). We propose a highly practical and efficient approach for privacy-preserving integration and sharing of datasets among a group of participants. At the heart of our solution is a new interactive protocol, Secure Channel. Through Secure Channel, each participant is able to randomize their dataset via an independent and untrusted third party, such that the resulting dataset can be merged with the randomized datasets contributed by the other participants in the group in a privacy-preserving manner. Our process does not require any public keys or key sharing between participants in order to integrate different datasets. This, in turn, leads to a scalable solution that is easy to understand and use. Moreover, the accuracy of a randomized dataset returned by the third party can be securely verified by the other participants in the group. We further demonstrate Secure Channel’s general utility by using it to construct a structure-preserving data integration protocol, which is particularly useful for high-quality integration of network traffic data.

Secure Privacy Preserving Record Linkage of Large Databases by Modified Bloom Filter Encodings

International Journal for Population Data Science, Vol 1 (1), 2017. DOI: 10.23889/ijpds.v1i1.29. Cited by ~2
Author(s): Rainer Schnell, Christian Borgs
Keyword(s): Record Linkage, Large Scale, Bloom Filter, Privacy Preserving, Error Rates, Bloom Filters, Data Sets, Research Subjects, Practical Applications, Large Databases

ABSTRACT Objective In most European settings, record linkage across different institutions has to be based on personal identifiers such as names, date of birth or place of birth. To protect the privacy of research subjects, the identifiers have to be encrypted. In practice, these identifiers show error rates of up to 20% per identifier; therefore, linking on encrypted identifiers usually implies the loss of large subsets of the databases. In many applications, this loss of cases is related to variables of interest for the subject matter of the study, so this kind of record linkage will generate biased estimates. These problems gave rise to techniques of Privacy Preserving Record Linkage (PPRL). Many different PPRL techniques have been suggested within the last 10 years, but very few of them are suitable for practical applications with large databases containing millions of records, as are typical of administrative or medical databases. One proven technique for large-scale PPRL is based on Bloom filters. Method Using appropriate parameter settings, Bloom filter approaches show linkage results comparable to linkage based on unencrypted identifiers. Furthermore, this approach has been used in real-world settings with data sets containing up to 100 million records. With suitable blocking strategies, linking can be done in reasonable time. Result However, Bloom filters have been the subject of cryptographic attacks. Previous research has shown that the straightforward application of Bloom filters carries a nonzero re-identification risk. We will present new results on recently developed techniques to resist all known attacks on PPRL Bloom filters. These computationally simple algorithms modify the identifiers by different cryptographic diffusion techniques. The presentation will demonstrate these new algorithms and show their performance with respect to precision, recall and re-identification risk on large databases.
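
The abstract does not name the hardening algorithms it presents. As a hedged illustration only, the sketch shows two simple XOR-based ideas for hardening Bloom filters: XOR-folding (XOR-ing the first and second half of the filter) and a diffusion layer in which each output bit is the XOR of several secretly chosen input bits. Because XOR is linear, a single differing input bit changes exactly those output bits whose tap sets contain it, so similar filters stay similar while individual bit positions no longer map directly to bigrams. The seed standing in for a shared key, the tap count and all names are assumptions; `random.Random` is not cryptographically secure and is used here only for reproducibility.

```python
from __future__ import annotations

import random


def xor_fold(bf: list[int]) -> list[int]:
    """XOR-folding: XOR the first and second half of an even-length filter."""
    half = len(bf) // 2
    return [bf[i] ^ bf[i + half] for i in range(half)]


def make_taps(m: int, out_len: int, t: int, seed: int) -> list[list[int]]:
    """Secret tap sets: each output bit depends on t randomly chosen input bits.

    The seed stands in for a key shared by the data holders (an assumption);
    a real deployment would derive taps from a cryptographically secure source.
    """
    rng = random.Random(seed)
    return [rng.sample(range(m), t) for _ in range(out_len)]


def diffuse(bf: list[int], taps: list[list[int]]) -> list[int]:
    """Diffusion layer: output bit j is the XOR of bf at the positions in taps[j]."""
    out = []
    for positions in taps:
        bit = 0
        for p in positions:
            bit ^= bf[p]
        out.append(bit)
    return out


def hamming(a: list[int], b: list[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(7)
original = [rng.randint(0, 1) for _ in range(1000)]
typo = original.copy()
typo[123] ^= 1  # a single differing bit, e.g. from one changed bigram hash
taps = make_taps(m=1000, out_len=1000, t=3, seed=2024)
d_orig, d_typo = diffuse(original, taps), diffuse(typo, taps)
affected = sum(123 in positions for positions in taps)
print(hamming(d_orig, d_typo) == affected)  # prints True
```

The number of taps per output bit controls the trade-off: more taps spread each input bit further (harder to attack bit patterns) but also amplify small input differences, which lowers similarity between near-matching records.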
