Designing and Implementing a Privacy Preserving Record Linkage Protocol

Introduction: The Ontario Brain Institute (OBI) has developed Brain-CODE, an informatics platform that supports the acquisition, storage, management, and analysis of multi-modal data. The standardized research data within Brain-CODE span several brain disorders, allowing integrative analyses, and also provide the opportunity to leverage existing clinical administrative data holdings through external linkages.

Objectives and Approach: Within Ontario, most individuals who access the healthcare system have a unique identifier, the Ontario Health Insurance Plan (OHIP) number. The OHIP number can facilitate linkages with administrative data holdings, such as those at the Institute for Clinical Evaluative Sciences (ICES). Because Ontario's privacy legislation does not permit OBI to hold OHIP numbers, identifiers for consented participants are encrypted with a public key upon entry into Brain-CODE, while the private key remains inaccessible. To facilitate linkages on OHIP numbers between Brain-CODE and ICES, members of the Indoc Consortium co-developed the Brain-CODE Link software.

Results: Brain-CODE Link performs a deterministic linkage between encrypted identifiers (OHIP numbers) without revealing participant identity. The same homomorphic encryption algorithm that is applied to identifiers upon entry into Brain-CODE is applied to the relevant identifiers within ICES data holdings. Encrypted identifiers from Brain-CODE are securely transferred to ICES, where a comparison step computes the differences between the two encrypted sets. These differences are sent to a semi-trusted third party, which has no access to the original data and decrypts the differences using the private key. A zero difference indicates a pair of matching identifiers. One of the main challenges during testing and development of Brain-CODE Link was ensuring that the software could scale to the population level, performing a large number of comparisons in a computationally efficient manner.

Conclusion/Implications: Ongoing pilot projects in epilepsy, neurodevelopmental disorders, and neurodegeneration will be the first examples of linkages between Brain-CODE and ICES. Brain-CODE Link has successfully performed several billion test comparisons, indicating its suitability as a scalable privacy-preserving record linkage tool to support comprehensive analyses.
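The abstract does not name the homomorphic scheme used by Brain-CODE Link. The differencing step can be illustrated with an additively homomorphic cryptosystem such as Paillier; the sketch below assumes that choice, uses deliberately small and insecure toy primes, and multiplies each encrypted difference by a random factor so that a non-zero result reveals nothing about the identifiers. The function names (`keygen`, `blinded_difference`, `find_matches`) and the 10-digit identifiers are hypothetical, not Brain-CODE Link's API or data.

```python
import math
import secrets

# Toy Paillier keys from two Mersenne primes (2**61 - 1, 2**89 - 1).
# Far too small and too structured for real use; illustration only.
P, Q = 2**61 - 1, 2**89 - 1


def keygen(p: int = P, q: int = Q):
    """Return (public modulus n, private key (lam, mu)) using generator g = n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu)


def _random_unit(n: int) -> int:
    """Sample r in [1, n) with gcd(r, n) == 1."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r


def encrypt(n: int, m: int) -> int:
    """Encrypt with the public key only: c = (1 + m*n) * r^n mod n^2."""
    n2 = n * n
    return (1 + m * n) * pow(_random_unit(n), n, n2) % n2


def decrypt(n: int, sk, c: int) -> int:
    """Decrypt; only the holder of the private key (here: the third party) can do this."""
    lam, mu = sk
    n2 = n * n
    return (pow(c, lam, n2) - 1) // n * mu % n


def blinded_difference(n: int, c_a: int, c_b: int) -> int:
    """From E(a) and E(b), compute E(k * (a - b)) for a fresh random unit k."""
    n2 = n * n
    diff = c_a * pow(c_b, -1, n2) % n2  # E(a - b mod n)
    return pow(diff, _random_unit(n), n2)


def find_matches(n: int, sk, source_cts, target_cts):
    """Index pairs whose blinded difference decrypts to zero (i.e. equal identifiers)."""
    return [
        (i, j)
        for i, c_a in enumerate(source_cts)
        for j, c_b in enumerate(target_cts)
        if decrypt(n, sk, blinded_difference(n, c_a, c_b)) == 0
    ]


if __name__ == "__main__":
    n, sk = keygen()
    # Hypothetical identifiers, encrypted independently by each side with public key n.
    brain_code = [encrypt(n, x) for x in (1234567890, 2223334445)]
    ices = [encrypt(n, x) for x in (5556667778, 1234567890, 9990001112)]
    print(find_matches(n, sk, brain_code, ices))  # [(0, 1)]
```

Without the random factor, the third party would learn the actual difference a - b for every non-matching pair, which could leak information about the identifiers. The all-pairs loop also shows why scaling was a concern: the number of homomorphic operations grows with the product of the two set sizes.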

Mainzelliste SecureEpiLinker (MainSEL): Privacy-Preserving Record Linkage using Secure Multi-Party Computation

Bioinformatics, 2020. DOI: 10.1093/bioinformatics/btaa764

Author(s): Sebastian Stammler, Tobias Kussel, Phillipp Schoppmann, Florian Stampe, Galina Tremper, ...

Keyword(s): Record Linkage, Fault Tolerant, Source Code, Privacy Preserving, Third Party, Data Sets, Real World Data, Trusted Third Party, Record Keeping, Network Connection

Abstract

Motivation: Record linkage has versatile applications in real-world data analysis, where several data sets need to be linked at the record level in the absence of an exact identifier connecting related records. One example is patient databases spread across institutions that must be linked on personally identifiable entries such as name, date of birth, or ZIP code. At the same time, privacy laws may prohibit the exchange of this personally identifiable information (PII) across institutional boundaries, ruling out outsourcing the record linkage task to a trusted third party. We propose to employ privacy-preserving record linkage (PPRL) techniques that prevent, to various degrees, the leakage of PII while still allowing related records to be linked.

Results: We develop a framework for fault-tolerant PPRL using secure multi-party computation, with the medical record keeping software Mainzelliste as the data source. Our solution does not rely on any trusted third party, and PII is guaranteed not to leak under common cryptographic security assumptions. Benchmarks show the feasibility of our approach in realistic network settings: linking a patient record against a database of 10,000 records takes 48 s over a heavily delayed (100 ms) network connection, or 3.9 s over a low-latency connection.

Availability and implementation: The source code of the sMPC node is freely available on GitHub at https://github.com/medicalinformatics/SecureEpilinker under the AGPLv3 license. The source code of the modified Mainzelliste is available at https://github.com/medicalinformatics/MainzellisteSEL.

Optimization of the Mainzelliste software for fast privacy-preserving record linkage

Journal of Translational Medicine, Vol 19 (1), 2021. DOI: 10.1186/s12967-020-02678-1

Author(s): Florens Rohde, Martin Franke, Ziad Sehili, Martin Lablans, Erhard Rahm

Keyword(s): Record Linkage, Comprehensive Evaluation, Personal Data, Privacy Preserving, Use Cases, Locality Sensitive Hashing, Third Party, Bloom Filters, Linkage Quality, Order Of Magnitude

Abstract

Background: Data analysis for biomedical research often requires a record linkage step to identify records from multiple data sources that refer to the same person. Because unique personal identifiers are lacking across these sources, record linkage relies on the similarity of personal data such as first and last names or birth dates. However, exchanging such identifying data with a third party, as record linkage requires, is generally subject to strict privacy requirements. This problem is addressed by privacy-preserving record linkage (PPRL) and pseudonymization services. Mainzelliste is an open-source record linkage and pseudonymization service used to carry out PPRL processes in real-world use cases.

Methods: We evaluate the linkage quality and performance of the linkage process using several real and near-real datasets that differ in size and in the error rate of matching records. We compare (plaintext) record linkage with PPRL based on encoded records (Bloom filters). Furthermore, since the Mainzelliste software offers no blocking mechanism, we extend it with phonetic blocking as well as novel blocking schemes based on locality-sensitive hashing (LSH) to improve runtime for both standard and privacy-preserving record linkage.

Results: Mainzelliste achieves high linkage quality for PPRL using field-level Bloom filters because it uses an error-tolerant matching algorithm that can handle variations in names, in particular missing or transposed name compounds. However, without blocking, runtimes are unacceptable for real use cases with larger datasets. The newly implemented blocking approaches improve runtimes by orders of magnitude while retaining high linkage quality.

Conclusion: We conduct the first comprehensive evaluation of the record linkage facilities of the Mainzelliste software and extend it with blocking methods to improve its runtime. We observed very high linkage quality for both plaintext and encoded data, even in the presence of errors. The blocking methods improve runtime by orders of magnitude, facilitating use in research projects with large datasets and many participants.
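Field-level Bloom filter encoding and LSH blocking can be sketched as follows. This is a minimal sketch under assumptions not taken from the paper: values are split into padded bigrams, each bigram sets 20 positions of a 500-bit filter via double hashing over HMAC-SHA256 with a hypothetical shared key, similarity is the Dice coefficient, and blocking uses Hamming-space LSH (each band samples a few bit positions and records whether each is set). Mainzelliste's actual parameters, hashing, and error-tolerant matching may differ; all names below are illustrative.

```python
import hashlib
import hmac
import random

FILTER_BITS = 500               # hypothetical filter length
NUM_HASHES = 20                 # hypothetical hash count per bigram
SECRET = b"shared-linkage-key"  # hypothetical key agreed between data holders


def bigrams(value: str) -> set:
    """Padded character bigrams, e.g. 'ann' -> {'_a', 'an', 'nn', 'n_'}."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(value: str) -> frozenset:
    """Field-level Bloom filter, represented as the set of its 1-bit positions."""
    bits = set()
    for gram in bigrams(value):
        digest = hmac.new(SECRET, gram.encode(), hashlib.sha256).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        for i in range(NUM_HASHES):  # double hashing: position_i = h1 + i * h2
            bits.add((h1 + i * h2) % FILTER_BITS)
    return frozenset(bits)


def dice(a: frozenset, b: frozenset) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over set bit positions."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0


def lsh_keys(bf: frozenset, positions):
    """One blocking key per band: the filter's bit values at that band's sampled positions."""
    return [(band, tuple(p in bf for p in pos)) for band, pos in enumerate(positions)]


def candidate_pairs(left, right, bands=8, bits_per_band=12, seed=7):
    """Return (i, j) pairs sharing at least one LSH key; only these are compared."""
    rng = random.Random(seed)
    positions = [rng.sample(range(FILTER_BITS), bits_per_band) for _ in range(bands)]
    buckets = {}
    for j, bf in enumerate(right):
        for key in lsh_keys(bf, positions):
            buckets.setdefault(key, set()).add(j)
    pairs = set()
    for i, bf in enumerate(left):
        for key in lsh_keys(bf, positions):
            pairs.update((i, j) for j in buckets.get(key, ()))
    return pairs
```

Candidate pairs would then be scored with `dice` and classified. Sampling more bits per band makes each key stricter (fewer false candidates), while adding bands gives similar filters more chances to collide (higher recall, more comparisons).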

Federated Trusted Third Party as an Approach for Privacy Preserving Record Linkage in a Large Network of University Medicines in Pandemic Research

2021. DOI: 10.21203/rs.3.rs-1053445/v1

Author(s): Christopher Hampf, Martin Bialke, Hauke Hund, Christian Fegeler, Stefan Lang, ...

Keyword(s): Data Integration, Record Linkage, Privacy Preserving, Medical Data, Third Party, Bloom Filters, Large Network, Trusted Third Party, University Medicine, Expansion Stage

Abstract

Background: The Federal Ministry of Education and Research funded the Network of University Medicine to establish an infrastructure for pandemic research. This includes the development of a COVID-19 Data Exchange Platform (CODEX) that provides standardised and harmonised data sets for COVID-19 research. Nearly all university hospitals in Germany are part of the project and transmit medical data from their local data integration centres to the CODEX platform. Medical data on a person that have been collected at several sites are to be made available on the CODEX platform in merged form. To enable this, a federated trusted third party (fTTP) will be established to allow pseudonymised merging of the medical data. The fTTP implements privacy-preserving record linkage based on Bloom filters and assigns pseudonyms to enable re-pseudonymisation during data transfer to the CODEX platform.

Results: The fTTP was implemented conceptually and technically. For this purpose, the processes necessary for data delivery were modelled, the resulting communication relationships were identified, and corresponding interfaces were specified. These interfaces were developed according to the FHIR specifications and validated with the help of external partners. Existing tools such as the identity management system E-PIX® were further developed so that sites can generate Bloom filters from person-identifying information. An extension for comparing Bloom filters was implemented for the federated trusted third party. Correct implementation was shown with a demonstrator and the connection of two data integration centres.

Conclusions: This article describes how the fTTP was modelled and implemented. In a first expansion stage, the fTTP was connected to two sites as an example and its functionality was demonstrated. Further expansion stages, already planned and technically specified, will be implemented in the future to also handle cases in which the privacy-preserving record linkage yields ambiguous results. The first expansion stage of the fTTP is available in the University Medicine network, and all participating sites will connect to it during the ongoing test phase.
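The abstract mentions cases in which the Bloom-filter comparison yields ambiguous results. One common way to surface such cases is a three-way decision over the similarity scores of all candidate records for an incoming record. The sketch below assumes hypothetical thresholds (0.95 and 0.85) and a hypothetical `classify` helper; it is not the fTTP's actual decision rule.

```python
from enum import Enum


class Outcome(Enum):
    MATCH = "match"            # reuse the existing pseudonym
    NEW_PERSON = "new person"  # assign a new pseudonym
    AMBIGUOUS = "ambiguous"    # defer, e.g. to manual review


def classify(scores: dict, upper: float = 0.95, lower: float = 0.85):
    """Classify one incoming record from its similarity to each existing record.

    scores maps existing record IDs to similarity values in [0, 1].
    Returns (outcome, matched record ID or None).
    """
    strong = [rid for rid, s in scores.items() if s >= upper]
    grey = [rid for rid, s in scores.items() if lower <= s < upper]
    if len(strong) == 1 and not grey:
        return Outcome.MATCH, strong[0]
    if not strong and not grey:
        return Outcome.NEW_PERSON, None
    # Several strong candidates, or any candidate in the grey zone.
    return Outcome.AMBIGUOUS, None
```

Treating a single strong candidate as ambiguous whenever a grey-zone competitor exists is a deliberately conservative choice: it avoids merging two different people's data at the cost of more cases needing clarification.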

Practical private-key fully homomorphic encryption in rings

Groups – Complexity – Cryptology, Vol 0 (0), 2018. DOI: 10.1515/gcc-2018-0006

Cited By ~ 1

Author(s): Alexey Gribov, Delaram Kahrobaei, Vladimir Shpilrain

Keyword(s): Homomorphic Encryption, Public Information, Third Party, Efficient Computation, Fully Homomorphic Encryption, Encrypted Data, Private Key

Abstract

We describe a practical fully homomorphic encryption (FHE) scheme based on homomorphisms between rings and show that it enables very efficient computation on encrypted data. Our encryption, though, is private-key; public information is used only to operate on encrypted data without decrypting it. Still, we show that our method allows third-party search on encrypted data.

Privacy-Functionality Trade-Off: A Privacy-Preserving Multi-Channel Smart Metering System

Energies, Vol 13 (12), pp. 3221, 2020. DOI: 10.3390/en13123221

Cited By ~ 1

Author(s): Xiao-Yu Zhang, Stefanie Kuenzel, José-Rodrigo Córdoba-Pachón, Chris Watkins

Keyword(s): Data Aggregation, Homomorphic Encryption, Value Added, Privacy Preserving, Third Party, Sensitive Information, Central Processor, Smart Metering, Trade Off, Level Data

While smart meters can give households more autonomy over their energy consumption, they can also be a significant intrusion into household privacy. Abundant research has implemented protection methods for different aspects (e.g., noise addition, data aggregation, data down-sampling); while these protect private data by hiding sensitive information, some compulsory functions such as time-of-use (TOU) billing or value-added services are sacrificed. Moreover, some methods, such as rechargeable batteries and homomorphic encryption, require an expensive energy storage system or a central processor with high computational capability, which is unrealistic for mass roll-out. In this paper, we propose a privacy-preserving smart metering system that combines existing data aggregation and data down-sampling mechanisms. The system is grounded in ethical concerns about privacy and implements a hybrid privacy-utility trade-off strategy without sacrificing functionality. In the proposed system, the smart meter acts as an assistant processor rather than an information sender/receiver, and it provides three communication channels that transmit data at different temporal resolutions to protect privacy and allow freedom of choice: high-frequency feeder-level/substation-level data are used for grid operation and management, low-frequency household-level data are used for billing, and a privacy-preserving value-added service channel provides third-party (TP) services. Finally, the privacy performance is evaluated to examine whether the proposed system satisfies the privacy and functionality requirements.
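The first two channels (aggregated high-frequency data for grid operation, down-sampled household data for billing) can be sketched on plain readings. The sketch assumes one reading per household per minute, aggregation across all households on one feeder, and a hypothetical 30-minute billing resolution; the abstract does not give the paper's actual resolutions or TOU bands, and the value-added-service channel is omitted. Function names are illustrative.

```python
def feeder_channel(readings: dict) -> list:
    """High-frequency channel: per-minute consumption summed over all households.

    readings maps household ID -> list of per-minute kWh values (equal lengths).
    The output contains no per-household series.
    """
    series = list(readings.values())
    return [sum(values) for values in zip(*series)]


def billing_channel(readings: dict, minutes_per_slot: int = 30) -> dict:
    """Low-frequency channel: each household's consumption per billing slot."""
    return {
        hid: [
            sum(values[i:i + minutes_per_slot])
            for i in range(0, len(values), minutes_per_slot)
        ]
        for hid, values in readings.items()
    }
```

Down-sampling removes the fine-grained appliance signatures from household data while keeping enough resolution for TOU billing; aggregation hides individual profiles only when enough households contribute to the sum.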

Privacy-preserving neural networks with Homomorphic encryption: Challenges and opportunities

Peer-to-Peer Networking and Applications, 2021. DOI: 10.1007/s12083-021-01076-8

Author(s): Bernardo Pulido-Gaytan, Andrei Tchernykh, Jorge M. Cortés-Mendoza, Mikhail Babenko, Gleb Radchenko, ...

Keyword(s): Neural Networks, Homomorphic Encryption, Privacy Preserving, Third Party, Security Threats, Computing Power, Confidential Data, Challenges And Opportunities, Potential Applications, In Transit

Abstract

Classical machine learning modeling demands considerable computing power for internal calculations and for training on big data in a reasonable amount of time. In recent years, cloud providers have offered services to facilitate this process, but this introduces new security threats of data breaches. Modern encryption techniques ensure security and are considered the best option to protect stored data and data in transit from an unauthorized third party. However, decryption is necessary when the data must be processed or analyzed, which brings back the original problem of data vulnerability. Fully homomorphic encryption (FHE) is considered the holy grail of cryptography: it allows an untrustworthy third-party resource to process encrypted information without disclosing confidential data. In this paper, we analyze the fundamental concepts of FHE, practical implementations, state-of-the-art approaches, limitations, advantages, disadvantages, potential applications, and development tools, focusing on neural networks. FHE development has made remarkable progress in recent years. However, the current literature on homomorphic neural networks is written almost exclusively by practitioners looking for suitable implementations, and comprehensive, thorough reviews are still lacking. We focus on privacy-preserving homomorphic encryption cryptosystems targeted at neural networks, identifying current solutions, open issues, challenges, opportunities, and potential research directions.

Privacy Preserving Probabilistic Record Linkage Without Trusted Third Party

2018 16th Annual Conference on Privacy, Security and Trust (PST), 2018. DOI: 10.1109/pst.2018.8514192

Cited By ~ 1

Author(s): Ibrahim Lazrig, Toan C. Ong, Indrajit Ray, Indrakshi Ray, Xiaoqian Jiang, ...

Keyword(s): Record Linkage, Privacy Preserving, Third Party, Trusted Third Party, Probabilistic Record Linkage

Efficient Privacy-Preserving Protocol for k-NN Search over Encrypted Data in Location-Based Service

Complexity, Vol 2017, pp. 1-14, 2017. DOI: 10.1155/2017/1490283

Cited By ~ 2

Author(s): Huijuan Lian, Weidong Qiu, Di Yan, Zheng Huang, Jie Guo

Keyword(s): Spatial Data, Nearest Neighbor, Homomorphic Encryption, High Accuracy, Privacy Preserving, Communication Cost, Location Based Services, Third Party, Mobile Communication Technology, Further Development

With the development of mobile communication technology, location-based services (LBS) are booming. Meanwhile, privacy protection has become the main obstacle to the further development of LBS. The k-nearest neighbor (k-NN) search is one of the most common types of LBS. In this paper, we propose an efficient private circular query protocol (EPCQP) with a high accuracy rate and low computation and communication cost. We adopt the Moore curve to convert two-dimensional spatial data into a one-dimensional sequence and encrypt the points of interest (POI) information with the Brakerski-Gentry-Vaikuntanathan homomorphic encryption scheme to preserve privacy. The proposed scheme performs a secret circular shift of the encrypted POI information to hide the user's location without a trusted third party. To reduce computation and communication cost, we dynamically divide the table of POI information according to the value of k. Experiments show that the proposed scheme provides highly accurate query results while maintaining low computation and communication cost.
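The first step above, mapping two-dimensional coordinates onto a one-dimensional sequence with a space-filling curve, can be sketched in plaintext. This sketch substitutes the closely related Hilbert curve for the Moore curve (the Moore curve is a closed-loop variant assembled from four Hilbert curves, which is what makes a circular sequence natural), uses the standard xy-to-distance conversion, and approximates k-NN by distance along the curve. BGV encryption, the secret circular shift, and table partitioning are omitted; the grid size and helper names are hypothetical.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Distance of cell (x, y) along the Hilbert curve filling an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the next, finer level is in standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def curve_knn(pois, user_xy, k: int, n: int = 16):
    """Approximate k-NN: the k POIs closest to the user along the 1-D curve order."""
    u = hilbert_index(n, *user_xy)
    return sorted(pois, key=lambda p: abs(hilbert_index(n, *p) - u))[:k]
```

Because neighboring positions on the curve are neighboring cells in the grid, a contiguous window of the 1-D sequence is a reasonable candidate set for the k nearest POIs, although some spatially close cells lie far apart on the curve.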

Scalable Secure Privacy-Preserving Record Linkage (PPRL) Methods Using Cloud-based Infrastructure

International Journal for Population Data Science, Vol 3 (4), 2018. DOI: 10.23889/ijpds.v3i4.638

Author(s): Toan Ong, Ibrahim Lazrig, Indrajit Ray, Indrakshi Ray, Michael Kahn

Keyword(s): Parallel Processing, Record Linkage, High Capacity, Privacy Preserving, Secure Computation, Third Party, Major Drawback, Garbled Circuits, Chunk Size, Synthetic Datasets

Introduction: Bloom filters (BFs) are a scalable solution for probabilistic privacy-preserving record linkage, but BFs can be compromised. Yao's garbled circuits (GCs) can perform secure multi-party computation to compute the similarity of two BFs without a trusted third party. The major drawback of using BFs and GCs together is poor efficiency.

Objectives and Approach: We evaluated the feasibility of BFs+GCs using high-capacity compute engines and a novel parallel processing framework in Google Cloud Compute Engines (GCCE). In Yao's two-party secure computation protocol, one party serves as the generator and the other as the evaluator. To link data in parallel, records from both parties are divided into chunks, and the linkage between every two chunks in the same block is processed by a thread. The number of threads for linkage depends on the available computing resources. We tested the parallelized process in various scenarios with different hardware and software configurations.

Results: Two synthetic datasets with 10K records were linked using BFs+GCs on 12 different software and hardware configurations, which varied by number of CPU cores (4 to 32), memory size (15 GB to 28.8 GB), number of threads (6 to 41), and chunk size (50 to 200 records). The minimum configuration (4 cores; 15 GB memory) took 8,062.4 s to complete, whereas the maximum configuration (32 cores; 28.8 GB memory) took 1,454.1 s. Increasing the number of threads or changing the chunk size without providing more CPU cores and memory did not improve efficiency. Efficiency improved on average by 39.81% when the number of cores and memory on both sides were doubled. CPU utilization was maximized (near 100% on both sides) when the computing power of the generator was double that of the evaluator.

Conclusion/Implications: The PPRL runtime of BFs+GCs was greatly improved using parallel processing in a cloud-based infrastructure. A cluster of GCCEs could be leveraged to reduce the runtime of data linkage operations even further. Scalable cloud-based infrastructures can overcome the trade-off between security and efficiency, allowing computationally complex methods to be implemented.
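The chunk-pair scheduling described above can be sketched with a thread pool. In this sketch the garbled-circuit comparison is replaced by a plaintext Dice coefficient over sets of Bloom-filter bit positions (so it provides no privacy), blocking is omitted (all records form one block), and the chunk size, thread count, and threshold are hypothetical parameters. For scale, the reported runtimes correspond to roughly a 5.5-fold speedup (8,062.4 s versus 1,454.1 s) for an 8-fold increase in cores.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def chunks(records: list, size: int) -> list:
    """Split a list into consecutive chunks of at most `size` records."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def dice(a: frozenset, b: frozenset) -> float:
    """Plaintext stand-in for the secure (garbled-circuit) similarity computation."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0


def link_chunk_pair(left_chunk, right_chunk, threshold: float):
    """Compare every record of one chunk with every record of the other."""
    return [
        (lid, rid)
        for lid, lbf in left_chunk
        for rid, rbf in right_chunk
        if dice(lbf, rbf) >= threshold
    ]


def parallel_link(left, right, chunk_size=100, threads=8, threshold=0.8):
    """left/right: lists of (record_id, bit_positions); one task per chunk pair."""
    tasks = list(product(chunks(left, chunk_size), chunks(right, chunk_size)))
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(lambda pair: link_chunk_pair(pair[0], pair[1], threshold), tasks)
        return [match for part in results for match in part]
```

This mirrors the reported finding that more threads alone do not help: in CPython the global interpreter lock serializes pure-Python comparisons, so real gains would need a process pool, native code, or, as in the study, more cores behind a CPU-bound comparison engine.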

Overcoming the Impasse 2: Assessing the Quality of Recent Australian Applications of a Privacy-Preserving Record Linkage Method (PPRL-BLOOM)

International Journal for Population Data Science, Vol 5 (5), 2020. DOI: 10.23889/ijpds.v5i5.1489

Author(s): Sean Randall, Adrian Brown, Anna Ferrante, James Boyd, Katie Irvine, ...

Keyword(s): Real World, Child Protection, Record Linkage, Data Linkage, Privacy Preserving, Third Party, Regulatory Constraints, Linkage Quality, Personally Identifying Information

Introduction: While the quantity and type of datasets used by data linkage projects are growing, some datasets remain ‘not available’ or ‘hard to access’ for researchers and linkers, either because legal/regulatory constraints restrict the release of personally identifying information or because of privacy or reputational concerns. Advances in privacy-preserving record linkage methods (e.g. PPRL-Bloom) have made it possible to overcome this impasse. These techniques aim to provide strong privacy protection while maintaining high linkage quality, and PPRL-Bloom methods are being used in practice. The Centre for Data Linkage (CDL) at Curtin University has been involved in several PPRL linkage and evaluation projects using real-world data. As the methods are relatively new, published information on the linkage quality achievable in real-world scenarios is limited.

Objectives and Approach: We present and describe several real-world applications of privacy-preserving record linkage (PPRL-Bloom) in which the quality of the linkage could be ascertained. In each case, the data were linked ‘blind’, that is, without the linkers having access to the original personal identifiers at any stage or having any additional information about the records. Evaluations include a linkage of state-based morbidity and mortality records, a linkage of several general practice datasets to morbidity and emergency records, and a linkage of a range of state-based non-health administrative data, including education, police, housing, birth, and child protection records.

Results: The privacy-preserving record linkage performed admirably, with very high-quality results across all evaluations.

Conclusion/Implications: Privacy-preserving linkage is a useful and innovative methodology that is currently being used in real-world projects. The results of these evaluations suggest it can be an appropriate linkage tool when legal or other constraints block the release of personally identifying information to third-party linkage units.
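Where the true match status is known, linkage quality is commonly summarized by precision, recall, and F-measure over record pairs. The sketch below computes these from sets of predicted and true links; it is a generic illustration, and the abstract does not state which measures these evaluations reported.

```python
def linkage_quality(predicted: set, truth: set) -> dict:
    """Precision, recall and F-measure of predicted links against known true links.

    Each link is a hashable pair such as (record_id_a, record_id_b).
    """
    tp = len(predicted & truth)  # correctly identified links
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {"precision": precision, "recall": recall, "f_measure": f_measure}
```

For example, if a blind linkage proposes four links of which three are among four true links, precision, recall, and F-measure are all 0.75.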
