scholarly journals Proof of Concept for a Privacy Preserving National Mortality Register

Author(s):  
Rainer Schnell ◽  
Christian Borgs

IntroductionNational mortality registers are essential for medical research. Therefore, most nations operate such registers. Due to the administrative structure and data protection legislation, there is no such registry in Germany. We demonstrate that a national mortality registry is technically feasible under the given constraints with privacy preserving record linkage (PPRL). Objectives and ApproachGetting the legal permission to operate a national mortality registry for research will be easier if the linkage can be done without revealing personal identifiers by using PPRL. To estimate precision and recall of different encodings, we used two settings: (1) matching a local mortality registry (n = 14,003) with mortality data of a university hospital (n = 2,466); (2) matching 1 million simulated records from a national database of names with a corrupted subset. This corresponds to a match of all deceased persons with the deceased persons in the largest federal state (n = 205,000). ResultsLinkage results for clear-text identifiers show very high recall and precision. Bloom-Filter based encryptions yield comparable results. Neither precision nor recall declines more than 2%. Phonetic codes yield high precision but low recall. Some variants of Bloom Filter-based encodings yield better results than probabilistic linkage on clear-text identifiers. This is mainly due to the rarely mentioned detail of using different passwords for different identifiers in the same Bloom Filter. Therefore, implementation details of Bloom Filters are more important than commonly thought. Overall, we recommend the use of salted Bloom Filter-based methods with different passwords for different identifiers to increase security and to prevent all known attacks on identifier encryptions. Conclusion/ImplicationsAlthough most PPRL techniques would yield acceptable results in the given setting of a national register, salted Bloom filter encodings are more secure against attacks while still showing high precision and recall. Therefore, we consider a national mortality register using only encrypted identifiers of deceased persons as feasible.

Author(s):  
Rainer Schnell ◽  
Christian Borgs

ABSTRACTObjectiveIn most European settings, record linkage across different institutions has to be based on personal identifiers such as names, birthday or place of birth. To protect the privacy of research subjects, the identifiers have to be encrypted. In practice, these identifiers show error rates up to 20% per identifier, therefore linking on encrypted identifiers usually implies the loss of large subsets of the databases. In many applications, this loss of cases is related to variables of interest for the subject matter of the study. Therefore, this kind of record-linkage will generate biased estimates. These problems gave rise to techniques of Privacy Preserving Record Linkage (PPRL). Many different PPRL techniques have been suggested within the last 10 years, very few of them are suitable for practical applications with large database containing millions of records as they are typical for administrative or medical databases. One proven technique for PPRL for large scale applications is PPRL based on Bloom filters.MethodUsing appropriate parameter settings, Bloom filter approaches show linkage results comparable to linkage based on unencrypted identifiers. Furthermore, this approach has been used in real-world settings with data sets containing up to 100 Million records. By the application of suitable blocking strategies, linking can be done in reasonable time.ResultHowever, Bloom filters have been subject of cryptographic attacks. Previous research has shown that the straight application of Bloom filters has a nonzero re-identification risk. We will present new results on recently developed techniques to defy all known attacks on PPRL Bloom filters. These computationally simple algorithms modify the identifiers by different cryptographic diffusion techniques. The presentation will demonstrate these new algorithms and show their performance concerning precision, recall and re-identification risk on large databases.


Author(s):  
Sean Randall ◽  
Adrian P Brown ◽  
Anna M Ferrante ◽  
James H Boyd

IntroductionAvailable and practical methods for privacy preserving linkage have shortcomings: methods utilising anonymous linkage codes provide limited accuracy while methods based on Bloom filters have proven vulnerable to frequency-based attacks. ObjectivesIn this paper, we present and evaluate a novel protocol that aims to meld both the accuracy of the Bloom filter method with the privacy achievable through the anonymous linkage code methodology. MethodsThe protocol involves creating multiple match-keys for each record, with the composition of each match-key depending on attributes of the underlying datasets being compared. The protocol was evaluated through de-duplication of four administrative datasets and two synthetic datasets; the ‘answers’ outlining which records belonged to the same individual were known for each dataset. The results were compared against results achieved with un-encoded linkage and other privacy preserving techniques on the same datasets. ResultsThe multiple match-key protocol presented here achieved high quality across all datasets, performing better than record-level Bloom filters and the SLK, but worse than field-level Bloom filters. ConclusionThe presented method provides high linkage quality while avoiding the frequency based attacks that have been demonstrated against the Bloom filter approach. The method appears promising for real world use.


2022 ◽  
Vol 22 (1) ◽  
Author(s):  
Sean Randall ◽  
Helen Wichmann ◽  
Adrian Brown ◽  
James Boyd ◽  
Tom Eitelhuber ◽  
...  

Abstract Background Privacy preserving record linkage (PPRL) methods using Bloom filters have shown promise for use in operational linkage settings. However real-world evaluations are required to confirm their suitability in practice. Methods An extract of records from the Western Australian (WA) Hospital Morbidity Data Collection 2011–2015 and WA Death Registrations 2011–2015 were encoded to Bloom filters, and then linked using privacy-preserving methods. Results were compared to a traditional, un-encoded linkage of the same datasets using the same blocking criteria to enable direct investigation of the comparison step. The encoded linkage was carried out in a blinded setting, where there was no access to un-encoded data or a ‘truth set’. Results The PPRL method using Bloom filters provided similar linkage quality to the traditional un-encoded linkage, with 99.3% of ‘groupings’ identical between privacy preserving and clear-text linkage. Conclusion The Bloom filter method appears suitable for use in situations where clear-text identifiers cannot be provided for linkage.


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Elmahdi Bentafat ◽  
M. Mazhar Rathore ◽  
Spiridon Bakiras

Intelligent transportation systems necessitate a fine-grained and accurate estimation of vehicular traffic flows across critical paths of the underlying road network. However, such statistics should be collected in a manner that does not disclose the trajectories of individual users. To this end, we introduce a privacy-preserving protocol that leverages roadside units (RSUs) to communicate with the passing vehicles, in order to construct encrypted Bloom filters stemming from random vehicle IDs that are chosen secretly by the individual vehicles. Each Bloom filter represents the set of vehicle IDs that contacted the RSU but may also be used to estimate the traffic flow between any number of RSUs. More precisely, we designed a probabilistic model that approximates multipoint traffic flows by estimating the number of common vehicles among a given set of RSUs. Through extensive simulation experiments, we demonstrate that our protocol is very accurate—with a minor deviation from the real traffic flow—and show that it reduces the estimation error by a large factor, when compared to the current state-of-the-art approaches. Furthermore, our implementation of the underlying cryptographic primitives illustrates the feasibility, practicality, and scalability of the system.


Author(s):  
Thilina Ranbaduge ◽  
Peter Christen

IntroductionApplications in domains ranging from healthcare to national security increasingly require records about individuals in sensitive databases to be linked in privacy-preserving ways. Missing values make the linkage process challenging because they can affect the encoding of attribute values. No study has systematically investigated how missing values affect the outcomes of different encoding techniques used in privacy-preserving linkage applications. Objectives and ApproachBinary encodings, such as Bloom filters, are popular for linking sensitive databases. They are now employed in real-world linkage applications. However, existing encoding techniques assume the quasi-identifying attributes used for encoding to be complete. Missing values can lead to incomplete encodings which can result in decreased or increased similarities and therefore to false non-matches or false matches. In this study we empirically evaluate three binary encoding techniques using real voter databases, where pairs of records that correspond to the same voter (with name or address changes) resulted in files of 100,000 and 500,000 records containing from 0% to 50% missing values. ResultsWe encoded between two and four of the attributes first and last name, street, and city into three record-level binary encodings: Cryptographic long-term key (CLK) [Schnell et al. 2009], record-level Bloom filter (RBF) [Durham et al. 2014], and tabulation Min-hashing (TBH) [Smith 2017]. Experiments showed a 10% to 25% drop on average in both precision and recall for all encoding techniques when missing values are increasing. CLK resulted in the highest decrease in precision, while TBH resulted in the highest decrease in recall compared to the other encoding techniques. ConclusionBinary encodings such as Bloom filters are now used in practical applications for linking sensitive databases. Our evaluation shows that such encoding techniques can result in lower linkage quality if there are missing values in quasi-identifying attributes. This highlights the need for novel encoding techniques that can overcome the challenge of missing values.


Author(s):  
Frank Niedermeyer ◽  
Simone Steinmetzer ◽  
Martin Kroll ◽  
Rainer Schnell

Bloom filter encoded identifiers are increasingly used for privacy preserving record linkage applications, because they allow for errors in encrypted identifiers. However, little research on the security of Bloom filters has been published so far. In this paper, we formalize a successful attack on Bloom filters composed of bigrams. It has previously been assumed in the literature that an attacker knows the global data set from which a sample is drawn. In contrast, we suppose that an attacker does not know this global data set. Instead, we assume the adversary knows a publicly available list of the most frequent attributes. The attack is based on subtle filtering and elementary statistical analysis of encrypted bigrams. The attack described in this paper can be used for the deciphering of a whole database instead of only a small subset of the most frequent names, as in previous research. We illustrate our proposed method with an attack on a database of encrypted surnames. Finally, we describe modifications of the Bloom filters for preventing similar attacks.


Author(s):  
Rainer Schnell ◽  
Christian Borgs

IntroductionDiagnostic codes, such as the ICD-10, may be considered as sensitive information. If such codes have to be encoded using current methods for data linkage, all hierarchical information given by the code positions will be lost. We present a technique (HPBFs) for preserving the hierarchical information of the codes while protecting privacy. The new method modifies a widely used Privacy-preserving Record Linkage (PPRL) technique based on Bloom filters for the use with hierarchical codes. Objectives and ApproachAssessing the similarities of hierarchical codes requires considering the code positions of two codes in a given diagnostic hierarchy. The hierarchical similarities of the original diagnostic code pairs should correspond closely to the similarity of the encoded pairs of the same code. Furthermore, to assess the hierarchy-preserving properties of an encoding, the impact on similarity measures from differing code positions at all levels of the code hierarchy can be evaluated. A full match of codes should yield a higher similarity than partial matches. Finally, the new method is tested against ad-hoc solutions as an addition to a standard PPRL setup. This is done using real-world mortality data with a known link status of two databases. ResultsIn all applications for encoded ICD codes where either categorical discrimination, relational similarity or linkage quality in a PPRL setting is required, HPBFs outperform other known methods. Lower mean differences and smaller confidence intervals between clear-text codes and encrypted code pairs were observed, indicating better preservation of hierarchical similarities. Finally, using these techniques allows for much better hierarchical discrimination for partial matches. ConclusionThe new technique yields better linkage results than all other known methods to encrypt hierarchical codes. In all tests, comparing categorical discrimination, relational similarity and PPRL linkage quality, HPBFs outperformed methods currently used.


2019 ◽  
Vol 265 ◽  
pp. 02016
Author(s):  
Vladimir Karetnikov ◽  
Sergey Rudykh ◽  
Aleksandra Ivanova

Survey works on inland waterways can be contingently divided into two directions. The first ones are directed at maintaining the given dimensions of the waterway and are carried out with the use of technical fleet vessels, which includes the dredging fleet. At the same time the basis creation, the results verification and the control of the survey works implementation are carried out by the survey party. The main types of work here are surveying and trawling works, the implementation of which is carried out at the present time on the inland waterways of Russia using geo information technologies, which makes it possible to improve the quality and efficiency of their realization. Such an approach, firstly, has a positive effect on the implementation of the navigational hydrographic support system of navigators, including in the part of electronic cartography, and secondly, it allows to provide the survey works realization at the modern level. The most effective approaches and methods of modern geo information technologies application, implemented for the collection and processing of high-precision bathymetric information and positioning data to ensure the navigation safety on the inland waterways of the Russian Federation, are considered in the paper.


Sign in / Sign up

Export Citation Format

Share Document