A primer on probabilistic record linkage

2021 ◽  
pp. 95-107
Author(s):  
Ted Enamorado
2000 ◽  
Vol 16 (2) ◽  
pp. 439-447 ◽  
Author(s):  
Kenneth R. de Camargo Jr. ◽  
Cláudia M. Coeli

A database linkage system based on the probabilistic record linkage technique is presented. It was developed in C++ using the Borland C++ Builder version 3.0 programming environment. The system was tested on data sources of different sizes and evaluated for processing time and for sensitivity in identifying true pairs. Processing the records took less time with the program than when done manually, especially for larger databases. The sensitivities of the manual and automated processes were equivalent for databases with fewer records; however, as database size increased, sensitivity tended to decrease only in the manual process. Although still at an early stage of development, the system performed well in both speed and sensitivity. While the algorithms used performed satisfactorily, the aim is to evaluate other routines in order to improve the system's performance.
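The abstract does not describe the weighting scheme used by the C++ system. As a hedged illustration only, the classic Fellegi–Sunter approach scores a candidate record pair by summing per-field log-likelihood weights and compares the total against two thresholds. The sketch below assumes binary agree/disagree comparisons and caller-supplied m- and u-probabilities; the fields, probabilities and thresholds in the usage lines are hypothetical.

```python
import math
from typing import Sequence, Tuple

def field_weight(agrees: bool, m: float, u: float) -> float:
    """Fellegi-Sunter log2 weight for one comparison field.

    m = P(field agrees | true match), u = P(field agrees | non-match).
    """
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))

def pair_weight(agreement: Sequence[bool],
                params: Sequence[Tuple[float, float]]) -> float:
    """Total weight of a record pair: the sum of its per-field weights."""
    return sum(field_weight(a, m, u) for a, (m, u) in zip(agreement, params))

def classify(weight: float, lower: float, upper: float) -> str:
    """Three-way decision: link, clerical review, or non-link."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"

# Hypothetical (m, u) values for surname, sex and date of birth.
params = [(0.95, 0.01), (0.90, 0.10), (0.98, 0.05)]
w = pair_weight([True, True, False], params)  # date of birth disagrees
print(round(w, 3), classify(w, lower=0.0, upper=8.0))  # prints: 4.17 clerical review
```

Pairs falling between the two thresholds are the ones a manual reviewer would still need to inspect.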


2014 ◽  
Vol 30 (2) ◽  
pp. 433-438 ◽  
Author(s):  
Silvano Barbosa de Oliveira ◽  
Edgar Merchan-Hamann ◽  
Leila Denise Alves Ferreira Amorim

The aim of this study is to estimate the prevalence of HIV/HBV and HIV/HCV coinfections among AIDS cases reported in Brazil and to describe the epidemiological profile of these cases. Coinfection was identified through probabilistic record linkage between the records of all HIV-infected patients reported as AIDS cases and the records of patients reported as carriers of hepatitis B or C virus in various Brazilian Ministry of Health databases from 1999 to 2010. In this period, 370,672 AIDS cases were reported, of which 3,724 were HIV/HBV coinfections. Women were less likely than men to be coinfected, and the chance of coinfection increased with age. This study allowed an important evaluation of HBV/HIV and HCV/HIV coinfections in Brazil using information obtained by merging secondary databases from the Ministry of Health, without conducting seroprevalence research. The findings of this study might be important for planning the activities of the Brazilian epidemiological surveillance agencies.


Author(s):  
Colin Babyak ◽  
Abdelnasser Saidi

Objectives: The objectives of this talk are to introduce Statistics Canada’s Social Data Linkage Environment (SDLE) and to explain the methodology behind the creation of the central depository, including how both deterministic and probabilistic record linkage techniques are used to maintain and expand the environment. Approach: We will start with a brief overview of the SDLE and then discuss how deterministic and probabilistic linkages (using Statistics Canada’s generalized record linkage software, G-Link) have been combined to create and maintain a very large central depository, which can in turn be linked to virtually any social data source for the end goal of analysis. Results: Although Canada has a population of about 36 million people, the central depository contains some 300 million records to represent them, because individuals appear with multiple addresses, names, etc. This significantly reduces missed links, but it raises the spectre of additional false positive matches and adds computational complexity that we have had to overcome. Conclusion: The combination of deterministic and probabilistic record linkage strategies has been effective in creating the central depository for the SDLE. As more data are linked to the environment and we continue to refine our methodology, we can now move on to the ultimate goal of the SDLE, which is to analyze this vast wealth of linked data.
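The abstract does not say how G-Link orders the two strategies. A common pattern, shown here only as a hedged sketch, is a cascade: records that agree exactly and uniquely on a trusted key are linked deterministically, and only the leftovers go to a probabilistic scorer. The key field, the caller-supplied `score` function (in practice, for example, a Fellegi–Sunter weight) and the threshold are hypothetical, and nothing here reflects G-Link's actual interface.

```python
from typing import Callable, Dict, List, Optional, Sequence

Record = Dict[str, Optional[str]]

def cascade_link(left: Sequence[Record], right: Sequence[Record],
                 exact_keys: Sequence[str],
                 score: Callable[[Record, Record], float],
                 threshold: float) -> List[tuple]:
    """Deterministic pass on exact keys, then a probabilistic pass on the rest."""
    # Index the right-hand file by its exact key, skipping records with gaps.
    index: Dict[tuple, List[Record]] = {}
    for r in right:
        key = tuple(r.get(k) for k in exact_keys)
        if None not in key:
            index.setdefault(key, []).append(r)

    links, leftovers = [], []
    for rec in left:
        key = tuple(rec.get(k) for k in exact_keys)
        hits = index.get(key, []) if None not in key else []
        if len(hits) == 1:  # unique exact agreement -> deterministic link
            links.append((rec["id"], hits[0]["id"], "deterministic"))
        else:
            leftovers.append(rec)

    # Probabilistic pass: best-scoring candidate, accepted at or above threshold.
    for rec in leftovers:
        best = max(right, key=lambda r: score(rec, r), default=None)
        if best is not None and score(rec, best) >= threshold:
            links.append((rec["id"], best["id"], "probabilistic"))
    return links

left = [{"id": "L1", "key": "123", "name": "ann lee", "dob": "1980-01-02"},
        {"id": "L2", "key": None, "name": "bob roy", "dob": "1975-05-06"}]
right = [{"id": "R1", "key": "123", "name": "ann lee", "dob": "1980-01-02"},
         {"id": "R2", "key": "999", "name": "bob roy", "dob": "1975-05-06"}]
agree = lambda a, b: sum(a[f] == b[f] for f in ("name", "dob"))  # toy scorer
print(cascade_link(left, right, ["key"], agree, threshold=2))
```

A production pipeline would also add blocking and enforce one-to-one assignment, both of which this sketch omits for brevity.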


2015 ◽  
Vol 45 (3) ◽  
pp. 954-964 ◽  
Author(s):  
Adrian Sayers ◽  
Yoav Ben-Shlomo ◽  
Ashley W Blom ◽  
Fiona Steele

Author(s):  
Yinghao Zhang ◽  
Senlin Xu ◽  
Mingfan Zheng ◽  
Xinran Li

Record linkage is the task of identifying which records refer to the same entity. When records in different data sources lack a common key and contain typographical errors in their identifier fields, the extended Fellegi–Sunter probabilistic record linkage method proposed by Winkler, which takes field similarity into account, is, to our knowledge, one of the most effective methods for performing record linkage. However, this method cannot efficiently handle missing values in the fields: an inappropriate weight is assigned to record pairs containing missing data. Therefore, to improve the performance of Winkler’s probabilistic record linkage method in the presence of missing values, we propose a solution for adjusting a record pair’s weight when missing data occur, which improves the accuracy of Winkler’s record linkage decisions without substantially increasing computational time.
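The abstract does not give the authors' adjustment formula. Purely to illustrate the problem, the sketch below scales each field's weight linearly between its disagreement and agreement weights according to a string similarity (a simplification in the spirit of Winkler's similarity-based weights, not his exact formula) and treats a missing value as neutral evidence (weight 0) rather than as a disagreement. Python's `difflib.SequenceMatcher.ratio()` is swapped in for the Jaro–Winkler comparator Winkler used, and the zero-weight rule is an assumed convention, not the method proposed in the paper.

```python
import math
from difflib import SequenceMatcher
from typing import Optional

def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for Jaro-Winkler."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def field_weight(a: Optional[str], b: Optional[str], m: float, u: float) -> float:
    """Similarity-scaled field weight with neutral handling of missing values."""
    if not a or not b:
        # Assumed convention: a missing value is neither agreement nor
        # disagreement, so it contributes no evidence either way.
        return 0.0
    w_agree = math.log2(m / u)
    w_disagree = math.log2((1 - m) / (1 - u))
    # Linear interpolation: similarity 1.0 -> full agreement weight,
    # similarity 0.0 -> full disagreement weight.
    return w_disagree + similarity(a, b) * (w_agree - w_disagree)

print(field_weight("Smith", "Smith", 0.9, 0.1))  # full agreement weight
print(field_weight("Smith", "Smyth", 0.9, 0.1))  # partial agreement
print(field_weight("Smith", None, 0.9, 0.1))     # missing value -> 0.0
```

Without the missing-value branch, an empty field would score as a near-total disagreement and pull genuine matches below the link threshold, which is the kind of inappropriate weight the abstract describes.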


Author(s):  
Jana Asher ◽  
Dean Resnick ◽  
Jennifer Brite ◽  
Robert Brackbill ◽  
James Cone

Since its post-World War II inception, the science of record linkage has grown exponentially and is now used across industry, government, and academia. The academic fields that rely on record linkage are diverse, ranging from history to public health to demography. In this paper, we introduce the different types of data linkage and give a historical context to their development. We then introduce the three types of underlying models for probabilistic record linkage: Fellegi-Sunter-based methods, machine learning methods, and Bayesian methods. Practical considerations, such as data standardization and privacy concerns, are then discussed. Finally, recommendations are given for organizations developing or maintaining record linkage programs, with an emphasis on organizations measuring long-term complications of disasters, such as 9/11.


2008 ◽  
Vol 47 (04) ◽  
pp. 356-363 ◽  
Author(s):  
A. C. J. Ravelli ◽  
N. Méray ◽  
J. B. Reitsma ◽  
G. J. Bonsel ◽  
M. Tromp

Summary Objective: To describe an efficient, generalizable approach to validating probabilistic record linkage results, in particular through model-guided detection of linking errors, and to apply this approach to validate the linkage of newborn admissions. Methods: Our double-blind validation procedure consisted of three steps: sample selection, data collection and data analysis. The linked Dutch national newborn admission registry contained 30,082 records for 2001, including readmissions (7.4%) and twins (9.7%). A highly informative sample was selected from the linked file by oversampling uncertain links based on the model-derived linking weight. Four hundred and eight fax forms with minimal registry information (admissions of 191 children) were sent out to different pediatric units. The pediatricians were asked to create a short, detailed patient history from independent sources. The linkage status and additional record data were validated against this external information. Results: The response rate was 97% (395/408 faxes). Accuracy of the linkage of singleton admissions was high: except for some expected errors in the uncertain area (0.02% of record pairs), linkage was error-free. Validation of multiple-birth readmissions showed 37% linkage errors due to the low data quality of the multiple-birth variables. The quality of the linked registry file was still high; only 1.7% of the children were from a multiple birth with multiple admissions, resulting in less than 1% linking error. Conclusions: Our external validation procedure for record linkage was feasible, efficient, and informative in identifying the sources of errors.
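The abstract states that the validation sample oversampled uncertain links according to the model-derived linking weight, but gives no cut-offs or stratum sizes. A minimal sketch of such weight-stratified sampling, assuming a single "uncertain" weight band and fixed per-stratum sample sizes (all hypothetical parameters), might look like this:

```python
import random
from typing import List, Sequence, Tuple

Link = Tuple[str, float]  # (link id, model-derived linking weight)

def validation_sample(links: Sequence[Link], lower: float, upper: float,
                      n_uncertain: int, n_certain: int,
                      seed: int = 0) -> List[Link]:
    """Draw a validation sample that oversamples links with weight in [lower, upper)."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    uncertain = [lk for lk in links if lower <= lk[1] < upper]
    certain = [lk for lk in links if lk[1] >= upper]
    # Take up to n_uncertain from the small uncertain band and up to
    # n_certain from the (usually much larger) high-weight stratum.
    return (rng.sample(uncertain, min(n_uncertain, len(uncertain)))
            + rng.sample(certain, min(n_certain, len(certain))))

weights = [2.0, 4.5, 5.0, 5.5, 12.0, 13.0, 14.0, 15.0]
links = [(f"pair{i}", w) for i, w in enumerate(weights)]
sample = validation_sample(links, lower=4.0, upper=6.0, n_uncertain=3, n_certain=1)
print(sample)
```

Here every uncertain link is checked but only one in four high-weight links, so error rates estimated from the sample would need to be reweighted by the inverse of each stratum's sampling fraction before being generalized to the whole file.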


2005 ◽  
Vol 44 (05) ◽  
pp. 626-630 ◽  
Author(s):  
W. Stühlinger ◽  
W. Oberaigner

Summary Objective: Record linkage of patient data originating from various data sources, and record linkage for checking the uniqueness of patient registration, are common tasks for every cancer registry. In Austria, no unique person identifier is in use in the medical system. Hence, an efficient means of record linkage for use in Austrian cancer registries was needed, and developing one was the goal of this work. Methods: We adapted the method of probabilistic record linkage to the situation of cancer registries in Austria. In addition to the customary components of this method, we also took into account typing errors commonly occurring in names and dates of birth. The method was implemented in a program written in DELPHI™ with interfaces optimised for cancer registries. Results: Applying our record linkage method to 130,509 linkages resulted in 105,272 (80.7%) identical pairs. For these identical pairs, 88.9% of decisions were made automatically and 11.1% semi-automatically. Among the automatically decided results, 6.9% did not have simultaneous identity of last name, first name and date of birth. Among the semi-automatically decided results, 48.4% did not have an identical last name, 25.6% did not have an identical date of birth and 83.1% did not have simultaneous identity of last name, date of birth and first name. Conclusions: The method implemented in our cancer registry solves all record linkage problems in Austria with sufficient precision.
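The abstract does not specify how typing errors in names and dates of birth were modelled. One common way to recognise them, sketched here as an assumption rather than the authors' DELPHI™ implementation, is the optimal string alignment (restricted Damerau–Levenshtein) distance, which counts a transposition of two adjacent characters as a single edit. A date of birth that differs by exactly one substitution or transposition can then be treated as a probable typo instead of a disagreement. The three-level outcome and the `YYYYMMDD` string format are hypothetical choices.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and adjacent transpositions each cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def dob_agreement(a: str, b: str) -> str:
    """Hypothetical three-level comparison of YYYYMMDD dates of birth."""
    if a == b:
        return "exact"
    if osa_distance(a, b) == 1:
        return "typo"  # one keystroke error: partial agreement
    return "different"

print(dob_agreement("19800102", "19800120"))  # swapped digits in the day
print(dob_agreement("19800102", "19750506"))
```

In a Fellegi–Sunter setting, the "typo" level would receive its own m- and u-probabilities, giving it a weight between full agreement and disagreement.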


2008 ◽  
Vol 15 (5) ◽  
pp. 654-660 ◽  
Author(s):  
M. Tromp ◽  
N. Meray ◽  
A. C. J. Ravelli ◽  
J. B. Reitsma ◽  
G. J. Bonsel
