Probabilistic Record Linkage of Anonymous Cancer Registry Records

Author(s):  
Martin Meyer ◽  
Martin Radespiel-Tröger ◽  
Christine Vogel


2005 ◽  
Vol 44 (05) ◽  
pp. 626-630 ◽  
Author(s):  
W. Stühlinger ◽  
W. Oberaigner

Summary. Objective: Linking patient data from various data sources and checking that each patient is registered only once are common tasks for every cancer registry. In Austria, no unique person identifier is in use in the medical system. The goal of this work was therefore to develop an efficient means of record linkage for use in cancer registries in Austria. Methods: We adapted the method of probabilistic record linkage to the situation of cancer registries in Austria. In addition to the customary components of this method, we also took into consideration typing errors that commonly occur in names and dates of birth. The method was implemented in a program written in Delphi™ with interfaces optimised for cancer registries. Results: Applying our record linkage method to 130,509 linkages yielded 105,272 (80.7%) identical pairs. For these identical pairs, 88.9% of decisions were made automatically and 11.1% semi-automatically. Of the pairs decided automatically, 6.9% did not agree simultaneously on last name, first name and date of birth. Of the pairs decided semi-automatically, 48.4% did not have an identical last name, 25.6% did not have an identical date of birth, and 83.1% did not agree simultaneously on last name, first name and date of birth. Conclusions: The method implemented in our cancer registry solves all record linkage problems in Austria with sufficient precision.
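The abstract does not give the weights, thresholds or error model the authors used, so the following is only a minimal sketch of the general idea: Fellegi-Sunter style agreement weights, a simple tolerance for one kind of typing error (two adjacent characters swapped), and a three-way decision that separates automatic links from pairs sent for manual (semi-automatic) review. The m/u probabilities, the thresholds and all function names are assumptions chosen for illustration, not values from the paper.

```python
import math

# Hypothetical per-field probabilities (illustrative, not from the paper):
# m = P(field agrees | same person), u = P(field agrees | different persons).
FIELDS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_date": (0.97, 0.001),
}


def is_transposition(a, b):
    """True if b is a with exactly one pair of adjacent characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


def agrees(a, b):
    """Count exact matches and single adjacent transpositions as agreement."""
    return a == b or is_transposition(a, b)


def pair_weight(rec_a, rec_b):
    """Sum log2 likelihood ratios over all fields for one record pair."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if agrees(rec_a[field], rec_b[field]):
            total += math.log2(m / u)               # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))   # disagreement weight
    return total


def decide(weight, upper=12.0, lower=0.0):
    """Three-way decision; both thresholds are assumed values."""
    if weight >= upper:
        return "automatic link"
    if weight > lower:
        return "manual review"
    return "non-link"


registry = {"last_name": "Meier", "first_name": "Anna", "birth_date": "1950-03-12"}
incoming = {"last_name": "Meire", "first_name": "Anna", "birth_date": "1950-03-12"}
print(decide(pair_weight(registry, incoming)))
```

In this sketch the swapped letters in "Meire" are tolerated, so the pair is still treated as agreeing on last name. A real registry would estimate m and u from its own data and tune the thresholds against reviewed pairs.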


2018 ◽  
Vol 4 (Supplement 2) ◽  
pp. 65s-65s
Author(s):  
T. Gillespie ◽  
P. Dhillon ◽  
K. Ward ◽  
A. Aggarwal ◽  
D. Bumb ◽  
...  

Background: Cancer registries worldwide are vital for determining cancer burden, planning cancer control measures, and facilitating research. The UICC has made population-based cancer registries a priority for LMICs; the National Cancer Registry Program (NCRP) of India oversees 28 such registries. A primary function of registries is to combine data for the same individual from multiple sources. For other disease cohorts where cancer is an outcome of interest, registries can potentially connect information by linking datasets together. Barriers to successful registration and linkage include systems in which cancer is not a notifiable disease, the absence of a universal unique individual identifier, and a lack of trained personnel. This study utilizes technology and infrastructure to develop better linkages, surveillance, and outcomes. Aim: To assess the feasibility of linking large cohorts designed for cardio-metabolic disease research with cancer registries in New Delhi and Chennai; determine additional steps required for linkage accuracy and completeness; and develop detailed protocols for future applications. Methods: A pilot protocol for linkage between a large diabetes cohort and cancer registries in Delhi and Chennai was developed using MatchPro, a probabilistic record linkage program developed for cancer registries. Probabilistic software links datasets in the presence of uncertainty (e.g., misspelled or abbreviated names) to identify record pairs with a high probability of representing the same individual. For this study, algorithms were developed to address unique aspects of names and demographics in India. The software and algorithms focused on detecting duplicates in cancer registries and on linking registries with external files from diabetes cohorts. In Delhi, three 1-year datasets covering 3 years (2010, 2011, 2012) were linked with the diabetes cohort; in Chennai, the linkage included three 5-year datasets covering 15 years (2000-04, 2005-09, 2010-14). The unique ID (Aadhaar) is not currently collected or linked systematically between different systems. Results: Linkage attempts yielded potential matches ranked by probabilistic score; the highest-scoring pairs were reviewed to determine true matches. In Chennai, this process yielded, for 2010-2014: 21% of self-reported (SR) cases matching perfectly, 36% requiring follow-up, and 13 nonreported (NR) cases found; for 2005-2009: 33% of SR cases matching perfectly and 1 NR case found; for 2000-2004: 1 NR case. In addition, 2 training workshops on data linkage and software were held. Conclusion: Linkages between cancer registries and other data sources are feasible in LMICs using probabilistic record linkage software augmented by manual matching. Future efforts to use existing epidemiologic resources (cohorts) and cancer research infrastructure (registries and clinical centers) can enhance research, including the understanding of shared risk factors and pathophysiologic mechanisms, e.g., between cancer and other NCDs.


2000 ◽  
Vol 16 (2) ◽  
pp. 439-447 ◽  
Author(s):  
Kenneth R. de Camargo Jr. ◽  
Cláudia M. Coeli

We present a database linkage system based on the probabilistic record linkage technique, developed in C++ with the Borland C++ Builder version 3.0 programming environment. The system was tested on data sources of different sizes and evaluated for processing time and for sensitivity in identifying true pairs. Processing the records took less time with the program than when done manually, especially for larger databases. The sensitivities of the manual and automatic processes were equivalent for databases with fewer records; however, as the databases grew, a trend toward decreasing sensitivity was observed only in the manual process. Although still at an early stage of development, the system performed well in both speed and sensitivity. Although the performance of the algorithms used was satisfactory, the aim is to evaluate other routines in order to improve the system's performance.


2014 ◽  
Vol 30 (2) ◽  
pp. 433-438 ◽  
Author(s):  
Silvano Barbosa de Oliveira ◽  
Edgar Merchan-Hamann ◽  
Leila Denise Alves Ferreira Amorim

The aim of this study is to estimate the prevalence of HIV/HBV and HIV/HCV coinfections among AIDS cases reported in Brazil, and to describe the epidemiological profile of these cases. Coinfection was identified through probabilistic record linkage of the data of all patients carrying the HIV virus recorded as AIDS patients and of those patients reported as carriers of hepatitis B or C virus in various databases from the Brazilian Ministry of Health from 1999 to 2010. In this period 370,672 AIDS cases were reported, of which 3,724 were HIV/HBV coinfections. Women are less likely to become coinfected than men and the chance of coinfection increases with age. This study allowed an important evaluation of HBV/HIV and HCV/HIV coinfections in Brazil using information obtained via merging secondary databases from the Ministry of Health, without conducting seroprevalence research. The findings of this study might be important for planning activities of the Brazilian epidemiologic surveillance agencies.


Author(s):  
Colin Babyak ◽  
Abdelnasser Saidi

Abstract. Objectives: The objectives of this talk are to introduce Statistics Canada’s Social Data Linkage Environment (SDLE) and to explain the methodology behind the creation of the central depository and how both deterministic and probabilistic record linkage techniques are used to maintain and expand the environment. Approach: We will start with a brief overview of the SDLE and then continue with a discussion of how both deterministic linkages and probabilistic linkages (using Statistics Canada’s generalized record linkage software, G-Link) have been combined to create and maintain a very large central depository, which can in turn be linked to virtually any social data source for the ultimate end goal of analysis. Results: Although Canada has a population of about 36 million people, the central depository contains some 300 million records to represent them, due to multiple addresses, names, etc. Although this allows for a significant reduction in missing links, it raises the spectre of additional false positive matches and has added computational complexity which we have had to overcome. Conclusion: The combination of deterministic and probabilistic record linkage strategies has been effective in creating the central depository for the SDLE. As more and more data are linked to the environment and we continue to refine our methodology, we can now move on to the ultimate goal of the SDLE, which is to analyze this vast wealth of linked data.
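The abstract does not describe how the two strategies are combined in the SDLE, and nothing below uses G-Link. As a rough sketch of one common pattern, a deterministic pass can link records that share an exact key, and a score-based pass can then handle only the records left over. The field names, the `person_id` key, the toy scoring function and the threshold are all hypothetical.

```python
def toy_score(a, b):
    """Fraction of compared fields that agree exactly (illustrative only)."""
    fields = ("last_name", "first_name", "birth_date")
    return sum(a.get(f) == b.get(f) for f in fields) / len(fields)


def link(source, depository, score, threshold):
    """Return {source_index: depository_index} from a two-pass cascade."""
    links = {}

    # Pass 1 (deterministic): exact match on a shared key, when present.
    by_key = {
        rec["person_id"]: j
        for j, rec in enumerate(depository)
        if rec.get("person_id")
    }
    for i, rec in enumerate(source):
        j = by_key.get(rec.get("person_id"))
        if j is not None:
            links[i] = j

    # Pass 2 (score-based): best-scoring candidate at or above the threshold,
    # tried only for records that pass 1 could not link.
    for i, rec in enumerate(source):
        if i in links:
            continue
        best_j, best_s = None, threshold
        for j, cand in enumerate(depository):
            s = score(rec, cand)
            if s >= best_s:
                best_j, best_s = j, s
        if best_j is not None:
            links[i] = best_j
    return links


depository = [
    {"person_id": "A1", "last_name": "Roy", "first_name": "Ann", "birth_date": "1980-01-01"},
    {"person_id": "B2", "last_name": "Tremblay", "first_name": "Luc", "birth_date": "1975-06-30"},
]
source = [
    {"person_id": "B2", "last_name": "Trembley", "first_name": "Luc", "birth_date": "1975-06-30"},
    {"person_id": None, "last_name": "Roy", "first_name": "Ann", "birth_date": "1980-01-01"},
    {"person_id": None, "last_name": "Singh", "first_name": "Raj", "birth_date": "1990-09-09"},
]
print(link(source, depository, toy_score, threshold=0.66))
```

The sketch does not enforce one-to-one links, which seems compatible with a depository that holds several records per person; a production system would replace `toy_score` with properly estimated probabilistic weights and use blocking rather than comparing every pair.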


2015 ◽  
Vol 45 (3) ◽  
pp. 954-964 ◽  
Author(s):  
Adrian Sayers ◽  
Yoav Ben-Shlomo ◽  
Ashley W Blom ◽  
Fiona Steele

Author(s):  
Yinghao Zhang ◽  
Senlin Xu ◽  
Mingfan Zheng ◽  
Xinran Li

Record linkage is the task of identifying which records refer to the same entity. When records in different data sources have no common key and contain typographical errors in their identifier fields, the extended Fellegi–Sunter probabilistic record linkage method proposed by Winkler, which takes field similarity into account, is to our knowledge one of the most effective methods for performing record linkage. However, this method cannot efficiently handle missing values in the fields: an inappropriate weight is assigned to a record pair that contains missing data. Therefore, to improve the performance of Winkler’s probabilistic record linkage method in the presence of missing values, we propose a solution that adjusts a record pair’s weight when missing data occur, which improves the accuracy of Winkler’s record linkage decisions without adding much computational time.
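The abstract does not state the actual adjustment the authors propose, so the sketch below only illustrates the problem and one simple, commonly used remedy: a field with a missing value contributes a neutral weight of zero instead of being scored as a disagreement. The similarity-scaled weight is a simplified linear stand-in for Winkler's interpolation, and `difflib.SequenceMatcher` replaces the Jaro-Winkler comparator purely to keep the example within the standard library. The m/u values and all function names are assumptions.

```python
import math
from difflib import SequenceMatcher


def field_weight(a, b, m, u):
    """Similarity-scaled log2 weight for one field; a missing value is neutral."""
    if a is None or b is None or a == "" or b == "":
        return 0.0  # assumed adjustment: missing data adds no evidence
    agree_w = math.log2(m / u)
    disagree_w = math.log2((1 - m) / (1 - u))
    sim = SequenceMatcher(None, a, b).ratio()  # 0.0 .. 1.0
    # Linear interpolation between full disagreement and full agreement.
    return disagree_w + sim * (agree_w - disagree_w)


def pair_weight(rec_a, rec_b, params):
    """Total weight of a record pair; params maps field -> (m, u)."""
    return sum(
        field_weight(rec_a.get(f), rec_b.get(f), m, u)
        for f, (m, u) in params.items()
    )


# Hypothetical parameters for illustration only.
PARAMS = {"last_name": (0.95, 0.01), "city": (0.90, 0.10)}

complete = {"last_name": "Zhang", "city": "Wuhan"}
missing_city = {"last_name": "Zhang", "city": None}
print(pair_weight(complete, missing_city, PARAMS))
```

Scoring the missing `city` as a disagreement would pull the total down even though the records may well refer to the same person. Treating it as neutral avoids that penalty, at the cost of ignoring any information that the pattern of missingness itself might carry.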


Author(s):  
Jana Asher ◽  
Dean Resnick ◽  
Jennifer Brite ◽  
Robert Brackbill ◽  
James Cone

Since its post-World War II inception, the science of record linkage has grown exponentially and is used across industrial, governmental, and academic agencies. The academic fields that rely on record linkage are diverse, ranging from history to public health to demography. In this paper, we introduce the different types of data linkage and give a historical context to their development. We then introduce the three types of underlying models for probabilistic record linkage: Fellegi-Sunter-based methods, machine learning methods, and Bayesian methods. Practical considerations, such as data standardization and privacy concerns, are then discussed. Finally, recommendations are given for organizations developing or maintaining record linkage programs, with an emphasis on organizations measuring long-term complications of disasters, such as 9/11.

