record pair
Recently Published Documents

TOTAL DOCUMENTS: 4 (FIVE YEARS: 2)
H-INDEX: 0 (FIVE YEARS: 0)

Author(s):  
Yinghao Zhang ◽  
Senlin Xu ◽  
Mingfan Zheng ◽  
Xinran Li

Record linkage is the task of identifying which records refer to the same entity. When records in different data sources do not share a common key and contain typographical errors in their identifier fields, the extended Fellegi–Sunter probabilistic record linkage method with consideration of field similarity, proposed by Winkler, is, to our knowledge, one of the most effective methods for performing record linkage. However, this method has a limitation: it cannot efficiently handle missing values in the fields, and an inappropriate weight is assigned to a record pair containing missing data. Therefore, to improve the performance of Winkler's probabilistic record linkage method in the presence of missing values, we propose a solution for adjusting a record pair's weight when missing data occur, which enhances the accuracy of Winkler's record linkage decisions without adding much computational time.
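A minimal sketch of the general idea: a Fellegi–Sunter-style match weight is the sum of per-field agreement/disagreement log-likelihood ratios, and a field that is missing in either record can be given a neutral contribution instead of being scored as a disagreement. The m- and u-probabilities, field names, and the neutral-weight rule below are illustrative assumptions, not the authors' exact adjustment.

```python
import math

# Illustrative m- and u-probabilities per field (assumed values, not from the paper).
FIELDS = {
    "last_name":  {"m": 0.95, "u": 0.02},
    "first_name": {"m": 0.90, "u": 0.05},
    "birth_date": {"m": 0.97, "u": 0.01},
}

def field_weight(field, agrees):
    """Fellegi-Sunter log-likelihood weight for one field comparison."""
    m, u = FIELDS[field]["m"], FIELDS[field]["u"]
    if agrees:
        return math.log2(m / u)          # agreement weight
    return math.log2((1 - m) / (1 - u))  # disagreement weight

def pair_weight(rec_a, rec_b):
    """Total match weight for a record pair.

    Missing-value handling (an illustrative rule, not Winkler's or the
    authors' exact adjustment): a field missing in either record
    contributes a neutral weight of 0 instead of a disagreement penalty.
    """
    total = 0.0
    for field in FIELDS:
        va, vb = rec_a.get(field), rec_b.get(field)
        if va is None or vb is None:
            continue  # neutral contribution for missing data
        total += field_weight(field, va == vb)
    return total

a = {"last_name": "Smith", "first_name": "Ann", "birth_date": None}
b = {"last_name": "Smith", "first_name": "Ann", "birth_date": "1990-01-01"}
print(pair_weight(a, b))  # birth_date is skipped rather than penalized
```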


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Sen Xiong ◽  
Shuan Grannis, MD, MS, FAAP

Background and Hypothesis: Accurate record linkage is essential to address the fragmentation of patient data across independent healthcare organizations. To accurately evaluate record linkage methods, so-called “gold standard” data sets with labeled true matches and non-matches are needed. Human review, the process of manually assessing potentially linked patient demographic records and determining whether a record pair refers to the same individual, is needed to create these data sets. However, the human review process is susceptible to bias and human error. Consequently, record linkage accuracy evaluations are prone to be biased by inaccurate gold standards. Consistent and scientifically rigorous methods for creating gold standard record linkage data sets must be developed, as none have yet been described. In this study, we describe a repeatable process for developing consistent manually reviewed data sets and analyze the results obtained from 15 human reviews of 200 record pairs following our protocol. Experimental Design/Methods: We obtained patient records from the Indiana Network for Patient Care and the Marion County Health Department. We created record pairs for manual review by probabilistically linking the data sets using multiple blocking schemes. Two hundred record pairs were then manually reviewed by 15 different individuals, and the results were analyzed. Results: Across the 200 record pairs reviewed by 15 reviewers, 155 were nondiscordant pairs whereas 45 were discordant; 40 of the discordant pairs were the result of outliers. Conclusion and Potential Impact: From the record pair evaluation results, some empirical rules can be established for the process of manual review, though the nuances of evaluation reasoning require further discussion and a larger sample size. Nonetheless, establishing a standard for manual review is a step toward better health care and complete patient records.
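To make the discordance analysis concrete, here is a small sketch of how per-pair decisions from multiple reviewers might be summarized to flag discordant pairs (pairs on which reviewers disagree). The data structure, decision labels, and "any disagreement" rule are hypothetical illustrations, not the study's actual protocol or data.

```python
from collections import Counter

# reviews[pair_id] = list of reviewer decisions: "match" or "non-match".
# The values below are made up for illustration only.
reviews = {
    "pair_001": ["match"] * 15,
    "pair_002": ["match"] * 11 + ["non-match"] * 4,
    "pair_003": ["non-match"] * 15,
}

def summarize(decisions):
    """Return the majority decision and whether the pair is discordant."""
    counts = Counter(decisions)
    majority, majority_votes = counts.most_common(1)[0]
    discordant = majority_votes < len(decisions)  # any disagreement at all
    return majority, discordant

for pair_id, decisions in reviews.items():
    majority, discordant = summarize(decisions)
    status = "discordant" if discordant else "nondiscordant"
    print(f"{pair_id}: majority={majority} ({status})")
```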


2018 ◽  
Vol 34 (1) ◽  
pp. 41-54
Author(s):  
Abel Dasylva

Abstract This article looks at the estimation of an association parameter between two variables in a finite population, when the variables are separately recorded in two population registers that are imperfectly linked. The main problem is the occurrence of linkage errors, which include bad links and missing links. A methodology is proposed for the case where clerical reviews can reliably determine the match status of a record pair, for example using names, demographic information and addresses. It features clerical reviews of a probability sample of pairs and regression estimators assisted by a statistical model of the comparison outcomes in a pair. Like other regression estimators, this estimator is design-consistent regardless of the model's validity, and it is more efficient when the model holds.
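For intuition, the following is an illustrative sketch of a generic model-assisted (difference-type) estimator: model-based predictions are summed over all candidate pairs, and clerical reviews on a probability sample provide an inverse-probability-weighted correction for the model's error. The target quantity, the simulated data, the match-probability model, and the Poisson-type clerical sample are all placeholders for exposition; this is not the article's estimator.

```python
import random

random.seed(0)

# Candidate linked pairs: x comes from register A, y from register B,
# p_hat is a model-based match probability, and "match" is the true match
# status (normally unknown; used here only to simulate clerical review).
pairs = []
for _ in range(1000):
    true_match = random.random() < 0.8
    p_hat = min(max(random.gauss(0.8 if true_match else 0.2, 0.1), 0.0), 1.0)
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    pairs.append({"x": x, "y": y, "p_hat": p_hat, "match": true_match})

# Illustrative target: the total of x*y over truly matched pairs, a building
# block for association measures such as a covariance.
def model_prediction(p):
    return p["p_hat"] * p["x"] * p["y"]

# Model-assisted difference estimator:
#   T_hat = sum of model predictions over all pairs
#         + inverse-probability-weighted correction from the clerical sample.
pi = 0.1  # inclusion probability of the clerical-review sample
sample = [p for p in pairs if random.random() < pi]

model_total = sum(model_prediction(p) for p in pairs)
correction = sum((p["match"] * p["x"] * p["y"] - model_prediction(p)) / pi
                 for p in sample)
print(model_total + correction)
```

The correction term is what makes such an estimator design-consistent even when the model is misspecified, while a well-fitting model shrinks the correction and hence the variance.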


2017 ◽  
Vol 7 (1.1) ◽  
pp. 294
Author(s):  
G Somasekhar ◽  
SeshaSravani K ◽  
Keerthi P ◽  
Sai Sandeep G

Record linkage and deduplication are two processes used to match records. Record matching is performed to remove duplicate records, which strongly influence the outputs of data mining and data processing. When the matching is done within a single database, it is called deduplication: we check for duplicate records within that one database. When the matching is instead done across several databases, it is called record linkage. In this paper we also discuss an indexing technique called traditional blocking, which removes non-matching pairs and thus reduces the number of record pairs that must be compared.
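A minimal sketch of traditional blocking, as described above: records are grouped by a blocking key, and candidate pairs are generated only within each block, so the full cross-product of comparisons is avoided. The toy records, field names, and choice of key are illustrative, not from the paper.

```python
from collections import defaultdict
from itertools import combinations

# Toy records; in deduplication both records of a pair come from one database.
records = [
    {"id": 1, "name": "John Smith",  "zip": "46202"},
    {"id": 2, "name": "Jon Smith",   "zip": "46202"},
    {"id": 3, "name": "Mary Jones",  "zip": "46033"},
    {"id": 4, "name": "Maria Jones", "zip": "46033"},
]

def blocking_key(rec):
    # Illustrative key: ZIP code plus the first letter of the name.
    return (rec["zip"], rec["name"][0].upper())

# Group records into blocks by the key.
blocks = defaultdict(list)
for rec in records:
    blocks[blocking_key(rec)].append(rec)

# Candidate pairs are formed only within blocks, so records that share no
# blocking key are never compared.
candidate_pairs = [
    (a["id"], b["id"])
    for block in blocks.values()
    for a, b in combinations(block, 2)
]
print(candidate_pairs)  # [(1, 2), (3, 4)] instead of all 6 possible pairs
```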

