record pair
Recently Published Documents

TOTAL DOCUMENTS: 4 (FIVE YEARS: 2)
H-INDEX: 0 (FIVE YEARS: 0)

Author(s):  
Yinghao Zhang ◽  
Senlin Xu ◽  
Mingfan Zheng ◽  
Xinran Li

Record linkage is the task of identifying which records refer to the same entity. When records in different data sources do not share a common key and contain typographical errors in their identifier fields, the extended Fellegi–Sunter probabilistic record linkage method with consideration of field similarity, proposed by Winkler, is, to our knowledge, one of the most effective methods for performing record linkage. However, this method has a limitation: it cannot efficiently handle missing values in the fields, and an inappropriate weight is assigned to a record pair containing missing data. Therefore, to improve the performance of Winkler's probabilistic record linkage method in the presence of missing values, we propose a solution for adjusting a record pair's weight when missing data occur, which enhances the accuracy of Winkler's record linkage decisions without adding much computational time.
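A minimal sketch of the general idea: a Fellegi–Sunter-style match weight is the sum of per-field agreement/disagreement log-likelihood ratios, and a field that is missing in either record can be given a neutral contribution instead of being scored as a disagreement. The m- and u-probabilities, field names, and the neutral-weight rule below are illustrative assumptions, not the authors' exact adjustment.

```python
import math

# Illustrative m- and u-probabilities per field (assumed values, not from the paper).
FIELDS = {
    "last_name":  {"m": 0.95, "u": 0.02},
    "first_name": {"m": 0.90, "u": 0.05},
    "birth_date": {"m": 0.97, "u": 0.01},
}

def field_weight(field, agrees):
    """Fellegi-Sunter log-likelihood weight for one field comparison."""
    m, u = FIELDS[field]["m"], FIELDS[field]["u"]
    if agrees:
        return math.log2(m / u)          # agreement weight
    return math.log2((1 - m) / (1 - u))  # disagreement weight

def pair_weight(rec_a, rec_b):
    """Total match weight for a record pair.

    Missing-value handling (an illustrative rule, not Winkler's or the
    authors' exact adjustment): a field missing in either record
    contributes a neutral weight of 0 instead of a disagreement penalty.
    """
    total = 0.0
    for field in FIELDS:
        va, vb = rec_a.get(field), rec_b.get(field)
        if va is None or vb is None:
            continue  # neutral contribution for missing data
        total += field_weight(field, va == vb)
    return total

a = {"last_name": "Smith", "first_name": "Ann", "birth_date": None}
b = {"last_name": "Smith", "first_name": "Ann", "birth_date": "1990-01-01"}
print(pair_weight(a, b))  # birth_date is skipped rather than penalized
```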


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Sen Xiong ◽  
Shuan Grannis, MD, MS, FAAP

Background and Hypothesis: Accurate record linkage is essential to address the fragmentation of patient data across independent healthcare organizations. To accurately evaluate record linkage methods, so-called “gold standard” data sets with labeled true matches and non-matches are needed. Human review, the process of manually assessing potentially linked patient demographic records and determining whether a record pair refers to the same individual, is needed to create these data sets. However, the human review process is susceptible to bias and human error. Consequently, record linkage accuracy evaluations are prone to be biased by inaccurate gold standards. Consistent and scientifically rigorous methods for creating gold standard record linkage data sets must be developed, as none have yet been described. In this study, we describe a repeatable process for developing consistent manually reviewed data sets and analyze the results obtained from 15 human reviews of 200 record pairs following our protocol. Experimental Design/Methods: We obtained patient records from the Indiana Network for Patient Care and the Marion County Health Department. We created record pairs for manual review by probabilistically linking the data sets using multiple blocking schemes. Two hundred record pairs were then manually reviewed by 15 different individuals, and the results were analyzed. Results: Across the 200 record pairs reviewed by 15 reviewers, 155 were nondiscordant pairs whereas 45 were discordant; 40 of the discordant pairs were the result of outliers. Conclusion and Potential Impact: From the record pair evaluation results, some empirical rules can be established for the process of manual review, though the nuances of evaluation reasoning require further discussion and a larger sample size. Nonetheless, establishing a standard for manual review is a step toward better health care and complete patient records.
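To make the discordance analysis concrete, here is a small sketch of how per-pair decisions from multiple reviewers might be summarized to flag discordant pairs (pairs on which reviewers disagree). The data structure, decision labels, and "any disagreement" rule are hypothetical illustrations, not the study's actual protocol or data.

```python
from collections import Counter

# reviews[pair_id] = list of reviewer decisions: "match" or "non-match".
# The values below are made up for illustration only.
reviews = {
    "pair_001": ["match"] * 15,
    "pair_002": ["match"] * 11 + ["non-match"] * 4,
    "pair_003": ["non-match"] * 15,
}

def summarize(decisions):
    """Return the majority decision and whether the pair is discordant."""
    counts = Counter(decisions)
    majority, majority_votes = counts.most_common(1)[0]
    discordant = majority_votes < len(decisions)  # any disagreement at all
    return majority, discordant

for pair_id, decisions in reviews.items():
    majority, discordant = summarize(decisions)
    status = "discordant" if discordant else "nondiscordant"
    print(f"{pair_id}: majority={majority} ({status})")
```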


2018 ◽  
Vol 34 (1) ◽  
pp. 41-54
Author(s):  
Abel Dasylva

Abstract This article looks at the estimation of an association parameter between two variables in a finite population, when the variables are separately recorded in two population registers that are imperfectly linked. The main problem is the occurrence of linkage errors, which include bad links and missing links. A methodology is proposed for the case where clerical reviews can reliably determine the match status of a record pair, for example using names, demographic information and addresses. It features clerical reviews of a probability sample of pairs and regression estimators assisted by a statistical model of the comparison outcomes in a pair. Like other regression estimators, this estimator is design-consistent regardless of the model's validity, and it is more efficient when the model holds.
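For intuition, the following is an illustrative sketch of a generic model-assisted (difference-type) estimator: model-based predictions are summed over all candidate pairs, and clerical reviews on a probability sample provide an inverse-probability-weighted correction for the model's error. The target quantity, the simulated data, the match-probability model, and the Poisson-type clerical sample are all placeholders for exposition; this is not the article's estimator.

```python
import random

random.seed(0)

# Candidate linked pairs: x comes from register A, y from register B,
# p_hat is a model-based match probability, and "match" is the true match
# status (normally unknown; used here only to simulate clerical review).
pairs = []
for _ in range(1000):
    true_match = random.random() < 0.8
    p_hat = min(max(random.gauss(0.8 if true_match else 0.2, 0.1), 0.0), 1.0)
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    pairs.append({"x": x, "y": y, "p_hat": p_hat, "match": true_match})

# Illustrative target: the total of x*y over truly matched pairs, a building
# block for association measures such as a covariance.
def model_prediction(p):
    return p["p_hat"] * p["x"] * p["y"]

# Model-assisted difference estimator:
#   T_hat = sum of model predictions over all pairs
#         + inverse-probability-weighted correction from the clerical sample.
pi = 0.1  # inclusion probability of the clerical-review sample
sample = [p for p in pairs if random.random() < pi]

model_total = sum(model_prediction(p) for p in pairs)
correction = sum((p["match"] * p["x"] * p["y"] - model_prediction(p)) / pi
                 for p in sample)
print(model_total + correction)
```

The correction term is what makes such an estimator design-consistent even when the model is misspecified, while a well-fitting model shrinks the correction and hence the variance.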


2017 ◽  
Vol 7 (1.1) ◽  
pp. 294
Author(s):  
G Somasekhar ◽  
SeshaSravani K ◽  
Keerthi P ◽  
Sai Sandeep G

Record linkage and deduplication are two processes used to match records. Record matching is performed to remove duplicate records, which strongly influence the outputs of data mining and data processing. When the matching is done within a single database, it is called deduplication: we check for duplicate records within that one database. When the matching is instead done across several databases, it is called record linkage. In this paper we also discuss an indexing technique called traditional blocking, which removes non-matching pairs and thus reduces the number of record pairs that must be compared.
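A minimal sketch of traditional blocking, as described above: records are grouped by a blocking key, and candidate pairs are generated only within each block, so the full cross-product of comparisons is avoided. The toy records, field names, and choice of key are illustrative, not from the paper.

```python
from collections import defaultdict
from itertools import combinations

# Toy records; in deduplication both records of a pair come from one database.
records = [
    {"id": 1, "name": "John Smith",  "zip": "46202"},
    {"id": 2, "name": "Jon Smith",   "zip": "46202"},
    {"id": 3, "name": "Mary Jones",  "zip": "46033"},
    {"id": 4, "name": "Maria Jones", "zip": "46033"},
]

def blocking_key(rec):
    # Illustrative key: ZIP code plus the first letter of the name.
    return (rec["zip"], rec["name"][0].upper())

# Group records into blocks by the key.
blocks = defaultdict(list)
for rec in records:
    blocks[blocking_key(rec)].append(rec)

# Candidate pairs are formed only within blocks, so records that share no
# blocking key are never compared.
candidate_pairs = [
    (a["id"], b["id"])
    for block in blocks.values()
    for a, b in combinations(block, 2)
]
print(candidate_pairs)  # [(1, 2), (3, 4)] instead of all 6 possible pairs
```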

