true match
Recently Published Documents

TOTAL DOCUMENTS: 11 (five years: 6)
H-INDEX: 3 (five years: 0)

2021 ◽  
Author(s):  
Jenna Watson

Frontal sinus radiographs are frequently used to identify human remains. However, the method of visually comparing antemortem (AM) to postmortem (PM) cranial radiographs has been criticized for being a subjective approach that relies on practitioner experience, training, and judgment rather than on objective, quantifiable procedures with published error rates. The objective of this study was to explore the use of ArcMap and its spatial analysis tool, Similarity Search, as a quantifiable, reliable, and reproducible method for identifying frontal sinus matches from cranial radiographs. Using cranial radiographs of 100 individuals from the William M. Bass Donated Skeletal Collection, the frontal sinuses were digitized to create two-dimensional polygons. Similarity Search was evaluated on its ability to identify the correct AM radiograph using three variables: the number of scallops and the area and perimeter values of the polygons. Using all three variables, Similarity Search correctly identified the true match AM polygon in 58% of the male groups and in 62% of the female groups. These results indicate that ArcMap can be used with frontal sinus radiographs. However, further analysis of the three variables revealed that scallop number did not provide sufficient information about frontal sinus shape to increase the accuracy of Similarity Search, and area and perimeter captured only the size of the frontal sinus polygons, not their shape. This research is a first step in developing a user-friendly, quantifiable frontal sinus comparison method for the purpose of positive identification.
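For illustration, a minimal sketch of the kind of feature comparison described above: polygon area and perimeter via the shoelace formula, then a nearest-neighbor search over (scallop count, area, perimeter) vectors. This is not ArcMap's Similarity Search implementation, and all feature values below are hypothetical.

```python
import math

def polygon_area_perimeter(vertices):
    """Shoelace area and perimeter of a closed 2D polygon given as (x, y) tuples."""
    area2, perim = 0.0, 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        area2 += x1 * y2 - x2 * y1
        perim += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2.0, perim

def most_similar(pm_features, am_candidates):
    """Rank AM candidates by z-scored Euclidean distance to the PM polygon's
    (scallop count, area, perimeter) feature vector."""
    # Standardize each feature across candidates so area (large numbers)
    # does not dominate scallop count (small numbers).
    dims = list(zip(*am_candidates.values()))
    means = [sum(d) / len(d) for d in dims]
    sds = [max((sum((v - m) ** 2 for v in d) / len(d)) ** 0.5, 1e-9)
           for d, m in zip(dims, means)]
    z = lambda feats: [(v - m) / s for v, m, s in zip(feats, means, sds)]
    target = z(pm_features)
    return sorted(am_candidates,
                  key=lambda k: math.dist(z(am_candidates[k]), target))

# Hypothetical feature vectors: (scallop count, area, perimeter)
am = {"AM_001": (4, 812.0, 143.2), "AM_002": (6, 1090.5, 171.8)}
print(most_similar((6, 1053.0, 168.0), am))  # AM_002 ranks first
```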


2021 ◽  
Vol 59 (3) ◽  
pp. 865-918
Author(s):  
Ran Abramitzky ◽  
Leah Boustan ◽  
Katherine Eriksson ◽  
James Feigenbaum ◽  
Santiago Pérez

The recent digitization of complete count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. We evaluate different automated methods for record linkage, performing a series of comparisons across methods and against hand linking. We have three main findings that lead us to conclude that automated methods perform well. First, a number of automated methods generate very low (less than 5 percent) false positive rates. The automated methods trace out a frontier illustrating the trade-off between the false positive rate and the (true) match rate. Relative to more conservative automated algorithms, humans tend to link more observations but at a cost of higher rates of false positives. Second, when human linkers and algorithms use the same linking variables, there is relatively little disagreement between them. Third, across a number of plausible analyses, coefficient estimates and parameters of interest are very similar when using linked samples based on each of the different automated methods. We provide code and Stata commands to implement the various automated methods. (JEL C81, C83, N01, N31, N32)
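A minimal sketch of the conservative end of that trade-off, assuming a simplified name-and-birth-year linking rule with a tunable age band; the authors' released code and Stata commands are the authoritative implementations, and all record fields here are hypothetical.

```python
import re

def standardize(name):
    """Crude name cleaning: lowercase, strip non-letters (phonetic coding
    such as NYSIIS or Soundex would normally go here)."""
    return re.sub(r"[^a-z]", "", name.lower())

def link(census_a, census_b, age_band=0):
    """Link records that agree on standardized first/last name and birth year
    within +/- age_band, keeping only matches unique on BOTH sides.
    Widening age_band raises the match rate but also the false positive rate."""
    index = {}
    for rec in census_b:
        for yr in range(rec["birth_year"] - age_band,
                        rec["birth_year"] + age_band + 1):
            key = (standardize(rec["first"]), standardize(rec["last"]), yr)
            index.setdefault(key, []).append(rec["id"])
    links = {}
    for rec in census_a:
        key = (standardize(rec["first"]), standardize(rec["last"]),
               rec["birth_year"])
        cands = index.get(key, [])
        if len(set(cands)) == 1:               # unique candidate in B
            links.setdefault(cands[0], []).append(rec["id"])
    # keep only B records matched by exactly one A record (unique in A too)
    return {b: a_ids[0] for b, a_ids in links.items() if len(a_ids) == 1}
```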


Author(s):  
Anna Lin ◽  
Soon Song ◽  
Nancy Wang

Introduction: Stats NZ's Integrated Data Infrastructure (IDI) is a linked longitudinal database combining administrative and survey data. Previously, false positive linkages (FP) in the IDI were assessed by clerical review of a sample of linked records, which was time consuming and subject to inconsistency.
Objectives and Approach: A modelled approach, 'SoLinks', has been developed to automate the FP estimation process for the IDI. It uses a logistic regression model to calculate the probability that a given link is a true match. The model is based on the agreement types defined for four key linking variables: first name, last name, sex, and date of birth. Exemptions have been given to some specific types of links that we believe to be high-quality true matches. The training data used to estimate the model parameters was based on the outcomes of the clerical review process over several years.
Results: We have compared the FP rates estimated through clerical review to those estimated through the SoLinks model. Some SoLinks estimates fall outside the 95% confidence intervals of the clerically reviewed ones. This may be because the pre-defined probabilities for the specific types of links are too high.
Conclusion: The automation of FP checking has saved analyst time and resources. The modelled FP estimates have been more stable across time than the previous clerical reviews. As this model estimates the probability of a true match at the individual link level, we may provide this probability to researchers so that they can calculate linkage quality indicators for their research populations.
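A minimal sketch of such a model, assuming binary agreement indicators rather than the richer agreement types SoLinks defines, and entirely hypothetical training data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical clerical-review training data: one row per reviewed link.
# Features are agreement indicators for the four key linking variables
# (1 = agree, 0 = disagree); the label is the reviewer's true-match call.
X = np.array([[1, 1, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 1, 1],
              [0, 1, 1, 1],
              [0, 0, 1, 0],
              [1, 0, 0, 0]])   # first name, last name, sex, date of birth
y = np.array([1, 1, 1, 1, 0, 0])

model = LogisticRegression().fit(X, y)

# Probability that a link agreeing on everything but date of birth is a true
# match; the estimated FP rate of a linked file is then the mean of
# (1 - P(true match)) over its links.
p_true = model.predict_proba([[1, 1, 1, 0]])[0, 1]
print(f"P(true match) = {p_true:.2f}, FP contribution = {1 - p_true:.2f}")
```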


2020 ◽  
Vol 26 (4) ◽  
pp. 221-231
Author(s):  
Yuying Chen ◽  
Huacong Wen ◽  
Russel Griffin ◽  
Mary Joan Roach ◽  
Michael L. Kelly

Background: Linking records from the National Spinal Cord Injury Model Systems (SCIMS) database to the National Trauma Data Bank (NTDB) provides a unique opportunity to study early variables in predicting long-term outcomes after traumatic spinal cord injury (SCI). The public use data sets of SCIMS and NTDB are stripped of protected health information, including dates and zip code. Objectives: To develop and validate a probabilistic algorithm linking data from an SCIMS center and its affiliated trauma registry. Method: Data on SCI admissions 2011–2018 were retrieved from an SCIMS center (n = 302) and trauma registry (n = 723), of which 202 records had the same medical record number. The SCIMS records were divided equally into two data sets for algorithm development and validation, respectively. We used a two-step approach: blocking and weight generation for linking variables (race, insurance, height, and weight). Results: In the development set, 257 SCIMS-trauma pairs shared the same sex, age, and injury year across 129 clusters, of which 91 were true matches. The probabilistic algorithm identified 65 of the 91 true-match records (sensitivity, 71.4%) with a positive predictive value (PPV) of 80.2%. The algorithm was validated over 282 SCIMS-trauma pairs across 127 clusters and had a sensitivity of 73.7% and a PPV of 81.1%. Post hoc analysis showed that the addition of injury date and zip code improved the specificity from 57.9% to 94.7%. Conclusion: We demonstrate the feasibility of probabilistic linkage between SCIMS and trauma records, which needs further refinement and validation. Gaining access to injury date and zip code would improve record linkage significantly.
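The weight-generation step is in the Fellegi-Sunter tradition: within each block, field agreements contribute log-likelihood-ratio weights that are summed and thresholded. A minimal sketch under assumed m- and u-probabilities; the actual values and agreement definitions belong to the study.

```python
import math

# Hypothetical m- and u-probabilities for the weight-generation step:
# m = P(field agrees | true match), u = P(field agrees | non-match).
FIELDS = {
    "race":      {"m": 0.95, "u": 0.40},
    "insurance": {"m": 0.90, "u": 0.30},
    "height":    {"m": 0.85, "u": 0.10},
    "weight":    {"m": 0.80, "u": 0.05},
}

def pair_weight(agreements):
    """Sum of log2 likelihood-ratio (Fellegi-Sunter) agreement and
    disagreement weights for a candidate pair within a block."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELDS[field]["m"], FIELDS[field]["u"]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total

# A pair (already blocked on sex, age, and injury year) agreeing on race,
# insurance, and weight but not height:
w = pair_weight({"race": True, "insurance": True,
                 "height": False, "weight": True})
print(f"match weight = {w:.2f}")  # accept as a link if above a chosen threshold
```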


2020 ◽  
Vol 6 (7) ◽  
pp. 50471-50491
Author(s):  
Rodrigo Nani França ◽  
David Augusto Ribeiro ◽  
Renata Lopes Rosa ◽  
Demostenes Zegarra Rodriguez

Iris biometric recognition is one of the most consistent biometric technologies currently available. However, its efficiency and accuracy can be degraded when low-quality iris images are used as input to a recognition system, reducing overall performance. In this context, this work proposes an evaluation study to determine the impact of iris image quality on the performance of an iris biometric system, using the main metrics presented in the ISO/IEC 29794-6:2015 standard. The experimental tests are performed using an iris image database and the OSIRIS biometric recognition software, both widely accepted and referenced in recent research. The experimental results show the range of values of each quality metric and the number of images that reach the required minimum values. The performance of the biometric system is evaluated through the True-Match (TM) and False Non-Match (FNM) parameters; it was thus possible to establish that the higher the image quality level, the lower the FNM value, and hence the better the system performance.
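A minimal sketch of how the FNM side of that evaluation can be computed from genuine-comparison scores, treating higher scores as better matches (OSIRIS itself reports Hamming-style distances, where the inequality flips); all scores below are hypothetical.

```python
def false_non_match_rate(genuine_scores, threshold):
    """Fraction of genuine (same-iris) comparisons falling below the
    acceptance threshold, i.e. true matches the system fails to accept."""
    misses = sum(1 for s in genuine_scores if s < threshold)
    return misses / len(genuine_scores)

# Hypothetical similarity scores for genuine comparisons from images of
# differing quality; higher-quality images tend to score higher.
high_quality = [0.91, 0.88, 0.95, 0.90, 0.86]
low_quality  = [0.72, 0.55, 0.80, 0.48, 0.67]

for label, scores in [("high quality", high_quality),
                      ("low quality", low_quality)]:
    print(label, false_non_match_rate(scores, threshold=0.70))
```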


Author(s):  
Misturah Adunni Alaran ◽  
AbdulAkeem Adesina Agboola ◽  
Adio Taofiki Akinwale ◽  
Olusegun Folorunso

The reality of human existence and of our interactions with the things around us reveals that the world is imprecise, incomplete, vague, and sometimes even indeterminate. Neutrosophic logic is the only theory that attempts to unify all previous logics in the same global theoretical framework. Extracting data from such an environment is becoming a problem as the volume of data keeps growing day in and day out. This chapter proposes a new neutrosophic string similarity measure based on the longest common subsequence (LCS) to address uncertainty in string information search. The new method has been compared with four existing classical string similarity measures using a wordlist as the data set. The analyses show that the proposed neutrosophic similarity measure performs better than the existing measures in information retrieval tasks, with evaluation based on precision, recall, highest false match, lowest true match, and separation.
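A minimal sketch of the classical LCS similarity the chapter builds on; the neutrosophic extension (truth, indeterminacy, and falsity components) is the chapter's contribution and is not reproduced here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence via dynamic programming,
    keeping only the previous row of the DP table."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb
                       else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def lcs_similarity(a, b):
    """Classical LCS similarity in [0, 1]: LCS length over the longer string."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))

print(lcs_similarity("neutrosophic", "neutrophic"))  # ~0.83
```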


Author(s):  
Dean M Resnick ◽  
Lisa B Mirel

Probabilistic record linkage implies that there is some level of uncertainty in the classification of pairs as links or non-links vis-à-vis their true match status. As record linkage is usually performed as a preliminary step to developing statistical estimates, the question is how this linkage uncertainty propagates to those estimates. In this paper, we develop a re-sampling approach to estimate the impact of linkage uncertainty on derived estimates. In each iteration of the re-sampling, pairs are classified as links or non-links by Monte Carlo assignment according to model-estimated true match probabilities. By examining the range of estimates produced across a series of re-samples, we can estimate the distribution of derived statistics under the prevailing level of linkage uncertainty. For this analysis we use the results of linking the 2014 National Hospital Care Survey to the National Death Index, performed at the National Center for Health Statistics. We assess the precision of hospital-level death rate estimates.
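A minimal sketch of this re-sampling scheme, assuming each candidate pair carries a model-estimated true match probability and the derived statistic is a simple rate; the pair data and record counts are hypothetical.

```python
import random

def resample_estimates(pairs, statistic, n_rep=1000, seed=7):
    """Propagate linkage uncertainty: in each replicate, classify every
    candidate pair as a link with its model-estimated true-match
    probability, then recompute the statistic on the resulting linked set."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_rep):
        linked = [p for p in pairs if rng.random() < p["p_match"]]
        estimates.append(statistic(linked))
    return estimates

# Hypothetical hospital-to-death-index pairs: a death rate estimate is the
# number of accepted links divided by a (fixed) number of hospital records.
pairs = [{"p_match": p} for p in (0.95, 0.90, 0.80, 0.55, 0.30, 0.10)]
n_hospital_records = 50
rates = resample_estimates(pairs,
                           lambda linked: len(linked) / n_hospital_records)
rates.sort()
print("95% interval:", rates[24], "-", rates[-25])  # 2.5th/97.5th percentiles
```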


2016 ◽  
Vol 55 (03) ◽  
pp. 276-283 ◽  
Author(s):  
Tenniel Guiver ◽  
Sean Randall ◽  
Anna Ferrante ◽  
James Semmens ◽  
Phil Anderson ◽  
...  

Summary
Background: Record linkage techniques allow different data collections to be brought together to provide a wider picture of the health status of individuals. Ensuring high linkage quality is important to guarantee the quality and integrity of research. Current methods for measuring linkage quality typically focus on precision (the proportion of accepted links that are true matches), given the difficulty of measuring the proportion of false negatives.
Objectives: The aim of this work is to introduce and evaluate a sampling-based method to estimate both precision and recall following record linkage.
Methods: In the sampling-based method, record pairs from each threshold band (including those below the identified cut-off for acceptance) are sampled and clerically reviewed. These results are then applied to the entire set of record pairs, providing estimates of false positives and false negatives. This method was evaluated on a synthetically generated dataset, where the true match status (which records belonged to the same person) was known.
Results: The sampled estimates of linkage quality were close to the actual linkage quality metrics calculated for the whole synthetic dataset. The precision and recall measures from seven reviewers were very consistent, with little variation in the clerical assessment results (overall agreement by Fleiss' kappa was 0.601).
Conclusions: This method presents a possible means of accurately estimating matching quality and refining linkages in population-level linkage studies. The sampling approach is especially important for large linkage projects, where the number of record pairs produced may be very large, often running into the millions.
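A minimal sketch of the extrapolation step, assuming a fixed number of pairs is clerically reviewed in each score stratum; all counts below are hypothetical.

```python
def estimate_precision_recall(strata):
    """Extrapolate clerical-review results from a sample in each score
    stratum (including strata below the acceptance cut-off) to estimate
    precision and recall for the full linkage.

    Each stratum carries: total pairs, pairs sampled, sampled true matches
    found, and whether the stratum lies above the acceptance threshold."""
    tp = fp = fn = 0.0
    for s in strata:
        match_rate = s["sample_matches"] / s["sample_size"]
        est_matches = match_rate * s["total_pairs"]
        if s["accepted"]:
            tp += est_matches
            fp += s["total_pairs"] - est_matches
        else:
            fn += est_matches   # true matches lost below the cut-off
    return tp / (tp + fp), tp / (tp + fn)

strata = [  # hypothetical review of 100 pairs sampled per stratum
    {"total_pairs": 20000, "sample_size": 100, "sample_matches": 99, "accepted": True},
    {"total_pairs": 5000,  "sample_size": 100, "sample_matches": 85, "accepted": True},
    {"total_pairs": 8000,  "sample_size": 100, "sample_matches": 10, "accepted": False},
]
precision, recall = estimate_precision_recall(strata)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```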


2014 ◽  
Vol 53 (03) ◽  
pp. 186-194 ◽  
Author(s):  
A. M. Thomas ◽  
J. M. Dean ◽  
L. M. Olson ◽  
L. J. Cook

Summary
Objective: To compare results from high-probability matched sets versus imputed matched sets across differing levels of linkage information.
Methods: A series of linkages with varying amounts of available information were performed on two simulated datasets derived from multiyear motor vehicle crash (MVC) and hospital databases, where true matches were known. Distributions of high-probability and imputed matched sets were compared against the true match population for occupant age, MVC county, and MVC hour. Regression models were fit to simulated log hospital charges and hospitalization status.
Results: High-probability and imputed matched sets did not differ significantly from the true match population in occupant age, MVC county, and MVC hour in high information settings (p > 0.999). In low information settings, high-probability matched sets differed significantly in occupant age and MVC county (p < 0.002), but imputed matched sets did not (p > 0.493). High information settings saw no significant differences between the two methods in inference for simulated log hospital charges and hospitalization status. In low information settings, both methods differed significantly from the true outcomes; however, imputed matched sets were more robust.
Conclusions: The level of information available to a linkage is an important consideration. High-probability matched sets are suitable for high to moderate information settings and for situations involving case-specific analysis. Conversely, imputed matched sets are preferable for low information settings when conducting population-based analyses.
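A minimal sketch of one plausible reading of "imputed matched sets": rather than keeping only the single best candidate link per record, draw a candidate in proportion to match probability in each of several imputations and pool the estimates. The candidate values and probabilities are hypothetical, and this is not the authors' implementation.

```python
import random
import statistics

def imputed_estimates(records, estimator, n_imp=20, seed=1):
    """For each imputation, sample one candidate link per record with
    probability proportional to its match probability, recompute the
    estimate on the imputed linked set, and pool across imputations."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_imp):
        linked = []
        for rec in records:
            cands, probs = zip(*rec["candidates"])   # (log charge, p_match)
            linked.append(rng.choices(cands, weights=probs)[0])
        pooled.append(estimator(linked))
    return statistics.mean(pooled), statistics.stdev(pooled)

# Hypothetical crash records, each with candidate log hospital charges and
# match probabilities; the estimator is the mean of log charges.
records = [
    {"candidates": [(9.2, 0.7), (8.1, 0.3)]},
    {"candidates": [(7.5, 0.5), (10.0, 0.5)]},
]
print(imputed_estimates(records, statistics.mean))
```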


2011 ◽  
Vol 22 (1) ◽  
pp. 31-38 ◽  
Author(s):  
Xiaochun Li ◽  
Changyu Shen

We review ideas, approaches and progress in the field of record linkage. We point out that the latent class models used in probabilistic matching have been well developed and applied in a different context of diagnostic testing when the true disease status is unknown. The methodology developed in the diagnostic testing setting can be potentially translated and applied in record linkage. Although there are many methods for record linkage, a comprehensive evaluation of methods for a wide range of real-world data with different data characteristics and with true match status is absent due to lack of data sharing. However, the recent availability of generators of synthetic data with realistic characteristics renders such evaluations feasible.

