How do you measure up? Methods to assess linkage quality

Author(s):  
Anna Ferrante ◽  
James Boyd ◽  
Sean Randall ◽  
Adrian Brown ◽  
James Semmens

ABSTRACT

Objectives: Record linkage is a powerful technique which transforms discrete episode data into longitudinal person-based records. These records enable the construction and analysis of complex pathways of health and disease progression, and of service use. Achieving high linkage quality is essential for ensuring the quality and integrity of research based on linked data. The methods used to assess linkage quality will depend on the volume and characteristics of the datasets involved, the processes used for linkage, and the additional information available for quality assessment. This paper proposes and evaluates two methods for routinely assessing linkage quality.

Approach: Linkage units currently use a range of methods to measure, monitor and improve linkage quality; however, no common approach or standards exist. There is an urgent need to develop "best practices" for evaluating, reporting and benchmarking linkage quality. In assessing linkage quality, the primary interest is in knowing the number of true matches and non-matches among the records classified as links and non-links. Any misclassification of matches within these groups introduces linkage error. We present efforts to develop sharable methods for measuring linkage quality in Australia. These include a sampling-based method for estimating both precision (accuracy) and recall (sensitivity) following record linkage, and a benchmarking method, a transparent and transportable methodology for benchmarking the quality of linkages across different operational environments.

Results: The sampling-based method achieved estimates of linkage quality that were very close to the actual linkage quality metrics. It presents a feasible means of accurately estimating matching quality and refining linkages in population-level linkage studies. The benchmarking method provides a systematic approach to estimating linkage quality using a set of open, shareable datasets and a set of well-defined, established performance metrics, and offers an opportunity to benchmark the linkage quality of different record linkage operations. Both methods also have the potential to assess the inter-rater reliability of clerical reviews.

Conclusions: Both methods produce reliable estimates of linkage quality, enabling the exchange of information within and between linkage communities. It is important that researchers can assess risk in studies using record linkage techniques. Understanding the impact of linkage quality on research outputs highlights the need for standard methods to routinely measure linkage quality. These two methods provide a good start to the quality process, but it is also important to identify standards and good practices in all parts of the linkage process (pre-processing, standardising activities, linkage, grouping and extracting).
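For reference, the two metrics named above can be written in standard form, where TP is the number of true matches correctly identified as links, FP the number of non-matches incorrectly accepted as links, and FN the number of true matches missed:

$$\mathrm{precision} = \frac{TP}{TP + FP}, \qquad \mathrm{recall} = \frac{TP}{TP + FN}$$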

Author(s):  
Sean Randall ◽  
Anna Ferrante ◽  
Adrian Brown ◽  
James Boyd ◽  
James Semmens

ABSTRACT

Objectives: The grouping of record-pairs to determine which administrative records belong to the same individual is an important process in record linkage. A variety of grouping methods are used, but the relative benefits of each are unknown. We evaluate a number of grouping methods against the traditional merge-based clustering approach using large-scale administrative data.

Approach: The research aimed both to describe the grouping techniques currently used for record linkage and to evaluate the most appropriate grouping method for specific circumstances. A range of grouping strategies were applied to three datasets with known truth sets. Conditions were simulated to investigate one-to-one, many-to-one and ongoing linkage scenarios.

Results: The results suggest that alternative grouping methods can yield large benefits in linkage quality, especially when the quality of the underlying repository is high. Stepwise grouping methods were clearly superior for one-to-one linkage. There appeared to be little difference in linkage quality between many-to-one grouping approaches. The most appropriate techniques for ongoing linkage depended on the quality of the population spine and of the underlying dataset.

Conclusions: These results demonstrate the large effect that the choice of grouping strategy can have on overall linkage quality. Ongoing linkage to a high-quality population spine provides large improvements in linkage quality compared with merge-based linkage. Procuring or developing such a population spine can deliver high linkage quality at far lower cost than current methods for improving linkage quality. By improving linkage quality at low cost, this resource can be further utilised by health researchers.
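As a point of reference, here is a minimal sketch of the merge-based clustering baseline the abstract mentions, in which any two records joined by an accepted link fall into the same group (transitive closure, implemented here with union-find); the record IDs and links are illustrative:

```python
# Minimal sketch of merge-based grouping (transitive closure over accepted
# links) using union-find; record IDs and links are illustrative inputs.
from collections import defaultdict

def merge_based_groups(record_ids, links):
    """Group records so that any two connected by a link share a group."""
    parent = {r: r for r in record_ids}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path compression
            r = parent[r]
        return r

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two clusters

    groups = defaultdict(list)
    for r in record_ids:
        groups[find(r)].append(r)
    return list(groups.values())

# Example: records 1-2-3 chain into one person; record 4 stands alone.
print(merge_based_groups([1, 2, 3, 4], [(1, 2), (2, 3)]))
```

A single erroneous link between two chains merges both people into one group, which is why the abstract's comparison against spine-based and stepwise grouping matters.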


2021 ◽  
Vol 6 ◽  
pp. 209
Author(s):  
Emily Dema ◽  
Andrew J Copas ◽  
Soazig Clifton ◽  
Anne Conolly ◽  
Margaret Blake ◽  
...  

Background: Britain’s National Surveys of Sexual Attitudes and Lifestyles (Natsal) have been undertaken decennially since 1990 and provide a key data source underpinning sexual and reproductive health (SRH) policy. The COVID-19 pandemic disrupted many aspects of sexual lifestyles, triggering an urgent need for population-level data on sexual behaviour, relationships and service use at a time when gold-standard in-person, household-based surveys with probability sampling were not feasible. We designed the Natsal-COVID study to understand the impact of COVID-19 on the nation’s SRH and assessed the sample’s representativeness.

Methods: Natsal-COVID Wave 1 data collection was conducted four months (29/7-10/8/2020) after the announcement of Britain’s first national lockdown (23/03/2020), as an online web-panel survey administered by the survey research company Ipsos MORI. Eligible participants were resident in Britain and aged 18-59 years, and the sample included a boost of those aged 18-29. Questions covered participants’ sexual behaviour, relationships and SRH service use. Quotas and weighting were used to achieve a quasi-representative sample of the British general population. Participants meeting criteria of interest and agreeing to recontact were selected for qualitative follow-up interviews. Comparisons were made with contemporaneous national probability surveys and with Natsal-3 (2010-12) to understand bias.

Results: 6,654 participants completed the survey and 45 completed follow-up interviews. The weighted Natsal-COVID sample was similar to the general population in terms of gender, age, ethnicity, rurality and, among sexually active participants, the number of sexual partners in the past year. However, the sample was more educated, contained more sexually inexperienced people and included more people in poorer health.

Conclusions: Natsal-COVID Wave 1 rapidly collected quasi-representative population data to enable evaluation of the early population-level impact of COVID-19 and lockdown measures on SRH in Britain and to inform policy. Although sampling was less representative than in the decennial Natsals, Natsal-COVID will complement national surveillance data and Natsal-4 (planned for 2022).
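To illustrate the weighting step described above, here is a minimal sketch of cell-based post-stratification weighting, in which each respondent's weight is their cell's population share divided by its sample share; the cells and benchmark shares are illustrative, not Natsal's actual targets:

```python
# Minimal sketch of cell-based (post-stratification) weighting: a
# respondent's weight is the population share of their demographic cell
# divided by the sample share. Benchmark figures are illustrative only.
from collections import Counter

def poststratify(sample_cells, population_shares):
    n = len(sample_cells)
    sample_shares = {c: k / n for c, k in Counter(sample_cells).items()}
    return [population_shares[c] / sample_shares[c] for c in sample_cells]

# Example: the sample over-represents women aged 18-29 (a boost group),
# so their weights fall below 1 while under-sampled cells rise above 1.
cells = ["F18-29", "F18-29", "M18-29", "F30-59"]
benchmark = {"F18-29": 0.25, "M18-29": 0.25, "F30-59": 0.50}
print(poststratify(cells, benchmark))  # -> [0.5, 0.5, 1.0, 2.0]
```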


2016 ◽  
Vol 145 (5) ◽  
pp. 925-941 ◽  
Author(s):  
G. MURPHY ◽  
C. D. PILCHER ◽  
S. M. KEATING ◽  
R. KASSANJEE ◽  
S. N. FACENTE ◽  
...  

SUMMARY

In 2011 the Incidence Assay Critical Path Working Group reviewed the current state of HIV incidence assays and helped to determine a critical path to the introduction of an HIV incidence assay. At that time the Consortium for Evaluation and Performance of HIV Incidence Assays (CEPHIA) was formed to spur progress and raise standards among assay developers, scientists and laboratories involved in HIV incidence measurement, and to structure and conduct a direct, independent comparative evaluation of the performance of 10 existing HIV incidence assays, considered singly and in combination as recent infection test algorithms. In this paper we report on a new framework for HIV incidence assay evaluation that has emerged from this effort over the past five years. It includes a preliminary target product profile for an incidence assay, a consensus around key performance metrics, analytical tools, and the deployment of a standardized approach to incidence assay evaluation. The specimen panels for this evaluation have been collected in large volumes, characterized using a novel approach to infection-dating rules, and assembled into panels designed to assess the impact of important sources of measurement error in incidence assays, such as viral subtype, elite host control of viraemia and antiretroviral treatment. We present the specific rationale for several of these innovations and discuss important resources for assay developers and researchers that have recently become available. Finally, we summarize the key remaining steps on the path to the development and implementation of reliable assays for monitoring HIV incidence at a population level.


Author(s):  
Sarah Rees ◽  
Arfon Rees

ABSTRACT

Objectives: The SAIL databank brings together a range of datasets gathered primarily for administrative rather than research purposes. These datasets contain information about different aspects of an individual’s contact with services, which, when combined, form a detailed health record for individuals living (or deceased) in Wales. Understanding the quality of data in SAIL supports the research process by providing a level of assurance about the robustness of the data, and by identifying and describing potential sources of bias due to invalid, incomplete, inconsistent or inaccurate data, thereby helping to increase the accuracy of research using these data. Designing processes to investigate and report on data quality within and between multiple datasets can be time-consuming; it requires a high degree of effort to ensure the output is genuinely meaningful and useful to SAIL users, and it may require a range of different approaches.

Approach: Data quality tests for each dataset were written, covering a range of data quality dimensions including validity, consistency, accuracy and completeness. Tests were designed to capture not just the quality of data within each dataset but also the consistency of data items between datasets. SQL scripts were written to test each of these aspects; to minimise repetition, automated processes were implemented where appropriate. Batch automation was used to call SQL stored procedures, which use metadata to generate dynamic SQL. The metadata (created as part of the data quality process) describe each dataset and the measurement parameters used to assess each field within it. However, automation on its own is insufficient: data quality outputs require scrutiny and oversight to ensure they actually capture what they set out to measure. SAIL users were consulted on the development of the data quality reports to ensure usability and appropriateness for supporting data use in research.

Results: The data quality reporting process benefits the SAIL databank by providing additional information to support the research process; in some cases it acts as a diagnostic tool, detecting problems with data which can then be rectified.

Conclusion: The development of data quality processes in SAIL is ongoing, and changes or developments in each dataset lead to new requirements for data quality measurement and reporting. A vital component of the process is the production of output that is genuinely meaningful and useful.
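Here is a minimal sketch of the metadata-driven pattern described, in which per-field measurement parameters drive generated SQL for completeness and validity checks; the table, column, and rule values are hypothetical, not SAIL's actual schema:

```python
# Minimal sketch of metadata-driven data-quality SQL generation: each
# metadata entry names a dataset field and the check to run against it.
# Table, column, and rule values below are hypothetical examples.
METADATA = [
    {"table": "gp_events", "column": "patient_id", "check": "completeness"},
    {"table": "gp_events", "column": "event_date", "check": "validity",
     "rule": "event_date BETWEEN '1900-01-01' AND CURRENT_DATE"},
]

def build_quality_sql(meta):
    if meta["check"] == "completeness":
        return (f"SELECT COUNT(*) AS n_rows, "
                f"SUM(CASE WHEN {meta['column']} IS NULL THEN 1 ELSE 0 END) "
                f"AS n_missing FROM {meta['table']};")
    if meta["check"] == "validity":
        return (f"SELECT SUM(CASE WHEN NOT ({meta['rule']}) THEN 1 ELSE 0 "
                f"END) AS n_invalid FROM {meta['table']};")
    raise ValueError(f"unknown check type: {meta['check']}")

for m in METADATA:
    print(build_quality_sql(m))
```

Keeping the measurement parameters in metadata rather than hand-written scripts is what lets one generator cover many datasets without repetition, as the abstract describes.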


2021 ◽  
Vol 22 (3) ◽  
pp. 1-12
Author(s):  
María D. Gracia

The stacking of containers in ideal locations within the yard is a tactical decision that affects the productivity of container terminals. The goal is to improve later loading and retrieval operations and make better use of terminal resources. In this paper, we study how to allocate storage space for outbound containers in container terminals. A two-phase methodological framework is proposed. The first phase groups outbound containers into clusters with similar operational loading conditions. In the second phase, a bi-objective storage space assignment model is solved to determine the set of block-bays where groups of similar containers will be stored during the planning horizon. This study makes a double contribution. On the one hand, it proposes a new methodological framework that combines operations research and data mining techniques to solve the storage space assignment problem for outbound containers. On the other hand, it analyzes the impact of three factors on four performance metrics used to evaluate the quality and quantity of alternative solutions to the problem. The experimental framework comprises a designed experiment to assess these effects and a case study to validate the proposed approach. The experimental results reveal that the storage yard's capacity and the number of clusters used to group the containers destined for a vessel are the main factors affecting the number and quality of alternative solutions.
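A minimal sketch of the first phase under stated assumptions: outbound containers are grouped by operational loading attributes with k-means. The three features used here (weight class, discharge-port code, loading-time slot) are illustrative stand-ins, not the paper's actual attributes or clustering method:

```python
# Minimal sketch of phase one: cluster outbound containers by operational
# loading conditions so similar containers can share block-bays in phase
# two. The three features are illustrative assumptions, and the data are
# randomly generated for demonstration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
containers = np.column_stack([
    rng.integers(1, 4, 200),    # weight class 1-3
    rng.integers(0, 5, 200),    # discharge-port code
    rng.integers(0, 24, 200),   # planned loading-time slot (hour)
]).astype(float)

k = 6  # number of clusters: one of the experimental factors studied
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(containers)
for c in range(k):
    print(f"cluster {c}: {np.sum(labels == c)} containers")
```

Each resulting cluster would then become a unit of demand in the bi-objective block-bay assignment model of phase two.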


2016 ◽  
Vol 55 (03) ◽  
pp. 276-283 ◽  
Author(s):  
Tenniel Guiver ◽  
Sean Randall ◽  
Anna Ferrante ◽  
James Semmens ◽  
Phil Anderson ◽  
...  

Summary

Background: Record linkage techniques allow different data collections to be brought together to provide a wider picture of the health status of individuals. Ensuring high linkage quality is important to guarantee the quality and integrity of research. Current methods for measuring linkage quality typically focus on precision (the proportion of accepted links that are true matches), given the difficulty of measuring the proportion of false negatives.

Objectives: The aim of this work is to introduce and evaluate a sampling-based method to estimate both precision and recall following record linkage.

Methods: In the sampling-based method, record-pairs from each threshold (including those below the identified cut-off for acceptance) are sampled and clerically reviewed. The results are then extrapolated to the entire set of record-pairs, providing estimates of false positives and false negatives. The method was evaluated on a synthetically generated dataset in which the true match status (which records belonged to the same person) was known.

Results: The sampled estimates of linkage quality were close to the actual linkage quality metrics calculated for the whole synthetic dataset. The precision and recall measures for seven reviewers were very consistent, with little variation in the clerical assessment results (overall agreement by Fleiss' kappa was 0.601).

Conclusions: This method presents a possible means of accurately estimating matching quality and refining linkages in population-level linkage studies. The sampling approach is especially important for large linkage projects, where the number of record-pairs produced may be very large, often running into the millions.
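A minimal sketch of the sampling logic described, with illustrative strata: record-pairs are sampled within each score band (including below the cut-off), the reviewed match rate is scaled up to the band total, and the resulting estimated true/false counts yield precision and recall. Variable names and figures are illustrative:

```python
# Minimal sketch of the stratum-sampling estimator described: within each
# score band, the clerically reviewed sample's match rate is scaled to the
# band total, giving estimated TP/FP above the cut-off and FN below it.
def estimate_precision_recall(strata, cutoff):
    """strata: list of dicts with keys
         score     - representative score of the band
         n_pairs   - total record-pairs in the band
         n_sampled - pairs clerically reviewed
         n_true    - reviewed pairs judged to be true matches
    """
    tp = fp = fn = 0.0
    for s in strata:
        match_rate = s["n_true"] / s["n_sampled"]
        est_true = match_rate * s["n_pairs"]   # scale sample to band total
        if s["score"] >= cutoff:               # band accepted as links
            tp += est_true
            fp += s["n_pairs"] - est_true
        else:                                  # band rejected as non-links
            fn += est_true
    return tp / (tp + fp), tp / (tp + fn)

strata = [
    {"score": 30, "n_pairs": 1000, "n_sampled": 100, "n_true": 99},
    {"score": 20, "n_pairs": 1000, "n_sampled": 100, "n_true": 60},
    {"score": 10, "n_pairs": 1000, "n_sampled": 100, "n_true": 5},
]
precision, recall = estimate_precision_recall(strata, cutoff=20)
print(f"precision ~ {precision:.3f}, recall ~ {recall:.3f}")
```

Reviewing only a few hundred sampled pairs per band in this way is what makes the approach tractable for linkages producing millions of record-pairs.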


2018 ◽  
Vol 2018 ◽  
pp. 1-21 ◽  
Author(s):  
Margarete Dulce Bagatini ◽  
Alessandra Antunes dos Santos ◽  
Andréia Machado Cardoso ◽  
Aline Mânica ◽  
Cristina Ruedell Reschke ◽  
...  

Evidence shows that purinergic signaling is involved in processes associated with health and disease, including noncommunicable, neurological and degenerative diseases. These diseases affect people from childhood to old age and are generally characterized by the progressive deterioration of cells, eventually leading to tissue or organ degeneration. Such pathological conditions can be associated with disturbances in the signaling mediated by adenine nucleotides and nucleosides, in the expression or activity of extracellular ectonucleotidases, and in the activation of P2X and P2Y receptors. Among the best known of these diseases are atherosclerosis, hypertension, cancer, epilepsy, Alzheimer’s disease (AD), Parkinson’s disease (PD) and multiple sclerosis (MS). The currently available treatments have limited effectiveness and are mostly palliative. This review presents the role of purinergic signaling, highlighting the ectonucleotidases E-NTPDase, E-NPP, E-5′-nucleotidase and adenosine deaminase, in noncommunicable, neurological and degenerative diseases associated with the cardiovascular and central nervous systems and with cancer. In conclusion, changes in ectonucleotidase activity were verified in all the diseases reviewed. Although the role of ectonucleotidases remains to be investigated further, the evidence reviewed here can contribute to a better understanding of the molecular mechanisms of these highly complex diseases, which have a major impact on patients’ quality of life.


Author(s):  
Magdalena Paśnikowska-Łukaszuk ◽  
Arkadiusz Urzędowski

Modern technologies allow for the quick processing of digital images. In the Internet era, many mobile applications support the digital processing of photos used on social media. The algorithms of popular social networks weigh many factors, but the photograph attached to a post is of great importance. Social media make it possible to reach many sources and people, and a good photo can give a post carrying additional information a high reach. The use of mobile applications helps to achieve very good results. This paper presents the results of comparing posts that used digitally processed photos with posts whose photos were published without processing in a graphics program.


2021 ◽  
Author(s):  
Luis Felipe Paulin ◽  
Muthuswamy Raveendran ◽  
Ronald Alan Harris ◽  
Jeffrey Rogers ◽  
Arndt von Haeseler ◽  
...  

Recent population studies involve ever-growing sample sizes in order to investigate the diversity of a given population or species. These studies reveal new polymorphisms that yield important insights into the mechanisms of evolution and are also important for the interpretation of these variations. Nevertheless, while the full catalog of variation across an entire species remains unknown, we can predict which regions harbor additional, still-hidden variation and investigate their properties, thereby enhancing the analysis of potentially missed variants. To achieve this we implemented SVhound (https://github.com/lfpaulin/SVhound), which uses a population-level SV dataset to predict regions that harbor novel SV alleles. We tested SVhound using subsets of the 1000 Genomes Project data and showed that its predictions correlate highly with the full data set (average correlation over 2,800 tests, r = 0.7136). Next, we used SVhound to investigate potentially missed or understudied regions across 1KGP and CCDG that include multiple genes. Lastly, we demonstrate the applicability of SVhound to a small, novel SV call set for rhesus macaque (Macaca mulatta) and discuss the impact and choice of its parameters. Overall, SVhound is a unique method for identifying regions that potentially harbor hidden diversity in model and non-model organisms, and it can also be used to help ensure the high quality of SV call sets.
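A minimal sketch of the subset-versus-full evaluation style reported above, run on simulated data: region-level SV-allele counts from a cohort subsample are correlated against counts from the full call set. This illustrates only the evaluation idea, not SVhound's actual prediction model, and all figures are simulated:

```python
# Minimal sketch of a subset-vs-full evaluation: correlate region-level
# SV-allele counts from a cohort subsample with counts from the full call
# set. Cohort size echoes 1KGP for illustration; data are simulated.
import numpy as np

rng = np.random.default_rng(1)
n_regions, n_samples = 500, 2504
# Each region has its own per-sample chance of carrying an SV allele.
region_rates = rng.beta(0.5, 20, n_regions)
calls = rng.random((n_samples, n_regions)) < region_rates

subset = calls[rng.choice(n_samples, 250, replace=False)]
full_counts = calls.sum(axis=0)
subset_counts = subset.sum(axis=0)
r = np.corrcoef(subset_counts, full_counts)[0, 1]
print(f"region-level correlation, subset vs full: r = {r:.3f}")
```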


2017 ◽  
Vol 76 (3) ◽  
pp. 255-290 ◽  
Author(s):  
Brystana G. Kaufman ◽  
B. Steven Spivack ◽  
Sally C. Stearns ◽  
Paula H. Song ◽  
Emily C. O’Brien

Since 2010, more than 900 accountable care organizations (ACOs) have formed payment contracts with public and private insurers in the United States; however, there has been no systematic evaluation of the evidence on the impacts of ACOs on care and outcomes across payer types. This review evaluates the quality of evidence regarding the association of public and private ACOs with health service use, processes and outcomes of care. The 42 articles identified studied ACO contracts with Medicare (N = 24 articles), Medicaid (N = 5), commercial payers (N = 11) and all payers (N = 2). The most consistent associations between ACO implementation and outcomes across payer types were reduced inpatient use, reduced emergency department visits and improved measures of preventive care and chronic disease management. The seven studies evaluating patient experience or clinical outcomes of care showed no evidence that ACOs worsen outcomes of care; however, the impact on patient care and outcomes should continue to be monitored.

