Comparing Record Linkage methods for real-world perinatal and neonatal data without unique identifiers

Author(s):  
Rainer Schnell ◽  
Christian Borgs

Background: Data on newborns are regularly linked for epidemiological research. However, hospital data are often incomplete. We report on a linkage of two population-covering administrative health databases containing neonatal and perinatal data without unique personal identifiers and with incomplete information in standard patient identifiers. Goal: To study the effects of a policy-induced change from linking a national database without standard patient identifiers to a privacy-preserving Record Linkage method, we compare the linkage system in use to clear-text and privacy-preserving Record Linkage techniques. We expected large proportions of missing identifiers, since they are not needed for clinical practice, and therefore expected missing links caused by missing identifiers. To study the impact of these missing identifiers on linkage success, we compared several linkage methods. Furthermore, we study the variation in linkage success between hospitals. Methods: Perinatal and neonatal data from population-covering real-world administrative databases were linked using several variants of state-of-the-art methods, including Privacy-preserving Record Linkage (PPRL) techniques such as multiple match keys and Bloom filter methods. Results: We report on the variation of linkage results between the hospitals and give possible explanations for the differences. The resulting linkage success is reported for each method. The impact of incomplete data on linkage success for each method is documented. Finally, we report on the relative performance of the modified techniques compared to standard linkage procedures used in practice. Conclusion: Implementing a record linkage system based on identifiers not required for clinical practice caused a large number of missing identifiers. Since this information is essential for successful clear-text and private linkage methods, emphasizing the need to document patient identifiers, especially in cases where auxiliary information (such as stable addresses, date of birth or health insurance numbers) is missing, is of central importance for implementing a privacy-preserving Record Linkage system.
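
The Bloom filter methods mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the filter size, the number of hash functions, the use of HMAC-SHA256, the bigram padding, and all function names are assumptions chosen for illustration. A missing identifier is encoded as an empty filter, which is one simple way to see why missing identifiers translate into missing links.

```python
import hashlib
import hmac


def bigrams(value: str) -> set:
    """Split a normalized identifier into padded character bigrams."""
    cleaned = value.strip().lower()
    if not cleaned:
        return set()  # missing identifier: nothing to encode
    padded = f"_{cleaned}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, secret: bytes, size: int = 256, num_hashes: int = 4) -> set:
    """Return the set of bit positions that are set to 1 (illustrative parameters)."""
    positions = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            message = f"{k}|{gram}".encode("utf-8")
            digest = hmac.new(secret, message, hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % size)
    return positions


def dice(a: set, b: set) -> float:
    """Dice coefficient of two encoded identifiers; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


key = b"shared-secret"  # hypothetical key agreed between the data owners
same = dice(bloom_encode("Schnell", key), bloom_encode("schnell", key))
close = dice(bloom_encode("Schnell", key), bloom_encode("Schnel", key))
far = dice(bloom_encode("Schnell", key), bloom_encode("Borgs", key))
missing = dice(bloom_encode("Schnell", key), bloom_encode("", key))
```

Identical values give a similarity of 1, a one-letter typo still scores high, and a pair with a missing value scores 0 regardless of the other side.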

2021 ◽  
Vol 8 (1) ◽  
pp. e000840
Author(s):  
Lianne Parkin ◽  
Sheila Williams ◽  
David Barson ◽  
Katrina Sharples ◽  
Simon Horsburgh ◽  
...  

Background: Cardiovascular comorbidity is common among patients with chronic obstructive pulmonary disease (COPD) and there is concern that long-acting bronchodilators (long-acting muscarinic antagonists (LAMAs) and long-acting beta2 agonists (LABAs)) may further increase the risk of acute coronary events. Information about the impact of treatment intensification on acute coronary syndrome (ACS) risk in real-world settings is limited. We undertook a nationwide nested case–control study to estimate the risk of ACS in users of both a LAMA and a LABA relative to users of a LAMA. Methods: We used routinely collected national health and pharmaceutical dispensing data to establish a cohort of patients aged >45 years who initiated long-acting bronchodilator therapy for COPD between 1 February 2006 and 30 December 2013. Fatal and non-fatal ACS events during follow-up were identified using hospital discharge and mortality records. For each case we used risk set sampling to randomly select up to 10 controls, matched by date of birth, sex, date of cohort entry (first LAMA and/or LABA dispensing), and COPD severity. Results: From the cohort (n=83 417), we identified 5399 ACS cases during 281 292 person-years of follow-up. Compared with current use of LAMA therapy, current use of LAMA and LABA dual therapy was associated with a higher risk of ACS (OR 1.28 (95% CI 1.13 to 1.44)). The OR in an analysis restricted to fatal cases was 1.46 (95% CI 1.12 to 1.91). Conclusion: In real-world clinical practice, use of two versus one long-acting bronchodilator by people with COPD is associated with a higher risk of ACS.
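
The risk set sampling step described in the abstract above can be sketched as follows. This is a generic illustration, not the study's code: the `Member` fields, the integer day scale, and the exact-match stratum are assumptions, and the tiny cohort is made up.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Member:
    pid: int
    entry: int                 # day of cohort entry
    exit: int                  # last day of follow-up (event or censoring)
    event_day: Optional[int]   # day of the outcome event, or None
    stratum: tuple             # matching factors, compared for exact equality


def risk_set_sample(cohort, max_controls=10, seed=0):
    """For each case, draw up to max_controls matched members still at risk."""
    rng = random.Random(seed)
    sampled = {}
    for case in cohort:
        if case.event_day is None:
            continue
        t = case.event_day
        eligible = [
            m for m in cohort
            if m.pid != case.pid
            and m.stratum == case.stratum
            and m.entry <= t <= m.exit                      # under follow-up at t
            and (m.event_day is None or m.event_day > t)    # no event yet at t
        ]
        chosen = rng.sample(eligible, min(max_controls, len(eligible)))
        sampled[case.pid] = sorted(m.pid for m in chosen)
    return sampled


# Made-up cohort: members 1 and 4 become cases; 4 can serve as a control for 1.
cohort = [
    Member(1, 0, 50, 50, ("F",)),
    Member(2, 0, 100, None, ("F",)),
    Member(3, 0, 30, None, ("F",)),
    Member(4, 0, 80, 80, ("F",)),
    Member(5, 0, 100, None, ("M",)),
]
controls = risk_set_sample(cohort)
```

Note that a member who later becomes a case remains eligible as a control for earlier cases, which is the defining feature of sampling from the risk set.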


JAMIA Open ◽  
2019 ◽  
Vol 2 (4) ◽  
pp. 562-569 ◽  
Author(s):  
Jiang Bian ◽  
Alexander Loiacono ◽  
Andrei Sura ◽  
Tonatiuh Mendoza Viramontes ◽  
Gloria Lipori ◽  
...  

Abstract Objective: To implement an open-source tool that performs deterministic privacy-preserving record linkage (RL) in a real-world setting within a large research network. Materials and Methods: We learned 2 efficient deterministic linkage rules using publicly available voter registration data. We then validated the 2 rules’ performance with 2 manually curated gold-standard datasets linking electronic health records and claims data from 2 sources. We developed an open-source Python-based tool, OneFL Deduper, that (1) creates seeded hash codes of combinations of patients’ quasi-identifiers using a cryptographic one-way hash function to achieve privacy protection and (2) links and deduplicates patient records using a central broker through matching of hash codes with a high precision and reasonable recall. Results: We deployed the OneFL Deduper (https://github.com/ufbmi/onefl-deduper) in OneFlorida, a state-based clinical research network that is part of the national Patient-Centered Clinical Research Network (PCORnet). Using the gold-standard datasets, we achieved a precision of 97.25% to 99.7% and a recall of 75.5%. With the tool, we deduplicated ∼3.5 million (out of ∼15 million) records down to 1.7 million unique patients across 6 health care partners and the Florida Medicaid program. We demonstrated the benefits of RL by examining different disease profiles of the linked cohorts. Conclusions: Many factors, including privacy risk considerations, policies and regulations, data availability and quality, and computing resources, can impact how an RL solution is constructed in a real-world setting. Nevertheless, RL is a significant task in improving the data quality in a network so that we can draw reliable scientific discoveries from these massive data resources.
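
The two steps named in the abstract above (seeded hash codes of quasi-identifier combinations, then matching of hash codes at a central broker) can be sketched as below. This is not the OneFL Deduper code or API. The two rules, the field names, the normalization, and the choice of HMAC-SHA256 as the seeded one-way hash are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical linkage rules: each rule is a combination of quasi-identifiers.
RULES = {
    "rule_1": ("first_name", "last_name", "dob", "zip"),
    "rule_2": ("first_name", "last_name", "dob", "sex"),
}


def normalize(value) -> str:
    """Lowercase and collapse whitespace; None or blank becomes an empty string."""
    return " ".join((value or "").lower().split())


def hash_codes(record: dict, seed: bytes) -> dict:
    """Run at each data partner: one keyed hash per rule with complete fields."""
    codes = {}
    for rule, fields in RULES.items():
        values = [normalize(record.get(f)) for f in fields]
        if all(values):  # skip a rule if any of its fields is missing
            message = "|".join(values).encode("utf-8")
            codes[rule] = hmac.new(seed, message, hashlib.sha256).hexdigest()
    return codes


def broker_link(hashed_records) -> dict:
    """Run at the broker: records sharing any (rule, code) join one cluster."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}
    for record_id, codes in hashed_records:
        parent[record_id] = record_id
        for rule, code in codes.items():
            key = (rule, code)
            if key in first_seen:
                parent[find(record_id)] = find(first_seen[key])
            else:
                first_seen[key] = record_id
    return {record_id: find(record_id) for record_id in parent}


seed = b"network-wide-seed"  # hypothetical shared seed, never sent to the broker
a = {"first_name": "Ana", "last_name": "Lopez", "dob": "1980-01-02", "zip": "32601", "sex": "F"}
b = {"first_name": " ana ", "last_name": "LOPEZ", "dob": "1980-01-02", "zip": "", "sex": "F"}
c = {"first_name": "Ben", "last_name": "Ortiz", "dob": "1975-06-30", "zip": "32601", "sex": "M"}
clusters = broker_link([(rid, hash_codes(rec, seed)) for rid, rec in (("a", a), ("b", b), ("c", c))])
```

Because matching is exact on the hash, any residual typo after normalization breaks a rule, which is consistent with deterministic linkage favoring precision over recall.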


2019 ◽  
Vol 17 (3.5) ◽  
pp. HSR19-085
Author(s):  
Belqis El Ferjani ◽  
Sheenu Chandwani ◽  
Meita Hirschmann ◽  
Seydeh Dibaj ◽  
Emily Roarty ◽  
...  

Background: NSCLC is the leading cause of cancer-related mortality worldwide. Recently reported clinical trials have firmly established the role of PD-1 and PD-L1 inhibitors in the treatment of patients (pts) with metastatic NSCLC (mNSCLC). We have established the prospective, observational, real-world Advanced Non-Small Cell Lung Holistic Registry (ANCHoR) to understand how the advent of immunotherapy impacts treatment choices and clinical outcomes. Objectives: The aim of this analysis is to measure the impact of immunotherapy on the treatment choice for the first-line treatment of mNSCLC and to determine the link between PD-L1 expression and the treatment choices made in routine clinical practice at the MD Anderson Cancer Center (MDA). Methods: From May 1, 2017, to June 30, 2018, English-speaking pts with mNSCLC at MDA who provided written informed consent were enrolled in ANCHoR and longitudinally followed. PD-L1 testing rates and treatment decisions were captured and tabulated. The time of data cutoff for this study is June 30, 2018. Results: Of the 296 pts enrolled in the registry at the time of data cutoff, there were 49.7% males, 82.1% white, 45.9% ≥65 years old, 69.3% smokers, 83.1% with an initial stage IV diagnosis, 87.2% with nonsquamous histology, 36.1% with bone metastasis, 29.4% with brain metastasis, 43.2% with 0–1 performance status, and 21.6% with a known EGFR or ALK mutation. A total of 233 pts had been tested for PD-L1 (78.7%). Predominant reasons for not testing (63 pts) include lack of available tissue (26 pts) and the test not being requested by the physician (31 pts). As of June 30, 2018, 38.5% of patients received immunotherapy as first-line therapy either as a single agent (18.9%, 56 pts) or in combination with chemotherapy (19.6%, 58 pts). Only 35.8% of the patients received platinum doublet chemotherapy alone. Two pts received chemotherapy combined with an anti-angiogenesis agent (0.68%). Targeted therapy was utilized either as a single agent (20.6%) or in combination with immunotherapy (2.4%). Conclusion: Immunotherapy is now utilized as a single agent or in combination in more than one-third of patients with mNSCLC. These numbers are expected to increase as data from recently reported studies are incorporated into common clinical practice. Compared to historic experience, there has been a dramatic decline in the use of chemotherapy with an anti-angiogenesis agent.


Author(s):  
Thilina Ranbaduge ◽  
Dinusha Vatsalan ◽  
Sean Randall ◽  
Peter Christen

ABSTRACT Objective: The linking of multiple (three or more) health databases is challenging because of the increasing sizes of databases, the number of parties among which they are to be linked, and privacy concerns related to the use of personal data such as names, addresses, or dates of birth. This entails a need to develop advanced scalable techniques for linking multiple databases while preserving the privacy of the individuals they contain. In this study we empirically evaluate several state-of-the-art multi-party privacy-preserving record linkage (MP-PPRL) techniques with large real-world health databases from Australia. Approach: MP-PPRL is conducted such that no sensitive information is revealed about database records that can be used to infer knowledge about individuals or groups of individuals. Current state-of-the-art methods used in this evaluation use Bloom filters to encode personal identifying information. The empirical evaluation comprises different multi-party private blocking and matching techniques that are evaluated for different numbers of parties. Each database contains more than 700,000 records extracted from ten years of New South Wales (NSW) emergency presentation data. Each technique is evaluated with regard to scalability, quality and privacy. Scalability and quality are measured using the metrics of reduction ratio, pairs completeness, precision, recall, and F-measure. Privacy is measured using disclosure risk metrics that are based on the probability of suspicion, defined as the likelihood that a record in an encoded database matches one or more records in a publicly available database such as a telephone directory. Both MP-PPRL techniques that use a trusted linkage unit and those that do not are evaluated. Results: Experimental results showed that MP-PPRL methods are practical for linking large-scale real-world data. Private blocking techniques achieved significantly higher privacy than standard hashing-based techniques (maximum disclosure risk of 0.0003 versus 1), at a small cost to linkage quality and efficiency. Similarly, private matching techniques showed an acceptable reduction in linkage quality compared to standard non-private matching while providing high privacy protection. Conclusion: The adoption of privacy-preserving linkage methods has the ability to significantly reduce privacy risks associated with linking large health databases, and to enable the data linkage community to offer operational linkage services not previously possible. The evaluation results show that these state-of-the-art MP-PPRL techniques are scalable in terms of database sizes and number of parties, while providing significantly improved privacy with an associated trade-off in linkage quality compared to standard linkage techniques.
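
The scalability and quality metrics named in the abstract above have standard definitions, sketched here for the simpler two-database case (the multi-party setting in the study generalizes pairs to tuples of records). The function name, argument layout, and the toy numbers are assumptions for illustration only.

```python
def linkage_metrics(n_a, n_b, candidate_pairs, predicted_matches, true_matches):
    """Blocking and matching quality for two databases of sizes n_a and n_b.

    All pair arguments are sets of (id_a, id_b) tuples; true_matches must be non-empty.
    """
    total_pairs = n_a * n_b
    true_positives = len(predicted_matches & true_matches)
    precision = true_positives / len(predicted_matches) if predicted_matches else 0.0
    recall = true_positives / len(true_matches)
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        # share of the full comparison space removed by blocking
        "reduction_ratio": 1 - len(candidate_pairs) / total_pairs,
        # share of true matches that survive blocking
        "pairs_completeness": len(candidate_pairs & true_matches) / len(true_matches),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy example: 4 x 5 records, 4 true matches, blocking keeps 5 candidate pairs.
truth = {(0, 0), (1, 1), (2, 2), (3, 3)}
candidates = {(0, 0), (1, 1), (2, 2), (0, 4), (3, 1)}
predicted = {(0, 0), (1, 1), (3, 1)}
metrics = linkage_metrics(4, 5, candidates, predicted, truth)
```

Reduction ratio and pairs completeness describe the blocking step, while precision, recall, and F-measure describe the final match decisions, so a technique can score well on one group and poorly on the other.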


Author(s):  
James Boyd ◽  
Anna Ferrante ◽  
Adrian Brown ◽  
Sean Randall ◽  
James Semmens

ABSTRACT Objectives: While record linkage has become a strategic research priority within Australia and internationally, legal and administrative issues prevent data linkage in some situations due to privacy concerns. Even current best practices in record linkage carry some privacy risk as they require the release of personally identifying information to trusted third parties. Application of record linkage systems that do not require the release of personal information can overcome legal and privacy issues surrounding data integration. Current conceptual and experimental privacy-preserving record linkage (PPRL) models show promise in addressing data integration challenges but do not yet address all of the requirements for real-world operations. This paper aims to identify and address some of the challenges of operationalising PPRL frameworks. Approach: Traditional linkage processes involve comparing personally identifying information (name, address, date of birth) on pairs of records to determine whether the records belong to the same person. Designing appropriate linkage strategies is an important part of the process. These are typically based on the analysis of data attributes (metadata) such as data completeness, consistency, constancy and field discriminating power. Under a PPRL model, however, these factors cannot be discerned from the encrypted data, so an alternative approach is required. This paper explores methods for data profiling, blocking, weight/threshold estimation and error detection within a PPRL framework. Results: Probabilistic record linkage typically involves the estimation of weights and thresholds to optimise the linkage and ensure highly accurate results. The paper outlines the metadata requirements and automated methods necessary to collect data without compromising privacy. We present work undertaken to develop parameter estimation methods which can help optimise a linkage strategy without the release of personally identifiable information. These are required in all parts of the privacy preserving record linkage process (pre-processing, standardising activities, linkage, grouping and extracting). Conclusions: PPRL techniques that operate on encrypted data have the potential for large-scale record linkage, performing both accurately and efficiently under experimental conditions. Our research has advanced the current state of PPRL with a framework for secure record linkage that can be implemented to improve and expand linkage service delivery while protecting an individual’s privacy. However, more research is required to supplement this technique with additional elements to ensure the end-to-end method is practical and can be incorporated into real-world models.
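
The weights and thresholds mentioned in the abstract above are commonly computed in the Fellegi-Sunter style, sketched below. This is a textbook formulation, not the paper's estimation method: the m- and u-probabilities, the field names, and the two thresholds are made-up values, and in a PPRL setting the agreement pattern would come from comparisons of encoded values rather than clear text.

```python
import math

# Hypothetical parameters per field:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
M_U = {
    "surname": (0.95, 0.01),
    "date_of_birth": (0.98, 0.005),
    "postcode": (0.85, 0.05),
}


def field_weights(m: float, u: float) -> tuple:
    """Return (agreement weight, disagreement weight) as base-2 log ratios."""
    return math.log2(m / u), math.log2((1 - m) / (1 - u))


def match_score(agreement: dict) -> float:
    """Sum the field weights for one record pair's agreement pattern."""
    score = 0.0
    for field, agrees in agreement.items():
        agree_w, disagree_w = field_weights(*M_U[field])
        score += agree_w if agrees else disagree_w
    return score


def classify(score: float, lower: float = 0.0, upper: float = 10.0) -> str:
    """Two thresholds split pairs into non-match, possible match, and match."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "possible match"


all_agree = match_score({"surname": True, "date_of_birth": True, "postcode": True})
moved_house = match_score({"surname": True, "date_of_birth": True, "postcode": False})
nothing = match_score({"surname": False, "date_of_birth": False, "postcode": False})
```

A field with high discriminating power (small u) contributes a large agreement weight, which is why the abstract stresses that such metadata must be obtainable without exposing the underlying values.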


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e23521-e23521
Author(s):  
Margherita Nannini ◽  
Alessandro Rizzo ◽  
Maria Concetta Nigro ◽  
Bruno Vincenzi ◽  
Alessandro Mazzocca ◽  
...  

e23521 Background: Regorafenib (REG) is a multikinase inhibitor approved as third-line treatment in gastrointestinal stromal tumors (GIST). Despite its proven activity, REG can present a relevant adverse profile which often leads to treatment modifications and transient or permanent discontinuation; thus, in clinical practice physicians usually adopt various dosing and interval schedules to counteract REG-related adverse events (AEs) and avoid treatment interruption. The aim of this real-world study was to investigate the efficacy and safety of personalized schedules of REG in metastatic GIST patients, in comparison with the standard schedule (160 mg daily, 3-weeks-on, 1-week-off schedule). Methods: Institutional registries across seven Italian reference centers were retrospectively reviewed and data of interest retrieved to identify GIST patients who had received REG from February 2013 to January 2021. The primary endpoint was Progression-Free Survival (PFS), with Overall Survival (OS) also assessed as a secondary endpoint. The Kaplan-Meier method was used to estimate survival and the log-rank test to make comparisons. The impact of variables on survival was assessed through univariate and multivariate analysis. Results: A total of 152 GIST patients (82 male and 70 female) were included and split into two groups on the basis of the REG treatment plan received (standard vs personalized). Among the 103 patients for whom the treatment was personalized (38 since the beginning and 65 during the treatment course), the main strategies adopted were the following: 120 mg/day d1-21 e28 (n = 56; 54.4%); 80 mg/day d1-21 e28 (n = 22; 21.4%); 160 mg/day d1-5 e7 (n = 13; 12.6%). At a median follow-up of 36.5 months, median Overall Survival (OS) was 16.6 months (95% CI 14.1-21.8) and 20.5 months (95% CI 15.0-25.4) in the standard-dose and the personalized schedule groups, respectively (HR 0.75; 95% CI 0.49-1.22; p = 0.16). Median Progression-Free Survival (PFS) was 5.6 months (95% CI 3.3-not reached) and 9.7 months (95% CI 7.9-14.5) in the same groups (HR 0.51; 95% CI 0.34-0.75; p = 0.00052). Conclusions: Despite the expected limits of a retrospective analysis, we confirm that REG personalized schedules are commonly adopted in everyday clinical practice of high-volume GIST expert centers and correlate with significant improvement of therapeutic outcomes. Based on these results, REG treatment optimization in GIST patients may represent the best strategy to maximize long-term therapy, preserving tolerability and quality of life.
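
The Kaplan-Meier method cited in the abstract above can be sketched as a short product-limit estimator. The follow-up times below are invented and unrelated to the study data; the function names are illustrative. The `None` return of `median_survival` corresponds to what abstracts report as "not reached".

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per patient; events: 1 if the event occurred, 0 if censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Invented example: five patients, two censored (event = 0).
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
median = median_survival(curve)
short_follow_up = median_survival(kaplan_meier([4, 6, 9, 12], [1, 0, 0, 0]))
```

Censored patients reduce the number at risk without lowering the curve, which is why a cohort with short follow-up and few events can leave the median undefined.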


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 4428-4428
Author(s):  
Manuela Hoechstetter ◽  
Philipp Eissmann ◽  
Nike Hucke ◽  
Anna van Troostenburg ◽  
Heribert Ramroth ◽  
...  

Abstract Introduction: Idelalisib is a first-in-class PI3Kδ-inhibitor. In clinical studies idelalisib demonstrated significant efficacy in patients with CLL, including patients with TP53 aberrations (del17p and/or TP53m). On this basis national and international guidelines in Europe recommend idelalisib as one treatment option in this high-risk CLL patient population. However, it is unclear how efficacy reported in clinical studies translates into real-world experience. Considering the importance of such data, we initiated a real-world study soon after market authorization of idelalisib in the European Union, prospectively investigating efficacy and safety of idelalisib in routine clinical practice. Concomitant PJP prophylaxis is a risk minimization measure that was introduced after market authorization of idelalisib. Nevertheless, the impact on patient outcomes in routine clinical practice has not been studied in detail. We therefore also analyzed the impact of PJP prophylaxis on overall survival (OS) within this real-world cohort. Methods: A prospective, two-cohort, multicenter, non-interventional post-authorization safety study (PASS) reporting real-world safety and efficacy data on the use of idelalisib in Germany. Inclusion of patients was based on the physician's decision to initiate treatment with idelalisib in accordance with the European Summary of Product Characteristics. Descriptive statistics were used for data analysis. Results: This analysis included 84 CLL patients with a median age of 74 years. 88% of patients were older than 65 years, 70% were male and 86% presented with one or more co-morbidities. Binet stage A, B, C was reported in 25%, 33% and 37% of patients, respectively. The median time from diagnosis to start of idelalisib therapy was 89.5 months and patients received a median number of two prior lines of therapy, including treatment with the BTK inhibitor ibrutinib in 11 patients (13%). With a median observation time of 11.5 months the median overall survival (OS) for the entire CLL patient population was not reached. Our CLL cohort included 24 patients (29%) who did not receive PJP prophylaxis for idelalisib therapy. We therefore compared OS in patients with and without concomitant PJP prophylaxis. In patients receiving PJP prophylaxis survival rates were higher in the first 6-12 months of therapy, with 6-month and 12-month survival rates in patients with vs without PJP prophylaxis of 98% vs 76% and 84% vs 76%, respectively (Figure 1). 19% of CLL patients (n=16) had documented TP53 aberrations, including five patients who received idelalisib as first-line treatment. In patients with TP53 aberrations the overall response rate (ORR) was 77% compared to 67% in patients without TP53 aberrations. Importantly, the 12-month survival rates for patients with and without TP53 aberrations were similar, at 81% and 83%, respectively. Median OS was not reached for either patient population (Figure 2). Conclusion: This prospective real-world study started collecting data on the efficacy and safety of idelalisib in routine clinical practice soon after market authorization of idelalisib in Europe. Results demonstrate similar efficacy of idelalisib irrespective of the patients' TP53 status, confirming the efficacy previously reported in pivotal clinical studies. Additionally, our results provide evidence that PJP prophylaxis is an effective risk minimization measure impacting on survival. Disclosures: Hoechstetter: Hexal: Other: Travel Grants; Abbvie: Other: Travel Grants; Gilead Sciences: Consultancy, Other: Travel Grants. Eissmann: Gilead Sciences: Employment. Hucke: Gilead Sciences: Employment. van Troostenburg: Gilead Sciences: Employment. Ramroth: Gilead Sciences: Employment. Knauf: Celgene: Consultancy, Honoraria; Gilead Sciences: Consultancy; Janssen: Consultancy; Mundipharma: Consultancy; Roche: Consultancy; Amgen: Consultancy, Honoraria; AbbVie: Consultancy.


2007 ◽  
Vol 46 (04) ◽  
pp. 420-424 ◽  
Author(s):  
W. Oberaigner

Summary Objective: It was the objective of this study to assess the impact of applying various record linkage methods to one of the most important outcome measures in oncological epidemiology, namely survival rates. Methods: To assess the life status of patients, incidence data published by the Cancer Registry of Tyrol were analyzed with three routinely used methods of record linkage for incidence and mortality data. Of these methods, two were deterministic and the third a probabilistic method developed by the Cancer Registry of Tyrol. We studied the impact of record linkage methods on a simple measure (mortality rate) and a more complex measure (relative survival rate). The analysis was based on the published incidence data for Tyrol for the years 1992 to 1996. Results of deterministic record linkage methods were simulated. Results: The error rates for simple mortality rate and relative survival rate are considerable. For the first deterministic record linkage method, relative differences in mortality rate range from 11.9% to 14.8% (men) and 24.5% to 28.2% (women) and relative differences in relative five-year survival from 11.4% to 16.3% (men) and from 19.3% to 26.4% (women). For the second deterministic record linkage method, relative differences in mortality rate range from 4.8% to 5.9% (men) and from 4.9% to 7.4% (women), while relative differences in relative five-year survival range from 5.1% to 7.0% (men) and from 4.4% to 6.1% (women). Conclusions: Our study shows that in order to calculate valid mortality and survival rates a probabilistic method of record linkage must be applied.

