Secure Linking of Data from Population-Based Cancer Registries with Healthcare Data to Evaluate Screening Programs

Abstract. Background: The evaluation of population-based screening programs, such as the German Mammography Screening Program (MSP), requires collecting and linking data from population-based cancer registries and other sources of the healthcare system on a case-specific level. To link such sensitive data, we developed a method that complies with German data protection regulations and does not require written individual consent. Methods: Our method combines probabilistic record linkage on encrypted identifying data with ‘blinded anonymisation’. It ensures that all data are either encrypted or have a defined and measurable degree of anonymity. The data sources use software to transform plain-text identifying data into a set of irreversibly encrypted person cryptograms, while the evaluation attributes are aggregated in multiple stages and reversibly encrypted. A pseudonymisation service encrypts the person cryptograms into record assignment numbers, which a downstream data-collecting centre uses to perform the probabilistic record linkage. Blinded anonymisation solves the problem of quasi-identifiers within the evaluation data: it allows a specific set of the encrypted aggregations to be selected to produce a data export with guaranteed k-anonymity, without any plain-text information. These data are finally transferred to an evaluation centre, where they are decrypted and analysed. Our approach allows several such generalisations to be created, each with a different resulting suppression rate, so that information depth can be balanced dynamically against privacy protection; it also highlights how this trade-off affects data analysability. Results: German data protection authorities approved our concept for the evaluation of the impact of the German MSP on breast cancer mortality. We implemented a prototype, tested it with 1.5 million simulated records containing realistically distributed identifying data, and calculated different generalisations and their respective suppression rates. We also discuss limitations for large data sets in the cancer registry domain, as well as approaches for further improvement, such as l-diversity and reducing the amount of manual post-processing. Conclusion: Our approach enables secure linking of data from population-based cancer registries and other sources of the healthcare system. Despite some limitations, it enables evaluation of the German MSP and can be generalised to other projects.
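The trade-off between generalisation depth and suppression rate can be illustrated on plain-text toy data. This is a minimal sketch, not the authors' blinded anonymisation (which works on encrypted aggregations): it assumes two quasi-identifiers (`birth_year`, `postcode`) and two hypothetical generalisations, and it suppresses every record whose equivalence class has fewer than k members.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, object]
# A generalisation maps (quasi-identifier name, value) to a coarser value.
Generaliser = Callable[[str, object], object]


def k_anonymous_export(
    records: List[Record],
    quasi_identifiers: Sequence[str],
    generalise: Generaliser,
    k: int,
) -> Tuple[List[Record], float]:
    """Generalise quasi-identifiers, then suppress records in classes smaller than k."""
    generalised = [
        {**r, **{qi: generalise(qi, r[qi]) for qi in quasi_identifiers}}
        for r in records
    ]
    # Equivalence-class key: the tuple of generalised quasi-identifier values.
    keys = [tuple(g[qi] for qi in quasi_identifiers) for g in generalised]
    class_sizes = Counter(keys)
    kept = [g for g, key in zip(generalised, keys) if class_sizes[key] >= k]
    suppression_rate = 1.0 - len(kept) / len(records) if records else 0.0
    return kept, suppression_rate


# Two hypothetical generalisations of the same quasi-identifiers.
def decade_only(qi: str, value: object) -> object:
    # Coarsen birth year to its decade; keep the full postcode.
    return value // 10 * 10 if qi == "birth_year" else value


def decade_and_region(qi: str, value: object) -> object:
    # Coarsen birth year to its decade and postcode to its first two digits.
    return value // 10 * 10 if qi == "birth_year" else value[:2]


# Made-up records: quasi-identifiers plus one evaluation attribute.
records = [
    {"birth_year": 1951, "postcode": "10115", "stage": 1},
    {"birth_year": 1953, "postcode": "10115", "stage": 2},
    {"birth_year": 1958, "postcode": "10117", "stage": 1},
    {"birth_year": 1962, "postcode": "20095", "stage": 3},
]
qis = ["birth_year", "postcode"]
for name, gen in [("decade_only", decade_only), ("decade_and_region", decade_and_region)]:
    kept, rate = k_anonymous_export(records, qis, gen, k=2)
    print(f"{name}: kept {len(kept)} of {len(records)}, suppression rate {rate:.0%}")
```

The coarser generalisation merges more records into shared classes, so fewer are suppressed, at the cost of less detailed quasi-identifiers in the export.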


Record Linkage in the Cancer Registry of Tyrol, Austria

Methods of Information in Medicine, Vol 44 (05), pp. 626-630, 2005
DOI: 10.1055/s-0038-1634018
Cited by ~19

Author(s): W. Stühlinger, W. Oberaigner

Keyword(s): Cancer Registry, Record Linkage, Cancer Registries, Patient Data, Medical System, Data Sources, Patient Registration, Probabilistic Record Linkage, Linkage Method, Sufficient Precision

Summary. Objective: Record linkage of patient data from various data sources, and record linkage to check the uniqueness of patient registrations, are common tasks for every cancer registry. In Austria, no unique person identifier is used in the medical system. Hence, the goal of this work was to develop an efficient means of record linkage for use in Austrian cancer registries. Methods: We adapted the method of probabilistic record linkage to the situation of cancer registries in Austria. In addition to the customary components of this method, we also took into account typing errors that commonly occur in names and dates of birth. The method was implemented in a program written in Delphi™ with interfaces optimised for cancer registries. Results: Applying our record linkage method to 130,509 linkages yielded 105,272 (80.7%) identical pairs. For these identical pairs, 88.9% of decisions were made automatically and 11.1% semi-automatically. Of the automatically decided pairs, 6.9% did not agree simultaneously on last name, first name and date of birth. Of the semi-automatically decided pairs, 48.4% did not have an identical last name, 25.6% did not have an identical date of birth, and 83.1% did not agree simultaneously on last name, first name and date of birth. Conclusions: The method implemented in our cancer registry solves all record linkage problems in Austria with sufficient precision.
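The abstract does not give the weights or comparison rules of the Tyrol program. As a minimal sketch, assuming the standard Fellegi-Sunter framework with made-up m/u probabilities, the following shows how field agreements become a match weight, how typo-tolerant comparisons for names and dates of birth can be built, and how two thresholds split pairs into automatic decisions and semi-automatic (clerical review) decisions. The probabilities, thresholds, similarity cut-off and date rules are all hypothetical.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, Tuple

# Hypothetical m/u probabilities per field:
# m = P(field agrees | same person), u = P(field agrees | different persons).
M_U: Dict[str, Tuple[float, float]] = {
    "last_name": (0.95, 0.01),
    "first_name": (0.95, 0.05),
    "birth_date": (0.97, 0.001),
}


def names_agree(a: str, b: str, min_ratio: float = 0.8) -> bool:
    """Count names as agreeing if they are similar enough to absorb a typing error."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= min_ratio


def dates_agree(a: str, b: str) -> bool:
    """Dates as 'YYYYMMDD': equal, one mistyped digit, or day and month swapped."""
    if a == b:
        return True
    if len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1:
        return True
    return a[:4] == b[:4] and a[4:6] == b[6:8] and a[6:8] == b[4:6]


def match_weight(r1: Dict[str, str], r2: Dict[str, str]) -> float:
    """Sum of log2 agreement (m/u) or disagreement ((1-m)/(1-u)) weights."""
    agree = {
        "last_name": names_agree(r1["last_name"], r2["last_name"]),
        "first_name": names_agree(r1["first_name"], r2["first_name"]),
        "birth_date": dates_agree(r1["birth_date"], r2["birth_date"]),
    }
    total = 0.0
    for field, (m, u) in M_U.items():
        total += math.log2(m / u) if agree[field] else math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float, upper: float = 12.0, lower: float = 0.0) -> str:
    """Hypothetical thresholds: automatic decision outside, clerical review between."""
    if weight >= upper:
        return "match (automatic)"
    if weight < lower:
        return "non-match (automatic)"
    return "clerical review (semi-automatic)"


a = {"last_name": "Müller", "first_name": "Anna", "birth_date": "19520314"}
b = {"last_name": "Muller", "first_name": "Anna", "birth_date": "19520315"}  # two typos
c = {"last_name": "Müller", "first_name": "Anna", "birth_date": "19600101"}  # date differs
d = {"last_name": "Schmidt", "first_name": "Peter", "birth_date": "19781130"}
for other in (b, c, d):
    w = match_weight(a, other)
    print(f"{w:7.2f}  {classify(w)}")
```

Pairs with matching names but a conflicting date of birth fall between the thresholds and go to review, which mirrors the semi-automatic share described in the abstract.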


P1-493 Probabilistic record linkage: application in the population-based cancer registry of São Paulo (PBCR-SP), Brazil

Journal of Epidemiology & Community Health, Vol 65 (Suppl 1), p. A203, 2011
DOI: 10.1136/jech.2011.142976g.81

Author(s): S. V. Peres, M. R. D. O. Latorre, F. A. S. Michels, C. Terra

Keyword(s): Cancer Registry, Record Linkage, Population Based, Sao Paulo, São Paulo, Probabilistic Record Linkage


Privacy-preserving probabilistic record linkage to assess cancer outcomes in people living with HIV in South Africa

Preprint, 2021
DOI: 10.31730/osf.io/wzxbv

Author(s): Julia Bohlius, Lina Bartels, Frédérique Chammartin, Victor Olago, Adrian Spoerri, ...

Keyword(s): South Africa, Record Linkage, Cancer Control, Population Based, Privacy Preserving, People Living With Hiv, Patient Privacy, Cancer Outcomes, Probabilistic Record Linkage, Living With Hiv

Background: Privacy-preserving probabilistic record linkage (PPPRL) methods were developed and applied in high-income countries to link records within and between organizations under strict privacy protections. PPPRL has not yet been used in African settings. Methods: We used HIV-related laboratory records from the National Health Laboratory Services (NHLS) in South Africa to construct a cohort of HIV-positive patients and to link them to the National Cancer Registry (NCR) using PPPRL. The study was restricted to Gauteng province from 2004 to 2014. We used records with national IDs as the gold standard to determine the precision, recall, and F-measure of the linkages. We included in the cohort all patients with ≥2 HIV-related laboratory records and assessed the number of cancers diagnosed in people living with HIV (PLWH). Results: We included 11,480,118 HIV-related laboratory records and 664,869 cancer records in the linkage. We included 1,173,908 persons in the HIV cohort; 66.6% were female, and the median age at the first HIV-related laboratory test was 33.9 years (IQR 27.4-41.3). Of the patients in the cohort, 26,348 were diagnosed with at least one cancer: 8,329 of these cancers were diagnosed before or on the date of the patient’s first HIV-related record, and 18,019 after it. For all linkages, precision, recall, and F-measure were high. Conclusion: Our study showed that it is feasible to use PPPRL in an African setting to link routinely collected health records from different data sources and to create a longitudinal HIV cohort with cancer outcomes while strictly protecting patient privacy. This work served as the foundation for a nationwide population-based cohort including all South African provinces, which will be used to inform cancer control programs.
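The abstract evaluates linkage quality against records that carry a national ID. As a minimal sketch of that evaluation, assuming that only predicted pairs in which both records have a national ID can be judged, precision, recall and F-measure can be computed as follows. The record IDs and national IDs are made up, and this is not the study's PPPRL code.

```python
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]  # (laboratory record ID, registry record ID)


def gold_standard(lab: Dict[str, Optional[str]], ncr: Dict[str, Optional[str]]) -> Set[Pair]:
    """True links: a laboratory record and a registry record sharing a national ID."""
    ncr_by_nid: Dict[str, List[str]] = {}
    for rid, nid in ncr.items():
        if nid:
            ncr_by_nid.setdefault(nid, []).append(rid)
    return {(lid, rid) for lid, nid in lab.items() if nid for rid in ncr_by_nid.get(nid, [])}


def linkage_quality(
    predicted: Set[Pair], lab: Dict[str, Optional[str]], ncr: Dict[str, Optional[str]]
) -> Tuple[float, float, float]:
    """Precision, recall and F-measure, judged only on records that carry a national ID."""
    gold = gold_standard(lab, ncr)
    judged = {(l, r) for l, r in predicted if lab.get(l) and ncr.get(r)}
    tp = len(judged & gold)
    precision = tp / len(judged) if judged else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Made-up data: L4 has no national ID, so its predicted link cannot be judged.
lab = {"L1": "A", "L2": "B", "L3": "C", "L4": None, "L5": "E"}
ncr = {"N1": "A", "N2": "B", "N3": "C", "N4": "D", "N5": "E"}
predicted = {("L1", "N1"), ("L2", "N2"), ("L3", "N4"), ("L4", "N4")}
p, r, f = linkage_quality(predicted, lab, ncr)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Here one judged link is wrong (L3 to N4) and one true link (L5 to N5) is missed, so precision and recall fall for different reasons.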


Assessing the accuracy of probabilistic record linkage of social and health databases in the 100 million Brazilian cohort

International Journal for Population Data Science, Vol 1 (1), 2017
DOI: 10.23889/ijpds.v1i1.276
Cited by ~1

Author(s): Marcos Barreto, André Alves, Samila Sena, Rosemeire Fiaccone, Leila Amorim, ...

Keyword(s): Infant Mortality, Record Linkage, Similarity Index, Conditional Cash Transfer, Small Sample, Matched Pairs, Probabilistic Record Linkage, First Case, The Impact, Gold Standards

ABSTRACT. Background and aims: The Brazilian government has several social protection programmes that select their beneficiaries based on socioeconomic information kept in the Cadastro Único (CADU) database. The CADU will be used to build a population-based cohort of approximately 100 million individuals. Among the social programmes is Bolsa Família (PBF), a conditional cash transfer programme that provides extra income to poor families. These two databases must be deterministically linked to identify individuals who received payments from PBF between 2004 and 2012. The resulting cohort will be used in epidemiological studies assessing the impact of PBF on the occurrence and severity of several diseases and health problems (tuberculosis, leprosy, HIV, child health, etc.). This cohort must be probabilistically linked with databases from the Unified Health System (SUS), such as hospitalization, notifiable diseases, mortality, and live births, in order to produce data marts (domain-specific data) for the proposed studies. Our goal is to validate probabilistic record linkage methods to support this cohort setup. Approach: This paper emphasizes the accuracy assessment of our methods based on the linkage of SIH (hospitalization), SINAN (notifications), and SIM (mortality) records to the 2011 extraction of CADU. We focused on hospitalization and notification of tuberculosis, as well as all-cause infant mortality in children under 4, for a small sample of 30,029 records (CADU). Due to the absence of gold standards, we used two approaches to assess accuracy: a clerical review and an automatic (tool-based) search. In the first approach, we used different similarity-index cut-off points to calculate sensitivity and specificity, and a ROC curve to separate matched and non-matched pairs. The second approach retrieves from CADU all matched and non-matched pairs for a given individual, serving as a gold standard for validation. Results: We retrieved 22 linked pairs, of which 18 are true positives for infant mortality (SIM database). From SINAN, our results were 434 linked pairs with 166 true positives, and from SIH, 121 linked pairs with 34 true positives. The sensitivity of the manual review for SIM (child mortality) ranges from 44% (specificity 100%) to 95% (specificity 94%), for similarity indices between 0.80 and 0.97. For the automatic search, we obtained a sensitivity of 69.2% and a specificity of 91.8%. Conclusion: Our results show the need to continuously improve our linkage routines and to evaluate their accuracy consistently in the absence of adequate gold standards.
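The clerical-review approach computes sensitivity and specificity at several similarity-index cut-off points and traces a ROC curve from them. As a minimal sketch, assuming a pair is linked when its similarity index is at or above the cut-off and the reviewer's verdict is treated as truth, with made-up reviewed pairs (not the authors' data or tool):

```python
from typing import Iterable, List, Tuple

# Each clerically reviewed pair: (similarity index, reviewer judged it a true match).
ReviewedPair = Tuple[float, bool]


def sensitivity_specificity(pairs: List[ReviewedPair], cutoff: float) -> Tuple[float, float]:
    """Link pairs with similarity >= cutoff and compare the result with the review."""
    tp = sum(1 for s, is_match in pairs if s >= cutoff and is_match)
    fn = sum(1 for s, is_match in pairs if s < cutoff and is_match)
    fp = sum(1 for s, is_match in pairs if s >= cutoff and not is_match)
    tn = sum(1 for s, is_match in pairs if s < cutoff and not is_match)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def roc_points(pairs: List[ReviewedPair], cutoffs: Iterable[float]) -> List[Tuple[float, float]]:
    """(1 - specificity, sensitivity) for each cut-off, i.e. points on a ROC curve."""
    points = []
    for c in cutoffs:
        sens, spec = sensitivity_specificity(pairs, c)
        points.append((1 - spec, sens))
    return points


# Made-up reviewed pairs.
pairs = [(0.98, True), (0.95, True), (0.90, True), (0.85, False),
         (0.82, True), (0.78, False), (0.70, False), (0.60, False)]
for cutoff in (0.97, 0.90, 0.80):
    sens, spec = sensitivity_specificity(pairs, cutoff)
    print(f"cut-off {cutoff:.2f}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Lowering the cut-off raises sensitivity and lowers specificity, which is the same direction of trade-off the abstract reports for the manual review.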


Exploring the impact of cancer registry completeness on international cancer survival differences: a simulation study

British Journal of Cancer, 2020
DOI: 10.1038/s41416-020-01196-7

Author(s): Therese M.-L. Andersson, Mark J. Rutherford, Tor Åge Myklebust, Bjørn Møller, Isabelle Soerjomataram, ...

Keyword(s): Cancer Survival, Cancer Registries, Population Based, Cancer Registration, Death Certificates, International Partnership, Impact On Survival, Survival Differences, The Impact

Abstract. Background: Data from population-based cancer registries are often used to compare cancer survival between countries or regions. The ICBP SURVMARK-2 study is an international partnership aiming to quantify and explore the reasons behind survival differences across high-income countries. However, the magnitude and relevance of differences in cancer survival between countries have been questioned, as it is argued that observed survival variations may be explained, at least in part, by differences in cancer registration practice, completeness, and the availability and quality of the respective data sources. Methods: As part of the ICBP SURVMARK-2 study, we used a simulation approach to better understand how differences in completeness, the characteristics of missed cases, and the inclusion of cases found from death certificates can affect cancer survival estimates. Results: Bias in 1- and 5-year net survival estimates for 216 simulated scenarios is presented. Of the investigated factors, the proportion of cases not registered through sources other than death certificates had the largest impact on survival estimates. Conclusion: Our results show that, even in our most extreme scenarios, differences in registration practice between participating countries could explain only part of the largest observed differences in cancer survival.


Bridging the implementation and information gap on cancer prevention and survivorship in Europe: results from the iPAAC Joint Action

European Journal of Public Health, Vol 30 (Supplement_5), 2020
DOI: 10.1093/eurpub/ckaa165.1431

Author(s): R De Angelis, S Lipponen

Keyword(s): Public Health, Cancer Survivors, Cancer Prevention, Joint Action, Cancer Registries, Population Based, Joint Analysis, Cancer Information, Science And Policy, The Impact

Abstract. Background: About 40% of cancers are preventable, and about 50% of those are due to tobacco. Cancer prevention and early detection, guided by effective strategies from the European Code Against Cancer, can remarkably reduce cancer burden and inequalities. Better use of registry data can help bridge the existing information gaps on cancer survivors, a dramatically growing population that challenges the sustainability of public health systems in Europe. Methods: Policy implementation through collaborative efforts based on cancer registry data, comprehensive policies and innovations; promotion of standards and methods to facilitate the systematic delivery of comparable indicators on cancer survivors by country in Europe. Results: In cancer prevention, known effective measures require comprehensive European-wide action. Population-based cancer screening programmes need continuous quality assurance and follow-up. Prevalence estimates for short- and long-term survivors derived from a joint European dataset (EUROCARE-6, 29 countries) show that differences in survivorship are wide, consistent with demography, incidence and survival patterns. Breast, colorectal and prostate cancers are the most frequent among all cancer survivors. Conclusions: Collaboration across fields of science and policy sectors is needed to boost cancer prevention. Cancer survivors are a growing, heterogeneous population that should be monitored in public health to support Health Technology Assessment and survivors' care planning. Key messages: Priority actions are developed within the iPAAC Joint Action to encourage effective policies and implementation. Joint analysis of standardised European datasets strengthens the impact of cancer registry information. Indicators on cancer prevalence should be systematically integrated into the European Cancer Information System (ECIS).


The impact of breast density notification on rescreening rates within a population-based mammographic screening program

Breast Cancer Research, Vol 24 (1), 2022
DOI: 10.1186/s13058-021-01499-4

Author(s): Sarah Pirikahu, Helen Lund, Gemma Cadby, Elizabeth Wylie, Jennifer Stone

Keyword(s): Breast Cancer, Breast Cancer Risk, Cancer Risk, Breast Density, Screening Program, Population Based, Mammographic Screening, Younger Women, Screening Programs, The Impact

Abstract. Background: High participation in mammographic screening is essential for its effectiveness in detecting breast cancers early and thereby improving breast cancer outcomes. Breast density is a strong predictor of breast cancer risk and significantly reduces the sensitivity of mammography to detect the disease. There are increasing mandates for routine breast density notification within mammographic screening programs. It is unknown whether breast density notification affects the likelihood of women returning to screening when next due (i.e. rescreening rates). This study investigates the association between breast density notification and rescreening rates using individual-level data from BreastScreen Western Australia (WA), a population-based mammographic screening program. Methods: We examined 981,705 screening events from 311,656 women aged 40+ who attended BreastScreen WA between 2008 and 2017. Mixed-effects logistic regression was used to investigate the association between rescreening and breast density notification status. Results: Results were stratified by age (younger, targeted, older) and screening round (first, second, third+). Women in the targeted age range screening for the first time were more likely to return to screening if notified as having dense breasts (unadjusted percentage, notified vs. not notified: 57.8% vs. 56.1%; adjusted P = 0.016). Younger women were less likely to rescreen if notified, regardless of screening round (all P < 0.001). There was no association between notification and rescreening in older women (all P > 0.72). Conclusions: Breast density notification does not deter women in the targeted age range from rescreening but could potentially deter younger women from rescreening. These results suggest that all breast density notification messaging should include information on the importance of regular mammographic screening to manage breast cancer risk, particularly for younger women. These results will directly inform BreastScreen programs in Australia as well as population-based screening providers outside Australia that notify women about breast density or are considering implementing breast density notification.


Text messaging as a tool to improve cancer screening programs (M-TICS Study): A randomized controlled trial protocol

PLoS ONE, Vol 16 (1), p. e0245806, 2021
DOI: 10.1371/journal.pone.0245806

Author(s): Nuria Vives, Albert Farre, Gemma Ibáñez-Sanz, Carmen Vidal, Gemma Binefa, ...

Keyword(s): Randomized Controlled Trial, Cancer Screening, Screening Program, Controlled Trial, Population Based, Routine Practice, Screening Programs, Crc Screening, Randomized Controlled, The Impact

Background: Short message service (SMS)-based interventions are widely used in healthcare and have shown promising results for improving cancer screening programs. However, more research is still needed before SMS can be implemented in the screening process. We present a study protocol to assess the health and economic impact of three targeted SMS-based interventions in population-based cancer screening programs. Methods/Design: The M-TICS study is a randomized controlled trial with a formal process evaluation. Participants aged 50–69 years identified as eligible from the colorectal cancer (CRC) and breast cancer (BC) screening programs of the Catalan Institute of Oncology (Catalonia, Spain) will be randomly assigned to receive the standard invitation procedure (control group) or an SMS-based intervention to promote participation. Two interventions will be conducted in the CRC screening program: 1) Screening invitation reminder: those who do not participate in CRC screening within 6 weeks of the invitation will receive a reminder (SMS or letter); 2) Reminder to complete and return the fecal immunochemical test (FIT) kit: SMS reminder versus no intervention for individuals who have picked up a FIT kit at the pharmacy but have not returned it after 14 days. The third intervention will be performed in the BC screening program: women who had been screened previously will receive either an SMS invitation or a letter invitation to participate in screening. The primary objective is to assess the impact of each intervention on participation. The secondary objectives are to analyze the cost-effectiveness of the interventions and to assess participants’ perceptions. Expected results: The results from this randomized controlled trial will provide important empirical evidence for the use of mobile phone technology as a tool for improving population-based cancer screening programs. These results may influence the cancer screening invitation procedure in future routine practice. Trial registration: NCT04343950 (04/09/2020), clinicaltrials.gov.
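The two CRC reminder triggers are simple date rules. As a minimal sketch, assuming each invitee record stores the relevant dates (the `CrcInvitee` fields below are hypothetical, not the study's data model) and that a reminder is due on or after the deadline, they can be written as follows. The rules are evaluated independently; how the protocol handles people who qualify for both is not stated in the abstract, and randomization to SMS or letter is not shown.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class CrcInvitee:
    invited_on: date
    participated_on: Optional[date] = None   # date CRC screening participation was recorded
    kit_picked_up_on: Optional[date] = None  # FIT kit collected at the pharmacy
    kit_returned_on: Optional[date] = None   # FIT kit returned for analysis


def due_invitation_reminder(p: CrcInvitee, today: date) -> bool:
    """Intervention 1: no participation recorded within 6 weeks of the invitation."""
    return p.participated_on is None and today >= p.invited_on + timedelta(weeks=6)


def due_kit_reminder(p: CrcInvitee, today: date) -> bool:
    """Intervention 2: FIT kit picked up but still not returned 14 days later."""
    if p.kit_picked_up_on is None or p.kit_returned_on is not None:
        return False
    return today >= p.kit_picked_up_on + timedelta(days=14)


# Hypothetical invitee: invited on 4 January, kit picked up on 10 January.
invitee = CrcInvitee(invited_on=date(2021, 1, 4), kit_picked_up_on=date(2021, 1, 10))
print(due_kit_reminder(invitee, date(2021, 1, 24)))         # 14 days after pick-up
print(due_invitation_reminder(invitee, date(2021, 2, 15)))  # 6 weeks after invitation
```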


Computerized cancer registries solutions - a systematic review (Preprint)

Preprint, 2020
DOI: 10.2196/preprints.18254

Author(s): Cátia Santos-Pereira, Alexandre B. Augusto, Ricardo Cruz-Correia, Manuel E. Correia

Keyword(s): Systematic Review, Quality Control, Data Protection, Data Privacy, Cancer Registries, Population Based, Data Quality Control, Inclusion Criteria, Privacy And Security, General Data Protection Regulation

BACKGROUND: A cancer registry (CR) is typically a standardized tool for producing population-based data on cancer incidence and survival. Cancer registries aim to retrieve and store information on all cancer cases occurring in a defined population. The main sources of data on cancer cases usually include treatment and diagnostic facilities (oncology centres or hospital departments, pathology laboratories, imaging facilities, etc.) and the official territorial death registry. OBJECTIVE: The aim of this study is to assess current solutions for cancer registries and to determine and understand their main requirements. METHODS: To achieve this goal, we conducted a systematic review based on comprehensive qualitative research, following the PRISMA statement framework. Four databases were searched (Medline, ISI Web of Knowledge, IEEE Xplore and Scopus) with the query “cancer registries” [All Fields] AND computerized [All Fields]. The inclusion criteria covered five key concepts: data collection, standards, quality control, data protection and data exploration. Three medical informatics professionals participated in the final review. RESULTS: Of a total of 54 articles, 10 met the inclusion criteria and were included in the analysis. In general, cancer registry systems had problems related to the lack of fully automatic integration of data from different sources, difficulty automating data quality control routines, and a lack of harmonization of standards (both communication and terminology standards). Many tasks are still performed manually, which requires extra effort from staff, substantially delays the production of survival and incidence reports, and introduces more data inconsistencies and errors. CONCLUSIONS: It is essential to automate data linkage and integration between different healthcare institutions. However, it is important to balance the preservation of data integrity against patients’ privacy, while enabling meaningful, state-of-the-art continuous research to improve people’s health and the general quality of care. Healthcare institutions must comply with the much more stringent data privacy protection requirements imposed by the GDPR (General Data Protection Regulation), which result in new, rigorous privacy and security compliance obligations with which all CRs across Europe must be ready to comply.


Long-term non-cancer mortality among 39,657 one-year testicular cancer survivors (TCSs)

Journal of Clinical Oncology, Vol 24 (18_suppl), p. 4508, 2006
DOI: 10.1200/jco.2006.24.18_suppl.4508
Cited by ~3

Author(s): S. D. Fosså, J. Chen, G. M. Dores, K. A. McGlynn, S. J. Schonfeld, ...

Keyword(s): General Population, Cancer Registries, Population Based, Duodenal Ulcers, Digestive Diseases, One Year, Few Data, The Impact, Circulatory Diseases

Background: Multiple reports address the incidence of second cancer (SC) and long-term morbidity in TCSs, yet few data are available on the impact of non-malignant late sequelae on mortality. Methods: 39,657 one-year TCSs were identified in 14 population-based cancer registries in North America and Europe, with 17,856, 13,084 and 6,298 men followed for 10, 20 and 30 years, respectively. Standardized mortality ratios (SMRs) comparing TCSs with the general population were calculated for deaths due to all non-cancer causes (n = 2,942) and for specific sites. In addition, absolute mortality due to TC, non-TC SC and all non-cancer disorders was estimated. Results: The SMR for all non-malignant diseases combined was 0.99 (95% CI: 0.95–1.02), with a significant reduction in deaths due to circulatory diseases (SMR: 0.92, n = 1,117). However, after initial treatment with chemotherapy and radiotherapy, the SMR for circulatory diseases was significantly elevated (SMR: 1.76), with a non-significant 29% excess after chemotherapy alone. Mortality due to digestive diseases was significantly increased (SMR: 1.32, n = 222), including gastric and duodenal ulcers (SMR = 1.52; excess deaths were observed between 10 and 25 years after initial radiotherapy). For the first 20 years after TC diagnosis, deaths due to infection were significantly elevated (SMR: 1.52, n = 211). Absolute mortality due to non-cancer disorders always exceeded that due to SC and, for a TCS diagnosed at age 35, reached 15% after 30 years, compared with about 11% for SC. Conclusions: Compared with the general population, the overall risk of mortality due to all non-cancer causes combined does not appear to be increased in TCSs. However, they experience excess non-cancer deaths due to infection and digestive diseases, but not circulatory diseases. Additional analytic studies with detailed data on treatment and comorbidities are required to further evaluate associations with specific causes of death.
No significant financial relationships to disclose.
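The abstract reports SMRs with 95% confidence intervals but does not state how the intervals were computed. As a sketch, assuming Poisson-distributed observed deaths and using Byar's approximation to the exact Poisson interval (one common choice, not necessarily the study's), an SMR and its interval follow from observed and expected counts. The counts below are hypothetical, and deriving the expected count from reference rates and person-years is not shown.

```python
import math
from typing import Tuple


def smr_with_byar_ci(observed: int, expected: float, z: float = 1.96) -> Tuple[float, float, float]:
    """SMR = O/E with Byar's approximation to the exact Poisson confidence interval."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    smr = observed / expected
    if observed == 0:
        lower = 0.0  # the lower bound is zero when no deaths are observed
    else:
        lower = observed * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3 / expected
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3 / expected
    return smr, lower, upper


# Hypothetical counts, not taken from the study: 50 observed vs 40 expected deaths.
smr, lo, hi = smr_with_byar_ci(50, 40.0)
print(f"SMR {smr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An interval that excludes 1 corresponds to the "significantly elevated" or "significantly reduced" SMRs in the abstract; one that includes 1, as with the overall non-cancer SMR of 0.99, does not.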
