Record Linkage in the Cancer Registry of Tyrol, Austria

2005, Vol 44 (05), pp. 626-630
Author(s): W. Stühlinger, W. Oberaigner

Summary Objective: Record linkage of patient data originating from various data sources, and record linkage for checking the uniqueness of patient registrations, are common tasks for every cancer registry. In Austria, no unique person identifier is in use in the medical system. Hence, the goal of this work was to develop an efficient means of record linkage for use in cancer registries in Austria. Methods: We adapted the method of probabilistic record linkage to the situation of cancer registries in Austria. In addition to the customary components of this method, we also took into account typing errors that commonly occur in names and dates of birth. The method was implemented in a program written in DELPHI™ with interfaces optimised for cancer registries. Results: Applying our record linkage method to 130,509 linkages resulted in 105,272 (80.7%) identical pairs. For these identical pairs, 88.9% of decisions were made automatically and 11.1% semi-automatically. Of the results decided automatically, 6.9% did not have simultaneous identity of last name, first name and date of birth. Of the results decided semi-automatically, 48.4% did not have an identical last name, 25.6% did not have an identical date of birth and 83.1% did not have simultaneous identity of last name, first name and date of birth. Conclusions: The method implemented in our cancer registry solves all record linkage problems in Austria with sufficient precision.
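The abstract describes probabilistic linkage that tolerates typing errors and splits decisions into automatic and semi-automatic ones. A common way to obtain such a split is a Fellegi–Sunter weight with two thresholds. The following is a minimal Python sketch of that general idea, not a reproduction of the authors' DELPHI™ program. Each field is compared at three levels (exact, single typing error, disagreement), and each level contributes log2(m/u). All m- and u-probabilities and both thresholds are hypothetical, as is the use of a one-edit optimal-string-alignment distance to detect a typing error.

```python
import math

# Hypothetical m- and u-probabilities per field and agreement level:
# m = P(level | same person), u = P(level | different persons).
# Real values would have to be estimated from registry data.
PARAMS = {
    "last_name":  {"exact": (0.92, 0.002),  "typo": (0.05, 0.003),  "none": (0.03, 0.995)},
    "first_name": {"exact": (0.90, 0.010),  "typo": (0.06, 0.010),  "none": (0.04, 0.980)},
    "dob":        {"exact": (0.95, 0.0001), "typo": (0.04, 0.0005), "none": (0.01, 0.9994)},
}

def osa_distance(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions, deletions and swaps
    of two adjacent characters (optimal string alignment variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

def agreement_level(a: str, b: str) -> str:
    if a == b:
        return "exact"
    # Treat exactly one edit (e.g. swapped digits in a date of birth) as a typing error.
    return "typo" if osa_distance(a, b) == 1 else "none"

def match_weight(r1: dict, r2: dict) -> float:
    """Sum of log2(m/u) over all compared fields."""
    total = 0.0
    for field, levels in PARAMS.items():
        m, u = levels[agreement_level(r1[field], r2[field])]
        total += math.log2(m / u)
    return total

def decide(weight: float, upper: float = 15.0, lower: float = 5.0) -> str:
    # Hypothetical thresholds: pairs between them go to manual review.
    if weight >= upper:
        return "link (automatic)"
    if weight <= lower:
        return "non-link (automatic)"
    return "clerical review (semi-automatic)"

p1 = {"last_name": "MAIR", "first_name": "ANNA", "dob": "19620315"}
p2 = {"last_name": "MAYR", "first_name": "ANNA", "dob": "19620351"}
print(decide(match_weight(p1, p2)))  # link (automatic)

p3 = {"last_name": "HUBER", "first_name": "ANNA", "dob": "19620315"}
p4 = {"last_name": "KOFLER", "first_name": "ANA", "dob": "19620315"}
print(decide(match_weight(p3, p4)))  # clerical review (semi-automatic)
```

The first made-up pair is linked automatically even though neither the last name nor the date of birth agrees exactly. This shows how automatic decisions can occur without simultaneous identity of all three fields.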

Author(s): Yinghao Zhang, Senlin Xu, Mingfan Zheng, Xinran Li

Record linkage is the task of identifying which records refer to the same entity. When records in different data sources do not have a common key and contain typographical errors in their identifier fields, the extended Fellegi–Sunter probabilistic record linkage method with consideration of field similarity, proposed by Winkler, is to our knowledge one of the most effective methods for performing record linkage. However, this method cannot efficiently handle missing values in the fields: an inappropriate weight is assigned to a record pair containing missing data. Therefore, to improve the performance of Winkler's probabilistic record linkage method in the presence of missing values, we propose a solution for adjusting a record pair's weight when data are missing, which improves the accuracy of Winkler's record linkage decisions without substantially increasing computation time.
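The abstract does not describe the proposed weight adjustment. The sketch below therefore only illustrates the kind of baseline it improves on: Winkler-style partial agreement, where a Jaro–Winkler similarity scales a field's weight between its disagreement and agreement values. The linear interpolation, the m/u values and the neutral weight of zero for a missing field are assumptions for illustration. None of them is the authors' method.

```python
import math

def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        return 0.0
    window = max(max(n1, n2) // 2 - 1, 0)
    used1, used2 = [False] * n1, [False] * n2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(n2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order; t is half of that.
    out_of_order, j = 0, 0
    for i in range(n1):
        if used1[i]:
            while not used2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order / 2
    return (matches / n1 + matches / n2 + (matches - t) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Boost the Jaro score for a common prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def field_weight(a, b, m: float, u: float) -> float:
    """log2 weight of one field, scaled by string similarity."""
    if not a or not b:
        return 0.0  # assumed neutral weight for missing data (baseline only)
    w_agree = math.log2(m / u)
    w_disagree = math.log2((1 - m) / (1 - u))
    s = jaro_winkler(a, b)
    # Assumed linear interpolation between disagreement and agreement weights.
    return w_disagree + (w_agree - w_disagree) * s

print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```

A fixed zero weight for missing fields treats "unknown" as neither evidence for nor against a match. That is exactly the kind of assignment the authors argue can be inappropriate.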


2018, Vol 4 (Supplement 2), pp. 65s-65s
Author(s): T. Gillespie, P. Dhillon, K. Ward, A. Aggarwal, D. Bumb, ...

Background: Cancer registries worldwide are vital for determining cancer burden, planning cancer control measures, and facilitating research. The UICC has identified population-based cancer registries as a priority for LMICs; the National Cancer Registry Program (NCRP) of India oversees 28 such registries. A primary function of registries is to combine data for the same individual from multiple sources. For other disease cohorts in which cancer is an outcome of interest, registries can potentially connect information by linking datasets. Barriers to successful registration and linkage include systems in which cancer is not a notifiable disease, the absence of a universal unique individual identifier, and a lack of trained personnel. This study uses technology and infrastructure to develop better linkages, surveillance, and outcomes. Aim: To assess the feasibility of linking large cohorts designed for cardio-metabolic disease research with cancer registries in New Delhi and Chennai; to determine additional steps required for linkage accuracy and completeness; and to develop detailed protocols for future applications. Methods: A pilot protocol for linkage between a large diabetes cohort and cancer registries in Delhi and Chennai was developed using MatchPro, a probabilistic record linkage program developed for cancer registries. Probabilistic software links datasets in the presence of uncertainty (e.g., misspelled or abbreviated names) to identify record pairs with a high probability of representing the same individual. For this study, algorithms were developed to address unique aspects of names and demographics in India. The software and algorithms focused on detecting duplicates in cancer registries and on linking registries with external files from diabetes cohorts. In Delhi, three 1-year datasets (2010, 2011, 2012) were linked with the diabetes cohort; in Chennai, the linkage included three 5-year datasets covering 15 years (2000-04, 2005-09, 2010-14).
The unique ID (Aadhaar) is not currently collected or linked systematically across different systems. Results: Linkage attempts yielded potential matches ranked by probabilistic score; the highest-scoring pairs were reviewed to determine true matches. In Chennai, this process yielded the following: for 2010-2014, 21% of self-reported (SR) cases matched perfectly, 36% required follow-up, and 13 non-reported (NR) cases were found; for 2005-2009, 33% of SR cases matched perfectly and 1 NR case was found; for 2000-2004, 1 NR case was found. In addition, two training workshops on data linkage and software were held. Conclusion: Linkages between cancer registries and other data sources are feasible in LMICs using probabilistic record linkage software augmented by manual matching. Future efforts to use existing epidemiologic resources (cohorts) and cancer research infrastructure (registries and clinical centers) can enhance research, including the understanding of shared risk factors and pathophysiologic mechanisms, e.g., between cancer and other NCDs.
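MatchPro's internals are not described in the abstract. The following is only a generic Python sketch of the workflow it summarises: candidate pairs are restricted to records sharing a blocking key, then ranked by score so that the highest-scoring pairs are reviewed manually first. The blocking key (year of birth and sex), the use of the standard library's `difflib.SequenceMatcher.ratio()` as the score, and the sample records are all illustrative assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def block_key(rec: dict) -> tuple:
    # Hypothetical blocking key: only records with the same birth year and sex are compared.
    return (rec["yob"], rec["sex"])

def ranked_candidates(cohort: list, registry: list) -> list:
    """Return (score, cohort_id, registry_id) tuples, highest score first."""
    blocks = defaultdict(list)
    for r in registry:
        blocks[block_key(r)].append(r)
    pairs = []
    for c in cohort:
        for r in blocks.get(block_key(c), []):
            score = SequenceMatcher(None, c["name"], r["name"]).ratio()
            pairs.append((score, c["id"], r["id"]))
    pairs.sort(reverse=True)
    return pairs

# Made-up records for illustration only.
cohort = [
    {"id": "C1", "name": "RAJESH KUMAR", "yob": 1960, "sex": "M"},
    {"id": "C2", "name": "LAKSHMI S", "yob": 1955, "sex": "F"},
]
registry = [
    {"id": "R1", "name": "RAJESH KUMAAR", "yob": 1960, "sex": "M"},
    {"id": "R2", "name": "RAMESH KUMAR", "yob": 1960, "sex": "M"},
    {"id": "R3", "name": "LAKSHMI SUBRAMANIAM", "yob": 1955, "sex": "F"},
    {"id": "R4", "name": "RAJESH KUMAR", "yob": 1971, "sex": "M"},
]
for score, cid, rid in ranked_candidates(cohort, registry):
    print(f"{cid} {rid} {score:.2f}")
```

Blocking keeps the number of comparisons manageable, but it also means a true match with a wrongly recorded birth year (like R4 here) is never scored. Real linkages therefore often run several passes with different blocking keys.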


2017, Vol 25 (1), pp. 149-160
Author(s): Giovanni Benedetto, Alessia Di Prima, Salvatore Sciacca, Giuseppe Grosso

We describe the design of a web-based application (the Software Integrated Cancer Registry, SWInCaRe) used to administer data in a cancer registry, and we tested its validity and usability. A sample of 11,680 records was used to compare the manual and automatic procedures. Sensitivity and specificity, the Health IT Usability Evaluation Scale, and a cost-efficiency analysis were assessed. Several data sources were used to build data packages through text-mining and record linkage algorithms. The automatic procedure showed small yet measurable improvements in both the data linkage process and the estimation of cancer cases. Users perceived the application as useful for reducing coding time and the difficulty of the process, and both the time and cost analyses favored the automatic procedure. The web-based application proved to be a useful tool for the cancer registry, but some improvements are necessary to overcome the limitations observed and to further automate the process.
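Sensitivity and specificity compare one procedure's decisions with a reference. The abstract does not state which reference was used. The Python sketch below assumes, purely for illustration, that manual decisions serve as the reference for automatic link decisions, and it uses made-up toy data.

```python
from typing import Sequence, Tuple

def confusion_counts(reference: Sequence[bool],
                     automatic: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) for automatic decisions against the reference."""
    pairs = list(zip(reference, automatic))
    tp = sum(1 for r, a in pairs if r and a)
    fp = sum(1 for r, a in pairs if not r and a)
    fn = sum(1 for r, a in pairs if r and not a)
    tn = sum(1 for r, a in pairs if not r and not a)
    return tp, fp, fn, tn

def sensitivity_specificity(reference, automatic) -> Tuple[float, float]:
    tp, fp, fn, tn = confusion_counts(reference, automatic)
    sensitivity = tp / (tp + fn)  # share of reference links the automatic procedure finds
    specificity = tn / (tn + fp)  # share of reference non-links it correctly rejects
    return sensitivity, specificity

# Toy data: True = "records belong to the same case".
manual = [True, True, False, False, True]
auto = [True, False, False, True, True]
print(sensitivity_specificity(manual, auto))
```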


Author(s): Dennis O. Laryea, Fred K. Awittor

Objective: To discuss the implementation of confidentiality practices at the Kumasi Cancer Registry. Introduction: Cancer registration involves collecting information on patients with cancer. Population-based cancer registries in particular are useful for estimating the disease burden and for informing the institution of prevention and control measures. Collecting personal information on patients with cancer requires strict adherence to principles of confidentiality to ensure the safety of the collected data. Failure to do so may have legal and medical implications. The Kumasi Cancer Registry was established as a population-based cancer registry in 2012. The registry collects data on cases of cancer occurring among residents of the Kumasi Metropolitan area of Ghana. Issues relating to confidentiality were an integral part of the establishment of the registry. We discuss the implementation of confidentiality plans during the four years of existence of the Kumasi Cancer Registry. Methods: The registry uses a purpose-designed abstraction form to collect data. Data sources for the registry are all major hospitals in Kumasi providing cancer treatment services. Data sources also include private pathology laboratories and the Births and Deaths Registry. Trained research assistants collect data from patients' folders. The data are then coded and entered into the CanReg5 software for management and analysis. After data entry, the forms are filed in order of the registry numbers generated by CanReg5, for easy reference. Results: Confidentiality of KsCR data is ensured through the following measures. All registry staff sign a confidentiality agreement. The agreement spells out terms for the release of data to third parties in particular, but also to staff of the various facilities. The agreement also spells out the consequences of a breach of any of its clauses.
Registrars make no direct contact with patients during data abstraction. The data abstraction forms are kept in a secured safe in the registry office. The computers that house the registry data are password-protected, and the passwords are changed regularly to ensure security. The CanReg5 software used for electronic data management also has individual password-protected profiles for all registrars and supervisors. The scope of access to CanReg5 data is limited by the profile status of the respective staff member: supervisors have full access to all data, including summarized reports, whereas registrars have limited access, mostly restricted to data entry. Access to the registry office is restricted to registry staff and other personnel authorized by the Registry Manager or Director. An established Registry Advisory Board is responsible for assessing and approving requests for data from the registry. Where files have to be sent electronically, they are password-protected and sent in several parts in separate emails. Conclusions: Despite the potential challenges to maintaining the confidentiality of data in developing countries, evidence from four years of cancer data management in Kumasi suggests that stringent measures can ensure confidentiality. The use of multiple measures to ensure confidentiality is essential in surveillance data management.


2019, Vol 82 (S 02), pp. S131-S138
Author(s): Sebastian Bartholomäus, Yannik Siegert, Hans Werner Hense, Oliver Heidinger

Abstract Background The evaluation of population-based screening programs, like the German Mammography Screening Program (MSP), requires collecting and linking data from population-based cancer registries and other sources of the healthcare system on a case-specific level. To link such sensitive data, we developed a method that is compliant with German data protection regulations and does not require written individual consent. Methods Our method combines probabilistic record linkage on encrypted identifying data with ‘blinded anonymisation’. It ensures that all data either are encrypted or have a defined and measurable degree of anonymity. The data sources use software to transform plain-text identifying data into a set of irreversibly encrypted person cryptograms, while the evaluation attributes are aggregated in multiple stages and are reversibly encrypted. A pseudonymisation service encrypts the person cryptograms into record assignment numbers, and a downstream data-collecting centre uses them to perform the probabilistic record linkage. The blinded anonymisation solves the problem of quasi-identifiers within the evaluation data. It allows a specific set of the encrypted aggregations to be selected to produce a data export with guaranteed k-anonymity, without any plain-text information. These data are finally transferred to an evaluation centre, where they are decrypted and analysed. Our approach allows several such generalisations to be created, each with a different resulting suppression rate, so that information depth can be balanced dynamically against privacy protection; it also highlights how this affects data analysability. Results German data protection authorities approved our concept for the evaluation of the impact of the German MSP on breast cancer mortality. We implemented a prototype, tested it with 1.5 million simulated records containing realistically distributed identifying data, and calculated different generalisations and the respective suppression rates.
We also discuss limitations for large data sets in the cancer registry domain, as well as approaches for further improvement, such as l-diversity and ways to reduce the amount of manual post-processing. Conclusion Our approach enables secure linking of data from population-based cancer registries and other sources of the healthcare system. Despite some limitations, it enables evaluation of the German MSP and can be generalised to other projects.
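The abstract gives too few details to reproduce the published scheme. The following Python sketch only illustrates, under stated assumptions, the two ideas it names:

- Identifying fields become irreversible keyed hashes. HMAC-SHA256 from the standard `hmac`/`hashlib` modules is a stand-in for the "person cryptograms".
- A pseudonymisation service re-hashes these with a second key, so the linking centre can only test field-by-field equality.
- An export is made k-anonymous by suppressing rows that fall into small quasi-identifier groups.

Unlike the described method, the k-anonymity step here runs on already generalised plain-text values. The keys, the normalisation and all field names are hypothetical.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret keys; in practice they would be held by different parties.
SOURCE_KEY = b"secret-of-the-data-sources"
SERVICE_KEY = b"secret-of-the-pseudonymisation-service"

def normalise(value: str) -> str:
    # Assumed normalisation: trim, upper-case, drop spaces and hyphens.
    return value.strip().upper().replace(" ", "").replace("-", "")

def keyed_hash(key: bytes, text: str) -> str:
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()

def person_cryptograms(record: dict) -> dict:
    """Data source: one irreversible keyed hash per identifying field."""
    return {field: keyed_hash(SOURCE_KEY, normalise(v)) for field, v in record.items()}

def record_assignment_numbers(cryptograms: dict) -> dict:
    """Pseudonymisation service: re-hash every cryptogram with its own key."""
    return {field: keyed_hash(SERVICE_KEY, c) for field, c in cryptograms.items()}

def agreement_pattern(a: dict, b: dict) -> dict:
    """Linking centre: only field-wise equality can be tested on the hashes."""
    return {field: a[field] == b[field] for field in a}

def k_anonymous_export(rows: list, quasi_identifiers: list, k: int):
    """Suppress rows whose quasi-identifier combination occurs fewer than k times."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)
    counts = Counter(key(row) for row in rows)
    kept = [row for row in rows if counts[key(row)] >= k]
    return kept, 1 - len(kept) / len(rows)

x = record_assignment_numbers(person_cryptograms(
    {"surname": "Müller", "first_name": "Anna", "dob": "1950-03-15"}))
y = record_assignment_numbers(person_cryptograms(
    {"surname": "Mueller", "first_name": "Anna", "dob": "1950-03-15"}))
print(agreement_pattern(x, y))  # {'surname': False, 'first_name': True, 'dob': True}

rows = [
    {"age_group": "50-59", "district": "A"},
    {"age_group": "50-59", "district": "A"},
    {"age_group": "50-59", "district": "A"},
    {"age_group": "60-69", "district": "B"},
]
kept, suppression_rate = k_anonymous_export(rows, ["age_group", "district"], k=2)
print(len(kept), suppression_rate)  # 3 0.25
```

Exact hashing loses typo tolerance: "Müller" and "Mueller" no longer agree once encrypted. This is why the normalisation applied before hashing matters so much for the downstream probabilistic linkage.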


Author(s): Mazvita Sengayi, Adrian Spörri, Eliane Rohner, Michael Vinikoor, Hans Prozesky, ...

ABSTRACT Background: Sub-Saharan Africa is the region most heavily affected by the HIV/AIDS epidemic. HIV increases the risk of developing cancer, but the ascertainment of cancers in patients attending antiretroviral therapy (ART) programs might be incomplete. To estimate the under-ascertainment of cancer, we compared incidence rates of AIDS-defining cancers in South African HIV cohorts with and without cancer case ascertainment through record linkage with the National Cancer Registry. Methods: We used the data of adult (≥16 years) HIV-positive persons receiving care between 2004 and 2011 at one of four ART programs in South Africa. These programs collaborate with the International Epidemiologic Databases to Evaluate AIDS Southern Africa (www.iedea-sa.org) and collected data for AIDS-defining cancers but not for other cancers. To improve cancer ascertainment, we probabilistically linked patient records (using first name, surname, age, and gender) from two HIV cohorts with the cancer records of the South African National Cancer Registry. We calculated incidence rates per 100,000 person-years after starting ART for the AIDS-defining cancers, i.e. Kaposi sarcoma (KS), invasive cervical cancer (ICC) and non-Hodgkin lymphoma (NHL). We compared incidence rates before and after inclusion of cancer cases identified through record linkage, using the attributable fraction of cancers identified, with 95% confidence intervals (CI). Results: A total of 49,207 adults starting ART in South Africa were included. Of these, 65% were female, and the median age at starting ART was 35 years (interquartile range 30-41 years). We identified a total of 471 incident cancer cases. With record linkage, incidence per 100,000 person-years increased from 81 to 292 for KS, from 1 to 119 for NHL and from 12 to 497 for ICC. The attributable fraction of cancers identified was 72% (95% CI 63-79%) for KS, 98% (95% CI 94-99%) for NHL and 98% (95% CI 95-99%) for ICC.
Conclusion: Ascertainment of cancer in HIV program data in African settings is incomplete. This case study has shown that probabilistic record linkage to cancer registries is both feasible and essential for cancer ascertainment in HIV cohorts in South Africa.
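One plausible reading of the attributable fraction is the share of the linked incidence that is only visible through linkage, (I_with − I_without) / I_with. Applied to the published KS rates (81 and 292 per 100,000 person-years), this gives about 72%, which matches the reported value. The following Python sketch is an illustration under that assumption. The paper's exact formula and confidence-interval method are not given here and are not reproduced. Because the published rates are rounded, this calculation need not reproduce every reported fraction exactly.

```python
def incidence_per_100k(cases: int, person_years: float) -> float:
    """Crude incidence rate per 100,000 person-years."""
    return cases * 100_000 / person_years

def attributable_fraction(rate_without_linkage: float, rate_with_linkage: float) -> float:
    """Share of the incidence seen with linkage that is missed without it (assumed definition)."""
    return (rate_with_linkage - rate_without_linkage) / rate_with_linkage

print(round(attributable_fraction(81, 292), 3))  # 0.723 (Kaposi sarcoma, published rates)
```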

