Web-based interactive mapping from data dictionaries to ontologies, with an application to cancer registry

2020 ◽  
Vol 20 (S10) ◽  
Author(s):  
Shiqiang Tao ◽  
Ningzhou Zeng ◽  
Isaac Hands ◽  
Joseph Hurt-Mueller ◽  
Eric B. Durbin ◽  
...  

Abstract Background: The Kentucky Cancer Registry (KCR) is a central cancer registry for the state of Kentucky that receives data about incident cancer cases from all healthcare facilities in the state within 6 months of diagnosis. Like all other U.S. and Canadian cancer registries, KCR uses a data dictionary provided by the North American Association of Central Cancer Registries (NAACCR) for standardized data entry. The NAACCR data dictionary is not an ontological system. Mapping between the NAACCR data dictionary and the National Cancer Institute (NCI) Thesaurus (NCIt) will facilitate the enrichment, dissemination and utilization of cancer registry data. We introduce a web-based system, called Interactive Mapping Interface (IMI), for creating mappings from data dictionaries to ontologies, in particular from NAACCR to NCIt. Method: IMI has been designed as a general approach with three components: (1) ontology library; (2) mapping interface; and (3) recommendation engine. The ontology library provides a list of ontologies as targets for building mappings. The mapping interface consists of six modules: project management, mapping dashboard, access control, logs and comments, hierarchical visualization, and result review and export. The built-in recommendation engine automatically identifies a list of candidate concepts to facilitate the mapping process. Results: We report the architecture design and interface features of IMI. To validate our approach, we implemented an IMI prototype and pilot-tested its features by using the IMI interface to map a sample set of NAACCR data elements to NCIt concepts. Forty-seven of 301 NAACCR data elements have been mapped to NCIt concepts. Five branches of the hierarchical tree have been identified from these mapped concepts for visual inspection. Conclusions: IMI provides an interactive, web-based interface for building mappings from data dictionaries to ontologies. Although our pilot-testing scope is limited, our results demonstrate the feasibility of using IMI for semantic enrichment of cancer registry data by mapping NAACCR data elements to NCIt concepts.
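
The abstract does not say how the recommendation engine selects its candidate concepts. As a rough illustration only, the sketch below ranks ontology concept labels by plain string similarity to a data element name. The function name `recommend_candidates`, the `top_k` parameter, and the sample labels are hypothetical placeholders, and string similarity is an assumed stand-in technique, not the method used in IMI.

```python
from difflib import SequenceMatcher


def recommend_candidates(element_name, concept_labels, top_k=3):
    """Rank concept labels by case-insensitive string similarity to a data element name.

    Hypothetical helper: a stand-in for a recommendation step, not IMI's engine.
    """
    def score(label):
        return SequenceMatcher(None, element_name.lower(), label.lower()).ratio()

    ranked = sorted(concept_labels, key=score, reverse=True)
    return [(label, round(score(label), 2)) for label in ranked[:top_k]]


# Illustrative labels only; these are not taken from an actual NCIt export.
concepts = ["Primary Site", "Laterality", "Histologic Grade", "Date of Diagnosis"]
candidates = recommend_candidates("date of diagnosis", concepts, top_k=2)
```

A curator would still review each ranked candidate before accepting a mapping, which matches the interactive workflow the abstract describes.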

Author(s):  
Iris Zachary ◽  
Suzanne A Boren ◽  
Eduardo Simoes ◽  
Jeannette Jackson-Thompson ◽  
J. Wade Davis ◽  
...  

Cancer registry data collection involves, at a minimum, collecting data on demographics, tumor characteristics, and treatment. A common, identified, and standardized set of data elements is needed to share data quickly and efficiently with consumers of these data. This project highlights the need to develop common data elements. Surveys were developed for central cancer registries (CCRs) and cancer researchers (CRs) at NCI-designated Cancer Centers in order to understand data needs. Survey questions were developed based on the project focus, an evaluation of the research registries and database responses, and a systematic review of the literature. Questions covered the following topics: 1) Research, 2) Data collection, 3) Database/repository, 4) Use of data, 5) Additional data items, 6) Data requests, 7) New data fields, and 8) Cancer registry data set. A review of the surveys indicates that all cancer registries’ data are used for public health surveillance, and 96% of the registries indicate the data are also used for research. Data are available online in interactive tables from over 50% of CRs and 87% of CCRs. Other survey responses indicate that CCR treatment data are not complete; however, cancer researchers are interested in treatment variables from CCRs. Cancer registries have many data available for review, but need to examine what data are needed and used by different entities. Cancer registries can further enhance usage through collaborations and partnerships to connect common interests in the data by making registries visible and accessible. Keywords: Public Health; Disease Registries; Disease Reporting


Author(s):  
Dennis O. Laryea ◽  
Fred K. Awittor

Objective: To discuss the implementation of confidentiality practices at the Kumasi Cancer Registry. Introduction: Cancer registration involves collecting information on patients with cancer. Population-based cancer registries in particular are useful in estimating the disease burden and informing the institution of prevention and control measures. Collecting personal information on patients with cancer requires strict adherence to principles of confidentiality to ensure the safety of the collected data. Failure may have legal and medical implications. The Kumasi Cancer Registry was established as a population-based cancer registry in 2012. The registry collects data on cases of cancer occurring among residents of the Kumasi Metropolitan area of Ghana. Issues bordering on confidentiality were an integral part of the establishment of the registry. We discuss the implementation of confidentiality plans during the four years of existence of the Kumasi Cancer Registry. Methods: The registry has a designed abstraction form which is used to collect data. Data sources for the registry are all major hospitals in Kumasi providing cancer treatment services. Data sources also include private pathology laboratories and the Births and Deaths Registry. Trained research assistants collect data from the folders of patients. The data are then coded and entered into the CanReg5 software for management and analysis. After data entry, the forms are filed in order of registry numbers as generated by the CanReg5 software for easy reference. Results: Confidentiality of KsCR data is ensured through the following measures. All registry staff sign a confidentiality agreement. The confidentiality agreement spells out terms for the release of data to third parties in particular, but also to staff of the various facilities. The agreement also spells out the consequences of a breach of any of the clauses. No direct contact is made with patients during the process of abstraction of data by registrars. The data abstraction forms are kept in a secured safe in the registry office. The computers that house the registry data are password protected, and the passwords are changed on a regular basis to ensure security. The CanReg5 software used for electronic data management also has individual profiles with passwords for all registrars and supervisors. The scope of access to CanReg5 data is limited by the profile status of the respective staff members. Supervisors have full access to all data including summarized reports. Registrars have limited access mostly restricted to data entry. Access to the registry office is restricted to registry staff and other personnel authorized by the Registry Manager or Director. An established Registry Advisory Board is responsible for assessing and approving requests for data from the registry. Where files have to be sent electronically, they are password protected and sent in several parts in separate emails. Conclusions: Despite the potential challenges to maintaining confidentiality of data in developing countries, evidence from four years of cancer data management in Kumasi suggests stringent measures can ensure confidentiality. The use of multiple measures to ensure confidentiality is essential in surveillance data management.


2019 ◽  
Vol 18 (5) ◽  
pp. 5-11
Author(s):  
G. V. Petrova ◽  
O. P. Gretsova ◽  
V. V. Starinsky

The purpose of the study was to compare data on the cancer incidence rates for 2016 between the official reports on cancer statistics and the federal cancer registry, collected in December 2018. Material and Methods. The study estimated the total data on 18 parameters from 35 regions of Russia, covering 66.3 million people (2016). The database of the Russian cancer registry and the database containing reports on the state cancer statistics were used. The cancer statistics/cancer registry ratio was assessed. Results. No differences in cancer incidence between the official reports on cancer statistics and cancer registry data were found. In the official reports on cancer statistics, the mortality rate, the proportion of posthumously recorded patients per 100 newly diagnosed, the proportion of deaths from diseases not related to cancer per 100 deceased patients, the cancer prevalence and the prevalence rate of unspecified malignant tumors were slightly reduced (to 10 %, 9 %, 5 %, and 4 %, respectively), and the rate of cancer detection, the proportion of histologically verified diagnoses and the proportion of cancers detected in stage III were increased (to 19 %, 10 % and 14 %, respectively) compared to those in cancer registry data. Conclusion. Improvement in the quality and completeness of information about cancer patients is associated rather with increasing the annual report length than with the need to improve the cancer registration system itself.


Blood ◽  
2005 ◽  
Vol 106 (11) ◽  
pp. 1334-1334 ◽  
Author(s):  
Matthew D. Seftel ◽  
Donna Hewitt ◽  
Hui Zhang ◽  
Donna Turner ◽  
Spencer Gibson ◽  
...  

Abstract Background: The exact incidence of chronic lymphocytic leukemia (CLL) and small lymphocytic lymphoma (SLL) is unknown. In the appropriate clinical setting, peripheral blood immunophenotyping is often sufficient for diagnosis. Cancer registries that rely only on histological or cytological reporting may inaccurately estimate the incidence of CLL/SLL. The province of Manitoba, with a population of 1.2 million people, has a centralized flow cytometry service as well as a provincial cancer registry. We thus had the opportunity to use these large databases to describe the demographic and clinical patterns of CLL/SLL. This has enabled us to test the hypothesis that registry data underestimate the incidence of this disease. Methods: All patients diagnosed with CLL/SLL between January 1, 1998 and December 31, 2003 were obtained from the Manitoba cancer registry and the central flow cytometry database. Additional clinical characteristics were obtained from a chart review. Results: 491 patients were diagnosed by flow cytometry. In contrast, cancer registry data reported 345 patients with CLL/SLL, 131 (38%) of whom were diagnosed in tertiary care centres. Thus, 146 (30%) patients were not known to the provincial cancer registry. Median age of patients was 71 years (range, 24–97). Based on 2001 Canadian census data, the crude incidence of CLL/SLL in Manitoba is estimated to be 7 per 100 000 persons. Other demographic and clinical data of this population-based study will be presented. Conclusion: By incorporating diagnostic immunophenotyping, the incidence of CLL/SLL appears to be higher than that reported by a large Canadian cancer registry. This observation may apply to other local and national jurisdictions, and should be studied further.
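
The reported figures can be checked with a few lines of arithmetic. The sketch below assumes the denominator is the 1.2 million population quoted in the abstract and that cases are spread evenly over the six diagnosis years 1998 to 2003. The abstract says only that 2001 census data were used, so the exact denominator is an assumption, and the variable names are illustrative.

```python
# Counts quoted in the abstract
flow_cytometry_cases = 491
registry_cases = 345

# Assumptions: population as quoted in the abstract; six calendar years, 1998-2003 inclusive
population = 1_200_000
years = 6

# Patients identified by flow cytometry but unknown to the registry
missed = flow_cytometry_cases - registry_cases
missed_share = missed / flow_cytometry_cases

# Crude annual incidence per 100,000 persons
crude_incidence = flow_cytometry_cases / years / population * 100_000

print(missed, round(missed_share * 100), round(crude_incidence, 1))
```

Under these assumptions the difference is 146 patients, about 30% of the flow cytometry total, and the crude incidence is roughly 6.8 per 100,000 per year, consistent with the rounded value of 7 given in the abstract.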


2016 ◽  
Vol 2 (3_suppl) ◽  
pp. 43s-43s
Author(s):  
Malebogo Pusoentsi ◽  
Bame P. Shatera ◽  
Setlogelo Motlogi ◽  
Tuduetso Monagen ◽  
Neo Tapela ◽  
...  

Abstract 69 Background: One of the challenges to addressing the growing burden of cancer in low- and middle-income countries is insufficient data and limitations in quality of cancer registries. The Botswana National Cancer Registry (BNCR), first established in 1999, is an IARC-endorsed population-based registry covering a population of 2.1 million. Here we assess BNCR's data quality over time. Methods: We conducted a retrospective review of BNCR data that were collected between January 1, 2005 and December 31, 2010. We assessed basis of cancer diagnosis, as well as key data quality indices (completeness, consistency, uniqueness, and accuracy) over two time periods: 2005–2007 and 2008–2010. We assessed cancer incidence and distribution during this time period, and reviewed Botswana Ministry of Health operational documents to identify major health care initiatives that may have had a bearing on cancer registry data quality. Results: In total, 8,938 cancer cases were registered during 2005-2010. Kaposi sarcoma was the most commonly diagnosed cancer (n=1766, 19.4%), followed by cervical cancer (n=1252, 13.8%) and then breast cancer (n=801, 8.8%). During 2005-2007, 79% of all cancers were morphologically verified and 6% were verified by death certificate alone. By 2008-2010, 89% of cancers were morphologically verified while none (0%) were verified by death certificate alone. There was a marked difference for basis of Kaposi sarcoma diagnosis (26% in 2005-2007, 43.8% in 2008-2010), which changed from mainly clinical to pathology-based diagnosis. Factors that have contributed to this improvement include targeted initiatives such as clinician training, as well as broader health system developments such as general laboratory diagnostic capacitation that has facilitated use of histopathology services for cancer. Conclusion: BNCR data quality has improved over the years. These improvements enhance utility of cancer registry data for healthcare planning, and highlight the merit of cross-cutting health systems strengthening developments. This assessment, and the initiatives that have contributed to BNCR data improvement, may be relevant to cancer registries in similar settings. AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST: No COIs from the authors.


2007 ◽  
Vol 22 (4) ◽  
pp. 282-290 ◽  
Author(s):  
Djenaba A. Joseph ◽  
Phyllis A. Wingo ◽  
Jessica B. King ◽  
Lori A. Pollack ◽  
Lisa C. Richardson ◽  
...  

Abstract Purpose: The objective of this study was to estimate the burden of cancer in counties affected by Hurricane Katrina using population-based cancer registry data, and to discuss issues related to cancer patients who have been displaced by disasters. Methods: The cancer burden was assessed in 75 counties in Louisiana, Alabama, and Mississippi that were designated by the Federal Emergency Management Agency as eligible for individual and public assistance. Data from the National Program of Cancer Registries were used to determine three-year average annual age-adjusted incidence rates and case counts during the diagnosis years 2000–2002 for Louisiana and Alabama. Expected rates and counts for the most-affected counties in Mississippi were estimated by direct, age-specific calculation using the 2000–2002 county level populations and the site-, sex-, race-, and age-specific cancer incidence rates for Louisiana. Results: An estimated 23,549 persons with a new diagnosis of cancer in the past year resided in the disaster-affected counties. Fifty-eight percent of the cases were cancers of the lung/bronchus, colon/rectum, female breast, and prostate. Eleven of the top 15 cancer sites by sex and black/white race in disaster counties had >50% of cases diagnosed at the regional or distant stage. Conclusions: Sizable populations of persons with a recent cancer diagnosis were potentially displaced by Hurricane Katrina. Cancer patients required special attention to access records in order to confirm diagnosis and staging, minimize disruption in treatment, and ensure coverage of care. Cancer registry data can be used to provide disaster planners and clinicians with estimates of the number of cancer patients, many of whom may be undergoing active treatment.
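
The expected-count calculation described in the Methods can be sketched as a sum over strata of local population multiplied by a reference rate. The example below is a minimal illustration with age strata only, whereas the study also stratified by site, sex, and race. The function name `expected_cases` and all populations and rates are invented placeholders, not values from the study.

```python
def expected_cases(populations, rates_per_100k):
    """Expected case count: sum over strata of population x reference rate per 100,000."""
    return sum(populations[s] * rates_per_100k[s] / 100_000 for s in populations)


# Hypothetical county population by age group (placeholder numbers)
county_population = {"0-49": 60_000, "50-64": 25_000, "65+": 15_000}

# Hypothetical reference incidence rates per 100,000 from a neighbouring state
reference_rates = {"0-49": 50.0, "50-64": 600.0, "65+": 2000.0}

expected = expected_cases(county_population, reference_rates)
print(expected)
```

Applying another region's stratum-specific rates in this way assumes that incidence within each stratum is similar in both areas, which is the premise behind borrowing Louisiana rates for the Mississippi counties.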


2005 ◽  
Vol 12 (1) ◽  
pp. 43-49 ◽  
Author(s):  
Sven Törnberg ◽  
Mary Codd ◽  
Vitor Rodrigues ◽  
Nereo Segnan ◽  
Antonio Ponti

Objectives: The purpose of the present study was to estimate the interval cancer (IC) rates in four population-based mammography screening programmes in four countries with different health-care environments, different access to cancer registry data, and different age groups of women invited. Setting: The screening programmes in Coimbra (Portugal), Dublin (Ireland), Stockholm (Sweden), and Turin (Italy) participated in the study. Methods: All cancer cases were searched for in cancer registries. IC rates and other outcome measures from the screening programmes were estimated and compared between the centres. A Poisson regression model was used to estimate the proportional incidence based on IC rate in relation to expected total breast cancer incidence rate in the absence of screening. Results: There was a more than tenfold difference in the number of invited women at the first round between the involved centres. The IC rates varied between 4.3 and 23.8 per 10,000 women screened. The levels of IC rates in relation to the estimated background incidence varied from 0.35 up to 0.46 depending on age groups involved in the programme, but did not differ significantly between three of the four involved centres. Conclusions: IC rates were quite similar between three of the four centres despite the differences in target population, invited ages, length of building-up of the programmes and different health-care organizations. Different access to complete cancer registry data is likely to explain the lower IC rates in the fourth centre.
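
The proportional incidence mentioned above is, in its simplest form, the ratio of the observed IC rate to the breast cancer incidence rate expected without screening. The sketch below computes only that raw ratio. It does not reproduce the Poisson regression used in the study, and the function name and input rates are hypothetical values chosen for illustration.

```python
def proportional_incidence(ic_rate, background_rate):
    """Ratio of the interval cancer rate to the expected background incidence rate.

    Both rates must share the same denominator (here, per 10,000 women).
    """
    if background_rate <= 0:
        raise ValueError("background_rate must be positive")
    return ic_rate / background_rate


# Hypothetical rates per 10,000 women screened (not taken from the study)
ratio = proportional_incidence(ic_rate=8.0, background_rate=20.0)
print(round(ratio, 2))
```

A ratio of 0.4 would fall inside the 0.35 to 0.46 range reported in the Results.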


2012 ◽  
Author(s):  
Iris Zachary

Cancer registries in the US and Canada have a long history of data standards and data collection that have developed from a minimal dataset to the standard dataset that is used now. Central Cancer Registries (CCRs) are good resources for cancer data, but are often underutilized. CCRs are recognized for high quality data standards by the Centers for Disease Control and Prevention (CDC) National Program of Cancer Registries (NPCR) or the National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) Program and receive certification from the North American Association of Central Cancer Registries (NAACCR). Each year, there are many changes to the data that are collected in the cancer registry field. Standards, requirements, and medical knowledge change frequently. The changes in the data collection process cause not only interference and a decrease in the quality of data fields, but also delays in the timely collection of cancer registry data. The objective of this study is to identify what essentially needs to be collected and what can be collected optionally in a cancer registry. The goal is a robust dataset that can be used for other disease registries, cancer data surveillance, public health, and research. CCRs and Cancer Centers (CR) were surveyed to identify and describe the data items that are collected and needed to achieve a dataset that can serve cancer surveillance and research. The surveys were analyzed to identify overlaps of common and special interests, as well as barriers. The results showed that cancer registries have data available, but need to look at the timely release of a core dataset for use in cancer surveillance and research. The surveys also evaluated the barriers to data use from cancer registries and barriers for data use of collected datasets to identify the initial data request process. Data in the cancer registry are in a format that can easily be adopted by public health, surveillance, and research. 
The requesting process needs to be accessible, understandable, and streamlined to enable successful use of the data.


PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0261416
Author(s):  
Paul P. Fahey ◽  
Andrew Page ◽  
Thomas Astell-Burt ◽  
Glenn Stone

Background: As oesophageal cancer has short survival, it is likely pre-diagnosis health behaviours will have carry-over effects on post-diagnosis survival times. Cancer registry data sets do not usually contain pre-diagnosis health behaviours and so need to be augmented with data from external health surveys. A new algorithm is introduced and tested to augment cancer registries with external data when one-to-one data linkage is not available. Methods: The algorithm is to use external health survey data to impute pre-diagnosis health behaviour for cancer patients, estimate misclassification errors in these imputed values and then fit misclassification corrected Cox regression to quantify the association between pre-diagnosis health behaviour and post-diagnosis survival. Data from US cancer registries and a US national health survey are used in testing the algorithm. Results: It is demonstrated that the algorithm works effectively on simulated smoking data when there is no age confounding. But age confounding does exist (risk of death increases with age and most health behaviours change with age) and interferes with the performance of the algorithm. The estimate of the hazard ratio (HR) of pre-diagnosis smoking was HR = 1.32 (95% CI 0.82, 2.68) with HR = 1.93 (95% CI 1.08, 7.07) in the squamous cell sub-group and pre-diagnosis physical activity was protective of survival with HR = 0.25 (95% CI 0.03, 0.81). But the method failed for less common behaviours (such as heavy drinking). Conclusions: Further improvements in the I2C2 algorithm will permit enrichment of cancer registry data through imputation of new variables with negligible risk to patient confidentiality, opening new research opportunities in cancer epidemiology.


2019 ◽  
Vol 82 (S 01) ◽  
pp. S62-S71 ◽  
Author(s):  
Volker Arndt ◽  
Bernd Holleczek ◽  
Hiltraud Kajüter ◽  
Sabine Luttmann ◽  
Alice Nennecke ◽  
...  

Abstract Population-based cancer registries have a long-standing role in cancer monitoring. Scientific use of cancer registry data is one important purpose of cancer registration, but use of cancer registry data is not restricted to cancer registries. Cancer registration in Germany is currently heading towards population-based collection of detailed clinical data. This development together with additional options for record linkage and long-term follow-up will offer new opportunities for health services and outcome research. Both regional population-based registries and the German Centre for Cancer Registry Data (ZfKD) at the Robert Koch-Institute as well as international cancer registries and consortia or organizations may provide external researchers access to individual or aggregate level data for secondary data analysis. In this review, we elaborate on the access to cancer registry data for research purposes, availability of specific data items, and options for data linkage with external data sources. We also discuss limitations in data availability and quality, and describe typical biases in design and analysis.

