Retraction Note to: Text mining and network analysis of molecular interaction in non-small cell lung cancer by using natural language processing

2015 ◽  
Vol 42 (10) ◽  
pp. 1489-1489
Author(s):  
Jun Li ◽  
Lintao Bi ◽  
Yanxia Sun ◽  
Zhenxia Lu ◽  
Yumei Lin ◽  
...  
2019 ◽  
pp. 1-15 ◽  
Author(s):  
Bernardo Haddock Lobo Goulart ◽  
Emily T. Silgard ◽  
Christina S. Baik ◽  
Aasthaa Bansal ◽  
Qin Sun ◽  
...  

PURPOSE SEER registries do not report results of epidermal growth factor receptor (EGFR) and anaplastic lymphoma kinase (ALK) mutation tests. To facilitate population-based research in molecularly defined subgroups of non–small-cell lung cancer (NSCLC), we assessed the validity of natural language processing (NLP) for the ascertainment of EGFR and ALK testing from electronic pathology (e-path) reports of NSCLC cases included in two SEER registries: the Cancer Surveillance System (CSS) and the Kentucky Cancer Registry (KCR). METHODS We obtained 4,278 e-path reports from 1,634 patients who were diagnosed with stage IV nonsquamous NSCLC from September 1, 2011, to December 31, 2013, included in CSS. We used 855 CSS reports to train NLP systems for the ascertainment of EGFR and ALK test status (reported v not reported) and test results (positive v negative). We assessed sensitivity, specificity, and positive and negative predictive values in an internal validation sample of 3,423 CSS e-path reports and repeated the analysis in an external sample of 1,041 e-path reports from 565 KCR patients. Two oncologists manually reviewed all e-path reports to generate gold-standard data sets. RESULTS NLP systems yielded internal validity metrics that ranged from 0.95 to 1.00 for EGFR and ALK test status and results in CSS e-path reports. NLP showed high internal accuracy for the ascertainment of EGFR and ALK in CSS patients, with F scores of 0.95 and 0.96, respectively. In the external validation analysis, NLP yielded metrics that ranged from 0.02 to 0.96 in KCR reports and F scores of 0.70 and 0.72, respectively, in KCR patients. CONCLUSION NLP is an internally valid method for the ascertainment of EGFR and ALK test information from e-path reports available in SEER registries, but future work is necessary to increase NLP external validity.
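The validity metrics named in this abstract all derive from a 2x2 comparison of NLP output against the manually reviewed gold standard. The following is a minimal sketch of that arithmetic, not the authors' code: the counts are invented for illustration, the names `Confusion` and `validity_metrics` are hypothetical, and the F score is assumed to be the F1 harmonic mean, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing NLP labels with gold-standard labels."""
    tp: int  # NLP positive, gold standard positive
    fp: int  # NLP positive, gold standard negative
    fn: int  # NLP negative, gold standard positive
    tn: int  # NLP negative, gold standard negative


def validity_metrics(c: Confusion) -> Dict[str, float]:
    """Compute the metrics named in the abstract.

    Assumes every denominator is nonzero; a production version
    would need to handle empty rows or columns explicitly.
    """
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    npv = c.tn / (c.tn + c.fn)
    # F1: harmonic mean of PPV (precision) and sensitivity (recall).
    f_score = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_score": f_score,
    }


# Hypothetical counts, NOT taken from the study.
example = Confusion(tp=90, fp=10, fn=10, tn=890)
metrics = validity_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because positive and negative predictive values depend on how common a positive result is in the sample, the same system can score differently on reports from another registry even when its sensitivity and specificity are unchanged.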


Author(s):  
Serene Wong ◽  
Max Kotlyar ◽  
Dan Strumpf ◽  
Nick Cercone ◽  
Frances A. Shepherd ◽  
...  

Author(s):  
Kyle G. Mitchell ◽  
David B. Nelson ◽  
Erin M. Corsini ◽  
Arlene M. Correa ◽  
Jeremy J. Erasmus ◽  
...  

Objective Though interest in expansion of the use of less-invasive therapies among operable non-small-cell lung cancer (NSCLC) patients is growing, it is not clear that post-treatment surveillance has been comparable between treatment modalities. We sought to characterize institutional surveillance patterns after NSCLC therapy with stereotactic body radiation therapy (SBRT) and lobectomy. Methods NSCLC patients treated with lobectomy or SBRT (2005 to 2016) at a single institution were identified. Natural language processing searched data fields within axial surveillance imaging reports for findings suggestive of recurrence. Duration and patterns of institutional surveillance were compared between the 2 groups. Results Three thousand forty-two patients (73.5% lobectomy, 26.5% SBRT) met inclusion criteria. Patients had a longer median duration of surveillance after lobectomy (28.0 months vs SBRT 12.3 months, P < 0.001) and were more likely to undergo histopathological evaluation of clinically suspected relapse (206/274 [75.2%] vs SBRT 54/113 [47.8%], P < 0.001). Patients with clinical suspicion of recurrence had longer durations of institutional surveillance than those who did not among both cohorts (lobectomy 44.4 months vs 25.9, P < 0.001; SBRT 27.9 vs 10.3, P < 0.001). Landmark analyses at 1 and 3 years after therapy identified associations between receipt of lobectomy and ongoing surveillance at each time point (1 year odds ratio [OR] 2.10, P < 0.001; 3 years OR 1.71, P < 0.001) among all patients and those with documented stage I disease. Conclusions We identified potential heterogeneity in institutional surveillance patterns after treatment of NSCLC with 2 therapeutic modalities. As less-invasive treatment options for operable patients expand, it will be critical to implement rigorous surveillance paradigms across all modalities.
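The percentages reported for histopathological evaluation of suspected relapse can be checked directly against the counts given in the abstract. This is a small sketch of that check: only the four counts come from the abstract, and the helper name `percent` is illustrative.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts as stated in the abstract: patients with histopathological
# evaluation / patients with clinically suspected relapse.
lobectomy_pct = percent(206, 274)
sbrt_pct = percent(54, 113)

print(f"Lobectomy: {lobectomy_pct}%")
print(f"SBRT: {sbrt_pct}%")
```

The odds ratios in the abstract come from landmark analyses of ongoing surveillance and cannot be reproduced from these counts alone.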


PLoS ONE ◽  
2019 ◽  
Vol 14 (2) ◽  
pp. e0212454 ◽  
Author(s):  
Frances B. Maguire ◽  
Cyllene R. Morris ◽  
Arti Parikh-Patel ◽  
Rosemary D. Cress ◽  
Theresa H. M. Keegan ◽  
...  

2020 ◽  
Vol 21 (22) ◽  
pp. 8730
Author(s):  
Pamela Vernocchi ◽  
Tommaso Gili ◽  
Federica Conte ◽  
Federica Del Chierico ◽  
Giorgia Conta ◽  
...  

Several recent studies have linked gut microbiome (GM) diversity to the pathogenesis of cancer and its role in disease progression through immune response, inflammation and metabolism modulation. This study focused on the use of network analysis and weighted gene co-expression network analysis (WGCNA) to identify the biological interaction between the gut ecosystem and its metabolites that could impact the immunotherapy response in non-small cell lung cancer (NSCLC) patients undergoing second-line treatment with anti-PD1. Metabolomic data were merged with operational taxonomic units (OTUs) from 16S rRNA-targeted metagenomics and classified by chemometric models. The traits considered for the analyses were: (i) condition: disease or control (CTRLs), and (ii) treatment: responder (R) or non-responder (NR). Network analysis indicated that indole and its derivatives, aldehydes and alcohols could play a signaling role in GM functionality. WGCNA, in contrast, identified strong correlations between short-chain fatty acids (SCFAs) and a healthy GM. Furthermore, commensal bacteria such as Akkermansia muciniphila, Rikenellaceae, Bacteroides, Peptostreptococcaceae, Mogibacteriaceae and Clostridiaceae were found to be more abundant in CTRLs than in NSCLC patients. Our preliminary study suggests that the discovery of microbiota-linked biomarkers could be a step towards personalized management of NSCLC patients.


2017 ◽  
Vol 13 (8) ◽  
pp. 1481-1494 ◽  
Author(s):  
Sainitin Donakonda ◽  
Swati Sinha ◽  
Shrinivas Nivrutti Dighe ◽  
Manchanahalli R Satyanarayana Rao

Systematic functional network analysis of ASCL1 revealed that it regulates mitosis and cell proliferation pathways and has distinct functions in glioma and SCLC.


2020 ◽  
Author(s):  
Hyun Ae Jung ◽  
Oksoon Jeong ◽  
Dong Kyung Chang ◽  
Sehhoon Park ◽  
Jong-Mu Sun ◽  
...  

BACKGROUND The American Society of Clinical Oncology recently launched the minimal common oncology data elements project to facilitate cancer data interoperability. However, clinical data are often not recorded in an organized way, and converting them into a structured format can be time-consuming. The Clinical Data Warehouse is a database that consolidates data from various clinical sources. However, the clinical data extracted from this database include not only structured data but also natural language generated in clinical practice, so applying them to clinical research is difficult because they are not structured or formatted in a way that makes key content easy to find. OBJECTIVE To determine how best to organize a huge amount of clinical data to evaluate the clinical features and outcomes of upper aerodigestive tract cancers, including cancers of the head and neck, esophagus, lung, and thymus and mesothelioma. The Real-time autOmatically updated data warehOuse in healThcare uses six main areas to describe the journey of cancer patients. METHODS In this study, we developed an algorithm optimized for each disease category using natural language processing of unstructured data and data capture of structured data. We used data from patients diagnosed at Samsung Medical Center from 2008 to 2020. RESULTS We collected comprehensive clinical data for 67,617 patients across 6 tumor types: 28,954 with non-small-cell lung cancer; 2,540 with small-cell lung cancer; 30,035 with head and neck cancer; 4,950 with esophageal cancer; 966 with thymic cancer; and 172 with mesothelioma. Results of longitudinal molecular studies, such as EGFR mutation tests, ALK tests, and NGS, were also included. Scattered information was integrated and automatically built up to match the cohort, allowing users to instantly capture the most up-to-date test results and treatment outcomes.
CONCLUSIONS This is a landmark study documenting the successful construction of a real-time updating system for big medical data based on the CDW program.
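The abstract does not describe how molecular test results were pulled from free text, so the following is only a hedged illustration of the general idea of turning a narrative sentence into a structured field. The pattern, the `NORMALIZE` mapping and the function `extract_results` are all hypothetical; a real system would also need to handle negation (for example "no EGFR mutation detected"), abbreviations and misspellings.

```python
import re
from typing import Dict

# Hypothetical pattern: a gene name followed, within the same sentence
# and at most 60 characters later, by a result keyword.
PATTERN = re.compile(
    r"\b(?P<gene>EGFR|ALK)\b"
    r"[^.]{0,60}?"
    r"\b(?P<result>not\s+detected|positive|negative|detected)\b",
    re.IGNORECASE,
)

# Map report wording onto a two-value structured field.
NORMALIZE = {
    "positive": "positive",
    "detected": "positive",
    "negative": "negative",
    "not detected": "negative",
}


def extract_results(report: str) -> Dict[str, str]:
    """Return the first result found for each gene mentioned in the report."""
    found: Dict[str, str] = {}
    for match in PATTERN.finditer(report):
        gene = match.group("gene").upper()
        # Collapse internal whitespace so "not   detected" still normalizes.
        wording = " ".join(match.group("result").lower().split())
        found.setdefault(gene, NORMALIZE[wording])
    return found


# Invented example sentence, not taken from any real report.
report = "EGFR mutation not detected. ALK rearrangement: positive by FISH."
results = extract_results(report)
print(results)
```

Keeping the first result per gene is a simplification chosen for brevity; a longitudinal record would instead store every dated result.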


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Martijn G. Kersloot ◽  
Francis Lau ◽  
Ameen Abu-Hanna ◽  
Derk L. Arts ◽  
Ronald Cornet

Abstract Background Information in Electronic Health Records is largely stored as unstructured free text. Natural language processing (NLP), or Medical Language Processing (MLP) in medicine, aims at extracting structured information from free text, and is less expensive and time-consuming than manual extraction. However, most algorithms in MLP are institution-specific or address only one clinical need, and thus cannot be broadly applied. In addition, most MLP systems do not detect concepts in misspelled text and cannot detect attribute relationships between concepts. The objective of this study was to develop and evaluate an MLP application that includes generic algorithms for the detection of (misspelled) concepts and of attribute relationships between them. Methods An implementation of the MLP system cTAKES, called DIRECT, was developed with generic SNOMED CT concept filter, concept relationship detection, and attribute relationship detection algorithms and a custom dictionary. Four implementations of cTAKES were evaluated by comparing 98 manually annotated oncology charts with the output of DIRECT. The F1-score was determined for named-entity recognition and attribute relationship detection for the concepts ‘lung cancer’, ‘non-small cell lung cancer’, and ‘recurrence’. The performance of the four implementations was compared with a two-tailed permutation test. Results DIRECT detected lung cancer and non-small cell lung cancer concepts with F1-scores between 0.828 and 0.947 and between 0.862 and 0.933, respectively. The concept recurrence was detected with a significantly higher F1-score of 0.921, compared to the other implementations, and the relationship between recurrence and lung cancer with an F1-score of 0.857. The precisions for detecting the lung cancer, non-small cell lung cancer, and recurrence concepts were 1.000, 0.966, and 0.879, compared to precisions of 0.943, 0.967, and 0.000 in the original implementation, respectively.
Conclusion DIRECT can detect oncology concepts and attribute relationships with high precision and can detect recurrence with a significant increase in F1-score, compared to the original implementation of cTAKES, due to the use of a custom dictionary and a generic concept relationship detection algorithm. These concepts and relationships can be used to encode clinical narratives, and can thus substantially reduce manual chart abstraction efforts, saving time for clinicians and researchers.
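The abstract reports F1-scores and a two-tailed permutation test but does not say how the permutations were constructed. As a rough sketch under stated assumptions, the code below computes F1 from precision and recall and runs a paired sign-flip permutation test on per-chart scores. The per-chart pairing, the function names and the example scores are all hypothetical and do not come from the study.

```python
import random
from typing import Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def paired_permutation_test(
    a: Sequence[float],
    b: Sequence[float],
    n_permutations: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-tailed Monte Carlo p-value for the summed paired difference.

    Under the null hypothesis the two systems are exchangeable, so the
    sign of each paired difference can be flipped at random.
    """
    if len(a) != len(b):
        raise ValueError("score lists must have the same length")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    extreme = 0
    for _ in range(n_permutations):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical per-chart scores for two systems, invented for illustration.
scores_a = [0.90, 0.80, 0.95, 0.85, 0.90]
scores_b = [0.70, 0.75, 0.80, 0.80, 0.85]
p_value = paired_permutation_test(scores_a, scores_b)
print(f"F1 at precision 1.0, recall 0.5: {f1_score(1.0, 0.5):.3f}")
print(f"two-tailed p-value: {p_value:.4f}")
```

A permutation test makes no distributional assumption about the scores, which is one reason it is a common choice when comparing NLP systems on a small annotated set.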

