P2.09-29 Automatic Lung Cancer Staging from Medical Reports Using Natural Language Processing

2018 ◽  
Vol 13 (10) ◽  
pp. S772
Author(s):  
X. Sui ◽  
T. Liu ◽  
Q. Huang ◽  
Y. Hou ◽  
Y. Wang ◽  
...  
2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Tommaso Lo Barco ◽  
Mathieu Kuchenbuch ◽  
Nicolas Garcelon ◽  
Antoine Neuraz ◽  
Rima Nabbout

Abstract Background The growing use of Electronic Health Records (EHRs) is promoting the application of data mining in health care. A promising use of big data in this field is to develop models to support early diagnosis and to establish natural history. Dravet Syndrome (DS) is a rare developmental and epileptic encephalopathy that commonly begins in the first year of life with febrile seizures (FS). Diagnosis is often delayed until after 2 years of age, as it is difficult to differentiate DS at onset from FS. We aimed to explore whether some clinical terms (concepts) are used significantly more often in the electronic narrative medical reports of individuals with DS before the age of 2 years than in those of individuals with FS. These concepts would allow earlier detection of patients with DS, resulting in earlier referral to expert centers that can provide early diagnosis and care. Methods Data were collected from the Necker Enfants Malades Hospital using a document-based data warehouse, Dr Warehouse, which employs Natural Language Processing, a computer technology for processing written information. Using the Unified Medical Language System Metathesaurus, phenotype concepts can be recognized in medical reports. We selected individuals with DS (DS Cohort) and individuals with FS (FS Cohort) with confirmed diagnosis after the age of 4 years. A phenome-wide analysis was performed, evaluating the statistical associations between the phenotypes of DS and FS based on concepts found in the reports produced before 2 years of age and using a series of logistic regressions. Results We found a significantly higher representation of concepts related to seizure phenotypes distinguishing DS from FS in the first phases, namely the major recurrence of complex febrile convulsions (long-lasting and/or with focal signs) and other seizure types. Some typical early-onset non-seizure concepts also emerged, relating to neurodevelopment and gait disorders.
Conclusions Narrative medical reports of individuals younger than 2 years with FS contain specific concepts linked to DS diagnosis, which can be automatically detected by software exploiting NLP. This approach could represent an innovative and sustainable methodology to reduce time to diagnosis of DS and could be transposed to other rare diseases.
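
The phenome-wide comparison described in the Methods can be illustrated with a simplified stand-in. The abstract reports a series of logistic regressions but gives no model details; the sketch below instead computes, for each concept, the log odds ratio of a 2x2 table (cohort by concept presence), which equals the coefficient of a logistic regression with that concept as the only binary predictor. The function name, the 0.5 continuity correction, and the placeholder concept labels are assumptions for illustration, not part of the study.

```python
import math
from typing import List, Set, Tuple


def concept_log_odds(
    ds_patients: List[Set[str]], fs_patients: List[Set[str]]
) -> List[Tuple[str, float]]:
    """Rank concepts by log odds ratio of appearing in DS vs FS reports.

    Each list element is the set of concepts found in one patient's
    early reports (hypothetical input format).
    """
    concepts = set().union(*ds_patients, *fs_patients)
    results = []
    for concept in concepts:
        a = sum(concept in p for p in ds_patients)  # DS, concept present
        b = len(ds_patients) - a                    # DS, concept absent
        c = sum(concept in p for p in fs_patients)  # FS, concept present
        d = len(fs_patients) - c                    # FS, concept absent
        # Add 0.5 to every cell (Haldane-Anscombe correction) so that
        # empty cells do not cause a division by zero.
        log_or = math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))
        results.append((concept, log_or))
    # Most DS-associated concepts first.
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy input with placeholder concept labels (not study data).
ds = [{"concept_A", "concept_B"}, {"concept_A"}]
fs = [{"concept_C"}, {"concept_C", "concept_A"}]
ranking = concept_log_odds(ds, fs)
```

A positive value marks a concept that is over-represented in the DS cohort. A real analysis would also need significance testing with correction for the number of concepts examined.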


2018 ◽  
pp. 1-7 ◽  
Author(s):  
Roxanne Wadia ◽  
Kathleen Akgun ◽  
Cynthia Brandt ◽  
Brenda T. Fenton ◽  
Woody Levin ◽  
...  

Purpose To compare the accuracy and reliability of a natural language processing (NLP) algorithm with manual coding by radiologists, and the combination of the two methods, for the identification of patients whose computed tomography (CT) reports raised the concern for lung cancer. Methods An NLP algorithm was developed using Clinical Text Analysis and Knowledge Extraction System (cTAKES) with the Yale cTAKES Extensions and trained to differentiate between language indicating benign lesions and lesions concerning for lung cancer. A random sample of 450 chest CT reports performed at Veterans Affairs Connecticut Healthcare System between January 2014 and July 2015 was selected. A reference standard was created by the manual review of reports to determine if the text stated that follow-up was needed for concern for cancer. The NLP algorithm was applied to all reports and compared with case identification using the manual coding by the radiologists. Results A total of 450 reports representing 428 patients were analyzed. NLP had higher sensitivity and lower specificity than manual coding (77.3% v 51.5% and 72.5% v 82.5%, respectively). NLP and manual coding had similar positive predictive values (88.4% v 88.9%), and NLP had a higher negative predictive value than manual coding (54% v 38.5%). When NLP and manual coding were combined, sensitivity increased to 92.3%, with a decrease in specificity to 62.85%. Combined NLP and manual coding had a positive predictive value of 87.0% and a negative predictive value of 75.2%. Conclusion Our NLP algorithm was more sensitive than manual coding of CT chest reports for the identification of patients who required follow-up for suspicion of lung cancer. The combination of NLP and manual coding is a sensitive way to identify patients who need further workup for lung cancer.
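
The four accuracy measures reported above, and the effect of combining the two methods, can be sketched as follows. This is a minimal illustration that assumes the combined method flags a report when either NLP or manual coding flags it (the abstract does not state the combination rule); the function names and the toy labels are hypothetical and are not study data.

```python
from typing import Dict, List


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(truth: List[bool], flagged: List[bool]) -> Dict[str, float]:
    """Compare a method's flags against the reference standard."""
    tp = sum(t and f for t, f in zip(truth, flagged))
    fp = sum((not t) and f for t, f in zip(truth, flagged))
    fn = sum(t and (not f) for t, f in zip(truth, flagged))
    tn = sum((not t) and (not f) for t, f in zip(truth, flagged))
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def combine_either(a: List[bool], b: List[bool]) -> List[bool]:
    """Flag a report if either method flags it (assumed combination rule)."""
    return [x or y for x, y in zip(a, b)]


# Toy labels for six reports (illustrative only).
truth = [True, True, True, True, False, False]
nlp = [True, True, False, False, True, False]
manual = [False, True, True, False, False, False]

nlp_metrics = diagnostic_metrics(truth, nlp)
manual_metrics = diagnostic_metrics(truth, manual)
combined_metrics = diagnostic_metrics(truth, combine_either(nlp, manual))
```

Under an either-method rule, sensitivity can only rise or stay level and specificity can only fall or stay level, which matches the direction of the combined results reported in the abstract.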


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
J. Martijn Nobel ◽  
Sander Puts ◽  
Jakob Weiss ◽  
Hugo J. W. L. Aerts ◽  
Raymond H. Mak ◽  
...  

Abstract Background In the era of datafication, it is important that medical data are accurate and structured for multiple applications. Data for oncological staging in particular need to be accurate, both to stage and treat individual patients and to support population-level surveillance and outcome assessment. To support data extraction from free-text radiological reports, a Dutch natural language processing (NLP) algorithm was built to quantify T-stage of pulmonary tumors according to the tumor node metastasis (TNM) classification. This structuring tool was translated and validated on English radiological free-text reports. A rule-based algorithm to classify T-stage was trained and validated on, respectively, 200 and 225 English free-text radiological reports from diagnostic computed tomography (CT) obtained for staging of patients with lung cancer. The automated T-stage extracted by the algorithm from the report was compared to manual staging. A graphical user interface was built for training purposes to visualize the results of the algorithm by highlighting the extracted concepts and their modifying context. Results Accuracy of the T-stage classifier was 0.89 in the validation set, 0.84 when considering the T-substages, and 0.76 when only considering tumor size. Results were comparable with the Dutch results (respectively, 0.88, 0.89 and 0.79). Most errors were due to ambiguities that the rule-based algorithm could not resolve. Conclusions NLP can be successfully applied for staging lung cancer from free-text radiological reports in different languages. Machine learning should be introduced in a focused way, as part of a hybrid approach, to improve performance.
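
The size-only part of a rule-based T-stage classifier can be sketched as below. This is not the authors' algorithm: it assumes the size cut-offs of the 8th edition TNM classification (the abstract does not name the edition), uses a hypothetical regular expression to find measurements, and ignores invasion, negation, and other context that the published tool handles.

```python
import re
from typing import Optional

# Upper size bounds in cm (inclusive) and the matching T category,
# assumed from the 8th edition TNM classification for lung cancer.
_SIZE_THRESHOLDS = [
    (1.0, "T1a"),
    (2.0, "T1b"),
    (3.0, "T1c"),
    (4.0, "T2a"),
    (5.0, "T2b"),
    (7.0, "T3"),
]

# A number followed by a unit, e.g. "34 mm" or "3.4 cm" (hypothetical pattern).
_SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)\b", re.IGNORECASE)


def largest_size_cm(report: str) -> Optional[float]:
    """Return the largest measurement in the report, converted to cm."""
    sizes = []
    for value, unit in _SIZE_PATTERN.findall(report):
        size = float(value)
        if unit.lower() == "mm":
            size /= 10.0
        sizes.append(size)
    return max(sizes) if sizes else None


def t_stage_from_size(size_cm: float) -> str:
    """Map a tumor size in cm to a T category using size criteria only."""
    for upper_bound, stage in _SIZE_THRESHOLDS:
        if size_cm <= upper_bound:
            return stage
    return "T4"  # larger than 7 cm


def stage_report(report: str) -> Optional[str]:
    """Size-only T-stage for a free-text report, or None if no size is found."""
    size = largest_size_cm(report)
    return None if size is None else t_stage_from_size(size)


example = stage_report("Spiculated mass in the right upper lobe measuring 34 mm.")
```

Taking the largest measurement is itself a simplification, since a report may also give sizes of lymph nodes or unrelated findings. Ambiguities of this kind are the sort of error the abstract attributes to the rule-based approach.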


2021 ◽  
pp. 096914132110130
Author(s):  
Kim L Sandler ◽  
Diane N Haddad ◽  
Alexis B Paulson ◽  
Travis J Osterman ◽  
Carolyn C Scott ◽  
...  

Objective Lung cancer is the leading cancer killer in women, resulting in more deaths than breast, cervical and ovarian cancer combined. Screening for lung cancer has been shown to significantly reduce mortality, with some evidence that women may have a greater benefit. This study demonstrates that a population of women being screened for breast cancer may greatly benefit from screening for lung cancer. Methods Data from 18,040 women who were screened for breast cancer in 2015 at two imaging facilities that also performed lung screening were reviewed. A natural language processing algorithm followed by a manual chart review identified women eligible for lung cancer screening by U.S. Preventive Services Task Force (USPSTF) criteria. A chart review of these eligible women was performed to determine subsequent enrollment in a lung screening program (2016–2019), current screening eligibility, cancer diagnoses and cancer-related outcomes. Results Natural language processing identified 685 women undergoing screening mammography who were also potentially eligible for lung screening based on age and smoking history. Manual chart review confirmed 251 were eligible under USPSTF criteria. By June 2019, 63 (25%) had enrolled in lung screening, of whom three were diagnosed with screening-detected lung cancer, resulting in zero deaths. Of 188 not screened, seven were diagnosed with lung cancer, resulting in five deaths by study end. Four women received a diagnosis of breast cancer, with no deaths. Conclusion Women screened for breast cancer are dying from lung cancer. We must capitalize on reducing barriers to improve screening for lung cancer among high-risk women.
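
Once age and smoking history have been extracted, an eligibility check of the kind described can be sketched as a simple rule. The thresholds below are an assumption: they follow the 2013 USPSTF recommendation (age 55 to 80, at least 30 pack-years, currently smoking or quit within the past 15 years), which was in force for a 2015 cohort, but the abstract does not list the exact criteria applied. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokingHistory:
    age: int
    pack_years: float
    currently_smokes: bool
    years_since_quit: Optional[float] = None  # None if unknown or never quit


def eligible_for_lung_screening(
    patient: SmokingHistory,
    min_age: int = 55,
    max_age: int = 80,
    min_pack_years: float = 30.0,
    max_years_since_quit: float = 15.0,
) -> bool:
    """Apply age and smoking-history thresholds (defaults assumed from USPSTF 2013)."""
    if not (min_age <= patient.age <= max_age):
        return False
    if patient.pack_years < min_pack_years:
        return False
    if patient.currently_smokes:
        return True
    # Former smokers qualify only if the quit date is known and recent enough.
    return (
        patient.years_since_quit is not None
        and patient.years_since_quit <= max_years_since_quit
    )


# Hypothetical patients for illustration.
current_smoker = SmokingHistory(age=62, pack_years=40, currently_smokes=True)
long_quit = SmokingHistory(age=62, pack_years=40, currently_smokes=False, years_since_quit=20)
```

Keeping the thresholds as parameters makes it straightforward to apply a different set of criteria, such as a later revision of the recommendation.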


2021 ◽  
Author(s):  
Tommaso Lo Barco ◽  
Mathieu Kuchenbuch ◽  
Nicolas Garcelon ◽  
Antoine Neuraz ◽  
Rima Nabbout

Abstract Background The growing use of Electronic Health Records (EHRs) is promoting the application of data mining in health care. A promising use of big data in this field is to develop models to support early diagnosis and to establish natural history. Dravet Syndrome (DS) is a rare Developmental and Epileptic Encephalopathy that commonly begins in the first year of life with febrile seizures (FS). Diagnosis is often delayed until after two years of age, as it is difficult to differentiate DS at onset from FS. We aimed to explore whether some clinical terms (concepts) are used significantly more often in the electronic narrative medical reports of individuals with DS before the age of two years than in those of individuals with FS. These concepts would allow earlier detection of patients with DS, resulting in earlier referral to expert centers that can provide early diagnosis and care. Methods Data were collected from the Necker Enfants Malades Hospital using a document-based data warehouse, Dr Warehouse, which employs Natural Language Processing, a computer technology for processing written information. Using the Unified Medical Language System Metathesaurus, phenotype concepts can be recognized in medical reports. We selected individuals with DS (DS Cohort) and individuals with FS (FS Cohort) with confirmed diagnosis after the age of four years. A phenome-wide analysis was performed, evaluating the statistical associations between the phenotypes of DS and FS based on concepts found in the reports produced before two years of age and using a series of logistic regressions. Results We found a significantly higher representation of concepts related to seizure phenotype distinguishing DS from FS in the first phases, namely the major recurrence of complex febrile convulsions (long-lasting and/or with focal signs) and other seizure types. Some typical early-onset non-seizure concepts also emerged, relating to neurodevelopment and gait disorders.
Conclusions Narrative medical reports of individuals younger than two years with FS contain specific concepts linked to DS diagnosis, which can be automatically detected by software exploiting NLP. This approach could represent an innovative and sustainable methodology to reduce time to diagnosis of DS and could be transposed to other rare diseases.


Author(s):  
Liwei Wang ◽  
Lei Luo ◽  
Yanshan Wang ◽  
Jason Wampfler ◽  
Ping Yang ◽  
...  

Abstract Background Lung cancer is the second most common cancer for men and women; the wide adoption of electronic health records (EHRs) offers the potential to accelerate cohort-related epidemiological studies using informatics approaches. Since manual extraction from large volumes of text is time consuming and labor intensive, some efforts have emerged to automatically extract information from text for lung cancer patients using natural language processing (NLP), an artificial intelligence technique. Methods In this study, using an existing cohort of 2311 lung cancer patients with information about stage, histology, tumor grade, and therapies (chemotherapy, radiotherapy and surgery) manually ascertained, we developed and evaluated an NLP system to extract information on these variables automatically for the same patients from clinical narratives including clinical notes, pathology reports and surgery reports. Results Evaluation showed promising results: recall for stage, histology, tumor grade, and therapies was 89, 98, 78, and 100%, respectively, and precision was 70, 88, 90, and 100%, respectively. Conclusion This study demonstrated the feasibility and accuracy of automatically extracting pre-defined information from clinical narratives for lung cancer research.
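
Precision and recall for one extracted variable can be computed against the manually ascertained values roughly as follows. The abstract does not say how matches were counted, so this sketch assumes exact agreement per patient: precision is the share of extracted values that match the manual value, and recall is the share of manually recorded values that the system reproduces. The function name, patient identifiers, and stage labels are hypothetical.

```python
from typing import Dict, Optional, Tuple


def precision_recall(
    gold: Dict[str, Optional[str]], predicted: Dict[str, Optional[str]]
) -> Tuple[float, float]:
    """Per-variable precision and recall with exact-match scoring (assumed)."""
    # Only patients for whom the system produced a value count as extractions.
    extracted = {pid: value for pid, value in predicted.items() if value is not None}
    correct = sum(1 for pid, value in extracted.items() if gold.get(pid) == value)
    n_gold = sum(1 for value in gold.values() if value is not None)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall


# Toy stage labels for four patients (illustrative only, not study data).
gold_stage = {"p1": "IIIA", "p2": "IB", "p3": "IV", "p4": "IIA"}
nlp_stage = {"p1": "IIIA", "p2": "IIB", "p3": "IV", "p4": None}

precision, recall = precision_recall(gold_stage, nlp_stage)
```

In the toy data one extracted stage is wrong and one is missing, so precision is lowered by the wrong value and recall by both the wrong and the missing value.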

