Extracting non-small cell lung cancer (NSCLC) diagnosis and diagnosis dates from electronic health record (EHR) text using a deep learning algorithm.

2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 1556-1556
Author(s):  
Alexander S. Rich ◽  
Barry Leybovich ◽  
Melissa Estevez ◽  
Jamie Irvine ◽  
Nisha Singh ◽  
...  

1556 Background: Identifying patients with a particular cancer and determining the date of that diagnosis from EHR data is important for selecting real-world research cohorts and conducting downstream analyses. However, cancer diagnoses and their dates are often not accurately recorded in the EHR in a structured form. We developed a unified deep learning model for identifying patients with NSCLC and their initial and advanced diagnosis date(s). Methods: The study used a cohort of 52,834 patients with lung cancer ICD codes from the nationwide deidentified Flatiron Health EHR-derived database. For all patients in the cohort, abstractors used an in-house technology-enabled platform to identify an NSCLC diagnosis, advanced disease, and relevant diagnosis date(s) via chart review. Advanced NSCLC was defined as stage IIIB or IV disease at diagnosis, or early-stage disease that recurred or progressed. The deep learning model was trained on 38,517 patients, with a separate 14,317-patient test cohort. The model input was a set of sentences containing keywords related to (a)NSCLC, extracted from a patient's EHR documents. Each sentence was associated with a date, using the document timestamp or, if present, a date mentioned explicitly in the sentence. The sentences were processed by a GRU network, followed by an attentional network that integrated across sentences, outputting a prediction of whether the patient had been diagnosed with (a)NSCLC and, if so, the diagnosis date(s). We measured the sensitivity and positive predictive value (PPV) of extracting the presence of initial and advanced diagnoses in the test cohort. Among patients with both model-extracted and abstracted diagnosis dates, we also measured 30-day accuracy, defined as the proportion of patients whose dates match to within 30 days. Real-world overall survival (rwOS) for patients abstracted vs. model-extracted as advanced was calculated using Kaplan-Meier methods (index date: abstracted vs. model-extracted advanced diagnosis date). Results: The Table shows the sensitivity, PPV, and accuracy of the model-extracted diagnoses and dates. rwOS was similar using model-extracted aNSCLC diagnosis dates (median = 13.7 months) versus abstracted diagnosis dates (median = 13.3 months), with a difference of 0.4 months (95% CI = [0.0, 0.8]). Conclusions: Initial and advanced diagnoses of NSCLC and their dates can be accurately extracted from unstructured clinical text using a deep learning algorithm. This can further enable the use of EHR data for research on real-world treatment patterns and outcomes, as well as other applications such as clinical trial matching. Future work should aim to understand the impact of model errors on downstream analyses. [Table: see text]
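
As an illustration of the architecture described above, here is a minimal PyTorch sketch of a bidirectional GRU sentence encoder followed by attention pooling across sentences. All layer sizes, the vocabulary size, and the use of attention weights to surface date-bearing sentences are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SentenceAttentionClassifier(nn.Module):
    """Sketch of a GRU sentence encoder + attention pooling across sentences,
    in the spirit of the abstract. Dimensions are assumptions."""
    def __init__(self, vocab_size=30000, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid_dim, 1)            # scores each sentence
        self.diagnosis_head = nn.Linear(2 * hid_dim, 1)  # P(NSCLC)

    def forward(self, token_ids):
        # token_ids: (n_sentences, max_tokens) for one patient
        emb = self.embed(token_ids)
        _, h = self.gru(emb)                         # h: (2, n_sentences, hid)
        sent_vecs = torch.cat([h[0], h[1]], dim=-1)  # (n_sentences, 2*hid)
        weights = torch.softmax(self.attn(sent_vecs), dim=0)
        patient_vec = (weights * sent_vecs).sum(dim=0)
        return torch.sigmoid(self.diagnosis_head(patient_vec)), weights

model = SentenceAttentionClassifier()
prob, attn_weights = model(torch.randint(1, 30000, (12, 40)))  # 12 sentences
# attn_weights can be paired with each sentence's associated date to
# surface a candidate diagnosis date, as the abstract's model outputs.
```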

2021 ◽  
Author(s):  
Jae-Seung Yun ◽  
Jaesik Kim ◽  
Sang-Hyuk Jung ◽  
Seon-Ah Cha ◽  
Seung-Hyun Ko ◽  
...  

Objective: We aimed to develop and evaluate a non-invasive deep learning algorithm for screening type 2 diabetes in UK Biobank participants using retinal images. Research Design and Methods: The deep learning model for prediction of type 2 diabetes was trained on retinal images from 50,077 UK Biobank participants and tested on 12,185 participants. We evaluated its performance in terms of predicting traditional risk factors (TRFs) and genetic risk for diabetes. Next, we compared the performance of three models in predicting type 2 diabetes using 1) an image-only deep learning algorithm, 2) TRFs, and 3) the combination of the algorithm and TRFs. Assessing net reclassification improvement (NRI) allowed quantification of the improvement afforded by adding the algorithm to the TRF model. Results: When predicting TRFs with the deep learning algorithm, the areas under the curve (AUCs) obtained with the validation set for age, sex, and HbA1c status were 0.931 (0.928-0.934), 0.933 (0.929-0.936), and 0.734 (0.715-0.752), respectively. When predicting type 2 diabetes, the AUC of the composite logistic model using non-invasive TRFs was 0.810 (0.790-0.830), and that for the deep learning model using only fundus images was 0.731 (0.707-0.756). Upon addition of TRFs to the deep learning algorithm, discriminative performance improved to 0.844 (0.826-0.861). The addition of the algorithm to the TRF model improved risk stratification with an overall NRI of 50.8%. Conclusions: Our results demonstrate that this deep learning algorithm can be a useful tool for stratifying individuals at high risk of type 2 diabetes in the general population.
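
The abstract does not specify how the NRI was computed; the sketch below implements the continuous (category-free) NRI, one common variant, and applies it to synthetic data in which a hypothetical image-derived score is added to a TRF-only logistic model. All variable names and the data itself are illustrative, not the study's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def continuous_nri(y, p_old, p_new):
    """Category-free net reclassification improvement between two risk models:
    among events and non-events, does the new model move predicted risk in
    the right direction?"""
    up, down = p_new > p_old, p_new < p_old
    events, nonevents = y == 1, y == 0
    nri_events = up[events].mean() - down[events].mean()
    nri_nonevents = down[nonevents].mean() - up[nonevents].mean()
    return nri_events + nri_nonevents

# Synthetic stand-ins for TRFs and a fundus-image model score (assumptions):
rng = np.random.default_rng(0)
n = 1000
trf = rng.normal(size=(n, 3))            # e.g. age, BMI, family history
dl_score = rng.normal(size=(n, 1))       # image-only model output
y = (0.8 * trf[:, 0] + 1.2 * dl_score[:, 0] + rng.normal(size=n) > 0).astype(int)

p_old = LogisticRegression().fit(trf, y).predict_proba(trf)[:, 1]
X_new = np.hstack([trf, dl_score])       # TRFs + deep learning score
p_new = LogisticRegression().fit(X_new, y).predict_proba(X_new)[:, 1]
print(f"continuous NRI: {continuous_nri(y, p_old, p_new):.3f}")
```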


10.2196/15931 ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. e15931 ◽  
Author(s):  
Chin-Sheng Lin ◽  
Chin Lin ◽  
Wen-Hui Fang ◽  
Chia-Jung Hsu ◽  
Sy-Jou Chen ◽  
...  

Background The detection of dyskalemias—hypokalemia and hyperkalemia—currently depends on laboratory tests. Since cardiac tissue is very sensitive to dyskalemia, electrocardiography (ECG) may be able to uncover clinically important dyskalemias before laboratory results. Objective Our study aimed to develop a deep-learning model, ECG12Net, to detect dyskalemias based on ECG presentations and to evaluate the logic and performance of this model. Methods Between May 2011 and December 2016, 66,321 ECG records with corresponding serum potassium (K+) concentrations were obtained from 40,180 patients admitted to the emergency department. ECG12Net is an 82-layer convolutional neural network that estimates serum K+ concentration. Six clinicians—three emergency physicians and three cardiologists—participated in a human-machine competition. Sensitivity, specificity, and balanced accuracy were used to compare the performance of ECG12Net with that of these physicians. Results In a human-machine competition including 300 ECGs of different serum K+ concentrations, the areas under the curve for detecting hypokalemia and hyperkalemia with ECG12Net were 0.926 and 0.958, respectively, significantly better than those of our best clinicians. Moreover, in detecting hypokalemia and hyperkalemia, the sensitivities were 96.7% and 83.3%, respectively, and the specificities were 93.3% and 97.8%, respectively. In a test set including 13,222 ECGs, ECG12Net had similar performance in terms of sensitivity for severe hypokalemia (95.6%) and severe hyperkalemia (84.5%), with a mean absolute error of 0.531. The specificities for detecting hypokalemia and hyperkalemia were 81.6% and 96.0%, respectively. Conclusions A deep-learning model based on a 12-lead ECG may help physicians promptly recognize severe dyskalemias and thereby potentially reduce cardiac events.
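
To make the regression setup concrete, here is a minimal 1D-CNN sketch for estimating serum K+ from a 12-lead ECG. ECG12Net itself is an 82-layer network; the layer count, kernel sizes, and 1024-sample input length below are assumptions for illustration, not the published architecture.

```python
import torch
import torch.nn as nn

class TinyECGNet(nn.Module):
    """Minimal 1D-CNN sketch for serum K+ regression from 12-lead ECG.
    All hyperparameters here are illustrative assumptions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(12, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.regressor = nn.Linear(128, 1)  # predicted K+ in mmol/L

    def forward(self, x):                   # x: (batch, 12 leads, samples)
        z = self.features(x).squeeze(-1)
        return self.regressor(z)

model = TinyECGNet()
k_hat = model(torch.randn(8, 12, 1024))     # batch of 8 toy ECGs
loss = nn.L1Loss()(k_hat.squeeze(-1), torch.full((8,), 4.2))
# L1 loss corresponds to the mean-absolute-error figure reported above.
```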


2021 ◽  
Vol 251 ◽  
pp. 04012
Author(s):  
Simon Akar ◽  
Gowtham Atluri ◽  
Thomas Boettcher ◽  
Michael Peters ◽  
Henry Schreiner ◽  
...  

The locations of proton-proton collision points in LHC experiments are called primary vertices (PVs). Preliminary results of a hybrid deep learning algorithm for identifying and locating these, targeting the Run 3 incarnation of LHCb, have been described at conferences in 2019 and 2020. In the past year we have made significant progress in a variety of related areas. Using two newer Kernel Density Estimators (KDEs) as input feature sets improves the fidelity of the models, as does using full LHCb simulation rather than the “toy Monte Carlo” originally (and still) used to develop models. We have also built a deep learning model to calculate the KDEs from track information. Connecting a tracks-to-KDE model to a KDE-to-hists model used to find PVs provides a proof-of-concept that a single deep learning model can use track information to find PVs with high efficiency and high fidelity. We have studied a variety of models systematically to understand how variations in their architectures affect performance. While the studies reported here are specific to the LHCb geometry and operating conditions, the results suggest that the same approach could be used by the ATLAS and CMS experiments.
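
To illustrate the KDE-as-input idea, the toy sketch below builds a one-dimensional Gaussian KDE along the beamline from simulated track parameters; a KDE-to-hists network would consume such a histogram and predict PV locations. The kernel, bin width, and track model are simplifying assumptions, not the LHCb feature definitions.

```python
import numpy as np

def beamline_kde(z_poca, z_sigma, z_grid):
    """Toy KDE along the beamline: each track contributes a Gaussian centred
    on its point of closest approach (POCA) to the beam, widened by its
    measured uncertainty. The real LHCb KDEs are richer feature sets."""
    z = z_grid[None, :]                   # (1, n_bins)
    mu = z_poca[:, None]                  # (n_tracks, 1)
    sig = z_sigma[:, None]
    kernels = np.exp(-0.5 * ((z - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
    return kernels.sum(axis=0)            # summed track density per z bin

rng = np.random.default_rng(1)
true_pvs = np.array([-42.0, 13.5, 88.0])  # mm along the beamline (toy values)
z_poca = np.concatenate([pv + rng.normal(0, 0.3, 25) for pv in true_pvs])
z_sigma = np.abs(rng.normal(0.3, 0.1, z_poca.size)) + 0.05
z_grid = np.linspace(-100, 150, 2500)     # 0.1 mm bins
kde = beamline_kde(z_poca, z_sigma, z_grid)
print(z_grid[kde.argmax()])               # highest peak sits near one true PV
# A KDE-to-hists model would take `kde` as input and output PV candidates.
```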


2021 ◽  
pp. svn-2020-000647
Author(s):  
Jia-wei Zhong ◽  
Yu-jia Jin ◽  
Zai-jun Song ◽  
Bo Lin ◽  
Xiao-hui Lu ◽  
...  

Background and purpose: Early haematoma expansion is determinative in predicting the outcome of intracerebral haemorrhage (ICH) patients. The aims of this study were to develop a novel prediction model for haematoma expansion by applying a deep learning model and to validate its prediction accuracy. Methods: Data for this study were obtained from a prospectively enrolled cohort of patients with primary supratentorial ICH from our centre. We developed a deep learning model to predict haematoma expansion and compared its performance with conventional non-contrast CT (NCCT) markers. To evaluate the predictability of this model, it was also compared with a logistic regression model based on haematoma volume or the BAT score. Results: A total of 266 patients were finally included for analysis, and 74 (27.8%) of them experienced early haematoma expansion. The deep learning model exhibited the highest C statistic, 0.80, compared with 0.64, 0.65, 0.51, 0.58 and 0.55 for hypodensities, black hole sign, blend sign, fluid level and irregular shape, respectively. The C statistics for swirl sign (0.70; p=0.211) and heterogeneous density (0.70; p=0.141) did not differ significantly from that of the deep learning model. Moreover, the predictive value of the deep learning model was significantly superior to that of the logistic model based on haematoma volume (0.62; p=0.042) and the BAT score (0.65; p=0.042). Conclusions: Compared with the conventional NCCT markers and the BAT predictive model, the deep learning algorithm showed superiority in predicting early haematoma expansion in ICH patients.
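
For readers unfamiliar with the metric, the C statistic for a binary outcome is the ROC AUC, so the deep learning probability and each NCCT marker can be compared on a common scale. The sketch below uses synthetic stand-in data; only the metric computation itself reflects the comparison described above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic stand-ins for the study variables (assumptions, not real data):
rng = np.random.default_rng(2)
n = 266
expansion = (rng.random(n) < 0.278).astype(int)   # ~27.8% expanders
dl_prob = np.clip(0.5 * expansion + rng.normal(0.3, 0.2, n), 0, 1)
black_hole_sign = (rng.random(n) < 0.2 + 0.15 * expansion).astype(int)

# The C statistic of each predictor is its ROC AUC against the outcome:
print("C statistic, deep learning:", round(roc_auc_score(expansion, dl_prob), 2))
print("C statistic, black hole sign:", round(roc_auc_score(expansion, black_hole_sign), 2))
# Significance of the difference between two correlated AUCs is usually
# assessed with the DeLong test (e.g. via the R package pROC).
```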


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 8536-8536
Author(s):  
Gouji Toyokawa ◽  
Fahdi Kanavati ◽  
Seiya Momosaki ◽  
Kengo Tateishi ◽  
Hiroaki Takeoka ◽  
...  

8536 Background: Lung cancer is the leading cause of cancer-related death in many countries, and its prognosis remains unsatisfactory. Since treatment approaches differ substantially by subtype, such as adenocarcinoma (ADC), squamous cell carcinoma (SCC) and small cell lung cancer (SCLC), an accurate histopathological diagnosis is of great importance. However, if the specimen is solely composed of poorly differentiated cancer cells, distinguishing between histological subtypes can be difficult. The present study developed a deep learning model to classify lung cancer subtypes from whole slide images (WSIs) of transbronchial lung biopsy (TBLB) specimens, in particular with the aim of using this model to evaluate a challenging test set of indeterminate cases. Methods: Our deep learning model consisted of two separately trained components: a convolutional neural network tile classifier and a recurrent neural network tile aggregator for the WSI diagnosis. We used a training set consisting of 638 WSIs of TBLB specimens to train a deep learning model to classify lung cancer subtypes (ADC, SCC and SCLC) and non-neoplastic lesions. The training set consisted of 593 WSIs for which the diagnosis had been determined by pathologists based on the visual inspection of Hematoxylin-Eosin (HE) slides and of 45 WSIs of indeterminate cases. We then evaluated the models using five independent test sets. For each test set, we computed the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Results: We applied the model to an indeterminate test set of WSIs (64 ADCs and 19 SCCs) obtained from TBLB specimens that pathologists had not been able to conclusively diagnose by examining the HE-stained specimens alone. Overall, the model achieved ROC AUCs of 0.993 (confidence interval [CI] 0.971-1.0) and 0.996 (0.981-1.0) for ADC and SCC, respectively. We further evaluated the model using five independent test sets consisting of both TBLB and surgically resected lung specimens (combined total of 2490 WSIs) and obtained highly promising results, with ROC AUCs ranging from 0.94 to 0.99. Conclusions: In this study, we demonstrated that a deep learning model could be trained to predict lung cancer subtypes in indeterminate TBLB specimens. These promising results show that, if deployed in clinical practice, a deep learning model capable of aiding pathologists in diagnosing indeterminate cases would be highly beneficial, allowing a diagnosis to be obtained sooner and reducing the costs of further investigations.
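
The two-stage design named above (a CNN tile classifier feeding an RNN tile aggregator) can be sketched in PyTorch as follows. The backbone, feature sizes, and four-class head are assumptions; note also that the paper trains the two components separately, whereas this sketch simply composes them for illustration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class WSISubtypeClassifier(nn.Module):
    """Sketch of a CNN tile encoder + RNN aggregator for slide-level
    diagnosis (ADC / SCC / SCLC / non-neoplastic). Hyperparameters are
    illustrative assumptions, not the paper's configuration."""
    def __init__(self, n_classes=4, feat_dim=512, hid_dim=256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()           # expose 512-d tile features
        self.tile_encoder = backbone
        self.aggregator = nn.GRU(feat_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, n_classes)

    def forward(self, tiles):                 # tiles: (n_tiles, 3, 224, 224)
        feats = self.tile_encoder(tiles)      # (n_tiles, 512)
        _, h = self.aggregator(feats.unsqueeze(0))
        return self.head(h[-1])               # slide-level logits

model = WSISubtypeClassifier()
logits = model(torch.randn(16, 3, 224, 224))  # 16 tiles from one slide
print(logits.shape)                           # torch.Size([1, 4])
```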


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Qian Huang ◽  
Xue Wen Li

Big data is a massive and diverse form of unstructured data that requires proper analysis and management. It represents another great technological revolution after the Internet, the Internet of Things, and cloud computing. This paper first reviews the related concepts and basic theories as the foundation of the research. Second, it analyzes in depth the problems and challenges faced by Chinese government management under the impact of big data. It then explores the opportunities that big data brings to government management in terms of management efficiency, administrative capacity, and public services, and argues that governments should seize these opportunities to make changes. Brain-like computing attempts to simulate the structure and information-processing mechanisms of biological neural networks. The paper also analyzes the development status of e-government at home and abroad, studies service-oriented architecture (SOA) and web services technology, examines e-government and SOA theory in depth, and discusses these in light of the development status of e-government in a particular region. Finally, a deep learning algorithm is used to construct a monitoring platform that monitors government behavior in real time, and deep mining is conducted to analyze the government's behavioral intentions.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Xiaoting Yin ◽  
Xiaosha Tao

Online business has grown exponentially during the last decade, and industries are focusing on online business more than before. However, just setting up an online store and starting to sell might not work. Different machine learning and data mining techniques are needed to learn users' preferences and determine what would be best for the business. Based on the decision-making needs of online product sales, combined with the factors influencing online product sales across industries and the advantages of deep learning algorithms, this paper constructs a sales prediction model suitable for online products and focuses on evaluating the adaptability of the model to different types of online products. In the research process, a fully connected model is compared with the CNN's training results, demonstrating the accuracy and generalization ability of the CNN model. By selecting a non-deep-learning model as the comparison baseline, the performance advantages of the CNN model across different product categories are demonstrated. In addition, the experiments conclude that an unsupervised pretrained CNN model is more effective and adaptable for sales forecasting.
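
A minimal sketch of the comparison described above: a small 1D CNN versus a fully connected network on windows of past sales. The window length, layer sizes, and synthetic data are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

WINDOW = 28  # four weeks of daily sales (assumed window length)

# CNN forecaster: convolutions over the sales window
cnn = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 1),
)
# Fully connected baseline on the same window
dense = nn.Sequential(
    nn.Flatten(), nn.Linear(WINDOW, 64), nn.ReLU(), nn.Linear(64, 1),
)

x = torch.randn(32, 1, WINDOW)   # batch of synthetic sales windows
y = torch.randn(32, 1)           # next-period sales target
for name, net in [("cnn", cnn), ("dense", dense)]:
    loss = nn.MSELoss()(net(x), y)
    print(name, float(loss))     # compare fit of the two model families
```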


Cancers ◽  
2021 ◽  
Vol 13 (18) ◽  
pp. 4585
Author(s):  
Wouter R. P. H. van de Worp ◽  
Brent van der Heyden ◽  
Georgios Lappas ◽  
Ardy van Helvoort ◽  
Jan Theys ◽  
...  

Lung cancer is the leading cause of cancer-related deaths worldwide. The development of orthotopic mouse models of lung cancer, which recapitulate the disease more realistically than the widely used subcutaneous tumor models, is expected to critically aid the development of novel therapies to battle lung cancer or related comorbidities such as cachexia. However, follow-up of tumor take, tumor growth and detection of therapeutic effects is difficult, time consuming and requires a vast number of animals in orthotopic models. Here, we describe a solution for the fully automatic segmentation and quantification of orthotopic lung tumor volume and mass in whole-body mouse computed tomography (CT) scans. The goal is to drastically enhance the efficiency of the research process by replacing time-consuming manual procedures with fast, automated ones. A deep learning algorithm was trained on 60 unique manually delineated lung tumors and evaluated by four-fold cross-validation. Quantitative performance metrics demonstrated high accuracy and robustness of the deep learning algorithm for automated tumor volume analyses (mean Dice similarity coefficient of 0.80) and superior processing time (69 times faster) compared to manual segmentation. Moreover, manual delineations of the tumor volume by three independent annotators were sensitive to bias in human interpretation, while the algorithm was less vulnerable to bias. In addition, we showed that, besides longitudinal quantification of tumor development, the deep learning algorithm can also be used in parallel with the previously published method for muscle mass quantification and to optimize experimental design, reducing the number of animals needed in preclinical studies. In conclusion, we implemented a method for fast and highly accurate tumor quantification with minimal operator involvement in data analysis. This deep learning algorithm provides a helpful tool for the noninvasive detection and analysis of tumor take, tumor growth and therapeutic effects in mouse orthotopic lung cancer models.
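
The reported overlap metric can be made concrete with a short sketch: the Dice similarity coefficient between a predicted and a manually delineated binary mask, here computed on toy 3D volumes standing in for CT segmentations.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask):
    """Dice similarity coefficient between two binary masks, the overlap
    metric reported above (mean DSC of 0.80)."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    denom = pred.sum() + true.sum()
    return 2.0 * intersection / denom if denom else 1.0

# Toy 3D volumes standing in for an automated tumour segmentation and its
# manual delineation (assumptions, not study data):
rng = np.random.default_rng(3)
truth = rng.random((64, 64, 64)) > 0.97
pred = np.logical_or(truth, rng.random((64, 64, 64)) > 0.995)
print(f"DSC: {dice_coefficient(pred, truth):.3f}")
```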


2021 ◽  
Vol 32 ◽  
pp. S926-S927
Author(s):  
G. Toyokawa ◽  
Y. Yamada ◽  
N. Haratake ◽  
Y. Shiraishi ◽  
T. Takenaka ◽  
...  
