Machine Learning to Improve Treatment Selection for NSCLC Patients Treated with Immunotherapy using Real World and Translational Data

Author(s):  
Arsela Prelaj ◽  
Mattia Boeri ◽  
Alessandro Robuschi ◽  
Roberto Ferrara ◽  
Claudia Proto ◽  
...  

Abstract Introduction: In advanced Non-Small Cell Lung Cancer (NSCLC), Programmed Death Ligand 1 (PD-L1) remains the only biomarker used to select patients for immunotherapy (IO), and it has many limitations. Given the complex dynamics of the immune system, it is improbable that a single biomarker can predict response with high accuracy. A promising way to cope with this complexity is provided by Artificial Intelligence (AI) and Machine Learning (ML), techniques able to analyse and interpret large multifactorial data sets. The present study aims to use AI tools to improve response and efficacy prediction in NSCLC patients treated with IO. Methods: Real-world data (clinical data, PD-L1, histology, molecular, lab tests) and the blood microRNA signature classifier (MSC), which includes 24 different microRNAs, were used. Patients were divided into responders (R), who obtained a complete or partial response or stable disease as best response, and non-responders (NR), who experienced progressive or hyperprogressive disease or died before the first radiologic evaluation. Moreover, we used the same data to determine whether the overall survival of the patients was likely to be shorter or longer than 24 months from baseline IO. A literature review and a forward feature selection technique were used to extract a specific subset of the patients' data. To develop the final predictive model, different ML methods were tested, i.e., Feedforward Neural Network (FFNN), Logistic Regression (LR), K-nearest neighbours (K-NN), Support Vector Machines (SVM), and Random Forest (RF). Results: 200 patients were included. Of these, 164 (i.e., only those patients with PD-L1 data available) were considered in the model; 73 (44.5%) were R and 91 (55.5%) NR. Overall, the best model was the LR, and it included 5 features: 2 clinical features (ECOG performance status and IO-line of therapy), 1 tissue feature (PD-L1 tumour expression), and 2 blood features (the MSC test and the neutrophil-to-lymphocyte ratio, NLR). The model predicting R/NR of the patient achieved accuracy ACC=0.756, F1 score F1=0.722, and Area Under the ROC Curve AUC=0.82. PD-L1 alone had an ACC=0.655. The accuracy of the ML models when some of the features were excluded was as follows: without PD-L1 value (ACC=0.726), without MSC (ACC=0.750), and without both PD-L1 and MSC (ACC=0.707), i.e., considering only clinical features. At data cut-off (Nov 2020), median Overall Survival (mOS) for R was 38.5 months (m) (95% CI 23.9 - 53.1) vs 3.8 m (95% CI 2.8 - 4.7) for NR, with p<0.001. LR was also the best-performing model in predicting patients with long survival (24-month OS), achieving ACC=0.839, F1=0.908, and AUC=0.87. Conclusions: The results suggest that the integration of multifactorial data provided by ML techniques is a useful tool to improve personalized selection of NSCLC patients who are candidates for IO. Compared to PD-L1 alone, the expected improvement was around 10%. In particular, the model shows that the higher the ECOG, NLR value, IO-line, and MSC test level, the lower the response, and the higher the PD-L1, the higher the response. Considering the difference in survival between the R and NR groups, these results suggest that the model can also be used to indirectly predict survival. Moreover, a second model was able to predict long-survival patients with good accuracy.
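The forward feature selection mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything below is an assumption: the function `forward_select`, the neutral feature names, and the `toy_score` stand-in (which in practice would be something like a cross-validated accuracy estimate) are hypothetical and carry no values from the study.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_select(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: int,
) -> List[str]:
    """Greedy forward selection: repeatedly add the single feature
    that improves the score the most, and stop when nothing helps."""
    selected: List[str] = []
    best_score = score(frozenset())
    while len(selected) < max_features:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        # Score every one-feature extension of the current subset.
        trials = [(score(frozenset(selected + [f])), f) for f in remaining]
        trial_score, trial_feature = max(trials, key=lambda t: t[0])
        if trial_score <= best_score:
            break  # no candidate improves the score; stop early
        selected.append(trial_feature)
        best_score = trial_score
    return selected


# Hypothetical per-feature gains, for illustration only (NOT study results).
TOY_GAIN = {"a": 0.05, "b": 0.03, "c": 0.04, "d": 0.02, "e": 0.01, "noise": 0.0}


def toy_score(subset: FrozenSet[str]) -> float:
    # Stand-in for a model-quality estimate such as cross-validated accuracy.
    return 0.5 + sum(TOY_GAIN[f] for f in subset)


chosen = forward_select(list(TOY_GAIN), toy_score, max_features=5)
print(chosen)
```

With a real scoring function the greedy search can stop before `max_features` is reached, because a feature is only added when it strictly improves the score.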

2021 ◽  
Author(s):  
Arsela Prelaj ◽  
Mattia Boeri ◽  
Alessandro Robuschi ◽  
Roberto Ferrara ◽  
Claudia Proto ◽  
...  

Abstract Background: In advanced Non-Small Cell Lung Cancer (NSCLC), Programmed Death Ligand 1 (PD-L1) remains the only biomarker used to select patients for immunotherapy (IO), and it has many limitations. Given the complex dynamics of the immune system, it is improbable that a single biomarker can predict response with high accuracy. A promising way to cope with this complexity is provided by Artificial Intelligence (AI) and Machine Learning (ML), techniques able to analyse and interpret large multifactorial data sets. The present study aims to use AI tools to improve response and efficacy prediction in NSCLC patients treated with IO. Methods: Real-world data (clinical data, PD-L1, histology, molecular, lab tests) and the blood microRNA signature classifier (MSC), which includes 24 different microRNAs, were used. Patients were divided into responders (R), who obtained a complete or partial response or stable disease as best response, and non-responders (NR), who experienced progressive or hyperprogressive disease or died before the first radiologic evaluation. Moreover, we used the same data to determine whether the overall survival of the patients was likely to be shorter or longer than 24 months from baseline IO. A literature review and a forward feature selection technique were used to extract a specific subset of the patients' data. To develop the final predictive model, different ML methods were tested, i.e., Feedforward Neural Network (FFNN), Logistic Regression (LR), K-nearest neighbors (K-NN), Support Vector Machines (SVM), and Random Forest (RF). Results: 200 patients were included. Of these, 164 (i.e., only those patients with PD-L1 data available) were considered in the model; 73 (44.5%) were R and 91 (55.5%) NR. Overall, the best model was the LR, and it included 5 features: 2 clinical features (ECOG performance status and IO-line of therapy), 1 tissue feature (PD-L1 tumour expression), and 2 blood features (the MSC test and the neutrophil-to-lymphocyte ratio, NLR). The model predicting R/NR of the patient achieved accuracy ACC=0.756, F1 score F1=0.722, and Area Under the ROC Curve AUC=0.82. PD-L1 alone had an ACC=0.655. The accuracy of the ML models when some of the features were excluded was as follows: without PD-L1 value (ACC=0.726), without MSC (ACC=0.750), and without both PD-L1 and MSC (ACC=0.707), i.e., considering only clinical features. At data cut-off (Nov 2020), median Overall Survival (mOS) for R was 38.5 months (m) (95% CI 23.9 - 53.1) vs 3.8 m (95% CI 2.8 - 4.7) for NR, with p<0.001. LR was also the best-performing model in predicting patients with long survival (24-month OS), achieving ACC=0.839, F1=0.908, and AUC=0.87. Conclusions: The results suggest that the integration of multifactorial data provided by ML techniques is a useful tool to improve personalized selection of NSCLC patients who are candidates for IO. Compared to PD-L1 alone, the expected improvement was around 10%. In particular, the model shows that the higher the ECOG, NLR value, IO-line, and MSC test level, the lower the response, and the higher the PD-L1, the higher the response. Considering the difference in survival between the R and NR groups, these results suggest that the model can also be used to indirectly predict survival. Moreover, a second model was able to predict long-survival patients with good accuracy.


Cancers ◽  
2022 ◽  
Vol 14 (2) ◽  
pp. 435
Author(s):  
Arsela Prelaj ◽  
Mattia Boeri ◽  
Alessandro Robuschi ◽  
Roberto Ferrara ◽  
Claudia Proto ◽  
...  

(1) Background: In advanced non-small cell lung cancer (aNSCLC), programmed death ligand 1 (PD-L1) remains the only biomarker for selecting patients for immunotherapy (IO). This study aimed at using artificial intelligence (AI) and machine learning (ML) tools to improve response and efficacy predictions in aNSCLC patients treated with IO. (2) Methods: Real-world data and the blood microRNA signature classifier (MSC) were used. Patients were divided into responders (R) and non-responders (NR), and the same data were used to determine whether the overall survival of the patients was likely to be shorter or longer than 24 months from baseline IO. (3) Results: One hundred sixty-four out of 200 patients (i.e., only those with PD-L1 data available) were considered in the model; 73 (44.5%) were R and 91 (55.5%) NR. Overall, the best model was the logistic regression (LR), and it included 5 features. The model predicting R/NR of patients achieved accuracy ACC = 0.756, F1 score F1 = 0.722, and area under the ROC curve AUC = 0.82. LR was also the best-performing model in predicting patients with long survival (24-month OS), achieving ACC = 0.839, F1 = 0.908, and AUC = 0.87. (4) Conclusions: The results suggest that the integration of multifactorial data provided by ML techniques is a useful tool to select NSCLC patients as candidates for IO.


2021 ◽  
Vol 186 (Supplement_1) ◽  
pp. 445-451
Author(s):  
Yifei Sun ◽  
Navid Rashedi ◽  
Vikrant Vaze ◽  
Parikshit Shah ◽  
Ryan Halter ◽  
...  

ABSTRACT Introduction: Early prediction of the acute hypotensive episode (AHE) in critically ill patients has the potential to improve outcomes. In this study, we apply different machine learning algorithms to the MIMIC III Physionet dataset, containing more than 60,000 real-world intensive care unit records, to test commonly used machine learning technologies and compare their performances. Materials and Methods: Five classification methods, namely K-nearest neighbor, logistic regression, support vector machine, random forest, and a deep learning method called long short-term memory, are applied to predict an AHE 30 minutes in advance. An analysis comparing model performance when including versus excluding invasive features was conducted. To further study the pattern of the underlying mean arterial pressure (MAP), we apply linear regression to predict the continuous MAP values over the next 60 minutes. Results: Support vector machine yields the best performance in terms of recall (84%). Including the invasive features in the classification improves the performance significantly, with both recall and precision increasing by more than 20 percentage points. We were able to predict the MAP 60 minutes in the future with a root mean square error (a frequently used measure of the differences between the predicted values and the observed values) of 10 mmHg. After converting continuous MAP predictions into AHE binary predictions, we achieve a 91% recall and 68% precision. In addition to predicting AHE, the MAP predictions provide clinically useful information regarding the timing and severity of the AHE occurrence. Conclusion: We were able to predict AHE with precision and recall above 80% 30 minutes in advance with the large real-world dataset. The predictions of the regression model can provide a more fine-grained, interpretable signal to practitioners. Model performance is improved by the inclusion of invasive features in predicting AHE, when compared to predicting the AHE based on only the available, restricted set of noninvasive technologies. This demonstrates the importance of exploring more noninvasive technologies for AHE prediction.
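The step of converting continuous MAP predictions into binary AHE predictions, together with the root mean square error, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the abstract does not give the AHE definition, so the 65 mmHg cut-off, the function names, and the toy MAP values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumed cut-off for illustration; the abstract does not state the AHE criterion.
AHE_MAP_THRESHOLD_MMHG = 65.0


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def to_ahe_labels(
    map_values: Sequence[float], threshold: float = AHE_MAP_THRESHOLD_MMHG
) -> List[bool]:
    """Flag each MAP value below the threshold as a (hypothetical) AHE."""
    return [v < threshold for v in map_values]


def precision_recall(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Precision and recall of binary predictions against binary truth."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy MAP values in mmHg, invented for this example (not MIMIC III data).
predicted_map = [70.0, 62.0, 58.0, 80.0, 64.0]
observed_map = [72.0, 66.0, 60.0, 78.0, 61.0]

error = rmse(predicted_map, observed_map)
precision, recall = precision_recall(
    to_ahe_labels(predicted_map), to_ahe_labels(observed_map)
)
print(f"RMSE={error:.2f} mmHg, precision={precision:.2f}, recall={recall:.2f}")
```

Thresholding the regression output keeps the continuous MAP forecast available alongside the binary alert, which is the source of the timing and severity information the abstract mentions.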


Author(s):  
Young Jae Kim

The diagnosis of sarcopenia requires accurate muscle quantification. As an alternative to manual muscle mass measurement through computed tomography (CT), artificial intelligence can be leveraged for the automation of these measurements. Although generally difficult to identify with the naked eye, the radiomic features in CT images are informative. In this study, the radiomic features were extracted from L3 CT images of the entire muscle area and partial areas of the erector spinae collected from non-small cell lung carcinoma (NSCLC) patients. The first-order statistics and gray-level co-occurrence, gray-level size zone, gray-level run length, neighboring gray-tone difference, and gray-level dependence matrices were the radiomic features analyzed. The identification performances of the following machine learning models were evaluated: logistic regression, support vector machine (SVM), random forest, and extreme gradient boosting (XGB). Sex, coarseness, skewness, and cluster prominence were selected as the relevant features effectively identifying sarcopenia. The XGB model demonstrated the best performance for the entire muscle, whereas the SVM was the worst-performing model. Overall, the models demonstrated improved performance for the entire muscle compared to the erector spinae. Although further validation is required, the radiomic features presented here could become reliable indicators for quantifying the phenomena observed in the muscles of NSCLC patients, thus facilitating the diagnosis of sarcopenia.


2018 ◽  
Vol 13 (10) ◽  
pp. S459-S460 ◽  
Author(s):  
F. Barlesi ◽  
L. Paz-Ares ◽  
D. Page ◽  
A. Shewade ◽  
P. Lambert ◽  
...  

Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6479
Author(s):  
Luca Palmerini ◽  
Jochen Klenk ◽  
Clemens Becker ◽  
Lorenzo Chiari

Falling is a significant health problem. Fall detection, to alert for medical attention, has been gaining increasing attention. Still, most of the existing studies use falls simulated in a laboratory environment to test the obtained performance. We analyzed the acceleration signals recorded by an inertial sensor on the lower back during 143 real-world falls (the most extensive collection to date) from the FARSEEING repository. Such data were obtained from continuous real-world monitoring of subjects with a moderate-to-high risk of falling. We designed and tested fall detection algorithms using features inspired by a multiphase fall model and a machine learning approach. The obtained results suggest that algorithms can learn effectively from features extracted from a multiphase fall model, consistently outperforming more conventional features. The most promising method (support vector machines and features from the multiphase fall model) obtained a sensitivity higher than 80%, a false alarm rate per hour of 0.56, and an F-measure of 64.6%. The reported results and methodologies represent an advancement of knowledge on real-world fall detection and suggest useful metrics for characterizing fall detection systems for real-world use.
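The three reported metrics (sensitivity, false alarm rate per hour, and F-measure) can all be derived from event counts plus the total monitoring time, as in this minimal sketch. The function name, argument names, and counts are hypothetical and are not taken from the FARSEEING data.

```python
from typing import Dict


def fall_detection_metrics(
    true_pos: int, false_neg: int, false_pos: int, monitored_hours: float
) -> Dict[str, float]:
    """Sensitivity, precision, F-measure, and false alarms per monitored hour."""
    if monitored_hours <= 0:
        raise ValueError("monitored_hours must be positive")
    if true_pos <= 0:
        raise ValueError("at least one detected fall is needed for these ratios")
    sensitivity = true_pos / (true_pos + false_neg)
    precision = true_pos / (true_pos + false_pos)
    # F-measure is the harmonic mean of precision and sensitivity (recall).
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f_measure": f_measure,
        "false_alarms_per_hour": false_pos / monitored_hours,
    }


# Toy counts for illustration only.
metrics = fall_detection_metrics(
    true_pos=8, false_neg=2, false_pos=4, monitored_hours=10.0
)
print(metrics)
```

None of these quantities needs a count of true negatives, which is convenient in continuous monitoring, where "non-fall events" have no natural unit and false alarms are instead normalized by time.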


Sensors ◽  
2019 ◽  
Vol 19 (10) ◽  
pp. 2266 ◽  
Author(s):  
Nikolaos Sideris ◽  
Georgios Bardis ◽  
Athanasios Voulodimos ◽  
Georgios Miaoulis ◽  
Djamchid Ghazanfarpour

The constantly increasing amount and availability of urban data derived from varying sources leads to an assortment of challenges that include, among others, the consolidation, visualization, and maximal exploitation of these data. A preeminent problem affecting urban planning is the appropriate choice of location to host a particular activity (either commercial or common welfare service) or the correct use of an existing building or empty space. In this paper, we propose an approach that addresses these challenges with the aid of machine learning techniques. The proposed system combines, fuses, and merges various types of data from different sources, encodes them using a novel semantic model that can capture and utilize both low-level geometric information and higher-level semantic information, and subsequently feeds them to a random forests classifier, as well as to other supervised machine learning models for comparison. Our experimental evaluation on multiple real-world data sets, comparing the performance of several classifiers (including Feedforward Neural Networks, Support Vector Machines, Bag of Decision Trees, k-Nearest Neighbors, and Naïve Bayes), indicated the superiority of Random Forests in terms of the examined performance metrics (Accuracy, Specificity, Precision, Recall, F-measure, and G-mean).


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e19324-e19324
Author(s):  
Wanyuan Cui ◽  
Fanny Franchini ◽  
Marliese Alexander ◽  
Ann Officer ◽  
Hui Li Wong ◽  
...  

e19324 Background: KRASG12C mutations are present in 15% of non-small cell lung cancer (NSCLC) and have recently been shown to confer sensitivity to KRAS(G12C) inhibitors. This study aims to assess the clinical features and outcomes of KRASG12C mutant NSCLC in a real-world setting. Methods: Patients enrolled in an Australian prospective cohort study, Thoracic Malignancies Cohort (TMC), between July 2012 and October 2019 with metastatic or recurrent non-squamous NSCLC, with available KRAS test results, and without EGFR, ALK, or ROS1 gene aberrations, were selected. Data were extracted from TMC and patient records. Clinicopathologic features, treatment, and overall survival were compared between KRAS wildtype (KRASWT) and KRAS mutated (KRASmut) patients, and between KRAS G12C (KRASG12C) and other (KRASother) mutations. Results: Of 1386 patients with non-squamous NSCLC, 1040 were excluded for: non-metastatic or recurrent disease (526); KRAS not tested (356); ALK, EGFR, or ROS1 positive (154); duplicate (4). Of 346 patients analysed, 202 (58%) were KRASWT and 144 (42%) were KRASmut, of whom 65 (45%) were KRASG12C. All patients (100%) with KRASG12C were smokers, compared to 92% of KRASother and 83% of KRASWT. The prevalence of brain metastases over the entire follow-up period was similar between KRASmut and KRASWT (33% vs 40%, p = 0.17), and between KRASG12C and KRASother (40% vs 41%, p = 0.74). Likewise, there was no difference in the proportion of patients receiving one or multiple lines of systemic therapy. Overall survival (OS) was also similar between KRASmut and KRASWT (p = 0.54), and between KRASG12C and KRASother (p = 0.39). Conclusions: In this real-world prospective cohort, patients had comparable clinical features regardless of having a KRASmut, KRASG12C, or KRASother mutation, or being KRASWT. Treatment and survival were also similar between groups. While not prognostic, KRASG12C may be an important predictive biomarker as promising KRAS G12C covalent inhibitors continue to be developed.


10.2196/16042 ◽  
2020 ◽  
Vol 8 (1) ◽  
pp. e16042
Author(s):  
Emily R Pfaff ◽  
Miles Crosskey ◽  
Kenneth Morton ◽  
Ashok Krishnamurthy

Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient’s medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop an open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning–based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier. Example phenotypes where CLARK has been applied include pediatric diabetes (sensitivity=0.91; specificity=0.98), symptomatic uterine fibroids (positive predictive value=0.81; negative predictive value=0.54), nonalcoholic fatty liver disease (sensitivity=0.90; specificity=0.94), and primary ciliary dyskinesia (sensitivity=0.88; specificity=1.0). In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that nonexpert users can get started with machine learning–based NLP with limited informatics involvement is a significant improvement over the status quo. 
We hope to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods.
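The "number of folds (cross-validation splits)" setting described above can be pictured with a small sketch of how k-fold cross-validation partitions a data set. This is not CLARK's code: the function `k_fold_indices` is a hypothetical, unshuffled illustration, and real tools typically shuffle or stratify the samples before splitting.

```python
from typing import List, Tuple


def k_fold_indices(
    n_samples: int, n_folds: int
) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into n_folds (train, test) pairs.

    Each sample appears in exactly one test fold; fold sizes differ
    by at most one when n_samples is not divisible by n_folds.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    splits: List[Tuple[List[int], List[int]]] = []
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


splits = k_fold_indices(n_samples=10, n_folds=3)
for train, test in splits:
    print("train:", train, "test:", test)
```

A classifier would be trained on each `train` list and scored on the matching `test` list, with the per-fold scores averaged to estimate measures such as sensitivity and specificity.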

