Clinical and genomic predictors of brain metastases (BM) in non-small cell lung cancer (NSCLC): An AACR Project GENIE analysis.

2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 2032-2032
Author(s):  
Protiva Rahman ◽  
Michele LeNoue-Newton ◽  
Sandip Chaugai ◽  
Marilyn Holt ◽  
Neha M Jain ◽  
...  

2032 Background: 30-50% of patients with non-early NSCLC will eventually develop BM, with a median survival of less than one year from BM diagnosis. There are no widely accepted clinical risk models for the development of BM in patients without BM at baseline. We predicted the binary risk of BM using clinical and genomic factors from a large multi-institutional cohort. Methods: Patients with stage II-IV NSCLC from the AACR Project GENIE Biopharma Consortium dataset were eligible. The dataset comprised four academic institutions that curated clinical data for patients whose tumors underwent somatic next-generation sequencing (NGS) between 2015 and 2017. We excluded patients who had BM at baseline, died within 30 days of NSCLC diagnosis, or did not undergo brain imaging. Covariates included demographics, anticancer therapies (received up to 90 days prior to BM development and within 5 years of NSCLC diagnosis), and NGS data; radiotherapy (RT) data were not available. NGS features included mutations and copy number alterations, restricted to those classified as oncogenic by OncoKB. Univariate feature selection with Fisher's exact test (p < 0.1) was performed on medication and genetic features. We compared five machine learning models: random forest (RF), support vector machine (SVM), lasso regression, ridge regression, and an ensemble classifier. The data were split into training and test sets, and 10-fold cross-validation on the training set was used for parameter tuning. The area under the receiver operating characteristic curve (AUC) is reported on the test set. Results: Of the 956 patients included, 192 (20%) were assigned to the test set. Univariate features associated with BM were treatment with etoposide, Asian race, bone metastases at NSCLC diagnosis, mutations in TP53 and EGFR, amplifications of ERBB2 and EGFR, and deletions of RB1, CDKN2A, and CDKN2B.
Univariate features inversely associated with BM were older age; treatment with nivolumab, vinorelbine, alectinib, pembrolizumab, atezolizumab, or gemcitabine; and mutations in NOTCH1 and KRAS. Ridge regression had the best test-set AUC, 0.73 (Table). Conclusions: We achieved reasonable prediction performance using commonly obtained clinical and genomic information in non-early NSCLC. The biologic role of the associated alterations deserves further scrutiny; this study replicates findings for EGFR and KRAS previously reported in a much smaller cohort. Certain subsets of NSCLC patients may benefit from increased surveillance for BM and from a transition to drug therapies known to cross the blood-brain barrier effectively, e.g., nivolumab and alectinib. Including additional covariates, such as brain RT, may further improve model performance. [Table: see text]
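The univariate screen in Methods can be sketched in plain Python as below. This is an illustrative sketch, not the authors' code: it assumes a two-sided Fisher's exact test on 2x2 tables of binary features (medication received, alteration present) against BM status, and the helper names `fisher_exact_two_sided` and `screen_features` are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: feature present / absent; columns: outcome (e.g. BM) yes / no.
    With all margins fixed, the top-left cell is hypergeometric; the
    p-value sums the probabilities of every table no more likely than
    the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so floating-point ties are counted.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def screen_features(X, y, alpha=0.1):
    """Return indices of binary (0/1) features whose Fisher p-value < alpha.

    X is a list of rows (one per patient); y holds 0/1 outcome labels.
    """
    keep = []
    for j in range(len(X[0])):
        cells = [0, 0, 0, 0]  # a, b, c, d
        for row, t in zip(X, y):
            # present & BM -> a, present & no BM -> b, absent & BM -> c, else d
            cells[(1 - row[j]) * 2 + (1 - t)] += 1
        if fisher_exact_two_sided(*cells) < alpha:
            keep.append(j)
    return keep
```

For example, the table [[3, 1], [1, 3]] gives p = 34/70 ≈ 0.486, so such a feature would not pass the p < 0.1 screen. In practice, `scipy.stats.fisher_exact` provides the same two-sided test.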

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Jianjiang Qi ◽  
Di He ◽  
Dagan Yang ◽  
Mengyan Wang ◽  
Wenjun Ma ◽  
...  

Abstract Background: The severity of COVID-19 is associated with clinical decision-making and patient prognosis; therefore, early identification of patients who are likely to develop severe or critical COVID-19 is essential in clinical practice. The aim of this study was to screen severity-associated markers and construct an assessment model for predicting the severity of COVID-19. Methods: A total of 172 confirmed COVID-19 patients were enrolled from two designated hospitals in Hangzhou, China. Ordinal logistic regression was used to screen severity-associated markers. Least Absolute Shrinkage and Selection Operator (LASSO) regression was performed for further feature selection. Assessment models were constructed using logistic regression, ridge regression, support vector machine, and random forest. The area under the receiver operating characteristic curve (AUROC) was used to evaluate the performance of the different models. Internal validation was performed using bootstrapping with 500 resamples in the training set, and external validation was performed in the validation set for each of the four models. Results: Age, comorbidity, fever, and 18 laboratory markers were associated with the severity of COVID-19 (all P values < 0.05). Using LASSO regression, eight markers were selected for model construction. The ridge regression model had the best performance, with AUROCs of 0.930 (95% CI, 0.914–0.943) and 0.827 (95% CI, 0.716–0.921) in the internal and external validations, respectively. A risk score based on the ridge regression model showed good discrimination in all patients, with an AUROC of 0.897 (95% CI, 0.845–0.940), and a well-fitted calibration curve. Using the optimal cutoff value of 71, the sensitivity and specificity were 87.1% and 78.1%, respectively. A web-based assessment system was developed based on the risk score.
Conclusions: Eight clinical markers (lactate dehydrogenase, C-reactive protein, albumin, comorbidity, electrolyte disturbance, coagulation function, eosinophil count, and lymphocyte count) were associated with the severity of COVID-19. An assessment model constructed with these eight markers would help clinicians evaluate, at admission, the likelihood that a patient will develop severe COVID-19 and take early treatment measures.
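The internal validation step (500 bootstrap resamples) can be illustrated with a minimal percentile-bootstrap confidence interval for the AUROC. This is a simplified sketch under stated assumptions: it resamples fixed risk scores rather than refitting the ridge model on each resample (as an optimism-corrected bootstrap would), and the function names are hypothetical.

```python
import random


def auc(scores, labels):
    """ROC AUC as the probability that a randomly chosen positive scores
    higher than a randomly chosen negative (ties count one half)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=500, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap interval from n_boot resamples
    (drawn with replacement) of fixed model scores."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes in the resample
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return auc(scores, labels), lo, hi
```

The resampling loop skips draws containing only one class, so both classes must be present in the input labels.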


2021 ◽  
Author(s):  
Payton J. Jones

What differentiates a trauma from an event that is merely upsetting? Wildly different definitions of trauma have been used across settings, yet there is a dearth of empirical work examining which event features individuals use to define an event as a ‘trauma’. First, a group of qualitative coders classified the features (e.g., actual physical injury, loss of possessions) of 600 event descriptions (e.g., “was verbally harassed by a boss”, “watched a video of an adult being shot and killed”). Next, across two studies, machine learning was used to predict whether individuals rated event descriptions as ‘trauma’ or ‘traumatic’ in over 100,000 judgment tasks. In Study 1, which examined continuous ratings, a cross-validated LASSO regression with interaction terms provided the best out-of-sample predictions (r² = 0.76), outperforming ridge regression, support vector regression, and linear regression. In Study 2, which used binary judgments, a random forest model accurately predicted out-of-sample individual responses (AUC = 0.96), outperforming a neural network and an AdaBoost ensemble classifier. The most important event features across the two studies were actual death, threat of death, and the presence of a human perpetrator. The most important human features in predicting judgments were political orientation and gender.
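A minimal sketch of LASSO regression with pairwise interaction terms, solved by cyclic coordinate descent, is shown below. The abstract does not describe the solver or penalty scaling; the objective (1/2n)·‖y − b0 − Xw‖² + λ‖w‖₁, the unpenalized intercept, and the helper names are assumptions. Feature standardization and the cross-validated choice of λ used in the study are omitted for brevity.

```python
from itertools import combinations


def add_interactions(X):
    """Append every pairwise product x_i * x_j (i < j) to each row."""
    return [list(row) + [row[i] * row[j] for i, j in combinations(range(len(row)), 2)]
            for row in X]


def soft_threshold(z, g):
    """Shrink z towards zero by g (proximal operator of g*|w|)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=500):
    """Cyclic coordinate descent for
    (1 / (2n)) * ||y - b0 - X w||^2 + lam * ||w||_1, intercept b0 unpenalised."""
    n, p = len(X), len(X[0])
    w, b0 = [0.0] * p, 0.0
    r = [float(v) for v in y]  # residuals y - b0 - X w
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        shift = sum(r) / n  # exact minimiser for the intercept
        b0 += shift
        r = [ri - shift for ri in r]
        for j in range(p):
            if col_sq[j] == 0:
                continue  # constant-zero column: coefficient stays 0
            z = sum(row[j] * ri for row, ri in zip(X, r)) / n + w[j] * col_sq[j]
            new = soft_threshold(z, lam) / col_sq[j]
            delta = new - w[j]
            if delta != 0.0:
                r = [ri - row[j] * delta for row, ri in zip(X, r)]
                w[j] = new
    return b0, w
```

Setting `lam` high enough zeroes every coefficient, while `lam = 0` reduces to ordinary least squares; cross-validation would pick a value between these extremes.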


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e14138-e14138
Author(s):  
Beung-Chul AHN ◽  
Kyoung Ho Pyo ◽  
Dongmin Jung ◽  
Chun-Feng Xin ◽  
Chang Gon Kim ◽  
...  

e14138 Background: Immune checkpoint inhibitors have become a breakthrough therapy for various types of cancer. However, given an overall response rate of around 20% in clinical trials, accurately predicting the response to anti-PD-1 (aPD-1) therapy in individual patients remains an open problem. PD-L1 expression and tumor-infiltrating lymphocytes may be used as indicators of response, but their usefulness is limited. We developed machine learning models to predict aPD-1 response. Methods: A total of 126 patients with advanced NSCLC treated with aPD-1 were enrolled, and their clinical characteristics, treatment outcomes, and adverse events were collected. The full clinical dataset (n = 126), consisting of 15 variables, was divided into two subsets: a discovery set (n = 63) and a test set (n = 63). Thirteen supervised learning algorithms, including support vector machine and regularized regression (lasso, ridge, elastic net), were applied to the discovery set for model development and to the test set for validation. Each model was evaluated using ROC curves and cross-validation. The same methods were applied to the subset with additional flow cytometry data (n = 40). Results: The median age was 64 years, and 69.8% of patients were male. Adenocarcinoma was predominant (69.8%), and twenty patients (15.1%) were driver mutation positive. In the clinical dataset (n = 126), ridge regression (AUC: 0.79) was the best predictive model. Of the 15 clinical variables, tumor burden, age, ECOG PS, and PD-L1 were the most important according to the random forest algorithm. When the clinical and flow cytometry data were merged, the ridge regression model (AUC: 0.82) performed better than with clinical data alone. Among the 52 variables of the merged set, the most important immune markers were CD3+CD8+CD25+/Teff-CD28, CD3+CD8+CD25-/Teff-Ki-67, and CD3+CD8+CD25+/Teff-NY-ESO/Teff-PD-1, which indicate an activated tumor-specific T cell subset.
Conclusions: Our machine learning-based model shows benefit for predicting aPD-1 responses. After further validation in an independent patient cohort, a noninvasive predictive score based on supervised learning could be established to predict aPD-1 response.
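Ridge regression, reported here as the best model, has a closed-form solution. The sketch below treats the response as a numeric score and solves (XcᵀXc + λI)w = Xcᵀyc on centered data so that the intercept is not penalized. This linear formulation and the helper names are assumptions, since the abstract does not say whether a linear or logistic ridge model was used.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Ridge regression with an unpenalised intercept: centre X and y,
    solve (Xc^T Xc + lam * I) w = Xc^T yc, then recover the intercept."""
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    Xc = [[row[j] - mx[j] for j in range(p)] for row in X]
    yc = [yi - my for yi in y]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(Xc, yc)) for i in range(p)]
    w = solve(A, b)
    b0 = my - sum(wj * mj for wj, mj in zip(w, mx))
    return b0, w
```

Increasing `lam` shrinks the coefficients toward zero; with `lam = 0` the fit reduces to ordinary least squares.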


2018 ◽  
Vol 7 (8) ◽  
pp. 308 ◽  
Author(s):  
Han Zheng ◽  
Zanyang Cui ◽  
Xingchen Zhang

Recognizing Modes of Driving Railway Trains (MDRT) can help solve railway freight transportation problems in driver behavior research, auto-driving system design, and capacity utilization optimization. Previous studies have focused on analyses and applications of MDRT, but there is currently no approach to identify MDRT automatically and effectively in the context of big data. In this study, we propose an integrated approach, comprising data preprocessing, feature extraction, classifier modeling, training and parameter tuning, and model evaluation, to infer MDRT from GPS data. The highlights of this study are as follows. First, we propose methods for extracting Driving Segmented Standard Deviation Features (DSSDF), which are combined with classical features to improve identification performance. Second, we identify the most suitable classifier for MDRT by comparing the performance of K-Nearest Neighbor, Support Vector Machines, AdaBoost, Random Forest, Gradient Boosting Decision Tree, and XGBoost. From the real-data experiment, we conclude that: (i) the ensemble classifier XGBoost produces the best performance, with an accuracy of 92.70%; and (ii) the DSSDF group plays an important role in identifying MDRT, improving accuracy by 11.2% (using XGBoost). The proposed approach has been applied to capacity utilization optimization and new-driver training for the Baoshen Railway.
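The abstract does not define how DSSDF are computed. As a purely hypothetical illustration of a segmented standard-deviation feature, the sketch below splits a GPS-derived speed series into fixed-length segments, computes each segment's standard deviation, and summarizes those values; the segment length, the summary statistics, and the function name are all assumptions.

```python
from statistics import mean, pstdev


def segmented_std_features(speeds, seg_len):
    """Hypothetical segmented standard-deviation features.

    Splits a speed series into consecutive, non-overlapping windows of
    seg_len samples, computes each window's population standard deviation,
    and returns summary statistics of those values. In a full pipeline
    these would be concatenated with classical features before training
    a classifier.
    """
    stds = [pstdev(speeds[i:i + seg_len])
            for i in range(0, len(speeds) - seg_len + 1, seg_len)]
    return {
        "seg_std_mean": mean(stds),
        "seg_std_max": max(stds),
        "seg_std_min": min(stds),
    }
```

A trailing partial segment is dropped, so the series must contain at least one full segment.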


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Eunyoung Emily Lee ◽  
Woochang Hwang ◽  
Kyoung-Ho Song ◽  
Jongtak Jung ◽  
Chang Kyung Kang ◽  
...  

Abstract The objective of this study was to develop and validate a prediction model that identifies COVID-19 patients at risk of requiring oxygen support based on five parameters: C-reactive protein (CRP), hypertension, age, and neutrophil and lymphocyte counts (CHANeL). This retrospective cohort study included 221 consecutive COVID-19 patients, who were randomly assigned to a training set and a test set in a 1:1 ratio. Logistic regression, logistic LASSO regression, Random Forest, Support Vector Machine, and XGBoost analyses were performed based on age, hypertension status, serial CRP, and neutrophil and lymphocyte counts during the first 3 days of hospitalization. The ability of the model to predict oxygen requirement during hospitalization was tested. During hospitalization, 45 (41.8%) patients in the training set (n = 110) and 41 (36.9%) in the test set (n = 111) required supplementary oxygen support. The logistic LASSO regression model exhibited the highest AUC for the test set, with a sensitivity of 0.927 and a specificity of 0.814. An online risk calculator for oxygen requirement using the CHANeL predictors was developed. “CHANeL” prediction models based on serial CRP, neutrophil, and lymphocyte counts during the first 3 days of hospitalization, along with age and hypertension status, provide a reliable estimate of the risk of supplemental oxygen requirement among patients hospitalized with COVID-19.
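The random 1:1 assignment can be sketched as a seeded shuffle followed by a split at the midpoint. The seed and the function name are hypothetical; with an odd count such as 221 patients, this places 110 in the training set and 111 in the test set, matching the set sizes reported above.

```python
import random


def split_half(records, seed=42):
    """Randomly assign records to training and test sets in a 1:1 ratio.

    A seeded random.Random instance makes the split reproducible; with an
    odd number of records, the extra one goes to the test set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    k = len(shuffled) // 2
    return shuffled[:k], shuffled[k:]
```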


Molecules ◽  
2020 ◽  
Vol 25 (19) ◽  
pp. 4353
Author(s):  
Yanfen Lyu ◽  
Xinqi Gong

Studying interface residue pairs is important for understanding the interactions between monomers within a trimeric protein–protein complex. We developed a two-layer support vector machine (SVM) ensemble classifier that considers the physicochemical and geometric properties of amino acids and the influence of surrounding amino acids. Different descriptors and different combinations of descriptors may give different prediction results, so we propose feature combination engineering based on correlation coefficients and F-values. The accuracy of our method is 65.38% on an independent test set, indicating biological significance. Our predictions are consistent with experimental results, which supports the effectiveness and reliability of our method for predicting interface residue pairs of protein trimers.
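The abstract mentions feature combination engineering based on correlation coefficients and F-values without giving details. One plausible reading, shown here purely as a hypothetical sketch, ranks descriptors by their one-way ANOVA F-value across classes and then greedily drops any descriptor whose Pearson correlation with an already-selected one exceeds a threshold; the threshold of 0.9 and the function names are assumptions.

```python
from statistics import mean


def f_value(x, y):
    """One-way ANOVA F statistic of feature x across the classes in y."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    k, n, grand = len(groups), len(x), mean(x)
    ss_between = ss_within = 0.0
    for g in groups.values():
        m = mean(g)
        ss_between += len(g) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def select_features(columns, y, max_corr=0.9):
    """Rank features by F-value (descending; ties keep original order), then
    skip any feature whose |r| with an already-chosen feature exceeds max_corr."""
    order = sorted(range(len(columns)), key=lambda j: f_value(columns[j], y),
                   reverse=True)
    chosen = []
    for j in order:
        if all(abs(pearson(columns[j], columns[c])) <= max_corr for c in chosen):
            chosen.append(j)
    return chosen
```

Both helpers assume non-degenerate inputs: a feature that is constant within every class, or constant overall, would cause a division by zero.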


2020 ◽  
Author(s):  
Jianjiang Qi ◽  
Di He ◽  
Dagan Yang ◽  
Mengyan Wang ◽  
Wenjun Ma ◽  
...  

Abstract Background: The severity of COVID-19 is associated with clinical decision-making and patient prognosis; therefore, early identification of patients who are likely to develop severe or critical COVID-19 is essential in clinical practice. The aim of this study was to screen severity-associated markers and construct an assessment model for predicting the severity of COVID-19. Methods: A total of 172 confirmed COVID-19 patients were enrolled from two designated hospitals in Hangzhou, China. Ordinal logistic regression was used to screen severity-associated markers. Least Absolute Shrinkage and Selection Operator (LASSO) regression was performed for further feature selection. Assessment models were constructed using logistic regression, ridge regression, support vector machine, and random forest. The area under the receiver operating characteristic curve (AUROC) was used to evaluate the performance of the different models. Results: Age, comorbidity, fever, and 18 biochemical markers (C-reactive protein, lactate dehydrogenase, D-dimer, albumin, etc.) were associated with the severity of COVID-19 (all P values < 0.05). Using LASSO regression, eight markers were selected for model construction. The ridge regression model had the best performance, with AUROCs of 0.930 (95% CI, 0.914-0.943) and 0.827 (95% CI, 0.716-0.921) in the internal and external validations, respectively. A risk score based on the ridge regression model showed good discrimination in all patients, with an AUROC of 0.897 (95% CI, 0.845-0.940), and a well-fitted calibration curve. Using the optimal cutoff value of 71, the sensitivity and specificity were 87.1% and 78.1%, respectively. A web-based assessment system was developed based on the risk score. Conclusions: A panel of clinical markers was associated with the severity of COVID-19.
An assessment model with eight markers would help clinicians identify, at admission, the patients who are likely to develop severe or critical COVID-19.
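The abstract reports an optimal cutoff of 71 without naming the selection criterion. A common choice, assumed here, is the cutoff that maximizes Youden's J = sensitivity + specificity − 1; the sketch below calls a patient positive when the risk score is at or above the cutoff. The function names are hypothetical.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    tp = sum(1 for s, t in zip(scores, labels) if s >= cutoff and t == 1)
    fn = sum(1 for s, t in zip(scores, labels) if s < cutoff and t == 1)
    tn = sum(1 for s, t in zip(scores, labels) if s < cutoff and t == 0)
    fp = sum(1 for s, t in zip(scores, labels) if s >= cutoff and t == 0)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores, labels):
    """Observed score maximising Youden's J = sens + spec - 1.

    Candidates are scanned in ascending order, so ties resolve to the
    lowest cutoff.
    """
    return max(sorted(set(scores)),
               key=lambda c: sum(sens_spec(scores, labels, c)) - 1)
```

Both classes must be present in `labels`, otherwise one of the ratios divides by zero.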


2020 ◽  
Vol 9 (1) ◽  
pp. 14-18
Author(s):  
Sapna Yadav ◽  
Pankaj Agarwal

Analyzing online or digital data to detect epidemics is an active area of research and has become even more relevant during the current COVID-19 outbreak. There are several types of influenza virus, and, like the virus that causes COVID-19, they evolve constantly. As a result, it is challenging to analyze them and to predict when, where, and with what severity outbreaks will occur across the world during flu season. Greater surveillance of both seasonal and pandemic influenza is needed to ensure human health and safety. The objective of this work is to apply machine learning algorithms to build models that predict the occurrence, peak, and severity of influenza in each season. For this work, we used a freely available dataset from Ireland covering 2005 to 2016. Specifically, we tested three ML algorithms, namely linear regression, support vector regression, and random forests, and found that random forests gave the best predictive results. We also conducted experiments with the Weka tool, testing ZeroR, linear regression, lazy KStar, random forest, REPTree, and multilayer perceptron models; again, random forest performed better than all other models. We also evaluated other regression models, including ridge regression, modified ridge regression, lasso regression, and k-nearest neighbors regression, in terms of mean absolute error, and found that modified ridge regression produced the lowest error. The proposed work aims to identify the most suitable ML algorithm for this influenza prediction problem.
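Comparing models by mean absolute error (MAE) reduces to averaging each model's absolute errors and taking the minimum. The sketch below assumes each model's predictions are already available as a list aligned with the observed values; the function names and the dictionary keys (arbitrary model labels) are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def best_by_mae(y_true, predictions):
    """predictions maps a model label to its list of predicted values.

    Returns the label with the lowest MAE together with all MAE scores.
    """
    scores = {name: mean_absolute_error(y_true, pred)
              for name, pred in predictions.items()}
    return min(scores, key=scores.get), scores
```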


2020 ◽  
Author(s):  
Jianjiang Qi ◽  
Di He ◽  
Dagan Yang ◽  
Mengyan Wang ◽  
Wenjun Ma ◽  
...  

Abstract Background: The severity of COVID-19 is associated with clinical decision-making and patient prognosis; therefore, early identification of patients who are likely to develop severe or critical COVID-19 is essential in clinical practice. The aim of this study was to screen severity-associated markers and construct an assessment model for predicting the severity of COVID-19. Methods: A total of 172 confirmed COVID-19 patients were enrolled from two designated hospitals in Hangzhou, China. Ordinal logistic regression was used to screen severity-associated markers. Least Absolute Shrinkage and Selection Operator (LASSO) regression was performed for further feature selection. Assessment models were constructed using logistic regression, ridge regression, support vector machine, and random forest. The area under the receiver operating characteristic curve (AUROC) was used to evaluate the performance of the different models. Internal validation was performed using bootstrapping with 500 resamples in the training set, and external validation was performed in the validation set for each of the four models. Results: Age, comorbidity, fever, and 18 laboratory markers were associated with the severity of COVID-19 (all P values < 0.05). Using LASSO regression, eight markers were selected for model construction. The ridge regression model had the best performance, with AUROCs of 0.930 (95% CI, 0.914-0.943) and 0.827 (95% CI, 0.716-0.921) in the internal and external validations, respectively. A risk score based on the ridge regression model showed good discrimination in all patients, with an AUROC of 0.897 (95% CI, 0.845-0.940), and a well-fitted calibration curve. Using the optimal cutoff value of 71, the sensitivity and specificity were 87.1% and 78.1%, respectively.
A web-based assessment system was developed based on the risk score. Conclusions: Eight clinical markers (lactate dehydrogenase, C-reactive protein, albumin, comorbidity, electrolyte disturbance, coagulation function, eosinophil count, and lymphocyte count) were associated with the severity of COVID-19. An assessment model constructed with these eight markers would help clinicians evaluate, at admission, the likelihood that a patient will develop severe COVID-19 and take early treatment measures.


2020 ◽  
Vol 16 (8) ◽  
pp. 1088-1105
Author(s):  
Nafiseh Vahedi ◽  
Majid Mohammadhosseini ◽  
Mehdi Nekoei

Background: Poly(ADP-ribose) polymerases (PARPs) are a superfamily of nuclear enzymes present in eukaryotes. Methods: In the present report, linear and non-linear methods, including multiple linear regression (MLR), support vector machine (SVM), and artificial neural networks (ANN), were used to develop quantitative structure-activity relationship (QSAR) models capable of predicting pEC50 values of tetrahydropyridopyridazinone derivatives as effective PARP inhibitors. Principal component analysis (PCA) was used to divide the whole dataset rationally into training and test sets. A genetic algorithm (GA) variable selection method was employed to select, from the large pool of calculated descriptors, the optimal subset of descriptors contributing most significantly to the overall inhibitory activity. Results: The accuracy and predictability of the proposed models were further confirmed using cross-validation, validation on an external test set, and Y-randomization (chance correlation) approaches. Moreover, an exhaustive statistical comparison was performed on the outputs of the proposed models. The results revealed that the non-linear modeling approaches, SVM and ANN, provided considerably better predictive capability. Conclusion: Among the constructed models, the GA-SVM approach had the best predictive power in terms of root mean square error of prediction (RMSEP), cross-validation coefficients (Q²LOO and Q²LGO), and R² and F-statistic values for the training set. However, the GA-ANN model gave better statistical parameters for the test set than MLR and SVM.
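The QSAR validation metrics named above can be illustrated with a single-descriptor linear model. Q²LOO is computed as 1 − PRESS/SS_tot, where PRESS sums the squared errors of predicting each compound from a model refitted without it, and RMSEP is the root mean square error on an external test set. The one-descriptor least-squares model and the function names are simplifying assumptions for illustration; the study used GA-selected descriptors with MLR, SVM, and ANN.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (one descriptor)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def q2_loo(x, y):
    """Leave-one-out Q^2 = 1 - PRESS / SS_tot, refitting without each sample."""
    press = 0.0
    for i in range(len(x)):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        a, b = fit_line(xs, ys)
        press += (y[i] - (a + b * x[i])) ** 2
    my = sum(y) / len(y)
    return 1 - press / sum((yi - my) ** 2 for yi in y)


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on an external test set."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

Both `x` and `y` are expected to be lists, since the leave-one-out step slices and concatenates them.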

