Survival prediction for lung cancer patients by integrating clinical and molecular features using machine learning

Author(s):  
Rizwan Qureshi

The paper presents a model for survival prediction of lung cancer patients.

2021 ◽  
Author(s):  
Zhenhao Li

Tuberculosis (TB) is a precipitating cause of lung cancer. Lung cancer coexisting with TB is difficult to differentiate from isolated TB. The aim of this study is to develop a prediction model that distinguishes patients with both lung cancer and TB (the comorbidity group) from patients with TB alone. In this work, based on laboratory data from 389 patients, 81 features, including the main laboratory examinations (blood tests, biochemical tests, coagulation assays and tumor markers) and baseline information, were initially used as integrated markers and then reduced to a discrimination system consisting of 31 top-ranked indices. Patients diagnosed with TB by TB PCR >1mtb/ml served as negative samples; lung cancer patients with TB, confirmed by pathological examination and TB PCR >1mtb/ml, served as positive samples. We used the Spatially Uniform ReliefF (SURF) algorithm to determine feature importance and built the predictive model with the Random Forest machine learning algorithm. For cross-validation, the samples were randomly split into four training sets and one test set. The selected features are composed of four tumor markers (Scc, Cyfra21-1, CEA, ProGRP and NSE), fifteen blood biochemical indices (GLU, IBIL, K, CL, Ur, NA, TBA, CHOL, SA, TG, A/G, AST, CA, CREA and CRP), six routine blood indices (EO#, EO%, MCV, RDW-S, LY# and MPV) and four coagulation indices (APTT ratio, APTT, PTA, TT ratio). The model showed robust and stable classification performance, differentiating the comorbidity group from the isolated TB group with AUC, ACC, sensitivity and specificity of 0.8817, 0.8654, 0.8594 and 0.8656, respectively, for the training set. Overall, this work may provide a novel strategy for identifying TB patients with lung cancer from routine admission laboratory examinations, with the advantages of being timely and economical. It also indicates that our model with enough indices may further increase the effectiveness and efficiency of diagnosis.
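As an illustration of the evaluation described in this abstract, the following minimal sketch shows one way to produce a five-fold split (four folds for training, one for testing) and to compute AUC, ACC, sensitivity and specificity from predicted scores. The function names, the 0.5 decision threshold and the random seed are assumptions made for this example; the abstract does not specify how the authors implemented these steps, and the SURF and Random Forest stages are not reproduced here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs covering all samples.

    Each fold serves once as the test set while the other k - 1 folds
    form the training set (four training folds and one test fold for k=5).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # hypothetical fixed seed
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((sorted(train), sorted(folds[i])))
    return splits


def classification_metrics(y_true, y_score, threshold=0.5):
    """Compute AUC, accuracy, sensitivity and specificity.

    y_true holds binary labels (1 = positive, 0 = negative) and
    y_score holds predicted scores; both classes must be present.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # AUC: probability that a random positive scores above a random
    # negative, counting ties as one half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    auc = wins / (len(pos) * len(neg))
    tp = sum(1 for s in pos if s >= threshold)
    tn = sum(1 for s in neg if s < threshold)
    fn = len(pos) - tp
    fp = len(neg) - tn
    return {
        "auc": auc,
        "acc": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

In practice each training split would be used to rank features and fit the classifier, and the metrics would be computed on the scores predicted for the corresponding test split.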


Author(s):  
Ting Jin ◽  
Nam D Nguyen ◽  
Flaminia Talos ◽  
Daifeng Wang

Motivation: Gene expression and regulation, a key molecular mechanism driving human disease development, remains elusive, especially at early stages. Integrating the increasing amount of population-level genomic data and understanding gene regulatory mechanisms in disease development are still challenging. Machine learning has emerged to solve this, but many machine learning methods have been limited to building an accurate prediction model as a ‘black box’, providing little biological or clinical interpretability. Results: To address these challenges, we developed an interpretable and scalable machine learning model, ECMarker, to predict gene expression biomarkers for disease phenotypes and simultaneously reveal underlying regulatory mechanisms. In particular, ECMarker is built on the integration of semi- and discriminative-restricted Boltzmann machines, a neural network model for classification that allows lateral connections at the input gene layer. This interpretable model is scalable without needing any prior feature selection and enables directly modeling and prioritizing genes and revealing potential gene networks (from lateral connections) for the phenotypes. Applied to the gene expression data of non-small-cell lung cancer patients, ECMarker not only achieved a relatively high accuracy for predicting cancer stages but also identified biomarker genes and gene networks implying the regulatory mechanisms in lung cancer development. In addition, ECMarker demonstrates clinical interpretability, as its prioritized biomarker genes can predict survival rates of early lung cancer patients (P-value < 0.005). Finally, we identified a number of drugs currently in clinical use for late stages or other cancers with effects on these early lung cancer biomarkers, suggesting potential novel candidates for early cancer medicine.
Availability and implementation: ECMarker is open source as a general-purpose tool at https://github.com/daifengwanglab/ECMarker. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


2022 ◽  
Vol 11 ◽  
Author(s):  
Mingming Hu ◽  
Jinjing Tan ◽  
Zhentian Liu ◽  
Lifeng Li ◽  
Hongmei Zhang ◽  
...  

Background: Lung cancer in young patients, a small subgroup of lung cancer, has not been fully studied. Most previous studies focused on clinicopathological features, while studies of molecular characteristics remain few and limited. Here, we explore the characteristics of prognosis and variation in young lung cancer patients with NSCLC. Methods: A total of 5639 young lung cancer samples (NSCLC, age ≤40) were screened from SEER, and the same number of old samples (NSCLC, age ≥60) were selected by propensity score matching to evaluate the prognosis of the two groups. 165 treatment-naïve patients diagnosed with NSCLC were enrolled to explore the differences in molecular features between the two age groups. CCLE cell line expression data were used to verify the findings from the cohort of 165 patients. Results: The overall survival of the young lung cancer group was significantly better than that of the old group. Germline analysis showed a trend toward a higher incidence of germline alterations in the young group. The TMB of the young group was lower. The heterogeneity and evolutionary degrees of the young lung cancer group were also lower than those of the old group. The mutation spectra of the two groups differed, with LRP1B, SMARCA4, STK11, FAT2, RBM10, FANCM mutations and EGFR L858R more recurrent in the old group, and EML4-ALK fusions, BCL2L11 deletion polymorphism, EGFR 19DEL, 20IN more recurrent in the young group. For base substitutions, the young group showed a lower fraction of transversions. Further, we performed a pathway analysis and found the EGFR tyrosine kinase inhibitor resistance pathway enriched in the young lung cancer group, which was later validated in gene expression data. Conclusions: The molecular features of the young lung cancer group differed significantly. The young lung cancer group had a simpler alteration structure. Alteration spectra and base substitution types varied between the two groups, implying different pathogenesis. The young lung cancer group had more potential treatment choices. Although young lung cancer patients had better outcomes, adverse factors remained, suggesting that the young group still needs more caution in treatment choice and in monitoring after treatment to further improve prognosis.
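The transversion fraction mentioned in the results can be illustrated with a short sketch. It uses the standard definition (transitions are purine-to-purine or pyrimidine-to-pyrimidine changes, A<->G and C<->T; every other single-base change is a transversion). The function name and the input format of (reference, alternate) base pairs are assumptions for this example; the abstract does not describe the authors' actual pipeline.

```python
# Transitions swap a purine for a purine or a pyrimidine for a pyrimidine.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = {"A", "C", "G", "T"}


def transversion_fraction(substitutions):
    """Return the fraction of single-base substitutions that are transversions.

    substitutions is an iterable of (ref, alt) pairs such as ("C", "A").
    """
    pairs = [(ref.upper(), alt.upper()) for ref, alt in substitutions]
    for ref, alt in pairs:
        if ref not in BASES or alt not in BASES or ref == alt:
            raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if not pairs:
        raise ValueError("no substitutions given")
    transversions = sum(1 for pair in pairs if pair not in TRANSITIONS)
    return transversions / len(pairs)
```

Comparing this value between two cohorts, for example young versus old patients, gives the kind of contrast the abstract reports.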


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e20014-e20014
Author(s):  
Bo Cheng ◽  
Cong Wang ◽  
Xue Meng

Background: Nomograms are commonly used tools to estimate prognosis in oncology and medicine. We aimed to establish a nomogram with patients’ characteristics and all available hematological biomarkers for lung cancer patients. Methods: All indexes were cataloged according to clinical significance. Principal component analysis (PCA) was used to reduce the dimensions. Each component was transformed into a categorical variable based on cut-off values from the receiver operating characteristic (ROC) curve. Kaplan-Meier analysis with the log-rank test was used to evaluate the prognostic value of each component. Multivariate analysis was used to determine the promising prognostic biomarkers. Five components were entered into a predictive nomogram. The model was subjected to bootstrap internal validation and to external validation with a separate cohort from Shandong Cancer Hospital. The predictive accuracy and discriminative ability were measured by the concordance index (C index) and risk group stratification. Results: Two hundred thirty-six patients were retrospectively analyzed in this study, with 134 in the Discovery Group and 102 in the Validation Group. Forty-seven indexes were sorted into 8 subgroups, and 20 principal components were extracted for further survival analysis. In Cox regression analysis, five components were significant and were entered into the predictive nomogram. The calibration curves for the probability of 3- and 5-year overall survival (OS) showed optimal agreement between nomogram prediction and actual observation. The new scoring system based on the nomogram allowed significant distinction between survival curves within the respective tumor-node-metastasis (TNM) subgroups. Conclusions: A nomogram based on the clinical indexes was established for survival prediction of lung cancer patients, which can be used for treatment selection and clinical care decisions. PCA makes big data analysis feasible.
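The concordance index used above to measure discriminative ability can be sketched as follows. This is a simplified version of Harrell's C index that skips pairs with tied survival times; the function name and argument layout are assumptions for this example, and the abstract does not state which implementation the authors used.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C index for right-censored survival data.

    times: follow-up time per patient.
    events: 1 if the event (death) was observed, 0 if censored.
    risk_scores: model output where a higher score means higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as one half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.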

