ECMarker: interpretable machine learning model identifies gene expression biomarkers predicting clinical outcomes and reveals molecular mechanisms of human disease in early stages

Author(s):  
Ting Jin ◽  
Nam D Nguyen ◽  
Flaminia Talos ◽  
Daifeng Wang

Abstract Motivation: Gene expression and regulation, a key molecular mechanism driving human disease development, remains elusive, especially at early stages. Integrating the increasing amount of population-level genomic data and understanding gene regulatory mechanisms in disease development are still challenging. Machine learning has emerged to solve this, but many machine learning methods were typically limited to building an accurate prediction model as a ‘black box’, barely providing biological and clinical interpretability from the box. Results: To address these challenges, we developed an interpretable and scalable machine learning model, ECMarker, to predict gene expression biomarkers for disease phenotypes and simultaneously reveal underlying regulatory mechanisms. Particularly, ECMarker is built on the integration of semi- and discriminative-restricted Boltzmann machines, a neural network model for classification allowing lateral connections at the input gene layer. This interpretable model is scalable without needing any prior feature selection and enables directly modeling and prioritizing genes and revealing potential gene networks (from lateral connections) for the phenotypes. With application to the gene expression data of non-small-cell lung cancer patients, we found that ECMarker not only achieved a relatively high accuracy for predicting cancer stages but also identified the biomarker genes and gene networks implying the regulatory mechanisms in the lung cancer development. In addition, ECMarker demonstrates clinical interpretability as its prioritized biomarker genes can predict survival rates of early lung cancer patients (P-value < 0.005). Finally, we identified a number of drugs currently in clinical use for late stages or other cancers with effects on these early lung cancer biomarkers, suggesting potential novel candidates on early cancer medicine.
Availability and implementation: ECMarker is open source as a general-purpose tool at https://github.com/daifengwanglab/ECMarker. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
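The classification step described in this abstract can be illustrated with the textbook discriminative restricted Boltzmann machine, in which the class probability given an expression vector has a closed form. The following is a minimal sketch under that assumption, not ECMarker's own code: the function and parameter names (`drbm_class_probs`, `W`, `U`, `c`, `d`) are hypothetical, and the lateral gene-gene connections are omitted because, in this standard form, terms that depend only on the input cancel when computing p(y | x).

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def drbm_class_probs(x, W, U, c, d):
    """Class probabilities p(y | x) for a standard discriminative RBM.

    x: gene expression values (length n_genes)
    W: hidden-by-gene weights, W[j][i]      (hypothetical names)
    U: hidden-by-class weights, U[j][y]
    c: hidden biases, d: class biases
    """
    scores = []
    for y in range(len(d)):
        s = d[y]
        for j in range(len(c)):
            # Total input to hidden unit j when the class is y.
            act = c[j] + U[j][y] + sum(W[j][i] * x[i] for i in range(len(x)))
            s += softplus(act)
        scores.append(s)
    # Softmax over classes, shifted by the maximum for stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With all weights set to zero, the output reduces to a softmax over the class biases, which is a convenient sanity check before plugging in trained parameters.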

2019 ◽  
Author(s):  
Ting Jin ◽  
Nam D. Nguyen ◽  
Flaminia Talos ◽  
Daifeng Wang

Abstract Gene expression and regulation, a key molecular mechanism driving human disease development, remains elusive, especially at early stages. Integrating the increasing amount of population-level genomic data and understanding gene regulatory mechanisms in disease development are still challenging. Machine learning has emerged to solve this, but many machine learning methods were typically limited to building an accurate prediction model as a “black box”, barely providing biological and clinical interpretability from the box. To address these challenges, we developed an interpretable and scalable machine learning model, ECMarker, to predict gene expression biomarkers for disease phenotypes and simultaneously reveal underlying regulatory mechanisms. Particularly, ECMarker is built on the integration of semi- and discriminative-restricted Boltzmann machines, a neural network model for classification allowing lateral connections at the input gene layer. This interpretable model is scalable without needing any prior feature selection and enables directly modeling and prioritizing genes and revealing potential gene networks (from lateral connections) for the phenotypes. With application to the gene expression data of non-small cell lung cancer (NSCLC) patients, we found that ECMarker not only achieved a relatively high accuracy for predicting cancer stages but also identified the biomarker genes and gene networks implying the regulatory mechanisms in the lung cancer development. Additionally, ECMarker demonstrates clinical interpretability as its prioritized biomarker genes can predict survival rates of early lung cancer patients (p-value < 0.005). Finally, we identified a number of drugs currently in clinical use for late stages or other cancers with effects on these early lung cancer biomarkers, suggesting potential novel candidates on early cancer medicine. ECMarker is open source as a general-purpose tool at https://github.com/daifengwanglab/ECMarker.


2020 ◽  
Author(s):  
Zhiyu Wang ◽  
Jing Sun ◽  
Yi Sun ◽  
Yifeng Gu ◽  
Yongming Xu ◽  
...  

Abstract Background: As life expectancy increases for lung cancer patients who develop bone metastases, the need for personalized local treatment for bone metastases is expanding. Methods: Lung cancer patients with bone metastases were treated by a multidisciplinary team via surgery, percutaneous osteoplasty, or radiation. The pre- and post-treatment visual analog scale (VAS) and Quality of Life (QoL) scores were analyzed. QoL at 12 weeks was the main outcome. Treatment-related costs and overall survival time (OS) were collected. We used machine learning to develop and test models to predict which patients should receive local treatment. Model discrimination was evaluated by the area under the curve (AUC), and the best model was used for validation in clinical use. Results: Under the direction of a multidisciplinary team, 161 patients in the training set and 32 patients in the test set underwent local treatment. A decision tree model that included VAS score, bone metastases character, Frankel classification, Mirels score, age, driver gene, aldehyde dehydrogenase 2, and enolase 1 expression had the best AUC of 0.92 (95% CI 0.89 to 0.94), and 36 patients in a validation set underwent local treatment guided by the model. Improved QoL and VAS scores were observed at 12 weeks after local treatment in the training, test, and validation sets (p < 0.05), with no significant differences among the three datasets. There were no significant differences in mean costs among the three datasets in the four treatment groups. OS was 18.03±0.45 months and did not significantly differ among treatment groups or the three datasets. Conclusions: Local treatment not only had no negative influence on OS but also provided significant pain relief and improved QoL. QoL, OS, and costs did not significantly differ between patients whose treatment was guided by a multidisciplinary team and those whose treatment was guided by the machine learning model. Our machine learning model using clinical data can help guide clinicians to make local treatment decisions to improve patients’ QoL. Trial registration: No. ChiCRT-ROC-16009501
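The abstract above compares models by AUC. As a minimal sketch of what that number measures, the function below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The function name and the toy labels and scores are illustrative assumptions and do not come from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive cases, 0 for negative cases
    scores: model scores, higher meaning "more likely positive"
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): three of the four positive/negative
# pairs are ranked correctly.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of cases, which is fine for cohorts of a few hundred patients; a sort-based rank formula gives the same value faster on large datasets.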


2021 ◽  
Author(s):  
Zhenhao Li

Tuberculosis (TB) is a precipitating cause of lung cancer. Lung cancer coexisting with TB is difficult to differentiate from isolated TB. The aim of this study is to develop a prediction model that distinguishes the comorbidity (lung cancer with TB) from isolated TB. In this work, based on the laboratory data from 389 patients, 81 features, including main laboratory examination of blood test, biochemical test, coagulation assay, tumor markers and baseline information, were initially used as integrated markers and then reduced to form a discrimination system consisting of 31 top-ranked indices. Patients diagnosed with TB by TB PCR >1mtb/ml served as negative samples; lung cancer patients with TB, confirmed by pathological examination and TB PCR >1mtb/ml, served as positive samples. We used the Spatially Uniform ReliefF (SURF) algorithm to determine feature importance, and the predictive model was built using the Random Forest machine learning algorithm. For cross-validation, the samples were randomly split into four training sets and one test set. The selected features are composed of five tumor markers (Scc, Cyfra21-1, CEA, ProGRP and NSE), fifteen blood biochemical indices (GLU, IBIL, K, CL, Ur, NA, TBA, CHOL, SA, TG, A/G, AST, CA, CREA and CRP), six routine blood indices (EO#, EO%, MCV, RDW-S, LY# and MPV) and four coagulation indices (APTT ratio, APTT, PTA, TT ratio). This model presented a robust and stable classification performance, differentiating the comorbidity group from the isolated TB group with AUC, ACC, sensitivity and specificity of 0.8817, 0.8654, 0.8594 and 0.8656 for the training set, respectively. Overall, this work may provide a novel strategy for identifying TB patients with lung cancer from routine admission lab examination, with the advantages of being timely and economical. It also indicated that our model with enough indices may further increase the effectiveness and efficiency of diagnosis.
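The evaluation scheme in this abstract (a random split into four training parts and one test part, then accuracy, sensitivity and specificity) can be sketched as follows. This is an illustrative outline only: the helper names are hypothetical, the seed and shuffling strategy are assumptions, and the sketch leaves out the SURF feature ranking and the Random Forest model themselves.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; each fold serves once as the test set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # assumed: one random shuffle
    folds = [idx[i::k] for i in range(k)]     # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: fraction of true positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of true negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Splits for a cohort of 389 samples, the size reported in the abstract.
splits = list(k_fold_indices(389, k=5))
```

In practice the feature ranking would be recomputed inside each training split, so that the held-out part never influences which indices are selected.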


2020 ◽  
Vol 9 (3) ◽  
pp. 682-692
Author(s):  
Iris Kamer ◽  
Yael Steuerman ◽  
Inbal Daniel-Meshulam ◽  
Gili Perry ◽  
Shai Izraeli ◽  
...  

2009 ◽  
Vol 27 (15_suppl) ◽  
pp. e19072-e19072
Author(s):  
A. Irigoyen ◽  
C. Olmedo ◽  
J. Valdivia ◽  
A. Comino ◽  
C. Cano ◽  
...  

e19072 Background: The gene expression profile in peripheral blood samples from lung cancer patients is a potential predictor of treatment response. Methods: The study was carried out using 10 healthy volunteers as the control group and 10 lung cancer patients (stage IV). Written informed consent was obtained, and the protocol was approved by the local Clinical Research and Ethics Committee. Peripheral blood samples were obtained from lung cancer patients before (T0) and after treatment (T15d). RNA from peripheral blood samples was extracted and purified, selecting 28S/18S ratios >1.5, to obtain cDNA and cRNA for hybridization of the 20,000 genes included in Human 20K CodeLink. An array from each participant was obtained in duplicate. For each array, 2 μg of cRNA was compared to 2 μg of healthy cRNA. Significant genes were found using Significance Analysis of Microarrays, which uses repeated permutations of the data. Results: The selected genes were expressed >3-fold with a false discovery rate of 0.05. Before treatment (T0), when patients were compared to healthy volunteers, there was an increase in the expression of: histone 1 H4c, transforming growth factor beta 2, endothelial cell growth factor 1 (platelet-derived), glucose-6-phosphatase catalytic 2, Relaxin 3 receptor 1, Insulin-like growth factor binding protein 2, RAS-like family 11 member B, and ELK4. After treatment (T15d), when each lung cancer patient's results were compared to their own pre-treatment results (T0), there was an increase in the expression of: Bcl2, myosin light polypeptide 4, interferon alpha-inducible protein 27, interferon gamma receptor 1, RASSF5, ARHGEF6, IGFBP5, tumor protein p53 inducible nuclear protein 1, and peroxisome proliferative activated receptor gamma. Conclusions: The data presented identify biologically relevant over-expressed genes in lung cancer. A validation of these results and the analysis of the genes that identify patients who will respond positively to erlotinib treatment are being carried out. No significant financial relationships to disclose.

