Gene prediction in heterogeneous cancer tissues and establishment of Least Absolute Shrinking and Selection Operator model of lung squamous cell carcinoma

2019
Author(s):
Ateeq Muhammed Khaliq
SharathChandra Rg
Meenakshi Rajamohan

Abstract Background: This study aims to establish a Least Absolute Shrinkage and Selection Operator (LASSO) model based on tumor heterogeneity to predict the best features of lung squamous cell carcinoma (LUSC) in various cancer subtypes. Methods: RNA-Seq data of 505 LUSC samples were downloaded from the TCGA database. After differentially expressed genes (DEGs) were identified, the samples were divided into two subtypes by consensus clustering. The abundance of tissue-infiltrating immune and non-immune stromal cell populations was estimated for each subtype. A LASSO model was established to predict each subtype’s best genes. Pathway enrichment analysis was then carried out. Finally, the validity of the LUSC model for identifying features was assessed by survival analysis. Results: 240 and 262 samples were clustered into the Subtype-1 and Subtype-2 groups, respectively. DEG analysis was performed on each subtype. With a standard cutoff applied, 4586 genes were upregulated and 1495 were downregulated in subtype-1, and 5016 genes were upregulated and 3224 were downregulated in subtype-2. The LASSO model selected the 49 and 34 most relevant genes in subtype-1 and subtype-2, respectively. Analysis of tissue-infiltrate abundance distinguished the subtypes by the expression pattern of immune infiltrates. Survival analysis showed that this model could effectively predict the best and distinct features in cancer subtypes. Discussion: This study suggests that unsupervised clustering and LASSO model-based feature selection can be effectively used to predict relevant genes which might play an important role in cancer diagnosis.
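To make the feature-selection step concrete, the following is a minimal sketch of how an L1 (LASSO) penalty drives the coefficients of uninformative features to exactly zero. It is not the authors' pipeline: the coordinate-descent solver, the function names, the toy data, and the penalty value are all illustrative assumptions.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero; anything inside [-penalty, penalty] becomes 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + penalty*||b||_1 by cyclic coordinate descent.

    X is a list of rows (samples), y a list of centred responses.
    No intercept is fitted, so y is assumed to have mean zero.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            if norm == 0.0:
                beta[j] = 0.0
            else:
                beta[j] = soft_threshold(rho / n, penalty) / (norm / n)
    return beta


# Hypothetical toy data: three "genes", only the first one drives the response.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.1, -1.9, 1.9, -2.1]

beta = lasso_coordinate_descent(X, y, penalty=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", beta)
print("selected feature indices:", selected)
```

In practice the penalty strength would be chosen by cross-validation, and the features surviving with non-zero coefficients would form the candidate gene set.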


2019
Author(s):
Ateeq Muhammed Khaliq
RG Sharathchandra
Meenakshi Rajamohan

Abstract This study aims to create a tumor heterogeneity-based model for predicting the best features of lung adenocarcinoma (LUAD) in multiple cancer subtypes using the Least Absolute Shrinkage and Selection Operator (LASSO). The RNA-Seq raw count data of 533 LUAD samples and 59 normal samples were downloaded from the TCGA data portal. Using the consensus clustering method, samples were divided into two subtypes, and the clusters were validated using silhouette width. Furthermore, we estimated the abundance of immune and non-immune stromal cell populations infiltrating the cancer tissue in each subtype. We established the LASSO model for predicting each subtype’s best features. Pathway enrichment analysis was then carried out. Finally, the validity of the LASSO model for identifying features was assessed by survival analysis. Our study suggests that unsupervised clustering and machine learning methods such as LASSO model-based feature selection can be effectively used to predict relevant genes which might play an essential role in cancer diagnosis.
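The silhouette width used to validate the clusters can be illustrated with a short sketch. For each sample it compares the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), giving s = (b - a) / max(a, b). The Euclidean metric, the function name, and the toy points below are assumptions for illustration; the abstract does not state which distance the authors used.

```python
import math


def silhouette_widths(points, labels):
    """Return the silhouette width s = (b - a) / max(a, b) for every sample."""
    widths = []
    for i, (p, own_label) in enumerate(zip(points, labels)):
        # Collect distances from sample i to every other sample, grouped by cluster.
        dist_by_cluster = {}
        for j, (q, other_label) in enumerate(zip(points, labels)):
            if i != j:
                dist_by_cluster.setdefault(other_label, []).append(math.dist(p, q))
        own = dist_by_cluster.pop(own_label, [])
        if not own or not dist_by_cluster:
            widths.append(0.0)  # common convention for singleton or single-cluster cases
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dist_by_cluster.values())
        widths.append((b - a) / max(a, b))
    return widths


# Hypothetical toy samples forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = silhouette_widths(points, [0, 0, 1, 1])
bad = silhouette_widths(points, [0, 1, 0, 1])
print("mean width, sensible labels:", sum(good) / len(good))
print("mean width, scrambled labels:", sum(bad) / len(bad))
```

Widths near 1 indicate compact, well-separated clusters, while negative widths suggest samples assigned to the wrong cluster.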



Author(s):  
Enchong Zhang
Fujisawa Shiori
Oscar YongNan Mu
Jieqian He
Yuntian Ge
...  

Prostate cancer (PCa) is the most common malignant tumor affecting males worldwide. The substantial heterogeneity in PCa presents a major challenge with respect to molecular analyses, patient stratification, and treatment. Least absolute shrinkage and selection operator was used to select eight risk-CpG sites. Using an unsupervised clustering analysis, called consensus clustering, we found that patients with PCa could be divided into two subtypes (Methylation_H and Methylation_L) based on the DNA methylation status at these CpG sites. Differences in the epigenome, genome, transcriptome, disease status, immune cell composition, and function between the identified subtypes were explored using The Cancer Genome Atlas database. This analysis clearly revealed the risk characteristics of the Methylation_H subtype. Using a weighted correlation network analysis to select risk-related genes and least absolute shrinkage and selection operator, we constructed a prediction signature for prognosis based on the subtype classification. We further validated its effectiveness using four public datasets. The two novel PCa subtypes and risk predictive signature developed in this study may be effective indicators of prognosis.



2021
Vol 28
pp. 107327482098851
Author(s):
Zeng-Hong Wu
Yun Tang
Yan Zhou

Background: Epigenetic changes are tightly linked to tumorigenesis and malignant transformation. DNA methylation occurs early and is constant during tumorigenesis, and it plays an important role in controlling gene expression in cancer cells. Methods: In this study, we determined the prognostic value of molecular subtypes based on DNA methylation status in breast cancer samples obtained from The Cancer Genome Atlas (TCGA) database. Results: Seven clusters and 204 corresponding promoter genes were identified based on consensus clustering using 166 CpG sites that significantly influenced survival outcomes. The overall survival (OS) analysis showed a significant prognostic difference among the 7 groups (p<0.05). Finally, a prognostic model built from the DNA methylation subgroups of the training dataset was used to predict patient outcomes in the testing set. Conclusions: The model was found to be important in the identification of novel biomarkers and could help patients with different breast cancer subtypes in predicting prognosis, clinical diagnosis, and management.



2021
Author(s):
Xiaokai Yan
Chiying Xiao
Kunyan Yue
Min Chen
Hang Zhou

Abstract Background: Changes in the genome play a crucial role in carcinogenesis, and many biomarkers can be used as effective prognostic indicators in diverse tumors. Although many studies have constructed predictive models for hepatocellular carcinoma (HCC) based on molecular signatures, their performance is unsatisfactory. To address this shortcoming, we aimed to construct a novel and accurate prognostic model with multi-omics data to guide prognostic assessments of HCC. Methods: The TCGA training set was used to identify crucial biomarkers and construct single-omic prognostic models through difference analysis, univariate Cox, and LASSO/stepwise Cox analysis. The performance of the single-omic models was then evaluated and validated through survival analysis, Harrell’s concordance index (C-index), and receiver operating characteristic (ROC) curves in the TCGA test set and external cohorts. In addition, a comprehensive model based on multi-omics data was constructed via multiple Cox analysis, and its performance was evaluated in the TCGA training set and TCGA test set. Results: We identified 16 key mRNAs, 20 key lncRNAs, 5 key miRNAs, 5 key CNV genes, and 7 key SNPs which were significantly associated with the prognosis of HCC, and constructed 5 single-omic models which showed relatively good performance in prognostic prediction, with C-index ranging from 0.63 to 0.75 in the TCGA training set and test set. We also validated the mRNA model and the SNP model in two independent external datasets, respectively, and good discriminating abilities were observed through survival analysis (P < 0.05). Moreover, the multi-omics model based on mRNA, lncRNA, miRNA, CNV, and SNP information showed strong predictive ability, with a C-index over 0.80 and all AUC values at 1, 3, and 5 years above 0.84. Conclusion: In this study, we identified many biomarkers that may help study underlying carcinogenesis mechanisms in HCC, and constructed five single-omic models and an integrated multi-omics model that may provide effective and reliable guides for prognosis assessment and treatment decision-making.
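Harrell's concordance index, the main performance measure reported above, can be sketched in a few lines. It is the fraction of comparable patient pairs in which the patient with the higher predicted risk experiences the event first, with ties in risk counted as half. The function name and the toy survival data below are hypothetical and are not taken from the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: follow-up times; events: 1 if the event was observed, 0 if censored;
    risk_scores: model output where a higher score means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risk_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported range of 0.63 to 0.75 should be read.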



2019
Vol 21 (5)
pp. 1818-1824
Author(s):
Qi Zhao
Yu Sun
Zekun Liu
Hongwan Zhang
Xingyang Li
...  

Abstract Unsupervised clustering of high-throughput gene expression data is widely adopted for cancer subtyping. However, cancer subtypes derived from a single dataset are usually not applicable across multiple datasets from different platforms. Merging different datasets is necessary to determine accurate and applicable cancer subtypes but remains difficult because of batch effects. CrossICC is an R package designed for the unsupervised clustering of gene expression data from multiple datasets/platforms without the requirement of batch effect adjustment. CrossICC utilizes an iterative strategy to derive the optimal gene signature and cluster numbers from a consensus similarity matrix generated by consensus clustering. The package also provides functions to visualize the identified subtypes and evaluate subtyping performance. We expect that CrossICC can be used to discover robust cancer subtypes with significant translational implications for personalized care of cancer patients. Availability and Implementation: The package is implemented in R and available at GitHub (https://github.com/bioinformatist/CrossICC) and Bioconductor (http://bioconductor.org/packages/release/bioc/html/CrossICC.html) under the GPL v3 License.
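The consensus similarity matrix at the heart of consensus clustering can be sketched as follows. Each entry records, for a pair of samples, the fraction of resampling runs in which both were drawn and landed in the same cluster. This is a generic illustration in Python, not CrossICC's R implementation; the function name and the hand-written run labels are assumptions, and in a real analysis each run would come from clustering a random subsample.

```python
from itertools import combinations


def consensus_matrix(runs, n_samples):
    """Build a consensus matrix from repeated clusterings of subsamples.

    runs: list of dicts mapping sample index -> cluster label, where each dict
    contains only the samples drawn in that resampling run.
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    matrix = []
    for i in range(n_samples):
        row = []
        for j in range(n_samples):
            if i == j:
                row.append(1.0)
            elif sampled[i][j]:
                row.append(together[i][j] / sampled[i][j])
            else:
                row.append(0.0)  # pair never drawn together
        matrix.append(row)
    return matrix


# Hypothetical results of four resampling runs over four samples.
# Label values are arbitrary per run; only co-membership matters.
runs = [
    {0: 0, 1: 0, 2: 1},
    {0: 1, 1: 1, 3: 0},
    {1: 0, 2: 1, 3: 1},
    {0: 0, 2: 1, 3: 1},
]
for row in consensus_matrix(runs, 4):
    print(row)
```

Entries close to 0 or 1 indicate stable assignments, so a clean block structure in this matrix is the usual evidence for choosing a given number of clusters.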



2020
Vol 2020
pp. 1-13
Author(s):
Jie Zhu
Min Wang
Daixing Hu

Lung cancer is the most commonly diagnosed cancer and the leading cause of cancer-related death, and lung adenocarcinoma (LUAD) accounts for most cases. Advances in precision medicine based on molecular characterization have significantly changed the treatment of LUAD, and its prognosis has become more diverse as a result. N6-methyladenosine (m6A) is the most predominant modification in mRNAs and has been a research hotspot in oncology. Nevertheless, few studies have examined the correlations between m6A-related genes and prognosis in LUAD. Thus, we conducted a comprehensive analysis of m6A-related gene expression in LUAD patients based on The Cancer Genome Atlas (TCGA) database and examined its relationship with prognosis. Differential expression of the m6A-related genes between tumor and non-tumor tissues was confirmed. Furthermore, their relationship with prognosis was studied via consensus clustering analysis, principal component analysis (PCA), and Least Absolute Shrinkage and Selection Operator (LASSO) regression. Based on these analyses, an m6A-based signature to predict overall survival (OS) in LUAD was established. Among the 479 cases, we found that most of the m6A-related genes were differentially expressed between tumor and non-tumor tissues. Six genes (HNRNPC, METTL3, YTHDC2, KIAA1429, ALKBH5, and YTHDF1) were screened to build a risk scoring signature, which was strongly related to the clinical features of pathological stage (p<0.05), M stage (p<0.05), T stage (p<0.05), gender (p=0.04), and survival outcome (p=0.02). Multivariate Cox analysis indicated that the risk value could be used as an independent prognostic factor, suggesting that the m6A-related gene signature has great predictive value. Its efficacy was also validated with data from the Gene Expression Omnibus (GEO) database.
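A risk scoring signature of this kind is typically a weighted sum of gene expression values, with the weights taken from the fitted LASSO Cox model. The sketch below shows the arithmetic only. The six gene names come from the abstract, but the coefficients, the patient expression values, and the median cutoff for the high/low split are hypothetical placeholders, since the abstract reports none of them.

```python
from statistics import median

# Gene names are from the abstract; the coefficient values are made up
# for illustration and are NOT the published signature.
HYPOTHETICAL_COEFFICIENTS = {
    "HNRNPC": 0.5,
    "METTL3": -0.25,
    "YTHDC2": -0.5,
    "KIAA1429": 0.25,
    "ALKBH5": 0.125,
    "YTHDF1": 0.25,
}


def risk_score(expression, coefficients):
    """Linear risk score: sum over signature genes of coefficient * expression."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the cohort median (assumed cutoff)."""
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Hypothetical expression profiles for three patients.
genes = list(HYPOTHETICAL_COEFFICIENTS)
patients = {
    "P1": {g: 2.0 for g in genes},
    "P2": {g: 0.0 for g in genes},
    "P3": {**{g: 0.0 for g in genes}, "HNRNPC": 4.0},
}
scores = {pid: risk_score(expr, HYPOTHETICAL_COEFFICIENTS) for pid, expr in patients.items()}
print(scores)
print(stratify_by_median(scores))
```

The resulting groups are what a Kaplan-Meier comparison or a multivariate Cox model would then take as input.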



2019
Vol 35 (14)
pp. i484-i491
Author(s):
Jakob Richter
Katrin Madjar
Jörg Rahnenführer

Abstract Motivation: Obtaining a reliable prediction model for a specific cancer subgroup or cohort is often difficult due to limited sample size and, in survival analysis, due to potentially high censoring rates. Sometimes similar data from other patient subgroups are available, e.g. from other clinical centers. Simple pooling of all subgroups can decrease the variance of the predicted parameters of the prediction models, but can also increase the bias due to heterogeneity between the cohorts. A promising compromise is to identify those subgroups with a similar relationship between covariates and target variable and then include only these for model building. Results: We propose a subgroup-based weighted likelihood approach for survival prediction with high-dimensional genetic covariates. When predicting survival for a specific subgroup, for every other subgroup an individual weight determines the strength with which its observations enter into model building. MBO (model-based optimization) can be used to quickly find a good prediction model in the presence of a large number of hyperparameters. We use MBO to identify the best model for survival prediction of a specific subgroup by optimizing the weights for additional subgroups for a Cox model. The approach is evaluated on a set of lung cancer cohorts with gene expression measurements. The resulting models have competitive prediction quality, and they reflect the similarity of the corresponding cancer subgroups, with some weights close to 0, some close to 1, and some in between. Availability and implementation: mlrMBO is implemented as an R-package and is freely available at http://github.com/mlr-org/mlrMBO.
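The idea of letting observations from other subgroups enter the model with reduced strength can be illustrated with a weighted Cox partial log-likelihood. The sketch below uses the standard observation-weighted form, in which each event contributes w_i * (x_i'β - log Σ_j w_j exp(x_j'β)) over its risk set. Whether this matches the authors' exact formulation is an assumption, and the function name, data, and weights are hypothetical.

```python
import math


def weighted_cox_loglik(beta, times, events, covariates, weights):
    """Observation-weighted Cox partial log-likelihood (no special handling of tied times).

    weights[i] would typically be 1 for the target subgroup and a value in [0, 1]
    shared by all observations of another subgroup.
    """
    def linear_predictor(x):
        return sum(b * v for b, v in zip(beta, x))

    loglik = 0.0
    for i, (t_i, event_i) in enumerate(zip(times, events)):
        if not event_i or weights[i] == 0:
            continue  # censored or zero-weight observations add no event term
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(
            weights[j] * math.exp(linear_predictor(covariates[j]))
            for j in range(len(times))
            if times[j] >= t_i
        )
        loglik += weights[i] * (linear_predictor(covariates[i]) - math.log(risk_sum))
    return loglik


# Hypothetical data: observations 0 and 2 belong to the target subgroup,
# observation 1 comes from another subgroup and is down-weighted.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
covariates = [[0.5], [1.0], [-1.0]]
for w in (1.0, 0.5, 0.0):
    print(w, weighted_cox_loglik([0.3], times, events, covariates, [1.0, w, 1.0]))
```

Setting a subgroup's weight to 0 reproduces a model fitted without it, and setting it to 1 reproduces simple pooling, so the weights interpolate between the two extremes described in the abstract.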



2022
Author(s):
Xiaokai Yan
Chiying Xiao
Kunyan Yue
Min Chen
Hang Zhou
...  

Abstract Background: Changes in the genome play a crucial role in carcinogenesis, and many biomarkers can be used as effective prognostic indicators in diverse tumors. Although many studies have constructed predictive models for hepatocellular carcinoma (HCC) based on molecular signatures, their performance is unsatisfactory. To address this shortcoming, we aimed to construct a novel and accurate prognostic model with multi-omics data to guide prognostic assessments of HCC. Methods: The TCGA training set was used to identify crucial biomarkers and construct single-omic prognostic models through difference analysis, univariate Cox, and LASSO/stepwise Cox analysis. The performance of the single-omic models was then evaluated and validated through survival analysis, Harrell’s concordance index (C-index), and receiver operating characteristic (ROC) curves in the TCGA test set and external cohorts. In addition, a comprehensive model based on multi-omics data was constructed via multiple Cox analysis, and its performance was evaluated in the TCGA training set and TCGA test set. Results: We identified 16 key mRNAs, 20 key lncRNAs, 5 key miRNAs, 5 key CNV genes, and 7 key SNPs which were significantly associated with the prognosis of HCC, and constructed 5 single-omic models which showed relatively good performance in prognostic prediction, with C-index ranging from 0.63 to 0.75 in the TCGA training set and test set. We also validated the mRNA model and the SNP model in two independent external datasets, respectively, and good discriminating abilities were observed through survival analysis (P < 0.05). Moreover, the multi-omics model based on mRNA, lncRNA, miRNA, CNV, and SNP information showed strong predictive ability, with a C-index over 0.80 and all AUC values at 1, 3, and 5 years above 0.84. Conclusion: In this study, we identified many biomarkers that may help study underlying carcinogenesis mechanisms in HCC, and constructed five single-omic models and an integrated multi-omics model that may provide effective and reliable guides for prognosis assessment and treatment decision-making.


