Prediction of best features in heterogeneous Lung adenocarcinoma samples using Least Absolute Shrinking and Selection Operator

2019 ◽  
Author(s):  
Ateeq Muhammed Khaliq ◽  
RG Sharathchandra ◽  
Meenakshi Rajamohan

Abstract: This study aims to create a tumor heterogeneity-based model for predicting the best features of lung adenocarcinoma (LUAD) in multiple cancer subtypes using the Least Absolute Shrinkage and Selection Operator (LASSO). The RNA-Seq raw count data of 533 LUAD samples and 59 normal samples were downloaded from the TCGA data portal. Using consensus clustering, the samples were divided into two subtypes, and the clusters were validated using silhouette width. Furthermore, we estimated the abundance of immune and non-immune stromal cell populations infiltrating the cancer tissue in each subtype. We established a LASSO model to predict each subtype's best features. Pathway enrichment analysis was then carried out. Finally, the validity of the LASSO model for identifying features was assessed by survival analysis. Our study suggests that unsupervised clustering and machine learning methods such as LASSO-based feature selection can be used effectively to predict relevant genes that might play an essential role in cancer diagnosis.
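As a rough illustration of how LASSO performs feature selection, the sketch below minimises a squared-error loss with an L1 penalty by cyclic coordinate descent. It is not the authors' pipeline: the abstract does not name the software used, and the gene names, data values, and penalty `lam` here are hypothetical. The point is only that weakly associated features are shrunk to exactly zero and so drop out of the selected set.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # mean product of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                # Prediction from every feature except j.
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Hypothetical toy data: 4 samples x 3 "genes". The outcome depends strongly
# on gene_a, only weakly on gene_b, and not at all on gene_c.
genes = ["gene_a", "gene_b", "gene_c"]
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.05, 1.95, -1.95, -2.05]

beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = [g for g, b in zip(genes, beta) if b != 0.0]
print(selected)
```

In practice the penalty would be chosen by cross-validation, and an established implementation would be preferable to this hand-rolled loop.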


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Jie Zhu ◽  
Min Wang ◽  
Daixing Hu

Lung cancer is the most commonly diagnosed cancer and the leading cause of cancer-related death. Among lung cancers, lung adenocarcinoma (LUAD) accounts for most cases. Owing to advances in precision medicine based on molecular characterization, the treatment of LUAD has undergone significant changes, and the prognosis of LUAD has become more varied as a result. N6-methyladenosine (m6A) is the most predominant modification in mRNAs and has been a research hotspot in the field of oncology. Nevertheless, few studies have examined the correlations between m6A-related genes and prognosis in LUAD. Thus, we conducted a comprehensive analysis of m6A-related gene expression in LUAD patients based on The Cancer Genome Atlas (TCGA) database, focusing on its relationship with prognosis. Differential expression of the m6A-related genes between tumor and non-tumor tissues was confirmed. Furthermore, their relationship with prognosis was studied via Consensus Clustering Analysis, Principal Components Analysis (PCA), and Least Absolute Shrinkage and Selection Operator (LASSO) Regression. Based on the above analyses, an m6A-based signature to predict overall survival (OS) in LUAD was successfully established. Among the 479 cases, we found that most of the m6A-related genes were differentially expressed between tumor and non-tumor tissues. Six genes, HNRNPC, METTL3, YTHDC2, KIAA1429, ALKBH5, and YTHDF1, were screened to build a risk scoring signature, which was significantly associated with the clinical features of pathological stage (p<0.05), M stage (p<0.05), T stage (p<0.05), gender (p=0.04), and survival outcome (p=0.02). Multivariate Cox analysis indicated that the risk value could be used as an independent prognostic factor, suggesting that the m6A-related gene signature has predictive value. Its efficacy was also validated using data from the Gene Expression Omnibus (GEO) database.



2019 ◽  
Author(s):  
Ateeq Muhammed Khaliq ◽  
SharathChandra Rg ◽  
Meenakshi Rajamohan

Abstract Background: This study aimed to establish a Least Absolute Shrinkage and Selection Operator (LASSO) model based on tumor heterogeneity to predict the best features of lung squamous cell carcinoma (LUSC) in various cancer subtypes. Methods: The RNA-Seq data of 505 LUSC cancer samples were downloaded from the TCGA database. After the identification of differentially expressed genes (DEGs), the samples were divided into two subtypes using the consensus clustering method. The abundance of immune and non-immune stromal cell populations infiltrating the tissue was estimated for each subtype. A LASSO model was established to predict each subtype's best genes. Pathway enrichment analysis was then carried out. Finally, the validity of the LUSC model for identifying features was assessed by survival analysis. Results: 240 and 262 samples were clustered into the Subtype-1 and Subtype-2 groups, respectively. DEG analysis was performed on each subtype. After a standard cutoff was applied, 4586 genes were upregulated and 1495 were downregulated in subtype-1, and 5016 genes were upregulated and 3224 were downregulated in subtype-2. A LASSO model was established to predict the best features of each subtype; the 49 and 34 most relevant genes were selected in subtype-1 and subtype-2, respectively. Analysis of the abundance of tissue infiltrates distinguished the subtypes based on the expression pattern of immune infiltrates. Survival analysis showed that this model could effectively predict the best and distinct features in cancer subtypes. Discussion: This study suggests that unsupervised clustering and LASSO-based feature selection can be used effectively to predict relevant genes that might play an important role in cancer diagnosis.



Author(s):  
Enchong Zhang ◽  
Fujisawa Shiori ◽  
Oscar YongNan Mu ◽  
Jieqian He ◽  
Yuntian Ge ◽  
...  

Prostate cancer (PCa) is the most common malignant tumor affecting males worldwide. The substantial heterogeneity in PCa presents a major challenge with respect to molecular analyses, patient stratification, and treatment. Least absolute shrinkage and selection operator was used to select eight risk-CpG sites. Using an unsupervised clustering analysis, called consensus clustering, we found that patients with PCa could be divided into two subtypes (Methylation_H and Methylation_L) based on the DNA methylation status at these CpG sites. Differences in the epigenome, genome, transcriptome, disease status, immune cell composition, and function between the identified subtypes were explored using The Cancer Genome Atlas database. This analysis clearly revealed the risk characteristics of the Methylation_H subtype. Using a weighted correlation network analysis to select risk-related genes and least absolute shrinkage and selection operator, we constructed a prediction signature for prognosis based on the subtype classification. We further validated its effectiveness using four public datasets. The two novel PCa subtypes and risk predictive signature developed in this study may be effective indicators of prognosis.
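The idea behind consensus clustering can be sketched as follows: cluster many random subsamples, then record how often each pair of samples lands in the same cluster among the runs in which both were drawn. This is a minimal sketch under simplifying assumptions (one-dimensional data, a two-cluster k-means as the base algorithm, hypothetical methylation values); the abstract does not describe the authors' actual settings or software.

```python
import random


def two_means_1d(values, n_iter=20):
    """Split 1-D values into two clusters with k-means (k=2); return 0/1 labels."""
    lo, hi = min(values), max(values)  # deterministic initial centres
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        if group0:
            lo = sum(group0) / len(group0)
        if group1:
            hi = sum(group1) / len(group1)
    return labels


def consensus_matrix(values, n_resamples=50, fraction=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair shares a cluster."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]  # times i and j were co-clustered
    sampled = [[0] * n for _ in range(n)]   # times i and j were both drawn
    size = max(2, int(fraction * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = two_means_1d([values[i] for i in idx])
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


# Hypothetical methylation levels: four low and four high samples.
methylation = [0.10, 0.12, 0.15, 0.18, 0.80, 0.82, 0.85, 0.88]
cm = consensus_matrix(methylation)
print(cm[0][1], cm[0][7])
```

Consensus values near 1 within a group and near 0 between groups indicate a stable partition; intermediate values suggest the chosen number of clusters does not fit the data well.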



2018 ◽  
Vol 11 (S6) ◽  
Author(s):  
Yanglan Gan ◽  
Ning Li ◽  
Guobing Zou ◽  
Yongchang Xin ◽  
Jihong Guan


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Lan Mu ◽  
Ke Ding ◽  
Ranran Tu ◽  
Wei Yang

Abstract Background: Lung cancer is the most common cancer and cause of cancer-related mortality worldwide. Increasing evidence indicates a significant correlation between tumors and long non-coding RNAs (lncRNAs), as well as tumor immune infiltration, but their roles in early-stage lung adenocarcinoma (LUAD) are still unclear. Methods: Gene expression data and corresponding clinical data of early-stage LUAD patients were downloaded from the GEO and TCGA databases. Twenty-four types of tumor-infiltrating immune cells were analyzed by quantity analysis and univariate Cox regression analysis. We divided patients into two subgroups using consensus clustering, identified the differentially expressed genes (DEGs) between the subgroups, and then established a lncRNA risk signature by least absolute shrinkage and selection operator (LASSO) regression. Results: A total of 718 patients were enrolled in this study, including 246 from the GSE31210 dataset, 127 from the GSE50081 dataset and 345 from TCGA-LUAD. We identified Th2 cells, TFH cells, NK CD56dim cells and mast cells as prognosis-related (p < 0.05), then established a 5-lncRNA risk signature (risk score = 0.374600616 * LINC00857 + 0.173825706 * LINC01116 + (−0.021398903) * DRAIC + (−0.113658256) * LINC01140 + (−0.008403702) * XIST), and drew a nomogram showing that the signature had good prediction accuracy and discrimination. Conclusions: We identified 4 immune infiltrating cell types related to the prognosis of early-stage LUAD, and established a novel 5 immune-related lncRNA signature for predicting patients' prognosis.
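The risk score above is a weighted sum of lncRNA expression values, so it can be evaluated directly. The sketch below uses the five coefficients reported in the abstract; the patient expression values are hypothetical, and the abstract does not state how expression was normalised, so the numbers are illustrative only.

```python
# Coefficients as reported in the abstract's 5-lncRNA signature.
COEFFICIENTS = {
    "LINC00857": 0.374600616,
    "LINC01116": 0.173825706,
    "DRAIC": -0.021398903,
    "LINC01140": -0.113658256,
    "XIST": -0.008403702,
}


def risk_score(expression):
    """Weighted sum of expression values over the five signature lncRNAs."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical expression profile for one patient.
patient = {
    "LINC00857": 2.0,
    "LINC01116": 1.0,
    "DRAIC": 0.0,
    "LINC01140": 1.0,
    "XIST": 0.0,
}
print(round(risk_score(patient), 6))
```

Positive coefficients (LINC00857, LINC01116) raise the score as expression increases, while negative ones (DRAIC, LINC01140, XIST) lower it.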



2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Jing Xu ◽  
Xiangdong Liu ◽  
Qiming Dai

Abstract Background: Hypertrophic cardiomyopathy (HCM) represents one of the most common inherited heart diseases. To identify key molecules involved in the development of HCM, gene expression patterns of heart tissue samples from HCM patients, obtained from multiple microarray and RNA-Seq platforms, were investigated. Methods: The significant genes were obtained from the intersection of two gene sets: the differentially expressed genes (DEGs) identified within the microarray data and those identified within the RNA-Seq data. These genes were further ranked using the minimum-Redundancy Maximum-Relevance feature selection algorithm. Moreover, the genes were assessed by three different machine learning methods for classification: support vector machines, random forest and k-Nearest Neighbor. Results: Outstanding results were achieved by taking exclusively the top eight genes of the ranking into consideration. Since the eight genes were identified as candidate HCM hallmark genes, the interactions between them and known HCM disease genes were explored through the protein–protein interaction (PPI) network. Most candidate HCM hallmark genes were found to have direct or indirect interactions with known HCM disease genes in the PPI network, particularly the hub genes JAK2 and GADD45A. Conclusions: This study highlights the value of transcriptomic data integration, in combination with machine learning methods, in providing insight into the key hallmark genes in the genetic etiology of HCM.



2021 ◽  
Vol 28 ◽  
pp. 107327482098851
Author(s):  
Zeng-Hong Wu ◽  
Yun Tang ◽  
Yan Zhou

Background: Epigenetic changes are tightly linked to tumorigenesis and malignant transformation. However, DNA methylation occurs earlier and is constant during tumorigenesis. It plays an important role in controlling gene expression in cancer cells. Methods: In this study, we determined the prognostic value of molecular subtypes based on DNA methylation status in breast cancer samples obtained from The Cancer Genome Atlas (TCGA) database. Results: Seven clusters and 204 corresponding promoter genes were identified based on consensus clustering using 166 CpG sites that significantly influenced survival outcomes. The overall survival (OS) analysis showed a significant prognostic difference among the 7 groups (p<0.05). Finally, a prognostic model was used to predict the outcomes of patients in the testing set based on the DNA methylation subgroup classification derived from the training dataset. Conclusions: The model was found to be important in the identification of novel biomarkers and could help patients with different breast cancer subtypes in predicting prognosis, clinical diagnosis and management.
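Overall survival comparisons of this kind usually start from a survival curve per group. The sketch below is a minimal Kaplan-Meier estimator, offered as an assumption about the general approach rather than the authors' actual analysis; the follow-up times and event flags are hypothetical. Testing for a difference between groups (for example with a log-rank test) is a separate step not shown here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if death was observed at times[i], 0 if censored.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    survival = 1.0
    curve = []
    pos = 0
    while pos < len(order):
        t = times[order[pos]]
        deaths = 0
        leaving = 0
        # Gather every subject whose follow-up ends at time t.
        while pos < len(order) and times[order[pos]] == t:
            deaths += events[order[pos]]
            leaving += 1
            pos += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored subjects both leave the risk set
    return curve


# Hypothetical follow-up (months) for one methylation cluster.
times = [1, 2, 3, 4]
events = [1, 0, 1, 1]  # the subject at month 2 is censored
print(kaplan_meier(times, events))
```

Censored subjects contribute to the at-risk count up to their last follow-up but do not cause a step in the curve.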



Author(s):  
Hua‐Zhong Cai ◽  
Heteng Zhang ◽  
Jie Yang ◽  
Jian Zeng ◽  
Hao Wang


2021 ◽  
Author(s):  
Xue Wang ◽  
Yuetong Wang ◽  
Zhaoyuan Fang ◽  
Hua Wang ◽  
Jian Zhang ◽  
...  

Abstract: Somatic mutations of the chromatin remodeling gene ARID2 are observed in about 7% of human lung adenocarcinoma (LUAD). However, the role of ARID2 in the pathogenesis of LUAD remains largely unknown. Here we find that ARID2 expression is decreased during the malignant progression of both human and mouse LUAD. Using two KrasG12D-based genetically engineered murine models (GEMM), we demonstrate that ARID2 knockout significantly promotes lung cancer malignant progression and shortens overall survival. Consistently, ARID2 knockdown significantly promotes cell proliferation in human and mouse lung cancer cells. Through integrative analyses of ChIP-Seq and RNA-Seq data, we find that Hspa1a is up-regulated by Arid2 loss. Knockdown of Hspa1a specifically inhibits malignant progression of Arid2-deficient but not Arid2-wt lung cancers in both cell lines and animal models. Treatment with an Hspa1a inhibitor could significantly inhibit the malignant progression of lung cancer with Arid2 deficiency. Together, our findings establish ARID2 as an important tumor suppressor in LUAD with novel mechanistic insights, and further identify HSPA1A as a potential therapeutic target in ARID2-deficient LUAD.



2020 ◽  
Author(s):  
Rachana Garg ◽  
Mariana Cooke ◽  
Shaofei Wang ◽  
Fernando Benavides ◽  
Martin C. Abba ◽  
...  

Abstract: Non-small cell lung cancer (NSCLC), the most frequent subtype of lung cancer, remains a highly lethal malignancy and one of the leading causes of cancer deaths worldwide. Mutant KRAS is the prevailing oncogenic driver of lung adenocarcinoma, the most common histological form of NSCLC. In this study, we examined the role of PKCε, an oncogenic kinase highly expressed in NSCLC and other cancers, in KRAS-driven tumorigenesis. Notably, database analysis revealed an association between PKCε expression and poor outcome in lung adenocarcinoma patients, specifically those with KRAS mutation. By generating a PKCε-deficient, conditionally activatable allele of oncogenic Kras (LSL-KrasG12D;PKCε−/− mice), we were able to demonstrate the requirement of PKCε for Kras-driven lung tumorigenesis in vivo, which is consistent with the impaired transformed growth observed in PKCε-deficient KRAS-dependent NSCLC cells. Moreover, PKCε-knockout mice were found to be less susceptible to lung tumorigenesis induced by benzo[a]pyrene, a carcinogen that induces mutations in Kras. Mechanistic analysis using RNA-Seq revealed little overlap between PKCε and KRAS in the control of genes and biological pathways relevant in NSCLC, suggesting that a permissive role of PKCε in KRAS-driven lung tumorigenesis may involve non-redundant mechanisms. Our results thus highlight the relevance and potential of targeting PKCε for lung cancer therapeutics.


