The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data

2019 ◽  
Author(s):  
Laurence de Torrenté ◽  
Samuel Zimmerman ◽  
Masako Suzuki ◽  
Maximilian Christopeit ◽  
John M. Greally ◽  
...  

Abstract In genomics, we often impose the assumption that gene expression data follow a specific distribution. However, rarely do we stop to question this assumption or consider its applicability to all genes in the transcriptome. Our study investigated the prevalence of genes with expression distributions that are non-Normal in three different tumor types from the Cancer Genome Atlas (TCGA). Surprisingly, less than 50% of all genes were Normally-distributed, with other distributions, including Gamma, Bimodal, Cauchy, and Lognormal, also represented. Relevant information about cancer biology was captured by the genes with non-Normal gene expression. When used for classification, the set of non-Normal genes was able to discriminate between cancer patients with poor versus good survival status. Our results highlight the value of studying a gene’s distribution shape to model heterogeneity of transcriptomic data. These insights would have been overlooked when using standard approaches that assume all genes follow the same type of distribution in a patient cohort.

2020 ◽  
Vol 21 (S21) ◽  
Author(s):  
Laurence de Torrenté ◽  
Samuel Zimmerman ◽  
Masako Suzuki ◽  
Maximilian Christopeit ◽  
John M. Greally ◽  
...  

Abstract Background In genomics, we often assume that continuous data, such as gene expression, follow a specific kind of distribution. However, we rarely stop to question the validity of this assumption, or consider how broadly applicable it may be to all genes that are in the transcriptome. Our study investigated the prevalence of a range of gene expression distributions in three different tumor types from the Cancer Genome Atlas (TCGA). Results Surprisingly, the expression of less than 50% of all genes was Normally-distributed, with other distributions including Gamma, Bimodal, Cauchy, and Lognormal also represented. Most of the distribution categories contained genes that were significantly enriched for unique biological processes. Different assumptions based on the shape of the expression profile were used to identify genes that could discriminate between patients with good versus poor survival. The prognostic marker genes that were identified when the shape of the distribution was accounted for reflected functional insights into cancer biology that were not observed when standard assumptions were applied. We showed that when multiple types of distributions were permitted, i.e. the shape of the expression profile was used, the statistical classifiers had greater predictive accuracy for determining the prognosis of a patient versus those that assumed only one type of gene expression distribution. Conclusions Our results highlight the value of studying a gene’s distribution shape to model heterogeneity of transcriptomic data and the impact of using analyses that permit more than one type of gene expression distribution. These insights would have been overlooked when using standard approaches that assume all genes follow the same type of distribution in a patient cohort.
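
The abstract does not say how each gene was assigned to a distribution category, so the following is only a minimal sketch of one common approach: fit candidate distributions by maximum likelihood and keep the one with the lowest Bayesian information criterion (BIC). It covers only the Normal and Lognormal cases, which have closed-form fits; the Gamma, Cauchy, and Bimodal categories named above are omitted. All function names are hypothetical, and the simulated values are toy data, not TCGA measurements.

```python
import math
import random


def normal_loglik(xs):
    """Log-likelihood of xs under a Normal fitted by maximum likelihood."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # MLE variance (divides by n)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def lognormal_loglik(xs):
    """Log-likelihood of xs under a Lognormal fitted by maximum likelihood."""
    logs = [math.log(x) for x in xs]
    # Density of X equals the Normal density of log(X) divided by x.
    return normal_loglik(logs) - sum(logs)


def bic(loglik, n_params, n):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n) - 2 * loglik


def classify_shape(xs):
    """Return the candidate distribution with the lowest BIC (assumed rule)."""
    n = len(xs)
    scores = {"Normal": bic(normal_loglik(xs), 2, n)}
    if min(xs) > 0:  # the Lognormal is only defined for positive values
        scores["Lognormal"] = bic(lognormal_loglik(xs), 2, n)
    return min(scores, key=scores.get)


def simulate_skewed_expression(n, seed=0):
    """Toy data only: draws from a Lognormal, not real expression values."""
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    print(classify_shape(simulate_skewed_expression(500)))
```

Because both candidates here have two parameters, the BIC comparison reduces to comparing log-likelihoods; the penalty term would matter once distributions with more parameters, such as a two-component mixture for bimodal genes, are added.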


2008 ◽  
Vol 36 (4) ◽  
pp. 708-711 ◽  
Author(s):  
Laura Smith

Post-transcriptional regulation, via 5′-UTRs (5′-untranslated regions), plays an important role in the control of eukaryotic gene expression. Recent analyses of the mammalian transcriptome suggest that most genes express multiple alternative 5′-UTRs, and inappropriate expression of these regions has been shown to contribute to carcinogenesis. The present review will focus on the complex post-transcriptional regulation of ERβ (oestrogen receptor β) expression. In particular, results from our laboratory suggest that the expression of alternative 5′-UTRs plays a key role in determining the level of ERβ protein expression. We have also shown that these alternative ERβ 5′-UTRs have a tissue-specific distribution and are differentially expressed between various normal and tumour tissues. Our results also suggest that alternative 5′-UTRs can influence downstream splicing events, thereby perhaps affecting ERβ function. These results suggest that alternative 5′-UTRs may have an overall influence on ER activity and this may have important implications for our understanding of cancer biology and treatment.


Author(s):  
Margarita Kirienko ◽  
Martina Sollini ◽  
Marinella Corbetta ◽  
Emanuele Voulaz ◽  
Noemi Gozzi ◽  
...  

Abstract Objective The objectives of our study were to assess the association of radiomic and genomic data with histology and patient outcome in non-small cell lung cancer (NSCLC). Methods In this retrospective single-centre observational study, we selected 151 surgically treated patients with adenocarcinoma or squamous cell carcinoma who underwent baseline [18F]FDG PET/CT. A subgroup of patients with cancer tissue samples at the Institutional Biobank (n = 74/151) was included in the genomic analysis. Features were extracted from both PET and CT images using an in-house tool. The genomic analysis included detection of genetic variants, fusion transcripts, and gene expression. Generalised linear model (GLM) and machine learning (ML) algorithms were used to predict histology and tumour recurrence. Results Standardised uptake value (SUV) and kurtosis (among the PET and CT radiomic features, respectively), and the expression of TP63, EPHA10, FBN2, and IL1RAP were associated with the histotype. No correlation was found between radiomic features/genomic data and relapse using GLM. The ML approach identified several radiomic/genomic rules to predict the histotype successfully. The ML approach showed a modest ability of PET radiomic features to predict relapse, while it identified a robust gene expression signature able to predict patient relapse correctly. The best-performing ML radiogenomic rule predicting the outcome resulted in an area under the curve (AUC) of 0.87. Conclusions Radiogenomic data may provide clinically relevant information in NSCLC patients regarding the histotype, aggressiveness, and progression. Gene expression analysis showed potential new biomarkers and targets valuable for patient management and treatment. The application of ML can increase the efficacy of radiogenomic analysis and provides novel insights into cancer biology.


Genes ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 854
Author(s):  
Yishu Wang ◽  
Lingyun Xu ◽  
Dongmei Ai

DNA methylation is an important regulator of gene expression that can influence tumor heterogeneity and shows weak and varying expression levels among different genes. Gastric cancer (GC) is a highly heterogeneous cancer of the digestive system with a high mortality rate worldwide. The heterogeneous subtypes of GC lead to different prognoses. In this study, we explored the relationships between DNA methylation and gene expression levels by introducing a sparse low-rank regression model based on a GC dataset with 375 tumor samples and 32 normal samples from The Cancer Genome Atlas database. Differences in the DNA methylation levels and sites were found to be associated with differences in the expressed genes related to GC development. Overall, 29 methylation-driven genes were found to be related to the GC subtypes, and in the prognostic model, we explored five prognoses related to the methylation sites. Finally, based on a low-rank matrix, seven subgroups were identified with different methylation statuses. These specific classifications based on DNA methylation levels may help to account for heterogeneity and aid in personalized treatments.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Ji Li ◽  
Chen Zhu ◽  
Peipei Yue ◽  
Tianyu Zheng ◽  
Yan Li ◽  
...  

Abstract Background Abnormal energy metabolism is one of the characteristics of tumor cells, and it has been a research hotspot in recent years. Due to the complexity of digestive system structure, tumors there are relatively frequent. We aim to clarify the prognostic significance of energy metabolism in digestive system tumors and the underlying mechanisms. Methods The gene set variation analysis (GSVA) R package was used to establish the metabolic score, and the score was used to represent the metabolic level. The relationship between the metabolism and prognosis of digestive system tumors was explored using the Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Volcano plots and gene ontology (GO) analysis were used to show different genes and different functions enriched between different glycolysis levels, and GSEA was used to analyze the pathway enrichment. A nomogram was constructed with an R package based on gene characteristics and clinical parameters. qPCR and Western blot were applied to analyze gene expression. All statistical analyses were conducted using SPSS, GraphPad Prism 7, and R software. All validation experiments were performed three times independently. Results A high glycolysis metabolism score was significantly associated with poor prognosis in pancreatic adenocarcinoma (PAAD) and liver hepatocellular carcinoma (LIHC). The STAT3 (signal transducer and activator of transcription 3) and YAP1 (Yes1-associated transcriptional regulator) pathways were the most critical signaling pathways in glycolysis modulation in PAAD and LIHC, respectively. Interestingly, elevated glycolysis levels could also enhance STAT3 and YAP1 activity in PAAD and LIHC cells, respectively, forming a positive feedback loop. Conclusions Our results may provide new insights into the indispensable role of glycolysis metabolism in digestive system tumors and guide the direction of future metabolism–signaling target combined therapy.


2021 ◽  
Author(s):  
Otília Menyhárt ◽  
János Tibor Fekete ◽  
Balázs Győrffy

Abstract Despite advances in molecular characterization of glioblastoma multiforme (GBM), only a handful of predictive biomarkers exist with limited clinical relevance. We aimed to identify differentially expressed genes in tumor samples collected at surgery associated with response to subsequent treatment, including temozolomide (TMZ) and nitrosoureas. Gene expression was collected from multiple independent datasets. Patients were categorized as responders/nonresponders based on their survival status at 16 months post-surgery. For each gene, the expression was compared between responders and nonresponders with a Mann-Whitney U test and receiver operating characteristic. The package "roc" was used to calculate the area under the curve (AUC). The integrated database comprises 454 GBM patients from three independent datasets and 10,103 genes. The highest proportion of responders (68%) were among patients treated with TMZ combined with nitrosoureas, where FCGR2B upregulation provided the strongest predictive value (AUC=0.72, p < 0.001). Elevated expression of CSTA and MRPS17 was associated with a lack of response to multiple treatment strategies. DLL3 upregulation was present in subsequent responders to any treatment combination containing TMZ. Three genes (PLSCR1, MX1, and MDM2) upregulated both in the younger cohort and in patients expressing low MGMT delineate a subset of patients with worse prognosis within a population generally associated with a favorable outcome. The identified transcriptomic changes provide biomarkers of responsiveness, offer avenues for preclinical studies, and may enhance future GBM patient stratifications. The described methodology provides a reliable pipeline for the initial testing of potential biomarker candidates for future validation studies.
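
The per-gene comparison described above pairs a Mann-Whitney U test with a receiver operating characteristic; the two are linked because the area under the curve equals U divided by the number of responder/nonresponder pairs. The sketch below illustrates only that relationship in plain Python. It is not the authors' pipeline, it computes no p-value, and the function names and example values are hypothetical.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: counts pairs where a > b, ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def auc_from_u(group_a, group_b):
    """AUC for separating group_a (higher values) from group_b."""
    return mann_whitney_u(group_a, group_b) / (len(group_a) * len(group_b))


if __name__ == "__main__":
    # Made-up expression values for a single gene, for illustration only.
    responders = [5.0, 6.0, 7.0]
    nonresponders = [1.0, 2.0, 6.0]
    print(mann_whitney_u(responders, nonresponders))  # 7.5
    print(auc_from_u(responders, nonresponders))
```

An AUC of 0.5 means the gene does not separate the two groups, and values below 0.5 indicate that the gene is higher in nonresponders. The pairwise loop is quadratic in sample size, which is acceptable for a sketch but would normally be replaced by a rank-based computation.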


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Wanting Song ◽  
Yi Bai ◽  
Jialin Zhu ◽  
Fanxin Zeng ◽  
Chunmeng Yang ◽  
...  

Abstract Background Gastric cancer (GC) represents a major malignancy and is the third deadliest cancer globally. Several lines of evidence indicate that the epithelial-mesenchymal transition (EMT) has a critical function in the development of gastric cancer. Although many molecular biomarkers have been identified, a precise risk model is still necessary to help doctors determine patient prognosis in GC. Methods Gene expression data and clinical information for GC were acquired from The Cancer Genome Atlas (TCGA) database and 200 EMT-related genes (ERGs) from the Molecular Signatures Database (MSigDB). Then, ERGs correlated with patient prognosis in GC were assessed by univariable and multivariable Cox regression analyses. Next, a risk score formula was established for evaluating patient outcome in GC and validated by survival and ROC curves. In addition, Kaplan-Meier curves were generated to assess the associations of the clinicopathological data with prognosis. A cohort from the Gene Expression Omnibus (GEO) database was used for validation. Results Six EMT-related genes, including CDH6, COL5A2, ITGAV, MATN3, PLOD2, and POSTN, were identified. Based on the risk model, GC patients were assigned to the high- and low-risk groups. The results revealed that the model had good performance in predicting patient prognosis in GC. Conclusions We constructed a prognosis risk model for GC. Then, we verified the performance of the model, which may help doctors predict patient prognosis.
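
The abstract gives neither the risk score formula nor the cutoff used to form the two groups. A common construction, assumed here, is a weighted sum of expression values with Cox regression coefficients as weights, split at the cohort median. The sketch below uses placeholder gene names and made-up coefficients that do not come from the study.

```python
import statistics


def risk_score(expression, coefficients):
    """Weighted sum of expression values: sum of coefficient * expression."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def assign_risk_groups(cohort, coefficients):
    """Label each patient 'high' or 'low' against the median score (assumed cutoff)."""
    scores = {pid: risk_score(expr, coefficients) for pid, expr in cohort.items()}
    cutoff = statistics.median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


if __name__ == "__main__":
    # Placeholder coefficients and expression values, for illustration only.
    coefficients = {"GENE_A": 0.5, "GENE_B": -0.2}
    cohort = {
        "p1": {"GENE_A": 2.0, "GENE_B": 1.0},
        "p2": {"GENE_A": 1.0, "GENE_B": 5.0},
        "p3": {"GENE_A": 4.0, "GENE_B": 0.0},
    }
    print(assign_risk_groups(cohort, coefficients))
```

With a strict `>` comparison, a patient whose score equals the median falls into the low-risk group; other conventions, or a cutoff chosen from a ROC curve, are equally plausible given what the abstract states.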


Cancers ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 158
Author(s):  
Valentina Condelli ◽  
Giovanni Calice ◽  
Alessandra Cassano ◽  
Michele Basso ◽  
Maria Grazia Rodriquenz ◽  
...  

Epigenetics is involved in tumor progression and drug resistance in human colorectal carcinoma (CRC). This study addressed the hypothesis that DNA methylation profiling may predict the clinical behavior of metastatic CRCs (mCRCs). The global methylation profile of two human mCRC subgroups with significantly different outcomes was analyzed and compared with gene expression and methylation data from The Cancer Genome Atlas COlon ADenocarcinoma (TCGA COAD) and the NCBI Gene Expression Omnibus repository (GEO) GSE48684 mCRC datasets to identify a prognostic signature of functionally methylated genes. A novel epigenetic signature of eight hypermethylated genes was characterized that was able to identify mCRCs with poor prognosis, which had a CpG-island methylator phenotype (CIMP)-high and microsatellite instability (MSI)-like phenotype. Interestingly, methylation events were enriched in genes located on the q-arm of chromosomes 13 and 20, two chromosomal regions with gain/loss alterations associated with adenoma-to-carcinoma progression. Finally, the expression of the eight-gene signature and MSI-enriching genes was confirmed in oxaliplatin- and irinotecan-resistant CRC cell lines. These data reveal that the hypermethylation of specific genes may provide prognostic information that is able to identify a subgroup of mCRCs with poor prognosis.

