Identification of differentially distributed gene expression and distinct sets of cancer-related genes identified by changes in mean and variability

2022 ◽  
Vol 4 (1) ◽  
Author(s):  
Aedan G K Roberts ◽  
Daniel R Catchpoole ◽  
Paul J Kennedy

ABSTRACT There is increasing evidence that changes in the variability or overall distribution of gene expression are important both in normal biology and in diseases, particularly cancer. Genes whose expression differs in variability or distribution without a difference in mean are ignored by traditional differential expression-based analyses. Using a Bayesian hierarchical model that provides tests for both differential variability and differential distribution for bulk RNA-seq data, we report here an investigation into differential variability and distribution in cancer. Analysis of eight paired tumour–normal datasets from The Cancer Genome Atlas confirms that differential variability and distribution analyses are able to identify cancer-related genes. We further demonstrate that differential variability identifies cancer-related genes that are missed by differential expression analysis, and that differential expression and differential variability identify functionally distinct sets of potentially cancer-related genes. These results suggest that differential variability analysis may provide insights into genetic aspects of cancer that would not be revealed by differential expression, and that differential distribution analysis may allow for more comprehensive identification of cancer-related genes than analyses based on changes in mean or variability alone.
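The distinction between a change in mean and a change in variability can be illustrated with a toy example. The sketch below is not the Bayesian hierarchical model described in the abstract; it only compares sample means and sample variances for one gene, and the expression values are hypothetical numbers chosen so that both groups share the same mean.

```python
import statistics

# Hypothetical expression values for a single gene (arbitrary units).
# These are illustrative numbers, not data from The Cancer Genome Atlas.
normal = [9.0, 10.0, 11.0, 10.0, 9.5, 10.5, 10.0, 10.0]
tumour = [4.0, 16.0, 7.0, 13.0, 2.0, 18.0, 10.0, 10.0]


def summarise(values):
    """Return (mean, sample variance) for a list of expression values."""
    return statistics.mean(values), statistics.variance(values)


mean_normal, var_normal = summarise(normal)
mean_tumour, var_tumour = summarise(tumour)

# A mean-based comparison sees no difference between the groups...
mean_difference = mean_tumour - mean_normal

# ...while the variance ratio shows a large change in variability.
variance_ratio = var_tumour / var_normal

print(f"difference in means: {mean_difference:.2f}")
print(f"variance ratio (tumour / normal): {variance_ratio:.1f}")
```

A gene like this one would be invisible to an analysis based only on differences in mean, which is the situation the abstract describes.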

2021 ◽  
Author(s):  
Aedan G. K. Roberts ◽  
Daniel R. Catchpoole ◽  
Paul J. Kennedy

Abstract Background: Differential expression analysis of RNA-seq data has advanced rapidly since the introduction of the technology, and methods such as edgeR and DESeq2 have become standard parts of analysis pipelines. However, there is a growing body of research showing that differences in variability of gene expression or overall differences in the distribution of expression values – differential distribution – are also important both in normal biology and in diseases including cancer. Genes whose expression differs in distribution without a difference in mean expression level are ignored by differential expression methods. Results: We have developed a Bayesian hierarchical model which improves on existing methods for identifying differential dispersion in RNA-seq data, and provides an overall test for differential distribution. We have applied these methods to investigate differential dispersion and distribution in cancer using RNA-seq datasets from The Cancer Genome Atlas. Our results show that differential dispersion and distribution are able to identify cancer-related genes. Further, we find that differential dispersion identifies cancer-related genes that are missed by differential expression analysis, and that differential expression and differential dispersion identify functionally distinct sets of genes. Conclusion: This work highlights the importance of considering changes beyond differences in mean in the analysis of gene expression data, and suggests that analysis of expression variability may provide insights into genetic aspects of cancer that would not be revealed by differential expression analysis alone. For identification of cancer-related genes, differential distribution analysis allows the identification of genes whose expression is disrupted in terms of either mean or variability.


2016 ◽  
Vol 36 (suppl_1) ◽  
Author(s):  
Elisa C Maruko ◽  
Hao Xu ◽  
Sushma Kaul ◽  
Brian J Capaldo ◽  
Nathalie Pamir ◽  
...  

Atherosclerosis is a disease of both lipids and inflammatory immune cells. More specifically, elevated plasma levels of low-density lipoproteins (LDL) lead to migration of circulating monocytes into the artery wall. Lipid-loaded monocytes subsequently proliferate in the arterial walls, becoming macrophage foam cells, a hallmark of atherosclerotic lesions. A proposed mechanism of the protective effects of high-density lipoprotein (HDL) is apolipoprotein A-I (apo A-I) acting as a mediator of cholesterol efflux and subsequent foam cell regression. To better understand the biological changes stimulated by apo A-I treatment, differential expression analysis of microarray data was performed on spleen cells from apo A-I treated mice. LDL receptor null (LDLr -/-) and LDL receptor and apo A-I null (LDLr -/-, apoA-I -/-) mice were fed a western diet consisting of 0.2% cholesterol and 42% of calories as fat for 12 weeks. After 6 weeks of diet, a subset of mice for each genotype was subcutaneously injected with 200 micrograms of apo A-I 3 times a week for the remaining 6 weeks. The control group mice were subcutaneously injected with 200 micrograms of saline or BSA. Spleen cell RNA was isolated, purified, and analyzed for differential expression using Illumina BeadArray Microarray Technology. Individual gene expression analysis for LDLr -/-, apoA-I -/- apo A-I treated mice showed 281 significantly differentially expressed genes compared to BSA treated mice; LDLr -/- A-I treated mice had 1502. Of the significant genes, 189 intersected across both genotypes. LDLr -/-, apoA-I -/- A-I mice showed 73 up-regulated and 116 down-regulated genes. Similarly, LDLr -/- A-I mice had 71 up-regulated and 118 down-regulated. One-directional Gene Set Enrichment Analysis (GSEA) of LDLr -/-, apoA-I -/- A-I mice revealed 49 significant pathways, while a total of 63 were found for LDLr -/-. Of these pathways, 21 were up-regulated and 13 were down-regulated in both genotypes. Eight of the top 10 most significant up-regulated pathways in both genotypes were immune cell related. Their functions involve receptor, adhesion, and chemokine signaling. Overall, preliminary analysis suggests A-I treatment induces similar gene expression changes across different genotypes.
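The bookkeeping behind counts such as "genes significant in both genotypes" and their up/down split can be sketched as follows. The gene names and fold-change values are hypothetical placeholders, not results from the study, and the helper `direction_counts` is introduced here only for illustration.

```python
# Hypothetical significant genes per genotype, mapped to a log fold change.
# Names and values are placeholders, not results from the study.
genotype_a = {"gene1": 1.2, "gene2": -0.8, "gene3": 0.5, "gene4": -1.1}
genotype_b = {"gene2": -0.6, "gene3": 0.9, "gene4": 0.4, "gene5": 2.0}

# Genes called significant in both genotypes.
shared = sorted(set(genotype_a) & set(genotype_b))


def direction_counts(results, genes):
    """Count up- and down-regulated genes among `genes` for one genotype."""
    up = sum(1 for gene in genes if results[gene] > 0)
    down = sum(1 for gene in genes if results[gene] < 0)
    return up, down


up_a, down_a = direction_counts(genotype_a, shared)
up_b, down_b = direction_counts(genotype_b, shared)

print(f"shared genes: {shared}")
print(f"genotype A: {up_a} up, {down_a} down")
print(f"genotype B: {up_b} up, {down_b} down")
```

Because the direction of change is evaluated per genotype, the up/down split of the shared genes need not be identical across genotypes, as in the counts reported above.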


BMC Cancer ◽  
2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Heini M. Natri ◽  
Melissa A. Wilson ◽  
Kenneth H. Buetow

Abstract Background Sex-differences in cancer occurrence and mortality are evident across tumor types; men exhibit higher rates of incidence and often poorer responses to treatment. Targeted approaches to the treatment of tumors that account for these sex-differences require the characterization and understanding of the fundamental biological mechanisms that differentiate them. Hepatocellular Carcinoma (HCC) is the second leading cause of cancer death worldwide, with the incidence rapidly rising. HCC exhibits a male-bias in occurrence and mortality, but previous studies have failed to explore the sex-specific dysregulation of gene expression in HCC. Methods Here, we characterize the sex-shared and sex-specific regulatory changes in HCC tumors in the TCGA LIHC cohort using combined and sex-stratified differential expression and eQTL analyses. Results By using a sex-specific differential expression analysis of tumor and tumor-adjacent samples, we uncovered etiologically relevant genes and pathways differentiating male and female HCC. While both sexes exhibited activation of pathways related to apoptosis and cell cycle, males and females differed in the activation of several signaling pathways, with females showing PPAR pathway enrichment while males showed PI3K, PI3K/AKT, FGFR, EGFR, NGF, GF1R, Rap1, DAP12, and IL-2 signaling pathway enrichment. Using eQTL analyses, we discovered germline variants with differential effects on tumor gene expression between the sexes. 24.3% of the discovered eQTLs exhibit differential effects between the sexes, illustrating the substantial role of sex in modifying the effects of eQTLs in HCC. The genes that showed sex-specific dysregulation in tumors and those that harbored a sex-specific eQTL converge in clinically relevant pathways, suggesting that the molecular etiologies of male and female HCC are partially driven by differential genetic effects on gene expression. 
Conclusions Sex-stratified analyses detect sex-specific molecular etiologies of HCC. Overall, our results provide new insight into the role of inherited genetic regulation of transcription in modulating sex-differences in HCC etiology and provide a framework for future studies on sex-biased cancers.


2015 ◽  
Vol 9s3 ◽  
pp. BBI.S29470 ◽  
Author(s):  
Mikhail G. Dozmorov ◽  
Nicolas Dominguez ◽  
Krista Bean ◽  
Susan R. Macwana ◽  
Virginia Roberts ◽  
...  

Systemic lupus erythematosus (SLE) is an autoimmune disease characterized by complex interplay among immune cell types. SLE activity is experimentally assessed by several blood tests, including gene expression profiling of heterogeneous populations of cells in peripheral blood. To better understand the contribution of different cell types to SLE pathogenesis, we applied two methods for cell-type-specific differential expression analysis, csSAM and DSection, to identify cell-type-specific gene expression differences in heterogeneous gene expression measures obtained using RNA-seq technology. We identified B-cell-, monocyte-, and neutrophil-specific gene expression differences. Immunoglobulin-coding gene expression was altered in B-cells, while a ribosomal signature was prominent in monocytes. In contrast, genes differentially expressed in the heterogeneous mixture of cells did not show any functional enrichment. Our results identify antigen binding and structural constituents of ribosomes as functions altered by B-cell- and monocyte-specific gene expression differences, respectively. Finally, these results position both csSAM and DSection as viable techniques for cell-type-specific differential expression analysis, which may help uncover pathogenic, cell-type-specific processes in SLE.


2021 ◽  
Vol 12 ◽  
Author(s):  
Dongfang Jia ◽  
Cheng Chen ◽  
Chen Chen ◽  
Fangfang Chen ◽  
Ningrui Zhang ◽  
...  

Understanding the molecular mechanism of breast cancer (BC) can provide an in-depth understanding of BC pathology. This study explored existing technologies for diagnosing BC, such as mammography, ultrasound, magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET), and summarized the disadvantages of existing cancer diagnosis methods. The purpose of this article is to use gene expression profiles from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) to classify BC samples and normal samples. The method proposed in this article overcomes some of the shortcomings of traditional diagnostic methods and can diagnose BC more rapidly, with high sensitivity and without radiation. This study first selected the genes most relevant to cancer through weighted gene co-expression network analysis (WGCNA) and differential expression analysis (DEA). Then it used the protein–protein interaction (PPI) network to screen 23 hub genes. Finally, it used the support vector machine (SVM), decision tree (DT), Bayesian network (BN), artificial neural network (ANN), and the convolutional neural networks CNN-LeNet and CNN-AlexNet to process the expression levels of the 23 hub genes. For gene expression profiles, the ANN model has the best performance in the classification of cancer samples. The ten-time average accuracy is 97.36% (±0.34%), the F1 value is 0.8535 (±0.0260), the sensitivity is 98.32% (±0.32%), the specificity is 89.59% (±3.53%) and the AUC is 0.99. In summary, this method effectively classifies cancer samples and normal samples and provides reasonable new ideas for the early diagnosis of cancer in the future.
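For readers less familiar with the metrics reported above, the following sketch shows how accuracy, sensitivity, specificity and F1 are derived from confusion-matrix counts. The counts are hypothetical and are not taken from the study; the function name `classification_metrics` is likewise an assumption made for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for a tumour-vs-normal classifier (not from the study).
metrics = classification_metrics(tp=90, fp=5, tn=45, fn=10)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Which class is treated as "positive" changes precision, sensitivity and F1, so that choice needs to be stated when such numbers are compared across models.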


2019 ◽  
Author(s):  
Ashkaun Razmara ◽  
Shannon E. Ellis ◽  
Dustin J. Sokolowski ◽  
Sean Davis ◽  
Michael D. Wilson ◽  
...  

Abstract The usability of publicly-available gene expression data is often limited by the availability of high-quality, standardized biological phenotype and experimental condition information (“metadata”). We released the recount2 project, which involved re-processing ∼70,000 samples in the Sequence Read Archive (SRA), Genotype-Tissue Expression (GTEx), and The Cancer Genome Atlas (TCGA) projects. While samples from the latter two projects are well-characterized with extensive metadata, the ∼50,000 RNA-seq samples from SRA in recount2 are inconsistently annotated with metadata. Tissue type, sex, and library type can be estimated from the RNA sequencing (RNA-seq) data itself. However, more detailed and harder to predict metadata, like age and diagnosis, must ideally be provided by labs that deposit the data. To facilitate more analyses within human brain tissue data, we have complemented phenotype predictions by manually constructing a uniformly-curated database of public RNA-seq samples present in SRA and recount2. We describe the reproducible curation process for constructing recount-brain that involves systematic review of the primary manuscript, which can serve as a guide to annotate other studies and tissues. We further expanded recount-brain by merging it with GTEx and TCGA brain samples as well as linking to controlled vocabulary terms for tissue, Brodmann area and disease. Furthermore, we illustrate how to integrate the sample metadata in recount-brain with the gene expression data in recount2 to perform differential expression analysis. We then provide three analysis examples involving modeling postmortem interval, glioblastoma, and meta-analyses across GTEx and TCGA. Overall, recount-brain facilitates expression analyses and improves their reproducibility as individual researchers do not have to manually curate the sample metadata.
recount-brain is available via the add_metadata() function from the recount Bioconductor package at bioconductor.org/packages/recount.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2010 ◽  
Author(s):  
Monther Alhamdoosh ◽  
Charity W. Law ◽  
Luyi Tian ◽  
Julie M. Sheridan ◽  
Milica Ng ◽  
...  

Gene set enrichment analysis is a popular approach for prioritising the biological processes perturbed in genomic datasets. The Bioconductor project hosts over 80 software packages capable of gene set analysis. Most of these packages search for enriched signatures amongst differentially regulated genes to reveal higher level biological themes that may be missed when focusing only on evidence from individual genes. With so many different methods on offer, choosing the best algorithm and visualization approach can be challenging. The EGSEA package solves this problem by combining results from up to 12 prominent gene set testing algorithms to obtain a consensus ranking of biologically relevant results. This workflow demonstrates how EGSEA can extend limma-based differential expression analyses for RNA-seq and microarray data using experiments that profile 3 distinct cell populations important for studying the origins of breast cancer. Following data normalization and set-up of an appropriate linear model for differential expression analysis, EGSEA builds gene signature specific indexes that link a wide range of mouse or human gene set collections obtained from MSigDB, GeneSetDB and KEGG to the gene expression data being investigated. EGSEA is then configured and the ensemble enrichment analysis run, returning an object that can be queried using several S4 methods for ranking gene sets and visualizing results via heatmaps, KEGG pathway views, GO graphs, scatter plots and bar plots. Finally, an HTML report that combines these displays can fast-track the sharing of results with collaborators, and thus expedite downstream biological validation. EGSEA is simple to use and can be easily integrated with existing gene expression analysis pipelines for both human and mouse data.
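The idea of a consensus ranking across several gene set testing methods can be illustrated with a simple mean-rank aggregation. This is only a sketch of the general idea and should not be read as EGSEA's actual scoring scheme or API; the gene set names and the three method rankings are hypothetical.

```python
def consensus_ranking(rankings):
    """Order items by their mean rank position across several ranked lists.

    Assumes every list ranks the same items, best first.
    Ties in mean rank are broken alphabetically.
    """
    items = set(rankings[0])
    mean_rank = {
        item: sum(ranking.index(item) + 1 for ranking in rankings) / len(rankings)
        for item in items
    }
    return sorted(items, key=lambda item: (mean_rank[item], item))


# Hypothetical rankings of four gene sets from three testing methods.
method_rankings = [
    ["set_A", "set_B", "set_C", "set_D"],
    ["set_B", "set_A", "set_D", "set_C"],
    ["set_A", "set_C", "set_B", "set_D"],
]

consensus = consensus_ranking(method_rankings)
print(consensus)
```

Averaging ranks rather than raw p-values or scores sidesteps the fact that different methods report statistics on different scales, which is one common motivation for rank-based ensembles.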


2019 ◽  
Author(s):  
Amy Li ◽  
Bjoern Chapuy ◽  
Xaralabos Varelas ◽  
Paola Sebastiani ◽  
Stefano Monti

Abstract The emergence of large-scale multi-omics data warrants method development for data integration. Genomic studies from cancer patients have identified epigenetic and genetic regulators – such as methylation marks, somatic mutations, and somatic copy number alterations (SCNAs), among others – as predictive features of cancer outcome. However, identification of “driver genes” associated with a given alteration remains a challenge. To this end, we developed a computational tool, iEDGE, to model cis and trans effects of (epi-)DNA alterations and identify potential cis driver genes, where cis and trans genes denote those genes falling within and outside the genomic boundaries of a given (epi-)genetic alteration, respectively. First, iEDGE identifies the cis and trans genes associated with the presence/absence of a particular epi-DNA alteration across samples. Tests of statistical mediation are then performed to determine the cis genes predictive of the trans gene expression. Finally, cis and trans effects are annotated by pathway enrichment analysis to gain insights into the underlying regulatory networks. We used iEDGE to perform integrative analysis of SCNAs and gene expression data from breast cancer and 18 additional cancer types included in The Cancer Genome Atlas (TCGA). Notably, cis gene drivers identified by iEDGE were found to be significantly enriched for known driver genes from multiple compendia of validated oncogenes and tumor suppressors, suggesting that the remainder are of equal importance. Furthermore, predicted drivers were enriched for functionally relevant cancer genes with amplification-driven dependencies, which are of potential prognostic and therapeutic value. All analysis results are accessible at https://montilab.bu.edu/iEDGE.


2021 ◽  
Author(s):  
Zhongze Cui ◽  
Shuang He ◽  
Feifei Wen ◽  
Xiaoyang Xu ◽  
Yangyang Li ◽  
...  

Abstract Background: Colon adenocarcinoma (COAD) is one of the most common malignancies worldwide. Although a large number of studies have elucidated the aetiology of colorectal cancer, the exact mechanism of colorectal cancer development remains to be determined. To identify key modules and prognostic genes that may be involved in the occurrence and development of COAD, weighted gene coexpression network analysis (WGCNA) and differential expression analysis were performed on datasets GSE41657 and GSE74602 from the Gene Expression Omnibus (GEO) database to screen for prognostic differentially expressed genes. Gene expression profiles and clinical information were collected from The Cancer Genome Atlas (TCGA) database for verification. Results: Through WGCNA and differentially expressed gene (DEG) analysis, 439 genes in key functional modules were obtained, and 26 prognosis-related genes were finally obtained through prognostic analysis: (1) We screened 5 genes (RPP40, DUSP18, PPRC1, MFSD11 and PDCD11) that have not been studied in COAD. (2) We obtained the most critical module in the occurrence and development of colon cancer and obtained one prognosis-related gene, NUP85, from the most critical module. The relationship between NUP85 and the tumor immune microenvironment was verified. (3) A prognostic model comprising four coexpressed differential genes was constructed; TIMP1, PMM2, E2F3 and MORC2 were selected as the key prognosis-related genes. Conclusions: (1) As new biomarkers, the prognostic genes RPP40, DUSP18, PPRC1, MFSD11 and PDCD11 may be potential therapeutic targets for COAD, and provide new ideas for future research on the mechanism of COAD. (2) NUP85 may be an immune-related gene which was negatively correlated with CD4+ T cells and M2 macrophages and plays an important role in inhibiting the occurrence and development of colorectal adenocarcinoma. (3) A Cox proportional risk model based on gene expression can be used to predict the prognosis and survival time of patients with colon cancer.

