recount-brain: a curated repository of human brain RNA-seq datasets metadata

Abstract: The usability of publicly available gene expression data is often limited by the availability of high-quality, standardized biological phenotype and experimental condition information (“metadata”). We released the recount2 project, which involved re-processing ∼70,000 samples in the Sequence Read Archive (SRA), Genotype-Tissue Expression (GTEx), and The Cancer Genome Atlas (TCGA) projects. While samples from the latter two projects are well-characterized with extensive metadata, the ∼50,000 RNA-seq samples from SRA in recount2 are inconsistently annotated with metadata. Tissue type, sex, and library type can be estimated from the RNA sequencing (RNA-seq) data itself. However, more detailed and harder-to-predict metadata, like age and diagnosis, must ideally be provided by the labs that deposit the data. To facilitate more analyses within human brain tissue data, we have complemented phenotype predictions by manually constructing a uniformly curated database of public RNA-seq samples present in SRA and recount2. We describe the reproducible curation process for constructing recount-brain, which involves systematic review of the primary manuscript and can serve as a guide to annotate other studies and tissues. We further expanded recount-brain by merging it with GTEx and TCGA brain samples as well as linking to controlled vocabulary terms for tissue, Brodmann area, and disease. Furthermore, we illustrate how to integrate the sample metadata in recount-brain with the gene expression data in recount2 to perform differential expression analysis. We then provide three analysis examples involving modeling postmortem interval, glioblastoma, and meta-analyses across GTEx and TCGA. Overall, recount-brain facilitates expression analyses and improves their reproducibility, as individual researchers do not have to manually curate the sample metadata.
recount-brain is available via the add_metadata() function from the recount Bioconductor package at bioconductor.org/packages/recount.
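
The integration step mentioned in the abstract, attaching curated sample metadata to expression data before a differential expression analysis, amounts to a keyed join on the sample identifier. The following is a minimal sketch in plain Python and is not the recount package itself (which is an R/Bioconductor package); the run accessions, field names, gene names, and values are all invented for illustration.

```python
# Hypothetical curated metadata keyed by run accession (values invented).
curated = {
    "SRR0000001": {"age": 54, "disease": "control"},
    "SRR0000002": {"age": 61, "disease": "glioblastoma"},
}

# Hypothetical expression samples: run accession -> gene counts.
expression = {
    "SRR0000001": {"GENE_A": 10, "GENE_B": 0},
    "SRR0000002": {"GENE_A": 3, "GENE_B": 25},
    "SRR0000003": {"GENE_A": 7, "GENE_B": 2},  # no curated metadata
}


def attach_metadata(expression, curated):
    """Left-join metadata onto samples; missing fields become None."""
    # Collect every metadata field seen in the curated table.
    fields = sorted({key for meta in curated.values() for key in meta})
    merged = {}
    for run, counts in expression.items():
        meta = curated.get(run, {})
        merged[run] = {
            "counts": counts,
            "metadata": {field: meta.get(field) for field in fields},
        }
    return merged


merged = attach_metadata(expression, curated)
```

Keeping uncurated samples with empty metadata, rather than dropping them, is one possible design choice; a downstream analysis would typically filter on the fields it needs.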

Explainable autoencoder-based representation learning for gene expression data

10.1101/2021.12.21.473742 ◽

2021 ◽

Author(s):

Yang Yu ◽

Pathum Kossinna ◽

Wenyuan Liao ◽

Qingrun Zhang

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Hidden Variables ◽

Representation Learning ◽

The Cancer Genome Atlas ◽

Expression Data ◽

RNA-Seq ◽

Gene Expression Data Analysis ◽

Cancer Genome Atlas ◽

Modern Machine

Modern machine learning methods have been extensively utilized in gene expression data analysis. In particular, autoencoders (AE) have been employed in processing noisy and heterogeneous RNA-Seq data. However, AEs usually lead to "black-box" hidden variables that are difficult to interpret, hindering downstream experimental validation and clinical translation. To bridge the gap between complicated models and biological interpretations, we developed a tool, XAE4Exp (eXplainable AutoEncoder for Expression data), which integrates AE and SHapley Additive exPlanations (SHAP), a flagship technique in the field of eXplainable AI (XAI). It quantitatively evaluates the contributions of each gene to the hidden structure learned by an AE, substantially improving the explainability of AE outcomes. By applying XAE4Exp to The Cancer Genome Atlas (TCGA) breast cancer gene expression data, we identified genes that are not differentially expressed, and pathways in various cancer-related classes. This tool will enable researchers and practitioners to analyze high-dimensional expression data intuitively, paving the way towards broader uses of deep learning.
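
To make "the contribution of each gene to a hidden variable" concrete, the sketch below computes exact Shapley values for a toy hidden unit by enumerating all feature coalitions. It is an illustration of the underlying idea only, not XAE4Exp and not the SHAP library, which relies on approximations because exact enumeration is exponential in the number of features. The three-gene linear "hidden unit", its weights, and the baseline are assumptions made up for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with len(x), so use only on tiny inputs.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy "hidden unit": a fixed linear map of three gene expression values.
weights = [0.5, -1.0, 2.0]  # invented for illustration


def hidden_unit(z):
    return sum(w * v for w, v in zip(weights, z))


x = [2.0, 1.0, 3.0]          # expression of three hypothetical genes
baseline = [1.0, 1.0, 1.0]   # reference expression profile
phi = shapley_values(hidden_unit, x, baseline)
```

For a linear unit, each gene's Shapley value reduces to its weight times its deviation from the baseline, and the values sum to the difference between the unit's output at `x` and at the baseline. That makes this toy case easy to check by hand.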

Identification of valid reference genes for the normalization of RT qPCR gene expression data in human brain tissue

BMC Molecular Biology ◽

10.1186/1471-2199-9-46 ◽

2008 ◽

Vol 9 (1) ◽

pp. 46 ◽

Cited By ~ 62

Author(s):

David TR Coulson ◽

Simon Brockbank ◽

Joseph G Quinn ◽

Suzanne Murphy ◽

Rivka Ravid ◽

...

Keyword(s):

Gene Expression ◽

Human Brain ◽

Brain Tissue ◽

Gene Expression Data ◽

Reference Genes ◽

Expression Data ◽

Human Brain Tissue

Mining The Cancer Genome Atlas gene expression data for lineage markers in distinguishing bladder urothelial carcinoma and prostate adenocarcinoma

Scientific Reports ◽

10.1038/s41598-021-85993-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ewe Seng Ch’ng

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

The Cancer Genome Atlas ◽

Relative Importance ◽

Expression Data ◽

Gene Expressions ◽

Urothelial Carcinomas ◽

Cancer Genome Atlas ◽

Lineage Markers ◽

Genome Atlas

Abstract: Distinguishing bladder urothelial carcinomas from prostate adenocarcinomas among poorly differentiated carcinomas derived from the bladder neck entails the use of a panel of lineage markers. Publicly available gene expression data from The Cancer Genome Atlas (TCGA) provide an avenue to examine the utility of these markers. This study aimed to verify the expression of urothelial and prostate lineage markers in the respective carcinomas and to determine the relative importance of these markers in making this distinction. Gene expression data for these markers were downloaded from the TCGA Pan-Cancer database for bladder and prostate carcinomas. Differential gene expression of these markers was analyzed. Standard linear discriminant analyses were applied to establish the relative importance of these markers in lineage determination and to construct the model that best makes the distinction. This study shows that all urothelial lineage genes except for the gene for uroplakin III were significantly expressed in bladder urothelial carcinomas (p < 0.001). In descending order of importance for distinguishing from prostate adenocarcinomas, genes for uroplakin II, S100P, GATA3 and thrombomodulin had high discriminant loadings (> 0.3). All prostate lineage genes were significantly expressed in prostate adenocarcinomas (p < 0.001). In descending order of importance for distinguishing from bladder urothelial carcinomas, genes for NKX3.1, prostate specific antigen (PSA), prostate-specific acid phosphatase, prostein, and prostate-specific membrane antigen had high discriminant loadings (> 0.3). A combination of gene expressions for uroplakin II, S100P, NKX3.1 and PSA approached 100% accuracy in tumor classification in both the training and validation sets. By mining gene expression data, this study shows that a combination of four lineage markers helps distinguish between bladder urothelial carcinomas and prostate adenocarcinomas.
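
As a rough illustration of the linear discriminant analysis named in the abstract, the sketch below fits a two-class Fisher discriminant on two features in plain Python. It is not the study's model and does not compute discriminant loadings. The two "markers" and all expression values are invented, and the midpoint threshold assumes equal class priors.

```python
def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def mean(rows):
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def scatter(rows, mu):
    """2x2 within-class scatter matrix around the class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fit_lda(class0, class1):
    """Return the Fisher direction w = Sw^-1 (mu1 - mu0) and a threshold."""
    mu0, mu1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    sw = [[s0[a][b] + s1[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    # Midpoint of the projected class means (equal priors assumed).
    threshold = 0.5 * (dot(w, mu0) + dot(w, mu1))
    return w, threshold


def predict(w, threshold, sample):
    return 1 if dot(w, sample) > threshold else 0


# Invented expression values: [urothelial marker, prostate marker].
bladder = [[8.0, 1.0], [7.5, 1.5], [9.0, 0.5], [8.5, 2.0]]    # class 0
prostate = [[1.0, 7.0], [2.0, 8.0], [1.5, 9.0], [0.5, 7.5]]   # class 1

w, threshold = fit_lda(bladder, prostate)
```

A new sample is classified by projecting it onto `w` and comparing with `threshold`, for example `predict(w, threshold, [8.0, 1.0])`.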

Comparing RNA-Seq and microarray gene expression data in two zones of the Arabidopsis root apex relevant to spaceflight

Applications in Plant Sciences ◽

10.1002/aps3.1197 ◽

2018 ◽

Vol 6 (11) ◽

pp. e01197 ◽

Cited By ~ 3

Author(s):

Aparna Krishnamurthy ◽

Robert J. Ferl ◽

Anna-Lisa Paul

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Root Apex ◽

Microarray Gene Expression Data ◽

Expression Data ◽

RNA-Seq ◽

Microarray Gene Expression ◽

Arabidopsis Root ◽

Microarray Gene

Characterization of the nuclear and cytosolic transcriptomes in human brain tissue reveals new insights into the subcellular distribution of RNA transcripts

10.1101/2020.04.08.031419 ◽

2020 ◽

Author(s):

Ammar Zaghlool ◽

Adnan Niazi ◽

Åsa K. Björklund ◽

Jakub Orzechowski Westholm ◽

Adam Ameur ◽

...

Keyword(s):

Gene Expression ◽

Subcellular Localization ◽

Human Brain ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Adult Brain ◽

Sequencing Data ◽

Human Brain Tissue ◽

Protein Coding ◽

The Impact

Abstract: Transcriptome analysis has mainly relied on analyzing RNA sequencing data from whole cells, overlooking the impact of subcellular RNA localization and its influence on our understanding of gene function, and interpretation of gene expression signatures in cells. Here, we performed a comprehensive analysis of cytosolic and nuclear transcriptomes in human fetal and adult brain samples. We show significant differences in RNA expression for protein-coding and lncRNA genes between cytosol and nucleus. Transcripts displaying differential subcellular localization belong to particular functional categories and display tissue-specific localization patterns. We also show that transcripts encoding the nuclear-encoded mitochondrial proteins are significantly enriched in the cytosol compared to the rest of protein-coding genes. Further investigation of the use of the cytosolic or the nuclear transcriptome for differential gene expression analysis indicates important differences in results depending on the cellular compartment. These differences were manifested at the level of transcript types and the number of differentially expressed genes. Our data provide a resource of RNA subcellular localization in the human brain and highlight differences in using the cytosolic or the nuclear transcriptomes for differential expression analysis.

IRIS-EDA: An integrated RNA-Seq interpretation system for gene expression data analysis

PLoS Computational Biology ◽

10.1371/journal.pcbi.1006792 ◽

2019 ◽

Vol 15 (2) ◽

pp. e1006792 ◽

Cited By ~ 11

Author(s):

Brandon Monier ◽

Adam McDermaid ◽

Cankun Wang ◽

Jing Zhao ◽

Allison Miller ◽

...

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Gene Expression Data ◽

Expression Data ◽

RNA-Seq ◽

Gene Expression Data Analysis ◽

Interpretation System

Abstract P3-04-10: Comparison between RNA-Seq and Affymetrix gene expression data

10.1158/0008-5472.sabcs12-p3-04-10 ◽

2012 ◽

Cited By ~ 1

Author(s):

D Fumagalli ◽

B Haibe-Kains ◽

S Michiels ◽

DN Brown ◽

D Gacquer ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Data ◽

RNA-Seq ◽

Affymetrix Gene Expression

GEDS: A Gene Expression Display Server for mRNAs, miRNAs and Proteins

Cells ◽

10.3390/cells8070675 ◽

2019 ◽

Vol 8 (7) ◽

pp. 675 ◽

Cited By ~ 5

Author(s):

Xia ◽

Liu ◽

Zhang ◽

Guo

Keyword(s):

Gene Expression ◽

Cell Lines ◽

Gene Expression Data ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Cancer Cell Line ◽

Tissue Expression ◽

The Cancer Genome Atlas ◽

Expression Data ◽

Protein Levels

High-throughput technologies generate a tremendous amount of expression data on mRNA, miRNA and protein levels. Mining and visualizing this large amount of expression data requires sophisticated computational skills. An easy-to-use web server for the visualization of gene expression profiles could greatly facilitate data exploration and hypothesis generation for biologists. Here, we curated and normalized the gene expression data on mRNA, miRNA and protein levels in 23315, 9009 and 9244 samples, respectively, from 40 tissues (The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx)) and 1594 cell lines (Cancer Cell Line Encyclopedia (CCLE) and MD Anderson Cell Lines Project (MCLP)). Then, we constructed the Gene Expression Display Server (GEDS), a web-based tool for quantification, comparison and visualization of gene expression data. GEDS integrates multiscale expression data and provides multiple types of figures and tables to satisfy several kinds of user requirements. The comprehensive expression profiles plotted in the one-stop GEDS platform greatly help experimental biologists use big data for better experimental design and analysis. GEDS is freely available at http://bioinfo.life.hust.edu.cn/web/GEDS/.
