Broad transcriptomic dysregulation across the cerebral cortex in ASD

Abstract

Classically, psychiatric disorders have been considered to lack defining pathology, but recent work has demonstrated consistent disruption at the molecular level, characterized by transcriptomic and epigenetic alterations [1–3]. In ASD, upregulation of microglial, astrocyte, and immune signaling genes, downregulation of specific synaptic genes, and attenuation of regional gene expression differences have been observed [1, 2, 4–6]. However, whether these changes are limited to the cortical association areas profiled so far is unknown. Here, we perform RNA sequencing (RNA-seq) on 725 brain samples spanning 11 distinct cortical areas in 112 ASD cases and neurotypical controls. We identify substantially more genes and isoforms that differentiate ASD from controls than previously observed. These alterations are pervasive and cortex-wide but vary in magnitude across regions, roughly following an anterior-to-posterior gradient, with the strongest signal in visual cortex, followed by parietal cortex and the temporal lobe. We find a notable enrichment of ASD genetic risk variants among cortex-wide downregulated synaptic plasticity genes and upregulated protein-folding gene isoforms. Finally, using single-nucleus RNA-seq (snRNA-seq), we determine that regional variation in the magnitude of transcriptomic dysregulation reflects changes in both cellular proportions and cell-type-specific gene expression, particularly in L3/4 excitatory neurons. These results highlight widespread, genetically driven neuronal dysfunction as a major component of ASD pathology in the cerebral cortex, extending beyond association cortices to involve primary sensory regions.
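
The abstract attributes regional differences in the magnitude of dysregulation partly to shifts in cellular proportions, not only to changes within cell types. A minimal sketch of why that matters, assuming bulk expression is a linear, fraction-weighted mix of cell-type profiles (a common simplifying assumption, not necessarily the model used in this study): holding each cell type's profile fixed and changing only the fractions is enough to produce apparent up- and downregulation in bulk. All gene values, fractions and function names below are hypothetical.

```python
import math


def bulk_expression(fractions, profiles):
    """Bulk level of each gene as the fraction-weighted sum of
    cell-type-specific levels (assumes linear mixing)."""
    n_genes = len(next(iter(profiles.values())))
    return [sum(fractions[ct] * profiles[ct][g] for ct in fractions)
            for g in range(n_genes)]


def log2_fold_change(case, control, pseudocount=1.0):
    """Per-gene log2 fold change; the pseudocount avoids log(0)."""
    return [math.log2((a + pseudocount) / (b + pseudocount))
            for a, b in zip(case, control)]


# Hypothetical levels of three genes: neuron-enriched, glia-enriched, shared.
profiles = {
    "excitatory_neuron": [100.0, 5.0, 50.0],
    "astrocyte": [10.0, 80.0, 50.0],
}
# Only the cell-type fractions differ between the two hypothetical groups;
# the per-cell-type profiles are identical.
control_fractions = {"excitatory_neuron": 0.6, "astrocyte": 0.4}
case_fractions = {"excitatory_neuron": 0.5, "astrocyte": 0.5}

control_bulk = bulk_expression(control_fractions, profiles)
case_bulk = bulk_expression(case_fractions, profiles)
lfc = log2_fold_change(case_bulk, control_bulk)
print([round(v, 3) for v in lfc])
```

With these numbers the neuron-enriched gene appears downregulated and the glia-enriched gene upregulated in bulk, while the shared gene is unchanged, even though no cell type altered its own expression.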

Genetic influences on cell type specific gene expression and splicing during neurogenesis elucidate regulatory mechanisms of brain traits

10.1101/2020.10.21.349019

2020

Author(s):

Nil Aygün

Angela L. Elwell

Dan Liang

Michael J. Lafferty

Kerry E. Cheek

...

Keyword(s):

Gene Expression

Regulatory Elements

Regulatory Mechanisms

Specific Gene

Cell Type

Specific Expression

Risk Variants

Allele Specific

Cell Type Specific

Regulatory Effects

Summary

Interpretation of how non-coding risk loci for neuropsychiatric disorders and brain-relevant traits act through gene expression and alternative splicing is mainly performed in bulk post-mortem adult tissue. However, genetic risk loci are enriched in regulatory elements of cells present during neocortical differentiation, and the regulatory effects of risk variants may be masked by cellular heterogeneity in bulk tissue. Here, we map expression and splicing quantitative trait loci (e/sQTLs) and allele-specific expression in primary human neural progenitors (n = 85) and their sorted neuronal progeny (n = 74). Using colocalization and transcriptome-wide association studies (TWAS), we uncover cell-type-specific regulatory mechanisms underlying risk for these traits.
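
Allele-specific expression compares reads carrying the two alleles at a heterozygous site within the same sample. A minimal sketch of one common test, an exact two-sided binomial test against a 50:50 null, using hypothetical read counts and hypothetical function names; real analyses typically also handle reference mapping bias and overdispersion (for example with beta-binomial models), which this sketch omits.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all
    outcomes no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def ase_test(ref_reads, alt_reads):
    """Return the reference-allele ratio and the exact p-value against
    the null of balanced (50:50) allelic expression."""
    n = ref_reads + alt_reads
    return ref_reads / n, binomial_two_sided_p(ref_reads, n)


# Hypothetical read counts at one heterozygous SNP in one sample.
ratio, pval = ase_test(ref_reads=30, alt_reads=10)
print(f"ref ratio={ratio:.2f}, p={pval:.4g}")
```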

Cell type-specific gene expression profiling in brain tissue: comparison between TRAP, LCM and RNA-seq

BMB Reports

10.5483/bmbrep.2015.48.7.218

2015

Vol 48 (7)

pp. 388–394

Cited By ~ 12

Author(s):

TaeHyun Kim

Chae-Seok Lim

Bong-Kiun Kaang

Keyword(s):

Gene Expression

Gene Expression Profiling

Brain Tissue

Expression Profiling

Specific Gene

RNA-seq

Cell Type

Specific Gene Expression

Cell Type Specific

Improved estimation of cell type-specific gene expression through deconvolution of bulk tissues with matrix completion

10.1101/2021.06.30.450493

2021

Author(s):

Weixu Wang

Jun Yao

Yi Wang

Chao Zhang

Wei Tao

...

Keyword(s):

Gene Expression

Single Cells

Matrix Completion

Superior Performance

Specific Gene

RNA-seq

Cell Type

Specific Gene Expression

Large Patient

Cell Type Specific

Cell type-specific gene expression (CSE) brings novel biological insights into both physiological and pathological processes compared with bulk tissue gene expression. Although fluorescence-activated cell sorting (FACS) and single-cell RNA sequencing (scRNA-seq) are two widely used techniques for measuring gene expression in a cell type-specific manner, their cost and labor requirements make them impractical for routine use in large patient cohorts. Here, we present ENIGMA, an algorithm that deconvolutes bulk RNA-seq into cell type-specific expression matrices and cell type fraction matrices without the need for physical sorting or sequencing of single cells. ENIGMA uses a cell type signature matrix generated from either FACS RNA-seq or scRNA-seq as a reference, and applies a matrix completion technique to achieve fast and accurate deconvolution. We demonstrated the superior performance of ENIGMA over previously published algorithms (TCA, bMIND and CIBERSORTx), with much shorter running time, on both simulated and realistic datasets. To demonstrate its value for biological discovery, we applied ENIGMA to bulk RNA-seq from arthritis patients and revealed a pseudo-differentiation trajectory that could reflect the monocyte-to-macrophage transition. We also applied ENIGMA to bulk RNA-seq data of pancreatic islet tissue from type 2 diabetes (T2D) patients and discovered a beta cell-specific gene co-expression module related to senescence and apoptosis that may contribute to the pathogenesis of T2D. Together, ENIGMA provides a new framework for improving CSE estimation by integrating FACS RNA-seq and scRNA-seq with bulk tissue RNA-seq data, and will extend our understanding of cellular heterogeneity at the population level without the need for experimental tissue disaggregation.
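
ENIGMA starts from a reference signature matrix of cell-type expression. As a hedged illustration of that reference-based setting only, and not of ENIGMA's matrix-completion algorithm, the sketch below estimates the cell-type fractions of one bulk sample by non-negative least squares against a signature, solved with projected gradient descent. The signature values, fractions and function names are hypothetical.

```python
def estimate_fractions(signature, bulk, n_iter=5000):
    """Non-negative least squares by projected gradient descent.

    signature: one row per gene, one column per cell type.
    bulk: observed bulk expression, one value per gene.
    Returns fractions rescaled to sum to one.
    """
    n_types = len(signature[0])
    # 1 / ||S||_F^2 is a safe step size for the least-squares gradient.
    step = 1.0 / sum(v * v for row in signature for v in row)
    w = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(r[k] * w[k] for k in range(n_types)) - b
                    for r, b in zip(signature, bulk)]
        grad = [sum(r[k] * e for r, e in zip(signature, residual))
                for k in range(n_types)]
        # Gradient step, then projection onto the non-negative orthant.
        w = [max(0.0, wk - step * gk) for wk, gk in zip(w, grad)]
    total = sum(w)
    return [wk / total for wk in w]


# Hypothetical signature: three marker genes and one shared gene.
signature = [
    [10.0, 1.0, 1.0],
    [1.0, 10.0, 1.0],
    [1.0, 1.0, 10.0],
    [5.0, 5.0, 5.0],
]
true_fractions = [0.5, 0.3, 0.2]
bulk = [sum(s * f for s, f in zip(row, true_fractions)) for row in signature]
estimated = estimate_fractions(signature, bulk)
print([round(f, 3) for f in estimated])
```

Because this hypothetical bulk profile is an exact mixture, the estimate recovers the true fractions; with real data the residual is nonzero and the choice of marker genes matters.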

Chromatin-enriched RNAs mark active and repressive cis-regulation: an analysis of nuclear RNA-seq

10.1101/646950

2019

Author(s):

Xiangying Sun

Zhezhen Wang

Carlos Perez-Cervantes

Alex Ruthenburg

Ivan Moskowitz

...

Keyword(s):

Gene Expression

Noncoding RNA

Molecular Mechanisms

Specific Gene

RNA-seq

Cell Type

Neighboring Gene

Cis Regulation

Nuclear RNA

Cell Type Specific

Abstract

Long noncoding RNAs (lncRNAs) localize in the cell nucleus and influence gene expression through a variety of molecular mechanisms. RNA sequencing of two biochemical fractions of nuclei reveals a unique class of lncRNAs, termed chromatin-enriched nuclear RNAs (cheRNAs), that are tightly bound to chromatin and putatively function to cis-activate gene expression. Until now, a rigorous analytic pipeline for nuclear RNA-seq has been lacking. In this study, we survey four computational strategies for nuclear RNA-seq data analysis and show that a new pipeline, Tuxedo, outperforms the other approaches. Tuxedo not only assembles a more complete transcriptome but also identifies cheRNAs with higher accuracy. We used Tuxedo to analyze gold-standard K562 cell datasets and further characterized the genomic features of intergenic cheRNAs (icheRNAs) and their similarity to those of enhancer RNAs (eRNAs). Moreover, we quantify the transcriptional correlation between icheRNAs and adjacent genes, and suggest that icheRNAs may be cis-acting transcriptional regulators that are more positively associated with neighboring gene expression than eRNAs predicted by a state-of-the-art method or by CAGE signal. We also explore two novel genomic associations, suggesting that cheRNAs may have diverse functions. H3K9me3 modification coincident with icheRNA may have a new role associated with active enhancers derived from ancient mobile elements, while antisense cheRNAs (as-cheRNAs) may have a cis-repressive function that is likely involved in transiently modulating cell type-specific cis-regulation.

Author Summary

Chromatin-enriched nuclear RNA (cheRNA) is a class of gene-regulatory non-coding RNAs. CheRNA provides a powerful way to profile the nuclear transcriptional landscape, especially the noncoding transcriptome. The computational framework presented here provides a reliable approach for identifying cheRNAs and for studying cell type-specific gene regulation. We found that intergenic cheRNAs, including those with high levels of H3K9me3 (a mark associated with closed, repressed chromatin), may act as transcriptional activators. In contrast, antisense cheRNAs, which originate from the complementary strand of protein-coding genes, may interact with diverse chromatin modulators to repress local transcription. With our new pipeline, one future challenge will be to refine the functional mechanisms of these noncoding RNA classes by exploring their regulatory roles in diverse molecular and cellular processes in humans and other organisms.
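
The analysis rests on two quantitative steps: deciding whether a transcript is enriched in the chromatin-bound fraction relative to the soluble nuclear fraction, and correlating intergenic cheRNA levels with neighboring genes. A minimal sketch, assuming enrichment is scored as a log2 ratio of normalized abundances with a hypothetical 2-fold cutoff (the abstract does not state the statistic Tuxedo uses), and using Pearson correlation across samples. The fraction labels, cutoff, values and function names are all hypothetical.

```python
import math


def chromatin_enrichment(chromatin, nucleoplasm, pseudocount=1.0):
    """log2 ratio of normalized abundance in the chromatin-bound fraction
    versus the soluble nucleoplasm fraction (hypothetical scoring rule)."""
    return math.log2((chromatin + pseudocount) / (nucleoplasm + pseudocount))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical normalized abundances of one intergenic transcript.
score = chromatin_enrichment(chromatin=31.0, nucleoplasm=7.0)
is_chromatin_enriched = score >= 1.0  # hypothetical 2-fold cutoff

# Hypothetical levels of the same transcript and its nearest gene
# across five samples.
icherna_levels = [1.0, 2.0, 3.0, 4.0, 5.0]
neighbor_levels = [3.1, 4.0, 6.2, 7.9, 10.1]
r = pearson(icherna_levels, neighbor_levels)
print(f"enrichment={score:.2f}, r={r:.2f}")
```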

A computational method for direct imputation of cell type-specific expression profiles and cellular compositions from bulk-tissue RNA-Seq in brain disorders

10.1101/2020.05.28.121483

2020

Author(s):

Abolfazl Doostparast Torshizi

Jubao Duan

Kai Wang

Keyword(s):

Gene Expression

Expression Profiles

Complex Diseases

Specific Gene

Cellular Composition

RNA-seq

Cell Type

Specific Expression

Cell Type Specific Expression

Cell Type Specific

Abstract

The importance of cell type-specific gene expression in disease-relevant tissues is increasingly recognized in genetic studies of complex diseases. However, the vast majority of gene expression studies are conducted on bulk tissues, necessitating computational approaches to infer the cell type-specific contributions to disease. Several computational methods are available for cell type deconvolution (that is, inference of cellular composition) from bulk RNA-Seq data, but they cannot impute cell type-specific expression profiles. We hypothesize that, with external prior information such as single-cell RNA-seq (scRNA-seq) and population-wide expression profiles, estimating both cellular composition and cell type-specific expression from bulk RNA-Seq data can be computationally tractable and identifiable. Here we introduce CellR, which addresses cross-individual gene expression variation by employing genome-wide tissue-wise expression signatures from GTEx to adjust the weights of cell-specific gene markers. It then transforms the deconvolution problem into a linear programming model, taking into account inter- and intra-cellular correlations, and uses a multivariate stochastic search algorithm to estimate the expression level of each gene in each cell type. Extensive analyses of several complex diseases, including schizophrenia, Alzheimer’s disease, Huntington’s disease, and type 2 diabetes, validated the efficiency of CellR while revealing how specific cell types contribute to different diseases. We conducted numerical simulations on human cerebellum to generate pseudo-bulk RNA-seq data and demonstrated CellR's efficiency in inferring cell-specific expression profiles. Moreover, we inferred cell-specific expression levels from bulk RNA-seq data on schizophrenia and computed differentially expressed genes within specific cell types. Using the predicted gene expression profile of excitatory neurons, we were able to reproduce our recently published findings on TCF4 being a master regulator in schizophrenia and showed how this gene and its targets are enriched in excitatory neurons. In summary, CellR not only compares favorably with competing approaches (in both accuracy and stability of inference) for inferring cellular composition from bulk RNA-seq data but also allows direct imputation of cell type-specific gene expression, opening new doors to re-analyzing gene expression data from bulk tissues in complex diseases.
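
CellR itself uses GTEx-weighted markers, a linear programming formulation and a stochastic search. A simpler, swapped-in illustration of why cell type-specific expression becomes estimable once cellular composition is known: if a gene's bulk level in each sample is the fraction-weighted sum of its cell-type levels, and those levels are shared across samples, ordinary least squares across samples recovers them. This sketch is not CellR's method; the fractions, levels and function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def impute_gene_levels(fractions, bulk_gene):
    """Least-squares levels of one gene in each cell type, assuming
    bulk = sum_k fraction_k * level_k with levels shared across samples.

    fractions: one row per sample, one column per cell type.
    bulk_gene: bulk level of the gene in each sample.
    """
    k = len(fractions[0])
    # Normal equations: (F^T F) x = F^T y.
    ftf = [[sum(f[i] * f[j] for f in fractions) for j in range(k)]
           for i in range(k)]
    fty = [sum(f[i] * y for f, y in zip(fractions, bulk_gene))
           for i in range(k)]
    return solve_linear(ftf, fty)


# Hypothetical fractions (excitatory neurons, other cells) in four samples.
fractions = [[0.7, 0.3], [0.5, 0.5], [0.3, 0.7], [0.6, 0.4]]
true_levels = [20.0, 4.0]
bulk_gene = [sum(f * t for f, t in zip(row, true_levels)) for row in fractions]
levels = impute_gene_levels(fractions, bulk_gene)
print([round(v, 3) for v in levels])
```

The shared-levels assumption is strong: it ignores exactly the cross-individual variation that CellR is designed to address.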

CDSeqR: fast complete deconvolution for gene expression data from bulk tissues

BMC Bioinformatics

10.1186/s12859-021-04186-5

2021

Vol 22 (1)

Author(s):

Kai Kang

Caizhi Huang

Yuanyuan Li

David M. Umbach

Leping Li

Keyword(s):

Gene Expression

Cell Types

Biological Tissues

Specific Gene

Specific Cell

Specific Information

Expression Data

RNA-seq

Cell Type

Cell Type Specific

Abstract

Background: Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions of all cells in the tissue, understanding the contribution of individual cell types to overall gene expression in the tissue is fundamentally important. We recently developed a computational method, CDSeq, that can simultaneously estimate both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-Seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) with significant performance improvements over the original MATLAB implementation and a new function to aid cell type annotation. The R package should be of interest to the broader R community.

Results: We developed a novel strategy that substantially improves computational efficiency in both speed and memory usage. In addition, we designed and implemented a new function for annotating CDSeq-estimated cell types using single-cell RNA sequencing (scRNA-seq) data. This function allows users to readily interpret and visualize the CDSeq-estimated cell types, and also to annotate them using marker genes. We carried out additional validations of the CDSeqR software using synthetic and real cell mixtures and real bulk RNA-seq data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project.

Conclusions: Existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for better understanding transcriptomic changes and human diseases. They are also potentially useful for studying cell–cell interactions in the tissue microenvironment. However, bulk-level analyses neglect tissue heterogeneity and hinder investigation of cell-type-specific expression. The CDSeqR package may aid in silico dissection of bulk expression data, enabling researchers to recover cell-type-specific information.
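
CDSeqR's annotation function labels CDSeq-estimated cell types using scRNA-seq data or marker genes. The sketch below shows one simple way such labeling can work, assuming each estimated profile takes the name of the reference cell type whose profile it correlates with best (Pearson). It does not reproduce the CDSeqR API; the function names, gene sets and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def annotate(estimated, reference):
    """Label each estimated profile with the best-correlated reference
    cell type; returns {component: (label, correlation)}."""
    labels = {}
    for name, profile in estimated.items():
        scores = {ct: pearson(profile, ref) for ct, ref in reference.items()}
        best = max(scores, key=scores.get)
        labels[name] = (best, scores[best])
    return labels


# Hypothetical reference profiles (e.g. averaged from scRNA-seq clusters)
# over the same four genes as the estimated profiles.
reference = {
    "neuron": [10.0, 0.5, 1.0, 1.0],
    "astrocyte": [0.5, 9.0, 2.0, 2.0],
}
estimated = {
    "component_1": [9.0, 1.0, 2.0, 0.5],
    "component_2": [1.0, 8.0, 1.0, 3.0],
}
labels = annotate(estimated, reference)
print(labels)
```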

Sources of Variation in Cell-Type RNA-Seq Profiles

10.21203/rs.2.23415/v1

2020

Cited By ~ 1

Author(s):

Johan Gustafsson

Felix Held

Jonathan Robinson

Elias Björnson

Rebecka Jörnsten

...

Keyword(s):

Gene Expression

Single Cell

Expression Profiles

Gene Expression Profiles

Specific Gene

RNA-seq

Cell Type

Specific Gene Expression

Cell Type Specific

Technical Factors

Abstract

Background: Cell-type-specific gene expression profiles are needed for many computational methods operating on bulk RNA-Seq samples, such as deconvolution of cell-type fractions and digital cytometry. However, the gene expression profile of a cell type can vary substantially due to both technical factors and biological differences in cell state and surroundings, reducing the efficacy of such methods. Here, we investigated which factors contribute most to this variation.

Results: We evaluated different normalization methods, quantified the magnitude of variation introduced by different sources, and examined the differences between UMI-based single-cell RNA-Seq and bulk RNA-Seq. We applied methods such as random forest regression to a collection of publicly available bulk and single-cell RNA-Seq datasets containing B and T cells, and found that technical variation across laboratories is of the same magnitude as biological variation across cell types. Tissue of origin and cell subtype are less important but still substantial factors, while differences between individuals are relatively small. We also show that much of the difference between UMI-based single-cell and bulk RNA-Seq methods can be explained by the number of read duplicates per mRNA molecule in the single-cell sample.

Conclusions: Our work shows the importance of either matching or correcting for technical factors when creating cell-type-specific gene expression profiles that are to be used together with bulk samples.
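
The study used methods such as random forest regression to rank sources of variation. A simpler, swapped-in way to attach a number to a single factor is the share of total variance explained by its groups (the between-group sum of squares over the total, as in one-way ANOVA). A minimal sketch on hypothetical log-expression values for one gene, with a hypothetical laboratory factor and cell type as factors:

```python
from statistics import mean


def variance_explained(values, groups):
    """Fraction of total variance explained by a grouping factor:
    between-group sum of squares divided by total sum of squares."""
    grand = mean(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    between_ss = sum(len(vs) * (mean(vs) - grand) ** 2
                     for vs in by_group.values())
    return between_ss / total_ss


# Hypothetical log-expression of one gene in eight profiles.
values = [5.0, 5.2, 7.0, 7.2, 6.0, 6.2, 8.0, 8.2]
cell_type = ["B", "B", "T", "T", "B", "B", "T", "T"]
lab = ["lab1", "lab1", "lab1", "lab1", "lab2", "lab2", "lab2", "lab2"]

share_cell_type = variance_explained(values, cell_type)
share_lab = variance_explained(values, lab)
print(f"cell type: {share_cell_type:.3f}, lab: {share_lab:.3f}")
```

In this balanced toy design, cell type explains four times as much variance as laboratory. In unbalanced real collections the factors can be confounded, and a per-factor share like this no longer separates them cleanly.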

An Efficient and Flexible Method for Deconvoluting Bulk RNA-Seq Data with Single-Cell RNA-Seq Data

Cells

10.3390/cells8101161

2019

Vol 8 (10)

pp. 1161

Cited By ~ 2

Author(s):

Xifang Sun

Shiquan Sun

Sheng Yang

Keyword(s):

Gene Expression

Single Cell

RNA Sequencing

Specific Gene

RNA-seq

Cell Type

Disease Etiology

Expression Levels

Cell Type Specific

Gene Expression Levels

Estimating cell type compositions for complex diseases is an important step in investigating cellular heterogeneity, both for understanding disease etiology and for potentially facilitating early disease diagnosis and prevention. Here, we developed a computational statistical method, Multi-Omics Matrix Factorization (MOMF), to estimate the cell type compositions of bulk RNA sequencing (RNA-seq) data by leveraging cell type-specific gene expression levels from single-cell RNA sequencing (scRNA-seq) data. MOMF not only directly models the count nature of gene expression data but also effectively accounts for the uncertainty in cell type-specific mean gene expression levels. We demonstrate the benefits of MOMF through three real data applications: glioblastoma (GBM), colorectal cancer (CRC), and type II diabetes (T2D) studies. MOMF accurately estimates disease-related cell type proportions, namely those of oligodendrocyte progenitor cells and macrophages, which are strongly associated with survival in GBM and CRC, respectively.
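
MOMF models the count nature of expression data and uses scRNA-seq-derived cell-type means. As a simpler, swapped-in illustration of count-based proportion estimation, and not MOMF's matrix factorization, the sketch treats each bulk read as drawn from a cell type with probability π_k and then from that type's gene distribution, and fits π by expectation-maximization (EM). The profiles, counts and function names are hypothetical; note that the estimates are shares of mRNA rather than of cells.

```python
def em_proportions(counts, profiles, n_iter=500):
    """EM estimate of mixing proportions for a multinomial mixture.

    counts: bulk read counts, one per gene.
    profiles: one gene-probability vector per cell type (each sums to 1).
    """
    k = len(profiles)
    pi = [1.0 / k] * k
    for _ in range(n_iter):
        expected = [0.0] * k
        for g, c in enumerate(counts):
            # E-step: posterior probability that a read of gene g came
            # from each cell type, given the current proportions.
            weights = [pi[j] * profiles[j][g] for j in range(k)]
            norm = sum(weights)
            if norm == 0.0:
                continue
            for j in range(k):
                expected[j] += c * weights[j] / norm
        # M-step: proportions are the expected read shares.
        total = sum(expected)
        pi = [e / total for e in expected]
    return pi


def normalize(profile):
    """Scale a non-negative vector to sum to one."""
    s = sum(profile)
    return [v / s for v in profile]


# Hypothetical mean expression of four genes in two cell types,
# converted to per-read gene probabilities.
profiles = [normalize([50.0, 30.0, 10.0, 10.0]),
            normalize([10.0, 10.0, 40.0, 40.0])]
counts = [3800, 2400, 1900, 1900]  # hypothetical bulk read counts
pi = em_proportions(counts, profiles)
print([round(p, 3) for p in pi])
```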

Sources of Variation in Cell-Type RNA-Seq Profiles

10.21203/rs.2.23415/v2

2020

Author(s):

Johan Gustafsson

Felix Held

Jonathan Robinson

Elias Björnson

Rebecka Jörnsten

...

Keyword(s):

Gene Expression

Single Cell

Expression Profiles

Gene Expression Profiles

Specific Gene

RNA-seq

Cell Type

Specific Gene Expression

Cell Type Specific

Technical Factors

Abstract

Cell-type-specific gene expression profiles are needed for many computational methods operating on bulk RNA-Seq samples, such as deconvolution of cell-type fractions and digital cytometry. However, the gene expression profile of a cell type can vary substantially due to both technical factors and biological differences in cell state and surroundings, reducing the efficacy of such methods. Here, we investigated which factors contribute most to this variation. We evaluated different normalization methods, quantified the variance explained by different factors, evaluated the effect on deconvolution of cell type fractions, and examined the differences between UMI-based single-cell RNA-Seq and bulk RNA-Seq. We investigated a collection of publicly available bulk and single-cell RNA-Seq datasets containing B and T cells, and found that technical variation across laboratories is substantial, even for genes specifically selected for deconvolution, and has a confounding effect on deconvolution. Tissue of origin is also a substantial factor, highlighting the challenge of applying cell type profiles derived from blood to mixtures from other tissues. We also show that much of the difference between UMI-based single-cell and bulk RNA-Seq methods can be explained by the number of read duplicates per mRNA molecule in the single-cell sample. Our work shows the importance of either matching or correcting for technical factors when creating cell-type-specific gene expression profiles that are to be used together with bulk samples.
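
In UMI-based protocols, reads that share a cell barcode, UMI and gene are treated as copies of one mRNA molecule, so the number of read duplicates per molecule is simple to compute. A minimal sketch with hypothetical barcodes, UMIs, gene names and function names; production pipelines also correct sequencing errors in barcodes and UMIs, which this omits.

```python
from collections import Counter


def collapse_umis(reads):
    """reads: (cell_barcode, umi, gene) tuples, one per sequenced read.
    Returns per-gene read counts and per-gene unique-molecule counts."""
    read_counts = Counter(gene for _, _, gene in reads)
    molecules = set(reads)  # identical (barcode, UMI, gene) = one molecule
    umi_counts = Counter(gene for _, _, gene in molecules)
    return read_counts, umi_counts


def duplicates_per_molecule(reads):
    """Mean number of reads observed per unique molecule."""
    return len(reads) / len(set(reads))


# Hypothetical reads from two cells.
reads = [
    ("AAAC", "UMI1", "GENE_A"), ("AAAC", "UMI1", "GENE_A"),
    ("AAAC", "UMI1", "GENE_A"), ("AAAC", "UMI2", "GENE_A"),
    ("AAAC", "UMI3", "GENE_B"), ("AAAC", "UMI3", "GENE_B"),
    ("AAAG", "UMI1", "GENE_B"),
]
read_counts, umi_counts = collapse_umis(reads)
print(read_counts, umi_counts, duplicates_per_molecule(reads))
```

Here GENE_A has more reads than GENE_B but the same number of molecules, which shows how uneven duplication can distort read-based comparisons with bulk data.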

A novel computational complete deconvolution method using RNA-seq data

10.1101/496596

2018

Cited By ~ 2

Author(s):

Kai Kang

Qian Meng

Igor Shats

David M. Umbach

Melissa Li

...

Keyword(s):

Gene Expression

Expression Profiles

Gene Expression Profiles

Specific Gene

RNA-seq

Cell Type

Specific Gene Expression

Cell Type Composition

Type Composition

Cell Type Specific

Abstract

The cell type composition of many biological tissues varies widely across samples. Such sample heterogeneity hampers efforts to probe the role of each cell type in the tissue microenvironment. Current approaches that address this issue have drawbacks. Cell sorting and single-cell-based experimental techniques disrupt in situ interactions and alter the physiological status of cells in tissues. Computational methods are flexible and promising, but they often estimate either sample-specific proportions of each cell type or cell-type-specific gene expression profiles, not both, by requiring the other as input. We introduce a computational Complete Deconvolution method (CDSeq) that can estimate both sample-specific proportions of each cell type and cell-type-specific gene expression profiles simultaneously, using bulk RNA-Seq data only. We assessed our method’s performance using several synthetic and experimental mixtures of varied but known cell type composition and compared it with two state-of-the-art deconvolution methods on the same mixtures. The results showed that CDSeq can estimate both sample-specific proportions of each component cell type and cell-type-specific gene expression profiles with high accuracy. CDSeq holds promise for computationally deciphering complex mixtures of cell types, each with differing expression profiles, using RNA-seq data measured in bulk tissue (MATLAB code is available at https://github.com/kkang7/CDSeq_011).
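
CDSeq's own estimation procedure is not described in this abstract. As a swapped-in and more familiar illustration of complete deconvolution (estimating profiles and proportions together from bulk data only), the sketch below applies non-negative matrix factorization (NMF) with Lee-Seung multiplicative updates to a hypothetical two-cell-type mixture, then rescales the factors into profiles and proportions. All values and function names are hypothetical.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(x, k, n_iter=2000, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for x ~ w @ h (squared error).
    x: genes x samples; w: genes x k profiles; h: k x samples weights."""
    rng = random.Random(seed)
    n_genes, n_samples = len(x), len(x[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_genes)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(n_samples)] for _ in range(k)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_samples)]
             for i in range(k)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_genes)]
    return w, h


def profiles_and_proportions(w, h):
    """Rescale so each profile sums to 1, then express each sample's
    weights as proportions that sum to 1."""
    k, n_samples = len(h), len(h[0])
    scale = [sum(row[j] for row in w) for j in range(k)]
    profiles = [[row[j] / scale[j] for j in range(k)] for row in w]
    weights = [[h[j][s] * scale[j] for j in range(k)] for s in range(n_samples)]
    proportions = [[v / sum(ws) for v in ws] for ws in weights]
    return profiles, proportions


def relative_error(x, w, h):
    """Frobenius norm of x - w @ h relative to the norm of x."""
    wh = matmul(w, h)
    num = sum((a - b) ** 2 for ra, rb in zip(x, wh) for a, b in zip(ra, rb))
    den = sum(a * a for row in x for a in row)
    return (num / den) ** 0.5


# Hypothetical 6-gene profiles (columns: two cell types) with marker genes.
true_profiles = [[0.4, 0.0], [0.3, 0.0], [0.0, 0.5],
                 [0.0, 0.3], [0.2, 0.1], [0.1, 0.1]]
true_props = [[0.9, 0.1], [0.7, 0.3], [0.5, 0.5], [0.3, 0.7], [0.1, 0.9]]
# Bulk matrix (genes x samples) at a hypothetical depth of 1000 per sample.
x = [[1000 * sum(p[j] * t[j] for j in range(2)) for t in true_props]
     for p in true_profiles]
w, h = nmf(x, k=2)
profiles, proportions = profiles_and_proportions(w, h)
print(round(relative_error(x, w, h), 4))
```

NMF solutions are not unique in general, so a low reconstruction error does not by itself guarantee that the recovered proportions match the true ones; validation on mixtures of known composition addresses exactly that gap.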
