CSNet: Estimating cell-type-specific gene co-expression networks from bulk gene expression data

Inferring and characterizing gene co-expression networks has led to important insights into the molecular mechanisms and functional pathways in healthy and diseased individuals. Most co-expression analyses to date have been performed on gene expression data collected from bulk tissues with different cell type compositions across samples, resulting in co-expression estimates confounded by heterogeneity in cell type proportions. To address this limitation in co-expression analysis, we propose a flexible framework that estimates cell-type-specific gene co-expressions from bulk sample data, where the cell-type-specific distributions of gene expression levels are not assumed to be known. To overcome the computational challenge in estimating covariances and correlations from a convolution of high-dimensional densities, we develop a novel thresholded least squares estimator, named CSNet, that is efficient to implement and has good theoretical properties. We further investigate the convergence rate of CSNet. The utility and efficacy of CSNet are demonstrated through simulation studies and an application to a gene co-expression study with bulk samples from Alzheimer's disease (AD) patients, where our analysis identified new cell-type-specific modules of AD risk genes.
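
As a rough illustration of what a thresholded least squares estimator can look like, the sketch below makes several assumptions that go beyond the abstract: cell-type proportions are known for every sample, each bulk profile is a proportion-weighted sum of independent cell-type-specific profiles, and one hard threshold is applied to the off-diagonal entries. Under those assumptions the bulk mean is linear in the proportions and the bulk covariance is linear in the squared proportions, so both can be fitted by ordinary least squares. The function name `cell_type_covariances`, the simulated data, and the threshold value are hypothetical and are not taken from CSNet itself.

```python
import numpy as np

def cell_type_covariances(X, P, threshold):
    """Sketch of a thresholded least squares covariance estimator.

    X: (n, p) bulk expression, P: (n, K) assumed-known cell-type proportions.
    Returns an array of shape (K, p, p), one covariance matrix per cell type.
    """
    n, p = X.shape
    K = P.shape[1]
    # Step 1: E[x_ij] = sum_k P_ik * mu_kj, so regress X on P to get the means.
    mu, *_ = np.linalg.lstsq(P, X, rcond=None)            # (K, p)
    R = X - P @ mu                                         # centered bulk data
    # Step 2 (assumes independent cell types):
    # E[r_ij * r_ij'] = sum_k P_ik^2 * sigma_k[j, j'], so regress products on P^2.
    prods = (R[:, :, None] * R[:, None, :]).reshape(n, p * p)
    sigma, *_ = np.linalg.lstsq(P ** 2, prods, rcond=None)  # (K, p * p)
    sigma = sigma.reshape(K, p, p)
    # Step 3: hard-threshold small off-diagonal entries to zero.
    off_diag = ~np.eye(p, dtype=bool)
    sigma[(np.abs(sigma) < threshold) & off_diag] = 0.0
    return sigma

# Simulated check with two cell types and three genes (all values hypothetical).
rng = np.random.default_rng(0)
n = 20000
P = rng.dirichlet([1.0, 1.0], size=n)                      # (n, 2) proportions
S1 = np.array([[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]])
S2 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.8], [0.0, 0.8, 1.0]])
Z1 = rng.multivariate_normal([5.0, 5.0, 5.0], S1, size=n)  # cell type 1 profiles
Z2 = rng.multivariate_normal([2.0, 2.0, 2.0], S2, size=n)  # cell type 2 profiles
X = P[:, [0]] * Z1 + P[:, [1]] * Z2                        # bulk mixture
est = cell_type_covariances(X, P, threshold=0.2)
print(np.round(est, 2))
```

In this simulation genes 0 and 1 are co-expressed only in the first cell type and genes 1 and 2 only in the second, so a sample covariance computed directly on `X` would blend the two patterns, whereas the regression on squared proportions is meant to separate them. The threshold would need to be chosen by the user, for example by cross-validation.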

Using multiple measurements of tissue to estimate subject- and cell-type-specific gene expression

Bioinformatics ◽

10.1093/bioinformatics/btz619 ◽

2019 ◽

Vol 36 (3) ◽

pp. 782-788 ◽

Cited By ~ 6

Author(s):

Jiebiao Wang ◽

Bernie Devlin ◽

Kathryn Roeder

Keyword(s):

Gene Expression ◽

Empirical Bayes ◽

Brain Regions ◽

Tissue Level ◽

Supplementary Information ◽

Specific Gene ◽

Expression Data ◽

Cell Type ◽

Multiple Measurements ◽

Cell Type Specific

Abstract. Motivation: Patterns of gene expression, quantified at the level of tissue or cells, can inform on etiology of disease. There are now rich resources for tissue-level (bulk) gene expression data, which have been collected from thousands of subjects, and resources involving single-cell RNA-sequencing (scRNA-seq) data are expanding rapidly. The latter yields cell type information, although the data can be noisy and typically are derived from a small number of subjects. Results: Complementing these approaches, we develop a method to estimate subject- and cell-type-specific (CTS) gene expression from tissue using an empirical Bayes method that borrows information across multiple measurements of the same tissue per subject (e.g. multiple regions of the brain). Analyzing expression data from multiple brain regions from the Genotype-Tissue Expression project (GTEx) reveals CTS expression, which then permits downstream analyses, such as identification of CTS expression Quantitative Trait Loci (eQTL). Availability and implementation: We implement this method as an R package MIND, hosted on https://github.com/randel/MIND. Supplementary information: Supplementary data are available at Bioinformatics online.

Chromatin-enriched RNAs mark active and repressive cis-regulation: an analysis of nuclear RNA-seq

10.1101/646950 ◽

2019 ◽

Author(s):

Xiangying Sun ◽

Zhezhen Wang ◽

Carlos Perez-Cervantes ◽

Alex Ruthenburg ◽

Ivan Moskowitz ◽

...

Keyword(s):

Gene Expression ◽

Noncoding RNA ◽

Molecular Mechanisms ◽

Specific Gene ◽

RNA Seq ◽

Cell Type ◽

Neighboring Gene ◽

Cis Regulation ◽

Nuclear RNA ◽

Cell Type Specific

Abstract. Long noncoding RNAs (lncRNAs) localize in the cell nucleus and influence gene expression through a variety of molecular mechanisms. RNA sequencing of two biochemical fractions of nuclei reveals a unique class of lncRNAs, termed chromatin-enriched nuclear RNAs (cheRNAs), that are tightly bound to chromatin and putatively function to cis-activate gene expression. Until now, a rigorous analytic pipeline for nuclear RNA-seq has been lacking. In this study, we survey four computational strategies for nuclear RNA-seq data analysis and show that a new pipeline, Tuxedo, outperforms other approaches. Tuxedo not only assembles a more complete transcriptome, but also identifies cheRNA with higher accuracy. We have used Tuxedo to analyze gold-standard K562 cell datasets and further characterize the genomic features of intergenic cheRNA (icheRNA) and their similarity to those of enhancer RNA (eRNA). Moreover, we quantify the transcriptional correlation of icheRNA and adjacent genes, and suggest that icheRNA may be the cis-acting transcriptional regulator that is more positively associated with neighboring gene expression than eRNA predicted by a state-of-the-art method or CAGE signal. We also explore two novel genomic associations, suggesting cheRNA may have diverse functions. A possible new role of H3K9me3 modification coincident with icheRNA may be associated with active enhancers derived from ancient mobile elements, while a potential cis-repressive function of antisense cheRNA (as-cheRNA) is likely to be involved in transiently modulating cell-type-specific cis-regulation.
Author Summary: Chromatin-enriched nuclear RNA (cheRNA) is a class of gene regulatory non-coding RNAs. CheRNA provides a powerful way to profile the nuclear transcriptional landscape, especially the noncoding transcriptome. The computational framework presented here provides a reliable approach to identifying cheRNA and to studying cell-type-specific gene regulation. We found that intergenic cheRNA, including intergenic cheRNA with high levels of H3K9me3 (a mark associated with closed/repressed chromatin), may act as a transcriptional activator. In contrast, antisense cheRNA, which originates from the complementary strand of the protein-coding gene, may interact with diverse chromatin modulators to repress local transcription. With our new pipeline, one future challenge will be refining the functional mechanisms of these noncoding RNA classes through exploring their regulatory roles, which are involved in diverse molecular and cellular processes in human and other organisms.

CDSeqR: fast complete deconvolution for gene expression data from bulk tissues

BMC Bioinformatics ◽

10.1186/s12859-021-04186-5 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Kai Kang ◽

Caizhi Huang ◽

Yuanyuan Li ◽

David M. Umbach ◽

Leping Li

Keyword(s):

Gene Expression ◽

Cell Types ◽

Biological Tissues ◽

Specific Gene ◽

Specific Cell ◽

Specific Information ◽

Expression Data ◽

RNA Seq ◽

Cell Type ◽

Cell Type Specific

Abstract. Background: Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions from all cells in the tissue, understanding the contribution of individual cell types to the overall gene expression in the tissue is fundamentally important. We recently developed a computational method, CDSeq, that can simultaneously estimate both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-Seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) with significant performance improvement over the original implementation in MATLAB and an added new function to aid cell type annotation. The R package would be of interest for the broader R community. Results: We developed a novel strategy to substantially improve computational efficiency in both speed and memory usage. In addition, we designed and implemented a new function for annotating the CDSeq-estimated cell types using single-cell RNA sequencing (scRNA-seq) data. This function allows users to readily interpret and visualize the CDSeq-estimated cell types, and it further allows users to annotate them using marker genes. We carried out additional validations of the CDSeqR software using synthetic data, real cell mixtures, and real bulk RNA-seq data from the Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. Conclusions: The existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for better understanding changes in transcriptomics and human diseases. They are also potentially useful for studying cell–cell interactions in the tissue microenvironment. Bulk-level analyses neglect tissue heterogeneity, however, and hinder investigation of cell-type-specific expression. The CDSeqR package may aid in silico dissection of bulk expression data, enabling researchers to recover cell-type-specific information.

Immuno-Navigator, a batch-corrected coexpression database, reveals cell type-specific gene networks in the immune system

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1604351113 ◽

2016 ◽

Vol 113 (17) ◽

pp. E2393-E2402 ◽

Cited By ~ 28

Author(s):

Alexis Vandenbon ◽

Viet H. Dinh ◽

Norihisa Mikami ◽

Yohko Kitagawa ◽

Shunsuke Teraguchi ◽

...

Keyword(s):

Gene Expression ◽

Immune System ◽

Gene Expression Data ◽

Cell Types ◽

Treg Cells ◽

Integrated Analysis ◽

Expression Data ◽

Batch Effects ◽

Cell Type ◽

Cell Type Specific

High-throughput gene expression data are one of the primary resources for exploring complex intracellular dynamics in modern biology. The integration of large amounts of public data may allow us to examine general dynamical relationships between regulators and target genes. However, obstacles for such analyses are study-specific biases or batch effects in the original data. Here we present Immuno-Navigator, a batch-corrected gene expression and coexpression database for 24 cell types of the mouse immune system. We systematically removed batch effects from the underlying gene expression data and showed that this removal considerably improved the consistency between inferred correlations and prior knowledge. The data revealed widespread cell type-specific correlation of expression. Integrated analysis tools allow users to use this correlation of expression for the generation of hypotheses about biological networks and candidate regulators in specific cell types. We show several applications of Immuno-Navigator as examples. In one application we successfully predicted known regulators of importance in naturally occurring Treg cells from their expression correlation with a set of Treg-specific genes. For one high-scoring gene, integrin β8 (Itgb8), we confirmed an association between Itgb8 expression in forkhead box P3 (Foxp3)-positive T cells and Treg-specific epigenetic remodeling. Our results also suggest that the regulation of Treg-specific genes within Treg cells is relatively independent of Foxp3 expression, supporting recent results pointing to a Foxp3-independent component in the development of Treg cells.
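
To make concrete why batch effects can distort co-expression estimates, the sketch below simulates two independent genes measured in two batches, one of which carries a constant offset, and compares the Pearson correlation before and after a simple per-batch mean centering. Per-batch centering is used here only as a minimal stand-in; the abstract does not say which correction Immuno-Navigator applies, and the function name `center_by_batch` and all simulated values are hypothetical.

```python
import numpy as np

def center_by_batch(X, batch):
    """Subtract each batch's per-gene mean. X: (n, p) expression, batch: (n,) labels."""
    Xc = X.astype(float).copy()
    for b in np.unique(batch):
        rows = batch == b
        Xc[rows] -= Xc[rows].mean(axis=0)
    return Xc

rng = np.random.default_rng(1)
n_per_batch = 500
batch = np.repeat([0, 1], n_per_batch)
genes = rng.normal(size=(2 * n_per_batch, 2))         # two truly independent genes
shift = np.where(batch[:, None] == 1, 3.0, 0.0)       # batch 1 offset on both genes
X = genes + shift

# Correlation between gene 0 and gene 1, before and after centering.
r_raw = np.corrcoef(X, rowvar=False)[0, 1]
r_corrected = np.corrcoef(center_by_batch(X, batch), rowvar=False)[0, 1]
print(f"raw: {r_raw:.2f}, batch-centered: {r_corrected:.2f}")
```

The shared offset makes the two genes look strongly co-expressed even though they are independent within each batch, which is the kind of study-specific bias the abstract describes. Real batch effects are rarely pure mean shifts, so this should be read as a toy case, not as a recommended correction.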

The Interplay Between Chromatin Architecture and Lineage-Specific Transcription Factors and the Regulation of Rag Gene Expression

Frontiers in Immunology ◽

10.3389/fimmu.2021.659761 ◽

2021 ◽

Vol 12 ◽

Author(s):

Kazuko Miyazaki ◽

Masaki Miyazaki

Keyword(s):

Gene Expression ◽

Transcription Factors ◽

Cell Fate ◽

Molecular Mechanisms ◽

Current Knowledge ◽

Specific Gene ◽

Cell Type ◽

Cell Fate Decisions ◽

Chromatin Architecture ◽

Cell Type Specific

Cell type-specific gene expression is driven through the interplay between lineage-specific transcription factors (TFs) and the chromatin architecture, such as topologically associating domains (TADs), and enhancer-promoter interactions. To elucidate the molecular mechanisms of the cell fate decisions and cell type-specific functions, it is important to understand the interplay between chromatin architectures and TFs. Among enhancers, super-enhancers (SEs) play key roles in establishing cell identity. Adaptive immunity depends on the RAG-mediated assembly of antigen recognition receptors. Hence, regulation of the Rag1 and Rag2 (Rag1/2) genes is a hallmark of adaptive lymphoid lineage commitment. Here, we review the current knowledge of 3D genome organization, SE formation, and Rag1/2 gene regulation during B cell and T cell differentiation.

Using multiple measurements of tissue to estimate subject- and cell-type-specific gene expression

10.1101/379099 ◽

2018 ◽

Cited By ~ 2

Author(s):

Jiebiao Wang ◽

Bernie Devlin ◽

Kathryn Roeder

Keyword(s):

Gene Expression ◽

Empirical Bayes ◽

Tissue Expression ◽

Brain Regions ◽

Tissue Level ◽

Specific Gene ◽

Expression Data ◽

Cell Type ◽

Multiple Measurements ◽

Cell Type Specific

Abstract. Motivation: Patterns of gene expression, quantified at the level of tissue or cells, can inform on etiology of disease. There are now rich resources for tissue-level (bulk) gene expression data, which have been collected from thousands of subjects, and resources involving single-cell RNA-sequencing (scRNA-seq) data are expanding rapidly. The latter yields cell type information, although the data can be noisy and typically are derived from a small number of subjects. Results: Complementing these approaches, we develop a method to estimate subject- and cell-type-specific (CTS) gene expression from tissue using an empirical Bayes method that borrows information across multiple measurements of the same tissue per subject (e.g., multiple regions of the brain). Analyzing expression data from multiple brain regions from the Genotype-Tissue Expression project (GTEx) reveals CTS expression, which then permits downstream analyses, such as identification of CTS expression Quantitative Trait Loci (eQTL). Availability and implementation: We implement this method as an R package MIND, hosted on https://github.com/randel/MIND.

ICeD-T Provides Accurate Estimates of Immune Cell Abundance in Tumor Samples by Allowing for Aberrant Gene Expression Patterns

10.1101/326421 ◽

2018 ◽

Author(s):

Douglas R. Wilson ◽

Joseph G. Ibrahim ◽

Wei Sun

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Immune Cell ◽

Expression Patterns ◽

Real Data ◽

Superior Performance ◽

Specific Gene ◽

Expression Data ◽

Cell Type ◽

Cell Composition

Abstract. Immunotherapies have achieved phenomenal success in the treatment of cancer and promise even more breakthroughs in the near future. The need to understand the underlying mechanisms of immunotherapies and to develop precision immunotherapy regimens has spurred great interest in characterizing immune cell composition within the tumor microenvironment. Several methods have been developed to estimate immune cell composition using gene expression data from bulk tumor samples. However, these methods are not flexible enough to handle aberrant patterns of gene expression data, e.g., inconsistent cell type-specific gene expression between purified reference samples and this cell type in tumor samples. In this paper, we present a novel statistical model for expression deconvolution called ICeD-T (Immune Cell Deconvolution in Tumor tissues), which models gene expression by a log-normal distribution that is appropriate for both microarray and RNA-seq data. ICeD-T automatically identifies aberrant genes whose expressions are inconsistent with the deconvolution model and down-weights their contributions to cell type abundance estimates. We evaluated the performance of ICeD-T versus existing methods in simulation studies and several real data analyses. ICeD-T displayed comparable or superior performance to these competing methods. Applying these methods to assess the relationship between immunotherapy response and immune cell composition, ICeD-T is able to identify significant associations that are missed by its competitors.
