ReadZS detects developmentally regulated RNA processing programs in single cell RNA-seq and defines subpopulations independent of gene expression

2021 ◽  
Author(s):  
Elisabeth Meyer ◽  
Roozbeh Dehghannasiri ◽  
Kaitlin Chaung ◽  
Julia Salzman

Post-transcriptional regulation of RNA processing (RNAP), including splicing and alternative polyadenylation (APA), controls eukaryotic gene function. Conservative estimates based on bulk tissue studies conclude that at least 50% of mammalian genes undergo APA. Single-cell RNA sequencing (scRNA-seq) could enable a near-complete estimate of the extent, function, and regulation of these and other forms of RNA processing. Yet statistical methods to detect regulated RNAP are limited in their detection power because they rely on (a) incomplete annotations of 3' untranslated regions (3' UTRs), (b) peak-calling heuristics, (c) analysis based on measurements collapsed over all cells in a cell type (pseudobulking), or (d) APA-specific detection. Here, we introduce ReadZS, a computationally efficient and annotation-free statistical approach to identify regulated RNAP, including but not limited to APA, in single cells. ReadZS rediscovers and substantially extends the scope of known cell type-specific RNAP in the human lung and during human spermatogenesis. The unique single-cell resolution and statistical properties of ReadZS enable discovery of new evolutionarily conserved, developmentally regulated RNAP and of subpopulations of lung-resident macrophages that are homogeneous by gene expression alone.

2021 ◽  
Author(s):  
Julia Eve Olivieri ◽  
Roozbeh Dehghannasiri ◽  
Peter Wang ◽  
SoRi Jang ◽  
Antoine de Morree ◽  
...  

More than 95% of human genes are alternatively spliced. Yet the extent to which splicing is regulated at single-cell resolution has remained controversial due to both the available data and the methods used to interpret it. We apply the SpliZ, a new statistical approach that is agnostic to transcript annotation, to detect cell-type-specific regulated splicing in >110K carefully annotated single cells from 12 human tissues. Using 10x data for discovery, 9.1% of genes with computable SpliZ scores are spliced in a cell-type-specific manner. These results are validated with RNA FISH, single-cell PCR, and in high throughput with Smart-seq2. Regulated splicing is found in ubiquitously expressed genes such as actin light chain subunit MYL6 and ribosomal protein RPS24, which has an epithelial-specific microexon. 13% of the statistically most variable splice sites in cell-type-specifically regulated genes are also most variable in mouse lemur or mouse. SpliZ analysis further reveals 170 genes with regulated splicing during sperm development, 10 of which are conserved in mouse and mouse lemur. The statistical properties of the SpliZ allow model-based identification of subpopulations within cells that are otherwise indistinguishable by gene expression, illustrated by subpopulations of classical monocytes with stereotyped splicing, including an unannotated exon, in SAT1, a diamine acetyltransferase. Together, this unsupervised and annotation-free analysis of differential splicing in ultra-high-throughput droplet-based sequencing of human cells across multiple organs establishes that splicing is regulated cell-type-specifically, independent of gene expression.


2021 ◽  
Author(s):  
Zi-Hang Wen ◽  
Jeremy L. Langsam ◽  
Lu Zhang ◽  
Wenjun Shen ◽  
Xin Zhou

Single-cell RNA-seq (scRNA-seq) offers opportunities to study gene expression of tens of thousands of single cells simultaneously, to investigate cell-to-cell variation, and to reconstruct cell-type-specific gene regulatory networks. Recovering dropout events in a sparse gene expression matrix for scRNA-seq data is a long-standing matrix completion problem. We introduce Bfimpute, a Bayesian factorization imputation algorithm that reconstructs two latent matrices, one for genes and one for cells, to impute the final gene expression matrix within each cell group, with or without the aid of cell type labels or bulk data. Bfimpute achieves better accuracy than six other publicly notable scRNA-seq imputation methods on simulated and real scRNA-seq data, as measured by several different evaluation metrics. Bfimpute can also flexibly integrate any gene- or cell-related information that users provide to increase performance. Availability: Bfimpute is implemented in R and is freely available at https://github.com/maiziezhoulab/Bfimpute.
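To make the matrix completion framing concrete, the following is a minimal sketch of imputation by low-rank factorization into a gene factor matrix and a cell factor matrix. It is not Bfimpute itself: it swaps the Bayesian model for plain regularized stochastic gradient descent, treats every zero as a dropout, and ignores cell groups. The function name `factorize_impute` and all of its parameters are hypothetical and chosen only for illustration.

```python
import random


def factorize_impute(matrix, rank=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Fill zero entries of a genes x cells matrix from a low-rank factorization.

    Assumption for this sketch: every zero is a dropout (missing value).
    """
    rng = random.Random(seed)
    n_genes, n_cells = len(matrix), len(matrix[0])
    # Latent gene factors G (n_genes x rank) and cell factors C (n_cells x rank).
    G = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_genes)]
    C = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_cells)]
    observed = [(i, j) for i in range(n_genes) for j in range(n_cells)
                if matrix[i][j] != 0]

    for _ in range(steps):
        for i, j in observed:
            pred = sum(G[i][k] * C[j][k] for k in range(rank))
            err = matrix[i][j] - pred
            for k in range(rank):
                g, c = G[i][k], C[j][k]
                # Gradient step on squared error with L2 regularization.
                G[i][k] += lr * (err * c - reg * g)
                C[j][k] += lr * (err * g - reg * c)

    # Keep observed values; replace zeros with the factorization's prediction.
    imputed = [row[:] for row in matrix]
    for i in range(n_genes):
        for j in range(n_cells):
            if matrix[i][j] == 0:
                imputed[i][j] = sum(G[i][k] * C[j][k] for k in range(rank))
    return imputed


if __name__ == "__main__":
    # Toy rank-1 matrix (outer product of [1, 2, 3] and [1, 2, 4]) with one entry masked.
    toy = [[1, 2, 4], [2, 0, 8], [3, 6, 12]]
    print(factorize_impute(toy, rank=1)[1][1])
```

In this toy case the masked entry should land near 4, the value implied by the rank-1 structure. A Bayesian treatment such as the one the abstract describes would instead place priors on the two factor matrices and infer them jointly, which this sketch does not attempt.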


2020 ◽  
Author(s):  
Ying Lei ◽  
Mengnan Cheng ◽  
Zihao Li ◽  
Zhenkun Zhuang ◽  
Liang Wu ◽  
...  

Non-human primates (NHP) provide a unique opportunity to study human neurological diseases, yet detailed characterization of the cell types and transcriptional regulatory features in the NHP brain is lacking. We applied a combinatorial indexing assay, sci-ATAC-seq, as well as single-nuclei RNA-seq, to profile chromatin accessibility in 43,793 single cells and transcriptomics in 11,477 cells, respectively, from the prefrontal cortex, primary motor cortex, and primary visual cortex of the adult cynomolgus monkey (Macaca fascicularis). Integrative analysis of these two datasets resolved regulatory elements and transcription factors that specify cell type distinctions, and discovered area-specific diversity in chromatin accessibility and gene expression within excitatory neurons. We also constructed the dynamic landscape of chromatin accessibility and gene expression of oligodendrocyte maturation to characterize adult remyelination. Furthermore, we identified cell type-specific enrichment of differentially spliced gene isoforms and disease-associated single nucleotide polymorphisms. Our datasets permit integrative exploration of complex regulatory dynamics in macaque brain tissue at single-cell resolution.


2020 ◽  
Author(s):  
Yipeng Gao ◽  
Lei Li ◽  
Christopher I. Amos ◽  
Wei Li

Alternative polyadenylation (APA) is a major mechanism of post-transcriptional regulation in various cellular processes including cell proliferation and differentiation, but the APA heterogeneity among single cells remains largely unknown. Single-cell RNA sequencing (scRNA-seq) has been extensively used to define cell subpopulations at the transcription level. Yet, most scRNA-seq data have not been analyzed in an “APA-aware” manner. Here, we introduce scDaPars, a bioinformatics algorithm to accurately quantify APA events at both single-cell and single-gene resolution using standard scRNA-seq data. Validations in both real and simulated data indicate that scDaPars can robustly recover missing APA events caused by the low amounts of mRNA sequenced in single cells. When applied to cancer and human endoderm differentiation data, scDaPars not only revealed cell-type-specific APA regulation but also identified cell subpopulations that are otherwise invisible to conventional gene expression analysis. Thus, scDaPars will enable us to understand cellular heterogeneity at the post-transcriptional APA level.


2020 ◽  
Author(s):  
Yasin Uzun ◽  
Hao Wu ◽  
Kai Tan

Despite rapid advances in single-cell DNA methylation profiling methods, computational tools for data analysis are lagging far behind. A number of tasks, including cell type calling and integration with transcriptome data, require the construction of a robust gene activity matrix, which is a prerequisite but a challenging task. The advent of multi-omics data enables measurement of both DNA methylation and gene expression for the same single cells. Although such data are rather sparse, they are sufficient to train supervised models that capture the complex relationship between DNA methylation and gene expression and predict gene activities at the single-cell level. Here, we present MAPLE (Methylome Association by Predictive Linkage to Expression), a computational framework that learns the association between DNA methylation and expression using both gene- and cell-dependent statistical features. Using multiple datasets generated with different experimental protocols, we show that using predicted gene activity values significantly improves several analysis tasks, including clustering, cell type identification and integration with transcriptome data. With the rapid accumulation of single-cell epigenomics data, MAPLE provides a general framework for integrating such data with transcriptome data.


2020 ◽  
Vol 36 (12) ◽  
pp. 3910-3912 ◽  
Author(s):  
Oscar Franzén ◽  
Johan L M Björkegren

Summary: Single-cell RNA sequencing (scRNA-seq) is a technology to measure gene expression in single cells. It has enabled discovery of new cell types and established cell type atlases of tissues and organs. The widespread adoption of scRNA-seq has created a need for user-friendly software for data analysis. We have developed a web server, alona, that incorporates several of the most popular single-cell analysis algorithms into a flexible pipeline. alona can perform quality filtering, normalization, batch correction, clustering, cell type annotation and differential gene expression analysis. Data are visualized in the web browser using an interface based on JavaScript, allowing the user to query genes of interest and visualize the cluster structure. alona accepts a compressed gene expression matrix and identifies cell clusters with a graph-based clustering strategy. Cell types are identified from a comprehensive collection of marker genes or by specifying a custom set of marker genes. Availability and implementation: The service runs at https://alona.panglaodb.se and the Python package can be downloaded from https://oscar-franzen.github.io/adobo/. Supplementary information: Supplementary data are available at Bioinformatics online.


2001 ◽  
Vol 183 (12) ◽  
pp. 3761-3769 ◽  
Author(s):  
Anja Strauß ◽  
Sonja Michel ◽  
Joachim Morschhäuser

The opportunistic fungal pathogen Candida albicans can switch spontaneously and reversibly between different cell forms, a capacity that may enhance adaptation to different host niches and evasion of host defense mechanisms. Phenotypic switching has been studied intensively for the white-opaque switching system of strain WO-1. To facilitate the molecular analysis of phenotypic switching, we have constructed homozygous ura3 mutants from strain WO-1 by targeted gene deletion. The two URA3 alleles were sequentially inactivated using the MPA(R)-flipping strategy, which is based on the selection of integrative transformants carrying a mycophenolic acid (MPA) resistance marker that is subsequently deleted again by site-specific, FLP-mediated recombination. To investigate a possible cell type-independent switching in the expression of individual phase-specific genes, two different reporter genes that allowed the analysis of gene expression at the single-cell level were integrated into the genome, using URA3 as a selection marker. Fluorescence microscopic analysis of cells in which a GFP reporter gene was placed under the control of phase-specific promoters demonstrated that the opaque-phase-specific SAP1 gene was detectably expressed only in opaque cells and that the white-phase-specific WH11 gene was detectably expressed only in white cells. When MPA(R) was used as a reporter gene, it conferred an MPA-resistant phenotype on opaque but not white cells in strains expressing it from the SAP1 promoter, which was monitored at the level of single cells by a significantly enlarged size of the corresponding colonies on MPA-containing indicator plates. Similarly, white but not opaque cells became MPA resistant when MPA(R) was placed under the control of the WH11 promoter. The analysis of these reporter strains showed that cell type-independent phase variation in the expression of the SAP1 and WH11 genes did not occur at a detectable frequency. The expression of these phase-specific genes of C. albicans in vitro, therefore, is tightly linked to the cell type.


BMC Biology ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Yang Yang ◽  
Anirban Paul ◽  
Thao Nguyen Bach ◽  
Z. Josh Huang ◽  
Michael Q. Zhang

Background: Alternative polyadenylation (APA) is emerging as an important mechanism in the post-transcriptional regulation of gene expression across eukaryotic species. Recent studies have shown that APA plays key roles in biological processes, such as cell proliferation and differentiation. Single-cell RNA-seq technologies are widely used in gene expression heterogeneity studies; however, systematic studies of APA at the single-cell level are still lacking. Results: Here, we describe a novel computational framework, SAPAS, that utilizes 3′-tag-based scRNA-seq data to identify novel poly(A) sites and quantify APA at the single-cell level. Applying SAPAS to the scRNA-seq data of phenotype-characterized GABAergic interneurons, we identified cell type-specific APA events for different GABAergic neuron types. Genes with cell type-specific APA events are enriched for synaptic architecture and communications. Furthermore, we observed a strong enrichment of heritability for several psychiatric disorders and brain traits in altered 3′ UTRs and coding sequences of cell type-specific APA events. Finally, by exploring the modalities of APA, we discovered that the bimodal APA pattern of Pak3 could classify chandelier cells into different subpopulations that are from different laminar positions. Conclusions: We established a method to characterize APA at the single-cell level. When applied to a scRNA-seq dataset of GABAergic interneurons, the single-cell APA analysis not only identified cell type-specific APA events but also revealed that the modality of APA could classify cell subpopulations. Thus, SAPAS will expand our understanding of cellular heterogeneity.


2021 ◽  
Vol 12 ◽  
Author(s):  
Sooyoun Oh ◽  
Haesun Park ◽  
Xiuwei Zhang

Advances in single cell transcriptomics have allowed us to study the identity of single cells. This has led to the discovery of new cell types and high resolution tissue maps of them. Technologies that measure multiple modalities of such data add more detail, but they also complicate data integration. We offer an integrated analysis of the spatial location and gene expression profiles of cells to determine their identity. We propose scHybridNMF (single-cell Hybrid Nonnegative Matrix Factorization), which performs cell type identification by combining sparse nonnegative matrix factorization (sparse NMF) with k-means clustering to cluster high-dimensional gene expression and low-dimensional location data. We show that, under multiple scenarios, including the cases where there is a small number of genes profiled and the location data is noisy, scHybridNMF outperforms sparse NMF, k-means, and an existing method that uses a hidden Markov random field to encode cell location and gene expression data for cell type identification.
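The idea of clustering cells on expression and location jointly can be illustrated with a much simpler stand-in than the method described above. The sketch below is a weighted k-means that scores each cell against a cluster by a weighted sum of its expression distance and its location distance. It is not scHybridNMF and contains no NMF step; the function name `hybrid_kmeans`, the `alpha` weight, and the choice of the first `k` cells as initial centroids are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def hybrid_kmeans(expr, loc, k, alpha=0.5, iters=20):
    """Cluster cells by a weighted sum of expression and location distances.

    expr: per-cell expression vectors; loc: per-cell spatial coordinates.
    alpha weights the expression term and (1 - alpha) the location term.
    Initial centroids are the first k cells (a deterministic simplification).
    """
    expr_cent = [list(expr[i]) for i in range(k)]
    loc_cent = [list(loc[i]) for i in range(k)]
    labels = [0] * len(expr)
    for _ in range(iters):
        # Assignment step: choose the cluster with the smallest combined cost.
        labels = [
            min(range(k),
                key=lambda c: alpha * sq_dist(e, expr_cent[c])
                + (1 - alpha) * sq_dist(p, loc_cent[c]))
            for e, p in zip(expr, loc)
        ]
        # Update step: recompute both centroids; keep the old ones if a cluster is empty.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:
                expr_cent[c] = mean([expr[i] for i in members])
                loc_cent[c] = mean([loc[i] for i in members])
    return labels


if __name__ == "__main__":
    expr = [[5, 0], [0, 5], [5, 1], [1, 5]]      # toy expression profiles
    loc = [[0, 0], [10, 10], [0, 1], [10, 9]]    # toy 2-D positions
    print(hybrid_kmeans(expr, loc, k=2))
```

Setting `alpha` closer to 1 makes the clustering rely mostly on expression, and closer to 0 mostly on location, which is one simple way to express the trade-off between the two modalities when location data is noisy.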

