scholarly journals Cell-type-specific analysis of alternative polyadenylation using single-cell transcriptomics data

2019 ◽  
Vol 47 (19) ◽  
pp. 10027-10039 ◽  
Author(s):  
Eldad David Shulman ◽  
Ran Elkon

AbstractAlternative polyadenylation (APA) is emerging as an important layer of gene regulation because the majority of mammalian protein-coding genes contain multiple polyadenylation (pA) sites in their 3′ UTR. By alteration of 3′ UTR length, APA can considerably affect post-transcriptional gene regulation. Yet, our understanding of APA remains rudimentary. Novel single-cell RNA sequencing (scRNA-seq) techniques allow molecular characterization of different cell types to an unprecedented degree. Notably, the most popular scRNA-seq protocols specifically sequence the 3′ end of transcripts. Building on this property, we implemented a method for analysing patterns of APA regulation from such data. Analyzing multiple datasets from diverse tissues, we identified widespread modulation of APA in different cell types resulting in global 3′ UTR shortening/lengthening and enhanced cleavage at intronic pA sites. Our results provide a proof-of-concept demonstration that the huge volume of scRNA-seq data that accumulates in the public domain offers a unique resource for the exploration of APA based on a very broad collection of cell types and biological conditions.

2020 ◽  
Vol 12 (1) ◽  
Author(s):  
Wei Lin ◽  
Pawan Noel ◽  
Erkut H. Borazanci ◽  
Jeeyun Lee ◽  
Albert Amini ◽  
...  

Abstract Background Solid tumors such as pancreatic ductal adenocarcinoma (PDAC) comprise not just tumor cells but also a microenvironment with which the tumor cells constantly interact. Detailed characterization of the cellular composition of the tumor microenvironment is critical to the understanding of the disease and treatment of the patient. Single-cell transcriptomics has been used to study the cellular composition of different solid tumor types including PDAC. However, almost all of those studies used primary tumor tissues. Methods In this study, we employed a single-cell RNA sequencing technology to profile the transcriptomes of individual cells from dissociated primary tumors or metastatic biopsies obtained from patients with PDAC. Unsupervised clustering analysis as well as a new supervised classification algorithm, SuperCT, was used to identify the different cell types within the tumor tissues. The expression signatures of the different cell types were then compared between primary tumors and metastatic biopsies. The expressions of the cell type-specific signature genes were also correlated with patient survival using public datasets. Results Our single-cell RNA sequencing analysis revealed distinct cell types in primary and metastatic PDAC tissues including tumor cells, endothelial cells, cancer-associated fibroblasts (CAFs), and immune cells. The cancer cells showed high inter-patient heterogeneity, whereas the stromal cells were more homogenous across patients. Immune infiltration varies significantly from patient to patient with majority of the immune cells being macrophages and exhausted lymphocytes. We found that the tumor cellular composition was an important factor in defining the PDAC subtypes. Furthermore, the expression levels of cell type-specific markers for EMT+ cancer cells, activated CAFs, and endothelial cells significantly associated with patient survival. Conclusions Taken together, our work identifies significant heterogeneity in cellular compositions of PDAC tumors and between primary tumors and metastatic lesions. Furthermore, the cellular composition was an important factor in defining PDAC subtypes and significantly correlated with patient outcome. These findings provide valuable insights on the PDAC microenvironment and could potentially inform the management of PDAC patients.


2021 ◽  
Author(s):  
Sheng Zhu ◽  
Qiwei Lian ◽  
Wenbin Ye ◽  
Wei Qin ◽  
Zhe Wu ◽  
...  

Abstract Alternative polyadenylation (APA) is a widespread regulatory mechanism of transcript diversification in eukaryotes, which is increasingly recognized as an important layer for eukaryotic gene expression. Recent studies based on single-cell RNA-seq (scRNA-seq) have revealed cell-to-cell heterogeneity in APA usage and APA dynamics across different cell types in various tissues, biological processes and diseases. However, currently available APA databases were all collected from bulk 3′-seq and/or RNA-seq data, and no existing database has provided APA information at single-cell resolution. Here, we present a user-friendly database called scAPAdb (http://www.bmibig.cn/scAPAdb), which provides a comprehensive and manually curated atlas of poly(A) sites, APA events and poly(A) signals at the single-cell level. Currently, scAPAdb collects APA information from > 360 scRNA-seq experiments, covering six species including human, mouse and several other plant species. scAPAdb also provides batch download of data, and users can query the database through a variety of keywords such as gene identifier, gene function and accession number. scAPAdb would be a valuable and extendable resource for the study of cell-to-cell heterogeneity in APA isoform usages and APA-mediated gene regulation at the single-cell level under diverse cell types, tissues and species.


2020 ◽  
Author(s):  
Jiaxin Fan ◽  
Xuran Wang ◽  
Rui Xiao ◽  
Mingyao Li

AbstractAllelic expression imbalance (AEI), quantified by the relative expression of two alleles of a gene in a diploid organism, can help explain phenotypic variations among individuals. Traditional methods detect AEI using bulk RNA sequencing (RNA-seq) data, a data type that averages out cell-to-cell heterogeneity in gene expression across cell types. Since the patterns of AEI may vary across different cell types, it is desirable to study AEI in a cell-type-specific manner. Although this can be achieved by single-cell RNA sequencing (scRNA-seq), it requires full-length transcript to be sequenced in single cells of a large number of individuals, which are still cost prohibitive to generate. To overcome this limitation and utilize the vast amount of existing disease relevant bulk tissue RNA-seq data, we developed BSCET, which enables the characterization of cell-type-specific AEI in bulk RNA-seq data by integrating cell type composition information inferred from a small set of scRNA-seq samples, possibly obtained from an external dataset. By modeling covariate effect, BSCET can also detect genes whose cell-type-specific AEI are associated with clinical factors. Through extensive benchmark evaluations, we show that BSCET correctly detected genes with cell-type-specific AEI and differential AEI between healthy and diseased samples using bulk RNA-seq data. BSCET also uncovered cell-type-specific AEIs that were missed in bulk data analysis when the directions of AEI are opposite in different cell types. We further applied BSCET to two pancreatic islet bulk RNA-seq datasets, and detected genes showing cell-type-specific AEI that are related to the progression of type 2 diabetes. Since bulk RNA-seq data are easily accessible, BSCET provided a convenient tool to integrate information from scRNA-seq data to gain insight on AEI with cell type resolution. Results from such analysis will advance our understanding of cell type contributions in human diseases.Author SummaryDetection of allelic expression imbalance (AEI), a phenomenon where the two alleles of a gene differ in their expression magnitude, is a key step towards the understanding of phenotypic variations among individuals. Existing methods detect AEI use bulk RNA sequencing (RNA-seq) data and ignore AEI variations among different cell types. Although single-cell RNA sequencing (scRNA-seq) has enabled the characterization of cell-to-cell heterogeneity in gene expression, the high costs have limited its application in AEI analysis. To overcome this limitation, we developed BSCET to characterize cell-type-specific AEI using the widely available bulk RNA-seq data by integrating cell-type composition information inferred from scRNA-seq samples. Since the degree of AEI may vary with disease phenotypes, we further extended BSCET to detect genes whose cell-type-specific AEIs are associated with clinical factors. Through extensive benchmark evaluations and analyses of two pancreatic islet bulk RNA-seq datasets, we demonstrated BSCET’s ability to refine bulk-level AEI to cell-type resolution, and to identify genes whose cell-type-specific AEIs are associated with the progression of type 2 diabetes. With the vast amount of easily accessible bulk RNA-seq data, we believe BSCET will be a valuable tool for elucidating cell type contributions in human diseases.


PLoS Genetics ◽  
2021 ◽  
Vol 17 (3) ◽  
pp. e1009080
Author(s):  
Jiaxin Fan ◽  
Xuran Wang ◽  
Rui Xiao ◽  
Mingyao Li

Allelic expression imbalance (AEI), quantified by the relative expression of two alleles of a gene in a diploid organism, can help explain phenotypic variations among individuals. Traditional methods detect AEI using bulk RNA sequencing (RNA-seq) data, a data type that averages out cell-to-cell heterogeneity in gene expression across cell types. Since the patterns of AEI may vary across different cell types, it is desirable to study AEI in a cell-type-specific manner. Although this can be achieved by single-cell RNA sequencing (scRNA-seq), it requires full-length transcript to be sequenced in single cells of a large number of individuals, which are still cost prohibitive to generate. To overcome this limitation and utilize the vast amount of existing disease relevant bulk tissue RNA-seq data, we developed BSCET, which enables the characterization of cell-type-specific AEI in bulk RNA-seq data by integrating cell type composition information inferred from a small set of scRNA-seq samples, possibly obtained from an external dataset. By modeling covariate effect, BSCET can also detect genes whose cell-type-specific AEI are associated with clinical factors. Through extensive benchmark evaluations, we show that BSCET correctly detected genes with cell-type-specific AEI and differential AEI between healthy and diseased samples using bulk RNA-seq data. BSCET also uncovered cell-type-specific AEIs that were missed in bulk data analysis when the directions of AEI are opposite in different cell types. We further applied BSCET to two pancreatic islet bulk RNA-seq datasets, and detected genes showing cell-type-specific AEI that are related to the progression of type 2 diabetes. Since bulk RNA-seq data are easily accessible, BSCET provided a convenient tool to integrate information from scRNA-seq data to gain insight on AEI with cell type resolution. Results from such analysis will advance our understanding of cell type contributions in human diseases.


2021 ◽  
Vol 15 ◽  
Author(s):  
Mingchao Li ◽  
Qing Min ◽  
Matthew C. Banton ◽  
Xinpeng Dun

Advances in single-cell RNA sequencing technologies and bioinformatics methods allow for both the identification of cell types in a complex tissue and the large-scale gene expression profiling of various cell types in a mixture. In this report, we analyzed a single-cell RNA sequencing (scRNA-seq) dataset for the intact adult mouse sciatic nerve and examined cell-type specific transcription factor expression and activity during peripheral nerve homeostasis. In total, we identified 238 transcription factors expressed in nine different cell types of intact mouse sciatic nerve. Vascular smooth muscle cells have the lowest number of transcription factors expressed with 17 transcription factors identified. Myelinating Schwann cells (mSCs) have the highest number of transcription factors expressed, with 61 transcription factors identified. We created a cell-type specific expression map for the identified 238 transcription factors. Our results not only provide valuable information about the expression pattern of transcription factors in different cell types of adult peripheral nerves but also facilitate future studies to understand the function of key transcription factors in the peripheral nerve homeostasis and disease.


2020 ◽  
Vol 49 (D1) ◽  
pp. D1413-D1419 ◽  
Author(s):  
Tianyi Zhao ◽  
Shuxuan Lyu ◽  
Guilin Lu ◽  
Liran Juan ◽  
Xi Zeng ◽  
...  

Abstract SC2disease (http://easybioai.com/sc2disease/) is a manually curated database that aims to provide a comprehensive and accurate resource of gene expression profiles in various cell types for different diseases. With the development of single-cell RNA sequencing (scRNA-seq) technologies, uncovering cellular heterogeneity of different tissues for different diseases has become feasible by profiling transcriptomes across cell types at the cellular level. In particular, comparing gene expression profiles between different cell types and identifying cell-type-specific genes in various diseases offers new possibilities to address biological and medical questions. However, systematic, hierarchical and vast databases of gene expression profiles in human diseases at the cellular level are lacking. Thus, we reviewed the literature prior to March 2020 for studies which used scRNA-seq to study diseases with human samples, and developed the SC2disease database to summarize all the data by different diseases, tissues and cell types. SC2disease documents 946 481 entries, corresponding to 341 cell types, 29 tissues and 25 diseases. Each entry in the SC2disease database contains comparisons of differentially expressed genes between different cell types, tissues and disease-related health status. Furthermore, we reanalyzed gene expression matrix by unified pipeline to improve the comparability between different studies. For each disease, we also compare cell-type-specific genes with the corresponding genes of lead single nucleotide polymorphisms (SNPs) identified in genome-wide association studies (GWAS) to implicate cell type specificity of the traits.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Yan Sun ◽  
Qichao Yu ◽  
Lei Li ◽  
Zhanlong Mei ◽  
Biaofeng Zhou ◽  
...  

Abstract Recent studies show that non-coding RNAs (ncRNAs) can regulate the expression of protein-coding genes and play important roles in mammalian development. Previous studies have revealed that during C. elegans (Caenorhabditis elegans) embryo development, numerous genes in each cell are spatiotemporally regulated, causing the cell to differentiate into distinct cell types and tissues. We ask whether ncRNAs participate in the spatiotemporal regulation of genes in different types of cells and tissues during the embryogenesis of C. elegans. Here, by using marker-free full-length high-depth single-cell RNA sequencing (scRNA-seq) technique, we sequence the whole transcriptomes from 1031 embryonic cells of C. elegans and detect 20,431 protein-coding genes, including 22 cell-type-specific protein-coding markers, and 9843 ncRNAs including 11 cell-type-specific ncRNA markers. We induce a ncRNAs-based clustering strategy as a complementary strategy to the protein-coding gene-based clustering strategy for single-cell classification. We identify 94 ncRNAs that have never been reported to regulate gene expressions, are co-expressed with 1208 protein-coding genes in cell type specific and/or embryo time specific manners. Our findings suggest that these ncRNAs could potentially influence the spatiotemporal expression of the corresponding genes during the embryogenesis of C. elegans.


2020 ◽  
Vol 16 (10) ◽  
pp. e1007939
Author(s):  
Amir Alavi ◽  
Ziv Bar-Joseph

Several studies profile similar single cell RNA-Seq (scRNA-Seq) data using different technologies and platforms. A number of alignment methods have been developed to enable the integration and comparison of scRNA-Seq data from such studies. While each performs well on some of the datasets, to date no method was able to both perform the alignment using the original expression space and generalize to new data. To enable such analysis we developed Single Cell Iterative Point set Registration (SCIPR) which extends methods that were successfully applied to align image data to scRNA-Seq. We discuss the required changes needed, the resulting optimization function, and algorithms for learning a transformation function for aligning data. We tested SCIPR on several scRNA-Seq datasets. As we show it successfully aligns data from several different cell types, improving upon prior methods proposed for this task. In addition, we show the parameters learned by SCIPR can be used to align data not used in the training and to identify key cell type-specific genes.


Author(s):  
Maria Brbić ◽  
Marinka Zitnik ◽  
Sheng Wang ◽  
Angela O. Pisco ◽  
Russ B. Altman ◽  
...  

Although tremendous effort has been put into cell type annotation and classification, identification of previously uncharacterized cell types in heterogeneous single-cell RNA-seq data remains a challenge. Here we present MARS, a meta-learning approach for identifying and annotating known as well as novel cell types. MARS overcomes the heterogeneity of cell types by transferring latent cell representations across multiple datasets. MARS uses deep learning to learn a cell embedding function as well as a set of landmarks in the cell embedding space. The method annotates cells by probabilistically defining a cell type based on nearest landmarks in the embedding space. MARS has a unique ability to discover cell types that have never been seen before and annotate experiments that are yet unannotated. We apply MARS to a large aging cell atlas of 23 tissues covering the life span of a mouse. MARS accurately identifies cell types, even when it has never seen them before. Further, the method automatically generates interpretable names for novel cell types. Remarkably, MARS estimates meaningful cell-type-specific signatures of aging and visualizes them as trajectories reflecting temporal relationships of cells in a tissue.


2018 ◽  
Author(s):  
Joshua Welch ◽  
Velina Kozareva ◽  
Ashley Ferreira ◽  
Charles Vanderburg ◽  
Carly Martin ◽  
...  

SummaryDefining cell types requires integrating diverse measurements from multiple experiments and biological contexts. Recent technological developments in single-cell analysis have enabled high-throughput profiling of gene expression, epigenetic regulation, and spatial relationships amongst cells in complex tissues, but computational approaches that deliver a sensitive and specific joint analysis of these datasets are lacking. We developed LIGER, an algorithm that delineates shared and dataset-specific features of cell identity, allowing flexible modeling of highly heterogeneous single-cell datasets. We demonstrated its broad utility by applying it to four diverse and challenging analyses of human and mouse brain cells. First, we defined both cell-type-specific and sexually dimorphic gene expression in the mouse bed nucleus of the stria terminalis, an anatomically complex brain region that plays important roles in sex-specific behaviors. Second, we analyzed gene expression in the substantia nigra of seven postmortem human subjects, comparing cell states in specific donors, and relating cell types to those in the mouse. Third, we jointly leveraged in situ gene expression and scRNA-seq data to spatially locate fine subtypes of cells present in the mouse frontal cortex. Finally, we integrated mouse cortical scRNA-seq profiles with single-cell DNA methylation signatures, revealing mechanisms of cell-type-specific gene regulation. Integrative analyses using the LIGER algorithm promise to accelerate single-cell investigations of cell-type definition, gene regulation, and disease states.


Sign in / Sign up

Export Citation Format

Share Document