Robust and annotation-free analysis of isoform variation using short-read scRNA-seq data

2021 ◽  
Author(s):  
Gonzalo Benegas ◽  
Jonathan Fischer ◽  
Yun S. Song

Abstract: Although isoform diversity is acknowledged as a fundamental and pervasive aspect of gene expression in higher eukaryotes, it is often omitted from single-cell studies due to quantification challenges inherent to commonly used short-read sequencing technologies. To address this issue, we have developed a suite of computational tools to investigate isoform variation by focusing on splice junction usage patterns, which can often be well characterized in spite of technical difficulties. Our method, which we name scQuint (single-cell quantification of introns), can perform accurate quantification, dimensionality reduction, and differential splicing analysis using short-read, full-length single-cell RNA-seq data. Notably, scQuint does not require transcriptome annotations and is robust to technical artifacts. In applications across diverse mouse tissues from Tabula Muris and the primary motor cortex from the BRAIN Initiative Cell Census Network, we find evidence of strong cell-type-specific isoform variation, complementary to total gene expression, and also identify a large volume of previously unannotated splice junctions. As a community resource, we provide ways to interactively visualize and explore these results, accessible at https://github.com/songlab-cal/scquint-analysis/.
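As a rough illustration of what "splice junction usage" can mean in practice, the sketch below normalizes per-cell junction read counts within groups of competing junctions. The function name, the grouping array, and the toy counts are hypothetical, and this is not presented as scQuint's actual implementation.

```python
import numpy as np

def junction_usage(counts, groups):
    """Per-cell usage proportion of each junction within its group.

    counts: (cells x junctions) read counts.
    groups: length-junctions array of group ids (assumed here to mark
            junctions that compete with one another; hypothetical input).
    Returns a (cells x junctions) array; entries are NaN where a cell has
    no reads in the junction's group, so usage is undefined there.
    """
    counts = np.asarray(counts, dtype=float)
    groups = np.asarray(groups)
    usage = np.full(counts.shape, np.nan)
    for g in np.unique(groups):
        cols = np.flatnonzero(groups == g)
        totals = counts[:, cols].sum(axis=1, keepdims=True)
        covered = totals[:, 0] > 0  # cells with at least one read in this group
        idx = np.ix_(covered, cols)
        usage[idx] = counts[idx] / totals[covered]
    return usage

# Toy data: 2 cells, 4 junctions in 2 groups (made-up numbers).
counts = [[8, 2, 0, 0],
          [0, 0, 3, 1]]
groups = [0, 0, 1, 1]
usage = junction_usage(counts, groups)
print(usage)
```

Leaving uncovered groups as NaN rather than zero keeps "no information" distinct from "junction not used", which matters for sparse single-cell data.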

2018 ◽  
Author(s):  
Kedar Nath Natarajan ◽  
Zhichao Miao ◽  
Miaomiao Jiang ◽  
Xiaoyun Huang ◽  
Hongpo Zhou ◽  
...  

Abstract: All single-cell RNA-seq protocols and technologies require library preparation prior to sequencing on a platform such as Illumina. Here, we present the first report to utilize the BGISEQ-500 platform for scRNA-seq, and compare its sensitivity and accuracy to those of Illumina sequencing. We generate a scRNA-seq resource of 468 unique single cells and 1,297 matched single cDNA samples, performing SMARTer and Smart-seq2 protocols on mESCs and K562 cells with RNA spike-ins. We sequence these libraries on both BGISEQ-500 and Illumina HiSeq platforms using single- and paired-end reads. The two platforms have comparable sensitivity and accuracy in terms of quantification of gene expression, and low technical variability. Our study provides a standardised scRNA-seq resource to benchmark new scRNA-seq library preparation protocols and sequencing platforms.


2020 ◽  
Author(s):  
Wei Vivian Li ◽  
Yanzeng Li

Abstract: A system-level understanding of the regulation and coordination mechanisms of gene expression is essential to understanding the complexity of biological processes in health and disease. With the rapid development of single-cell RNA sequencing technologies, it is now possible to investigate gene interactions in a cell-type-specific manner. Here we propose the scLink method, which uses statistical network modeling to understand the co-expression relationships among genes and to construct sparse gene co-expression networks from single-cell gene expression data. We use both simulation and real data studies to demonstrate the advantages of scLink and its ability to improve single-cell gene network analysis. The source code used in this article is available at https://github.com/Vivianstats/scLink.


2020 ◽  
Author(s):  
Ying Lei ◽  
Mengnan Cheng ◽  
Zihao Li ◽  
Zhenkun Zhuang ◽  
Liang Wu ◽  
...  

Non-human primates (NHP) provide a unique opportunity to study human neurological diseases, yet detailed characterization of the cell types and transcriptional regulatory features in the NHP brain is lacking. We applied a combinatorial indexing assay, sci-ATAC-seq, as well as single-nucleus RNA-seq, to profile chromatin accessibility in 43,793 single cells and transcriptomics in 11,477 cells, respectively, from the prefrontal cortex, primary motor cortex and primary visual cortex of the adult cynomolgus monkey Macaca fascicularis. Integrative analysis of these two datasets resolved regulatory elements and transcription factors that specify cell type distinctions, and discovered area-specific diversity in chromatin accessibility and gene expression within excitatory neurons. We also constructed the dynamic landscape of chromatin accessibility and gene expression of oligodendrocyte maturation to characterize adult remyelination. Furthermore, we identified cell type-specific enrichment of differentially spliced gene isoforms and disease-associated single nucleotide polymorphisms. Our datasets permit integrative exploration of complex regulatory dynamics in macaque brain tissue at single-cell resolution.


Author(s):  
Zizhen Yao ◽  
Hanqing Liu ◽  
Fangming Xie ◽  
Stephan Fischer ◽  
A. Sina Booeshaghi ◽  
...  

Abstract: Single-cell transcriptomics has transformed the characterization of brain cell identity by providing quantitative molecular signatures for large, unbiased samples of brain cell populations. With the proliferation of taxonomies based on individual datasets, a major challenge is to integrate and validate results toward defining biologically meaningful cell types. We used a battery of single-cell transcriptome and epigenome measurements generated by the BRAIN Initiative Cell Census Network (BICCN) to comprehensively assess the molecular signatures of cell types in the mouse primary motor cortex (MOp). We further developed computational and statistical methods to integrate these multimodal data and quantitatively validate the reproducibility of the cell types. The reference atlas, based on more than 600,000 high-quality single-cell or -nucleus samples assayed by six molecular modalities, is a comprehensive molecular account of the diverse neuronal and non-neuronal cell types in MOp. Collectively, our study indicates that the mouse primary motor cortex contains over 55 neuronal cell types that are highly replicable across analysis methods, sequencing technologies, and modalities. We find many concordant multimodal markers for each cell type, as well as thousands of genes and gene regulatory elements with discrepant transcriptomic and epigenomic signatures. These data highlight the complex molecular regulation of brain cell types and will directly enable design of reagents to target specific MOp cell types for functional analysis.


2021 ◽  
Author(s):  
Boying Gong ◽  
Yun Zhou ◽  
Elizabeth Purdom

Abstract: Single-cell measurements of different cellular features or modalities from cells from the same system allow for a comprehensive understanding of a biological process. While the most common single-cell sequencing technologies require separate input cells for different modalities, there are a growing number of platforms that allow for measuring several modalities on a single cell. We present a novel method, Cobolt, for analyzing such multi-modality single-cell sequencing datasets. Cobolt jointly models the multiple modalities via a novel application of Multimodal Variational Autoencoder (MVAE) to a hierarchical generative model. We first demonstrate its performance on data from the multi-modality platform SNARE-seq, consisting of measurements of gene expression and chromatin accessibility on the same cells. We then illustrate the ability of Cobolt to integrate multi-modality platforms with single-modality platforms by jointly analyzing a SNARE-seq dataset, a single-cell gene expression dataset, and a single-cell chromatin accessibility dataset. We compare Cobolt with current options for analyzing such datasets and show that Cobolt provides robust and flexible results for integration of single-cell data on multiple modalities.


2018 ◽  
Author(s):  
Mandeep Singh ◽  
Ghamdan Al-Eryani ◽  
Shaun Carswell ◽  
James M. Ferguson ◽  
James Blackburn ◽  
...  

Abstract: High-throughput single-cell RNA sequencing is a powerful technique for gene expression profiling of complex and heterogeneous cellular populations such as the immune system. However, these methods only provide short-read sequence from one end of a cDNA template, making them poorly suited to the investigation of gene-regulatory events such as mRNA splicing, adaptive immune responses or somatic genome evolution. To address this challenge, we have developed a method that combines targeted long-read sequencing with short-read-based transcriptome profiling of barcoded single-cell libraries generated by droplet-based partitioning. We use Repertoire And Gene Expression sequencing (RAGE-seq) to accurately characterize full-length T cell receptor (TCR) and B cell receptor (BCR) sequences and transcriptional profiles of more than 7,138 lymphocytes sampled from the primary tumour and draining lymph node of a breast cancer patient. With this method we show that somatic mutation, alternate splicing and clonal evolution of T and B lymphocytes can be tracked across these tissue compartments. Our results demonstrate that RAGE-seq is an accessible and cost-effective method for high-throughput deep single-cell profiling, applicable to a wide range of biological challenges.


2020 ◽  
Author(s):  
A. Sina Booeshaghi ◽  
Zizhen Yao ◽  
Cindy van Velthoven ◽  
Kimberly Smith ◽  
Bosiljka Tasic ◽  
...  

Full-length SMART-Seq single-cell RNA-seq can be used to measure gene expression at isoform resolution, making possible the identification of isoform markers for cell types and for an isoform atlas. In a comprehensive analysis of 6,160 mouse primary motor cortex cells assayed with SMART-Seq, we find numerous examples of isoform specificity in cell types, including isoform shifts between cell types that are masked in gene-level analysis. These findings can be used to refine spatial gene expression information to isoform resolution. Our results highlight the utility of full-length single-cell RNA-seq when used in conjunction with other single-cell RNA-seq technologies.


Author(s):  
Meng Zhang ◽  
Stephen W. Eichhorn ◽  
Brian Zingg ◽  
Zizhen Yao ◽  
Hongkui Zeng ◽  
...  

Abstract: A mammalian brain is composed of numerous cell types organized in an intricate manner to form functional neural circuits. Single-cell RNA sequencing provides a powerful approach to identify cell types based on their gene expression profiles and has revealed many distinct cell populations in the brain [1-3]. Single-cell epigenomic profiling [4,5] further provides information on gene-regulatory signatures of different cell types. Understanding how different cell types contribute to brain function, however, requires knowledge of their spatial organization and connectivity, which is not preserved in sequencing-based methods that involve cell dissociation [3,6]. Here, we used an in situ single-cell transcriptome-imaging method, multiplexed error-robust fluorescence in situ hybridization (MERFISH) [7], to generate a molecularly defined and spatially resolved cell atlas of the mouse primary motor cortex (MOp). We profiled ∼300,000 cells in the MOp, identified 95 neuronal and non-neuronal cell clusters, and revealed a complex spatial map in which not only excitatory neuronal clusters but also most inhibitory neuronal clusters adopted layered organizations. Notably, intratelencephalic (IT) cells, the largest branch of neurons in the MOp, formed a continuous spectrum of cells with gradual changes in both gene expression profiles and cortical depth positions in a highly correlated manner. Furthermore, we integrated MERFISH with retrograde tracing to probe the projection targets for different MOp neuronal cell types and found that projections of MOp neurons to other cortical regions formed a many-to-many network with each target region receiving input preferentially from a different composition of IT clusters. Overall, our results provide a high-resolution spatial and projection map of molecularly defined cell types in the MOp. We anticipate that the imaging platform described here can be broadly applied to create high-resolution cell atlases of a wide range of systems.


Nature ◽  
2021 ◽  
Vol 598 (7879) ◽  
pp. 137-143 ◽  
Author(s):  
Meng Zhang ◽  
Stephen W. Eichhorn ◽  
Brian Zingg ◽  
Zizhen Yao ◽  
Kaelan Cotter ◽  
...  

Abstract: A mammalian brain is composed of numerous cell types organized in an intricate manner to form functional neural circuits. Single-cell RNA sequencing allows systematic identification of cell types based on their gene expression profiles and has revealed many distinct cell populations in the brain [1,2]. Single-cell epigenomic profiling [3,4] further provides information on gene-regulatory signatures of different cell types. Understanding how different cell types contribute to brain function, however, requires knowledge of their spatial organization and connectivity, which is not preserved in sequencing-based methods that involve cell dissociation. Here we used a single-cell transcriptome-imaging method, multiplexed error-robust fluorescence in situ hybridization (MERFISH) [5], to generate a molecularly defined and spatially resolved cell atlas of the mouse primary motor cortex. We profiled approximately 300,000 cells in the mouse primary motor cortex and its adjacent areas, identified 95 neuronal and non-neuronal cell clusters, and revealed a complex spatial map in which not only excitatory but also most inhibitory neuronal clusters adopted laminar organizations. Intratelencephalic neurons formed a largely continuous gradient along the cortical depth axis, in which the gene expression of individual cells correlated with their cortical depths. Furthermore, we integrated MERFISH with retrograde labelling to probe projection targets of neurons of the mouse primary motor cortex and found that their cortical projections formed a complex network in which individual neuronal clusters project to multiple target regions and individual target regions receive inputs from multiple neuronal clusters.


2021 ◽  
Author(s):  
Dongze He ◽  
Mohsen Zakeri ◽  
Hirak Sarkar ◽  
Charlotte Soneson ◽  
Avi Srivastava ◽  
...  

The rapid growth of high-throughput single-cell and single-nucleus RNA sequencing technologies has produced a wealth of data over the past few years. The available technologies continue to evolve and experiments continue to increase in both number and scale. The size, volume, and distinctive characteristics of these data necessitate the development of new software and associated computational methods to accurately and efficiently quantify single-cell and single-nucleus RNA-seq data into count matrices that constitute the input to downstream analyses. We introduce the alevin-fry framework for quantifying single-cell and single-nucleus RNA-seq data. Despite being faster and more memory-frugal than other accurate and scalable quantification approaches, alevin-fry does not suffer from the false-positive expression or memory scalability issues that are exhibited by other lightweight tools. We demonstrate how alevin-fry can be effectively used to quantify single-cell and single-nucleus RNA-seq data, and also how the spliced and unspliced molecule quantification required as input for RNA velocity analyses can be seamlessly extracted from the same preprocessed data used to generate regular gene expression count matrices.

