Nanopore sequencing of single-cell transcriptomes with scCOLOR-seq

Abstract Here we describe single-cell corrected long-read sequencing (scCOLOR-seq), which enables error correction of barcode and unique molecular identifier oligonucleotide sequences and permits standalone cDNA nanopore sequencing of single cells. Barcodes and unique molecular identifiers are synthesized using dimeric nucleotide building blocks that allow error detection. We illustrate the use of the method for evaluating barcode assignment accuracy, differential isoform usage in myeloma cell lines, and fusion transcript detection in a sarcoma cell line.
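The abstract states only that dimeric building blocks allow error detection. As a minimal sketch of how that could work, the Python below assumes each barcode position is synthesized as a pair of identical bases, so a pair whose two bases disagree marks a detected error. The function name `collapse_dimer_barcode` and the use of `N` for flagged positions are illustrative assumptions, not the authors' implementation.

```python
def collapse_dimer_barcode(read_barcode: str):
    """Collapse a barcode made of doubled bases, e.g. 'AATTCC' -> 'ATC'.

    Assumption (not from the source): every block is two identical bases.
    Returns the collapsed sequence and the indices of blocks whose two
    bases disagree; those blocks are written as 'N' in the output.
    """
    if len(read_barcode) % 2 != 0:
        raise ValueError("dimer-encoded barcode must have even length")

    collapsed = []
    error_blocks = []
    for block_index in range(len(read_barcode) // 2):
        first = read_barcode[2 * block_index]
        second = read_barcode[2 * block_index + 1]
        if first == second:
            collapsed.append(first)          # block is internally consistent
        else:
            collapsed.append("N")            # mismatch inside a block: error detected
            error_blocks.append(block_index)
    return "".join(collapsed), error_blocks


clean = collapse_dimer_barcode("AATTCCGG")
damaged = collapse_dimer_barcode("AATACCGG")
```

Under this assumption a single substitution is always detected, because it breaks the identity within one block; flagged positions could then be resolved against a barcode whitelist.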


Single-cell RNA cap and tail sequencing (scRCAT-seq) reveals subtype-specific isoforms differing in transcript demarcation

Nature Communications ◽

10.1038/s41467-020-18976-7 ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Youjin Hu ◽

Jiawei Zhong ◽

Yuhua Xiao ◽

Zheng Xing ◽

Katherine Sheu ◽

...

Keyword(s):

Single Cell ◽

Single Cells ◽

Single Gene ◽

Cell Types ◽

Machine Learning Algorithms ◽

Translation Efficiency ◽

Transcription Start Sites ◽

Long Read ◽

Mrna Gene ◽

Gene Isoforms

Abstract The differences in transcription start sites (TSS) and transcription end sites (TES) among gene isoforms can affect the stability, localization, and translation efficiency of mRNA. Gene isoforms allow a single gene diverse functions across different cell types, and isoform dynamics allow different functions over time. However, methods to efficiently identify and quantify RNA isoforms genome-wide in single cells are still lacking. Here, we introduce single cell RNA Cap And Tail sequencing (scRCAT-seq), a method to demarcate the boundaries of isoforms based on short-read sequencing, with higher efficiency and lower cost than existing long-read sequencing methods. In conjunction with machine learning algorithms, scRCAT-seq demarcates RNA transcripts with unprecedented accuracy. We identified hundreds of previously uncharacterized transcripts and thousands of alternative transcripts for known genes, revealed cell-type specific isoforms for various cell types across different species, and generated a cell atlas of isoform dynamics during the development of retinal cones.


UMI-linked nanopore consensus sequencing (UMIC-seq) of highly similar gene variants

10.21203/rs.3.pex-1177/v1 ◽

2020 ◽

Author(s):

Paul Jannis Zurek ◽

Philipp Knyphausen ◽

Katharina Neufeld ◽

Ahir Pushpanath ◽

Florian Hollfelder

Keyword(s):

Protein Evolution ◽

Cost Effective ◽

Gene Variants ◽

Nanopore Sequencing ◽

Consensus Sequences ◽

Long Read ◽

Unique Molecular Identifier ◽

Similar Gene ◽

Molecular Barcodes

Abstract Here we present a straightforward unique molecular identifier (UMI)-linked nanopore consensus sequencing workflow (UMIC-seq), resulting in cost-effective and accurate long-read sequencing of amplicons. Short random molecular barcodes (i.e. unique molecular identifiers, UMIs) are attached to a pool of gene variants prior to nanopore sequencing to enable reliable clustering and the generation of accurate consensus sequences, even when starting from highly similar gene variants (e.g. a library of point mutants in directed protein evolution) that could not be reliably distinguished in the ordinary nanopore sequencing output.
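The clustering and consensus step described above can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes reads are grouped by exact UMI match and are already aligned to the same coordinates, whereas the actual UMIC-seq workflow has to tolerate errors within the UMIs themselves. The function name `umi_consensus` is hypothetical.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Build a per-UMI consensus by majority vote at each position.

    reads: iterable of (umi, sequence) pairs. Simplifying assumptions
    (not from the source): UMIs match exactly and sequences sharing a
    UMI are pre-aligned, so position i means the same base in each read.
    """
    groups = defaultdict(list)
    for umi, sequence in reads:
        groups[umi].append(sequence)

    consensus = {}
    for umi, sequences in groups.items():
        # Only vote over positions covered by every read in the group.
        length = min(len(s) for s in sequences)
        consensus[umi] = "".join(
            Counter(s[i] for s in sequences).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


example_reads = [
    ("AAC", "ACGT"),
    ("AAC", "ACGA"),  # one read carries an error at the last base
    ("AAC", "ACGT"),
    ("GGT", "TTTT"),
]
result = umi_consensus(example_reads)
```

Because each UMI tags a single original molecule, random read errors are outvoted while a true point mutation, present in every read of the group, survives into the consensus.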


Fusion transcript detection using spatial transcriptomics

10.21203/rs.2.19314/v2 ◽

2020 ◽

Author(s):

Stefanie Friedrich ◽

Erik LL Sonnhammer

Keyword(s):

Single Cell ◽

Spatial Information ◽

Cancer Cell Line ◽

Fusion Transcript ◽

Single Cell Level ◽

Cell Level ◽

Cancer Data ◽

Fusion Transcripts ◽

Tissue Sections ◽

Transcript Detection

Abstract Background: Fusion transcripts are involved in tumourigenesis and play a crucial role in tumour heterogeneity, tumour evolution and cancer treatment resistance. However, fusion transcripts have not been studied at high spatial resolution in tissue sections due to the lack of full-length transcripts with spatial information. New high-throughput technologies like spatial transcriptomics measure the transcriptome of tissue sections at almost single-cell level. While this technique does not allow for direct detection of fusion transcripts, we show that they can be inferred using the relative poly(A) tail abundance of the involved parental genes. Method: We present a new method, STfusion, which uses spatial transcriptomics to infer the presence and absence of poly(A) tails. A fusion transcript lacks a poly(A) tail for the 5' gene and has an elevated number of poly(A) tails for the 3' gene. Its expression level is defined by the upstream promoter of the 5' gene. STfusion measures the difference between the observed and expected number of poly(A) tails with a novel C-score. Results: We verified the ability of STfusion to predict fusion transcripts on HeLa cells with known fusions. STfusion and C-score applied to clinical prostate cancer data revealed the spatial distribution of the cis-SAGe SLC45A3-ELK4 in 12 tissue sections with almost single-cell resolution. The cis-SAGe occurred in disease areas, e.g. inflamed, prostatic intraepithelial neoplastic, or cancerous areas, and occasionally in normal glands. Conclusions: STfusion detects fusion transcripts in cancer cell line and clinical tissue data, and distinguishes chimeric transcripts from chimeras caused by trans-splicing events. With STfusion and the use of C-scores, fusion transcripts can be spatially localised in clinical tissue sections at almost single-cell level. Keywords: Fusion transcript detection, Spatial Transcriptomics, gene fusion, cis-SAGe, oncogene


Comprehensive characterization of single cell full-length isoforms in human and mouse with long-read sequencing

10.1101/2020.08.10.243543 ◽

2020 ◽

Author(s):

Luyi Tian ◽

Jafar S. Jabbari ◽

Rachel Thijssen ◽

Quentin Gouil ◽

Shanika L. Amarasinghe ◽

...

Keyword(s):

Data Integration ◽

Single Cell ◽

Ribosome Biogenesis ◽

Single Cells ◽

Transcript Level ◽

Full Length ◽

Alternative Transcript ◽

Long Read ◽

Comprehensive Characterization

Abstract Alternative splicing shapes the phenotype of cells in development and disease. Long-read RNA-sequencing recovers full-length transcripts but has limited throughput at the single-cell level. Here we developed single-cell full-length transcript sequencing by sampling (FLT-seq), together with the computational pipeline FLAMES, to overcome these issues and perform isoform discovery and quantification, splicing analysis and mutation detection in single cells. With FLT-seq and FLAMES, we performed the first comprehensive characterization of the full-length isoform landscape in single cells of different types and species and identified thousands of unannotated isoforms. We found conserved functional modules that were enriched for alternative transcript usage in different cell populations, including ribosome biogenesis and mRNA splicing. Analysis at the transcript level allowed data integration with scATAC-seq on individual promoters, improved correlation with protein expression data and linked mutations known to confer drug resistance to transcriptome heterogeneity. Our methods reveal previously unseen isoform complexity and provide a better framework for multi-omics data integration.


ProtAnno, an Automated Cell Type Annotation Tool for Single Cell Proteomics Data that Integrates Information from Multiple Reference Sources

10.1101/2021.09.13.460162 ◽

2021 ◽

Author(s):

Wenxuan Deng ◽

Biqing Zhu ◽

Seyoung Park ◽

Tomokazu S. Sumida ◽

Avraham Unterman ◽

...

Keyword(s):

Single Cell ◽

Single Cells ◽

Earth Metal ◽

Cell Types ◽

Specific Cell ◽

Cell Type ◽

Proteomics Data ◽

Data Annotation ◽

Different Cell Types ◽

Unique Molecular Identifier

Compared with sequencing-based global genomic profiling, cytometry labels targeted surface markers on millions of cells in parallel either by conjugated rare earth metal particles or Unique Molecular Identifier (UMI) barcodes. Correct annotation of these cells to specific cell types is a key step in the analysis of these data. However, there is no computational tool that automatically annotates single cell proteomics data for cell type inference. In this manuscript, we propose an automated single cell proteomics data annotation approach called ProtAnno to facilitate cell type assignments without laborious manual gating. ProtAnno is designed to incorporate information from annotated single cell RNA-seq (scRNA-seq), CITE-seq, and prior data knowledge (which can be imprecise) on biomarkers for different cell types. We have performed extensive simulations to demonstrate the accuracy and robustness of ProtAnno. For several single cell proteomics datasets that have been manually labeled, ProtAnno was able to correctly label most single cells. In summary, ProtAnno offers an accurate and robust tool to automate cell type annotations for large single cell proteomics datasets, and the analysis of such annotated cell types can offer valuable biological insights.


Mapping and modeling the genomic basis of differential RNA isoform expression at single-cell resolution with LR-Split-seq

Genome Biology ◽

10.1186/s13059-021-02505-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Elisabeth Rebboah ◽

Fairlie Reese ◽

Katherine Williams ◽

Gabriela Balderrama-Gutierrez ◽

Cassandra McGill ◽

...

Keyword(s):

Single Cell ◽

Single Cells ◽

Cell Types ◽

Transcription Start Sites ◽

Design Flexibility ◽

Long Reads ◽

Internal Exon ◽

Long Read ◽

Isoform Expression ◽

Full Length Transcript

Abstract The rise in throughput and quality of long-read sequencing should allow unambiguous identification of full-length transcript isoforms. However, its application to single-cell RNA-seq has been limited by throughput and expense. Here we develop and characterize long-read Split-seq (LR-Split-seq), which uses combinatorial barcoding to sequence single cells with long reads. Applied to the C2C12 myogenic system, LR-Split-seq associates isoforms to cell types with relative economy and design flexibility. We find widespread evidence of changing isoform expression during differentiation including alternative transcription start sites (TSS) and/or alternative internal exon usage. LR-Split-seq provides an affordable method for identifying cluster-specific isoforms in single cells.


Highly accurate barcode and UMI error correction using dual nucleotide dimer blocks allows direct single-cell nanopore transcriptome sequencing

10.1101/2021.01.18.427145 ◽

2021 ◽

Author(s):

Martin Philpott ◽

Jonathan Watson ◽

Anjan Thakurta ◽

Tom Brown ◽

...

Keyword(s):

Single Cell ◽

Nanopore Sequencing ◽

Short Read ◽

Short Read Sequencing ◽

Single Cell Sequencing ◽

Base Calling ◽

Novel Approach ◽

Long Read ◽

First Time ◽

Insight Into

Abstract Droplet-based single-cell sequencing techniques have provided unprecedented insight into cellular heterogeneities within tissues. However, these approaches only allow for the measurement of the distal parts of a transcript following short-read sequencing. Therefore, splicing and sequence diversity information is lost for the majority of the transcript. The application of long-read Nanopore sequencing to droplet-based methods is challenging because of the low base-calling accuracy currently associated with Nanopore sequencing. Although several approaches that use additional short-read sequencing to error-correct the barcode and UMI sequences have been developed, these techniques are limited by the requirement to sequence a library using both short- and long-read sequencing. Here we introduce a novel approach termed single-cell Barcode UMI Correction sequencing (scBUC-seq) to efficiently error-correct barcode and UMI oligonucleotide sequences synthesized by using blocks of dimeric nucleotides. The method can be applied to correct either short-read or long-read sequencing, thereby allowing users to recover more reads per cell and permitting direct single-cell Nanopore sequencing for the first time. We illustrate our method by using species-mixing experiments to evaluate barcode assignment accuracy and evaluate differential isoform usage and fusion transcripts using myeloma and sarcoma cell line models.


Single-cell isoform RNA sequencing (ScISOr-Seq) across thousands of cells reveals isoforms of cerebellar cell types

10.1101/364950 ◽

2018 ◽

Cited By ~ 1

Author(s):

Ishaan Gupta ◽

Paul G Collier ◽

Bettina Haase ◽

Ahmed Mahfouz ◽

Anoushka Joglekar ◽

...

Keyword(s):

Single Cell ◽

Single Cells ◽

Cell Types ◽

Full Length ◽

Cell Of Origin ◽

Cell Type ◽

Long Reads ◽

Long Read ◽

The Individual ◽

Bulk Tissue

Abstract Full-length isoform sequencing has advanced our knowledge of isoform biology [1–11]. However, apart from applying full-length isoform sequencing to very few single cells [12,13], isoform sequencing has been limited to bulk tissue, cell lines, or sorted cells. Single splicing events have been described for ≤200 single cells with great statistical success [14,15], but these methods do not describe full-length mRNAs. Single cell short-read 3’ sequencing has allowed identification of many cell sub-types [16–23], but full-length isoforms for these cell types have not been profiled. Using our new method of single-cell-isoform-RNA-sequencing (ScISOr-Seq) we determine isoform-expression in thousands of individual cells from a heterogeneous bulk tissue (cerebellum), without specific antibody-fluorescence activated cell sorting. We elucidate isoform usage in high-level cell types such as neurons, astrocytes and microglia and finer sub-types, such as Purkinje cells and Granule cells, including the combination patterns of distant splice sites [6–9,24,25], which for individual molecules requires long reads. We produce an enhanced genome annotation revealing cell-type specific expression of known and 16,872 novel (with respect to mouse Gencode version 10) isoforms (see isoformatlas.com).

ScISOr-Seq describes isoforms from >1,000 single cells from bulk tissue without cell sorting by leveraging two technologies in three steps. In step one, we employ microfluidics to produce amplified full-length cDNAs barcoded for their cell of origin. This cDNA is split into two pools: one pool for 3’ sequencing to measure gene expression (step two) and another pool for long-read sequencing and isoform expression (step three). In step two, short-read 3’-sequencing provides molecular counts for each gene and cell, which allows clustering cells and assigning a cell type using cell-type specific markers. In step three, an aliquot of the same cDNAs (each barcoded for the individual cell of origin) is sequenced using Pacific Biosciences (“PacBio”) [1,2,4,5,26] or Oxford Nanopore [3]. Since these long reads carry the single-cell barcodes identified in step two, one can determine the individual cell from which each long read originates. Since most single cells are assigned to a named cluster, we can also assign the cell’s cluster name (e.g. “Purkinje cell” or “astrocyte”) to the long read in question (Fig 1A), without losing the cell of origin of each long read.
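The barcode-based linkage between steps two and three amounts to a lookup from cell barcode to cluster name. A minimal Python sketch follows; the function name `annotate_long_reads`, the input shapes, and the example barcodes are all hypothetical, and exact barcode matching is assumed for simplicity.

```python
def annotate_long_reads(long_reads, barcode_to_cluster):
    """Attach a cluster name to each long read via its cell barcode.

    long_reads: iterable of (read_id, cell_barcode) pairs from step three.
    barcode_to_cluster: dict built from the short-read clustering of
    step two. Reads whose barcode was not assigned to a cluster get None.
    Assumes exact barcode matches (an illustrative simplification).
    """
    annotated = []
    for read_id, barcode in long_reads:
        cluster = barcode_to_cluster.get(barcode)
        # The barcode is kept, so the cell of origin is not lost.
        annotated.append((read_id, barcode, cluster))
    return annotated


clusters = {"ACGTACGT": "Purkinje cell", "TTGCAAGC": "astrocyte"}
reads = [
    ("read_1", "ACGTACGT"),
    ("read_2", "TTGCAAGC"),
    ("read_3", "GGGGGGGG"),  # barcode absent from the short-read clustering
]
annotated_reads = annotate_long_reads(reads, clusters)
```

Keeping both the barcode and the cluster name on every record lets isoforms be summarized per cell type while still allowing per-cell analysis.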


scruff: An R/Bioconductor package for preprocessing single-cell RNA-sequencing data

10.1101/522037 ◽

2019 ◽

Author(s):

Zhe Wang ◽

Junming Hu ◽

Evan W. Johnson ◽

Joshua D. Campbell

Keyword(s):

Data Quality ◽

Single Cell ◽

Rna Sequencing ◽

Single Cells ◽

Quality Metrics ◽

Bioconductor Package ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Data Quality Metrics ◽

Unique Molecular Identifier

Abstract Background: Single-cell RNA sequencing (scRNA-seq) enables the high-throughput quantification of transcriptional profiles in single cells. In contrast to bulk RNA-seq, additional preprocessing steps such as cell barcode identification or unique molecular identifier (UMI) deconvolution are necessary for data from single-cell protocols. R packages that can easily preprocess data and rapidly visualize quality metrics and read alignments for individual cells across multiple samples or runs are still lacking. Results: Here we present scruff, an R/Bioconductor package that preprocesses data generated from the CEL-Seq or CEL-Seq2 protocols and reports comprehensive data quality metrics and visualizations. scruff demultiplexes, aligns, and counts the reads mapped to genome features with deduplication of unique molecular identifier (UMI) tags. scruff also provides novel and extensive functions to visualize both pre- and post-alignment data quality metrics for cells from multiple experiments. Detailed read alignments with corresponding UMI information can be visualized at specific genome coordinates to display differences in isoform usage. The package also supports the visualization of quality metrics for sequence alignment files for multiple experiments generated by Cell Ranger from 10X Genomics. scruff is available as a free and open-source R/Bioconductor package. Conclusions: scruff streamlines the preprocessing of scRNA-seq data in a few simple R commands. It performs data demultiplexing, alignment, counting, quality report and visualization systematically and comprehensively, ensuring reproducible and reliable analysis of scRNA-seq data.
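To illustrate what counting reads "with deduplication of UMI tags" means, the following Python sketch counts distinct UMIs per cell and gene. It is not the scruff API (scruff is an R package); the function name `count_unique_umis` and the input format are assumptions made for illustration, and UMIs are compared by exact match.

```python
from collections import defaultdict


def count_unique_umis(alignments):
    """Count distinct UMIs per (cell barcode, gene) pair.

    alignments: iterable of (cell_barcode, gene, umi) tuples, one per
    aligned read. Reads sharing cell, gene and UMI are treated as PCR
    duplicates of one molecule and counted once (exact UMI match assumed).
    """
    umis_seen = defaultdict(set)
    for cell_barcode, gene, umi in alignments:
        umis_seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


aligned_reads = [
    ("cell_A", "Gapdh", "AACGT"),
    ("cell_A", "Gapdh", "AACGT"),  # duplicate of the read above
    ("cell_A", "Gapdh", "TTGCA"),
    ("cell_B", "Gapdh", "AACGT"),
]
counts = count_unique_umis(aligned_reads)
```

Here four reads collapse to three molecules: two for `Gapdh` in `cell_A` and one in `cell_B`.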


scCAT-seq: single-cell identification and quantification of mRNA isoforms by cost-effective short-read sequencing of cap and tail

10.1101/2019.12.11.873505 ◽

2019 ◽

Author(s):

Youjin Hu ◽

Jiawei Zhong ◽

Yuhua Xiao ◽

Zheng Xing ◽

Katherine Sheu ◽

...

Keyword(s):

Single Cell ◽

Learning Algorithm ◽

Single Cells ◽

Full Length ◽

Translation Efficiency ◽

Mrna Isoforms ◽

Short Read ◽

Short Read Sequencing ◽

Long Read ◽

Identification And Quantification

Abstract The differences in transcription start sites (TSS) and transcription end sites (TES) among gene isoforms can affect the stability, localization, and translation efficiency of mRNA. Isoforms also allow a single gene different functions across various tissues and cells. However, methods for efficient genome-wide identification and quantification of RNA isoforms in single cells are still lacking. Here, we introduce single cell Cap And Tail sequencing (scCAT-seq). In conjunction with a novel machine learning algorithm developed for TSS/TES characterization, scCAT-seq can demarcate the boundaries of RNA transcripts, providing an unprecedented way to identify and quantify single-cell full-length RNA isoforms based on short-read sequencing. Compared with existing long-read sequencing methods, scCAT-seq has higher efficiency with lower cost. Using scCAT-seq, we identified hundreds of previously uncharacterized full-length transcripts and thousands of alternative transcripts for known genes, quantitatively revealed cell-type specific isoforms with alternative TSSs/TESs in dorsal root ganglion (DRG) neurons, mature oocytes and ageing oocytes, and generated the first atlas of the non-human primate cornea. The approach described here can be widely adapted to other short-read or long-read methods to improve accuracy and efficiency in assessing RNA isoform dynamics among single cells.
