scCAT-seq: single-cell identification and quantification of mRNA isoforms by cost-effective short-read sequencing of cap and tail

2019 ◽  
Author(s):  
Youjin Hu ◽  
Jiawei Zhong ◽  
Yuhua Xiao ◽  
Zheng Xing ◽  
Katherine Sheu ◽  
...  

Abstract The differences in transcription start sites (TSS) and transcription end sites (TES) among gene isoforms can affect the stability, localization, and translation efficiency of mRNA. Isoforms also allow a single gene to perform different functions across various tissues and cells. However, methods for efficient genome-wide identification and quantification of RNA isoforms in single cells are still lacking. Here, we introduce single cell Cap And Tail sequencing (scCAT-seq). In conjunction with a novel machine learning algorithm developed for TSS/TES characterization, scCAT-seq can demarcate the boundaries of RNA transcripts, providing an unprecedented way to identify and quantify single-cell full-length RNA isoforms based on short-read sequencing. Compared with existing long-read sequencing methods, scCAT-seq has higher efficiency with lower cost. Using scCAT-seq, we identified hundreds of previously uncharacterized full-length transcripts and thousands of alternative transcripts for known genes, quantitatively revealed cell-type specific isoforms with alternative TSSs/TESs in dorsal root ganglion (DRG) neurons, mature oocytes and ageing oocytes, and generated the first atlas of the non-human primate cornea. The approach described here can be widely adapted to other short-read or long-read methods to improve accuracy and efficiency in assessing RNA isoform dynamics among single cells.

2018 ◽  
Vol 115 (39) ◽  
pp. 9726-9731 ◽  
Author(s):  
Roger Volden ◽  
Theron Palmer ◽  
Ashley Byrne ◽  
Charles Cole ◽  
Robert J. Schmitz ◽  
...  

High-throughput short-read sequencing has revolutionized how transcriptomes are quantified and annotated. However, while Illumina short-read sequencers can be used to analyze entire transcriptomes down to the level of individual splicing events with great accuracy, they fall short of analyzing how these individual events are combined into complete RNA transcript isoforms. Because of this shortfall, long-distance information is required to complement short-read sequencing to analyze transcriptomes on the level of full-length RNA transcript isoforms. While long-read sequencing technology can provide this long-distance information, there are issues with both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) long-read sequencing technologies that prevent their widespread adoption. Briefly, PacBio sequencers produce low numbers of reads with high accuracy, while ONT sequencers produce higher numbers of reads with lower accuracy. Here, we introduce and validate a long-read ONT-based sequencing method. At the same cost, our Rolling Circle Amplification to Concatemeric Consensus (R2C2) method generates more accurate reads of full-length RNA transcript isoforms than any other available long-read sequencing method. These reads can then be used to generate isoform-level transcriptomes for both genome annotation and differential expression analysis in bulk or single-cell samples.


2018 ◽  
Author(s):  
Roger Volden ◽  
Theron Palmer ◽  
Ashley Byrne ◽  
Charles Cole ◽  
Robert J Schmitz ◽  
...  

Abstract High-throughput short-read sequencing has revolutionized how transcriptomes are quantified and annotated. However, while Illumina short-read sequencers can be used to analyze entire transcriptomes down to the level of individual splicing events with great accuracy, they fall short of analyzing how these individual events are combined into complete RNA transcript isoforms. Because of this shortfall, long-read sequencing is required to complement short-read sequencing to analyze transcriptomes on the level of full-length RNA transcript isoforms. However, there are issues with both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) long-read sequencing technologies that prevent their widespread adoption. Briefly, PacBio sequencers produce low numbers of reads with high accuracy, while ONT sequencers produce higher numbers of reads with lower accuracy. Here we introduce and validate a new long-read ONT-based sequencing method. At the same cost, our Rolling Circle Amplification to Concatemeric Consensus (R2C2) method generates more accurate reads of full-length RNA transcript isoforms than any other available long-read sequencing method. These reads can then be used to generate isoform-level transcriptomes for both genome annotation and differential expression analysis in bulk or single-cell samples. Significance Statement Subtle changes in RNA transcript isoform expression can have dramatic effects on cellular behaviors in both health and disease. As such, comprehensive and quantitative analysis of isoform-level transcriptomes would open an entirely new window into cellular diversity in fields ranging from developmental to cancer biology. The R2C2 method we are presenting here is the first method with sufficient throughput and accuracy to make the comprehensive and quantitative analysis of RNA transcript isoforms in bulk and single-cell samples economically feasible.


2021 ◽  
Author(s):  
Valentin Waschulin ◽  
Chiara Borsetto ◽  
Robert James ◽  
Kevin K. Newsham ◽  
Stefano Donadio ◽  
...  

Abstract The growing problem of antibiotic resistance has led to the exploration of uncultured bacteria as potential sources of new antimicrobials. PCR amplicon analyses and short-read sequencing studies of samples from different environments have reported evidence of high biosynthetic gene cluster (BGC) diversity in metagenomes, indicating their potential for producing novel and useful compounds. However, recovering full-length BGC sequences from uncultivated bacteria remains a challenge due to the technological restraints of short-read sequencing, thus making assessment of BGC diversity difficult. Here, long-read sequencing and genome mining were used to recover >1400 mostly full-length BGCs that demonstrate the rich diversity of BGCs from uncultivated lineages present in soil from Mars Oasis, Antarctica. A large number of highly divergent BGCs were not only found in the phyla Acidobacteriota, Verrucomicrobiota and Gemmatimonadota but also in the actinobacterial classes Acidimicrobiia and Thermoleophilia and the gammaproteobacterial order UBA7966. The latter furthermore contained a potential novel family of RiPPs. Our findings underline the biosynthetic potential of underexplored phyla as well as unexplored lineages within seemingly well-studied producer phyla. They also showcase long-read metagenomic sequencing as a promising way to access the untapped genetic reservoir of specialised metabolite gene clusters of the uncultured majority of microbes.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Youjin Hu ◽  
Jiawei Zhong ◽  
Yuhua Xiao ◽  
Zheng Xing ◽  
Katherine Sheu ◽  
...  

Abstract The differences in transcription start sites (TSS) and transcription end sites (TES) among gene isoforms can affect the stability, localization, and translation efficiency of mRNA. Gene isoforms allow a single gene diverse functions across different cell types, and isoform dynamics allow different functions over time. However, methods to efficiently identify and quantify RNA isoforms genome-wide in single cells are still lacking. Here, we introduce single cell RNA Cap And Tail sequencing (scRCAT-seq), a method to demarcate the boundaries of isoforms based on short-read sequencing, with higher efficiency and lower cost than existing long-read sequencing methods. In conjunction with machine learning algorithms, scRCAT-seq demarcates RNA transcripts with unprecedented accuracy. We identified hundreds of previously uncharacterized transcripts and thousands of alternative transcripts for known genes, revealed cell-type specific isoforms for various cell types across different species, and generated a cell atlas of isoform dynamics during the development of retinal cones.


2020 ◽  
Author(s):  
Luyi Tian ◽  
Jafar S. Jabbari ◽  
Rachel Thijssen ◽  
Quentin Gouil ◽  
Shanika L. Amarasinghe ◽  
...  

Abstract Alternative splicing shapes the phenotype of cells in development and disease. Long-read RNA-sequencing recovers full-length transcripts but has limited throughput at the single-cell level. Here we developed single-cell full-length transcript sequencing by sampling (FLT-seq), together with the computational pipeline FLAMES, to overcome these issues and perform isoform discovery and quantification, splicing analysis and mutation detection in single cells. With FLT-seq and FLAMES, we performed the first comprehensive characterization of the full-length isoform landscape in single cells of different types and species and identified thousands of unannotated isoforms. We found conserved functional modules that were enriched for alternative transcript usage in different cell populations, including ribosome biogenesis and mRNA splicing. Analysis at the transcript level allowed data integration with scATAC-seq on individual promoters, improved correlation with protein expression data and linked mutations known to confer drug resistance to transcriptome heterogeneity. Our methods reveal previously unseen isoform complexity and provide a better framework for multi-omics data integration.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yoshiyuki Matsuo ◽  
Shinnosuke Komiya ◽  
Yoshiaki Yasumizu ◽  
Yuki Yasuoka ◽  
Katsura Mizushima ◽  
...  

Abstract Background Species-level genetic characterization of complex bacterial communities has important clinical applications in both diagnosis and treatment. Amplicon sequencing of the 16S ribosomal RNA (rRNA) gene has proven to be a powerful strategy for the taxonomic classification of bacteria. This study aims to improve the method for full-length 16S rRNA gene analysis using the nanopore long-read sequencer MinION™. We compared it to the conventional short-read sequencing method in both a mock bacterial community and human fecal samples. Results We modified our existing protocol for full-length 16S rRNA gene amplicon sequencing by MinION™. A new strategy for library construction with an optimized primer set overcame PCR-associated bias and enabled taxonomic classification across a broad range of bacterial species. We compared the performance of full-length and short-read 16S rRNA gene amplicon sequencing for the characterization of human gut microbiota with a complex bacterial composition. The relative abundance of dominant bacterial genera was highly similar between full-length and short-read sequencing. At the species level, MinION™ long-read sequencing had better resolution for discriminating between members of particular taxa such as Bifidobacterium, allowing an accurate representation of the sample bacterial composition. Conclusions Our present microbiome study, comparing the discriminatory power of full-length and short-read sequencing, clearly illustrated the analytical advantage of sequencing the full-length 16S rRNA gene.


2021 ◽  
Author(s):  
Martin Philpott ◽  
Jonathan Watson ◽  
Anjan Thakurta ◽  
Tom Brown ◽  
Tom Brown ◽  
...  

Abstract Droplet-based single-cell sequencing techniques have provided unprecedented insight into cellular heterogeneities within tissues. However, these approaches only allow for the measurement of the distal parts of a transcript following short-read sequencing. Therefore, splicing and sequence diversity information is lost for the majority of the transcript. The application of long-read Nanopore sequencing to droplet-based methods is challenging because of the low base-calling accuracy currently associated with Nanopore sequencing. Although several approaches that use additional short-read sequencing to error-correct the barcode and UMI sequences have been developed, these techniques are limited by the requirement to sequence a library using both short- and long-read sequencing. Here we introduce a novel approach termed single-cell Barcode UMI Correction sequencing (scBUC-seq) to efficiently error-correct barcode and UMI oligonucleotide sequences synthesized by using blocks of dimeric nucleotides. The method can be applied to correct either short-read or long-read sequencing, thereby allowing users to recover more reads per cell and permitting direct single-cell Nanopore sequencing for the first time. We illustrate our method by using species-mixing experiments to evaluate barcode assignment accuracy and evaluate differential isoform usage and fusion transcripts using myeloma and sarcoma cell line models.


2018 ◽  
Author(s):  
Ishaan Gupta ◽  
Paul G Collier ◽  
Bettina Haase ◽  
Ahmed Mahfouz ◽  
Anoushka Joglekar ◽  
...  

Abstract Full-length isoform sequencing has advanced our knowledge of isoform biology [1–11]. However, apart from applying full-length isoform sequencing to very few single cells [12,13], isoform sequencing has been limited to bulk tissue, cell lines, or sorted cells. Single splicing events have been described for <=200 single cells with great statistical success [14,15], but these methods do not describe full-length mRNAs. Single cell short-read 3’ sequencing has allowed identification of many cell sub-types [16–23], but full-length isoforms for these cell types have not been profiled. Using our new method of single-cell-isoform-RNA-sequencing (ScISOr-Seq) we determine isoform-expression in thousands of individual cells from a heterogeneous bulk tissue (cerebellum), without specific antibody-fluorescence activated cell sorting. We elucidate isoform usage in high-level cell types such as neurons, astrocytes and microglia and finer sub-types, such as Purkinje cells and Granule cells, including the combination patterns of distant splice sites [6–9,24,25], which for individual molecules requires long reads. We produce an enhanced genome annotation revealing cell-type specific expression of known and 16,872 novel (with respect to mouse Gencode version 10) isoforms (see isoformatlas.com). ScISOr-Seq describes isoforms from >1,000 single cells from bulk tissue without cell sorting by leveraging two technologies in three steps: In step one, we employ microfluidics to produce amplified full-length cDNAs barcoded for their cell of origin. This cDNA is split into two pools: one pool for 3’ sequencing to measure gene expression (step 2) and another pool for long-read sequencing and isoform expression (step 3). In step two, short-read 3’-sequencing provides molecular counts for each gene and cell, which allows clustering cells and assigning a cell type using cell-type specific markers.
In step three, an aliquot of the same cDNAs (each barcoded for the individual cell of origin) is sequenced using Pacific Biosciences (“PacBio”) [1,2,4,5,26] or Oxford Nanopore [3]. Since these long reads carry the single-cell barcodes identified in step two, one can determine the individual cell from which each long read originates. Since most single cells are assigned to a named cluster, we can also assign the cell’s cluster name (e.g. “Purkinje cell” or “astrocyte”) to the long read in question (Fig 1A) – without losing the cell of origin of each long read.
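
The barcode lookup described in step three can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy barcodes, and the use of exact barcode matching (with no error correction) are assumptions made purely for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def label_long_reads(
    long_reads: Iterable[Tuple[str, str]],
    barcode_to_cluster: Dict[str, str],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Attach a cell barcode and cluster name to each long read.

    long_reads: (read_id, cell_barcode) pairs; the barcode is assumed to
        have been extracted from the read already (hypothetical input).
    barcode_to_cluster: barcode -> cluster name, as obtained from the
        short-read clustering in step two.
    """
    labelled = {}
    for read_id, barcode in long_reads:
        # Exact-match lookup (an assumption); unknown barcodes get None.
        cluster = barcode_to_cluster.get(barcode)
        # The barcode is kept so the cell of origin is not lost.
        labelled[read_id] = (barcode, cluster)
    return labelled


def reads_per_cluster(
    labelled: Dict[str, Tuple[str, Optional[str]]],
) -> Counter:
    """Count long reads per named cluster, skipping unassigned reads."""
    return Counter(c for _, c in labelled.values() if c is not None)


# Toy data: barcodes and read IDs are invented for this example.
barcode_to_cluster = {"AACGTT": "Purkinje cell", "GGTACA": "astrocyte"}
long_reads = [
    ("read1", "AACGTT"),
    ("read2", "GGTACA"),
    ("read3", "AACGTT"),
    ("read4", "TTTTTT"),  # barcode not seen in step two
]
labelled = label_long_reads(long_reads, barcode_to_cluster)
counts = reads_per_cluster(labelled)
```

Keeping the barcode alongside the cluster name mirrors the point made in the abstract: a read gains a cell-type label without losing its cell of origin.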


Author(s):  
Yoshiyuki Matsuo ◽  
Shinnosuke Komiya ◽  
Yoshiaki Yasumizu ◽  
Yuki Yasuoka ◽  
Katsura Mizushima ◽  
...  

Abstract Background Species-level genetic characterization of complex bacterial communities has important clinical applications in both diagnosis and treatment. Amplicon sequencing of the 16S ribosomal RNA (rRNA) gene has proven to be a powerful strategy for the taxonomic classification of bacteria. This study aims to improve the method for full-length 16S rRNA gene analysis using the nanopore long-read sequencer MinION™. We compared it to the conventional short-read sequencing method in both a mock bacterial community and human fecal samples. Results We modified our existing protocol for full-length 16S amplicon sequencing by MinION™. A new strategy for library construction with an optimized primer set overcame PCR-associated bias and enabled taxonomic classification across a broad range of bacterial species. We compared the performance of full-length and short-read 16S amplicon sequencing for the characterization of human gut microbiota with a complex bacterial composition. The relative abundance of dominant bacterial genera was highly similar between full-length and short-read sequencing. At the species level, MinION™ long-read sequencing had better resolution for discriminating between members of particular taxa such as Bifidobacterium, allowing an accurate representation of the sample bacterial composition. Conclusions Our present microbiome study, comparing the discriminatory power of full-length and short-read sequencing, clearly illustrated the analytical advantage of sequencing the full-length 16S rRNA gene, which provided the requisite species-level resolution and accuracy in clinical settings.

