Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly multiplexed full-length single-cell cDNA

High-throughput short-read sequencing has revolutionized how transcriptomes are quantified and annotated. However, while Illumina short-read sequencers can be used to analyze entire transcriptomes down to the level of individual splicing events with great accuracy, they fall short of analyzing how these individual events are combined into complete RNA transcript isoforms. Because of this shortfall, long-distance information is required to complement short-read sequencing to analyze transcriptomes on the level of full-length RNA transcript isoforms. While long-read sequencing technology can provide this long-distance information, there are issues with both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) long-read sequencing technologies that prevent their widespread adoption. Briefly, PacBio sequencers produce low numbers of reads with high accuracy, while ONT sequencers produce higher numbers of reads with lower accuracy. Here, we introduce and validate a long-read ONT-based sequencing method. At the same cost, our Rolling Circle Amplification to Concatemeric Consensus (R2C2) method generates more accurate reads of full-length RNA transcript isoforms than any other available long-read sequencing method. These reads can then be used to generate isoform-level transcriptomes for both genome annotation and differential expression analysis in bulk or single-cell samples.

R2C2: Improving nanopore read accuracy enables the sequencing of highly-multiplexed full-length single-cell cDNA

10.1101/338020 ◽

2018 ◽

Cited By ~ 1

Author(s):

Roger Volden ◽

Theron Palmer ◽

Ashley Byrne ◽

Charles Cole ◽

Robert J Schmitz ◽

...

Keyword(s):

Quantitative Analysis ◽

Single Cell ◽

Cancer Biology ◽

Full Length ◽

Short Read ◽

Transcript Isoforms ◽

Short Read Sequencing ◽

Sequencing Method ◽

Long Read ◽

RNA Transcript

Abstract: High-throughput short-read sequencing has revolutionized how transcriptomes are quantified and annotated. However, while Illumina short-read sequencers can be used to analyze entire transcriptomes down to the level of individual splicing events with great accuracy, they fall short of analyzing how these individual events are combined into complete RNA transcript isoforms. Because of this shortfall, long-read sequencing is required to complement short-read sequencing to analyze transcriptomes on the level of full-length RNA transcript isoforms. However, there are issues with both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) long-read sequencing technologies that prevent their widespread adoption. Briefly, PacBio sequencers produce low numbers of reads with high accuracy, while ONT sequencers produce higher numbers of reads with lower accuracy. Here we introduce and validate a new long-read ONT-based sequencing method. At the same cost, our Rolling Circle Amplification to Concatemeric Consensus (R2C2) method generates more accurate reads of full-length RNA transcript isoforms than any other available long-read sequencing method. These reads can then be used to generate isoform-level transcriptomes for both genome annotation and differential expression analysis in bulk or single-cell samples.

Significance Statement: Subtle changes in RNA transcript isoform expression can have dramatic effects on cellular behaviors in both health and disease. As such, comprehensive and quantitative analysis of isoform-level transcriptomes would open an entirely new window into cellular diversity in fields ranging from developmental to cancer biology. The R2C2 method we are presenting here is the first method with sufficient throughput and accuracy to make the comprehensive and quantitative analysis of RNA transcript isoforms in bulk and single-cell samples economically feasible.
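The idea behind a concatemeric consensus can be illustrated with a toy example: a rolling-circle read contains several error-prone copies of the same molecule, and combining them suppresses random errors. The Python sketch below is not the R2C2 pipeline. It assumes the copies have already been split out of the raw read and aligned to equal length (a simplification that ignores insertions and deletions), and it uses a plain per-position majority vote. The function name `majority_consensus` and the example sequences are hypothetical.

```python
from collections import Counter


def majority_consensus(copies):
    """Return the per-position majority base over pre-aligned copies.

    Assumption: every copy has the same length and is already aligned,
    so only substitution errors are modeled here.
    """
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("copies must be pre-aligned to equal length")

    consensus = []
    for column in zip(*copies):
        # most_common(1) gives the most frequent base in this column;
        # ties resolve to the base seen first in the column.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Four hypothetical copies of one molecule, two carrying a single error each.
copies = ["ACGTAC", "ACGAAC", "ACGTAC", "TCGTAC"]
print(majority_consensus(copies))
```

Because the two errors fall at different positions, each is outvoted by the three copies that agree, which is why more copies per read generally give a more accurate consensus.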

scCAT-seq: single-cell identification and quantification of mRNA isoforms by cost-effective short-read sequencing of cap and tail

10.1101/2019.12.11.873505 ◽

2019 ◽

Author(s):

Youjin Hu ◽

Jiawei Zhong ◽

Yuhua Xiao ◽

Zheng Xing ◽

Katherine Sheu ◽

...

Keyword(s):

Single Cell ◽

Learning Algorithm ◽

Single Cells ◽

Full Length ◽

Translation Efficiency ◽

mRNA Isoforms ◽

Short Read ◽

Short Read Sequencing ◽

Long Read ◽

Identification And Quantification

Abstract: The differences in transcription start sites (TSS) and transcription end sites (TES) among gene isoforms can affect the stability, localization, and translation efficiency of mRNA. Isoforms also allow a single gene to perform different functions across various tissues and cells. However, methods for efficient genome-wide identification and quantification of RNA isoforms in single cells are still lacking. Here, we introduce single-cell Cap And Tail sequencing (scCAT-seq). In conjunction with a novel machine learning algorithm developed for TSS/TES characterization, scCAT-seq can demarcate the boundaries of RNA transcripts, providing an unprecedented way to identify and quantify single-cell full-length RNA isoforms based on short-read sequencing. Compared with existing long-read sequencing methods, scCAT-seq has higher efficiency with lower cost. Using scCAT-seq, we identified hundreds of previously uncharacterized full-length transcripts and thousands of alternative transcripts for known genes, quantitatively revealed cell-type-specific isoforms with alternative TSSs/TESs in dorsal root ganglion (DRG) neurons, mature oocytes and ageing oocytes, and generated the first atlas of the non-human primate cornea. The approach described here can be widely adapted to other short-read or long-read methods to improve accuracy and efficiency in assessing RNA isoform dynamics among single cells.

Biosynthetic potential of uncultured Antarctic soil bacteria revealed through long-read metagenomic sequencing

The ISME Journal ◽

10.1038/s41396-021-01052-3 ◽

2021 ◽

Author(s):

Valentin Waschulin ◽

Chiara Borsetto ◽

Robert James ◽

Kevin K. Newsham ◽

Stefano Donadio ◽

...

Keyword(s):

Genome Mining ◽

Gene Clusters ◽

Biosynthetic Gene Cluster ◽

Full Length ◽

Metagenomic Sequencing ◽

Short Read ◽

Short Read Sequencing ◽

Rich Diversity ◽

Long Read ◽

The Rich

Abstract: The growing problem of antibiotic resistance has led to the exploration of uncultured bacteria as potential sources of new antimicrobials. PCR amplicon analyses and short-read sequencing studies of samples from different environments have reported evidence of high biosynthetic gene cluster (BGC) diversity in metagenomes, indicating their potential for producing novel and useful compounds. However, recovering full-length BGC sequences from uncultivated bacteria remains a challenge due to the technological restraints of short-read sequencing, thus making assessment of BGC diversity difficult. Here, long-read sequencing and genome mining were used to recover >1400 mostly full-length BGCs that demonstrate the rich diversity of BGCs from uncultivated lineages present in soil from Mars Oasis, Antarctica. A large number of highly divergent BGCs were not only found in the phyla Acidobacteriota, Verrucomicrobiota and Gemmatimonadota but also in the actinobacterial classes Acidimicrobiia and Thermoleophilia and the gammaproteobacterial order UBA7966. The latter furthermore contained a potential novel family of RiPPs. Our findings underline the biosynthetic potential of underexplored phyla as well as unexplored lineages within seemingly well-studied producer phyla. They also showcase long-read metagenomic sequencing as a promising way to access the untapped genetic reservoir of specialised metabolite gene clusters of the uncultured majority of microbes.

Qualitative De Novo Analysis of Full Length cDNA and Quantitative Analysis of Gene Expression for Common Marmoset (Callithrix jacchus) Transcriptomes Using Parallel Long-Read Technology and Short-Read Sequencing

PLoS ONE ◽

10.1371/journal.pone.0100936 ◽

2014 ◽

Vol 9 (6) ◽

pp. e100936 ◽

Cited By ~ 24

Author(s):

Makiko Shimizu ◽

Shunsuke Iwano ◽

Yasuhiro Uno ◽

Shotaro Uehara ◽

Takashi Inoue ◽

...

Keyword(s):

Gene Expression ◽

Quantitative Analysis ◽

De Novo ◽

Common Marmoset ◽

Callithrix Jacchus ◽

Full Length ◽

Short Read ◽

Full Length cDNA ◽

Short Read Sequencing ◽

Long Read

Full-length 16S rRNA gene amplicon analysis of human gut microbiota using MinION™ nanopore sequencing confers species-level resolution

BMC Microbiology ◽

10.1186/s12866-021-02094-5 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Yoshiyuki Matsuo ◽

Shinnosuke Komiya ◽

Yoshiaki Yasumizu ◽

Yuki Yasuoka ◽

Katsura Mizushima ◽

...

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Amplicon Sequencing ◽

Species Level ◽

Full Length ◽

rRNA Gene ◽

Short Read ◽

Short Read Sequencing ◽

Long Read

Abstract

Background: Species-level genetic characterization of complex bacterial communities has important clinical applications in both diagnosis and treatment. Amplicon sequencing of the 16S ribosomal RNA (rRNA) gene has proven to be a powerful strategy for the taxonomic classification of bacteria. This study aims to improve the method for full-length 16S rRNA gene analysis using the nanopore long-read sequencer MinION™. We compared it to the conventional short-read sequencing method in both a mock bacterial community and human fecal samples.

Results: We modified our existing protocol for full-length 16S rRNA gene amplicon sequencing by MinION™. A new strategy for library construction with an optimized primer set overcame PCR-associated bias and enabled taxonomic classification across a broad range of bacterial species. We compared the performance of full-length and short-read 16S rRNA gene amplicon sequencing for the characterization of human gut microbiota with a complex bacterial composition. The relative abundance of dominant bacterial genera was highly similar between full-length and short-read sequencing. At the species level, MinION™ long-read sequencing had better resolution for discriminating between members of particular taxa such as Bifidobacterium, allowing an accurate representation of the sample bacterial composition.

Conclusions: Our present microbiome study, comparing the discriminatory power of full-length and short-read sequencing, clearly illustrated the analytical advantage of sequencing the full-length 16S rRNA gene.

Highly accurate barcode and UMI error correction using dual nucleotide dimer blocks allows direct single-cell nanopore transcriptome sequencing

10.1101/2021.01.18.427145 ◽

2021 ◽

Author(s):

Martin Philpott ◽

Jonathan Watson ◽

Anjan Thakurta ◽

Tom Brown ◽

...

Keyword(s):

Single Cell ◽

Nanopore Sequencing ◽

Short Read ◽

Short Read Sequencing ◽

Single Cell Sequencing ◽

Base Calling ◽

Novel Approach ◽

Long Read ◽

First Time ◽

Insight Into

Abstract: Droplet-based single-cell sequencing techniques have provided unprecedented insight into cellular heterogeneities within tissues. However, these approaches only allow for the measurement of the distal parts of a transcript following short-read sequencing. Therefore, splicing and sequence diversity information is lost for the majority of the transcript. The application of long-read Nanopore sequencing to droplet-based methods is challenging because of the low base-calling accuracy currently associated with Nanopore sequencing. Although several approaches that use additional short-read sequencing to error-correct the barcode and UMI sequences have been developed, these techniques are limited by the requirement to sequence a library using both short- and long-read sequencing. Here we introduce a novel approach termed single-cell Barcode UMI Correction sequencing (scBUC-seq) to efficiently error-correct barcode and UMI oligonucleotide sequences synthesized using blocks of dimeric nucleotides. The method can be applied to correct either short-read or long-read sequencing, thereby allowing users to recover more reads per cell and permitting direct single-cell Nanopore sequencing for the first time. We illustrate our method by using species-mixing experiments to evaluate barcode assignment accuracy and by evaluating differential isoform usage and fusion transcripts in myeloma and sarcoma cell line models.
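One way to picture why dimer blocks help with error correction is the following Python sketch. It is an illustration under stated assumptions, not the scBUC-seq implementation: it assumes every barcode base is synthesized as a homodimer (so "AACCGG" encodes "ACG"), that only substitution errors occur, and that a known barcode whitelist is available. The function names `decode_dimer_barcode` and `match_whitelist` are hypothetical.

```python
def decode_dimer_barcode(read):
    """Collapse a dimer-encoded barcode to one base per block.

    Assumption: each original base was written twice in a row. A block whose
    two bases disagree must contain a sequencing error and is marked "N".
    """
    if len(read) % 2 != 0:
        raise ValueError("dimer-encoded barcode must have even length")
    bases = []
    for i in range(0, len(read), 2):
        first, second = read[i], read[i + 1]
        bases.append(first if first == second else "N")
    return "".join(bases)


def match_whitelist(decoded, whitelist):
    """Return the unique whitelist barcode compatible with `decoded`.

    "N" positions match any base. Returns None when zero or several
    whitelist entries are compatible, so ambiguous reads are not assigned.
    """
    hits = [
        barcode
        for barcode in whitelist
        if len(barcode) == len(decoded)
        and all(d in ("N", b) for d, b in zip(decoded, barcode))
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical read with one substitution in the second block ("CA" instead of "CC").
decoded = decode_dimer_barcode("AACAGGTT")
print(decoded, match_whitelist(decoded, ["ACGT", "TTGA"]))
```

The redundancy turns a silent substitution into a detectable mismatch inside a block, and the whitelist lookup then recovers the intended barcode when only one candidate remains compatible.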

Full-length 16S rRNA gene amplicon analysis of human gut microbiota using MinION™ nanopore sequencing confers species-level resolution

10.1101/2020.05.06.078147 ◽

2020 ◽

Cited By ~ 3

Author(s):

Yoshiyuki Matsuo ◽

Shinnosuke Komiya ◽

Yoshiaki Yasumizu ◽

Yuki Yasuoka ◽

Katsura Mizushima ◽

...

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Amplicon Sequencing ◽

Species Level ◽

Full Length ◽

rRNA Gene ◽

Short Read ◽

Short Read Sequencing ◽

16S Amplicon Sequencing ◽

Long Read

Abstract

Background: Species-level genetic characterization of complex bacterial communities has important clinical applications in both diagnosis and treatment. Amplicon sequencing of the 16S ribosomal RNA (rRNA) gene has proven to be a powerful strategy for the taxonomic classification of bacteria. This study aims to improve the method for full-length 16S rRNA gene analysis using the nanopore long-read sequencer MinION™. We compared it to the conventional short-read sequencing method in both a mock bacterial community and human fecal samples.

Results: We modified our existing protocol for full-length 16S amplicon sequencing by MinION™. A new strategy for library construction with an optimized primer set overcame PCR-associated bias and enabled taxonomic classification across a broad range of bacterial species. We compared the performance of full-length and short-read 16S amplicon sequencing for the characterization of human gut microbiota with a complex bacterial composition. The relative abundance of dominant bacterial genera was highly similar between full-length and short-read sequencing. At the species level, MinION™ long-read sequencing had better resolution for discriminating between members of particular taxa such as Bifidobacterium, allowing an accurate representation of the sample bacterial composition.

Conclusions: Our present microbiome study, comparing the discriminatory power of full-length and short-read sequencing, clearly illustrated the analytical advantage of sequencing the full-length 16S rRNA gene, which provided the requisite species-level resolution and accuracy in clinical settings.

Phased Haplotype Resolution of the SLC6A4 Promoter Using Long-Read Single Molecule Real-Time (SMRT) Sequencing

Genes ◽

10.3390/genes11111333 ◽

2020 ◽

Vol 11 (11) ◽

pp. 1333

Author(s):

Mariana R. Botton ◽

Yao Yang ◽

Erick R. Scott ◽

Robert J. Desnick ◽

Stuart A. Scott

Keyword(s):

Real Time ◽

Single Molecule ◽

Variable Number ◽

Sequencing Data ◽

SMRT Sequencing ◽

Short Read ◽

Short Read Sequencing ◽

Sequencing Method ◽

Long Read ◽

S Allele

The SLC6A4 gene has been implicated in psychiatric disorder susceptibility and antidepressant response variability. The SLC6A4 promoter is defined by a variable number of homologous 20–24 bp repeats (5-HTTLPR), and long (L) and short (S) alleles are associated with higher and lower expression, respectively. However, this insertion/deletion variant is most informative when considered as a haplotype with the rs25531 and rs25532 variants. Therefore, we developed a long-read single molecule real-time (SMRT) sequencing method to interrogate the SLC6A4 promoter region. A total of 120 samples were subjected to SLC6A4 long-read SMRT sequencing, primarily selected based on available short-read sequencing data. Short-read genome sequencing from the 1000 Genomes (1KG) Project (~5X) and the Genetic Testing Reference Material Coordination Program (~45X), as well as high-depth short-read capture-based sequencing (~330X), could not identify the 5-HTTLPR short (S) allele, nor could short-read sequencing phase any identified variants. In contrast, long-read SMRT sequencing unambiguously identified the 5-HTTLPR short (S) allele (frequency of 0.467) and phased SLC6A4 promoter haplotypes. Additionally, discordant rs25531 genotypes were reviewed and determined to be short-read errors. Taken together, long-read SMRT sequencing is an innovative and robust method for phased resolution of the SLC6A4 promoter, which could enable more accurate pharmacogenetic testing for both research and clinical applications.

Realizing the potential of full-length transcriptome sequencing

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2019.0097 ◽

2019 ◽

Vol 374 (1786) ◽

pp. 20190097 ◽

Cited By ~ 13

Author(s):

Ashley Byrne ◽

Charles Cole ◽

Roger Volden ◽

Christopher Vollmers

Keyword(s):

Single Cell ◽

Transcriptome Analysis ◽

Transcriptome Sequencing ◽

Model Organisms ◽

Sequencing Technology ◽

Short Read ◽

Short Read Sequencing ◽

Unicellular Eukaryotes ◽

Long Read ◽

Future Work

Long-read sequencing holds great potential for transcriptome analysis because it offers researchers an affordable method to annotate the transcriptomes of non-model organisms. This, in turn, will greatly benefit future work on less-researched organisms like unicellular eukaryotes that cannot rely on large consortia to generate these transcriptome annotations. However, to realize this potential, several remaining molecular and computational challenges will have to be overcome. In this review, we have outlined the limitations of short-read sequencing technology and how long-read sequencing technology overcomes these limitations. We have also highlighted the unique challenges still present for long-read sequencing technology and provided some suggestions on how to overcome these challenges going forward. This article is part of a discussion meeting issue ‘Single cell ecology’.

A Single-Molecule Long-Read Survey of Human Transcriptomes using LoopSeq Synthetic Long Read Sequencing

10.1101/532135 ◽

2019 ◽

Cited By ~ 5

Author(s):

Indira Wu ◽

Tuval Ben-Yehezkel

Keyword(s):

Single Molecule ◽

Transcriptome Sequencing ◽

Splice Variants ◽

Error Rates ◽

Full Length ◽

Tissue Samples ◽

Short Read ◽

Short Read Sequencing ◽

Long Read ◽

Sequence Reconstruction

Abstract: State-of-the-art short-read transcriptome sequencing methods employ unique molecular identifiers (UMIs) to accurately classify and count mRNA transcripts. A fundamental limitation of UMI-based short-read transcriptome sequencing is that each read typically covers a small fraction of the transcript sequence. Efforts to accurately characterize splicing isoforms, arguably the largest source of variation in human gene expression, using short-read sequencing have therefore largely relied on computational predictions of transcript isoforms based on indirect observations. Here we describe a transcript-counting, synthetic long-read method for sequencing whole transcriptomes using short-read sequencing platforms and no additional hardware. The method enables full-length mRNA sequence reconstruction at single-nucleotide resolution with high throughput, low error rates and UMI-based transcript counting using any Illumina sequencer. We describe results from whole-transcriptome sequencing of total RNA extracted from 3 human tissue samples: brain, liver, and blood. Reconstructed transcript sequences are characterized and annotated using SQANTI, an analysis pipeline for assessing the sequence quality of long-read transcriptomes. Our results demonstrate that LoopSeq synthetic long-read sequencing can reconstruct full-length transcript contigs of up to 3,900 nt from tissue-extracted RNA, as well as identify novel splice variants of known junction donors and acceptors.
