Large-Scale Multiplexing Permits Full-Length Transcriptome Annotation of 32 Bovine Tissues From a Single Nanopore Flow Cell

A comprehensive annotation of transcript isoforms in domesticated species is lacking. Especially considering that transcriptome complexity and splicing patterns are not well-conserved between species, this presents a substantial obstacle to genomic selection programs that seek to improve production, disease resistance, and reproduction. Recent advances in long-read sequencing technology have made it possible to directly extrapolate the structure of full-length transcripts without the need for transcript reconstruction. In this study, we demonstrate the power of long-read sequencing for transcriptome annotation by coupling Oxford Nanopore Technology (ONT) with large-scale multiplexing of 93 samples, comprising 32 tissues collected from adult male and female Hereford cattle. More than 30 million uniquely mapping full-length reads were obtained from a single ONT flow cell, and used to identify and characterize the expression dynamics of 99,044 transcript isoforms at 31,824 loci. Of these predicted transcripts, 21% exactly matched a reference transcript, and 61% were novel isoforms of reference genes, substantially increasing the ratio of transcript variants per gene, and suggesting that the complexity of the bovine transcriptome is comparable to that in humans. Over 7,000 transcript isoforms were extremely tissue-specific, and 61% of these were attributed to testis, which exhibited the most complex transcriptome of all interrogated tissues. Despite profiling over 30 tissues, transcription was only detected at about 60% of reference loci. Consequently, additional studies will be necessary to continue characterizing the bovine transcriptome in additional cell types, developmental stages, and physiological conditions. However, by here demonstrating the power of ONT sequencing coupled with large-scale multiplexing, the task of exhaustively annotating the bovine transcriptome – or any mammalian transcriptome – appears significantly more feasible.
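
The headline figures in this abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration: the constants are the rounded values quoted above, the function and variable names are hypothetical, and the derived counts are therefore approximate. It also divides by loci rather than annotated genes, so it is not the authors' own per-gene calculation.

```python
# Figures quoted in the abstract (percentages are rounded in the source).
TOTAL_ISOFORMS = 99_044
TOTAL_LOCI = 31_824
EXACT_MATCH_FRACTION = 0.21      # isoforms exactly matching a reference transcript
NOVEL_ISOFORM_FRACTION = 0.61    # novel isoforms of reference genes


def approx_count(total: int, fraction: float) -> int:
    """Approximate number of items that a rounded fraction of a total represents."""
    return round(total * fraction)


def isoforms_per_locus(isoforms: int, loci: int) -> float:
    """Mean number of transcript isoforms per locus."""
    return isoforms / loci


exact_match_count = approx_count(TOTAL_ISOFORMS, EXACT_MATCH_FRACTION)
novel_isoform_count = approx_count(TOTAL_ISOFORMS, NOVEL_ISOFORM_FRACTION)
per_locus = isoforms_per_locus(TOTAL_ISOFORMS, TOTAL_LOCI)

print(f"Exact reference matches: about {exact_match_count}")
print(f"Novel isoforms of reference genes: about {novel_isoform_count}")
print(f"Isoforms per locus: about {per_locus:.2f}")
```

On these numbers the average works out to roughly three isoforms per locus, which is the kind of ratio the abstract refers to when it compares bovine and human transcriptome complexity.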

Single-Molecule Real-Time Sequencing of the Madhuca pasquieri (Dubard) Lam. Transcriptome Reveals the Diversity of Full-Length Transcripts

Forests ◽

10.3390/f11080866 ◽

2020 ◽

Vol 11 (8) ◽

pp. 866

Author(s):

Lei Kan ◽

Qicong Liao ◽

Zhiyao Su ◽

Yushan Tan ◽

Shuyu Wang ◽

...

Keyword(s):

Seed Germination ◽

Single Molecule ◽

Developmental Stages ◽

De Novo ◽

Full Length ◽

Wild Plant ◽

Transcript Isoforms ◽

Long Read ◽

Full Length Transcript ◽

Generation Sequencing

Madhuca pasquieri (Dubard) Lam. is a tree on the International Union for Conservation of Nature Red List and a national key protected wild plant (II) of China, known for its seed oil and timber. However, the lack of genomic and transcriptome data for this species hampers the study of its reproduction, utilization, and conservation. Here, single-molecule long-read sequencing (PacBio) and next-generation sequencing (Illumina) were combined to obtain the transcriptome from five developmental stages of M. pasquieri. Overall, 25,339 transcript isoforms were detected by PacBio, including 24,492 coding sequences (CDSs), 9440 simple sequence repeats (SSRs), 149 long non-coding RNAs (lncRNAs), and 182 alternative splicing (AS) events, the majority of which were retained introns (RI). A further 1058 transcripts were identified as transcription factors (TFs) from 51 TF families. PacBio recovered more full-length transcript isoforms, with longer lengths and higher expression levels, whereas a larger number of transcripts (124,405) was captured de novo from the Illumina data. Using the Nr, Swissprot, KOG, and KEGG databases, 24,405 of the PacBio transcripts (96.31%) were annotated. Functional annotation revealed a role for the auxin, abscisic acid, gibberellin, and cytokinin metabolic pathways in seed germination and post-germination. These findings support further studies on the seed germination mechanism and genome of M. pasquieri, and better protection of this endangered species.

FlsnRNA-seq: protoplasting-free full-length single-nucleus RNA profiling in plants

Genome Biology ◽

10.1186/s13059-021-02288-0 ◽

2021 ◽

Vol 22 (1) ◽

Cited By ~ 2

Author(s):

Yanping Long ◽

Zhijian Liu ◽

Jinbu Jia ◽

Weipeng Mo ◽

Liang Fang ◽

...

Keyword(s):

Single Cell ◽

Cell Walls ◽

Large Scale ◽

Full Length ◽

Cell Level ◽

Root Cells ◽

Rna Profiling ◽

Different Types ◽

Long Read ◽

Single Nucleus

The broad application of single-cell RNA profiling in plants has been hindered by the prerequisite of protoplasting that requires digesting the cell walls from different types of plant tissues. Here, we present a protoplasting-free approach, flsnRNA-seq, for large-scale full-length RNA profiling at a single-nucleus level in plants using isolated nuclei. Combined with 10x Genomics and Nanopore long-read sequencing, we validate the robustness of this approach in Arabidopsis root cells and the developing endosperm. Sequencing results demonstrate that it allows for uncovering alternative splicing and polyadenylation-related RNA isoform information at the single-cell level, which facilitates characterizing cell identities.

Single-cell RNA counting at allele- and isoform-resolution using Smart-seq3

10.1101/817924 ◽

2019 ◽

Cited By ~ 6

Author(s):

Michael Hagemann-Jensen ◽

Christoph Ziegenhain ◽

Ping Chen ◽

Daniel Ramsköld ◽

Gert-Jan Hendriks ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Cell Types ◽

Mouse Strains ◽

Rna Molecules ◽

Counting Strategy ◽

Long Read ◽

Sequencing Strategy ◽

Transcriptome Coverage ◽

Scale Characterization

Large-scale sequencing of RNAs from individual cells can reveal patterns of gene, isoform and allelic expression across cell types and states [1]. However, current single-cell RNA-sequencing (scRNA-seq) methods have limited ability to count RNAs at allele- and isoform resolution, and long-read sequencing techniques lack the depth required for large-scale applications across cells [2,3]. Here, we introduce Smart-seq3 that combines full-length transcriptome coverage with a 5’ unique molecular identifier (UMI) RNA counting strategy that enabled in silico reconstruction of thousands of RNA molecules per cell. Importantly, a large portion of counted and reconstructed RNA molecules could be directly assigned to specific isoforms and allelic origin, and we identified significant transcript isoform regulation in mouse strains and human cell types. Moreover, Smart-seq3 showed a dramatic increase in sensitivity and typically detected thousands more genes per cell than Smart-seq2. Altogether, we developed a short-read sequencing strategy for single-cell RNA counting at isoform and allele-resolution applicable to large-scale characterization of cell types and states across tissues and organisms.
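
The general idea behind UMI-based RNA counting can be illustrated with a minimal sketch: reads that share the same cell, gene, and UMI are treated as copies of one original molecule, so the number of distinct UMIs approximates the number of molecules. This is not the Smart-seq3 pipeline. The function name, the tuple layout, and the toy reads are assumptions made for illustration, and the sketch ignores UMI sequencing errors, isoform assignment, and allelic assignment.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (cell, gene) pair.

    `reads` is an iterable of (cell_id, gene_id, umi) tuples; this layout is a
    hypothetical simplification of real aligned, barcode-tagged reads.
    """
    umis_seen = defaultdict(set)
    for cell_id, gene_id, umi in reads:
        # Reads with an identical UMI in the same cell and gene collapse to one entry.
        umis_seen[(cell_id, gene_id)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


# Toy data (invented for illustration only).
example_reads = [
    ("cell1", "GeneA", "AACGT"),
    ("cell1", "GeneA", "AACGT"),  # amplification duplicate of the read above
    ("cell1", "GeneA", "TTGCA"),
    ("cell1", "GeneB", "AACGT"),
    ("cell2", "GeneA", "AACGT"),
]

molecule_counts = count_molecules(example_reads)
for (cell_id, gene_id), n in sorted(molecule_counts.items()):
    print(cell_id, gene_id, n)
```

Counting distinct UMIs rather than reads is what makes the estimate robust to uneven amplification, which is the property the abstract relies on when it speaks of counting RNA molecules.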

Single-Molecule Long-Read Sequencing Reveals the Diversity of Full-Length Transcripts in Leaves of Gnetum (Gnetales)

International Journal of Molecular Sciences ◽

10.3390/ijms20246350 ◽

2019 ◽

Vol 20 (24) ◽

pp. 6350 ◽

Cited By ~ 2

Author(s):

Nan Deng ◽

Chen Hou ◽

Fengfeng Ma ◽

Caixia Liu ◽

Yuxin Tian

Keyword(s):

Single Molecule ◽

Developmental Stages ◽

Alternative Polyadenylation ◽

Full Length ◽

Stomatal Development ◽

Rna Seq ◽

Leaf Transcriptome ◽

Long Read ◽

Non Coding Rnas ◽

A Site

The limitations of RNA sequencing make it difficult to accurately predict alternative splicing (AS) and alternative polyadenylation (APA) events and long non-coding RNAs (lncRNAs), all of which reveal transcriptomic diversity and the complexity of gene regulation. Gnetum, a genus with ambiguous phylogenetic placement in seed plants, has a distinct stomatal structure and photosynthetic characteristics. In this study, a full-length transcriptome of Gnetum luofuense leaves at different developmental stages was sequenced with the latest PacBio Sequel platform. After correction by short reads generated by Illumina RNA-Seq, 80,496 full-length transcripts were obtained, of which 5269 reads were identified as isoforms of novel genes. Additionally, 1660 lncRNAs and 12,998 AS events were detected. In total, 5647 genes in the G. luofuense leaves had APA featured by at least one poly(A) site. Moreover, 67 and 30 genes from the bHLH gene family, which play an important role in stomatal development and photosynthesis, were identified from the G. luofuense genome and leaf transcripts, respectively. This leaf transcriptome supplements the reference genome of G. luofuense, and the AS events and lncRNAs detected provide valuable resources for future studies investigating the low photosynthetic capacity of Gnetum.

Expression Profiling of Mammalian Male Meiosis and Gametogenesis Identifies Novel Candidate Genes for Roles in the Regulation of Fertility

Molecular Biology of the Cell ◽

10.1091/mbc.e03-10-0762 ◽

2004 ◽

Vol 15 (3) ◽

pp. 1031-1043 ◽

Cited By ~ 97

Author(s):

Ulrich Schlecht ◽

Philippe Demougin ◽

Reinhold Koch ◽

Leandro Hermida ◽

Christa Wiederkehr ◽

...

Keyword(s):

Germ Cell ◽

Germ Cells ◽

Expression Profiling ◽

Large Scale ◽

Developmental Stages ◽

Expression Patterns ◽

Control Cell ◽

Cell Types ◽

Male Meiosis ◽

Cell Expression

We report a comprehensive large-scale expression profiling analysis of mammalian male germ cells undergoing mitotic growth, meiosis, and gametogenesis by using high-density oligonucleotide microarrays and highly enriched cell populations. Among 11,955 rat loci investigated, 1268 were identified as differentially transcribed in germ cells at subsequent developmental stages compared with total testis, somatic Sertoli cells as well as brain and skeletal muscle controls. The loci were organized into four expression clusters that correspond to somatic, mitotic, meiotic, and postmeiotic cell types. This work provides information about expression patterns of ∼200 genes known to be important during male germ cell development. Approximately 40 of those are included in a group of 121 transcripts for which we report germ cell expression and lack of transcription in three somatic control cell types. Moreover, we demonstrate the testicular expression and transcriptional induction in mitotic, meiotic, and/or postmeiotic germ cells of 293 as yet uncharacterized transcripts, some of which are likely to encode factors involved in spermatogenesis and fertility. This group also contains potential germ cell-specific targets for innovative contraceptives. A graphical display of the data is conveniently accessible through the GermOnline database at http://www.germonline.org.

Altered cell and RNA isoform diversity in aging Down syndrome brains

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2114326118 ◽

2021 ◽

Vol 118 (47) ◽

pp. e2114326118

Author(s):

Carter R. Palmer ◽

Christine S. Liu ◽

William J. Romanow ◽

Ming-Hsiang Lee ◽

Jerold Chun

Keyword(s):

Down Syndrome ◽

Large Scale ◽

Cell Types ◽

Chromosome 21 ◽

Specific Cell ◽

Sequencing Technologies ◽

Isoform Diversity ◽

Long Read ◽

Single Nucleus ◽

Altered Cell

Down syndrome (DS), trisomy of human chromosome 21 (HSA21), is characterized by lifelong cognitive impairments and the development of the neuropathological hallmarks of Alzheimer’s disease (AD). The cellular and molecular modifications responsible for these effects are not understood. Here we performed single-nucleus RNA sequencing (snRNA-seq) employing both short- (Illumina) and long-read (Pacific Biosciences) sequencing technologies on a total of 29 DS and non-DS control prefrontal cortex samples. In DS, the ratio of inhibitory-to-excitatory neurons was significantly increased, which was not observed in previous reports examining sporadic AD. DS microglial transcriptomes displayed AD-related aging and activation signatures in advance of AD neuropathology, with increased microglial expression of C1q complement genes (associated with dendritic pruning) and the HSA21 transcription factor gene RUNX1. Long-read sequencing detected vast RNA isoform diversity within and among specific cell types, including numerous sequences that differed between DS and control brains. Notably, over 8,000 genes produced RNAs containing intra-exonic junctions, including amyloid precursor protein (APP) that had previously been associated with somatic gene recombination. These and related results illuminate large-scale cellular and transcriptomic alterations as features of the aging DS brain.

ISOdb: A Comprehensive Database of Full-Length Isoforms Generated by Iso-Seq

International Journal of Genomics ◽

10.1155/2018/9207637 ◽

2018 ◽

Vol 2018 ◽

pp. 1-6 ◽

Cited By ~ 1

Author(s):

Shang-Qian Xie ◽

Yue Han ◽

Xiao-Zhou Chen ◽

Tai-Yu Cao ◽

Kai-Kai Ji ◽

...

Keyword(s):

Single Molecule ◽

Full Length ◽

Public Access ◽

Transcript Isoforms ◽

Sequencing Technologies ◽

Long Reads ◽

Depth Analysis ◽

Gene Level ◽

Long Read ◽

Full Length Transcript

The accurate landscape of transcript isoforms plays an important role in the understanding of gene function and gene regulation. However, building complete transcripts is very challenging for short reads generated using next-generation sequencing. Fortunately, isoform sequencing (Iso-Seq) using single-molecule sequencing technologies, such as PacBio SMRT, provides long reads spanning entire transcript isoforms which do not require assembly. Therefore, we have developed ISOdb, a comprehensive resource database for hosting and carrying out an in-depth analysis of Iso-Seq datasets and visualising the full-length transcript isoforms. The current version of ISOdb has collected 93 publicly available Iso-Seq samples from eight species and presents the samples in two levels: (1) sample level, including metainformation, long read distribution, isoform numbers, and alternative splicing (AS) events of each sample; (2) gene level, including the total isoforms, novel isoform number, novel AS number, and isoform visualisation of each gene. In addition, ISOdb provides a user interface in the website for uploading sample information to facilitate the collection and analysis of researchers’ datasets. Currently, ISOdb is the first repository that offers comprehensive resources and convenient public access for hosting, analysing, and visualising Iso-Seq data, which is freely available.

Single-nucleus full-length RNA profiling in plants incorporates isoform information to facilitate cell type identification

10.1101/2020.11.25.397919 ◽

2020 ◽

Author(s):

Yanping Long ◽

Zhijian Liu ◽

Jinbu Jia ◽

Weipeng Mo ◽

Liang Fang ◽

...

Keyword(s):

Single Cell ◽

Single Molecule ◽

Large Scale ◽

Alternative Polyadenylation ◽

Full Length ◽

Cell Type ◽

Arabidopsis Root ◽

Rna Profiling ◽

Long Read ◽

Single Nucleus

The broad application of large-scale single-cell RNA profiling in plants has been restricted by the prerequisite of protoplasting. We recently found that the Arabidopsis nucleus contains abundant polyadenylated mRNAs, many of which are incompletely spliced. To capture the isoform information, we combined 10x Genomics and Nanopore long-read sequencing to develop a protoplasting-free full-length single-nucleus RNA profiling method in plants. Using Arabidopsis root, our results demonstrated that nuclear mRNAs faithfully retain cell identity information, and that single-molecule full-length RNA sequencing could further improve cell type identification by revealing splicing status and alternative polyadenylation at the single-cell level.

Long-read sequencing of Chrysanthemum morifolium cv. ‘Hangju’ transcriptome reveals flavonoid biosynthesis and regulation

10.21203/rs.2.19942/v1 ◽

2020 ◽

Author(s):

Tao Wang ◽

Feng Yang ◽

Qiaosheng Guo ◽

Qingjun Zou ◽

Wenyan Zhang ◽

...

Keyword(s):

Single Molecule ◽

Developmental Stages ◽

Chrysanthemum Morifolium ◽

Flavonoid Biosynthesis ◽

Full Length ◽

Bioactive Components ◽

Smrt Sequencing ◽

Major Genes ◽

Long Read ◽

Gene Expression Levels

Background: The inflorescence of Chrysanthemum morifolium cv. ‘Hangju’ has been widely used in China due to its antioxidant and anti-inflammatory properties. The biosynthesis and regulation of flavonoids, a group of bioactive components, in C. morifolium are poorly understood. Transcriptome sequencing is an effective method for obtaining transcript information. Therefore, single-molecule real-time (SMRT) sequencing was performed to obtain the full-length genes involved in flavonoid biosynthesis and regulation in C. morifolium. Results: High-quality RNA was extracted from the inflorescence of C. morifolium at different developmental stages and used to construct two libraries (0-5 kb and 4.5-10 kb) for sequencing. Finally, 125,532 non-redundant isoforms with a mean length of 2,009 bp were obtained. Of these, 2,083 transcripts were annotated to pathways related to flavonoid biosynthesis, and 56 isoforms were annotated as CHS, CHI, F3H, F3’H, FNS Ⅱ, FLS, DFR and ANS genes. Based on gene expression levels at different stages, we predicted the major genes involved in flavonoid biosynthesis. By phylogenetic analysis, we found two candidate MYB transcription factors (CmMYBF1 and CmMYBF2) activating flavonol biosynthesis. Conclusions: Based on the full-length transcriptomic data and further quantitative analysis, the major genes involved in flavonoid biosynthesis and regulation in C. morifolium were predicted in our study. The results provide a valuable theoretical basis for the introduction and cultivation of C. morifolium cv. ‘Hangju’.

Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly multiplexed full-length single-cell cDNA

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1806447115 ◽

2018 ◽

Vol 115 (39) ◽

pp. 9726-9731 ◽

Cited By ~ 65

Author(s):

Roger Volden ◽

Theron Palmer ◽

Ashley Byrne ◽

Charles Cole ◽

Robert J. Schmitz ◽

...

Keyword(s):

Single Cell ◽

Full Length ◽

Long Distance ◽

Distance Information ◽

Short Read ◽

Transcript Isoforms ◽

Short Read Sequencing ◽

Sequencing Method ◽

Long Read ◽

Rna Transcript

High-throughput short-read sequencing has revolutionized how transcriptomes are quantified and annotated. However, while Illumina short-read sequencers can be used to analyze entire transcriptomes down to the level of individual splicing events with great accuracy, they fall short of analyzing how these individual events are combined into complete RNA transcript isoforms. Because of this shortfall, long-distance information is required to complement short-read sequencing to analyze transcriptomes on the level of full-length RNA transcript isoforms. While long-read sequencing technology can provide this long-distance information, there are issues with both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) long-read sequencing technologies that prevent their widespread adoption. Briefly, PacBio sequencers produce low numbers of reads with high accuracy, while ONT sequencers produce higher numbers of reads with lower accuracy. Here, we introduce and validate a long-read ONT-based sequencing method. At the same cost, our Rolling Circle Amplification to Concatemeric Consensus (R2C2) method generates more accurate reads of full-length RNA transcript isoforms than any other available long-read sequencing method. These reads can then be used to generate isoform-level transcriptomes for both genome annotation and differential expression analysis in bulk or single-cell samples.
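
The intuition behind a concatemeric consensus is that one molecule is read several times in a row, so random errors in individual copies can be outvoted. The sketch below shows only that voting step and is not the R2C2 implementation: it assumes the repeats have already been split out and aligned to equal length with substitutions only, whereas real nanopore reads contain insertions and deletions and need alignment-based consensus. The function name and the example sequences are hypothetical.

```python
from collections import Counter


def column_consensus(repeats):
    """Majority-vote consensus over equal-length copies of the same sequence.

    Assumes the copies are already aligned position by position, which is a
    strong simplification compared with real concatemeric reads.
    """
    if not repeats:
        raise ValueError("at least one repeat is required")
    length = len(repeats[0])
    if any(len(r) != length for r in repeats):
        raise ValueError("all repeats must have the same length")

    consensus = []
    for column in zip(*repeats):
        # Pick the most frequent base at this position.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Three noisy copies of a toy sequence (invented for illustration);
# each of the last two carries one substitution error.
noisy_repeats = ["ACGTACGT", "ACGTTCGT", "ACCTACGT"]
consensus_sequence = column_consensus(noisy_repeats)
print(consensus_sequence)
```

With three copies, any single error at a given position is outvoted by the other two, which is why accuracy improves as more repeats of the same molecule are captured in one read.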
