High-throughput annotation of full-length long noncoding RNAs with Capture Long-Read Sequencing

bioRxiv ◽

10.1101/105064 ◽

2017 ◽

Cited By ~ 4

Author(s):

Julien Lagarde ◽

Barbara Uszczynska-Ratajczak ◽

Silvia Carbonell ◽

Sílvia Pérez-Lluch ◽

Amaya Abad ◽

...

Keyword(s):

High Throughput ◽

Single Molecule ◽

Noncoding RNAs ◽

Splice Junction ◽

Long Noncoding RNAs ◽

Full Length ◽

Novel Transcript ◽

Mouse Tissues ◽

Long Read ◽

Full Length Transcript

Accurate annotation of genes and their transcripts is a foundation of genomics, but no annotation technique presently combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, while thousands more remain uncatalogued, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), combining targeted RNA capture with third-generation long-read sequencing. We present an experimental re-annotation of the GENCODE intergenic lncRNA population in matched human and mouse tissues, resulting in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubles the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enable us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus CLS removes a longstanding bottleneck of transcriptome annotation, generating manual-quality full-length transcript models at high-throughput scales.

Abbreviations: bp, base pair; FL, full length; nt, nucleotide; ROI, read of insert, i.e. PacBio read; SJ, splice junction; SMRT, single-molecule real-time; TM, transcript model.

Faculty Opinions recommendation of High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.732079252.793540264 ◽

2017 ◽

Author(s):

Hirotomo Saitsu

Keyword(s):

High Throughput ◽

Noncoding RNAs ◽

Long Noncoding RNAs ◽

Full Length ◽

Long Read

High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing

Nature Genetics ◽

10.1038/ng.3988 ◽

2017 ◽

Vol 49 (12) ◽

pp. 1731-1740 ◽

Cited By ~ 109

Author(s):

Julien Lagarde ◽

Barbara Uszczynska-Ratajczak ◽

Silvia Carbonell ◽

Sílvia Pérez-Lluch ◽

Amaya Abad ◽

...

Keyword(s):

High Throughput ◽

Noncoding RNAs ◽

Long Noncoding RNAs ◽

Full Length ◽

Long Read

Single-Molecule Real-Time Sequencing of the Madhuca pasquieri (Dubard) Lam. Transcriptome Reveals the Diversity of Full-Length Transcripts

Forests ◽

10.3390/f11080866 ◽

2020 ◽

Vol 11 (8) ◽

pp. 866

Author(s):

Lei Kan ◽

Qicong Liao ◽

Zhiyao Su ◽

Yushan Tan ◽

Shuyu Wang ◽

...

Keyword(s):

Seed Germination ◽

Single Molecule ◽

Developmental Stages ◽

De Novo ◽

Full Length ◽

Wild Plant ◽

Transcript Isoforms ◽

Long Read ◽

Full Length Transcript ◽

Generation Sequencing

Madhuca pasquieri (Dubard) Lam. is a tree on the International Union for Conservation of Nature Red List and a national key protected wild plant (II) of China, known for its seed oil and timber. However, the lack of genomic and transcriptome data for this species hampers study of its reproduction, utilization, and conservation. Here, single-molecule long-read sequencing (PacBio) and next-generation sequencing (Illumina) were combined to obtain the transcriptome from five developmental stages of M. pasquieri. Overall, 25,339 transcript isoforms were detected by PacBio, including 24,492 coding sequences (CDSs), 9,440 simple sequence repeats (SSRs), 149 long non-coding RNAs (lncRNAs), and 182 alternative splicing (AS) events, the majority of which were retained introns (RI). A further 1,058 transcripts were identified as transcription factors (TFs) from 51 TF families. PacBio recovered more full-length transcript isoforms, with longer lengths and higher expression levels, whereas a larger number of transcripts (124,405) was captured by de novo assembly of the Illumina data. Using the Nr, Swissprot, KOG, and KEGG databases, 24,405 of the PacBio transcripts (96.31%) were annotated. Functional annotation revealed a role for the auxin, abscisic acid, gibberellin, and cytokinin metabolic pathways in seed germination and post-germination. These findings support further studies on the seed germination mechanism and genome of M. pasquieri, and better protection of this endangered species.

Annotation of Full-Length Long Noncoding RNAs with Capture Long-Read Sequencing (CLS)

Methods in Molecular Biology - Functional Analysis of Long Non-Coding RNAs ◽

10.1007/978-1-0716-1158-6_9 ◽

2020 ◽

pp. 133-159

Author(s):

Sílvia Carbonell Sala ◽

Barbara Uszczyńska-Ratajczak ◽

Julien Lagarde ◽

Rory Johnson ◽

Roderic Guigó

Keyword(s):

Noncoding RNAs ◽

Long Noncoding RNAs ◽

Full Length ◽

Long Read

ISOdb: A Comprehensive Database of Full-Length Isoforms Generated by Iso-Seq

International Journal of Genomics ◽

10.1155/2018/9207637 ◽

2018 ◽

Vol 2018 ◽

pp. 1-6 ◽

Cited By ~ 1

Author(s):

Shang-Qian Xie ◽

Yue Han ◽

Xiao-Zhou Chen ◽

Tai-Yu Cao ◽

Kai-Kai Ji ◽

...

Keyword(s):

Single Molecule ◽

Full Length ◽

Public Access ◽

Transcript Isoforms ◽

Sequencing Technologies ◽

Long Reads ◽

Depth Analysis ◽

Gene Level ◽

Long Read ◽

Full Length Transcript

An accurate landscape of transcript isoforms plays an important role in the understanding of gene function and gene regulation. However, building complete transcripts from the short reads generated by next-generation sequencing is very challenging. Isoform sequencing (Iso-Seq) using single-molecule sequencing technologies, such as PacBio SMRT, provides long reads that span entire transcript isoforms and do not require assembly. We have therefore developed ISOdb, a comprehensive resource database for hosting and carrying out in-depth analysis of Iso-Seq datasets and visualising the full-length transcript isoforms. The current version of ISOdb has collected 93 publicly available Iso-Seq samples from eight species and presents the samples at two levels: (1) sample level, including metainformation, long read distribution, isoform numbers, and alternative splicing (AS) events of each sample; (2) gene level, including the total isoforms, novel isoform number, novel AS number, and isoform visualisation of each gene. In addition, ISOdb provides a user interface on the website for uploading sample information to facilitate the collection and analysis of researchers’ datasets. Currently, ISOdb is the first freely available repository that offers comprehensive resources and convenient public access for hosting, analysing, and visualising Iso-Seq data.

PacBio Single-Molecule Long-Read Sequencing Reveals Genes Tolerating Manganese Stress in Schima superba Saplings

Frontiers in Genetics ◽

10.3389/fgene.2021.635043 ◽

2021 ◽

Vol 12 ◽

Author(s):

Fiza Liaquat ◽

Muhammad Farooq Hussain Munis ◽

Samiah Arif ◽

Urooj Haroon ◽

Jianxin Shi ◽

...

Keyword(s):

Single Molecule ◽

Gene Annotation ◽

Treated Group ◽

Full Length ◽

Open Reading Frames ◽

Interacting Protein ◽

Schima Superba ◽

Long Read ◽

First Time ◽

Potential Tool

Schima superba (Theaceae) is a subtropical evergreen tree that is used widely for forest firebreaks and gardening. It tolerates salt and typically accumulates elevated amounts of manganese in its leaves. With a large ecological amplitude, this tree species grows quickly, and its substantial biomass gives it great potential for soil remediation. To characterize its mRNA in full, we employed PacBio sequencing technology for the first time to generate the S. superba transcriptome. In this analysis, 511,759 full-length non-chimeric reads were acquired, and 163,834 high-quality full-length reads were obtained. Overall, 93,362 open reading frames were obtained, of which 78,255 were complete. In gene annotation analyses, 91,082, 71,839, 38,914, and 38,376 transcripts were assigned to the Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Genes (COG), Gene Ontology (GO), and Non-Redundant (Nr) databases, respectively. To identify long non-coding RNAs (lncRNAs), we used four computational methods, protein families (Pfam), Coding Potential Calculator (CPC), Coding-Potential Assessment Tool (CPAT), and Coding-Non-Coding Index (CNCI), which yielded 8,551, 9,174, 20,720, and 18,669 lncRNAs, respectively. Moreover, nine genes were randomly selected for expression analysis, which showed that Gene 6 (Na_Ca_ex gene) had the highest expression and that expression of CAX (CAX-interacting protein 4) was higher in the manganese (Mn)-treated group. This work provides a significant number of full-length transcripts and refines the annotation of the reference genome, which will ease advanced genetic analyses of S. superba.

A method for high precision sequencing of near full-length 16S rRNA genes on an Illumina MiSeq

PeerJ ◽

10.7717/peerj.2492 ◽

2016 ◽

Vol 4 ◽

pp. e2492 ◽

Cited By ~ 29

Author(s):

Catherine M. Burke ◽

Aaron E. Darling

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

High Throughput ◽

Single Molecule ◽

Illumina MiSeq ◽

Full Length ◽

16S rRNA Genes ◽

rRNA Genes ◽

rRNA Gene ◽

Bacterial Taxonomy

Background: The bacterial 16S rRNA gene has historically been used in defining bacterial taxonomy and phylogeny. However, there are currently no high-throughput methods to sequence full-length 16S rRNA genes present in a sample with precision. Results: We describe a method for sequencing near full-length 16S rRNA gene amplicons using the high-throughput Illumina MiSeq platform and test it using DNA from human skin swab samples. Proof of principle of the approach is demonstrated, with the generation of 1,604 sequences greater than 1,300 nt from a single Nano MiSeq run, with accuracy estimated to be 100-fold higher than standard Illumina reads. The reads were chimera filtered using information from a single-molecule dual tagging scheme that boosts the signal available for chimera detection. Conclusions: This method could be scaled up to generate many thousands of sequences per MiSeq run and could be applied to other sequencing platforms. This has great potential for populating databases with high-quality, near full-length 16S rRNA gene sequences from under-represented taxa and environments and facilitates analyses of microbial communities at higher resolution.

Characterization of Full-Length Transcriptome Sequences and Splice Variants of Lateolabrax maculatus by Single-Molecule Long-Read Sequencing and Their Involvement in Salinity Regulation

Frontiers in Genetics ◽

10.3389/fgene.2019.01126 ◽

2019 ◽

Vol 10 ◽

Cited By ~ 4

Author(s):

Yuan Tian ◽

Haishen Wen ◽

Xin Qi ◽

Xiaoyan Zhang ◽

Shikai Liu ◽

...

Keyword(s):

Single Molecule ◽

Splice Variants ◽

Full Length ◽

Long Read ◽

Lateolabrax Maculatus ◽

Transcriptome Sequences

The small peptide world in long noncoding RNAs

Briefings in Bioinformatics ◽

10.1093/bib/bby055 ◽

2019 ◽

Vol 20 (5) ◽

pp. 1853-1864 ◽

Cited By ~ 29

Author(s):

Seo-Won Choi ◽

Hyun-Woo Kim ◽

Jin-Wu Nam

Keyword(s):

High Throughput ◽

Functional Significance ◽

Noncoding RNAs ◽

Long Noncoding RNAs ◽

Small Peptides ◽

Small Peptide ◽

Plant Genomes ◽

The Past ◽

Sequencing Technologies ◽

Coding Potential

Long noncoding RNAs (lncRNAs) are a group of transcripts that are longer than 200 nucleotides (nt) without coding potential. Over the past decade, tens of thousands of novel lncRNAs have been annotated in animal and plant genomes because of advanced high-throughput RNA sequencing technologies and with the aid of coding transcript classifiers. Further, a considerable number of reports have revealed the existence of stable, functional small peptides (also known as micropeptides), translated from lncRNAs. In this review, we discuss the methods of lncRNA classification, the investigations regarding their coding potential and the functional significance of the peptides they encode.
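The working definition in this abstract (longer than 200 nt, no coding potential) can be illustrated with a minimal Python sketch. It is not the method of the review or of any classifier it discusses: the 200 nt length cutoff comes from the abstract, while the 100-codon ORF cutoff, the forward-strand-only scan, and the function names `longest_orf_codons` and `is_lncrna_candidate` are assumptions made purely for illustration. Micropeptide-encoding lncRNAs, the subject of the review, are exactly the cases a crude ORF-length rule like this cannot resolve.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

MIN_LENGTH_NT = 200    # length cutoff stated in the abstract
MAX_ORF_CODONS = 100   # assumed cutoff for illustration; not from the source


def longest_orf_codons(seq):
    """Length in codons (start codon included, stop codon excluded) of the
    longest forward-strand ORF that begins with ATG and ends at a stop codon."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        codons = 0
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                best = max(best, codons)
                break
            codons += 1
    return best


def is_lncrna_candidate(seq, min_length=MIN_LENGTH_NT, max_orf_codons=MAX_ORF_CODONS):
    """Crude filter: transcript longer than min_length with no long ORF."""
    return len(seq) > min_length and longest_orf_codons(seq) < max_orf_codons


if __name__ == "__main__":
    noncoding_like = "A" * 250
    coding_like = "ATG" + "GCT" * 120 + "TAA" + "A" * 50
    print(is_lncrna_candidate(noncoding_like))
    print(is_lncrna_candidate(coding_like))
```

In practice, dedicated coding-potential classifiers are used in place of a fixed ORF-length rule; the sketch only makes the two criteria in the definition concrete.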

Single-molecule, full-length transcript sequencing provides insight into the extreme metabolism of the ruby-throated hummingbird Archilochus colubris

GigaScience ◽

10.1093/gigascience/giy009 ◽

2018 ◽

Vol 7 (3) ◽

Cited By ~ 26

Author(s):

Rachael E Workman ◽

Alexander M Myrka ◽

G William Wong ◽

Elizabeth Tseng ◽

Kenneth C Welch ◽

...

Keyword(s):

Single Molecule ◽

Full Length ◽

Full Length Transcript ◽

Insight Into