NAD tagSeq reveals that NAD+-capped RNAs are mostly produced from a large number of protein-coding genes in Arabidopsis

The 5′ end of a eukaryotic mRNA transcript generally has a 7-methylguanosine (m7G) cap that protects mRNA from degradation and mediates almost all other aspects of gene expression. Some RNAs in Escherichia coli, yeast, and mammals were recently found to contain an NAD+ cap. Here, we report the development of the method NAD tagSeq for transcriptome-wide identification and quantification of NAD+-capped RNAs (NAD-RNAs). The method uses an enzymatic reaction and then a click chemistry reaction to label NAD-RNAs with a synthetic RNA tag. The tagged RNA molecules can be enriched and directly sequenced using the Oxford Nanopore sequencing technology. NAD tagSeq can allow more accurate identification and quantification of NAD-RNAs, as well as reveal the sequences of whole NAD-RNA transcripts using single-molecule RNA sequencing. Using NAD tagSeq, we found that NAD-RNAs in Arabidopsis were produced by at least several thousand genes, most of which are protein-coding genes, with the majority of these transcripts coming from <200 genes. For some Arabidopsis genes, over 5% of their transcripts were NAD capped. Gene ontology terms overrepresented in the 2,000 genes that produced the highest numbers of NAD-RNAs are related to photosynthesis, protein synthesis, and responses to cytokinin and stresses. The NAD-RNAs in Arabidopsis generally have the same overall sequence structures as the canonical m7G-capped mRNAs, although most of them appear to have a shorter 5′ untranslated region (5′ UTR). The identification and quantification of NAD-RNAs and revelation of their sequence features can provide essential steps toward understanding the functions of NAD-RNAs.
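Two of the quantitative claims above are the per-gene fraction of NAD-capped transcripts (over 5% for some genes) and the concentration of NAD-RNAs in a small set of genes. The Python sketch below shows one way to compute both from per-gene read counts. It is only an illustration: the gene names, counts and function names are hypothetical, and it is not the NAD tagSeq analysis pipeline.

```python
from typing import Dict, List, Tuple


def nad_fraction(tagged: Dict[str, int], total: Dict[str, int]) -> Dict[str, float]:
    """Fraction of each gene's reads that carry the NAD tag (hypothetical counts)."""
    return {gene: tagged.get(gene, 0) / n for gene, n in total.items() if n > 0}


def genes_covering(tagged: Dict[str, int], share: float = 0.5) -> List[str]:
    """Smallest set of genes whose tagged reads reach `share` of all tagged reads.

    Taking genes in descending order of tagged-read count gives the minimal set.
    """
    ranked: List[Tuple[str, int]] = sorted(tagged.items(), key=lambda kv: kv[1], reverse=True)
    total_tagged = sum(tagged.values())
    covered = 0
    chosen: List[str] = []
    for gene, count in ranked:
        if covered >= share * total_tagged:
            break
        chosen.append(gene)
        covered += count
    return chosen
```

With real data, `tagged` would hold reads that carry the synthetic RNA tag and `total` would hold all reads per gene, both taken from the same library.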


CNIT: a fast and accurate web tool for identifying protein-coding and long non-coding transcripts based on intrinsic sequence composition

Nucleic Acids Research ◽

10.1093/nar/gkz400 ◽

2019 ◽

Vol 47 (W1) ◽

pp. W516-W522 ◽

Cited By ~ 18

Author(s):

Jin-Cheng Guo ◽

Shuang-Sang Fang ◽

Yang Wu ◽

Jian-Hua Zhang ◽

Yang Chen ◽

...

Keyword(s):

Animal Species ◽

Fruit Fly ◽

Web Tool ◽

Accurate Identification ◽

Protein Coding ◽

Sequence Composition ◽

Rna Transcripts ◽

Almost All ◽

Coding Potential ◽

Generation Sequencing

Abstract Although more and more high-throughput data have been produced by next-generation sequencing, classifying RNA transcripts as protein-coding or non-coding remains a challenge, especially for poorly annotated species. We upgraded our original coding potential calculator, CNCI (Coding-Non-Coding Index), to CNIT (Coding-Non-Coding Identifying Tool), which evaluates the coding ability of RNA transcripts faster and more accurately. CNIT runs ∼200 times faster than CNCI and is more accurate (0.98 versus 0.94 for human, 0.95 versus 0.93 for mouse, 0.93 versus 0.92 for zebrafish, 0.93 versus 0.92 for fruit fly, 0.92 versus 0.88 for worm, and 0.98 versus 0.85 for Arabidopsis transcripts). Moreover, the AUC values for 11 animal species and 27 plant species showed that CNIT gives relatively accurate identification results for almost all eukaryotic transcripts. In addition, a mobile-friendly web server is freely available at http://cnit.noncode.org/CNIT.
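The AUC values above measure how well a coding-potential score separates coding from non-coding transcripts. The sketch below computes AUC with the rank-based (Mann-Whitney) formulation: it is the probability that a randomly chosen coding transcript scores higher than a randomly chosen non-coding one. The scores are hypothetical, and this is not CNIT's scoring model.

```python
from typing import Sequence


def roc_auc(coding_scores: Sequence[float], noncoding_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half).

    Equals the probability that a random coding transcript outscores a
    random non-coding transcript.
    """
    if not coding_scores or not noncoding_scores:
        raise ValueError("need at least one transcript in each class")
    wins = 0.0
    for c in coding_scores:
        for n in noncoding_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(coding_scores) * len(noncoding_scores))
```

The pairwise loop is quadratic, which is fine for illustration. For transcriptome-scale sets, a sort-based rank computation gives the same value more efficiently.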


Chromosomal assembly of the nuclear genome of the endosymbiont-bearing trypanosomatid Angomonas deanei

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkaa018 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

John W Davey ◽

Carolina M C Catta-Preta ◽

Sally James ◽

Sarah Forrester ◽

Maria Cristina M Motta ◽

...

Keyword(s):

Chromosome Number ◽

Noncoding Rnas ◽

Nuclear Genome ◽

Supernumerary Chromosome ◽

Ribosomal Rnas ◽

Protein Coding ◽

Transfer Rnas ◽

Protein Coding Genes ◽

Oxford Nanopore ◽

Genome Assemblies

Abstract Angomonas deanei is an endosymbiont-bearing trypanosomatid with several highly fragmented genome assemblies and an unknown chromosome number. We present an assembly of the A. deanei nuclear genome based on Oxford Nanopore sequence that resolves into 29 complete or close-to-complete chromosomes. The assembly reveals several previously unknown features: a supernumerary chromosome, a chromosome with a 340-kb inversion, and a translocation between two chromosomes. We also present an updated annotation of the chromosomal genome with 10,365 protein-coding genes, 59 transfer RNAs, 26 ribosomal RNAs, and 62 noncoding RNAs.


Dual Isoform Sequencing Reveals a Multifaceted Transcriptional Architecture of a Prototype Baculovirus

10.21203/rs.3.rs-637036/v1 ◽

2021 ◽

Author(s):

Gábor Torma ◽

Dóra Tombácz ◽

Norbert Moldován ◽

Ádám Fülöp ◽

István Prazsák ◽

...

Keyword(s):

Protein Coding ◽

Rna Molecules ◽

Non Coding Rna ◽

Oxford Nanopore ◽

The Pacific ◽

Viral Genes ◽

Long Read ◽

Oxford Nanopore Technologies ◽

Overlapping Transcripts

Abstract In this study, we used two long-read sequencing (LRS) techniques, Sequel from Pacific Biosciences and MinION from Oxford Nanopore Technologies, for the transcriptional characterization of a prototype baculovirus, Autographa californica multiple nucleopolyhedrovirus. LRS can read full-length RNA molecules and can thereby distinguish between transcript isoforms, mono- and polycistronic RNAs, and overlapping transcripts. Altogether, we detected 875 transcripts, of which 759 are novel and 116 have been annotated previously. These RNA molecules include 41 novel putative protein-coding transcripts (each containing 5’-truncated in-frame ORFs), 14 monocistronic transcripts, 99 multicistronic RNAs, 101 non-coding RNAs, and 504 length isoforms. We also detected RNA methylation in 12 viral genes and RNA hyper-editing in the longer 5’-UTR transcript isoform of the ORF 19 gene.
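Because a long read spans the whole RNA molecule, it can tell mono- and polycistronic transcripts apart. The sketch below does this by counting how many annotated ORFs a read fully contains. It makes simplifying assumptions: reads and ORFs are half-open intervals on the same strand with hypothetical coordinates, and strand, splicing and truncated ORFs are ignored. It is an illustration, not the authors' pipeline.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single strand (assumption)


def contained_orfs(read: Interval, orfs: List[Interval]) -> List[Interval]:
    """Return the ORFs that lie entirely within the read."""
    start, end = read
    return [orf for orf in orfs if start <= orf[0] and orf[1] <= end]


def classify_read(read: Interval, orfs: List[Interval]) -> str:
    """Label a full-length read by the number of complete ORFs it carries."""
    n = len(contained_orfs(read, orfs))
    if n == 0:
        return "no complete ORF"
    return "monocistronic" if n == 1 else "polycistronic"
```

A read that covers only part of an ORF falls into "no complete ORF" here. A fuller treatment would need separate rules for 5’-truncated in-frame ORFs such as those reported above.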


Complete Genome Sequence of Citricoccus sp. Strain SGAir0253, Isolated from Indoor Air in Singapore

Microbiology Resource Announcements ◽

10.1128/mra.00606-19 ◽

2019 ◽

Vol 8 (37) ◽

Author(s):

Lakshmi Chandrasekaran ◽

Daniela I. Drautz-Moses ◽

Akira Uchida ◽

Rikky W. Purbojati ◽

Anthony Wong ◽

...

Keyword(s):

Real Time ◽

Genome Sequence ◽

Single Molecule ◽

Indoor Air ◽

Complete Genome Sequence ◽

Complete Genome ◽

Protein Coding ◽

Content Type ◽

Protein Coding Genes

Citricoccus sp. strain SGAir0253 was isolated from indoor air collected in Singapore. Its genome sequence was assembled using single-molecule real-time sequencing. It comprises one chromosome of 3.32 Mb and two plasmids of 137 kb and 99 kb. The genome consists of 2,950 protein-coding genes, 49 tRNAs, and 9 rRNAs.


Whole-Genome Sequence of Bacillus megaterium Strain SGAir0080, Isolated from an Indoor Air Sample

Microbiology Resource Announcements ◽

10.1128/mra.01249-19 ◽

2019 ◽

Vol 8 (50) ◽

Author(s):

Namrata Kalsi ◽

Akira Uchida ◽

Rikky W. Purbojati ◽

James N. I. Houghton ◽

Caroline Chénard ◽

...

Keyword(s):

Single Molecule ◽

Bacillus Megaterium ◽

Indoor Air ◽

Average Length ◽

Whole Genome Sequence ◽

Smrt Sequencing ◽

Protein Coding ◽

Content Type ◽

Protein Coding Genes ◽

Air Sample

Bacillus megaterium strain SGAir0080 was isolated from a tropical air sample in Singapore. Its genome was assembled using single-molecule real-time (SMRT) sequencing and MiSeq reads. It has one chromosome of 5.06 Mbp and seven plasmids (average length, 62.8 kbp). It possesses 5,339 protein-coding genes, 130 tRNAs, and 35 rRNAs.


De Novo Whole-Genome Sequencing of the Wood Rot Fungus Polyporus brumalis, Which Exhibits Potential Terpenoid Metabolism

Genome Announcements ◽

10.1128/genomea.00586-17 ◽

2017 ◽

Vol 5 (28) ◽

Author(s):

Su-Yeon Lee ◽

Ji-eun An ◽

Sun-Hwa Ryu ◽

Myungkil Kim

Keyword(s):

Single Molecule ◽

De Novo ◽

Gene Annotation ◽

Draft Genome ◽

Fungal Growth ◽

Protein Coding ◽

Sequencing Platform ◽

Protein Coding Genes ◽

Polyporus Brumalis ◽

Terpenoid Metabolism

ABSTRACT Polyporus brumalis is able to synthesize several sesquiterpenes during fungal growth. Using a single-molecule real-time sequencing platform, we present the 53-Mb draft genome of P. brumalis, which contains 6,231 protein-coding genes. The gene annotation and isolation provide genetic information that can improve understanding of sesquiterpene metabolism in P. brumalis.


Complete Genome Sequence of Brevundimonas sp. Strain SGAir0440, Isolated from Indoor Air in Singapore

Microbiology Resource Announcements ◽

10.1128/mra.00594-19 ◽

2019 ◽

Vol 8 (31) ◽

Author(s):

Rikky W. Purbojati ◽

Daniela I. Drautz-Moses ◽

Akira Uchida ◽

Anthony Wong ◽

Megan E. Clare ◽

...

Keyword(s):

Single Molecule ◽

Indoor Air ◽

Complete Genome Sequence ◽

Complete Genome ◽

Sequencing Data ◽

Circular Chromosome ◽

Protein Coding ◽

Content Type ◽

Air Samples ◽

Protein Coding Genes

Brevundimonas sp. strain SGAir0440 was isolated from indoor air samples collected in Singapore. Its genome was assembled using single-molecule real-time sequencing data, resulting in one circular chromosome with a length of 3.1 Mbp. The genome consists of 3,033 protein-coding genes, 48 tRNAs, and 6 rRNA operons.


Multiple Long-read Sequencing Survey of Herpes Simplex Virus Lytic Transcriptome

10.1101/605956 ◽

2019 ◽

Author(s):

Dóra Tombácz ◽

Zsolt Balázs ◽

Gábor Gulyás ◽

Zsolt Csabai ◽

Miklós Boldogkői ◽

...

Keyword(s):

Herpes Simplex Virus ◽

Herpes Simplex ◽

Single Molecule ◽

Rna Molecules ◽

Oxford Nanopore ◽

Dna Strands ◽

Long Read ◽

Simplex Virus ◽

Transcriptional Start Sites ◽

Hsv 1

ABSTRACT Long-read sequencing (LRS) has become increasingly important in RNA research due to its strength in resolving complex transcriptomic architectures. Currently, two LRS platforms have demonstrated adequate performance: Single Molecule Real-Time sequencing by Pacific Biosciences (PacBio) and nanopore sequencing by Oxford Nanopore Technologies (ONT). Even though these techniques produce lower coverage and are more error-prone than short-read sequencing, they remain more successful at identifying transcript isoforms, including polycistronic and multi-spliced RNA molecules, as well as transcript overlaps. Recent reports have successfully applied LRS to investigate the transcriptomes of viruses belonging to various families, substantially increasing the number of known viral RNA molecules. In this work, we used the Sequel and MinION techniques from PacBio and ONT, respectively, to characterize the lytic transcriptome of herpes simplex virus type 1 (HSV-1). In most samples, we analyzed the poly(A) fraction of the transcriptome, but we also performed random oligonucleotide-based sequencing. Besides cDNA sequencing, we also carried out native RNA sequencing. Our investigations identified more than 160 previously undetected transcripts, including coding and non-coding RNAs, multi-spliced transcripts, and polycistronic and complex transcripts. Furthermore, we determined previously unsubstantiated transcriptional start sites, polyadenylation sites, and splice sites. A large number of novel transcriptional overlaps were also detected. Random-primed sequencing revealed that each convergent gene pair produces non-polyadenylated read-through RNAs overlapping the partner genes. In addition, we identified novel replication-associated transcripts overlapping the HSV-1 replication origins, and novel LAT variants with very long 5’ regions that are co-terminal with the LAT-0.7kb transcript.
Overall, our results demonstrated that HSV-1 transcripts form an extremely complex pattern of overlaps and that the entire viral genome is transcriptionally active. In most, if not all, viral genes, both DNA strands are expressed.
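The overlap patterns described above, such as read-through RNAs from convergent gene pairs, reduce to interval arithmetic on stranded transcripts. The sketch below classifies how two transcripts overlap. The coordinates and the category names ("parallel", "convergent", "divergent", "embedded") are illustrative assumptions. It is not the authors' analysis code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transcript:
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlap_length(a: Transcript, b: Transcript) -> int:
    """Number of bases shared by the two intervals (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def overlap_type(a: Transcript, b: Transcript) -> Optional[str]:
    """Orientation of two overlapping transcripts, or None if they do not overlap."""
    if overlap_length(a, b) == 0:
        return None
    if (a.start <= b.start and b.end <= a.end) or (b.start <= a.start and a.end <= b.end):
        return "embedded"
    if a.strand == b.strand:
        return "parallel"
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # Convergent (tail-to-tail): the "+" transcript lies to the left, so the 3' ends meet.
    return "convergent" if plus.start < minus.start else "divergent"
```

The containment check runs first because a fully embedded transcript has no well-defined head-to-head or tail-to-tail orientation relative to its host.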


Parallel and scalable workflow for the analysis of Oxford Nanopore direct RNA sequencing datasets

10.1101/818336 ◽

2019 ◽

Author(s):

Luca Cozzuto ◽

Huanle Liu ◽

Leszek P. Pryszcz ◽

Toni Hermoso Pulido ◽

Julia Ponomarenko ◽

...

Keyword(s):

Rna Sequencing ◽

Single Molecule ◽

Tail Length ◽

Rna Modification ◽

Sequencing Data ◽

Polya Tail ◽

Sequencing Platform ◽

Rna Molecules ◽

Oxford Nanopore ◽

Quality Filtering

ABSTRACT The direct RNA sequencing platform offered by Oxford Nanopore Technologies allows direct measurement of RNA molecules without the need for conversion to complementary DNA, fragmentation or amplification. As such, it is in principle capable of detecting any given RNA modification present in the molecule being sequenced, as well as providing polyA tail length estimates for individual RNA molecules. Although this technology has been publicly available since 2017, the complexity of the raw Nanopore data, together with the lack of systematic and reproducible pipelines, has greatly hindered its adoption by the general user. Here we address this problem by providing a fully benchmarked workflow for the analysis of direct RNA sequencing reads, termed MasterOfPores. The pipeline converts raw current intensities into multiple types of processed data, providing run-quality metrics, quality filtering, base calling and mapping. The output of the pipeline can in turn be used to compute per-gene counts, detect RNA modifications, and predict polyA tail lengths and RNA isoforms. The software is written using the Nextflow framework for parallelization and portability, and relies on Linux containers such as Docker and Singularity for better reproducibility. The MasterOfPores workflow can be executed on any Unix-compatible OS on a computer, cluster or cloud without installing any additional software or dependencies, and is freely available on GitHub (https://github.com/biocorecrg/master_of_pores). This workflow will significantly simplify the analysis of nanopore direct RNA sequencing data by non-bioinformatics experts, thus boosting the understanding of the (epi)transcriptome with single-molecule resolution.


Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing

10.1101/310458 ◽

2018 ◽

Cited By ~ 9

Author(s):

Ruibang Luo ◽

Fritz J. Sedlazeck ◽

Tak-Wah Lam ◽

Michael C. Schatz

Keyword(s):

Neural Network ◽

Single Molecule ◽

Variant Calling ◽

Accurate Identification ◽

Whole Genome Analysis ◽

Single Molecule Sequencing ◽

Oxford Nanopore ◽

Indel Length ◽

Human Sample ◽

Dna Sequence Variants

Abstract The accurate identification of DNA sequence variants is an important but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5%-15%. To meet this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieved 99.73%, 97.68% and 95.36% precision on known variants, and 98.65%, 92.57% and 87.26% F1-scores for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows that Clairvoyante is sample-agnostic, and it finds variants in less than two hours on a standard server. Furthermore, we identified 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize and visualize the model.
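The results above are reported as precision and F1-score. The sketch below shows how these metrics relate, computing them from counts of true positives, false positives and false negatives against a truth set. The counts are hypothetical, and this is not Clairvoyante's evaluation code.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 (their harmonic mean) from variant-call counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low. A caller can therefore post high precision on known variants and still have a lower F1 in whole-genome analysis if many true variants are missed.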
