Nanopore ReCappable Sequencing maps SARS-CoV-2 5' capping sites and provides new insights into the structure of sgRNAs

The SARS-CoV-2 virus has a complex transcriptome characterised by multiple, nested sub genomic RNAs used to express structural and accessory proteins. Long-read sequencing technologies such as nanopore direct RNA sequencing can recover full-length transcripts, greatly simplifying the assembly of structurally complex RNAs. However, these techniques do not detect the 5′ cap, thus preventing reliable identification and quantification of full-length, coding transcript models. Here we used Nanopore ReCappable Sequencing (NRCeq), a new technique that can identify capped full-length RNAs, to assemble a complete annotation of SARS-CoV-2 sgRNAs and annotate the location of capping sites across the viral genome. We obtained robust estimates of sgRNA expression across cell lines and viral isolates and identified novel canonical and non-canonical sgRNAs, including one that uses a previously un-annotated leader-to-body junction site. The data generated in this work constitute a useful resource for the scientific community and provide important insights into the mechanisms that regulate the transcription of SARS-CoV-2 sgRNAs.

Download Full-text

ISOdb: A Comprehensive Database of Full-Length Isoforms Generated by Iso-Seq

International Journal of Genomics ◽

10.1155/2018/9207637 ◽

2018 ◽

Vol 2018 ◽

pp. 1-6 ◽

Cited By ~ 1

Author(s):

Shang-Qian Xie ◽

Yue Han ◽

Xiao-Zhou Chen ◽

Tai-Yu Cao ◽

Kai-Kai Ji ◽

...

Keyword(s):

Single Molecule ◽

Full Length ◽

Public Access ◽

Transcript Isoforms ◽

Sequencing Technologies ◽

Long Reads ◽

Depth Analysis ◽

Gene Level ◽

Long Read ◽

Full Length Transcript

The accurate landscape of transcript isoforms plays an important role in the understanding of gene function and gene regulation. However, building complete transcripts is very challenging for short reads generated using next-generation sequencing. Fortunately, isoform sequencing (Iso-Seq) using single-molecule sequencing technologies, such as PacBio SMRT, provides long reads spanning entire transcript isoforms which do not require assembly. Therefore, we have developed ISOdb, a comprehensive resource database for hosting and carrying out an in-depth analysis of Iso-Seq datasets and visualising the full-length transcript isoforms. The current version of ISOdb has collected 93 publicly available Iso-Seq samples from eight species and presents the samples in two levels: (1) sample level, including metainformation, long read distribution, isoform numbers, and alternative splicing (AS) events of each sample; (2) gene level, including the total isoforms, novel isoform number, novel AS number, and isoform visualisation of each gene. In addition, ISOdb provides a user interface in the website for uploading sample information to facilitate the collection and analysis of researchers’ datasets. Currently, ISOdb is the first repository that offers comprehensive resources and convenient public access for hosting, analysing, and visualising Iso-Seq data, which is freely available.

Download Full-text

metaFlye: scalable long-read metagenome assembly using repeat graphs

10.1101/637637 ◽

2019 ◽

Cited By ~ 9

Author(s):

Mikhail Kolmogorov ◽

Mikhail Rayko ◽

Jeffrey Yuan ◽

Evgeny Polevikov ◽

Pavel Pevzner

Keyword(s):

Dark Matter ◽

State Of The Art ◽

Full Length ◽

Bacterial Genomes ◽

Short Read ◽

Sequencing Technologies ◽

16S Rna ◽

Long Read ◽

Metagenome Assembly ◽

Rna Genes

AbstractLong-read sequencing technologies substantially improved assemblies of many isolate bacterial genomes as compared to fragmented assemblies produced with short-read technologies. However, assembling complex metagenomic datasets remains a challenge even for the state-of-the-art long-read assemblers. To address this gap, we present the metaFlye assembler and demonstrate that it generates highly contiguous and accurate metagenome assemblies. In contrast to short-read metagenomics assemblers that typically fail to reconstruct full-length 16S RNA genes, metaFlye captures many 16S RNA genes within long contigs, thus providing new opportunities for analyzing the microbial “dark matter of life”. We also demonstrate that long-read metagenome assemblers significantly improve full-length plasmid and virus reconstruction as compared to short-read assemblers and reveal many novel plasmids and viruses.

Download Full-text

Accurate viral genome reconstruction and host assignment with proximity-ligation sequencing

10.1101/2021.06.14.448389 ◽

2021 ◽

Author(s):

Gherman Uritskiy ◽

Maximillian Press ◽

Christine Sun ◽

Guillermo Dominguez Huerta ◽

Ahmed A. Zayed ◽

...

Keyword(s):

Viral Genome ◽

Metagenomic Data ◽

Metagenomic Sequencing ◽

Fecal Microbiome ◽

Genome Reconstruction ◽

Viral Genomes ◽

Proximity Ligation ◽

Sequencing Technologies ◽

Microbiome Research ◽

Long Read

Viruses play crucial roles in the ecology of microbial communities, yet they remain relatively understudied in their native environments. Despite many advancements in high-throughput whole-genome sequencing (WGS), sequence assembly, and annotation of viruses, the reconstruction of full-length viral genomes directly from metagenomic sequencing is possible only for the most abundant phages and requires long-read sequencing technologies. Additionally, the prediction of their cellular hosts remains difficult from conventional metagenomic sequencing alone. To address these gaps in the field and to accelerate the study of viruses directly in their native microbiomes, we developed an end-to-end bioinformatics platform for viral genome reconstruction and host attribution from metagenomic data using proximity-ligation sequencing (i.e., Hi-C). We demonstrate the capabilities of the platform by recovering and characterizing the metavirome of a variety of metagenomes, including a fecal microbiome that has also been sequenced with accurate long reads, allowing for the assessment and benchmarking of the new methods. The platform can accurately extract numerous near-complete viral genomes even from highly fragmented short-read assemblies and can reliably predict their cellular hosts with minimal false positives. To our knowledge, this is the first software for performing these tasks. Being significantly cheaper than long-read sequencing of comparable depth, the incorporation of proximity-ligation sequencing in microbiome research shows promise to greatly accelerate future advancements in the field.

Download Full-text

Identification of Full-length Circular Nucleic Acids using Long-read Sequencing Technologies

The Analyst ◽

10.1039/d1an01147b ◽

2021 ◽

Author(s):

Wenxiang Lu ◽

Kequan Yu ◽

Xiaohan Li ◽

Qinyu Ge ◽

Geyu Liang ◽

...

Keyword(s):

Nucleic Acids ◽

Genomic Dna ◽

Full Length ◽

Sequencing Technologies ◽

Circular Configuration ◽

Long Read

Unlike the traditional perception in genomic DNA or linear RNA, circular nucleic acids are a class of functional biomolecules with a circular configuration and are often observed in nature. These...

Download Full-text

Taxonomic resolution of the ribosomal RNA operon in bacteria: Implications for its use with long read sequencing

10.1101/626093 ◽

2019 ◽

Author(s):

Leonardo de Oliveira Martins ◽

Andrew J. Page ◽

Ian G. Charles

Keyword(s):

Ribosomal Rna ◽

Sequence Variation ◽

Full Length ◽

Taxonomic Resolution ◽

Bacterial Cells ◽

Genus Level ◽

Sequencing Technologies ◽

Ribosomal Operon ◽

Long Read ◽

Multiple Copies

AbstractLong-read sequencing technologies enable capture of the full-length of ribosomal RNA operons in a single read. Bacterial cells usually have multiple copies of this ribosomal operon; sequence variation within a species of bacterium can exceed variation between species. For uncultured organisms this may affect the overall taxonomic resolution, to genus level, of the full-length ribosomal operon.

Download Full-text

Ultra-accurate microbial amplicon sequencing with synthetic long reads

Microbiome ◽

10.1186/s40168-021-01072-3 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Benjamin J. Callahan ◽

Dmitry Grinevich ◽

Siddhartha Thakur ◽

Michael A. Balamotis ◽

Tuval Ben Yehezkel

Keyword(s):

Microbial Community ◽

16S Rrna ◽

Amplicon Sequencing ◽

Species Level ◽

Full Length ◽

16S Rrna Genes ◽

Rrna Genes ◽

Strain Identification ◽

Long Reads ◽

Long Read

Abstract Background Out of the many pathogenic bacterial species that are known, only a fraction are readily identifiable directly from a complex microbial community using standard next generation DNA sequencing. Long-read sequencing offers the potential to identify a wider range of species and to differentiate between strains within a species, but attaining sufficient accuracy in complex metagenomes remains a challenge. Methods Here, we describe and analytically validate LoopSeq, a commercially available synthetic long-read (SLR) sequencing technology that generates highly accurate long reads from standard short reads. Results LoopSeq reads are sufficiently long and accurate to identify microbial genes and species directly from complex samples. LoopSeq perfectly recovered the full diversity of 16S rRNA genes from known strains in a synthetic microbial community. Full-length LoopSeq reads had a per-base error rate of 0.005%, which exceeds the accuracy reported for other long-read sequencing technologies. 18S-ITS and genomic sequencing of fungal and bacterial isolates confirmed that LoopSeq sequencing maintains that accuracy for reads up to 6 kb in length. LoopSeq full-length 16S rRNA reads could accurately classify organisms down to the species level in rinsate from retail meat samples, and could differentiate strains within species identified by the CDC as potential foodborne pathogens. Conclusions The order-of-magnitude improvement in length and accuracy over standard Illumina amplicon sequencing achieved with LoopSeq enables accurate species-level and strain identification from complex- to low-biomass microbiome samples. The ability to generate accurate and long microbiome sequencing reads using standard short read sequencers will accelerate the building of quality microbial sequence databases and removes a significant hurdle on the path to precision microbial genomics.

Download Full-text

FlsnRNA-seq: protoplasting-free full-length single-nucleus RNA profiling in plants

Genome Biology ◽

10.1186/s13059-021-02288-0 ◽

2021 ◽

Vol 22 (1) ◽

Cited By ~ 2

Author(s):

Yanping Long ◽

Zhijian Liu ◽

Jinbu Jia ◽

Weipeng Mo ◽

Liang Fang ◽

...

Keyword(s):

Single Cell ◽

Cell Walls ◽

Large Scale ◽

Full Length ◽

Cell Level ◽

Root Cells ◽

Rna Profiling ◽

Different Types ◽

Long Read ◽

Single Nucleus

AbstractThe broad application of single-cell RNA profiling in plants has been hindered by the prerequisite of protoplasting that requires digesting the cell walls from different types of plant tissues. Here, we present a protoplasting-free approach, flsnRNA-seq, for large-scale full-length RNA profiling at a single-nucleus level in plants using isolated nuclei. Combined with 10x Genomics and Nanopore long-read sequencing, we validate the robustness of this approach in Arabidopsis root cells and the developing endosperm. Sequencing results demonstrate that it allows for uncovering alternative splicing and polyadenylation-related RNA isoform information at the single-cell level, which facilitates characterizing cell identities.

Download Full-text

Biosynthetic potential of uncultured Antarctic soil bacteria revealed through long-read metagenomic sequencing

The ISME Journal ◽

10.1038/s41396-021-01052-3 ◽

2021 ◽

Author(s):

Valentin Waschulin ◽

Chiara Borsetto ◽

Robert James ◽

Kevin K. Newsham ◽

Stefano Donadio ◽

...

Keyword(s):

Genome Mining ◽

Gene Clusters ◽

Biosynthetic Gene Cluster ◽

Full Length ◽

Metagenomic Sequencing ◽

Short Read ◽

Short Read Sequencing ◽

Rich Diversity ◽

Long Read ◽

The Rich

AbstractThe growing problem of antibiotic resistance has led to the exploration of uncultured bacteria as potential sources of new antimicrobials. PCR amplicon analyses and short-read sequencing studies of samples from different environments have reported evidence of high biosynthetic gene cluster (BGC) diversity in metagenomes, indicating their potential for producing novel and useful compounds. However, recovering full-length BGC sequences from uncultivated bacteria remains a challenge due to the technological restraints of short-read sequencing, thus making assessment of BGC diversity difficult. Here, long-read sequencing and genome mining were used to recover >1400 mostly full-length BGCs that demonstrate the rich diversity of BGCs from uncultivated lineages present in soil from Mars Oasis, Antarctica. A large number of highly divergent BGCs were not only found in the phyla Acidobacteriota, Verrucomicrobiota and Gemmatimonadota but also in the actinobacterial classes Acidimicrobiia and Thermoleophilia and the gammaproteobacterial order UBA7966. The latter furthermore contained a potential novel family of RiPPs. Our findings underline the biosynthetic potential of underexplored phyla as well as unexplored lineages within seemingly well-studied producer phyla. They also showcase long-read metagenomic sequencing as a promising way to access the untapped genetic reservoir of specialised metabolite gene clusters of the uncultured majority of microbes.

Download Full-text

Comprehensive identification of transposable element insertions using multiple sequencing technologies

Nature Communications ◽

10.1038/s41467-021-24041-8 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Chong Chu ◽

Rebeca Borges-Monroy ◽

Vinayak V. Viswanadham ◽

Soohyun Lee ◽

Heng Li ◽

...

Keyword(s):

Transposable Element ◽

Structure And Function ◽

Endogenous Retroviruses ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Sequencing Technologies ◽

Long Read ◽

And Function

AbstractTransposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.

Download Full-text

Survey of the Bradysia odoriphaga Transcriptome Using PacBio Single-Molecule Long-Read Sequencing

Genes ◽

10.3390/genes10060481 ◽

2019 ◽

Vol 10 (6) ◽

pp. 481 ◽

Cited By ~ 1

Author(s):

Chen ◽

Lin ◽

Xie ◽

Zhong ◽

Zhang ◽

...

Keyword(s):

Insecticide Resistance ◽

Single Molecule ◽

Functional Categories ◽

Genetic Studies ◽

Sequencing Technologies ◽

Clusters Of Orthologous Groups ◽

Long Read ◽

Main Gene ◽

First Time ◽

Main Factor

The damage caused by Bradysia odoriphaga is the main factor threatening the production of vegetables in the Liliaceae family. However, few genetic studies of B. odoriphaga have been conducted because of a lack of genomic resources. Many long-read sequencing technologies have been developed in the last decade; therefore, in this study, the transcriptome including all development stages of B. odoriphaga was sequenced for the first time by Pacific single-molecule long-read sequencing. Here, 39,129 isoforms were generated, and 35,645 were found to have annotation results when checked against sequences available in different databases. Overall, 18,473 isoforms were distributed in 25 various Clusters of Orthologous Groups, and 11,880 isoforms were categorized into 60 functional groups that belonged to the three main Gene Ontology classifications. Moreover, 30,610 isoforms were assigned into 44 functional categories belonging to six main Kyoto Encyclopedia of Genes and Genomes functional categories. Coding DNA sequence (CDS) prediction showed that 36,419 out of 39,129 isoforms were predicted to have CDS, and 4319 simple sequence repeats were detected in total. Finally, 266 insecticide resistance and metabolism-related isoforms were identified as candidate genes for further investigation of insecticide resistance and metabolism in B. odoriphaga.

Download Full-text