Facile, High Quality Sequencing of Bacterial Genomes from Small Amounts of DNA

International Journal of Genomics ◽

10.1155/2014/434575 ◽

2014 ◽

Vol 2014 ◽

pp. 1-8

Author(s):

Momchilo Vuyisich ◽

Ayesha Arefin ◽

Karen Davenport ◽

Shihai Feng ◽

Cheryl Gleasner ◽

...

Keyword(s):

Genomic Dna ◽

De Novo ◽

Gc Content ◽

Library Preparation ◽

Sequencing Data ◽

Bacterial Genomes ◽

Dna Amount ◽

High Quality ◽

Preparation Methods

Sequencing bacterial genomes has traditionally required large amounts of genomic DNA (~1 μg). There have been few studies to determine the effects of the input DNA amount or library preparation method on the quality of sequencing data. Several new commercially available library preparation methods enable shotgun sequencing from as little as 1 ng of input DNA. In this study, we evaluated the NEBNext Ultra library preparation reagents for sequencing bacterial genomes. We have evaluated the utility of NEBNext Ultra for resequencing and de novo assembly of four bacterial genomes and compared its performance with the TruSeq library preparation kit. The NEBNext Ultra reagents enable high quality resequencing and de novo assembly of a variety of bacterial genomes when using 100 ng of input genomic DNA. For the two most challenging genomes (Burkholderia spp.), which have the highest GC content and are the longest, we also show that the quality of both resequencing and de novo assembly is not decreased when only 10 ng of input genomic DNA is used.


TrancriptomeReconstructoR, A Data-Driven Annotation of Complex Transcriptomes

10.21203/rs.3.rs-131404/v1 ◽

2020 ◽

Author(s):

Maxim Ivanov ◽

Albin Sandelin ◽

Sebastian Marquardt

Keyword(s):

De Novo ◽

Gene Annotation ◽

R Package ◽

Sequence Information ◽

Rna Seq ◽

Sequencing Data ◽

Gene Model ◽

Preparation Methods ◽

Downstream Analysis

Abstract Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing amount of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package which implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: i) full-length RNA-seq for detection of splicing patterns and ii) high-throughput 5' and 3' tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. We reconstructed de novo the transcriptional landscape of wild type Arabidopsis thaliana seedlings as a proof-of-principle. A comparison to the existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the two most commonly used community gene models, TAIR10 and Araport11. In particular, we identify thousands of transient transcripts missing from the existing annotations. Our new annotation promises to improve the quality of A. thaliana genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline only requires prior knowledge of the reference genomic DNA sequence, but not the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.


TrancriptomeReconstructoR: data-driven annotation of complex transcriptomes

10.1101/2020.12.10.418897 ◽

2020 ◽

Author(s):

Maxim Ivanov ◽

Albin Sandelin ◽

Sebastian Marquardt

Keyword(s):

De Novo ◽

Gene Annotation ◽

R Package ◽

Sequence Information ◽

Rna Seq ◽

Sequencing Data ◽

Gene Model ◽

Preparation Methods ◽

Downstream Analysis

Abstract Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing number of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package which implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: i) full-length RNA-seq for detection of splicing patterns and ii) high-throughput 5’ and 3’ tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. We reconstructed de novo the transcriptional landscape of wild type Arabidopsis thaliana seedlings as a proof-of-principle. A comparison to the existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the two most commonly used community gene models, TAIR10 and Araport11. In particular, we identify thousands of transient transcripts missing from the existing annotations. Our new annotation promises to improve the quality of A. thaliana genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline only requires prior knowledge on the reference genomic DNA sequence, but not the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.


TrancriptomeReconstructoR: data-driven annotation of complex transcriptomes

BMC Bioinformatics ◽

10.1186/s12859-021-04208-2 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Maxim Ivanov ◽

Albin Sandelin ◽

Sebastian Marquardt

Keyword(s):

De Novo ◽

Gene Annotation ◽

R Package ◽

Sequence Information ◽

Rna Seq ◽

Sequencing Data ◽

Gene Model ◽

Preparation Methods ◽

Downstream Analysis

Abstract Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing number of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package which implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: (i) full-length RNA-seq for detection of splicing patterns and (ii) high-throughput 5′ and 3′ tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. We reconstructed de novo the transcriptional landscape of wild type Arabidopsis thaliana seedlings and Saccharomyces cerevisiae cells as a proof-of-principle. A comparison to the existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the most commonly used community gene models, TAIR10 and Araport11 for A. thaliana and SacCer3 for S. cerevisiae. In particular, we identify multiple transient transcripts missing from the existing annotations. Our new annotations promise to improve the quality of A. thaliana and S. cerevisiae genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline only requires prior knowledge on the reference genomic DNA sequence, but not the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.


Abstract 1664: Comparison of RNA sequencing data generated from formalin-fixed, paraffin-embedded (FFPE) papillary thyroid carcinoma samples using different library preparation methods

10.1158/1538-7445.am2019-1664 ◽

2019 ◽

Author(s):

Julie Dragon ◽

Ramiro Barrantes ◽

Jessica Hoffman ◽

Scott Tighe

Keyword(s):

Papillary Thyroid Carcinoma ◽

Thyroid Carcinoma ◽

Rna Sequencing ◽

Papillary Thyroid ◽

Library Preparation ◽

Sequencing Data ◽

Preparation Methods ◽

Formalin Fixed Paraffin ◽

Formalin Fixed Paraffin Embedded ◽

Formalin Fixed


High-Quality Genome Assembly of Peronospora destructor, the Causal Agent of Onion Downy Mildew

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-10-19-0280-a ◽

2020 ◽

Vol 33 (5) ◽

pp. 718-720

Author(s):

Karthi Natesan ◽

Ji Yeon Park ◽

Cheol-Woo Kim ◽

Dong Suk Park ◽

Young-Seok Kwon ◽

...

Keyword(s):

Downy Mildew ◽

De Novo ◽

Gc Content ◽

Comparative Genomic ◽

High Quality ◽

Sequencing Platform ◽

Peronospora Destructor ◽

Genomic Studies ◽

Genome Assemblies ◽

High Quality Genome

Peronospora destructor is an obligate biotrophic oomycete that causes downy mildew on onion (Allium cepa). Onion is an important crop worldwide, but its production is affected by this pathogen. We sequenced the genome of P. destructor using the PacBio sequencing platform, and de novo assembly resulted in 74 contigs with a total contig size of 29.3 Mb and 48.48% GC content. Here, we report the first high-quality genome sequence of P. destructor and its comparison with the genome assemblies of other oomycetes. The genome is a very useful resource to serve as a reference for analysis of P. destructor isolates and for comparative genomic studies of the biotrophic oomycetes.
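The GC content reported for an assembly is the share of G and C bases among all called bases. A minimal sketch of that calculation follows, assuming contigs are available as plain strings; the function names `gc_content` and `assembly_gc` are illustrative and do not come from the paper, and ambiguous symbols such as N are simply ignored here, which is one possible convention.

```python
from typing import Iterable


def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Only A, C, G and T are counted; other symbols (e.g. N) are ignored.
    This convention is an assumption of the sketch.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        return 0.0
    return 100.0 * gc / total


def assembly_gc(contigs: Iterable[str]) -> float:
    """GC percentage over all contigs, so longer contigs weigh more."""
    return gc_content("".join(contigs))
```

Pooling the contigs before counting gives a length-weighted value, which is usually what an assembly-level GC figure means; averaging per-contig percentages would give a different number.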


When Less is More: "Slicing" Sequencing Data Improves Read Decoding Accuracy and De Novo Assembly Quality

10.1101/013425 ◽

2015 ◽

Cited By ~ 1

Author(s):

Stefano Lonardi ◽

Hamid Mirebrahim ◽

Steve Wanamaker ◽

Matthew Alpert ◽

Gianfranco Ciardo ◽

...

Keyword(s):

Deep Sequencing ◽

De Novo Assembly ◽

De Novo ◽

Optimal Size ◽

Sequencing Data ◽

Less Is More ◽

Bac Clones ◽

Deep Sequencing Data ◽

First Time

Since the invention of DNA sequencing in the seventies, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, for the first time we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. Specifically, we explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to BAC clones (in the context of the combinatorial pooling design proposed by our group), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on "divide and conquer": we "slice" a large dataset into smaller samples of optimal size, decode each slice independently, then merge the results. Experimental results on over 15,000 barley BACs and over 4,000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data.
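The "slice, decode independently, merge" idea can be sketched generically as follows. This is not the authors' implementation: `slice_reads` and `decode_in_slices` are hypothetical names, the per-slice decoder is a placeholder supplied by the caller, and merging by summing per-key counts is an assumption, since the abstract does not say how slice results are combined.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Mapping, Sequence


def slice_reads(reads: Sequence[str], slice_size: int, seed: int = 0) -> List[List[str]]:
    """Shuffle the reads and cut them into consecutive slices of at most slice_size."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)  # random assignment of reads to slices
    return [shuffled[i:i + slice_size] for i in range(0, len(shuffled), slice_size)]


def decode_in_slices(
    reads: Sequence[str],
    slice_size: int,
    decode_slice: Callable[[List[str]], Mapping[Hashable, int]],
) -> Counter:
    """Run decode_slice on each slice independently and sum the per-key counts.

    decode_slice is a hypothetical stand-in for the real decoding step.
    """
    merged: Counter = Counter()
    for chunk in slice_reads(reads, slice_size):
        merged.update(decode_slice(chunk))  # Counter.update adds counts
    return merged
```

With a toy decoder such as `lambda chunk: Counter(r[0] for r in chunk)`, the merged counts equal those from a single pass over all reads, which makes the sketch easy to sanity-check. The benefit described in the abstract arises only when the real decoder degrades on oversized inputs.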


High-Quality Genome Resource of the Pathogen of Botryosphaeria dothidea Causing Kiwifruit Soft Rot

PhytoFrontiers™ ◽

10.1094/phytofr-07-20-0006-a ◽

2021 ◽

pp. PHYTOFR-07-20-0

Author(s):

Kuan Liang ◽

Jianbin Lan ◽

Baoquan Wang ◽

Yuanyuan Liu ◽

Qi Lu ◽

...

Keyword(s):

De Novo ◽

Gc Content ◽

Soft Rot ◽

Read Length ◽

Comparative Genomic ◽

Secretory Proteins ◽

Botryosphaeria Dothidea ◽

High Quality ◽

Total Size ◽

High Quality Genome

Kiwifruit soft rot caused by the fungal pathogen Botryosphaeria dothidea is a serious disease in kiwifruit-growing regions worldwide. In this study, we reported the high-quality genome sequence of the highly virulent B. dothidea strain PTZ1 using PacBio Sequel techniques. In total, 100.87 million clean reads with mean read length of 9,871 bp were obtained. De novo assembly resulted in 28 contigs with a total size of 44.45 Mb. The GC content of the genome was 54.59%. Furthermore, genes related to specific virulence of the strain were identified, including 259 fungal cytochrome P450s, 550 carbohydrate-active enzymes, 860 secretory proteins, and 1,182 pathogen–host interactions related proteins. The genome is a useful resource to serve as a reference to facilitate the analysis of B. dothidea isolates and comparative genomic studies of the necrotroph pathogens. Copyright © 2021 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.


A de novo assembly of the sweet cherry (Prunus avium cv. Tieton) genome using linked-read sequencing technology

PeerJ ◽

10.7717/peerj.9114 ◽

2020 ◽

Vol 8 ◽

pp. e9114 ◽

Cited By ~ 1

Author(s):

Jiawei Wang ◽

Weizhen Liu ◽

Dongzi Zhu ◽

Xiang Zhou ◽

Po Hong ◽

...

Keyword(s):

Sweet Cherry ◽

Prunus Avium ◽

Reference Genome ◽

De Novo ◽

Draft Genome ◽

Single Copy ◽

Sequencing Data ◽

Sequencing Technology ◽

High Quality ◽

Eukaryotic Genes

The sweet cherry (Prunus avium) is one of the most economically important fruit species in the world. However, there is a limited amount of genetic information available for this species, which hinders breeding efforts at a molecular level. We were able to describe a high-quality reference genome assembly and annotation of the diploid sweet cherry (2n = 2x = 16) cv. Tieton using linked-read sequencing technology. We generated over 750 million clean reads, representing 112.63 GB of raw sequencing data. The Supernova assembler produced a more highly-ordered and continuous genome sequence than the current P. avium draft genome, with a contig N50 of 63.65 KB and a scaffold N50 of 2.48 MB. The final scaffold assembly was 280.33 MB in length, representing 82.12% of the estimated Tieton genome. Eight chromosome-scale pseudomolecules were constructed, completing a 214 MB sequence of the final scaffold assembly. De novo, homology-based, and RNA-seq methods were used together to predict 30,975 protein-coding loci. 98.39% of core eukaryotic genes and 97.43% of single copy orthologues were identified in the embryo plant, indicating the completeness of the assembly. Linked-read sequencing technology was effective in constructing a high-quality reference genome of the sweet cherry, which will benefit the molecular breeding and cultivar identification in this species.
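The contiguity and completeness figures above rest on two simple calculations: N50 (the length such that contigs or scaffolds of at least that length cover half of the assembly) and the ratio of assembly length to estimated genome size. The sketch below illustrates both; the function names are illustrative, and the only inputs taken from the abstract are the 280.33 MB assembly length and the 82.12% fraction.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50 requires at least one length")


def implied_genome_size(assembly_size: float, fraction_covered: float) -> float:
    """Estimated genome size implied by an assembly covering a given fraction of it."""
    if not 0 < fraction_covered <= 1:
        raise ValueError("fraction_covered must be in (0, 1]")
    return assembly_size / fraction_covered


# Figures from the abstract: 280.33 MB assembled, 82.12% of the estimated genome.
estimated_mb = implied_genome_size(280.33, 0.8212)
```

Under these figures the implied estimated genome size is about 341 MB. The real contig and scaffold lengths are not given in the abstract, so `n50` is shown only as a definition.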


Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads

10.1101/030437 ◽

2015 ◽

Cited By ~ 3

Author(s):

Ivan Sovic ◽

Kresimir Krizanovic ◽

Karolj Skala ◽

Mile Sikic

Keyword(s):

De Novo Assembly ◽

De Novo ◽

Hybrid Methods ◽

Bacterial Genome ◽

Error Rates ◽

Sequencing Data ◽

E Coli ◽

Recent Emergence ◽

K 12

The recent emergence of nanopore sequencing technology set a challenge for the established assembly methods, which are not optimized for the combination of read lengths and high error rates of nanopore reads. In this work we assessed how existing de novo assembly methods perform on these reads. We benchmarked three non-hybrid (in terms of both error correction and scaffolding) assembly pipelines as well as two hybrid assemblers which use third generation sequencing data to scaffold Illumina assemblies. Tests were performed on several publicly available MinION and Illumina datasets of E. coli K-12, using several sequencing coverages of nanopore data (20x, 30x, 40x and 50x). We attempted to assess the quality of assembly at each of these coverages, to estimate the requirements for closed bacterial genome assembly. Results show that hybrid methods are highly dependent on the quality of NGS data, but much less on the quality and coverage of nanopore data, and perform relatively well on lower nanopore coverages. Furthermore, when coverage is above 40x, all non-hybrid methods correctly assemble the E. coli genome, even a non-hybrid method tailored for Pacific Biosciences reads. While it requires higher coverage compared to a method designed particularly for nanopore reads, its running time is significantly lower.
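Benchmarking at fixed coverages such as 20x to 50x requires subsampling reads until their total length reaches the target multiple of the genome size. The following is a minimal sketch of that step, not the procedure used in the study: the function names are hypothetical, reads are represented only by their lengths, and the genome size passed in is whatever the caller assumes (roughly 4.6 Mb is the commonly cited size for E. coli K-12).

```python
import random
from typing import List, Sequence


def mean_coverage(read_lengths: Sequence[int], genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_size


def subsample_to_coverage(
    read_lengths: Sequence[int],
    genome_size: int,
    target_coverage: float,
    seed: int = 0,
) -> List[int]:
    """Return indices of randomly chosen reads whose total length first
    reaches target_coverage * genome_size (or all reads if that is not reachable)."""
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)
    needed = target_coverage * genome_size
    picked: List[int] = []
    total = 0
    for idx in order:
        if total >= needed:
            break
        picked.append(idx)
        total += read_lengths[idx]
    return picked
```

Drawing reads in random order keeps the read-length distribution of the subsample close to that of the full dataset; selecting the longest reads first would reach the same coverage with a very different length profile.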


An exploration of assembly strategies and quality metrics on the accuracy of the Knightia excelsa (rewarewa) genome.

10.22541/au.161048558.86691399/v1 ◽

2021 ◽

Author(s):

Ann McCartney ◽

Elena Hilario ◽

Seung-Sub Choi ◽

Joseph Guhlin ◽

Jessie Prebble ◽

...

Keyword(s):

New Zealand ◽

De Novo ◽

Quality Metrics ◽

Read Length ◽

Model Organisms ◽

Sequencing Data ◽

Contig Assembly ◽

High Quality ◽

Aotearoa New Zealand ◽

Long Read

We used long read sequencing data generated from Knightia excelsa R.Br., a nectar producing Proteaceae tree endemic to Aotearoa New Zealand, to explore how sequencing data type, volume and workflows can impact final assembly accuracy and chromosome construction. Establishing a high-quality genome for this species has specific cultural importance to Māori, the indigenous people, as well as commercial importance to honey producers in Aotearoa New Zealand. Assemblies were produced by five long read assemblers using data subsampled based on read lengths, two polishing strategies, and two Hi-C mapping methods. Our results from subsampling the data by read length showed that each assembler tested performed differently depending on the coverage and the read length of the data. Assemblies that used longer read lengths (>30 kb) and lower coverage were the most contiguous and the most k-mer and gene complete. The final genome assembly was constructed into pseudo-chromosomes using all available data assembled with FLYE, polished using Racon/Medaka/Pilon combined, scaffolded using SALSA2 and AllHiC, curated using Juicebox, and validated by synteny with Macadamia. We highlighted the importance of developing assembly workflows based on the volume and type of sequencing data and establishing a set of robust quality metrics for generating high quality assemblies. Scaffolding analyses highlighted that problems found in the initial assemblies could not be resolved accurately by utilizing Hi-C data and that scaffolded assemblies were more accurate when the underlying contig assembly was of higher accuracy. These findings provide insight into what is required for future high-quality de-novo assemblies of non-model organisms.
