New de novo assembly of the Atlantic bottlenose dolphin (Tursiops truncatus) improves genome completeness and provides haplotype phasing

Abstract: High quality genomes are essential to resolve challenges in breeding, comparative biology, medicine and conservation planning. New library preparation techniques along with better assembly algorithms result in continued improvements in assemblies for non-model organisms, moving them toward reference quality genomes. We report on the latest genome assembly of the Atlantic bottlenose dolphin leveraging Illumina sequencing data coupled with a combination of several library preparation techniques. These include Linked-Reads (Chromium, 10x Genomics), mate pairs, long insert paired ends and standard paired ends. Data were assembled with the commercial DeNovoMAGIC™ assembly software resulting in two assemblies, a traditional “haploid” assembly (Tur_tru_Illumina_hap_v1) that is a mosaic of the two parental haplotypes and a phased assembly (Tur_tru_Illumina_phased_v1) where each scaffold has sequence from a single homologous chromosome. We show that Tur_tru_Illumina_hap_v1 is more complete and accurate compared to the current best reference based on the amount and composition of sequence, the consistency of the mate pair alignments to the assembled scaffolds, and on the analysis of conserved single-copy mammalian orthologs. The phased de novo assembly Tur_tru_Illumina_phased_v1 is the first publicly available for this species and provides the community with novel and accurate ways to explore the heterozygous nature of the dolphin genome.



GigaScience ◽

10.1093/gigascience/giy168 ◽

2019 ◽

Vol 8 (3) ◽

Cited By ~ 2

Author(s):

Karine A Martinez-Viaud ◽

Cindy Taylor Lawley ◽

Milmer Martinez Vergara ◽

Gil Ben-Zvi ◽

Tammy Biniashvili ◽

...

Keyword(s):

De Novo Assembly ◽

Bottlenose Dolphin ◽

Tursiops Truncatus ◽

De Novo ◽

Haplotype Phasing ◽

Atlantic Bottlenose Dolphin


Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data

BMC Bioinformatics ◽

10.1186/s12859-017-1927-y ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 21

Author(s):

Kosai Al-Nakeeb ◽

Thomas Nordahl Petersen ◽

Thomas Sicheritz-Pontén

Keyword(s):

Mitochondrial Dna ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo Assembly ◽

De Novo ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data


Facile, High Quality Sequencing of Bacterial Genomes from Small Amounts of DNA

International Journal of Genomics ◽

10.1155/2014/434575 ◽

2014 ◽

Vol 2014 ◽

pp. 1-8

Author(s):

Momchilo Vuyisich ◽

Ayesha Arefin ◽

Karen Davenport ◽

Shihai Feng ◽

Cheryl Gleasner ◽

...

Keyword(s):

Genomic Dna ◽

De Novo ◽

Gc Content ◽

Library Preparation ◽

Sequencing Data ◽

Bacterial Genomes ◽

Dna Amount ◽

High Quality ◽

Preparation Methods

Sequencing bacterial genomes has traditionally required large amounts of genomic DNA (~1 μg). There have been few studies to determine the effects of the input DNA amount or library preparation method on the quality of sequencing data. Several new commercially available library preparation methods enable shotgun sequencing from as little as 1 ng of input DNA. In this study, we evaluated the NEBNext Ultra library preparation reagents for sequencing bacterial genomes. We have evaluated the utility of NEBNext Ultra for resequencing and de novo assembly of four bacterial genomes and compared its performance with the TruSeq library preparation kit. The NEBNext Ultra reagents enable high quality resequencing and de novo assembly of a variety of bacterial genomes when using 100 ng of input genomic DNA. For the two most challenging genomes (Burkholderia spp.), which have the highest GC content and are the longest, we also show that the quality of both resequencing and de novo assembly is not decreased when only 10 ng of input genomic DNA is used.


AStrap: identification of alternative splicing from transcript sequences without a reference genome

Bioinformatics ◽

10.1093/bioinformatics/bty1008 ◽

2018 ◽

Vol 35 (15) ◽

pp. 2654-2656 ◽

Cited By ~ 5

Author(s):

Guoli Ji ◽

Wenbin Ye ◽

Yaru Su ◽

Moliang Chen ◽

Guangzao Huang ◽

...

Keyword(s):

Machine Learning ◽

Alternative Splicing ◽

Single Molecule ◽

Reference Genome ◽

De Novo ◽

Supplementary Information ◽

Model Organisms ◽

Sequencing Data ◽

Extensive Evaluation ◽

Reference Genomes

Summary: Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify many more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources. Availability and implementation: AStrap is available for download at https://github.com/BMILAB/AStrap. Supplementary information: Supplementary data are available at Bioinformatics online.


HiC-Hiker: a probabilistic model to determine contig orientation in chromosome-length scaffolds with Hi-C

Bioinformatics ◽

10.1093/bioinformatics/btaa288 ◽

2020 ◽

Vol 36 (13) ◽

pp. 3966-3974

Author(s):

Ryo Nakabayashi ◽

Shinichi Morishita

Keyword(s):

Viterbi Algorithm ◽

De Novo ◽

Gene Prediction ◽

Effective Means ◽

Cost Effective ◽

Synteny Block ◽

Chromosome Length ◽

Model Organisms ◽

Contact Frequency ◽

Reference Quality

Motivation: De novo assembly of reference-quality genomes used to require enormously laborious tasks. In particular, it is extremely time-consuming to build genome markers for ordering assembled contigs along chromosomes; thus, they are only available for well-established model organisms. To resolve this issue, recent studies demonstrated that Hi-C could be a powerful and cost-effective means to output chromosome-length scaffolds for non-model species with no genome marker resources, because the Hi-C contact frequency between a pair of two loci can be a good estimator of their genomic distance, even if there is a large gap between them. Indeed, state-of-the-art methods such as 3D-DNA are now widely used for locating contigs in chromosomes. However, it remains challenging to reduce errors in contig orientation because shorter contigs have fewer contacts with their neighboring contigs. These orientation errors lower the accuracy of gene prediction, read alignment, and synteny block estimation in comparative genomics. Results: To reduce these contig orientation errors, we propose a new algorithm, named HiC-Hiker, which has a firm grounding in probabilistic theory, rigorously models Hi-C contacts across contigs, and effectively infers the most probable orientations via the Viterbi algorithm. We compared HiC-Hiker and 3D-DNA using human and worm genome contigs generated from short reads, evaluated their performances, and observed a remarkable reduction in the contig orientation error rate from 4.3% (3D-DNA) to 1.7% (HiC-Hiker). Our algorithm can consider long-range information between distal contigs and precisely estimates Hi-C read contact probabilities among contigs, which may also be useful for determining the ordering of contigs. Availability and implementation: HiC-Hiker is freely available at: https://github.com/ryought/hic_hiker.
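The idea of choosing contig orientations with the Viterbi algorithm can be illustrated with a minimal sketch. It is not the HiC-Hiker implementation: it assumes the contig order is already fixed, scores only adjacent contig pairs (the abstract notes that the real model also uses long-range contacts), and takes hypothetical log-scores as input instead of deriving them from Hi-C reads. The function name and the score layout are assumptions made for illustration.

```python
def viterbi_orientations(pair_scores):
    """Pick the most probable orientation for each contig in a fixed order.

    pair_scores[i][a][b] is an assumed log-score for contig i having
    orientation a and contig i + 1 having orientation b
    (0 = forward, 1 = reverse). Returns one orientation per contig.
    """
    # Best total score of any path that ends with the current contig
    # in orientation 0 or 1; the first contig has no prior preference.
    best = [0.0, 0.0]
    back = []  # back[i][b] = best orientation of contig i given contig i + 1 = b

    for scores in pair_scores:
        new_best = [0.0, 0.0]
        pointers = [0, 0]
        for b in (0, 1):
            candidates = [best[a] + scores[a][b] for a in (0, 1)]
            pointers[b] = 0 if candidates[0] >= candidates[1] else 1
            new_best[b] = candidates[pointers[b]]
        best = new_best
        back.append(pointers)

    # Trace back from the best final orientation.
    state = 0 if best[0] >= best[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy example with three contigs and made-up log-scores.
toy_scores = [
    [[-1.0, -3.0], [-2.0, -0.5]],  # contig 0 vs contig 1
    [[-2.0, -0.2], [-1.0, -3.0]],  # contig 1 vs contig 2
]
print(viterbi_orientations(toy_scores))
```

In the toy input, the locally best choice for the first pair (both contigs reversed) is not part of the globally best path, which is the kind of case dynamic programming handles and a greedy pairwise choice does not.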


De novo assembly of ultra-deep sequencing data

Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics - BCB '14 ◽

10.1145/2649387.2660799 ◽

2014 ◽

Author(s):

Hamid Mirebrahim ◽

Timothy Close ◽

Stefano Lonardi

Keyword(s):

Deep Sequencing ◽

De Novo Assembly ◽

De Novo ◽

Sequencing Data ◽

Deep Sequencing Data


De Novo Sequencing and Hybrid Assembly of the Biofuel Crop Jatropha curcas L.: Identification of Quantitative Trait Loci for Geminivirus Resistance

Genes ◽

10.3390/genes10010069 ◽

2019 ◽

Vol 10 (1) ◽

pp. 69 ◽

Cited By ~ 9

Author(s):

Nagesh Kancharla ◽

Saakshi Jalali ◽

J. Narasimham ◽

Vinod Nair ◽

Vijay Yepuri ◽

...

Keyword(s):

Ssr Markers ◽

Genome Assembly ◽

Jatropha Curcas ◽

Quantitative Trait ◽

De Novo ◽

Mapping Population ◽

Single Copy ◽

Sequencing Data ◽

De Novo Genome Assembly ◽

Sequencing Technologies

Jatropha curcas is an important perennial, drought-tolerant plant that has been identified as a potential biodiesel crop. We report here the hybrid de novo genome assembly of J. curcas generated using Illumina and PacBio sequencing technologies, and identification of quantitative trait loci for Jatropha Mosaic Virus (JMV) resistance. In this study, we generated scaffolds totaling 265.7 Mbp, which correspond to 84.8% of the gene space according to Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis. Additionally, 96.4% of predicted protein-coding genes were captured in RNA sequencing data, which reconfirms the accuracy of the assembled genome. The genome was utilized to identify 12,103 dinucleotide simple sequence repeat (SSR) markers, which were exploited in genetic diversity analysis to identify genetically distinct lines. A total of 207 polymorphic SSR markers were employed to construct a genetic linkage map for JMV resistance, using an interspecific F2 mapping population involving susceptible J. curcas and resistant Jatropha integerrima as parents. Quantitative trait locus (QTL) analysis led to the identification of three minor QTLs for JMV resistance, which were validated in an alternate F2 mapping population. These validated QTLs were utilized in marker-assisted breeding for JMV resistance. Comparative genomics of oil-producing genes across selected oil-producing species revealed 27 conserved genes and 2986 orthologous protein clusters in Jatropha. This reference genome assembly gives an insight into the complex genetic structure of Jatropha, and serves as a source for the development of agronomically improved virus-resistant and oil-producing lines.


State of the art de novo assembly of human genomes from massively parallel sequencing data

Human Genomics ◽

10.1186/1479-7364-4-4-271 ◽

2010 ◽

Vol 4 (4) ◽

pp. 271 ◽

Cited By ~ 49

Author(s):

Yingrui Li ◽

Yujie Hu ◽

Lars Bolund ◽

Jun Wang

Keyword(s):

De Novo Assembly ◽

De Novo ◽

State Of The Art ◽

Massively Parallel Sequencing ◽

Massively Parallel ◽

Sequencing Data ◽

Parallel Sequencing ◽

Human Genomes


When Less is More: "Slicing" Sequencing Data Improves Read Decoding Accuracy and De Novo Assembly Quality

10.1101/013425 ◽

2015 ◽

Cited By ~ 1

Author(s):

Stefano Lonardi ◽

Hamid Mirebrahim ◽

Steve Wanamaker ◽

Matthew Alpert ◽

Gianfranco Ciardo ◽

...

Keyword(s):

Deep Sequencing ◽

De Novo Assembly ◽

De Novo ◽

Optimal Size ◽

Sequencing Data ◽

Less Is More ◽

Bac Clones ◽

Deep Sequencing Data ◽

First Time

Since the invention of DNA sequencing in the seventies, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, for the first time we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. Specifically, we explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to BAC clones (in the context of the combinatorial pooling design proposed by our group), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on "divide and conquer": we "slice" a large dataset into smaller samples of optimal size, decode each slice independently, then merge the results. Experimental results on over 15,000 barley BACs and over 4,000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data.
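The "slice, decode, merge" scheme described above can be sketched in a few lines. This is only an illustration of the control flow, not the authors' pipeline: the decoder below is a hypothetical stand-in that labels each read by its first base, the slice size is an arbitrary parameter and not the "optimal size" the abstract refers to, and merging is assumed to be a plain union of per-slice results.

```python
def slices(items, size):
    """Yield consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("slice size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def decode_in_slices(reads, slice_size, decode_slice):
    """Decode each slice independently, then merge the per-slice results.

    decode_slice is any function mapping a list of reads to a dict of
    {read: assigned_clone}; merging here is a simple dict union.
    """
    merged = {}
    for chunk in slices(reads, slice_size):
        merged.update(decode_slice(chunk))
    return merged


def toy_decoder(chunk):
    """Hypothetical decoder: assign each read to a clone named by its first base."""
    return {read: "clone_" + read[0] for read in chunk}


toy_reads = ["ACGT", "CGTA", "GTAC", "TACG", "AACC"]
assignments = decode_in_slices(toy_reads, slice_size=2, decode_slice=toy_decoder)
print(assignments)
```

Because each slice is processed on its own, the per-slice calls could also run in parallel; the sketch keeps them sequential for clarity.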


A de novo assembly of the sweet cherry (Prunus avium cv. Tieton) genome using linked-read sequencing technology

PeerJ ◽

10.7717/peerj.9114 ◽

2020 ◽

Vol 8 ◽

pp. e9114 ◽

Cited By ~ 1

Author(s):

Jiawei Wang ◽

Weizhen Liu ◽

Dongzi Zhu ◽

Xiang Zhou ◽

Po Hong ◽

...

Keyword(s):

Sweet Cherry ◽

Prunus Avium ◽

Reference Genome ◽

De Novo ◽

Draft Genome ◽

Single Copy ◽

Sequencing Data ◽

Sequencing Technology ◽

High Quality ◽

Eukaryotic Genes

The sweet cherry (Prunus avium) is one of the most economically important fruit species in the world. However, there is a limited amount of genetic information available for this species, which hinders breeding efforts at a molecular level. We were able to describe a high-quality reference genome assembly and annotation of the diploid sweet cherry (2n = 2x = 16) cv. Tieton using linked-read sequencing technology. We generated over 750 million clean reads, representing 112.63 GB of raw sequencing data. The Supernova assembler produced a more highly-ordered and continuous genome sequence than the current P. avium draft genome, with a contig N50 of 63.65 KB and a scaffold N50 of 2.48 MB. The final scaffold assembly was 280.33 MB in length, representing 82.12% of the estimated Tieton genome. Eight chromosome-scale pseudomolecules were constructed, completing a 214 MB sequence of the final scaffold assembly. De novo, homology-based, and RNA-seq methods were used together to predict 30,975 protein-coding loci. 98.39% of core eukaryotic genes and 97.43% of single copy orthologues were identified in the embryo plant, indicating the completeness of the assembly. Linked-read sequencing technology was effective in constructing a high-quality reference genome of the sweet cherry, which will benefit the molecular breeding and cultivar identification in this species.
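The contig and scaffold N50 values quoted above follow the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation is shown here; the function name and the toy lengths are illustrative and are not taken from the sweet cherry assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (arbitrary units); the total is 300, so half is 150.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 reaches 150, so the N50 is 70
```

The same function applies to contigs or scaffolds; only the input list of lengths changes.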
