De novo whole-genome assembly of Chrysanthemum makinoi, a key wild chrysanthemum

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab358 ◽

2021 ◽

Author(s):

Natascha van Lieshout ◽

Martijn van Kaauwen ◽

Linda Kodde ◽

Paul Arens ◽

Marinus J M Smulders ◽

...

Keyword(s):

Ab Initio ◽

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Its Sequence ◽

Whole Genome ◽

Annotation Pipeline ◽

Long Reads ◽

Oxford Nanopore ◽

The World

Chrysanthemum is among the top ten cut, potted and perennial garden flowers in the world. Despite this, to date, only the genomes of two wild diploid chrysanthemums have been sequenced and assembled. Here we present the most complete and contiguous chrysanthemum de novo assembly published so far, as well as a corresponding ab initio annotation. The cultivated hexaploid varieties are thought to originate from a hybrid of wild chrysanthemums, among which the diploid Chrysanthemum makinoi has been mentioned. Using a combination of Oxford Nanopore long reads, Pacific Biosciences long reads, Illumina short reads, Dovetail sequences and a genetic map, we assembled 3.1 Gb of its sequence into 9 pseudochromosomes, with an N50 of 330 Mb and BUSCO complete score of 92.1%. Our ab initio annotation pipeline predicted 95 074 genes and marked 80.0% of the genome as repetitive. This genome assembly of C. makinoi provides an important step forward in understanding the chrysanthemum genome, evolution and history.
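Several abstracts in this listing report an N50 value as a contiguity measure. As a reading aid, the sketch below shows one common way to compute N50 from a list of sequence lengths: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions and do not come from any of the listed papers.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half of the
        # total assembly size is the N50.
        if running >= half_total:
            return length


# Toy example with made-up lengths (not data from any paper).
toy_lengths = [2, 3, 4, 5, 6]
print(n50(toy_lengths))
```

A higher N50 for the same total length means the assembly is split into fewer, longer pieces, which is why it is quoted alongside total size and completeness scores.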


De novo whole-genome assembly of Chrysanthemum makinoi, a key wild ancestor to hexaploid Chrysanthemum

10.1101/2021.07.09.451814 ◽

2021 ◽

Author(s):

Natascha van Lieshout ◽

Martijn van Kaauwen ◽

Linda Kodde ◽

Paul Arens ◽

Marinus J. M. Smulders ◽

...

Keyword(s):

Ab Initio ◽

Genome Evolution ◽

Genome Assembly ◽

De Novo ◽

Its Sequence ◽

Whole Genome ◽

Annotation Pipeline ◽

Long Reads ◽

Oxford Nanopore ◽

The World

Chrysanthemum is among the top ten cut, potted and perennial garden flowers in the world. Despite this, to date, only the genomes of two wild diploid chrysanthemums have been sequenced and assembled. Here we present the most complete and contiguous chrysanthemum de novo assembly published so far, as well as a corresponding ab initio annotation. The wild diploid Chrysanthemum makinoi is thought to be one of the ancestors of the cultivated hexaploid varieties which are currently grown all around the world. Using a combination of Oxford Nanopore long reads, Pacific Biosciences long reads, Illumina short reads, Dovetail sequences and a genetic map, we assembled 3.1 Gb of its sequence into 9 pseudochromosomes, with an N50 of 330 Mb and BUSCO complete score of 92.1%. Our ab initio annotation pipeline predicted 95 074 genes and marked 80.0% of the genome as repetitive. This genome assembly of C. makinoi provides an important step forward in understanding the chrysanthemum genome, evolution and history.


A long reads-based de-novo assembly of the genome of the Arlee homozygous line reveals chromosomal rearrangements in rainbow trout

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab052 ◽

2021 ◽

Author(s):

Guangtu Gao ◽

Susana Magadan ◽

Geoffrey C Waldbieser ◽

Ramey C Youngblood ◽

Paul A Wheeler ◽

...

Keyword(s):

Rainbow Trout ◽

Chromosome Number ◽

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Sequence Data ◽

Structural Variations ◽

High Coverage ◽

Haploid Chromosome Number ◽

Long Reads

Currently, there is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that will represent the genetic diversity of this species. The Arlee doubled haploid line originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the Arlee line genome de-novo assembly from high coverage PacBio long-reads sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2N = 64). It is composed of 938 scaffolds with N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% was in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype the haploid chromosome number is 32 because chromosomes Omy04, 14 and 25 are divided into six acrocentric chromosomes. Additional structural variations that were identified in the Arlee genome included the major inversions on chromosomes Omy05 and Omy20 and 15 additional smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.


Pushing the limits of de novo genome assembly for complex prokaryotic genomes harboring very long, near identical repeats

10.1101/300186 ◽

2018 ◽

Cited By ~ 3

Author(s):

Michael Schmid ◽

Daniel Frei ◽

Andrea Patrignani ◽

Ralph Schlapbach ◽

Jürg E. Frey ◽

...

Keyword(s):

Dark Matter ◽

Genome Assembly ◽

De Novo ◽

Bacterial Genomes ◽

De Novo Genome Assembly ◽

Assembly Algorithm ◽

Long Reads ◽

Oxford Nanopore ◽

Prokaryotic Genomes ◽

Genome Assemblies

Generating a complete, de novo genome assembly for prokaryotes is often considered a solved problem. However, we here show that Pseudomonas koreensis P19E3 harbors multiple, near identical repeat pairs up to 70 kilobase pairs in length. Beyond long repeats, the P19E3 assembly was further complicated by a shufflon region. Its complex genome could not be de novo assembled with long reads produced by Pacific Biosciences’ technology, but required very long reads from the Oxford Nanopore Technology. Another important factor for a full genomic resolution was the choice of assembly algorithm. Importantly, a repeat analysis indicated that very complex bacterial genomes represent a general phenomenon beyond Pseudomonas. Roughly 10% of 9331 complete bacterial and a handful of 293 complete archaeal genomes represented this dark matter for de novo genome assembly of prokaryotes. Several of these dark matter genome assemblies contained repeats far beyond the resolution of the sequencing technology employed and likely contain errors; other genomes were closed employing labor-intensive steps like cosmid libraries, primer walking or optical mapping. Using very long sequencing reads in combination with assemblers capable of resolving long, near identical repeats will bring most prokaryotic genomes within reach of fast and complete de novo genome assembly.


Using long and linked reads to improve an Atlantic herring (Clupea harengus) genome assembly

Scientific Reports ◽

10.1038/s41598-019-54151-9 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 3

Author(s):

Sunnvør í Kongsstovu ◽

Svein-Ole Mikalsen ◽

Eydna í Homrum ◽

Jan Arge Jacobsen ◽

Paul Flicek ◽

...

Keyword(s):

North Atlantic ◽

Genome Assembly ◽

De Novo ◽

Clupea Harengus ◽

Atlantic Herring ◽

The North ◽

Long Reads ◽

Mate Pair ◽

The World ◽

Crucial Part

Atlantic herring (Clupea harengus) is one of the most abundant fish species in the world. It is an important economic and nutritional resource, as well as a crucial part of the North Atlantic ecosystem. In 2016, a draft herring genome assembly was published. Being a species of such importance, we sought to independently verify and potentially improve the herring genome assembly. We sequenced the herring genome generating paired-end, mate-pair, linked and long reads. Three assembly versions of the herring genome were generated based on a de novo assembly (A1), which was scaffolded using linked and long reads (A2) and then merged with the previously published assembly (A3). The resulting assemblies were compared using parameters describing the size, fragmentation, correctness, and completeness of the assemblies. Results showed that the A2 assembly was less fragmented, more complete and more correct than A1. A3 showed improvement in fragmentation and correctness compared with A2 and the published assembly but was slightly less complete than the published assembly. Thus, we here confirmed the previously published herring assembly, and made improvements by further scaffolding the assembly and removing low-quality sequences using linked and long reads and merging of assemblies.


Whole-Genome Sequencing and De Novo Assembly of Malassezia pachydermatis Isolated from the Ear Canal of a Dog with Otitis

Microbiology Resource Announcements ◽

10.1128/mra.00205-21 ◽

2021 ◽

Vol 10 (21) ◽

Author(s):

S. D’Andreano ◽

J. Viñes ◽

O. Francino

Keyword(s):

Genome Sequencing ◽

Genome Sequence ◽

De Novo Assembly ◽

De Novo ◽

Whole Genome ◽

Ear Canal ◽

Malassezia Pachydermatis ◽

Content Type ◽

Long Reads ◽

Genome Assemblies

We have de novo assembled the genome sequence of Malassezia pachydermatis isolated from a canine otitis sample with Nanopore-only long reads. With 99× coverage and 8.23 Mbp, the genome sequence was assembled in 10 contigs, with 6 of them corresponding to chromosomes, improving the scaffolding of previous genome assemblies for the species.


Whole genome sequencing and assembly of a Caenorhabditis elegans genome with complex genomic rearrangements using the MinION sequencing device

10.1101/099143 ◽

2017 ◽

Cited By ~ 12

Author(s):

JR Tyson ◽

NJ O’Neil ◽

M Jain ◽

HE Olsen ◽

P Hieter ◽

...

Keyword(s):

Caenorhabditis Elegans ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genome Assembly ◽

De Novo ◽

Sequence Data ◽

Genomic Rearrangements ◽

Whole Genome ◽

C Elegans ◽

Long Reads

Advances in 3rd generation sequencing have opened new possibilities for ‘benchtop’ whole genome sequencing. The MinION is a portable device that uses nanopore technology and can sequence long DNA molecules. MinION long reads are well suited for sequencing and de novo assembly of complex genomes with large repetitive elements. Long reads also facilitate the identification of complex genomic rearrangements such as those observed in tumor genomes. To assess the feasibility of the de novo assembly of large complex genomes using both MinION and Illumina platforms, we sequenced the genome of a Caenorhabditis elegans strain that contains a complex acetaldehyde-induced rearrangement and a biolistic bombardment-mediated insertion of a GFP containing plasmid. Using ∼5.8 gigabases of MinION sequence data, we were able to assemble a C. elegans genome containing 145 contigs (N50 contig length = 1.22 Mb) that covered >99% of the 100,286,401 bp reference genome. In contrast, using ∼8.04 gigabases of Illumina sequence data, we were able to assemble a C. elegans genome in 38,645 contigs (N50 contig length = ∼26 kb) containing 117 Mb. From the MinION genome assembly we identified the complex structures of both the acetaldehyde-induced mutation and the biolistic-mediated insertion. To date, this is the largest genome to be assembled exclusively from MinION data and is the first demonstration that the long reads of MinION sequencing can be used for whole genome assembly of large (100 Mb) genomes and the elucidation of complex genomic rearrangements.
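The abstract above gives the sequenced data volumes and the reference genome size but not the resulting depth of coverage. As a rough worked calculation, average depth can be estimated as total sequenced bases divided by genome size. The inputs below are the approximate figures quoted in the abstract; the derived depths are an illustration and are not values reported by the authors.

```python
REFERENCE_BP = 100_286_401  # C. elegans reference size quoted in the abstract


def mean_depth(total_bases, genome_size):
    """Approximate average sequencing depth (fold coverage)."""
    return total_bases / genome_size


# "~5.8 gigabases" of MinION data and "~8.04 gigabases" of Illumina data.
minion_depth = mean_depth(5.8e9, REFERENCE_BP)
illumina_depth = mean_depth(8.04e9, REFERENCE_BP)

print(f"MinION:   ~{minion_depth:.1f}x")
print(f"Illumina: ~{illumina_depth:.1f}x")
```

Under these assumptions the Illumina data set is the deeper of the two, yet it yields the far more fragmented assembly, which is consistent with the abstract's point that read length rather than depth drives contiguity here.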


A long reads-based de-novo assembly of the genome of the Arlee homozygous line reveals structural genome variation in rainbow trout

10.1101/2020.12.28.424581 ◽

2020 ◽

Author(s):

Guangtu Gao ◽

Susana Magadan ◽

Geoffrey C. Waldbieser ◽

Ramey C. Youngblood ◽

Paul A. Wheeler ◽

...

Keyword(s):

Rainbow Trout ◽

Chromosome Number ◽

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Sequence Data ◽

Haploid Chromosome Number ◽

Long Reads ◽

Homozygous Line ◽

Igh Genes

Currently, there is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that will represent the genetic diversity of this species. The Arlee doubled haploid line originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the Arlee line genome de-novo assembly from high coverage PacBio long-reads sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2N = 64). It is composed of 938 scaffolds with N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% was in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype the haploid chromosome number is 32 because chromosomes Omy04, 14 and 25 are divided into six acrocentric chromosomes. Additional structural variations that were identified in the Arlee genome included the major inversions on chromosomes Omy05 and Omy20 and 15 additional smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.

Article Summary: A de-novo genome assembly was generated for the Arlee homozygous line of rainbow trout to enable identification and characterization of genome variants towards developing a rainbow trout pan-genome reference. The new assembly was generated using the PacBio sequencing technology and scaffolding with Hi-C contact maps and Bionano optical mapping. A contiguous genome assembly was obtained, with the contig and scaffold N50 over 15.6 Mb and 39 Mb, respectively, and 95% of the assembly in chromosome sequences. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes.


phasebook: haplotype-aware de novo assembly of diploid genomes from long reads

10.1101/2021.07.02.450883 ◽

2021 ◽

Author(s):

Xiao Luo ◽

Xiongbin Kang ◽

Alexander Schoenhuth

Keyword(s):

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Haplotype Diversity ◽

Read Length ◽

Diploid Genome ◽

Sequencing Technologies ◽

Novel Approach ◽

Long Reads ◽

Long Read

Haplotype-aware diploid genome assembly is crucial in genomics, precision medicine, and many other disciplines. Long-read sequencing technologies have greatly improved genome assembly thanks to advantages of read length. However, current long-read assemblers usually introduce disturbing biases or fail to capture the haplotype diversity of the diploid genome. Here, we present phasebook, a novel approach for reconstructing the haplotypes of diploid genomes from long reads de novo. Benchmarking experiments demonstrate that our method outperforms other approaches in terms of haplotype coverage by large margins, while preserving competitive performance or even achieving advantages in terms of all other aspects relevant for genome assembly.


Correcting palindromes in long reads after whole-genome amplification

10.1101/173872 ◽

2017 ◽

Cited By ~ 1

Author(s):

Sven Warris ◽

Elio Schijlen ◽

Henri van de Geest ◽

Rahulsimham Vegesna ◽

Thamara Hesselink ◽

...

Keyword(s):

Whole Genome Amplification ◽

De Novo ◽

Single Cells ◽

Whole Genome ◽

Read Mapping ◽

Genome Amplification ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Real World Datasets

Next-generation sequencing requires sufficient DNA to be available. If limited, whole-genome amplification is applied to generate additional amounts of DNA. Such amplification often results in many chimeric DNA fragments, in particular artificial palindromic sequences, which limit the usefulness of long reads from technologies such as PacBio and Oxford Nanopore. Here, we present Pacasus, a tool for correcting such errors in long reads. We demonstrate on two real-world datasets that it markedly improves subsequent read mapping and de novo assembly, yielding results similar to those that would be obtained with non-amplified DNA. With Pacasus, long-read technologies become readily available for sequencing targets with very small amounts of DNA, such as single cells or even single chromosomes.
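To make the notion of an artificial palindromic sequence concrete: a read formed by a fragment followed by its own reverse complement is identical to its reverse complement. The toy check below measures that self-similarity position by position. It is only an illustration of the idea and is not how Pacasus works; the function names and sequences are made up here, and the exact-position comparison ignores the mismatches and indels of real long reads, where an alignment-based comparison would be needed.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def self_revcomp_identity(read):
    """Fraction of positions at which a read equals its reverse complement."""
    if not read:
        return 0.0
    rc = reverse_complement(read)
    matches = sum(a == b for a, b in zip(read, rc))
    return matches / len(read)


# Hypothetical fragment and the palindromic chimera built from it.
arm = "ACGTTGCAAG"
chimeric_read = arm + reverse_complement(arm)

print(self_revcomp_identity(chimeric_read))  # perfect palindrome
print(self_revcomp_identity("AAAAAAAAAA"))   # no self-complementarity
```

In this idealized, error-free setting a palindromic chimera scores 1.0, and splitting it at the midpoint would recover the original fragment.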


A study of transposable element-associated structural variations (TASVs) using a de novo-assembled Korean genome

Experimental & Molecular Medicine ◽

10.1038/s12276-021-00586-y ◽

2021 ◽

Author(s):

Seyoung Mun ◽

Songmi Kim ◽

Wooseok Lee ◽

Keunsoo Kang ◽

Thomas J. Meyer ◽

...

Keyword(s):

Genome Sequencing ◽

Genome Assembly ◽

De Novo ◽

Personal Genome ◽

Human Populations ◽

Whole Genome ◽

Structural Variations ◽

Insert Size ◽

Human Genomes ◽

Next Generation Sequencing (NGS)

Advances in next-generation sequencing (NGS) technology have made personal genome sequencing possible, and indeed, many individual human genomes have now been sequenced. Comparisons of these individual genomes have revealed substantial genomic differences between human populations as well as between individuals from closely related ethnic groups. Transposable elements (TEs) are known to be one of the major sources of these variations and act through various mechanisms, including de novo insertion, insertion-mediated deletion, and TE–TE recombination-mediated deletion. In this study, we carried out de novo whole-genome sequencing of one Korean individual (KPGP9) via multiple insert-size libraries. The de novo whole-genome assembly resulted in 31,305 scaffolds with a scaffold N50 size of 13.23 Mb. Furthermore, through computational data analysis and experimental verification, we revealed that 182 TE-associated structural variation (TASV) insertions and 89 TASV deletions contributed 64,232 bp in sequence gain and 82,772 bp in sequence loss, respectively, in the KPGP9 genome relative to the hg19 reference genome. We also verified structural differences associated with TASVs by comparative analysis with TASVs in recent genomes (AK1 and TCGA genomes) and reported their details. Here, we constructed a new Korean de novo whole-genome assembly and provide the first study, to our knowledge, focused on the identification of TASVs in an individual Korean genome. Our findings again highlight the role of TEs as a major driver of structural variations in human individual genomes.
