Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping, and Hi-C

Matt A Field; Benjamin D Rosen; Olga Dudchenko; Eva K F Chan; Andre E Minoche; Richard J Edwards; Kirston Barton; Ruth J Lyons; Daniel Enosi Tuipulotu; Vanessa M Hayes; Arina D. Omer; Zane Colaric; Jens Keilwagen; Ksenia Skvortsova; Ozren Bogdanovic; Martin A Smith; Erez Lieberman Aiden; Timothy P L Smith; Robert A Zammit; J William O Ballard

doi:10.1093/gigascience/giaa027

A de novo assembly of the sweet cherry (Prunus avium cv. Tieton) genome using linked-read sequencing technology

PeerJ ◽

10.7717/peerj.9114 ◽

2020 ◽

Vol 8 ◽

pp. e9114 ◽

Cited By ~ 1

Author(s):

Jiawei Wang ◽

Weizhen Liu ◽

Dongzi Zhu ◽

Xiang Zhou ◽

Po Hong ◽

...

Keyword(s):

Sweet Cherry ◽

Prunus Avium ◽

Reference Genome ◽

De Novo ◽

Draft Genome ◽

Single Copy ◽

Sequencing Data ◽

Sequencing Technology ◽

High Quality ◽

Eukaryotic Genes

The sweet cherry (Prunus avium) is one of the most economically important fruit species in the world. However, limited genetic information is available for this species, which hinders breeding efforts at the molecular level. We describe a high-quality reference genome assembly and annotation of the diploid sweet cherry (2n = 2x = 16) cv. Tieton using linked-read sequencing technology. We generated over 750 million clean reads, representing 112.63 GB of raw sequencing data. The Supernova assembler produced a more highly ordered and continuous genome sequence than the current P. avium draft genome, with a contig N50 of 63.65 KB and a scaffold N50 of 2.48 MB. The final scaffold assembly was 280.33 MB in length, representing 82.12% of the estimated Tieton genome. Eight chromosome-scale pseudomolecules were constructed, accounting for 214 MB of the final scaffold assembly. De novo, homology-based, and RNA-seq methods were used together to predict 30,975 protein-coding loci. In total, 98.39% of core eukaryotic genes and 97.43% of single-copy orthologues were identified in the embryo plant, indicating the completeness of the assembly. Linked-read sequencing technology was effective in constructing a high-quality reference genome of the sweet cherry, which will benefit molecular breeding and cultivar identification in this species.
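Several abstracts in this list report contig and scaffold N50 values. As a minimal sketch, not taken from any of the listed papers, N50 can be computed as the length of the shortest sequence among the longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths below are illustrative assumptions, not data from these assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the summed length; the
    length of the sequence that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)
```

Because N50 depends only on the length distribution, it describes contiguity and says nothing about correctness, which is why the abstracts pair it with completeness measures such as BUSCO.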


Solyntus, the New Highly Contiguous Reference Genome for Potato (Solanum tuberosum)

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401550 ◽

2020 ◽

Vol 10 (10) ◽

pp. 3489-3495

Author(s):

Natascha van Lieshout ◽

Ate van der Burgt ◽

Michiel E. de Vries ◽

Menno ter Maat ◽

David Eickholt ◽

...

Keyword(s):

Solanum Tuberosum ◽

Reference Genome ◽

De Novo ◽

Draft Genome ◽

Single Copy ◽

Rapid Expansion ◽

Potato Genome ◽

Homozygous Diploid ◽

Gene Orthologs ◽

Reference Genomes

With the rapid expansion of the application of genomics and sequencing in plant breeding, there is a constant drive for better reference genomes. In potato (Solanum tuberosum), the third largest food crop in the world, the related species S. phureja, designated “DM”, has been used as the most popular reference genome for the last 10 years. Here, we introduce the de novo sequenced genome of Solyntus as the next standard reference in potato genome studies. A true Solanum tuberosum made up of 116 contigs that is also highly homozygous, diploid, vigorous and self-compatible, Solyntus provides a more direct and contiguous reference than ever before available. It was constructed by sequencing with state-of-the-art long and short read technology and assembled with Canu. The 116 contigs were assembled into scaffolds to form each pseudochromosome, with three to 17 contigs per chromosome. This assembly contains 93.7% of the single-copy gene orthologs from the Solanaceae set and has an N50 of 63.7 Mbp. The genome and related files can be found at https://www.plantbreeding.wur.nl/Solyntus/. With the release of this research line and its draft genome, we anticipate many exciting developments in (diploid) potato research.


Genome sequence of the agarwood tree Aquilaria sinensis (Lour.) Spreng: the first chromosome-level draft genome in the Thymelaeceae family

GigaScience ◽

10.1093/gigascience/giaa013 ◽

2020 ◽

Vol 9 (3) ◽

Cited By ~ 1

Author(s):

Xupo Ding ◽

Wenli Mei ◽

Qiang Lin ◽

Hao Wang ◽

Jun Wang ◽

...

Keyword(s):

Genome Assembly ◽

Gene Annotation ◽

Draft Genome ◽

Single Copy ◽

Aquilaria Sinensis ◽

Final Size ◽

Plant Resources ◽

Protein Coding ◽

High Level ◽

Chromosome Level

Abstract. Background: Aquilaria sinensis (Lour.) Spreng is one of the important plant resources involved in the production of agarwood in China. The agarwood resin collected from wounded Aquilaria trees has been used in Asia for aromatic or medicinal purposes since ancient times, although the mechanism underlying the formation of agarwood remains poorly understood owing to a lack of accurate and high-quality genetic information. Findings: We report the genomic architecture of A. sinensis by using an integrated strategy combining Nanopore, Illumina, and Hi-C sequencing. The final genome was ∼726.5 Mb in size, which reached a high level of continuity and a contig N50 of 1.1 Mb. We combined Hi-C data with the genome assembly to generate chromosome-level scaffolds. Eight super-scaffolds corresponding to the 8 chromosomes were assembled to a final size of 716.6 Mb, with a scaffold N50 of 88.78 Mb using 1,862 contigs. BUSCO evaluation reveals that the genome completeness reached 95.27%. The repeat sequences accounted for 59.13%, and 29,203 protein-coding genes were annotated in the genome. According to phylogenetic analysis using single-copy orthologous genes, we found that A. sinensis is closely related to Gossypium hirsutum and Theobroma cacao from the Malvales order, and A. sinensis diverged from their common ancestor ∼53.18–84.37 million years ago. Conclusions: Here, we present the first chromosome-level genome assembly and gene annotation of A. sinensis. This study should contribute valuable genetic resources for further research on the agarwood formation mechanism, genome-assisted improvement, and conservation biology of Aquilaria species.


Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome

BMC Genomics ◽

10.1186/s12864-021-07493-6 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Richard J. Edwards ◽

Matt A. Field ◽

James M. Ferguson ◽

Olga Dudchenko ◽

Jens Keilwagen ◽

...

Keyword(s):

Reference Genome ◽

De Novo ◽

Genome Structure ◽

Canis Lupus Familiaris ◽

Structural Variations ◽

German Shepherd ◽

High Quality ◽

Entire Family ◽

The Impact ◽

Reference Genomes

Abstract. Background: Basenjis are considered an ancient dog breed of central African origins that still live and hunt with tribesmen in the African Congo. Nicknamed the barkless dog, Basenjis possess unique phylogeny, geographical origins and traits, making their genome structure of great interest. The increasing number of available canid reference genomes allows us to examine the impact the choice of reference genome makes with regard to reference genome quality and breed relatedness. Results: Here, we report two high-quality de novo Basenji genome assemblies: a female, China (CanFam_Bas), and a male, Wags. We conduct pairwise comparisons and report structural variations between assembled genomes of three dog breeds: Basenji (CanFam_Bas), Boxer (CanFam3.1) and German Shepherd Dog (GSD) (CanFam_GSD). CanFam_Bas is superior to CanFam3.1 in terms of genome contiguity and comparable overall to the high-quality CanFam_GSD assembly. By aligning short read data from 58 representative dog breeds to three reference genomes, we demonstrate how the choice of reference genome significantly impacts both read mapping and variant detection. Conclusions: The growing number of high-quality canid reference genomes means the choice of reference genome is an increasingly critical decision in subsequent canid variant analyses. The basal position of the Basenji makes it suitable for variant analysis for targeted applications of specific dog breeds. However, we believe more comprehensive analyses across the entire family of canids are more suited to a pangenome approach. Collectively, this work highlights the importance the choice of reference genome makes in all variation studies.


The First Draft Genome Assembly of Snow Sheep (Ovis nivicola)

Genome Biology and Evolution ◽

10.1093/gbe/evaa124 ◽

2020 ◽

Vol 12 (8) ◽

pp. 1330-1336 ◽

Cited By ~ 2

Author(s):

Maulik Upadhyay ◽

Andreas Hauser ◽

Elisabeth Kunz ◽

Stefan Krebs ◽

Helmut Blum ◽

...

Keyword(s):

De Novo ◽

Gene Annotation ◽

Gene Prediction ◽

Repetitive Sequences ◽

Draft Genome ◽

Single Copy ◽

Climatic Conditions ◽

Draft Genome Assembly ◽

Sheep Genome ◽

Long Reads

Abstract. The snow sheep, Ovis nivicola, which is endemic to the mountain ranges of northeastern Siberia, is well adapted to the harsh cold climatic conditions of its habitat. In this study, using long reads of Nanopore sequencing technology, whole-genome sequencing, assembly, and gene annotation of a snow sheep were carried out. Additionally, RNA-seq reads from several tissues were generated to supplement the gene prediction in the snow sheep genome. The assembled genome was ∼2.62 Gb in length and was represented by 7,157 scaffolds with an N50 of about 2 Mb. Repetitive sequences comprised 41% of the total genome. BUSCO analysis revealed that the snow sheep assembly contained full-length or partial fragments of 97% of mammalian universal single-copy orthologs (n = 4,104), illustrating the completeness of the assembly. In addition, a total of 20,045 protein-coding sequences were identified using a comprehensive gene prediction pipeline, of which 19,240 (∼96%) were annotated using protein databases. Moreover, homology-based searches and de novo identification detected 1,484 tRNAs; 243 rRNAs; 1,931 snRNAs; and 782 miRNAs in the snow sheep genome. To conclude, we generated the first de novo genome of the snow sheep using long reads; these data are expected to contribute significantly to our understanding of evolution and adaptation within the Ovis genus.


Draft genome of a porcupinefish, Diodon Holocanthus

10.1101/775387 ◽

2019 ◽

Author(s):

Mengyang Xu ◽

Xiaoshan Su ◽

Mengqi Zhang ◽

Ming Li ◽

Xiaoyun Huang ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Repetitive Sequences ◽

Draft Genome ◽

Single Copy ◽

Single Individual ◽

Protein Coding ◽

Long Read ◽

Phylogeny And Evolution ◽

Downstream Analysis

Abstract. The long-spine porcupinefish, Diodon holocanthus (Diodontidae, Tetraodontiformes, Actinopterygii), also known as the freckled porcupinefish, is of great ecological and economic interest. Its distinct characteristics, including inflation reaction, spiny skin and tetrodotoxin, however, have not been fully studied because no complete genome assembly has been available. In this study, the whole genome of a single individual was sequenced using single tube-Long Fragment Read co-barcode reads, generating 154.3 Gb of paired-end data (219.8× depth). Gaps were further filled using a small Oxford Nanopore MinION long read dataset (11.4 Gb, 15.9× depth). Making full use of long-, medium-, and short-range genome assembly information, the final assembled sequences, with a total length of 650.02 Mb, reached contig and scaffold N50 sizes of 2.15 Mb and 8.13 Mb, respectively, despite high repetitive content. Benchmarking Universal Single-Copy Orthologs captured 95.7% (2,474) of core genes, which was used to assess completeness. In addition, 206.5 Mb (32.10%) of repetitive sequences were identified, and 20,840 protein-coding genes were annotated, among which 18,281 (87.72%) proteins were assigned possible functions. This is the first demonstration of a de novo genome of the porcupinefish, which will benefit downstream analysis of ontogeny, phylogeny, and evolution, and improve the exploration of its unique defensive mechanism.
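The depth figures quoted in this abstract follow the usual relation depth = sequenced bases / genome size. The sketch below simply inverts that relation to show the genome size implied by each dataset; it assumes "Gb" means gigabases, and the function name `implied_genome_size_gb` is a hypothetical helper, not something from the paper.

```python
def implied_genome_size_gb(data_gb, depth):
    """Genome size (in Gb) implied by a data volume and its reported depth.

    Rearranges depth = data / genome_size, assuming both values refer to
    the same genome size estimate.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return data_gb / depth


# Figures as quoted in the abstract above.
stlfr_size = implied_genome_size_gb(154.3, 219.8)    # co-barcoded short reads
nanopore_size = implied_genome_size_gb(11.4, 15.9)   # MinION long reads
```

Both datasets imply a genome size of roughly 0.70 to 0.72 Gb, somewhat larger than the 650.02 Mb assembly, which would be consistent with the depths having been computed against an estimated genome size rather than the assembled length.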


Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome

10.21203/rs.3.rs-135125/v1 ◽

2020 ◽

Author(s):

Richard J Edwards ◽

Matt A. Field ◽

James M. Ferguson ◽

Olga Dudchenko ◽

Jens Keilwagen ◽

...

Keyword(s):

Reference Genome ◽

De Novo ◽

Genome Structure ◽

Canis Lupus Familiaris ◽

Structural Variations ◽

German Shepherd ◽

High Quality ◽

Entire Family ◽

The Impact ◽

Reference Genomes

Abstract. Background: Basenjis are considered an ancient dog breed of central African origins that still live and hunt with tribesmen in the African Congo. Nicknamed the barkless dog, Basenjis possess unique phylogeny, geographical origins and traits, making their genome structure of great interest. The increasing number of available canid reference genomes allows us to examine the impact the choice of reference genome makes with regard to reference genome quality and breed relatedness. Results: Here, we report two high-quality de novo Basenji genome assemblies: a female, China (CanFam_Bas), and a male, Wags. We conduct pairwise comparisons and report structural variations between assembled genomes of three dog breeds: Basenji (CanFam_Bas), Boxer (CanFam3.1) and German Shepherd Dog (GSD) (CanFam_GSD). CanFam_Bas is superior to CanFam3.1 in terms of genome contiguity and comparable overall to the high-quality CanFam_GSD assembly. By aligning short read data from 58 representative dog breeds to three reference genomes, we demonstrate how the choice of reference genome significantly impacts both read mapping and variant detection. Conclusions: The growing number of high-quality canid reference genomes means the choice of reference genome is an increasingly critical decision in subsequent canid variant analyses. The basal position of the Basenji makes it suitable for variant analysis for targeted applications of specific dog breeds. However, we believe more comprehensive analyses across the entire family of canids are more suited to a pangenome approach. Collectively, this work highlights the importance the choice of reference genome makes in all variation studies.


Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome

10.1101/2020.11.11.379073 ◽

2020 ◽

Cited By ~ 1

Author(s):

Richard J. Edwards ◽

Matt A. Field ◽

James M. Ferguson ◽

Olga Dudchenko ◽

Jens Keilwagen ◽

...

Keyword(s):

Reference Genome ◽

De Novo ◽

Genome Structure ◽

Canis Lupus Familiaris ◽

Structural Variations ◽

German Shepherd ◽

High Quality ◽

Entire Family ◽

The Impact ◽

Reference Genomes

Abstract. Background: Basenjis are considered an ancient dog breed of central African origins that still live and hunt with tribesmen in the African Congo. Nicknamed the barkless dog, Basenjis possess unique phylogeny, geographical origins and traits, making their genome structure of great interest. The increasing number of available canid reference genomes allows us to examine the impact the choice of reference genome makes with regard to reference genome quality and breed relatedness. Results: Here, we report two high-quality de novo Basenji genome assemblies: a female, China (CanFam_Bas), and a male, Wags. We conduct pairwise comparisons and report structural variations between assembled genomes of three dog breeds: Basenji (CanFam_Bas), Boxer (CanFam3.1) and German Shepherd Dog (GSD) (CanFam_GSD). CanFam_Bas is superior to CanFam3.1 in terms of genome contiguity and comparable overall to the high-quality CanFam_GSD assembly. By aligning short read data from 58 representative dog breeds to three reference genomes, we demonstrate how the choice of reference genome significantly impacts both read mapping and variant detection. Conclusions: The growing number of high-quality canid reference genomes means the choice of reference genome is an increasingly critical decision in subsequent canid variant analyses. The basal position of the Basenji makes it suitable for variant analysis for targeted applications of specific dog breeds. However, we believe more comprehensive analyses across the entire family of canids are more suited to a pangenome approach. Collectively, this work highlights the importance the choice of reference genome makes in all variation studies.


De novo assembly and annotation of a highly contiguous reference genome of the fathead minnow (Pimephales promelas) reveals an AT-rich repetitive genome with compact gene structure

10.1101/2021.02.24.432777 ◽

2021 ◽

Cited By ~ 1

Author(s):

John Martinson ◽

David C. Bencic ◽

Gregory P. Toth ◽

Mitchell S. Kostich ◽

Robert W. Flick ◽

...

Keyword(s):

Gene Structure ◽

Fathead Minnow ◽

Reference Genome ◽

De Novo ◽

Gene Annotation ◽

Pimephales Promelas ◽

Single Copy ◽

Model Organisms ◽

Coding Regions ◽

Genomic Resource

Abstract. The Fathead Minnow (FHM) is one of the most important and widely used model organisms in aquatic toxicology. The lack of a high-quality and well-annotated FHM reference genome, however, has severely hampered efforts to use modern ‘omics approaches with FHM for environmental toxicogenomics studies. We present here a de novo assembled and nearly complete reference of the fathead minnow genome. Compared to the current fragmented and sparsely annotated FHM genome assembly (FHM1), the new highly contiguous and well-annotated FHM reference genome (FHM2) represents a major improvement, having 95.1% of the complete BUSCOs (Benchmarking Universal Single-Copy Orthologs) and a scaffold N50 of 12.0 Mbp. The completeness of gene annotation for the FHM2 reference genome was demonstrated to be comparable to that of the zebrafish (ZF) GRCz11 reference genome. In addition, our comparative genomics analyses between FHM and ZF revealed highly conserved coding regions between the two species while discovering a much more compact gene structure in FHM than in ZF. This study not only provides insights for assembling a highly repetitive AT-rich genome, but also delivers a critical genomic resource essential for toxicogenomics studies in environmental toxicology.


De novo genome assembly of Solanum sitiens reveals structural variation associated with drought and salinity tolerance

Bioinformatics ◽

10.1093/bioinformatics/btab048 ◽

2021 ◽

Author(s):

Corentin Molitor ◽

Tomasz J Kurowski ◽

Pedro M Fidalgo de Almeida ◽

Pramod Eerolla ◽

Daniel J Spindlow ◽

...

Keyword(s):

Drought Resistance ◽

Genome Assembly ◽

Reference Genome ◽

De Novo ◽

Transcriptome Assembly ◽

Crop Improvement ◽

Single Copy ◽

Supplementary Information ◽

Analysis Tool ◽

De Novo Genome Assembly

Abstract. Motivation: Solanum sitiens is a self-incompatible wild relative of tomato, characterized by salt and drought-resistance traits, with the potential to contribute through breeding programmes to crop improvement in cultivated tomato. This species has a distinct morphology, classification and ecotype compared to other stress resistant wild tomato relatives such as S. pennellii and S. chilense. Therefore, the availability of a reference genome for S. sitiens will facilitate the genetic and molecular understanding of salt and drought resistance. Results: A high-quality de novo genome and transcriptome assembly for S. sitiens (Accession LA1974) has been developed. A hybrid assembly strategy was followed using Illumina short reads (∼159× coverage) and PacBio long reads (∼44× coverage), generating a total of ∼262 Gbp of DNA sequence. A reference genome of 1245 Mbp, arranged in 1483 scaffolds with an N50 of 1.826 Mbp was generated. Genome completeness was estimated at 95% using the Benchmarking Universal Single-Copy Orthologs (BUSCO) and the K-mer Analysis Tool (KAT). In addition, ∼63 Gbp of RNA-Seq were generated to support the prediction of 31 164 genes from the assembly, and to perform a de novo transcriptome assembly. Lastly, we identified three large inversions compared to S. lycopersicum, containing several drought-resistance-related genes, such as beta-amylase 1 and YUCCA7. Availability and implementation: S. sitiens (LA1974) raw sequencing, transcriptome and genome assembly have been deposited at the NCBI’s Sequence Read Archive, under the BioProject number ‘PRJNA633104’. All the commands and scripts necessary to generate the assembly are available at the following github repository: https://github.com/MCorentin/Solanum_sitiens_assembly. Supplementary information: Supplementary data are available at Bioinformatics online.
