The draft nuclear genome assembly of Eucalyptus pauciflora: a pipeline for comparing de novo assemblies

GigaScience ◽  
2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Weiwen Wang ◽  
Ashutosh Das ◽  
David Kainer ◽  
Miriam Schalamun ◽  
Alejandro Morales-Suarez ◽  
...  

Abstract Background Eucalyptus pauciflora (the snow gum) is a long-lived tree with high economic and ecological importance. Currently, little genomic information for E. pauciflora is available. Here, we sequentially assemble the genome of E. pauciflora with different methods and combine multiple existing and novel approaches to help select the best genome assembly. Findings We generated high coverage of long- (Nanopore, 174×) and short- (Illumina, 228×) read data from a single E. pauciflora individual and compared assemblies from 5 assemblers (Canu, SMARTdenovo, Flye, Marvel, and MaSuRCA) with different minimum read lengths (1 and 35 kb). A key component of our approach is to set aside a randomly selected ∼10% of both the long and short reads, excluded from the assemblies, as a validation set for assessing the assemblies. Using this validation set along with a range of existing tools, we compared the assemblies in 8 ways: contig N50, BUSCO scores, LAI (long terminal repeat assembly index) scores, assembly ploidy, base-level error rate, CGAL (computing genome assembly likelihoods) scores, structural variation, and genome sequence similarity. Our results showed that MaSuRCA generated the best assembly, which is 594.87 Mb in size, with a contig N50 of 3.23 Mb and an estimated error rate of ∼0.006 errors per base. Conclusions We report a draft genome of E. pauciflora, which will be a valuable resource for further genomic studies of eucalypts. The approaches for assessing and comparing genomes should help in assessing and choosing among many potential genome assemblies from a single dataset.
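The held-out validation set described above is a random partition of the reads. What follows is a minimal Python sketch of such a partition. It assumes that reads are identified by string IDs and that each read is held out independently with probability 0.1. The function name `split_reads`, the fixed seed, and the per-read Bernoulli scheme are illustrative assumptions, not the authors' documented procedure.

```python
import random
from typing import Iterable, List, Tuple


def split_reads(read_ids: Iterable[str], holdout_fraction: float = 0.10,
                seed: int = 42) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split read IDs into (assembly set, validation set).

    Each read is held out independently with probability `holdout_fraction`,
    so the validation set is approximately, not exactly, that fraction.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the split is reproducible
    assembly: List[str] = []
    validation: List[str] = []
    for rid in read_ids:
        # Draw once per read; a draw below the threshold sends it to validation.
        if rng.random() < holdout_fraction:
            validation.append(rid)
        else:
            assembly.append(rid)
    return assembly, validation


if __name__ == "__main__":
    reads = [f"read_{i}" for i in range(100_000)]
    asm, val = split_reads(reads)
    print(len(asm), len(val))  # close to a 90/10 split
```

Splitting by ID rather than by individual sequence record keeps the two mates of a paired-end short read together, provided both mates share one ID. The assembly then uses only the first list, and the second list is reserved for evaluating the assemblies.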

2019 ◽  
Author(s):  
Weiwen Wang ◽  
Ashutosh Das ◽  
David Kainer ◽  
Miriam Schalamun ◽  
Alejandro Morales-Suarez ◽  
...  

Abstract Background Selecting the best genome assembly from a collection of draft assemblies for the same species remains a difficult task. Here, we combine new and existing approaches to help address this, using the non-model plant Eucalyptus pauciflora (snow gum) as a test case. Eucalyptus pauciflora is a long-lived tree with high economic and ecological importance. Currently, little genomic information for Eucalyptus pauciflora is available. Findings We generated high coverage of long- (Nanopore, 174×) and short- (Illumina, 228×) read data from a single Eucalyptus pauciflora individual and compared assemblies from four assemblers with a variety of settings: Canu, Flye, Marvel, and MaSuRCA. A key component of our approach is to keep a randomly selected ~10% of both the long and short reads separate from the assemblies, as a validation set with which to assess the assemblies. Using this validation set along with a range of existing tools, we compared the assemblies in eight ways: contig N50, BUSCO scores, LAI scores, assembly ploidy, base-level error rate, CGAL (computing genome assembly likelihoods) scores, structural variation, and genome sequence similarity. Our results showed that MaSuRCA generated the best assembly, which is 594.87 Mb in size, with a contig N50 of 3.23 Mb and an estimated error rate of ~0.006 errors per base. Conclusions We report a draft genome of Eucalyptus pauciflora, which will be a valuable resource for further genomic studies of eucalypts. These approaches for assessing and comparing genomes should help in assessing and choosing among many potential genome assemblies for a single species.


2018 ◽  
Vol 6 (20) ◽  
Author(s):  
Narendra Meena ◽  
M. Vasundhara ◽  
M. Sudhakara Reddy ◽  
Prashanth Suravajhala ◽  
U. S. Raghavender ◽  
...  

ABSTRACT Here, we report the draft de novo genome sequence assembly of Fusarium tricinctum (strain T6), sequenced with Ion Torrent chemistry using an Ion 530 chip and ExT kit. The assembly totals 42,732,204 bp and was generated from 6.62 Gb of sequence data with a median read length of 386 bp.


Author(s):  
Guangtu Gao ◽  
Susana Magadan ◽  
Geoffrey C Waldbieser ◽  
Ramey C Youngblood ◽  
Paul A Wheeler ◽  
...  

Abstract There is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that represent the genetic diversity of this species. The Arlee doubled haploid line originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the de novo assembly of the Arlee line genome from high-coverage PacBio long-read sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2N = 64). It is composed of 938 scaffolds with an N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% is in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout, the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype, the haploid chromosome number is 32 because chromosomes Omy04, 14, and 25 are divided into six acrocentric chromosomes. Additional structural variations identified in the Arlee genome include the major inversions on chromosomes Omy05 and Omy20 and 15 additional smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.


2020 ◽  
Vol 12 (2) ◽  
pp. 3917-3925
Author(s):  
Greer A Dolby ◽  
Matheo Morales ◽  
Timothy H Webster ◽  
Dale F DeNardo ◽  
Melissa A Wilson ◽  
...  

Abstract Toll-like receptors (TLRs) are a complex family of innate immune genes that are well characterized in mammals and birds but less well understood in nonavian sauropsids (reptiles). The advent of highly contiguous draft genomes of nonmodel organisms enables study of such gene families through analysis of synteny and sequence identity. Here, we analyze TLR genes from the genomes of 22 tetrapod species. Findings reveal a TLR8 gene expansion in crocodilians and turtles (TLR8B), and a second duplication (TLR8C) specifically within turtles, followed by pseudogenization of that gene in the nonfreshwater species (desert tortoise and green sea turtle). Additionally, the Mojave desert tortoise (Gopherus agassizii) has a stop codon in TLR8B (TLR8-1) that is polymorphic among conspecifics. Revised orthology further reveals a new TLR homolog, TLR21-like, which is exclusive to lizards, snakes, turtles, and crocodilians. These analyses were made possible by a new draft genome assembly of the desert tortoise (gopAga2.0), which used chromatin-based assembly to yield draft chromosomal scaffolds (L50 = 26 scaffolds, N50 = 28.36 Mb, longest scaffold = 107 Mb) and an enhanced de novo genome annotation with 25,469 genes. Our three-step approach to orthology curation and comparative analysis of TLR genes shows what new insights are possible using genome assemblies with chromosome-scale scaffolds that permit integration of synteny conservation data.
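The gopAga2.0 statistics above (L50 = 26 scaffolds, N50 = 28.36 Mb) come from a single cumulative-length calculation. Contig N50 is also the first metric in the Eucalyptus comparison. Here is a minimal Python sketch of that calculation. It uses the common "at least half of the total length" convention, and some tools differ slightly at the boundary. The function name `n50_l50` is illustrative.

```python
from typing import Sequence, Tuple


def n50_l50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig or scaffold lengths.

    Sort lengths from longest to shortest and accumulate them. N50 is the
    length at which the running total first reaches at least half of the
    assembly length. L50 is how many sequences were needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length, count
    raise AssertionError("unreachable: the running total always reaches the total")


if __name__ == "__main__":
    print(n50_l50([80, 70, 50, 40, 30, 20, 10]))  # (70, 2)
```

In the toy example the total is 300. The two longest sequences (80 + 70 = 150) already cover half of it, so N50 is 70 and L50 is 2. A larger N50 paired with a smaller L50 indicates a more contiguous assembly.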


2016 ◽  
Author(s):  
Taruna Aggarwal ◽  
Anthony Westbrook ◽  
Kirk Broders ◽  
Keith Woeste ◽  
Matthew D MacManes

Geosmithia morbida is a filamentous ascomycete that causes Thousand Cankers Disease in the eastern black walnut tree. This pathogen is commonly found in the western U.S.; however, the disease has recently been detected in several eastern states where the black walnut lumber industry is concentrated. G. morbida is one of two known phytopathogens within the genus Geosmithia, and it is vectored into the host tree by the walnut twig beetle. We present the first de novo draft genome of G. morbida. It is 26.5 Mbp in length and contains less than 1% repetitive elements. The genome possesses an estimated 6,273 genes, 277 of which are predicted to encode proteins with unknown functions. Approximately 31.5% of the proteins in G. morbida are homologous to proteins involved in pathogenicity, and 5.6% of the proteins contain signal peptides indicating that they are secreted. Several studies have investigated the evolution of pathogenicity in pathogens of agricultural crops, but forest fungal pathogens are often neglected because research efforts are focused on food crops. G. morbida is one of the few tree phytopathogens to be sequenced, assembled, and annotated. The first draft genome of G. morbida serves as a valuable tool for understanding the molecular and evolutionary mechanisms underlying pathogenesis within the genus Geosmithia. Keywords: de novo genome assembly, pathogenesis, forest pathogen, black walnut, walnut twig beetle.


2018 ◽  
Author(s):  
Sébastien Renaut ◽  
Davide Guerra ◽  
Walter R. Hoeh ◽  
Donald T. Stewart ◽  
Arthur E. Bogan ◽  
...  

Abstract Freshwater mussels (Bivalvia: Unionida) serve an important role as aquatic ecosystem engineers but are one of the most critically imperilled groups of animals. Here, we used a combination of sequencing strategies to assemble and annotate a draft genome of Venustaconcha ellipsiformis. This genome will serve as a valuable genomic resource given the ecological value of freshwater mussels and their unique "doubly uniparental inheritance" mode of mitochondrial DNA transmission. The genome described here was obtained by combining high-coverage short reads (65X genome coverage of Illumina paired-end and 11X genome coverage of mate-pair sequences) with low-coverage Pacific Biosciences long reads (0.3X genome coverage). The final scaffold assembly has a total size of 1.54 Gb (366,926 scaffolds, N50 = 6.5 Kb, with 2.3% "N" nucleotides), representing 86% of the predicted genome size of 1.80 Gb. Over one third (37.5%) of the genome consists of repeated elements, and more than 85% of the core eukaryotic genes were recovered. Given the repeated genetic bottlenecks of V. ellipsiformis populations as a result of glaciation events, heterozygosity was also found to be remarkably low (0.6%), in contrast to most other sequenced bivalve species. Finally, we reassembled the full mitochondrial genome and found six polymorphic sites with respect to the previously published reference. This resource opens the way to comparative genomics studies to identify genes related to the unique adaptations of freshwater mussels and their distinctive mitochondrial inheritance mechanism.
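The V. ellipsiformis summary figures are simple ratios over the scaffold sequences: the percentage of "N" nucleotides and the assembled size as a percentage of the predicted genome size. For example, 1.54 Gb / 1.80 Gb ≈ 0.856, which gives the reported 86%. Here is a minimal Python sketch, assuming the scaffolds are already loaded as plain strings (FASTA parsing is omitted). The function name `assembly_summary` and its output keys are illustrative.

```python
from typing import Dict, Iterable


def assembly_summary(scaffolds: Iterable[str], predicted_size_bp: float) -> Dict[str, float]:
    """Hypothetical helper: basic size statistics for an assembly.

    `scaffolds` is an iterable of sequence strings. `predicted_size_bp` is an
    independent genome-size estimate (e.g., from k-mer analysis or flow cytometry).
    """
    if predicted_size_bp <= 0:
        raise ValueError("predicted_size_bp must be positive")
    count = 0
    total = 0
    n_bases = 0
    for seq in scaffolds:
        count += 1
        total += len(seq)
        n_bases += seq.upper().count("N")  # gap/unknown bases, case-insensitive
    if total == 0:
        raise ValueError("assembly is empty")
    return {
        "scaffolds": count,
        "total_bp": total,
        "percent_N": 100.0 * n_bases / total,
        "percent_of_predicted": 100.0 * total / predicted_size_bp,
    }


if __name__ == "__main__":
    toy = ["ACGTNNACGT", "GGGGCCCC"]
    print(assembly_summary(toy, predicted_size_bp=20))
```

Streaming over the sequences keeps memory use flat, which matters for a 1.54 Gb assembly split into hundreds of thousands of scaffolds.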


Gigabyte ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Jin Yu ◽  
Linzhou Li ◽  
Sibo Wang ◽  
Shanshan Dong ◽  
Ziqiang Chen ◽  
...  

Mosses comprise one of three lineages forming a sister group to extant vascular plants. Having emerged from an early split in the diversification of embryophytes, mosses may offer complementary insights into the evolution of traits following the transition to, and colonization of, land. Here, we report the draft nuclear genome of Fontinalis antipyretica (Fontinalaceae, Hypnales), a charismatic aquatic moss that is widespread in temperate regions of the Northern Hemisphere. We sequenced and de novo-assembled its genome using the 10X Genomics method. The genome comprises 385.2 Mbp, with a scaffold N50 of 45.8 Kbp. The assembly captured 87.2% of the 430 genes in the BUSCO Viridiplantae odb10 dataset. The newly generated F. antipyretica genome is the third moss genome, and the second seedless aquatic plant genome, to be sequenced and assembled to date.


Gigabyte ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Julia Voelker ◽  
Mervyn Shepherd ◽  
Ramil Mauleon

The economically important Melaleuca alternifolia (tea tree) is the source of a terpene-rich essential oil with therapeutic and cosmetic uses around the world. Tea tree has been cultivated and bred in Australia since the 1990s. It has been extensively studied for the genetics and biochemistry of terpene biosynthesis. Here, we report a high-quality de novo genome assembly using Pacific Biosciences and Illumina sequencing. The genome was assembled into 3128 scaffolds with a total length of 362 Mb (N50 = 1.9 Mb), with significantly higher contiguity than a previous assembly (N50 = 8.7 Kb). Using a homology-based, RNA-seq evidence-based, and ab initio prediction approach, 37,226 protein-coding genes were predicted. Genome assembly and annotation exhibited high completeness scores of 98.1% and 89.4%, respectively. Sequence contiguity was sufficient to reveal extensive gene order conservation and chromosomal rearrangements in alignments with Eucalyptus grandis and Corymbia citriodora genomes. This new genome advances currently available resources to investigate the genome structure and gene family evolution of M. alternifolia. It will enable further comparative genomic studies in Myrtaceae to elucidate the genetic foundations of economically valuable traits in this crop.


DNA Research ◽  
2019 ◽  
Vol 26 (5) ◽  
pp. 423-431 ◽  
Author(s):  
Deyou Qiu ◽  
Shenglong Bai ◽  
Jianchao Ma ◽  
Lisha Zhang ◽  
Fenjuan Shao ◽  
...  

Abstract Poplar 84K (Populus alba x P. tremula var. glandulosa) is a fast-growing poplar hybrid. Originating in South Korea, this hybrid has been extensively cultivated in northern China. Because of the economic and ecological importance of this hybrid and its high transformability, we report the de novo sequencing and assembly of a male individual of poplar 84K using PacBio and Hi-C technologies. The final reference nuclear genome (747.5 Mb) has a contig N50 size of 1.99 Mb and a scaffold N50 size of 19.6 Mb. Complete chloroplast and mitochondrial genomes were also assembled from the sequencing data. Based on similarities to the genomes of P. alba var. pyramidalis and P. tremula, we were able to identify two subgenomes, representing 356 Mb from P. alba (subgenome A) and 354 Mb from P. tremula var. glandulosa (subgenome G). The phased assembly allowed us to detect transcriptional bias between the two subgenomes, and we found that the subgenome from P. tremula displayed dominant expression in both 84K and another widely used hybrid, P. tremula x P. alba. This high-quality poplar 84K genome will be a valuable resource for poplar breeding and for molecular biology studies.


GigaScience ◽  
2019 ◽  
Vol 8 (9) ◽  
Author(s):  
Yu Xing ◽  
Yang Liu ◽  
Qing Zhang ◽  
Xinghua Nie ◽  
Yamin Sun ◽  
...  

Abstract Background The Chinese chestnut (Castanea mollissima) is widely cultivated in China for nut production. This plant also plays an important ecological role in afforestation and ecosystem services. To facilitate and expand the use of C. mollissima for breeding and its genetic improvement, we report here the whole-genome sequence of C. mollissima. Findings We produced a high-quality assembly of the C. mollissima genome using Pacific Biosciences single-molecule sequencing. The final draft genome is ∼785.53 Mb long, with a contig N50 size of 944 kb, and we further annotated 36,479 protein-coding genes in the genome. Phylogenetic analysis showed that C. mollissima diverged from Quercus robur, a member of the Fagaceae family, ∼13.62 million years ago. Conclusions The high-quality whole-genome assembly of C. mollissima will be a valuable resource for further genetic improvement and breeding for disease resistance and nut quality.

