De Novo Genome Assembly of the Meadow Brown Butterfly, Maniola jurtina

Meadow brown butterflies (Maniola jurtina) on the Isles of Scilly represent an ideal model in which to dissect the links between genotype, phenotype and long-term patterns of selection in the wild - a largely unfulfilled but fundamental aim of modern biology. To meet this aim, a clear description of genotype is required. Here we present the draft genome sequence of M. jurtina to serve as a founding genetic resource for this species. Seven libraries were constructed using pooled DNA from five wild caught spotted females and sequenced using Illumina, PacBio RSII and MinION technology. A novel hybrid assembly approach was employed to generate a final assembly with an N50 of 214 kb (longest scaffold 2.9 Mb). The sequence assembly described here predicts a gene count of 36,294 and includes variants and gene duplicates from five genotypes. Core BUSCO (Benchmarking Universal Single-Copy Orthologs) gene sets of Arthropoda and Insecta recovered 90.5% and 88.7% complete and single-copy genes respectively. Comparisons with 17 other Lepidopteran species placed 86.5% of the assembled genes in orthogroups. Our results provide the first high-quality draft genome and annotation of the butterfly M. jurtina.


De novo genome assembly of the meadow brown butterfly, Maniola jurtina

10.1101/715243 ◽

2019 ◽

Author(s):

Kumar Saurabh Singh ◽

David J. Hosken ◽

Nina Wedell ◽

Richard ffrench-Constant ◽

Chris Bass ◽

...

Keyword(s):

De Novo ◽

Draft Genome ◽

Single Copy ◽

De Novo Genome Assembly ◽

Modern Biology ◽

Final Assembly ◽

Gene Sets ◽

In The Wild ◽

Maniola Jurtina

Abstract. Background: Meadow brown butterflies (Maniola jurtina) on the Isles of Scilly represent an ideal model in which to dissect the links between genotype, phenotype and long-term patterns of selection in the wild - a largely unfulfilled but fundamental aim of modern biology. To meet this aim, a clear description of genotype is required. Findings: Here we present the draft genome sequence of M. jurtina to serve as an initial genetic resource for this species. Seven libraries were constructed using DNA from multiple wild-caught females and sequenced using Illumina, PacBio RSII and MinION technology. A novel hybrid assembly approach was employed to generate a final assembly with an N50 of 214 kb (longest scaffold 2.9 Mb). The genome encodes a total of 36,294 genes. 90.3% and 88.7% of core BUSCO (Benchmarking Universal Single-Copy Orthologs) Arthropoda and Insecta gene sets were recovered as complete single-copies from this assembly. Comparisons with 17 other Lepidopteran species placed 86.5% of the assembled genes in orthogroups. Conclusions: Our results provide the first high-quality draft genome and annotation of the butterfly M. jurtina.


De Novo Sequencing and Hybrid Assembly of the Biofuel Crop Jatropha curcas L.: Identification of Quantitative Trait Loci for Geminivirus Resistance

Genes ◽

10.3390/genes10010069 ◽

2019 ◽

Vol 10 (1) ◽

pp. 69 ◽

Cited By ~ 9

Author(s):

Nagesh Kancharla ◽

Saakshi Jalali ◽

J. Narasimham ◽

Vinod Nair ◽

Vijay Yepuri ◽

...

Keyword(s):

Ssr Markers ◽

Genome Assembly ◽

Jatropha Curcas ◽

Quantitative Trait ◽

De Novo ◽

Mapping Population ◽

Single Copy ◽

Sequencing Data ◽

De Novo Genome Assembly ◽

Sequencing Technologies

Jatropha curcas is an important perennial, drought-tolerant plant that has been identified as a potential biodiesel crop. We report here the hybrid de novo genome assembly of J. curcas generated using Illumina and PacBio sequencing technologies, and the identification of quantitative trait loci for Jatropha Mosaic Virus (JMV) resistance. In this study, we generated scaffolds of 265.7 Mbp in length, which correspond to 84.8% of the gene space, using Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis. Additionally, 96.4% of predicted protein-coding genes were captured in RNA sequencing data, which reconfirms the accuracy of the assembled genome. The genome was utilized to identify 12,103 dinucleotide simple sequence repeat (SSR) markers, which were exploited in genetic diversity analysis to identify genetically distinct lines. A total of 207 polymorphic SSR markers were employed to construct a genetic linkage map for JMV resistance, using an interspecific F2 mapping population involving susceptible J. curcas and resistant Jatropha integerrima as parents. Quantitative trait locus (QTL) analysis led to the identification of three minor QTLs for JMV resistance, and these were validated in an alternate F2 mapping population. These validated QTLs were utilized in marker-assisted breeding for JMV resistance. Comparative genomics of oil-producing genes across selected oil-producing species revealed 27 conserved genes and 2986 orthologous protein clusters in Jatropha. This reference genome assembly gives insight into the complex genetic structure of Jatropha, and serves as a source for the development of agronomically improved virus-resistant and oil-producing lines.
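As a rough illustration of what scanning an assembly for dinucleotide SSRs involves, the sketch below finds perfect tandem repeats of a two-base motif with a regular expression. It is not the tool or the thresholds used in the study, which the abstract does not name; the function name `find_dinucleotide_ssrs` and the `min_repeats` default are assumptions made for this example.

```python
import re

def find_dinucleotide_ssrs(seq, min_repeats=6):
    """Return (start, motif, repeat_count) for each perfect dinucleotide SSR.

    Hypothetical helper: min_repeats is an illustrative threshold, not a
    value taken from the abstract.
    """
    # Group 1 is the two-base motif; the lookahead rejects homopolymer
    # motifs such as "AA", which are mononucleotide repeats.
    pattern = re.compile(
        r"(([ACGT])(?!\2)[ACGT])\1{" + str(min_repeats - 1) + ",}"
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // 2
        hits.append((match.start(), motif, repeats))
    return hits
```

Candidate loci found this way would still need flanking primers and polymorphism screening before they could serve as markers.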


De Novo Assembly of the Northern Cardinal (Cardinalis cardinalis) Genome Reveals Candidate Regulatory Regions for Sexually Dichromatic Red Plumage Coloration

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401373 ◽

2020 ◽

Vol 10 (10) ◽

pp. 3541-3548

Author(s):

Simon Yung Wa Sin ◽

Lily Lu ◽

Scott V. Edwards

Keyword(s):

De Novo ◽

Draft Genome ◽

Single Copy ◽

Genomic Region ◽

Plumage Coloration ◽

Sexual Dichromatism ◽

Effective Population ◽

Cardinalis Cardinalis ◽

Northern Cardinal ◽

Genomic Studies

Northern cardinals (Cardinalis cardinalis) are common, mid-sized passerines widely distributed in North America. As an iconic species with strong sexual dichromatism, it has been the focus of extensive ecological and evolutionary research, yet genomic studies investigating the evolution of genotype–phenotype association of plumage coloration and dichromatism are lacking. Here we present a new, highly-contiguous assembly for C. cardinalis. We generated a 1.1 Gb assembly comprised of 4,762 scaffolds, with a scaffold N50 of 3.6 Mb, a contig N50 of 114.4 kb and a longest scaffold of 19.7 Mb. We identified 93.5% complete and single-copy orthologs from an Aves dataset using BUSCO, demonstrating high completeness of the genome assembly. We annotated the genomic region comprising the CYP2J19 gene, which plays a pivotal role in the red coloration in birds. Comparative analyses demonstrated non-exonic regions unique to the CYP2J19 gene in passerines and a long insertion upstream of the gene in C. cardinalis. Transcription factor binding motifs discovered in the unique insertion region in C. cardinalis suggest potential androgen-regulated mechanisms underlying sexual dichromatism. Pairwise Sequential Markovian Coalescent (PSMC) analysis of the genome reveals fluctuations in historic effective population size between 100,000–250,000 in the last 2 million years, with declines concordant with the beginning of the Pleistocene epoch and Last Glacial Period. This draft genome of C. cardinalis provides an important resource for future studies of ecological, evolutionary, and functional genomics in cardinals and other birds.
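Scaffold and contig N50 values recur throughout these abstracts. As a reminder of what the statistic means, here is a minimal sketch of the standard definition: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` is chosen for this example only.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")
```

The same function gives a scaffold N50 when passed scaffold lengths and a contig N50 when passed contig lengths, which is why the two figures reported for one assembly can differ widely.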


Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C

10.1101/2021.06.02.444084 ◽

2021 ◽

Author(s):

Stephanie H Chen ◽

Maurizio Rossetto ◽

Marlien van der Merwe ◽

Patricia Lu-Irving ◽

Jia-Yee S Yap ◽

...

Keyword(s):

Genome Size ◽

De Novo ◽

New South ◽

New South Wales ◽

Single Copy ◽

Size Estimation ◽

De Novo Genome Assembly ◽

South Wales ◽

Long Reads ◽

Chromosome Level

Background: Telopea speciosissima, the New South Wales waratah, is an Australian endemic woody shrub in the family Proteaceae. Waratahs have great potential as a model clade to better understand processes of speciation, introgression and adaptation, and are significant from a horticultural perspective. Findings: Here, we report the first chromosome-level reference genome for T. speciosissima. Combining Oxford Nanopore long-reads, 10x Genomics Chromium linked-reads and Hi-C data, the assembly spans 823 Mb (scaffold N50 of 69.0 Mb) with 91.2 % of Embryophyta BUSCOs complete. We introduce a new method in Diploidocus (https://github.com/slimsuite/diploidocus) for classifying, curating and QC-filtering assembly scaffolds. We also present a new tool, DepthSizer (https://github.com/slimsuite/depthsizer), for genome size estimation from the read depth of single copy orthologues and find that the assembly is 93.9 % of the estimated genome size. The largest 11 scaffolds contained 94.1 % of the assembly, conforming to the expected number of chromosomes (2n = 22). Genome annotation predicted 40,158 protein-coding genes, 351 rRNAs and 728 tRNAs. Our results indicate that the waratah genome is highly repetitive, with a repeat content of 62.3 %. Conclusions: The T. speciosissima genome (Tspe_v1) will accelerate waratah evolutionary genomics and facilitate marker assisted approaches for breeding. Broadly, it represents an important new genomic resource of Proteaceae to support the conservation of flora in Australia and further afield.
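The idea behind depth-based size estimation can be sketched in a few lines: if single-copy genes are sequenced at depth d, then the total number of sequenced bases divided by d approximates the genome size. The code below is a simplified illustration of that principle and not DepthSizer's implementation; the function names and the use of the median as the depth summary are assumptions. The second helper simply back-calculates the estimate implied by the figures in the abstract (823 Mb being 93.9 % of the estimated size).

```python
from statistics import median

def estimate_genome_size(total_read_bases, single_copy_depths):
    """Estimate genome size (bp) as total sequenced bases / single-copy depth."""
    if not single_copy_depths:
        raise ValueError("need at least one depth value")
    # Assumption: the median summarises single-copy depth; a real tool
    # may use a different estimator.
    depth = median(single_copy_depths)
    if depth <= 0:
        raise ValueError("single-copy depth must be positive")
    return total_read_bases / depth

def implied_genome_size(assembly_size, fraction_of_estimate):
    """Back-calculate the estimated size from an assembly size and its fraction."""
    return assembly_size / fraction_of_estimate
```

With the abstract's numbers, `implied_genome_size(823, 0.939)` gives roughly 876 Mb as the estimated genome size.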


De novo genome assembly of Geosmithia morbida, the causal agent of thousand cankers disease

10.7287/peerj.preprints.1671v1 ◽

2016 ◽

Author(s):

Taruna Aggarwal ◽

Anthony Westbrook ◽

Kirk Broders ◽

Keith Woeste ◽

Matthew D MacManes

Keyword(s):

Genome Assembly ◽

Fungal Pathogens ◽

De Novo ◽

Draft Genome ◽

Black Walnut ◽

De Novo Genome Assembly ◽

Evolutionary Mechanisms ◽

Geosmithia Morbida ◽

Thousand Cankers Disease ◽

Walnut Twig Beetle

Geosmithia morbida is a filamentous ascomycete that causes Thousand Cankers Disease in the eastern black walnut tree. This pathogen is commonly found in the western U.S.; however, recently the disease was also detected in several eastern states where the black walnut lumber industry is concentrated. G. morbida is one of two known phytopathogens within the genus Geosmithia, and it is vectored into the host tree via the walnut twig beetle. We present the first de novo draft genome of G. morbida. It is 26.5 Mbp in length and contains less than 1% repetitive elements. The genome possesses an estimated 6,273 genes, 277 of which are predicted to encode proteins with unknown functions. Approximately 31.5% of the proteins in G. morbida are homologous to proteins involved in pathogenicity, and 5.6% of the proteins contain signal peptides that indicate these proteins are secreted. Several studies have investigated the evolution of pathogenicity in pathogens of agricultural crops; forest fungal pathogens are often neglected because research efforts are focused on food crops. G. morbida is one of the few tree phytopathogens to be sequenced, assembled and annotated. The first draft genome of G. morbida serves as a valuable tool for comprehending the underlying molecular and evolutionary mechanisms behind pathogenesis within the Geosmithia genus. Keywords: de novo genome assembly, pathogenesis, forest pathogen, black walnut, walnut twig beetle.


Insights from the first genome assembly of Onion (Allium cepa)

10.1101/2021.03.05.434149 ◽

2021 ◽

Author(s):

Richard Finkers ◽

Martijn P.W. van Kaauwen ◽

Kai Ament ◽

Karin Burger-Meijer ◽

Raymond J. Egging ◽

...

Keyword(s):

Ab Initio ◽

Genetic Linkage ◽

De Novo ◽

Gene Prediction ◽

Draft Genome ◽

Linkage Maps ◽

Vegetable Crop ◽

Putative Gene ◽

Final Assembly ◽

Genetic Linkage Maps

Onion is an important vegetable crop with an estimated genome size of 16 Gb. We describe the de novo assembly and ab initio annotation of the genome of a doubled haploid onion line DHCU066619, which resulted in a final assembly of 14.9 Gb with an N50 of 461 kb, of which 2.2 Gb was ordered into 8 pseudomolecules using five genetic linkage maps. The remainder of the genome is available in 89.8 K scaffolds. Analysis of this genome shows that at least 72.4% of the genome is repetitive and consists, to a large extent, of (retro)transposons. Many (retro)transposons were already quite old, as they had accumulated many mutations, which facilitated their assembly but hampered their identification. The draft ab initio gene prediction indicated 540,925 putative gene models, which is far more than expected, possibly due to the presence of pseudogenes. Of these, 86,073 models showed similarity to published proteins (UNIPROT). No gene-rich regions were found; genes are uniformly distributed over the genome. Analysis of synteny with A. sativum (garlic) showed collinearity but also major rearrangements between both species. Notwithstanding, this assembly is the first high-quality draft genome sequence available for the study of onion and will be a valuable resource for further research.


A de novo assembly of the sweet cherry (Prunus avium cv. Tieton) genome using linked-read sequencing technology

PeerJ ◽

10.7717/peerj.9114 ◽

2020 ◽

Vol 8 ◽

pp. e9114 ◽

Cited By ~ 1

Author(s):

Jiawei Wang ◽

Weizhen Liu ◽

Dongzi Zhu ◽

Xiang Zhou ◽

Po Hong ◽

...

Keyword(s):

Sweet Cherry ◽

Prunus Avium ◽

Reference Genome ◽

De Novo ◽

Draft Genome ◽

Single Copy ◽

Sequencing Data ◽

Sequencing Technology ◽

High Quality ◽

Eukaryotic Genes

The sweet cherry (Prunus avium) is one of the most economically important fruit species in the world. However, there is a limited amount of genetic information available for this species, which hinders breeding efforts at a molecular level. We were able to describe a high-quality reference genome assembly and annotation of the diploid sweet cherry (2n = 2x = 16) cv. Tieton using linked-read sequencing technology. We generated over 750 million clean reads, representing 112.63 GB of raw sequencing data. The Supernova assembler produced a more highly-ordered and continuous genome sequence than the current P. avium draft genome, with a contig N50 of 63.65 KB and a scaffold N50 of 2.48 MB. The final scaffold assembly was 280.33 MB in length, representing 82.12% of the estimated Tieton genome. Eight chromosome-scale pseudomolecules were constructed, completing a 214 MB sequence of the final scaffold assembly. De novo, homology-based, and RNA-seq methods were used together to predict 30,975 protein-coding loci. 98.39% of core eukaryotic genes and 97.43% of single copy orthologues were identified in the embryo plant, indicating the completeness of the assembly. Linked-read sequencing technology was effective in constructing a high-quality reference genome of the sweet cherry, which will benefit the molecular breeding and cultivar identification in this species.


High contiguity de novo genome assembly and DNA modification analyses for the fungus fly, Sciara coprophila, using single-molecule sequencing

BMC Genomics ◽

10.1186/s12864-021-07926-2 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

John M. Urban ◽

Michael S. Foulk ◽

Jacob E. Bliss ◽

C. Michelle Coleman ◽

Nanyan Lu ◽

...

Keyword(s):

Genome Sequence ◽

Single Molecule ◽

De Novo ◽

Bacterial Genome ◽

Draft Genome ◽

Dna Amplification ◽

Chromosome Elimination ◽

Paternal Chromosome ◽

De Novo Genome Assembly ◽

Long Read

Abstract. Background: The lower Dipteran fungus fly, Sciara coprophila, has many unique biological features that challenge the rule of genome DNA constancy. For example, Sciara undergoes paternal chromosome elimination and maternal X chromosome nondisjunction during spermatogenesis, paternal X elimination during embryogenesis, intrachromosomal DNA amplification of DNA puff loci during larval development, and germline-limited chromosome elimination from all somatic cells. Paternal chromosome elimination in Sciara was the first observation of imprinting, though the mechanism remains a mystery. Here, we present the first draft genome sequence for Sciara coprophila to take a large step forward in addressing these features. Results: We assembled the Sciara genome using PacBio, Nanopore, and Illumina sequencing. To find an optimal assembly using these datasets, we generated 44 short-read and 50 long-read assemblies. We ranked assemblies using 27 metrics assessing contiguity, gene content, and dataset concordance. The highest-ranking assemblies were scaffolded using BioNano optical maps. RNA-seq datasets from multiple life stages and both sexes facilitated genome annotation. A set of 66 metrics was used to select the first draft assembly for Sciara. Nearly half of the Sciara genome sequence was anchored into chromosomes, and all scaffolds were classified as X-linked or autosomal by coverage. Conclusions: We determined that X-linked genes in Sciara males undergo dosage compensation. An entire bacterial genome from the Rickettsia genus, a group known to be endosymbionts in insects, was co-assembled with the Sciara genome, opening the possibility that Rickettsia may function in sex determination in Sciara. Finally, the signal level of the PacBio and Nanopore data support the presence of cytosine and adenine modifications in the Sciara genome, consistent with a possible role in imprinting.


Solyntus, the New Highly Contiguous Reference Genome for Potato (Solanum tuberosum)

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401550 ◽

2020 ◽

Vol 10 (10) ◽

pp. 3489-3495

Author(s):

Natascha van Lieshout ◽

Ate van der Burgt ◽

Michiel E. de Vries ◽

Menno ter Maat ◽

David Eickholt ◽

...

Keyword(s):

Solanum Tuberosum ◽

Reference Genome ◽

De Novo ◽

Draft Genome ◽

Single Copy ◽

Rapid Expansion ◽

Potato Genome ◽

Homozygous Diploid ◽

Gene Orthologs ◽

Reference Genomes

With the rapid expansion of the application of genomics and sequencing in plant breeding, there is a constant drive for better reference genomes. In potato (Solanum tuberosum), the third largest food crop in the world, the related species S. phureja, designated “DM”, has been used as the most popular reference genome for the last 10 years. Here, we introduce the de novo sequenced genome of Solyntus as the next standard reference in potato genome studies. A true Solanum tuberosum made up of 116 contigs that is also highly homozygous, diploid, vigorous and self-compatible, Solyntus provides a more direct and contiguous reference than ever before available. It was constructed by sequencing with state-of-the-art long and short read technology and assembled with Canu. The 116 contigs were assembled into scaffolds to form each pseudochromosome, with three contigs to 17 contigs per chromosome. This assembly contains 93.7% of the single-copy gene orthologs from the Solanaceae set and has an N50 of 63.7 Mbp. The genome and related files can be found at https://www.plantbreeding.wur.nl/Solyntus/. With the release of this research line and its draft genome we anticipate many exciting developments in (diploid) potato research.


LongStitch: High-quality genome assembly correction and scaffolding using long reads

10.1101/2021.06.17.448848 ◽

2021 ◽

Author(s):

Lauren Coombe ◽

Janet X Li ◽

Theodora Lo ◽

Johnathan Wong ◽

Vladimir Nikolic ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Draft Genome ◽

Model Organisms ◽

High Quality ◽

De Novo Genome Assembly ◽

Long Reads ◽

Long Read ◽

Genomic Regions ◽

Genome Assemblies

Background: Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. Results: LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 2.0-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently runs in under five hours using less than 23GB of RAM. Conclusions: Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
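For readers unfamiliar with minimizers, the sketch below shows the general technique: slide a window of w consecutive k-mers along a sequence and keep the smallest k-mer in each window. Sequences that overlap tend to share minimizers, so a long read and two contigs with minimizers in common suggest a possible join. This is a textbook illustration rather than ntLink's code; it orders k-mers lexicographically where real tools typically order by hash value, and the function name and parameters are assumptions.

```python
def minimizers(seq, k, w):
    """Return sorted (position, kmer) minimizers of seq.

    For every window of w consecutive k-mers, the lexicographically
    smallest k-mer is selected (leftmost on ties). Lexicographic order
    is an assumption made for readability.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    chosen = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmers[i], i))
        chosen.add((best, kmers[best]))
    return sorted(chosen)
```

Because only a subset of k-mers is stored and compared, this kind of sketching keeps memory and run time low, which is consistent with the "lightweight" description in the abstract.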
