De novo genome assembly of the land snail Candidula unifasciata (Mollusca: Gastropoda)

Author(s):  
Luis J Chueca ◽  
Tilman Schell ◽  
Markus Pfenninger

Abstract Among all molluscs, land snails are a scientifically and economically interesting group comprising edible species, alien species and agricultural pests. Yet, despite their high diversity, publicly available genome drafts are still scarce. Here, we present the draft genome assembly of the land snail Candidula unifasciata, a species widely distributed across central Europe that belongs to the family Geomitridae, a highly diversified taxon in the Western-Palearctic region. We performed whole genome sequencing, assembly and annotation of an adult specimen based on PacBio and Oxford Nanopore long read sequences as well as Illumina data. A genome draft of about 1.29 Gb was generated with an N50 length of 246 kb. More than 60% of the assembled genome was identified as repetitive elements. A total of 22,464 protein-coding genes were identified in the genome, of which 62.27% were functionally annotated. This is the first assembled and annotated genome for a geomitrid snail and will serve as reference for further evolutionary, genomic and population genetic studies of this important and interesting group.

2021 ◽  
Author(s):  
Luis J. Chueca ◽  
Tilman Schell ◽  
Markus Pfenninger

Abstract Among all molluscs, land snails are an economically and scientifically interesting group comprising edible species, alien species and agricultural pests. Yet, despite their high diversity, publicly available whole genomes are still scarce. Here, we present the draft genome assembly of the land snail Candidula unifasciata, a species widely distributed across central Europe that belongs to the family Geomitridae, a group highly diversified in the Western-Palearctic region. We performed whole genome sequencing, assembly and annotation of an adult specimen based on PacBio and Oxford Nanopore long read sequences as well as Illumina data. A genome of about 1.29 Gb was generated with an N50 length of 246 kb. More than 60% of the assembled genome was identified as repetitive elements, and 22,464 protein-coding genes were identified in the genome, of which 62.27% were functionally annotated. This is the first assembled and annotated genome for a geomitrid snail and will serve as reference for further evolutionary, genomic and population genetic studies of this important and interesting group.


2020 ◽  
Author(s):  
Jan O. Engler ◽  
Yvonne Lawrie ◽  
Yannick Gansemans ◽  
Filip Van Nieuwerburgh ◽  
Alexander Suh ◽  
...  

Abstract The Taita White-eye (Zosterops silvanus) is an endangered songbird endemic to the Taita Hills of southern Kenya, where it is confined to small areas of fragmented forest. With diversification rates exceeding those reported in most other vertebrates, White-eyes are a prime example of a ‘great speciator’. Nevertheless, we still know surprisingly little about the genomic underpinnings leading to this extraordinarily fast radiation. Here, we present a draft genome assembly (ZSil_MB_1.0) for the Taita White-eye generated from a blood sample of a wild, female bird captured in the Taita Hills, Kenya. By performing a de novo assembly with linked-reads and annotation of the assembly with the MAKER pipeline, we generated a 1.069 Gb assembly with a scaffold N50 of 1.105 Mb and an L50 of 244. After quality evaluation of the assembly, we identified 92.1% of BUSCOs complete or fragmented, indicating that our de novo assembly is of high quality. This new assembly provides a genomic resource for future studies into the evolutionary and comparative genomics of this rapidly diversifying group of birds.
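
Several of the abstracts on this page report contiguity as N50 and L50. As a reading aid, the sketch below shows how these two statistics are conventionally computed from a list of contig or scaffold lengths: N50 is the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size, and L50 is the number of sequences needed to get there. The function name `n50_l50` and the toy lengths are illustrative assumptions and do not come from any of the listed papers.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk through the sequences from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first sequence that brings the running total to at least
        # half of the assembly size defines N50; its rank is L50.
        if running >= half_total:
            return length, count


# Hypothetical toy assembly (lengths in arbitrary units, total = 300).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)
```

Read this way, a scaffold N50 of 1.105 Mb with an L50 of 244 means that the 244 longest scaffolds together hold at least half of the assembled bases, and the shortest of those 244 is 1.105 Mb long.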


BioTechniques ◽  
2021 ◽  
Author(s):  
Janneke Aylward ◽  
Michael J Wingfield ◽  
Francois Roets ◽  
Brenda D Wingfield

Contamination in sequenced genomes is a relatively common problem and several methods to remove non-target sequences have been devised. Typically, the target and contaminating organisms reside in different kingdoms, simplifying their separation. The authors present the case of a genome for the ascomycete fungus Teratosphaeria eucalypti, contaminated by another ascomycete fungus and a bacterium. Approaching the problem as a low-complexity metagenomics project, the authors used two available software programs, BlobToolKit and anvi'o, to filter the contaminated genome. Both the de novo and reference-assisted approaches yielded a high-quality draft genome assembly for the target fungus. Incorporating reference sequences increased assembly completeness and visualization elucidated previously unknown genome features. The authors suggest that visualization should be routine in any sequencing project, regardless of suspected contamination.


2020 ◽  
Vol 12 (2) ◽  
pp. 3917-3925
Author(s):  
Greer A Dolby ◽  
Matheo Morales ◽  
Timothy H Webster ◽  
Dale F DeNardo ◽  
Melissa A Wilson ◽  
...  

Abstract Toll-like receptors (TLRs) are a complex family of innate immune genes that are well characterized in mammals and birds but less well understood in nonavian sauropsids (reptiles). The advent of highly contiguous draft genomes of nonmodel organisms enables study of such gene families through analysis of synteny and sequence identity. Here, we analyze TLR genes from the genomes of 22 tetrapod species. Findings reveal a TLR8 gene expansion in crocodilians and turtles (TLR8B), and a second duplication (TLR8C) specifically within turtles, followed by pseudogenization of that gene in the nonfreshwater species (desert tortoise and green sea turtle). Additionally, the Mojave desert tortoise (Gopherus agassizii) has a stop codon in TLR8B (TLR8-1) that is polymorphic among conspecifics. Revised orthology further reveals a new TLR homolog, TLR21-like, which is exclusive to lizards, snakes, turtles, and crocodilians. These analyses were made possible by a new draft genome assembly of the desert tortoise (gopAga2.0), which used chromatin-based assembly to yield draft chromosomal scaffolds (L50 = 26 scaffolds, N50 = 28.36 Mb, longest scaffold = 107 Mb) and an enhanced de novo genome annotation with 25,469 genes. Our three-step approach to orthology curation and comparative analysis of TLR genes shows what new insights are possible using genome assemblies with chromosome-scale scaffolds that permit integration of synteny conservation data.


2016 ◽  
Author(s):  
Taruna Aggarwal ◽  
Anthony Westbrook ◽  
Kirk Broders ◽  
Keith Woeste ◽  
Matthew D MacManes

Geosmithia morbida is a filamentous ascomycete that causes Thousand Cankers Disease in the eastern black walnut tree. This pathogen is commonly found in the western U.S.; however, recently the disease was also detected in several eastern states where the black walnut lumber industry is concentrated. G. morbida is one of two known phytopathogens within the genus Geosmithia, and it is vectored into the host tree via the walnut twig beetle. We present the first de novo draft genome of G. morbida. It is 26.5 Mbp in length and contains less than 1% repetitive elements. The genome possesses an estimated 6,273 genes, 277 of which are predicted to encode proteins with unknown functions. Approximately 31.5% of the proteins in G. morbida are homologous to proteins involved in pathogenicity, and 5.6% of the proteins contain signal peptides that indicate these proteins are secreted. Several studies have investigated the evolution of pathogenicity in pathogens of agricultural crops; forest fungal pathogens are often neglected because research efforts are focused on food crops. G. morbida is one of the few tree phytopathogens to be sequenced, assembled and annotated. The first draft genome of G. morbida serves as a valuable tool for comprehending the underlying molecular and evolutionary mechanisms behind pathogenesis within the Geosmithia genus. Keywords: de novo genome assembly, pathogenesis, forest pathogen, black walnut, walnut twig beetle.


Author(s):  
Xinhai Ye ◽  
Yi Yang ◽  
Zhaoyang Tian ◽  
Le Xu ◽  
Kaili Yu ◽  
...  

Abstract Sequencing and assembling a genome from a single individual has several advantages, such as lower heterozygosity and easier sample preparation. However, the amount of genomic DNA of some small sized organisms might not meet the standard DNA input requirement for current sequencing pipelines. Although a few studies have sequenced a single small insect with about 100 ng of DNA as input, it may still be challenging to obtain such an amount of DNA from a single individual of many small organisms. Here, we use 20 ng of DNA as input, and present a high-quality genome assembly for a single haploid male parasitoid wasp (Habrobracon hebetor) using Nanopore and Illumina. Because of the low input DNA, a whole genome amplification (WGA) method is used before sequencing. The assembled genome size is 131.6 Mb with a contig N50 of 1.63 Mb. A total of 99% of Benchmarking Universal Single-Copy Orthologs are detected, suggesting a high level of completeness of the genome assembly. Genome comparison between H. hebetor and its relative Bracon brevicornis shows a high level of genome synteny, indicating that the genome of H. hebetor is highly accurate and contiguous. Our study provides an example of de novo assembly of a genome from ultra-low input DNA, and will be used for sequencing projects of small sized species and rare samples, haploid genomics as well as population genetics of small sized species.


2019 ◽  
Author(s):  
Haley Wight ◽  
Junhui Zhou ◽  
Muzi Li ◽  
Sridhar Hannenhalli ◽  
Stephen M. Mount ◽  
...  

Abstract The red raspberry, Rubus idaeus, is widely distributed in all temperate regions of Europe, Asia, and North America and is a major commercial fruit valued for its taste and its high antioxidant and vitamin content. However, Rubus breeding is a long and slow process hampered by limited genomic and molecular resources. Genomic resources such as a complete genome sequence and transcriptome will be of exceptional value to improve research and breeding of this high value crop. Using a hybrid sequence assembly approach including data from both long and short sequence reads, we present the first assembly of the Rubus idaeus genome (Joan J. variety). The de novo assembled genome consists of 2,145 scaffolds with a genome completeness of 95.3% and an N50 of 638 kb. Leveraging a linkage map, we anchored 80.1% of the genome onto seven chromosomes. Using over 1 billion paired-end RNAseq reads, we annotated 35,566 protein coding genes with a transcriptome completeness score of 97.2%. The Rubus idaeus genome provides an important new resource for researchers and breeders.


2018 ◽  
Vol 6 (16) ◽  
pp. e00265-18 ◽  
Author(s):  
Stewart T. G. Burgess ◽  
Kathryn Bartley ◽  
Edward J. Marr ◽  
Harry W. Wright ◽  
Robert J. Weaver ◽  
...  

ABSTRACT Sheep scab, caused by infestation with Psoroptes ovis, is highly contagious, results in intense pruritus, and represents a major welfare and economic concern. Here, we report the first draft genome assembly and gene prediction of P. ovis based on PacBio de novo sequencing. The ∼63.2-Mb genome encodes 12,041 protein-coding genes.


2021 ◽  
Author(s):  
Lauren Coombe ◽  
Janet X Li ◽  
Theodora Lo ◽  
Johnathan Wong ◽  
Vladimir Nikolic ◽  
...  

Background: Generating high-quality de novo genome assemblies is foundational to the genomic study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. Results: LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 2.0-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently runs in under five hours using less than 23 GB of RAM. Conclusions: Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
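
The abstract above mentions minimizer mappings without defining them. The sketch below illustrates the general idea of (w, k) minimizers only: each window of w consecutive k-mers is represented by its smallest k-mer, so two sequences that overlap tend to share representatives even though most k-mers are never stored. It is not ntLink's implementation. The function names, the use of plain lexicographic ordering (practical tools typically order k-mers by a hash value), and the toy sequences are all assumptions made for illustration.

```python
def minimizers(seq, k, w):
    """Return a sorted list of (position, kmer) minimizers of seq.

    Each window of w consecutive k-mers contributes its
    lexicographically smallest k-mer (leftmost on ties).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first index among equal keys, i.e. the leftmost.
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)


def shared_minimizers(a, b, k, w):
    """Return the k-mers that are minimizers of both sequences."""
    kmers_a = {kmer for _, kmer in minimizers(a, k, w)}
    kmers_b = {kmer for _, kmer in minimizers(b, k, w)}
    return kmers_a & kmers_b


# Hypothetical toy sequences; real contigs and reads are far longer.
contig = "ACGTACGT"
read = "GTACGTTT"
print(minimizers(contig, k=3, w=2))
print(shared_minimizers(contig, read, k=3, w=2))
```

In a scaffolding setting, a long read that shares minimizers with two different contigs is evidence that those contigs lie near each other in the genome. How that evidence is scored and turned into joins is specific to each tool and is not shown here.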


F1000Research ◽  
2020 ◽  
Vol 7 ◽  
pp. 1310
Author(s):  
Slimane Khayi ◽  
Nour Elhouda Azza ◽  
Fatima Gaboun ◽  
Stacy Pirro ◽  
Oussama Badad ◽  
...  

Background: The Argane tree (Argania spinosa L. Skeels) is an endemic tree of mid-western Morocco that plays an important socioeconomic and ecological role for a dense human population in an arid zone. Several studies confirmed the importance of this species as a food and feed source and as a resource for both pharmaceutical and cosmetic compounds. Unfortunately, the argane tree ecosystem is facing significant threats from environmental changes (global warming, over-population) and over-exploitation. Limited research has been conducted, however, on argane tree genetics and genomics, which hinders its conservation and genetic improvement. Methods: Here, we present a draft genome assembly of A. spinosa. A reliable reference genome of A. spinosa was created using a hybrid de novo assembly approach combining short and long sequencing reads. Results: In total, 144 Gb Illumina HiSeq reads and 7.6 Gb PacBio reads were produced and assembled. The final draft genome comprises 75 327 scaffolds totaling 671 Mb with an N50 of 49 916 kb. The draft assembly is close to the genome size estimated by k-mer distribution and covers 89% of complete and 4.3% of partial Arabidopsis orthologous groups in BUSCO. Conclusion: The A. spinosa genome will be useful for assessing biodiversity leading to efficient conservation of this endangered endemic tree. Furthermore, the genome may enable genome-assisted cultivar breeding, and provide a better understanding of important metabolic pathways and their underlying genes for both cosmetic and pharmacological purposes.

