High-quality Schistosoma haematobium genome achieved by single-molecule and long-range sequencing

Abstract Background Schistosoma haematobium causes urogenital schistosomiasis, a neglected tropical disease affecting >100 million people worldwide. Chronic infection with this parasitic trematode can lead to urogenital conditions including female genital schistosomiasis and bladder cancer. At the molecular level, little is known about this blood fluke and the pathogenesis of the disease that it causes. To support molecular studies of this carcinogenic worm, we reported a draft genome for S. haematobium in 2012. Although a useful resource, its utility has been somewhat limited by its fragmentation. Findings Here, we systematically enhanced the draft genome of S. haematobium using a single-molecule and long-range DNA-sequencing approach. We achieved a major improvement in the accuracy and contiguity of the genome assembly, making it superior or comparable to assemblies for other schistosome species. We transferred curated gene models to this assembly and, using enhanced gene annotation pipelines, inferred a gene set with as many or more complete gene models as those of other well-studied schistosomes. Using conserved, single-copy orthologs, we assessed the phylogenetic position of S. haematobium in relation to other parasitic flatworms for which draft genomes were available. Conclusions We report a substantially enhanced genomic resource that represents a solid foundation for molecular research on S. haematobium and is poised to better underpin population and functional genomic investigations and to accelerate the search for new disease interventions.

De Novo Whole-Genome Sequencing of the Wood Rot Fungus Polyporus brumalis, Which Exhibits Potential Terpenoid Metabolism

Genome Announcements ◽

10.1128/genomea.00586-17 ◽

2017 ◽

Vol 5 (28) ◽

Author(s):

Su-Yeon Lee ◽

Ji-eun An ◽

Sun-Hwa Ryu ◽

Myungkil Kim

Keyword(s):

Single Molecule ◽

De Novo ◽

Gene Annotation ◽

Draft Genome ◽

Fungal Growth ◽

Protein Coding ◽

Sequencing Platform ◽

Protein Coding Genes ◽

Polyporus Brumalis ◽

Terpenoid Metabolism

ABSTRACT Polyporus brumalis is able to synthesize several sesquiterpenes during fungal growth. Using a single-molecule real-time sequencing platform, we present the 53-Mb draft genome of P. brumalis, which contains 6,231 protein-coding genes. Gene annotation and isolation support genetic information, which can increase the understanding of sesquiterpene metabolism in P. brumalis.

Analyzing and characterization of the chloroplast genome of Salix suchowensis

10.7287/peerj.preprints.2388 ◽

2016 ◽

Author(s):

Congrui Sun ◽

Jie Li ◽

Xiaogang Dai ◽

Yingnan Chen

Keyword(s):

Tandem Repeats ◽

Gene Annotation ◽

Repetitive Sequences ◽

Single Copy ◽

Phylogenetic Position ◽

Shrub Willow ◽

Protein Coding ◽

Protein Coding Genes ◽

Cp Genome ◽

Rna Genes

By screening sequence reads from the chloroplast (cp) genome of S. suchowensis that were generated by next-generation sequencing platforms, we built the complete circular pseudomolecule for its cp genome. This pseudomolecule is 155,508 bp in length and has a typical quadripartite structure containing two single-copy regions, a large single-copy region (LSC, 84,385 bp) and a small single-copy region (SSC, 16,209 bp), separated by inverted repeat regions (IRs, 27,457 bp). Gene annotation revealed that the cp genome of S. suchowensis encodes 119 unique genes, including 4 ribosomal RNA genes, 30 transfer RNA genes, 82 protein-coding genes and 3 pseudogenes. Analysis of the repetitive sequences detected 15 tandem repeats, 16 forward repeats and 5 palindromic repeats. In addition, a total of 188 perfect microsatellites were detected, which were characterized by A/T predominance in nucleotide composition. Significant shifting of the IR/SSC boundaries was revealed by comparing this cp genome with those of other rosid plants. We also built phylogenetic trees to demonstrate the phylogenetic position of S. suchowensis in Rosidae, using 66 orthologous protein-coding genes present in the cp genomes of 32 species. By sequencing 30 amplicons based on the pseudomolecule, experimental verification achieved accuracy up to 99.84% for the cp genome assembly of S. suchowensis. In conclusion, this study built a high-quality pseudomolecule for the cp genome of S. suchowensis, which is a useful resource for facilitating the development of this shrub willow into a more productive bioenergy crop.
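
The region sizes and gene counts quoted in this abstract can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that the 27,457 bp figure refers to each of the two inverted repeat copies (the abstract does not say so explicitly, but the total only matches under that reading); the function name `quadripartite_length` is hypothetical and used for illustration only.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite cp genome: LSC + SSC + two IR copies.

    Assumption: `ir` is the length of ONE inverted repeat copy.
    """
    return lsc + ssc + 2 * ir


# Figures taken from the abstract above.
total_bp = quadripartite_length(lsc=84_385, ssc=16_209, ir=27_457)

# Gene categories from the abstract: rRNA, tRNA, protein-coding, pseudogenes.
unique_genes = 4 + 30 + 82 + 3

print(total_bp)      # compare with the reported 155,508 bp
print(unique_genes)  # compare with the reported 119 unique genes
```

Under that assumption, both sums agree with the totals stated in the abstract.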

Genome sequence of the agarwood tree Aquilaria sinensis (Lour.) Spreng: the first chromosome-level draft genome in the Thymelaeceae family

GigaScience ◽

10.1093/gigascience/giaa013 ◽

2020 ◽

Vol 9 (3) ◽

Cited By ~ 1

Author(s):

Xupo Ding ◽

Wenli Mei ◽

Qiang Lin ◽

Hao Wang ◽

Jun Wang ◽

...

Keyword(s):

Genome Assembly ◽

Gene Annotation ◽

Draft Genome ◽

Single Copy ◽

Aquilaria Sinensis ◽

Final Size ◽

Plant Resources ◽

Protein Coding ◽

High Level ◽

Chromosome Level

Abstract Background Aquilaria sinensis (Lour.) Spreng is one of the important plant resources involved in the production of agarwood in China. The agarwood resin collected from wounded Aquilaria trees has been used in Asia for aromatic or medicinal purposes since ancient times, although the mechanism underlying the formation of agarwood remains poorly understood owing to a lack of accurate and high-quality genetic information. Findings We report the genomic architecture of A. sinensis by using an integrated strategy combining Nanopore, Illumina, and Hi-C sequencing. The final genome was ∼726.5 Mb in size, which reached a high level of continuity and a contig N50 of 1.1 Mb. We combined Hi-C data with the genome assembly to generate chromosome-level scaffolds. Eight super-scaffolds corresponding to the 8 chromosomes were assembled to a final size of 716.6 Mb, with a scaffold N50 of 88.78 Mb using 1,862 contigs. BUSCO evaluation reveals that the genome completeness reached 95.27%. The repeat sequences accounted for 59.13%, and 29,203 protein-coding genes were annotated in the genome. According to phylogenetic analysis using single-copy orthologous genes, we found that A. sinensis is closely related to Gossypium hirsutum and Theobroma cacao from the Malvales order, and A. sinensis diverged from their common ancestor ∼53.18–84.37 million years ago. Conclusions Here, we present the first chromosome-level genome assembly and gene annotation of A. sinensis. This study should contribute valuable genetic resources for further research on the agarwood formation mechanism, genome-assisted improvement, and conservation biology of Aquilaria species.

Genome Sequencing of Paecilomyces Penicillatus Provides Insights into Its Phylogenetic Placement and Mycoparasitism Mechanisms on Morel Mushrooms

Pathogens ◽

10.3390/pathogens9100834 ◽

2020 ◽

Vol 9 (10) ◽

pp. 834

Author(s):

Xinxin Wang ◽

Jingyu Peng ◽

Lei Sun ◽

Gregory Bonito ◽

Yuxiu Guo ◽

...

Keyword(s):

Single Molecule ◽

Molecular Mechanisms ◽

Draft Genome ◽

Gene Clusters ◽

Single Copy ◽

Dual Culture ◽

Edible Fungi ◽

Fungal Cell ◽

Sequencing Platform ◽

Phylogenetic Placement

Morels (Morchella spp.) are popular edible fungi with significant economic and scientific value. However, white mold disease, caused by Paecilomyces penicillatus, can reduce morel yield by up to 80% in the main cultivation area in China. Paecilomyces is a polyphyletic genus and the exact phylogenetic placement of P. penicillatus is currently still unclear. Here, we obtained the first high-quality genome sequence of P. penicillatus generated through the single-molecule real-time (SMRT) sequencing platform. The assembled draft genome of P. penicillatus was 40.2 Mb, had an N50 value of 2.6 Mb and encoded 9454 genes. Phylogenetic analysis of single-copy orthologous genes revealed that P. penicillatus is in Hypocreales and closely related to Hypocreaceae, which includes several genera exhibiting a mycoparasitic lifestyle. CAZymes analysis demonstrated that P. penicillatus encodes a large number of fungal cell wall degradation enzymes. We identified many gene clusters involved in the production of secondary metabolites known to exhibit antifungal, antibacterial, or insecticidal activities. We further demonstrated through dual culture assays that P. penicillatus secretes certain soluble compounds that are inhibitory to the mycelial growth of Morchella sextelata. This study provides insights into the correct phylogenetic placement of P. penicillatus and the molecular mechanisms that underlie P. penicillatus pathogenesis.

Homology-guided re-annotation improves the gene models of the alloploid Nicotiana benthamiana

10.1101/373506 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jiorgos Kourelis ◽

Farnusch Kaschani ◽

Friederike M. Grosse-Holz ◽

Felix Homma ◽

Markus Kaiser ◽

...

Keyword(s):

Nicotiana Benthamiana ◽

Gene Annotation ◽

Draft Genome ◽

Model Organism ◽

Extracellular Proteins ◽

Functional Annotations ◽

Draft Genome Assembly ◽

Protein Encoding ◽

Gene Models ◽

Encoding Genes

Nicotiana benthamiana is an important model organism of the Solanaceae (Nightshade) family. Several draft assemblies of the N. benthamiana genome have been generated, but many of the gene models in these draft assemblies appear incorrect. Here we present an improved re-annotation of the Niben1.0.1 draft genome assembly guided by gene models from other Nicotiana species. This approach overcomes problems caused by mis-annotated exon-intron boundaries and mis-assigned short-read transcripts to homeologs in polyploid genomes. With an estimated 98.1% completeness; only 53,411 protein-encoding genes; and improved protein lengths and functional annotations, this new predicted proteome is better than the preceding proteome annotations. This dataset is more sensitive and accurate in proteomics applications, clarifying the detection by activity-based proteomics of proteins that were previously mis-annotated to be inactive. Phylogenetic analysis of the subtilase family of hydrolases reveals a pseudogenisation of likely homeologs, associated with a contraction of the functional genome in this alloploid plant species. We use this gene annotation to assign extracellular proteins in comparison to a total leaf proteome, to display the enrichment of hydrolases in the apoplast.

The First Draft Genome of the Plasterer Bee Colletes gigas (Hymenoptera: Colletidae: Colletes)

Genome Biology and Evolution ◽

10.1093/gbe/evaa090 ◽

2020 ◽

Vol 12 (6) ◽

pp. 860-866 ◽

Cited By ~ 1

Author(s):

Qing-Song Zhou ◽

Arong Luo ◽

Feng Zhang ◽

Ze-Qing Niu ◽

Qing-Tao Wu ◽

...

Keyword(s):

Single Molecule ◽

Draft Genome ◽

Olfactory Receptors ◽

Gene Families ◽

Single Copy ◽

Gene Family Evolution ◽

Nesting Biology ◽

Protein Coding ◽

Final Assembly ◽

Long Reads

Abstract Despite intense interest in bees, no genomes are available for the bee family Colletidae. Colletes gigas, one of the largest species of the genus Colletes in the world, is an ideal candidate to fill this gap. Endemic to China, C. gigas has been the focus of studies on its nesting biology and pollination of the economically important oil tree Camellia oleifera, which is chemically defended. To enable deeper study of its biology, we sequenced the whole genome of C. gigas using single-molecule real-time sequencing on the Pacific Biosciences Sequel platform. In total, 40.58 G (150×) of long reads were generated and the final assembly of 326 scaffolds was 273.06 Mb with an N50 length of 8.11 Mb, which captured 94.4% complete Benchmarking Universal Single-Copy Orthologs. We predicted 11,016 protein-coding genes, of which 98.50% and 84.75% were supported by protein- and transcriptome-based evidence, respectively. In addition, we identified 26.27% of repeats and 870 noncoding RNAs. The bee phylogeny with this newly sequenced colletid genome is consistent with available results, supporting Colletidae as sister to Halictidae when Stenotritidae is not included. Gene family evolution analyses identified 9,069 gene families, of which 70 experienced significant expansions (33 families) or contractions (37 families), and it appears that olfactory receptors and carboxylesterase may be involved in specializing on and detoxifying Ca. oleifera pollen. Our high-quality draft genome for C. gigas lays the foundation for insights on the biology and behavior of this species, including its evolutionary history, nesting biology, and interactions with the plant Ca. oleifera.

The First Draft Genome Assembly of Snow Sheep (Ovis nivicola)

Genome Biology and Evolution ◽

10.1093/gbe/evaa124 ◽

2020 ◽

Vol 12 (8) ◽

pp. 1330-1336 ◽

Cited By ~ 2

Author(s):

Maulik Upadhyay ◽

Andreas Hauser ◽

Elisabeth Kunz ◽

Stefan Krebs ◽

Helmut Blum ◽

...

Keyword(s):

De Novo ◽

Gene Annotation ◽

Gene Prediction ◽

Repetitive Sequences ◽

Draft Genome ◽

Single Copy ◽

Climatic Conditions ◽

Draft Genome Assembly ◽

Sheep Genome ◽

Long Reads

Abstract The snow sheep, Ovis nivicola, which is endemic to the mountain ranges of northeastern Siberia, is well adapted to the harsh cold climatic conditions of its habitat. In this study, using long reads of Nanopore sequencing technology, whole-genome sequencing, assembly, and gene annotation of a snow sheep were carried out. Additionally, RNA-seq reads from several tissues were generated to supplement the gene prediction in the snow sheep genome. The assembled genome was ∼2.62 Gb in length and was represented by 7,157 scaffolds with an N50 of about 2 Mb. The repetitive sequences comprised 41% of the total genome. BUSCO analysis revealed that the snow sheep assembly contained full-length or partial fragments of 97% of mammalian universal single-copy orthologs (n = 4,104), illustrating the completeness of the assembly. In addition, a total of 20,045 protein-coding sequences were identified using a comprehensive gene prediction pipeline, of which 19,240 (∼96%) were annotated using protein databases. Moreover, homology-based searches and de novo identification detected 1,484 tRNAs; 243 rRNAs; 1,931 snRNAs; and 782 miRNAs in the snow sheep genome. To conclude, we generated the first de novo genome of the snow sheep using long reads; these data are expected to contribute significantly to our understanding of evolution and adaptation within the Ovis genus.

Datura genome reveals duplications of psychoactive alkaloid biosynthetic genes and high mutation rate following tissue culture

BMC Genomics ◽

10.1186/s12864-021-07489-2 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Alex Rajewski ◽

Derreck Carter-House ◽

Jason Stajich ◽

Amy Litt

Keyword(s):

Tissue Culture ◽

Gene Annotation ◽

Draft Genome ◽

Tropane Alkaloids ◽

Datura Stramonium ◽

Single Copy ◽

Tropane Alkaloid ◽

Biosynthetic Genes ◽

Protein Coding ◽

Culture Process

Abstract Background Datura stramonium (Jimsonweed) is a medicinally and pharmaceutically important plant in the nightshade family (Solanaceae) known for its production of various toxic, hallucinogenic, and therapeutic tropane alkaloids. Recently, we published a tissue-culture based transformation protocol for D. stramonium that enables more thorough functional genomics studies of this plant. However, the tissue culture process can lead to undesirable phenotypic and genomic consequences independent of the transgene used. Here, we have assembled and annotated a draft genome of D. stramonium with a focus on tropane alkaloid biosynthetic genes. We then use mRNA sequencing and genome resequencing of transformants to characterize changes following tissue culture. Results Our draft assembly conforms to the expected 2 gigabasepair haploid genome size of this plant and achieved a BUSCO score of 94.7% complete, single-copy genes. The repetitive content of the genome is 61%, with Gypsy-type retrotransposons accounting for half of this. Our gene annotation estimates the number of protein-coding genes at 52,149 and shows evidence of duplications in two key alkaloid biosynthetic genes, tropinone reductase I and hyoscyamine 6 β-hydroxylase. Following tissue culture, we detected only 186 differentially expressed genes, but were unable to correlate these changes in expression with either polymorphisms from resequencing or positional effects of transposons. Conclusions We have assembled, annotated, and characterized the first draft genome for this important model plant species. Using this resource, we show duplications of genes leading to the synthesis of the medicinally important alkaloid, scopolamine. Our results also demonstrate that following tissue culture, mutation rates of transformed plants are quite high (1.16 × 10⁻³ mutations per site), but do not have a drastic impact on gene expression.

Analyzing and characterization of the chloroplast genome of Salix suchowensis

10.7287/peerj.preprints.2388v1 ◽

2016 ◽

Author(s):

Congrui Sun ◽

Jie Li ◽

Xiaogang Dai ◽

Yingnan Chen

Keyword(s):

Tandem Repeats ◽

Gene Annotation ◽

Repetitive Sequences ◽

Single Copy ◽

Phylogenetic Position ◽

Shrub Willow ◽

Protein Coding ◽

Protein Coding Genes ◽

Cp Genome ◽

Rna Genes

Near-chromosome level genome assembly of the fruit pest Drosophila suzukii using long-read sequencing

10.1101/2020.01.02.892844 ◽

2020 ◽

Cited By ~ 3

Author(s):

Mathilde Paris ◽

Roxane Boyer ◽

Rita Jaenichen ◽

Jochen Wolf ◽

Marianthi Karageorgi ◽

...

Keyword(s):

Genome Assembly ◽

High Throughput Sequencing ◽

Gene Annotation ◽

Draft Genome ◽

Sequencing Data ◽

Drosophila Suzukii ◽

Sequence Coverage ◽

Research Activities ◽

Long Read ◽

Genomic Resource

ABSTRACT Over the past decade, the spotted wing Drosophila, Drosophila suzukii, has invaded Europe and America and has become a major agricultural pest in these areas, thereby prompting intense research activities to better understand its biology. Two draft genome assemblies based on short-read sequencing were released in 2013 for this species. Although valuable, these resources contain pervasive assembly errors and are highly fragmented, two features limiting their value. Our purpose here was to improve the assembly of the D. suzukii genome. For this, we generated PacBio long-read sequencing data at 160X sequence coverage and assembled a novel, contiguous D. suzukii genome. We obtained a high-quality assembly of 270 Mb (with 546 contigs, an N50 of 2.6 Mb, an L50 of 15, and a BUSCO score of 95%) that we called WT3-2.0. We found that despite 16 rounds of full-sib crossings the D. suzukii strain that we sequenced has maintained high levels of polymorphism in some regions of its genome (ca. 19 Mb). As a consequence, the quality of the assembly of these regions was reduced. We explored possible origins of this high residual diversity, including the presence of structural variants and a possible heterogeneous admixture pattern of North American and Asian ancestry. Overall, our WT3-2.0 assembly provides a higher quality genomic resource compared to the previous one in terms of general assembly statistics, sequence quality and gene annotation. This new D. suzukii genome assembly is therefore an improved resource for high-throughput sequencing approaches, as well as manipulative genetic technologies to study D. suzukii.
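
Several abstracts in this listing report N50 and L50 as contiguity statistics. As a reminder of what these numbers mean, here is a minimal sketch of how they are commonly computed from a list of contig lengths. The function name `n50_l50` and the toy lengths are illustrative assumptions and are not taken from any of the listed papers, whose authors may have used different tools or conventions.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    N50: length of the contig at which the running total, taken from the
         longest contig downwards, first reaches half the assembly size.
    L50: number of contigs needed to reach that point.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


# Toy assembly (hypothetical lengths, arbitrary units); total size is 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_contigs))  # (70, 2): the two longest contigs cover half
```

Read this way, an L50 of 15 with an N50 of 2.6 Mb means that the 15 longest contigs, each at least 2.6 Mb long, together span at least half of the assembly.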
