Molecular mechanisms behind global distribution of earthworm revealed by the genome

Abstract: Earthworms (Annelida: Crassiclitellata) are widely distributed around the world due to their great adaptability. However, the lack of a high-quality genome sequence has limited insight into earthworm physiology, phylogeny, and genome evolution. Herein, we report a complete genome assembly of the earthworm Amynthas corticis of about 1.2 Gb, based on a strategy combining third-generation long-read sequencing and Hi-C mapping. A total of 29,256 protein-coding genes are annotated in this genome. Analysis of resequencing data indicates that this earthworm is a triploid species. Furthermore, gene family evolution analysis shows that comprehensive expansion of gene families in the earthworm genome has produced more defensive functions compared with other species in Annelida. Quantitative proteomic iTRAQ analysis shows 97 immune-related proteins, and 16S rDNA sequencing shows 88 microbes, with a significant response to pathogenic Escherichia coli O157:H7. Our genome assembly provides abundant and valuable resources for the earthworm research community, serving as a first step toward uncovering the mysteries of this species, and may explain at the molecular level its powerful defensive functions, adaptation to complex environments, and invasion ability.

Amynthas corticis genome reveals molecular mechanisms behind global distribution

Communications Biology ◽

10.1038/s42003-021-01659-4 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Xing Wang ◽

Yi Zhang ◽

Yufeng Zhang ◽

Mingming Kang ◽

Yuanbo Li ◽

...

Keyword(s):

Genome Assembly ◽

Molecular Mechanisms ◽

Gene Families ◽

The Body ◽

Gene Family Evolution ◽

Complex Environments ◽

Protein Coding ◽

iTRAQ Analysis ◽

rDNA Sequencing ◽

Long Read

Abstract: Earthworms (Annelida: Crassiclitellata) are widely distributed around the world due to their ancient origination as well as adaptation and invasion after introduction into new habitats over the past few centuries. Herein, we report a 1.2 Gb complete genome assembly of the earthworm Amynthas corticis based on a strategy combining third-generation long-read sequencing and Hi-C mapping. A total of 29,256 protein-coding genes are annotated in this genome. Analysis of resequencing data indicates that this earthworm is a triploid species. Furthermore, gene family evolution analysis shows that comprehensive expansion of gene families in the Amynthas corticis genome has produced more defensive functions compared with other species in Annelida. Quantitative proteomic iTRAQ analysis shows that expression of 147 proteins changed in the body of Amynthas corticis and 16S rDNA sequencing shows that abundance of 28 microorganisms changed in the gut of Amynthas corticis when the earthworm was incubated with pathogenic Escherichia coli O157:H7. Our genome assembly provides abundant and valuable resources for the earthworm research community, serving as a first step toward uncovering the mysteries of this species, and may provide molecular level indicators of its powerful defensive functions, adaptation to complex environments and invasion ability.

SMRT sequencing yields the chromosome-scale reference genome of tea tree, Camellia sinensis var. sinensis

10.1101/2020.01.02.892430 ◽

2020 ◽

Cited By ~ 1

Author(s):

Qun-Jie Zhang ◽

Wei Li ◽

Kui Li ◽

Hong Nan ◽

Cong Shi ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

Reference Genome ◽

Repetitive Sequences ◽

Gene Families ◽

Chromosome Length ◽

SMRT Sequencing ◽

Protein Coding ◽

Tea Tree ◽

Long Read

Abstract: Tea is the oldest and most popular nonalcoholic beverage consumed in the world. It provides abundant secondary metabolites that account for its diverse flavors and health benefits. Here we present the first high-quality chromosome-length reference genome of C. sinensis var. sinensis, using long-read single-molecule real-time (SMRT) sequencing and Hi-C technologies to anchor the ∼2.85-Gb genome assembly into 15 pseudo-chromosomes with a scaffold N50 length of ∼195.68 Mb. We annotated at least 2.17 Gb (∼74.13%) of repetitive sequences and predicted 40,812 protein-coding genes with high confidence in the ∼2.92-Gb genome assembly. This accurately assembled genome allows us to comprehensively annotate functionally important gene families such as those involved in the biosynthesis of catechins, theanine and caffeine. The contiguous genome assembly provides the first view of the repetitive landscape, allowing us to accurately characterize retrotransposon diversity. The large tea tree genome is dominated by a handful of Ty3-gypsy long terminal repeat (LTR) retrotransposon families that recently expanded to high copy numbers. We uncover the latest bursts of numerous non-autonomous LTR retrotransposons that may interfere with the propagation of autonomous retroelements. This reference genome sequence will greatly facilitate the improvement of agronomically important traits relevant to tea quality and production.

Genome sequence resource of Phomopsis longicolla strain YC2-1, a fungal pathogen causing Phomopsis stem blight in soybean

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-12-20-0340-a ◽

2021 ◽

Author(s):

Xiaolin Zhao ◽

Zhichao Zhang ◽

Sujiao Zheng ◽

Wenwu Ye ◽

Xiaobo Zheng ◽

...

Keyword(s):

Genome Assembly ◽

Stem Canker ◽

Quality Data ◽

Phomopsis Longicolla ◽

Protein Coding ◽

Stem Blight ◽

A Genome ◽

Long Read ◽

Genomic Resource ◽

Blight Disease

The Diaporthe-Phomopsis disease complex causes considerable yield losses in soybean production worldwide. As one of the major pathogens, Phomopsis longicolla T. W. Hobbs (syn. Diaporthe longicolla) is not only the primary agent of Phomopsis seed decay, but also one of the agents of Phomopsis pod and stem blight, and Phomopsis stem canker. We performed both PacBio long-read sequencing and Illumina short-read sequencing, and obtained a genome assembly for the P. longicolla strain YC2-1, which was isolated from a soybean stem with Phomopsis stem blight disease. The 63.1 Mb genome assembly contains 87 scaffolds, with minimum, maximum, and N50 scaffold lengths of 20 kb, 4.6 Mb, and 1.5 Mb, respectively, and a total of 17,407 protein-coding genes. The high-quality data expand the genomic resources for P. longicolla and will provide a solid foundation for a better understanding of its genetic diversity and pathogenic mechanisms.

Chromosome-level assembly of Drosophila bifasciata reveals important karyotypic transition of the X chromosome

10.1101/847558 ◽

2019 ◽

Author(s):

Ryan Bracewell ◽

Anita Tran ◽

Kamalakar Chatla ◽

Doris Bachtrog

Keyword(s):

X Chromosome ◽

Genome Assembly ◽

De Novo ◽

Pericentromeric Region ◽

Species Group ◽

Chromosome 15 ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Read ◽

Chromosome Level

Abstract: The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests a metacentric ancestral X fused to a telocentric Muller D and created the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.

Chromosome-Level Assembly of Drosophila bifasciata Reveals Important Karyotypic Transition of the X Chromosome

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400922 ◽

2020 ◽

Vol 10 (3) ◽

pp. 891-897 ◽

Cited By ~ 3

Author(s):

Ryan Bracewell ◽

Anita Tran ◽

Kamalakar Chatla ◽

Doris Bachtrog

Keyword(s):

X Chromosome ◽

Genome Assembly ◽

De Novo ◽

Pericentromeric Region ◽

Species Group ◽

Chromosome 15 ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Read ◽

Chromosome Level

The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests a metacentric ancestral X fused to a telocentric Muller D and created the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.

LRSDAY: Long-read Sequencing Data Analysis for Yeasts

10.1101/184572 ◽

2017 ◽

Author(s):

Jia-Xing Yue ◽

Gianni Liti

Keyword(s):

Genome Assembly ◽

Model Organism ◽

Sequencing Data ◽

Protein Coding ◽

Sequencing Technologies ◽

Long Reads ◽

Long Read ◽

Downstream Analysis ◽

Eukaryotic Organisms ◽

Genomic Regions

Abstract: Long-read sequencing technologies have become increasingly popular in genome projects due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast, Saccharomyces cerevisiae, has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here we present LRSDAY, the first one-stop solution to streamline this process. LRSDAY can produce chromosome-level end-to-end genome assembly and comprehensive annotations for various genomic features (including centromeres, protein-coding genes, tRNAs, transposable elements and telomere-associated elements) that are ready for downstream analysis. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable for virtually any eukaryotic organism. Applying LRSDAY to an S. cerevisiae strain takes ∼43 hrs to generate a complete and well-annotated genome from ∼100X Pacific Biosciences (PacBio) reads using four threads.

Improved chromosome-level genome assembly and annotation of the seagrass, Zostera marina (eelgrass)

F1000Research ◽

10.12688/f1000research.38156.1 ◽

2021 ◽

Vol 10 ◽

pp. 289

Author(s):

Xiao Ma ◽

Jeanine L. Olsen ◽

Thorsten B.H. Reusch ◽

Gabriele Procaccini ◽

Dave Kudrna ◽

...

Keyword(s):

Genome Assembly ◽

Zostera Marina ◽

Draft Genome ◽

High Molecular Weight DNA ◽

Protein Coding ◽

New Findings ◽

Long Read ◽

Sanger Sequence ◽

Assembly Pipeline ◽

Chromosome Level

Background: Seagrasses (Alismatales) are the only fully marine angiosperms. Zostera marina (eelgrass) plays a crucial role in the functioning of coastal marine ecosystems and global carbon sequestration. It is the most widely studied seagrass and has become a marine model system for exploring adaptation under rapid climate change. The original draft genome (v.1.0) of the seagrass Z. marina (L.) was based on a combination of Illumina mate-pair libraries and fosmid-ends. A total of 25.55 Gb of Illumina and 0.14 Gb of Sanger sequence was obtained, representing 47.7× genomic coverage. The assembly resulted in ~2000 unordered scaffolds (L50 of 486 kb), a final genome assembly size of 203 Mb, 20,450 protein-coding genes and 63% TE content. Here, we present an upgraded chromosome-scale genome assembly and compare v.1.0 and the new v.3.1, reconfirming previous results from Olsen et al. (2016), as well as pointing out new findings. Methods: The same high molecular weight DNA used in the original sequencing of the Finnish clone was used. A high-quality reference genome was assembled with the MECAT assembly pipeline combining PacBio long-read sequencing and Hi-C scaffolding. Results: In total, 75.97 Gb of PacBio data was produced. The final assembly comprises six pseudo-chromosomes and 304 unanchored scaffolds with a total length of 260.5 Mb and an N50 of 34.6 Mb, showing high contiguity and few gaps (~0.5%). 21,483 protein-encoding genes are annotated in this assembly, of which 20,665 (96.2%) obtained at least one functional assignment based on similarity to known proteins. Conclusions: As an important marine angiosperm, the improved Z. marina genome assembly will further assist evolutionary, ecological, and comparative genomics at the chromosome level. The new genome assembly will further our understanding of the structural and physiological adaptations from land to marine life.

Chromosome-level genome assembly of Scapharca kagoshimensis reveals the expanded molecular basis of heme biosynthesis in ark shell

10.22541/au.162155566.67560449/v1 ◽

2021 ◽

Author(s):

Teng Weiming ◽

Xie Xi ◽

Hongtao Nie ◽

Yamin Sun ◽

Liu Xiangfeng ◽

...

Keyword(s):

Genome Assembly ◽

Molecular Basis ◽

Molecular Mechanisms ◽

Heme Biosynthesis ◽

High Quality ◽

Protein Coding ◽

Conserved Genes ◽

Long Time ◽

Muddy Sediments ◽

Chromosome Level

Ark shells are commercially important clam species that inhabit muddy sediments of shallow coasts in East Asia. For a long time, the lack of genome resources has hindered scientific research on ark shells. Here, we report a high-quality chromosome-level genome assembly of Scapharca kagoshimensis, with the aim of unraveling the molecular basis of heme biosynthesis and developing genomic resources for genetic breeding and population genetics in ark shells. Nineteen scaffolds corresponding to 19 chromosomes were constructed from 938 contigs (contig N50 = 2.01 Mb) to produce a final high-quality assembly with a total length of 1.11 Gb and a scaffold N50 of around 60.64 Mb. The genome assembly is 93.4% complete as assessed against 303 core conserved eukaryotic genes. A total of 24,908 protein-coding genes were predicted, of which 24,551 (98.56%) were functionally annotated. Enrichment analyses suggested that genes in heme biosynthesis pathways were expanded, and positive selection of the hemoglobin genes was also found in the genome of S. kagoshimensis, which gives important insights into the molecular mechanisms and evolution of heme biosynthesis in Mollusca. This genome assembly provides a solid foundation for investigating the molecular mechanisms that underlie the diverse biological functions and evolutionary adaptations of S. kagoshimensis.

The genome of the endangered Macadamia jansenii displays little diversity but represents an important genetic resource for plant breeding

10.1101/2021.09.08.459545 ◽

2021 ◽

Cited By ~ 1

Author(s):

Priyanka Sharma ◽

Valentine Murigneux ◽

Jasmine Haimovitz ◽

Catherine J. Nock ◽

Wei Tian ◽

...

Keyword(s):

Genome Assembly ◽

Morphological Characteristics ◽

Single Copy ◽

Protein Coding ◽

Core Eudicots ◽

Wide Range ◽

Genes Encoding ◽

Long Read ◽

In The Wild

Summary: Macadamia, a recently domesticated and expanding nut crop in the tropical and subtropical regions of the world, is one of the most economically important genera in the diverse and widely adapted Proteaceae family. All four species of Macadamia are rare in the wild, with the most recently discovered, M. jansenii, being endangered. The M. jansenii genome has been used as a model for testing sequencing methods using a wide range of long-read sequencing techniques. Here we report a chromosome-level genome assembly, generated using a combination of Pacific Biosciences sequencing and Hi-C, comprising 14 pseudo-molecules, with an N50 of 58 Mb and a total assembly size of 758 Mb, of which 56% is repetitive. Completeness assessment revealed that the assembly covered 96.9% of the conserved single-copy genes. Annotation predicted 31,591 protein-coding genes and allowed the characterization of genes encoding biosynthesis of cyanogenic glycosides, fatty acid metabolism and anti-microbial proteins. Re-sequencing of seven other genotypes confirmed low diversity and low heterozygosity within this endangered species. Important morphological characteristics of this species such as small tree size and high kernel recovery suggest that M. jansenii is an important source of these commercial traits for breeding. As a member of a small group of families that are sister to the core eudicots, this high-quality genome also provides a key resource for evolutionary and comparative genomics studies.

Genome Sequence Resource of Phytophthora colocasiae from China Using Nanopore Sequencing Technology

Plant Disease ◽

10.1094/pdis-11-20-2327-a ◽

2021 ◽

Author(s):

Zhixin Wang ◽

Jiandong Bao ◽

Lin Lv ◽

Lianyu Lin ◽

Zhiting Li ◽

...

Keyword(s):

Genome Assembly ◽

Protein Coding ◽

Short Read ◽

RXLR Effectors ◽

Phytophthora Colocasiae ◽

Oxford Nanopore ◽

Long Read ◽

Infection Mechanisms ◽

Oomycete Pathogen ◽

Taro Leaf Blight

Phytophthora colocasiae is a destructive oomycete pathogen of taro (Colocasia esculenta), which causes taro leaf blight. To date, only one highly fragmented Illumina short-read-based genome assembly is available for this species. To address this problem, we sequenced strain Lyd2019 from China using Oxford Nanopore Technologies (ONT) long-read sequencing and Illumina short-read sequencing. We generated a 92.51-Mb genome assembly consisting of 105 contigs with an N50 of 1.70 Mb and a maximum length of 4.17 Mb. In the genome assembly, we identified 52.78% repeats and 18,322 protein-coding genes, of which 12,782 genes were annotated. We also identified 191 candidate RXLR effectors and 1 candidate CRN effector. The updated near-chromosome genome assembly and annotation resources will provide a better understanding of the infection mechanisms of P. colocasiae.
