Molecular basis of differential parasitism between non-encapsulated and encapsulated Trichinella revealed by a high-quality genome assembly

Abstract: Understanding the roles of repetitive sequences in the genomes of parasites could offer insights into their evolution, speciation, and parasitism. As a unique intracellular nematode, Trichinella consists of two clades, encapsulated and non-encapsulated. The genomic correlates of the distinct differences between the two clades remain unclear. Here we report an annotated draft reference genome of non-encapsulated Trichinella, T. pseudospiralis, and perform comparative analyses with encapsulated T. spiralis. Genome analysis revealed that, during Trichinella evolution, repetitive sequence insertions played an important role in gene family expansion in synergy with DNA methylation, especially for the DNase II members of the phospholipase D superfamily and glutathione S-transferases. We further identify the genomic and epigenomic regulation of excretory/secretory products in relation to differences in parasitism, pathology and immunology between the two clades of Trichinella. The present study provides a foundation for further elucidation of the mechanism of nurse cell formation and immunoevasion, as well as the identification of pharmacological and diagnostic targets of trichinellosis.
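
The idea of a gene family being expanded in one clade relative to another can be illustrated by comparing per-family gene counts between two annotations. The sketch below is a hypothetical illustration, not the authors' pipeline: it assumes every gene has already been assigned to a family (for example by an orthology tool), and the gene IDs, family labels, the pseudocount for missing families and the `min_ratio` threshold are all invented for the example.

```python
from collections import Counter


def family_sizes(assignments):
    """Count genes per family from a {gene_id: family_id} mapping."""
    return Counter(assignments.values())


def expanded_families(species_a, species_b, min_ratio=2.0):
    """Families with at least min_ratio times more copies in species_a than in species_b.

    A family absent from species_b is treated as having one copy there (an
    illustrative pseudocount) so the ratio stays finite.
    Returns {family: (copies_in_a, copies_in_b)}.
    """
    sizes_a = family_sizes(species_a)
    sizes_b = family_sizes(species_b)
    result = {}
    for family, count_a in sizes_a.items():
        count_b = sizes_b.get(family, 0)
        if count_a / max(count_b, 1) >= min_ratio:
            result[family] = (count_a, count_b)
    return result


# Hypothetical annotations for two species
species_x = {"g1": "fam1", "g2": "fam1", "g3": "fam1", "g4": "fam1", "g5": "fam2"}
species_y = {"h1": "fam1", "h2": "fam2", "h3": "fam2"}
print(expanded_families(species_x, species_y))
```

A fixed ratio is only a screening heuristic; dedicated tools such as CAFE instead model gene family size changes along a phylogeny and test them statistically.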

Comparative multi-omics analyses reveal differential expression of key genes relevant for parasitism between non-encapsulated and encapsulated Trichinella

Communications Biology · 10.1038/s42003-021-01650-z · 2021 · Vol 4 (1)

Author(s): Xiaolei Liu, Yayan Feng, Xue Bai, Xuelin Wang, Rui Qin, ...

Keyword(s): Cell Formation, Invasion Biology, Gene Families, DNase II, Genomic Correlation, Glutathione S-Transferases, Differential Expansion, Genome Assemblies, Host Parasite, S Genes

Abstract: Genome assemblies provide a powerful basis for comparative multi-omics analyses that offer insight into parasite pathogenicity, host-parasite interactions, and invasion biology. As a unique intracellular nematode, Trichinella consists of two clades, encapsulated and non-encapsulated. The genomic correlates of the distinct differences between the two clades remain unclear. Here, we report an annotated draft reference genome of non-encapsulated Trichinella, T. pseudospiralis, and perform comparative multi-omics analyses with encapsulated T. spiralis. Genome and methylome analyses indicate that, during Trichinella evolution, the two clades of Trichinella exhibit differential expansion and methylation of parasitism-related multi-copy gene families, especially for the DNase II members of the phospholipase D superfamily and glutathione S-transferases. Further, methylome and transcriptome analyses revealed divergent key excretory/secretory (E/S) genes between the two clades. Among these key E/S genes, TP12446 is significantly more highly expressed across three life stages in T. pseudospiralis. Overexpression of TP12446 in the mouse C2C12 skeletal muscle cell line could inhibit myotube formation and differentiation, further indicating its key role in the parasitism of T. pseudospiralis. This multi-omics study provides a foundation for further elucidation of the mechanism of nurse cell formation and immunoevasion, as well as the identification of pharmacological and diagnostic targets of trichinellosis.
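
In bisulfite-based methylomes, the methylation level of a single cytosine is commonly estimated as the fraction of covering reads that report it as methylated, and a region-level value (for example over a gene) is often computed as a weighted level: methylated reads summed over all sites divided by total coverage. The sketch below illustrates that calculation under stated assumptions; the site positions, read counts and the `min_coverage` filter are hypothetical, and nothing here describes the authors' actual methylome pipeline.

```python
def site_level(methylated, unmethylated):
    """Methylation level of one cytosine: methylated reads / all reads covering it."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("site has no coverage")
    return methylated / total


def weighted_region_level(sites, start, end, min_coverage=5):
    """Weighted methylation level over the half-open interval [start, end).

    sites: iterable of (position, methylated_reads, unmethylated_reads).
    Sites below min_coverage are ignored; returns None if no site qualifies.
    """
    meth_total = 0
    coverage_total = 0
    for position, meth, unmeth in sites:
        coverage = meth + unmeth
        if start <= position < end and coverage >= min_coverage:
            meth_total += meth
            coverage_total += coverage
    return meth_total / coverage_total if coverage_total else None


# Hypothetical CpG sites: (position, methylated reads, unmethylated reads)
sites = [(100, 16, 4), (150, 1, 9), (180, 1, 1), (400, 10, 0)]
print(site_level(16, 4), weighted_region_level(sites, 0, 300))
```

In this toy region the low-coverage site at 180 is dropped and the site at 400 lies outside the interval, so the weighted level is 17/30; simply averaging the two per-site levels (0.8 and 0.1) would give 0.45 instead, because the weighted form lets the more deeply covered site count for more.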

A High-continuity and Annotated Tomato Reference Genome

10.21203/rs.3.rs-579393/v1 · 2021

Author(s): Xiao Su, Baoan Wang, Xiaolin Geng, Yuefan Du, Qinqin Yang, ...

Keyword(s): Reference Genome, Agronomic Traits, Repetitive Sequences, Tomato Genome, Ideal Model, Model Species, Horticultural Crop, Protein Coding, Genetic Mechanisms, High Quality Genome

Abstract Background: Genetic and functional genomics studies require a high-quality genome assembly. Tomato (Solanum lycopersicum), an important horticultural crop, is an ideal model species for the study of fruit development. Results: Here, we assembled an updated reference genome of S. lycopersicum cv. Heinz 1706 that was 799.09 Mb in length, containing 34,384 predicted protein-coding genes and 65.66% repetitive sequences. By comparing the genomes of S. lycopersicum and S. pimpinellifolium LA2093, we found a large number of genomic fragments probably associated with human selection, which may have had crucial roles in the domestication of tomato. We also used a recombinant inbred line (RIL) population to generate a high-density genetic map with high resolution and accuracy. Using these resources, we identified a number of candidate genes that were likely to be related to important agronomic traits in tomato. Conclusion: Our results offer opportunities for understanding the evolution of the tomato genome and will facilitate the study of genetic mechanisms in tomato biology.

A high-continuity and annotated tomato reference genome

BMC Genomics · 10.1186/s12864-021-08212-x · 2021 · Vol 22 (1)

Author(s): Xiao Su, Baoan Wang, Xiaolin Geng, Yuefan Du, Qinqin Yang, ...

Keyword(s): Reference Genome, Agronomic Traits, Repetitive Sequences, Tomato Genome, Ideal Model, Model Species, Horticultural Crop, Protein Coding, Genetic Mechanisms, High Quality Genome

Abstract Background: Genetic and functional genomics studies require a high-quality genome assembly. Tomato (Solanum lycopersicum), an important horticultural crop, is an ideal model species for the study of fruit development. Results: Here, we assembled an updated reference genome of S. lycopersicum cv. Heinz 1706 that was 799.09 Mb in length, containing 34,384 predicted protein-coding genes and 65.66% repetitive sequences. By comparing the genomes of S. lycopersicum and S. pimpinellifolium LA2093, we found a large number of genomic fragments probably associated with human selection, which may have had crucial roles in the domestication of tomato. We also used a recombinant inbred line (RIL) population to generate a high-density genetic map with high resolution and accuracy. Using these resources, we identified a number of candidate genes that were likely to be related to important agronomic traits in tomato. Conclusion: Our results offer opportunities for understanding the evolution of the tomato genome and will facilitate the study of genetic mechanisms in tomato biology.
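
Building a genetic map from a recombinant inbred line (RIL) population involves turning the observed fraction of lines that are recombinant between two adjacent markers into a map distance. The sketch below assumes RILs derived by repeated selfing, for which the classical Haldane–Waddington relation R = 2r/(1 + 2r) links the proportion of recombinant RILs R to the per-meiosis recombination fraction r, and then applies the Haldane map function. The abstract does not say which map function or mapping software the authors used, so both choices, and the 'A'/'B' genotype coding, are assumptions for illustration.

```python
import math


def ril_to_meiotic_r(ril_fraction):
    """Invert R = 2r / (1 + 2r) for selfed RILs, giving r = R / (2 - 2R)."""
    if not 0 <= ril_fraction < 0.5:
        raise ValueError("expected a RIL recombinant fraction in [0, 0.5)")
    return ril_fraction / (2 - 2 * ril_fraction)


def haldane_cm(r):
    """Haldane map distance in centimorgans for a recombination fraction r in [0, 0.5)."""
    return -50.0 * math.log(1 - 2 * r)


def marker_distance_cm(calls_a, calls_b):
    """Distance between two markers scored as parental alleles 'A'/'B' in the same RILs.

    Lines with any other call (missing data, residual heterozygotes) are skipped.
    """
    informative = [
        (a, b) for a, b in zip(calls_a, calls_b) if a in ("A", "B") and b in ("A", "B")
    ]
    if not informative:
        raise ValueError("no informative lines")
    recombinant = sum(1 for a, b in informative if a != b)
    return haldane_cm(ril_to_meiotic_r(recombinant / len(informative)))


# Six hypothetical RILs genotyped at two markers; two of them are recombinant
print(round(marker_distance_cm("AAABBB", "AABBBA"), 2))
```

The RIL correction matters because several rounds of selfing give extra opportunities for crossovers, so a RIL population shows more recombinant lines than a single meiosis would; using R directly as r would inflate map distances.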

Liftoff: accurate mapping of gene annotations

Bioinformatics · 10.1093/bioinformatics/btaa1016 · 2020

Author(s): Alaina Shumate, Steven L Salzberg

Keyword(s): Reference Genome, Supplementary Information, Closely Related Species, Protein Coding, Human Reference Genome, Sequence Identity, Gene Annotations, Genome Assemblies, Average Sequence Identity, High Quality Genome

Abstract Motivation: Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however, for most species, only the reference genome is well annotated. Results: One strategy to annotate new or improved genome assemblies is to map or ‘lift over’ the genes from a previously annotated reference genome. Here we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript, and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species by successfully lifting over 98.3% of human protein-coding genes to a chimpanzee genome assembly with 98.2% sequence identity. Availability and implementation: Liftoff can be installed via bioconda and PyPI. Additionally, the source code for Liftoff is available at https://github.com/agshumate/Liftoff. Supplementary information: Supplementary data are available at Bioinformatics online.
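
Finding "the mapping that maximizes sequence identity while preserving the structure" can be pictured as choosing, among candidate alignments of each exon, one chain that keeps the exons in order and non-overlapping. The toy dynamic program below illustrates that idea under strong simplifying assumptions: each exon comes with pre-computed candidate hits (target start, end, matching bases), all hits are on one strand, and every exon must be placed. It is not Liftoff's actual implementation, which aligns gene sequences with minimap2 and handles many more cases; `Hit` and `best_collinear_chain` are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    """One candidate alignment of an exon in the target assembly (hypothetical)."""
    start: int    # target start, 0-based inclusive
    end: int      # target end, exclusive
    matches: int  # identical aligned bases


def best_collinear_chain(
    exon_lengths: List[int], candidates: List[List[Hit]]
) -> Tuple[Optional[List[Hit]], float]:
    """Choose one hit per exon, in exon order and non-overlapping, maximizing total matches.

    Returns (chain, identity) with identity = total matches / total exon length,
    or (None, 0.0) if the exons cannot all be placed collinearly.
    """
    neg_inf = float("-inf")
    # best[i][j] = (best total matches with exon i at hit j, hit index used for exon i-1)
    best: List[List[Tuple[float, Optional[int]]]] = []
    for i, hits in enumerate(candidates):
        row: List[Tuple[float, Optional[int]]] = []
        for hit in hits:
            if i == 0:
                row.append((float(hit.matches), None))
                continue
            prev_score, prev_idx = neg_inf, None
            for k, prev in enumerate(candidates[i - 1]):
                score = best[i - 1][k][0]
                # predecessor must end before this hit starts and be reachable
                if prev.end <= hit.start and score > prev_score:
                    prev_score, prev_idx = score, k
            row.append((prev_score + hit.matches, prev_idx))
        best.append(row)

    if not best or not best[-1]:
        return None, 0.0
    j = max(range(len(best[-1])), key=lambda idx: best[-1][idx][0])
    total = best[-1][j][0]
    if total == neg_inf:
        return None, 0.0

    # follow back-pointers from the last exon to the first
    chain: List[Hit] = []
    for i in range(len(candidates) - 1, -1, -1):
        chain.append(candidates[i][j])
        j = best[i][j][1]
    chain.reverse()
    return chain, total / sum(exon_lengths)


exon_lengths = [100, 50]
candidates = [
    [Hit(1000, 1100, 98), Hit(5000, 5100, 100)],
    [Hit(1200, 1250, 49), Hit(4000, 4050, 50)],
]
chain, identity = best_collinear_chain(exon_lengths, candidates)
print(chain, round(identity, 4))
```

Greedily taking the best hit for the first exon (100 matches at 5000) would leave no valid position for the second exon; the dynamic program instead keeps the structure intact by choosing the placements at 1000 and 4000.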

Draft genome of the Reindeer (Rangifer tarandus)

10.1101/154724 · 2017

Author(s): Zhipeng Li, Zeshan Lin, Lei Chen, Hengxing Ba, Yongzhi Yang, ...

Keyword(s): Rangifer tarandus, Reference Genome, De Novo, Bos taurus, Repetitive Sequences, Draft Genome, Divergence Time, Capra hircus, High Quality, Illumina HiSeq

Abstract Background: Reindeer (Rangifer tarandus) is the only fully domesticated species in the Cervidae family and the only cervid with a circumpolar distribution. Unlike all other cervids, female reindeer, as well as males, regularly grow cranial appendages (antlers, the defining characteristic of cervids). Moreover, reindeer milk contains more protein and less lactose than bovid milk. A high-quality reference genome of this species will assist efforts to elucidate these and other important features of the reindeer. Findings: We obtained 723.2 Gb (gigabases) of raw reads on an Illumina HiSeq 4000 platform and a 2.64 Gb final assembly, representing 95.7% of the estimated genome size (2.76 Gb according to k-mer analysis) and including 92.6% of expected genes according to BUSCO analysis. The contig N50 and scaffold N50 sizes were 89.7 kilobases (kb) and 0.94 megabases (Mb), respectively. We annotated 21,555 protein-coding genes and 1.07 Gb of repetitive sequences by de novo and homology-based prediction. Homology-based searches detected 159 rRNA, 547 miRNA, 1,339 snRNA and 863 tRNA sequences in the genome of R. tarandus. The divergence time between R. tarandus and the ancestor of Bos taurus and Capra hircus is estimated to be 29.55 million years ago (Mya). Conclusions: Our results provide the first high-quality reference genome for the reindeer and a valuable resource for studying the evolution, domestication and other unusual characteristics of the reindeer.
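
The 95.7% figure above is the assembly length divided by the k-mer-based size estimate (2.64 / 2.76 ≈ 0.957). A common textbook form of the k-mer estimate divides the total number of k-mers counted in the reads by the depth at the main peak of the k-mer frequency histogram. The sketch below implements that simplified version on toy data; the choice of k, the error cutoff `min_depth` and the simulated reads are assumptions, it counts forward-strand k-mers only, and real analyses use dedicated k-mer counters and also model heterozygosity, repeats and sequencing errors.

```python
import random
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (forward strand only, for simplicity) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(reads, k, min_depth=2):
    """Genome size ~ total k-mers / depth of the main histogram peak.

    K-mers seen fewer than min_depth times are treated as sequencing errors and
    excluded from both the peak search and the total.
    """
    counts = count_kmers(reads, k)
    histogram = Counter(counts.values())  # k-mer depth -> number of distinct k-mers
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers above the error cutoff")
    peak_depth = max(usable, key=usable.get)  # depth shared by the most distinct k-mers
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(5000))
    # 100-bp reads tiled every 10 bp: interior 21-mers are each seen 8 times
    reads = [genome[s:s + 100] for s in range(0, len(genome) - 99, 10)]
    print(round(estimate_genome_size(reads, k=21)))
```

On this simulated 5,000-bp genome the estimate comes out a little under 5,000, because a genome of length L contains only L - k + 1 k-mers and k-mers near the ends are covered less deeply than the peak.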

Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads

10.1101/345983 · 2018 · Cited By ~ 2

Author(s): Huilong Du, Chengzhi Liang

Keyword(s): Single Molecule, High Efficiency, Reference Genome, Repetitive Sequences, Sequencing Data, High Quality, Single Molecule Sequencing, Genome Maps, Long Reads, Novel Method

Abstract: Due to the large number of repetitive sequences in complex eukaryotic genomes, fragmented and incompletely assembled genomes lose value as reference sequences, often because short contigs cannot be anchored onto chromosomes or are mispositioned. Here we report a novel method, Highly Efficient Repeat Assembly (HERA), which includes a new concept called a connection graph as well as algorithms for constructing the graph. HERA resolves repeats at high efficiency with single-molecule sequencing data and enables the assembly of chromosome-scale contigs by further integrating genome maps and Hi-C data. We tested HERA with the genomes of rice R498, maize B73, human HX1 and Tartary buckwheat Pinku1. HERA can correctly assemble most of the tandemly repetitive sequences in rice using single-molecule sequencing data only. Using the same maize and human sequencing data published by Jiao et al. (2017) and Shi et al. (2016), respectively, we dramatically improved sequence contiguity compared with the published assemblies, increasing the contig N50 from 1.3 Mb to 61.2 Mb in the maize B73 assembly and from 8.3 Mb to 54.4 Mb in the human HX1 assembly. We provide a high-quality maize reference genome with 96.9% of the gaps filled (only 76 gaps left) and several incorrectly positioned sequences fixed compared with the B73 RefGen_v4 assembly. Comparisons between the HERA assembly of HX1 and the human GRCh38 reference genome showed that many gaps in GRCh38 could be filled and that GRCh38 contained some potential errors that could be fixed. We assembled the Pinku1 genome into 12 scaffolds with a contig N50 size of 27.85 Mb. HERA serves both as a new genome assembly/phasing method that generates high-quality sequences for complex genomes and as a curation tool that improves the contiguity and completeness of existing reference genomes, including the correction of assembly errors in repetitive regions.
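
Contig N50, the statistic in which HERA's contiguity gains are reported, is the length L such that contigs of length L or longer together contain at least half of the assembled bases. NG50 is the same idea measured against an estimated genome size instead of the assembly total. A minimal sketch of both; the contig lengths and genome size in the example are made up:

```python
def n50(lengths, genome_size=None):
    """N50 of contig or scaffold lengths, or NG50 when genome_size is given.

    Returns None if the sequences never reach half of the reference total,
    which can happen for NG50 on a very incomplete assembly.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    return None


contigs = [100, 80, 60, 40, 20]
print(n50(contigs), n50(contigs, genome_size=500))
```

Here the two longest contigs already hold 180 of 300 assembled bases, so N50 is 80; against a 500-base genome the running total first reaches 250 at the 40-base contig, so NG50 is 40.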

Chicago and Dovetail Hi-C proximity ligation yield chromosome length scaffolds of Ixodes scapularis genome

10.1101/392126 · 2018 · Cited By ~ 3

Author(s): Andrew B. Nuss, Arvind Sharma, Monika Gulia-Nuss

Keyword(s): Molecular Level, Ixodes scapularis, Repetitive Sequences, Chromosome Length, Genome Architecture, High Quality, Proximity Ligation, Sequencing Technologies, Functional Gene Analysis, High Quality Genome

Abstract: A high-quality genome sequence is essential for understanding an organism at the molecular level. However, large genomes with substantial repetitive sequences are challenging to assemble with current sequencing technologies. The Hi-C technique is changing the genome architecture landscape by providing links across a variety of length scales, even spanning whole chromosomes. The Ixodes scapularis haploid genome is 2.1 Gbp, and the current assembly consists of 369,495 scaffolds representing 57% of the genome. This fragmented genome poses challenges for functional gene analysis, and an improved assembly is needed. We therefore used the Hi-C technique to achieve a chromosome-level assembly of the tick genome. With Chicago and Dovetail Hi-C assemblies, we obtained 28 sequences longer than 10 Mb that correspond to the 28 chromosomes of I. scapularis.
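
Hi-C scaffolding relies on the observation that contigs from the same chromosome share many more read-pair contacts than contigs from different chromosomes. The toy sketch below groups contigs by linking any pair whose contact count, normalized by the two contig lengths, reaches a threshold, and collects the linked components with a union-find structure. Production scaffolders (including Dovetail's HiRise pipeline) also order and orient contigs and normalize much more carefully; the normalization, threshold, contig names and counts here are illustrative assumptions only.

```python
class UnionFind:
    """Minimal disjoint-set structure over a fixed set of items."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_contigs(lengths, contacts, threshold):
    """Cluster contigs into putative chromosome groups.

    lengths: {contig: length_bp}
    contacts: {(contig_a, contig_b): read_pair_count}
    A pair is linked when its contacts per megabase x megabase reach threshold.
    """
    uf = UnionFind(lengths)
    for (a, b), count in contacts.items():
        density = count / (lengths[a] / 1e6) / (lengths[b] / 1e6)
        if density >= threshold:
            uf.union(a, b)
    groups = {}
    for contig in lengths:
        groups.setdefault(uf.find(contig), []).append(contig)
    return sorted(sorted(group) for group in groups.values())


lengths = {"c1": 2_000_000, "c2": 1_000_000, "c3": 3_000_000, "c4": 1_000_000}
contacts = {("c1", "c2"): 500, ("c3", "c4"): 900, ("c2", "c3"): 3}
print(group_contigs(lengths, contacts, threshold=50))
```

Normalizing by contig length keeps long contigs, which collect more contacts simply because they are long, from being merged on sheer read count; the weak c2–c3 signal stays below the threshold, so two groups remain.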

Chromosomal-level reference genome of Chinese peacock butterfly (Papilio bianor) based on third-generation DNA sequencing and Hi-C analysis

GigaScience · 10.1093/gigascience/giz128 · 2019 · Vol 8 (11) · Cited By ~ 4

Author(s): Sihan Lu, Jie Yang, Xuelei Dai, Feiang Xie, Jinwu He, ...

Keyword(s): Reference Genome, De Novo, Demographic History, Repetitive Sequences, Population Expansion, Chromatin Interaction, Interglacial Period, Last Interglacial, High Quality, Chromosome Level

Abstract Background: Papilio bianor Cramer, 1777 (commonly known as the Chinese peacock butterfly) (Insecta, Lepidoptera, Papilionidae) is a widely distributed swallowtail butterfly with many geographic populations ranging from southeastern Russia to China, Japan, India, Vietnam, Myanmar, and Thailand. Its wing color consists of both pigmentary colored scales (black, reddish) and structurally colored scales (iridescent blue or green dust). A high-quality reference genome of P. bianor is an important foundation for investigating iridescent color evolution, phylogeography, and the evolution of swallowtail butterflies. Findings: We obtained a chromosome-level de novo genome assembly of the highly heterozygous P. bianor using long Pacific Biosciences sequencing reads and high-throughput chromosome conformation capture (Hi-C) technology. The final assembly is 421.52 Mb on 30 chromosomes (29 autosomes and 1 Z sex chromosome) with a scaffold N50 of 13.12 Mb. In total, 15,375 protein-coding genes and 233.09 Mb of repetitive sequences were identified. Phylogenetic analyses indicated that P. bianor separated from a common ancestor of swallowtails ∼23.69–36.04 million years ago. Demographic history suggested that the population expansion of this species from the last interglacial period to the last glacial maximum possibly resulted from a decrease in its natural enemies and its adaptation to climate change during the glacial period. Conclusions: We present a high-quality chromosome-level reference genome of P. bianor using long-read single-molecule sequencing and Hi-C–based chromatin interaction maps. Our results lay the foundation for exploring the genetic basis of special biological features of P. bianor and also provide a useful data source for comparative genomics and phylogenomics among butterflies and moths.
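
A repeat total such as 233.09 Mb in a 421.52 Mb assembly (about 55%) is normally computed as the union of all repeat annotations, because hits from different repeat libraries or tools often overlap and would be double-counted if their lengths were simply summed. A minimal sketch of that union-length calculation, assuming half-open (start, end) intervals per chromosome; the chromosome names and intervals are invented for illustration:

```python
def merged_length(intervals):
    """Total bases covered by half-open (start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # a gap separates this interval from the current block: close the block
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # overlapping or adjacent: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def repeat_fraction(repeats_by_chrom, chrom_lengths):
    """Fraction of the assembly covered by at least one repeat annotation."""
    covered = sum(merged_length(repeats_by_chrom.get(c, [])) for c in chrom_lengths)
    return covered / sum(chrom_lengths.values())


repeats = {"chr1": [(0, 10), (5, 20), (30, 40)], "chr2": [(0, 50)]}
chrom_lengths = {"chr1": 100, "chr2": 100}
print(repeat_fraction(repeats, chrom_lengths))
```

Summing the raw interval lengths on chr1 would give 35 bases; merging the overlapping (0, 10) and (5, 20) hits first gives the correct 30.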

The Dark Matter of Large Cereal Genomes: Long Tandem Repeats

International Journal of Molecular Sciences · 10.3390/ijms20102483 · 2019 · Vol 20 (10) · pp. 2483 · Cited By ~ 5

Author(s): Veronika Kapustová, Zuzana Tulpová, Helena Toegelová, Petr Novák, Jiří Macas, ...

Keyword(s): Bread Wheat, Tandem Repeats, Reference Genome, Sequence Data, Repetitive Sequences, Short Read Sequence, Cereal Genomes, Genome Assemblies, Reference Genomes, Size Estimates

Reference genomes of important cereals, including barley, emmer wheat and bread wheat, were released recently. Their comparison with genome size estimates obtained by flow cytometry indicated that the assemblies represent no more than 88–98% of the complete genome. This work aimed to identify the missing parts in two cereal genomes and to propose techniques for making the assemblies more complete. We focused on tandemly organised repetitive sequences, which are known to be underrepresented in genome assemblies generated from short-read sequence data. Our study found arrays of three tandem repeats with unit sizes of 1242 to 2726 bp present in the bread wheat reference genome generated from short reads. However, both this assembly and another wheat genome assembly employing long PacBio reads failed to integrate the 2726-bp repeat correctly in the pseudomolecule context. This suggests that tandem repeats of this size, frequently incorporated in unassigned scaffolds, may contribute to shrinking of pseudomolecules without reducing the size of the entire assembly. We demonstrate how this missing information may be added to the pseudomolecules with the aid of nanopore sequencing of individual BAC clones and optical mapping. Using the latter technique, we identified and localised a 470-kb long array of 45S ribosomal DNA absent from the reference genome of barley.
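
The unit size of a tandem repeat array is its period: shifting the array by one unit lines it up with itself almost perfectly. The sketch below finds the smallest such period by comparing a sequence with a shifted copy of itself and then estimates the copy number as array length divided by unit length. It is a simplified illustration, not the method used in the study: it is quadratic in the worst case, tolerates substitutions but not indels, and the `min_identity` threshold and toy sequences are assumptions. Dedicated tools such as Tandem Repeats Finder handle real data far better.

```python
def tandem_period(seq, min_period=1, max_period=None, min_identity=0.95):
    """Smallest period p at which seq matches itself shifted by p at >= min_identity
    of the compared positions; None if no such period exists."""
    n = len(seq)
    if max_period is None:
        max_period = n // 2  # require room for at least two copies of the unit
    for p in range(min_period, max_period + 1):
        compared = n - p
        matches = sum(1 for i in range(compared) if seq[i] == seq[i + p])
        if compared and matches / compared >= min_identity:
            return p
    return None


array = "ACGTTGCA" * 10  # hypothetical array: ten copies of an 8-bp unit
period = tandem_period(array)
print(period, len(array) / period)  # unit length and estimated copy number
```

The same length-over-unit arithmetic is what turns a measured array length (for example from an optical map) into a copy-number estimate, provided the unit length is known.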

A high-quality chromosome-level genome assembly reveals genetics for important traits in eggplant

Horticulture Research · 10.1038/s41438-020-00391-0 · 2020 · Vol 7 (1)

Author(s): Qingzhen Wei, Jinglei Wang, Wuhong Wang, Tianhua Hu, Haijiao Hu, ...

Keyword(s): Genome Assembly, Reference Genome, Repetitive Sequences, Gene Families, Specific Gene, High Quality, Total Size, Protein Coding, Fruit Length, Protein Coding Genes

Abstract: Eggplant (Solanum melongena L.) is an economically important vegetable crop in the Solanaceae family, with extensive diversity among landraces and close relatives. Here, we report a high-quality reference genome for the eggplant inbred line HQ-1315 (S. melongena-HQ) using a combination of Illumina, Nanopore and 10X Genomics sequencing technologies and Hi-C technology for genome assembly. The assembled genome has a total size of ~1.17 Gb and 12 chromosomes, with a contig N50 of 5.26 Mb, and contains 36,582 protein-coding genes. Repetitive sequences comprise 70.09% (811.14 Mb) of the eggplant genome, most of which are long terminal repeat (LTR) retrotransposons (65.80%), followed by long interspersed nuclear elements (LINEs, 1.54%) and DNA transposons (0.85%). The S. melongena-HQ eggplant genome carries a total of 563 accession-specific gene families containing 1009 genes. In total, 73 expanded gene families (892 genes) and 34 contracted gene families (114 genes) were functionally annotated. Comparative analysis of different eggplant genomes identified three types of variations, including single-nucleotide polymorphisms (SNPs), insertions/deletions (indels) and structural variants (SVs). Asymmetric SV accumulation was found in potential regulatory regions of protein-coding genes among the different eggplant genomes. Furthermore, we performed QTL-seq for eggplant fruit length using the S. melongena-HQ reference genome and detected a QTL interval of 71.29–78.26 Mb on chromosome E03. The gene Smechr0301963, which belongs to the SUN gene family, is predicted to be a key candidate gene for eggplant fruit length regulation. Moreover, we anchored a total of 210 linkage markers associated with 71 traits to the eggplant chromosomes and finally obtained 26 QTL hotspots. The eggplant HQ-1315 genome assembly can be accessed at http://eggplant-hq.cn.
In conclusion, the eggplant genome presented herein provides a global view of genomic divergence at the whole-genome level and powerful tools for the identification of candidate genes for important traits in eggplant.
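
QTL-seq, used above to locate the fruit-length interval, compares two DNA bulks made from individuals with extreme phenotypes. At each SNP, the SNP-index of a bulk is the fraction of reads carrying the alternative allele, and the ΔSNP-index is the difference between the two bulks; averaging it in sliding windows highlights regions where the bulks diverge. The sketch below computes these quantities under stated assumptions: the "high"/"low" bulk labels, window size, step and read counts are hypothetical, the window scan is deliberately naive, and the statistical confidence intervals that QTL-seq analyses usually add are omitted.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads in one bulk that carry the alternative allele."""
    return alt_reads / total_reads


def delta_snp_index(snps):
    """snps: list of (position, (alt_high, depth_high), (alt_low, depth_low)).

    Returns [(position, delta)] with delta = SNP-index(high bulk) - SNP-index(low bulk).
    """
    return [(pos, snp_index(*high) - snp_index(*low)) for pos, high, low in snps]


def sliding_window_mean(deltas, window, step):
    """Mean delta SNP-index of the SNPs falling in each window [start, start + window)."""
    if not deltas:
        return []
    last = max(pos for pos, _ in deltas)
    results = []
    start = 0
    while start <= last:
        values = [d for pos, d in deltas if start <= pos < start + window]
        if values:  # skip windows without SNPs
            results.append((start, sum(values) / len(values)))
        start += step
    return results


# Hypothetical SNPs: position, (alt reads, depth) in the high bulk, then in the low bulk
snps = [(100, (18, 20), (2, 20)), (150, (9, 10), (1, 10)), (5000, (5, 10), (5, 10))]
print(sliding_window_mean(delta_snp_index(snps), window=1000, step=1000))
```

Windows whose mean ΔSNP-index departs strongly from zero, like the first window here, are the candidates for a QTL; the window around position 5000 shows no difference between the bulks.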