The whale shark genome reveals patterns of vertebrate gene family evolution

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Milton Tan ◽  
Anthony K Redmond ◽  
Helen Dooley ◽  
Ryo Nozu ◽  
Keiichi Sato ◽  
...  

Chondrichthyes (cartilaginous fishes) are fundamental for understanding vertebrate evolution, yet their genomes are understudied. We report long-read sequencing of the whale shark genome. We used it to generate the best gapless chondrichthyan genome assembly yet, with higher contig contiguity than all other cartilaginous fish genomes, and to study vertebrate genomic evolution of ancestral gene families, immunity, and gigantism. We found a major increase in gene families at the origin of gnathostomes (jawed vertebrates), independent of their genome duplication. We studied vertebrate pathogen recognition receptors (PRRs), which are key in initiating innate immune defense, and found diverse patterns of gene family evolution. This demonstrates that adaptive immunity in gnathostomes did not fully displace germline-encoded PRR innovation. We also discovered a new Toll-like receptor (TLR29) and three NOD1 copies in the whale shark. We found that chondrichthyan and giant vertebrate genomes had lower substitution rates than other vertebrates, but gene family expansion rates varied among vertebrate giants. This suggests that substitution rates and gene family expansion rates are decoupled in vertebrate genomes. Finally, we found that gene families that shifted in expansion rate in vertebrate giants were enriched for human cancer-related genes, consistent with gigantism requiring adaptations to suppress cancer.

2019 ◽  
Author(s):  
Milton Tan ◽  
Anthony K. Redmond ◽  
Helen Dooley ◽  
Ryo Nozu ◽  
Keiichi Sato ◽  
...  

Abstract Due to their key phylogenetic position, cartilaginous fishes are an important vertebrate lineage for understanding the origin and evolution of vertebrates. This lineage includes the largest fish species, Rhincodon typus (whale shark). However, until recently, it had been understudied in vertebrate genomics. Using newly generated long-read sequences, we produced the best gapless cartilaginous fish genome assembly to date. The assembly has fewer missing ancestral genes than that of Callorhinchus milii, which has been widely used for evolutionary studies up to now. We used the new assembly to study the evolution of gene families in the whale shark and other vertebrates. We focused on historical patterns of gene family origin and loss across early vertebrate evolution, on the evolution of the innate immune receptor repertoire, and on the dynamics of gene family size evolution in relation to gigantism. We inferred the pattern of gene family origins across the most recent common ancestors of major vertebrate clades. Many gene families shared between the whale shark and bony vertebrates were already present in the most recent common ancestor of jawed vertebrates. We also found a large increase in novel genes at the origin of jawed vertebrates, independent of whole-genome duplication events. The whale shark innate immune system comprises diverse pathogen recognition receptors (PRRs), including NOD-like receptors, RIG-like receptors, and Toll-like receptors. We discovered a unique complement of Toll-like receptors and a triplication of NOD1 in the whale shark genome. Further, we found diverse patterns of gene family evolution among PRRs within vertebrates. This demonstrates that the origin of adaptive immunity in jawed vertebrates did more than simply replace the need for a vast repertoire of germline-encoded PRRs. We then studied rates of amino acid substitution and gene family size evolution across origins of vertebrate gigantism.
Cartilaginous fishes and giant vertebrates tended to have slower substitution rates than the vertebrate background rate. However, the whale shark genome substitution rate was not significantly slower than that of Callorhinchus. Furthermore, rates of gene family size evolution varied among giants and the background. This suggests that differences in substitution rate and in gene family size evolution relative to gigantism are decoupled. We found that the gene families that have shifted in duplication rate in the whale shark are enriched for genes related to driving cancer in humans. This is consistent with studies in other giant vertebrates. Those studies support the hypothesis that the evolution of increased body size requires adaptations that reduce the per-cell cancer rate.
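Gene family origins can be inferred with Dollo parsimony, a common rule that is assumed here and is not confirmed as the authors' method. Under this rule, each gene family arises once, at the most recent common ancestor of the species that carry it, and every absence below that node is scored as a loss. The minimal sketch below uses a toy species tree, toy clade names, and a toy carrier set for illustration. These are not data from the study, and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical rooted species tree: clade -> child clades; tips are absent as keys.
Tree = Dict[str, List[str]]

TOY_TREE: Tree = {
    "Vertebrata": ["Cyclostomata", "Gnathostomata"],
    "Cyclostomata": ["lamprey", "hagfish"],
    "Gnathostomata": ["Chondrichthyes", "Osteichthyes"],
    "Chondrichthyes": ["whale_shark", "elephant_shark"],
    "Osteichthyes": ["zebrafish", "human"],
}


def tips(tree: Tree, node: str) -> Set[str]:
    """Return the extant species descended from `node`."""
    children = tree.get(node, [])
    if not children:
        return {node}
    result: Set[str] = set()
    for child in children:
        result |= tips(tree, child)
    return result


def origin_node(tree: Tree, root: str, carriers: Set[str]) -> str:
    """Most recent common ancestor of all carriers (the Dollo origin of the family)."""
    if not carriers:
        raise ValueError("family must be present in at least one species")
    node = root
    while True:
        # Step into a child only if it alone still contains every carrier.
        deeper = [c for c in tree.get(node, []) if carriers <= tips(tree, c)]
        if not deeper:
            return node
        node = deeper[0]


def losses(tree: Tree, node: str, carriers: Set[str]) -> List[str]:
    """Maximal carrier-free subtrees below `node`; each counts as one loss."""
    lost: List[str] = []
    for child in tree.get(node, []):
        if tips(tree, child) & carriers:
            lost.extend(losses(tree, child, carriers))
        else:
            lost.append(child)
    return lost
```

For a toy family found in the whale shark, the elephant shark, and human, `origin_node` places the origin at `"Gnathostomata"`, and `losses` reports a single loss in `"zebrafish"`. Tallying origin nodes over many families gives a per-clade count of new families.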


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Xing Wang ◽  
Yi Zhang ◽  
Yufeng Zhang ◽  
Mingming Kang ◽  
Yuanbo Li ◽  
...  

Abstract Earthworms (Annelida: Crassiclitellata) are widely distributed around the world. This reflects their ancient origin and their adaptation and invasion after introduction into new habitats over the past few centuries. Herein, we report a 1.2 Gb complete genome assembly of the earthworm Amynthas corticis based on a strategy combining third-generation long-read sequencing and Hi-C mapping. A total of 29,256 protein-coding genes are annotated in this genome. Analysis of resequencing data indicates that this earthworm is a triploid species. Furthermore, gene family evolution analysis shows that comprehensive expansion of gene families in the Amynthas corticis genome has produced more defensive functions than in other annelid species. We incubated the earthworm with pathogenic Escherichia coli O157:H7. Quantitative proteomic (iTRAQ) analysis showed that the expression of 147 proteins changed in the body of Amynthas corticis, and 16S rDNA sequencing showed that the abundance of 28 microorganisms changed in its gut. Our genome assembly provides abundant and valuable resources for the earthworm research community and serves as a first step toward uncovering the mysteries of this species. It may also provide molecular-level indicators of its powerful defensive functions, adaptation to complex environments, and invasion ability.
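Resequencing data can suggest ploidy through allele balance at heterozygous sites. A diploid heterozygote is expected near a 1:1 read ratio (fraction 0.5). A triploid heterozygote is expected near 1:2 or 2:1 (fractions 1/3 and 2/3). The sketch below is a deliberately simplified illustration of that idea and is assumed, not taken from the study. It scores each model by the mean squared distance to its nearest expected peak. Real tools fit mixture models instead, and this crude score tends to favour models with more peaks when the data are noisy. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Expected alt-allele fractions at heterozygous sites under each ploidy (simplified model).
EXPECTED_PEAKS: Dict[int, Tuple[float, ...]] = {
    2: (1 / 2,),
    3: (1 / 3, 2 / 3),
}


def allele_fractions(counts: Iterable[Tuple[int, int]]) -> List[float]:
    """Convert (alt_reads, total_depth) pairs at heterozygous sites into alt-allele fractions."""
    return [alt / depth for alt, depth in counts if depth > 0]


def misfit(fractions: List[float], peaks: Tuple[float, ...]) -> float:
    """Mean squared distance from each observed fraction to the nearest expected peak."""
    return sum(min((f - p) ** 2 for p in peaks) for f in fractions) / len(fractions)


def guess_ploidy(fractions: Iterable[float]) -> int:
    """Return the ploidy whose expected peaks best match the observed fractions."""
    observed = list(fractions)
    if not observed:
        raise ValueError("need at least one heterozygous site")
    return min(EXPECTED_PEAKS, key=lambda k: misfit(observed, EXPECTED_PEAKS[k]))
```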


2021 ◽  
Author(s):  
Kim Vertacnik ◽  
Danielle Herrig ◽  
R Keating Godfrey ◽  
Tom Hill ◽  
Scott Geib ◽  
...  

A central goal in evolutionary biology is to determine the predictability of adaptive genetic changes. Despite many documented cases of convergent evolution at individual loci, little is known about the repeatability of gene family expansions and contractions. To address this gap, we examined gene family evolution in the redheaded pine sawfly Neodiprion lecontei. This non-eusocial hymenopteran is an exemplar of a pine-specialized lineage that evolved from angiosperm-feeding ancestors. After assembling and annotating a draft genome, we manually annotated multiple gene families with chemosensory, detoxification, or immunity functions and characterized their genomic distributions and evolutionary history. Our results suggest that expansions of bitter gustatory receptor (GR), clan 3 cytochrome P450 (CYP3), and antimicrobial peptide (AMP) subfamilies may have contributed to pine adaptation. By contrast, there was no evidence of recent gene family contraction via pseudogenization. Next, we compared the number of genes in these same families across insect taxa that vary in diet, dietary specialization, and social behavior. In Hymenoptera, herbivory was associated with large GR and small olfactory receptor (OR) families, and eusociality was associated with large OR and small AMP families. Unlike in investigations among more closely related taxa, ecological specialization was not related to gene family size. Overall, our results suggest that gene families that mediate ecological interactions may expand and contract predictably in response to particular selection pressures. However, the ecological drivers and temporal pace of gene gain and loss likely vary considerably across gene families.


2019 ◽  
Author(s):  
Xing Wang ◽  
Yi Zhang ◽  
Yufeng Zhang ◽  
Mingming Kang ◽  
Yuanbo Li ◽  
...  

Abstract Earthworms (Annelida: Crassiclitellata) are widely distributed around the world due to their great adaptability. However, the lack of a high-quality genome sequence has prevented the many insights into physiology, phylogeny, and genome evolution that a good earthworm genome could provide. Herein, we report a complete genome assembly of the earthworm Amynthas corticis of about 1.2 Gb, based on a strategy combining third-generation long-read sequencing and Hi-C mapping. A total of 29,256 protein-coding genes are annotated in this genome. Analysis of resequencing data indicates that this earthworm is a triploid species. Furthermore, gene family evolution analysis shows that comprehensive expansion of gene families in the earthworm genome has produced more defensive functions than in other annelid species. Quantitative proteomic (iTRAQ) analysis identifies 97 immune-related proteins that respond significantly to pathogenic Escherichia coli O157:H7, and 16S rDNA sequencing identifies 88 microbes that respond significantly to it. Our genome assembly provides abundant and valuable resources for the earthworm research community and serves as a first step toward uncovering the mysteries of this species. It may also explain, at the molecular level, its powerful defensive functions, adaptation to complex environments, and invasion ability.


2017 ◽  
Author(s):  
Daniel S. Carvalho ◽  
James C. Schnable ◽  
Ana Maria R. Almeida

Abstract The study of gene family evolution has benefited from the use of phylogenetic tools, which can greatly inform studies of both relationships within gene families and functional divergence. Here, we propose a network-based approach that, in combination with phylogenetic methods, can provide additional support for models of gene family evolution. We dissect the contributions of each method to the improved understanding of relationships and functions within the well-characterized AGAMOUS family of floral development genes. The results obtained with the two methods largely agreed with one another. In particular, we show how network approaches can provide improved interpretations of branches with low support in a conventional gene tree. The network approach used here may also reflect known and suspected patterns of functional divergence better than phylogenetic methods do. Overall, we believe that the combined use of phylogenetic and network tools provides more robust assessments of gene family evolution.
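A common building block for network views of a gene family is a graph with genes as nodes and pairwise similarity scores as weighted edges. Clusters emerge as connected components once weak edges are dropped. The abstract does not specify which network method the authors used, so the sketch below is only a generic illustration. It keeps edges at or above a chosen threshold and groups genes with union-find. The gene names, scores, and threshold are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def similarity_components(
    genes: Iterable[str],
    edges: Iterable[Tuple[str, str, float]],
    threshold: float,
) -> List[List[str]]:
    """Group genes into connected components of the similarity graph,
    keeping only edges whose score is at least `threshold`."""
    parent: Dict[str, str] = {g: g for g in genes}

    def find(x: str) -> str:
        # Walk to the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in edges:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two components

    groups: Dict[str, List[str]] = {}
    for g in parent:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())
```

Lowering the threshold merges clusters, so scanning several thresholds shows which groupings are robust. This plays a role loosely comparable to branch support in a gene tree.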


2020 ◽  
Vol 12 (3) ◽  
pp. 185-202
Author(s):  
Xia Han ◽  
Jindan Guo ◽  
Erli Pang ◽  
Hongtao Song ◽  
Kui Lin

Abstract How have genes evolved within a well-known genome phylogeny? Many protein-coding genes should have evolved as a whole at the gene level, and some should have evolved partly through fragments at the subgene level. Our aim was to comprehensively explore such complex homologous relationships and better understand gene family evolution. To do so, we used de novo-identified modules, which are subgene units that consecutively cover the proteins of a set of closely related species. With these modules, we applied a new phylogeny-based approach that considers evolutionary models with partial homology to classify all protein-coding genes in nine Drosophila genomes. We compared our approach with two other popular methods for gene family construction. It improved practical gene family classification with a more reasonable view of homology and provided a much more complete landscape of gene family evolution at the gene and subgene levels. In the case study, we found that most expanded gene families might have evolved mainly through module rearrangements rather than gene duplications. These families mainly generated single-module genes through partial gene duplication, suggesting that subgene rearrangement may be pervasive in the evolution of protein-coding gene families. The use of a phylogeny-based approach with partial homology to classify and analyze protein-coding gene families may provide us with a more comprehensive landscape depicting how genes evolve within a well-known genome phylogeny.


2020 ◽  
Author(s):  
Rui-Ling Zhang ◽  
Qian Zhang ◽  
Zhong Zhang

Abstract Background: The longhorned tick, Haemaphysalis longicornis Neumann, is widely distributed across temperate regions. It can parasitize terrestrial vertebrates, including birds and a large number of mammals. It is a concern for human and animal health, notably because of its potential to transmit infectious agents.
Methods: A genome survey was performed using GenomeScope v1.0.0 with a maximum k-mer coverage cutoff of 1,000. The non-redundant assembly was polished with Illumina short reads using two rounds of NextPolish v1.1.0. Genome completeness was assessed using the BUSCO v3.0.2 pipeline against the arthropod gene set (n = 1,066). Ab initio predictions were generated using BRAKER v2.1.5. Transcriptomic reads were mapped to the genome with HISAT2 v2.2.0 and assembled with StringTie v2.1.2. Gene functions were assigned against the UniProtKB database using Diamond v0.9.24. Orthogroups of 16 Chelicerata species were inferred using OrthoFinder v2.3.8, and gene family evolution was estimated using CAFÉ v4.2.1. Gene families related to digestion and detoxification were annotated by searching the genome assembly. These were the cytochrome P450 (CYP), carboxyl/cholinesterase (CCE), glutathione-S-transferase (GST), and ATP-binding cassette (ABC) transporter families.
Results: The final genome assembly has a size of 3.12 Gb and a scaffold N50 of 1.09 Mb, and it captured 92.4% of the BUSCO gene set (n = 1,066). The genome architecture of the longhorned tick resembles that of another tick, Ixodes scapularis (Say). The resemblance is particularly strong in genome size, highly repetitive DNA (~65%), and number of protein-coding genes (21,550). We also identified 5,601 non-coding RNAs, including a high proportion of tRNAs (4,271). Gene family evolution analysis revealed 350 rapidly evolving gene families. Combined gene ontology (GO) and KEGG pathway enrichment analyses showed that the 255 families experiencing significant expansions are mainly involved in cuticle synthesis, digestion, and detoxification.
Conclusions: The new genome assembly, annotation, and comparative genomic analyses provide a valuable resource for insights into the parasitic lifestyle of the longhorned tick.
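The scaffold N50 reported above summarizes assembly contiguity. It is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that standard definition follows. The function name `n50` and the toy lengths are illustrative assumptions, not values from the tick assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length of the scaffold at which the running total
    of lengths, taken from longest to shortest, first reaches half the assembly size."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequences given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-empty input; keeps the return type explicit


# Toy scaffold lengths (arbitrary units): total 54, so half is 27, reached at 10 + 9 + 8.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```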


2021 ◽  
Author(s):  
Arthur Zwaenepoel ◽  
Yves Van de Peer

Abstract Phylogenetic models of gene family evolution based on birth-death processes (BDPs) provide an awkward fit to comparative genomic data sets. A central assumption of these models is a constant per-gene loss rate in any particular family. However, gene family members may be partially functionally redundant. Gene loss dynamics are therefore likely to depend on the number of genes in a family, and different variations of commonly employed BDP models indeed suggest this is the case. We propose a simple two-type branching process model to better approximate the stochastic evolution of gene families by gene duplication and loss. We perform Bayesian statistical inference of its parameters in a phylogenetic context. We evaluate the statistical methods using simulated data sets and apply the model to gene family data for Drosophila, yeasts, and primates, providing new quantitative insights into the long-term maintenance of duplicated genes.
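For reference, the baseline the authors criticize is the linear BDP with constant per-gene rates. Each gene duplicates at rate λ and is lost at rate μ. A family of n genes therefore experiences events at total rate n(λ + μ), and its expected size after time t is n₀·exp((λ - μ)t). The sketch below simulates that baseline with the Gillespie algorithm. It is not the authors' two-type model, and the function names and parameter values are hypothetical.

```python
import math
import random


def simulate_bdp(n0: int, lam: float, mu: float, t: float, rng: random.Random) -> int:
    """Simulate one linear birth-death process for family size over time t.
    Each gene independently duplicates at rate `lam` and is lost at rate `mu`."""
    n, clock = n0, 0.0
    while n > 0:
        total_rate = n * (lam + mu)  # per-gene rates scale with family size
        if total_rate == 0:
            break
        clock += rng.expovariate(total_rate)  # waiting time to the next event
        if clock > t:
            break
        if rng.random() < lam / (lam + mu):
            n += 1  # duplication
        else:
            n -= 1  # loss
    return n


def expected_size(n0: int, lam: float, mu: float, t: float) -> float:
    """Analytical mean family size under the linear BDP."""
    return n0 * math.exp((lam - mu) * t)
```

Averaging many simulated runs should approach `expected_size`. A size-dependent loss rate, as the abstract argues for, would replace the fixed `mu` with a function of `n`.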


2018 ◽  
Vol 35 (14) ◽  
pp. 2504-2506 ◽  
Author(s):  
Clément-Marie Train ◽  
Miguel Pignatelli ◽  
Adrian Altenhoff ◽  
Christophe Dessimoz

Abstract Summary The evolutionary history of gene families can be complex due to duplications and losses. This complexity is compounded by the large number of genomes simultaneously considered in contemporary comparative genomic analyses. Several orthology databases provide hierarchical orthologous groups (HOGs). These are sets of genes that are inferred to have descended from a common ancestral gene within a species clade. This implies that the set of HOGs defined for a particular clade corresponds to the ancestral genes found in its last common ancestor. Furthermore, by keeping track of HOG composition along the species tree, it is possible to infer the emergence, duplications, and losses of genes within a gene family of interest. However, the lack of tools to manipulate and analyse HOGs has made it difficult to extract, display, and interpret this type of information. To address this, we introduce two tools. iHam (interactive HOG analysis method) is an interactive JavaScript widget to visualize and explore gene family history encoded in HOGs. pyHam (python HOG analysis method) is a python library for programmatic processing of gene families. These complementary open source tools greatly ease adoption of HOGs as a scalable and interpretable concept to relate genes across multiple species. Availability and implementation iHam’s code is available at https://github.com/DessimozLab/iHam or can be loaded dynamically. pyHam’s code is available at https://github.com/DessimozLab/pyHam or via the pip package ‘pyham’.
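The abstract notes that duplications and losses can be read off by tracking HOG composition along the species tree. The sketch below illustrates that idea in plain Python. It deliberately does not use pyHam, whose API is not shown here. It assumes a hypothetical, fully resolved HOG representation in which every taxonomic level is explicit and each extant gene is a HOG at its species level. Along each branch from a HOG's level to a child level, zero sub-HOGs means a loss, and k > 1 sub-HOGs means k - 1 duplications. The species tree and family are toy inputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical species tree: clade -> child clades (species have no entry).
SpeciesTree = Dict[str, List[str]]


@dataclass
class HOG:
    """A hierarchical orthologous group at one taxonomic level.
    A HOG whose level is a species name stands for one extant gene."""
    level: str
    subhogs: List["HOG"] = field(default_factory=list)


def infer_events(root: HOG, tree: SpeciesTree) -> Dict[str, List[Tuple]]:
    """Infer duplications and losses by comparing each HOG with its sub-HOGs."""
    events: Dict[str, List[Tuple]] = {"duplications": [], "losses": []}

    def walk(hog: HOG) -> None:
        for child_level in tree.get(hog.level, []):
            copies = [h for h in hog.subhogs if h.level == child_level]
            if not copies:
                events["losses"].append((hog.level, child_level))
            elif len(copies) > 1:
                events["duplications"].append((hog.level, child_level, len(copies) - 1))
            for sub in copies:
                walk(sub)

    walk(root)
    return events


# Toy family: one ancestral vertebrate gene, duplicated on the shark branch, lost in lamprey.
TOY_TREE: SpeciesTree = {"Vertebrata": ["Gnathostomata", "lamprey"], "Gnathostomata": ["shark", "human"]}
TOY_FAMILY = HOG("Vertebrata", [HOG("Gnathostomata", [HOG("shark"), HOG("shark"), HOG("human")])])
```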


2019 ◽  
Vol 36 (10) ◽  
pp. 2143-2156 ◽  
Author(s):  
Yiyuan Li ◽  
Hyunjin Park ◽  
Thomas E Smith ◽  
Nancy A Moran

Abstract Genome structural variations, including duplications, deletions, insertions, and inversions, are central in the evolution of eukaryotic genomes. However, structural variations present challenges for high-quality genome assembly, hampering efforts to understand the evolution of gene families and genome architecture. An example is the genome of the pea aphid (Acyrthosiphon pisum) for which the current assembly is composed of thousands of short scaffolds, many of which are known to be misassembled. Here, we present an improved version of the A. pisum genome based on the use of two long-range proximity ligation methods. The new assembly contains four long scaffolds (40–170 Mb), corresponding to the three autosomes and the X chromosome of A. pisum, and encompassing 86% of the new assembly. Assembly accuracy is supported by several quality assessments. Using this assembly, we identify the chromosomal locations and relative ages of duplication events, and the locations of horizontally acquired genes. The improved assembly illuminates the mode of gene family evolution by providing proximity information between paralogs. By estimating nucleotide polymorphism and coverage depth from resequencing data, we determined that many short scaffolds not assembling to chromosomes represent hemizygous regions, which are especially frequent on the highly repetitive X chromosome. Aligning the X-linked aphicarus region, responsible for male wing dimorphism, to the new assembly revealed a 50-kb deletion that cosegregates with the winged male phenotype in some clones. These results show that long-range scaffolding methods can substantially improve assemblies of repetitive genomes and facilitate study of gene family evolution and structural variation.
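Candidate hemizygous scaffolds can be flagged in the spirit of the coverage and polymorphism analysis described above. A region present on only one homolog should show roughly half the genome-wide median resequencing depth and few heterozygous sites. The sketch below assumes per-scaffold summary statistics are already computed. Its thresholds (0.35 to 0.65 of median depth, at most 0.5 heterozygous sites per kb) and all names are hypothetical choices, not the study's criteria.

```python
from statistics import median
from typing import Dict, Tuple


def classify_scaffolds(
    stats: Dict[str, Tuple[float, float]],
    max_het_per_kb: float = 0.5,
) -> Dict[str, str]:
    """Label scaffolds from (mean read depth, heterozygous sites per kb).
    Half-median depth plus low heterozygosity suggests a hemizygous region."""
    genome_median = median(depth for depth, _ in stats.values())
    if genome_median <= 0:
        raise ValueError("median depth must be positive")
    labels: Dict[str, str] = {}
    for name, (depth, het_per_kb) in stats.items():
        ratio = depth / genome_median
        if 0.35 <= ratio <= 0.65 and het_per_kb <= max_het_per_kb:
            labels[name] = "hemizygous"
        elif 0.75 <= ratio <= 1.25:
            labels[name] = "diploid"
        else:
            labels[name] = "unresolved"  # e.g. collapsed repeats or noisy estimates
    return labels


# Toy summary statistics; the median depth here is 59.
TOY_STATS = {
    "chr1": (60.0, 2.0), "chr2": (62.0, 1.8), "chr3": (58.0, 2.1),
    "scaf_a": (31.0, 0.1), "scaf_b": (30.0, 1.9), "scaf_c": (140.0, 2.5),
}
```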

