A two-type branching process model of gene family evolution

2021 ◽  
Author(s):  
Arthur Zwaenepoel ◽  
Yves Van de Peer

Abstract Phylogenetic models of gene family evolution based on birth-death processes (BDPs) provide an awkward fit to comparative genomic data sets. A central assumption of these models is a constant per-gene loss rate within any particular family. Because of the possibility of partial functional redundancy among gene family members, gene loss dynamics are, however, likely to depend on the number of genes in a family, and different variations of commonly employed BDP models indeed suggest this is the case. We propose a simple two-type branching process model to better approximate the stochastic evolution of gene families by gene duplication and loss, and perform Bayesian statistical inference of model parameters in a phylogenetic context. We evaluate the statistical methods using simulated data sets and apply the model to gene family data for Drosophila, yeasts and primates, providing new quantitative insights into the long-term maintenance of duplicated genes.
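The constant per-gene loss rate mentioned above can be made concrete with a small simulation. The following is a minimal sketch of the standard single-type linear birth-death process, in which every gene duplicates at rate `lam` and is lost at rate `mu` regardless of family size. It is not the authors' two-type model, and the function name and parameters are hypothetical, chosen only for illustration.

```python
import random


def simulate_linear_bdp(n0, lam, mu, t_end, rng=None):
    """Return the family size at time t_end under a linear birth-death process.

    Each of the n genes duplicates at rate lam and is lost at rate mu,
    so the total event rate is n * (lam + mu) (Gillespie simulation).
    """
    rng = rng or random.Random()
    n, t = n0, 0.0
    while n > 0:
        total_rate = n * (lam + mu)
        if total_rate == 0.0:
            break  # no events can occur
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > t_end:
            break  # next event falls after the observation time
        # The event is a duplication with probability lam / (lam + mu).
        if rng.random() < lam / (lam + mu):
            n += 1
        else:
            n -= 1
    return n


if __name__ == "__main__":
    rng = random.Random(42)
    sizes = [simulate_linear_bdp(1, 0.5, 0.5, 2.0, rng) for _ in range(1000)]
    print("fraction of families lost:", sizes.count(0) / len(sizes))
```

Because the per-gene loss rate here does not change with family size, a family of one gene is lost as readily as any single copy in a large family, which is the assumption the abstract calls into question.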

2020 ◽  
Author(s):  
Rui-Ling Zhang ◽  
Qian Zhang ◽  
Zhong Zhang

Abstract Background: The longhorned tick, Haemaphysalis longicornis Neumann, is widely distributed across temperate regions. It can parasitize terrestrial vertebrates, including birds and a large number of mammals. It is a concern in human and animal health, notably for its potential to transmit infectious agents. Methods: Genome survey was investigated using GenomeScope v1.0.0 with a maximum k-mer coverage cutoff of 1,000. Non-redundant assembly was polished with Illumina short reads using two rounds of NextPolish v1.1.0. Genome completeness was assessed using BUSCO v3.0.2 pipeline analyses against the arthropod gene set (n = 1,066). Ab initio predictions were generated using BRAKER v2.1.5. Transcriptomic reads were mapped to the genome with HISAT2 v2.2.0 and assembled with StringTie v2.1.2. Gene functions were assigned against the UniProtKB database using Diamond v0.9.24. Orthogroups of 16 Chelicerata species were inferred using OrthoFinder v2.3.8 and gene family evolution was estimated using CAFE v4.2.1. Gene families related to digestion and detoxification, i.e. cytochrome P450 (CYP), carboxyl/cholinesterase (CCE), glutathione-S-transferase (GST), and ATP-binding cassette (ABC) transporter, were annotated by searching in the genome assembly. Results: The final genome assembly has a size of 3.12 Gb, a scaffold N50 of 1.09 Mb, and captured 92.4% of the BUSCO gene set (n = 1,066). The genome architecture of the longhorned tick resembles that of another tick, Ixodes scapularis (Say), particularly in its large size, highly repetitive DNA (~65%) and number of protein-coding genes (21,550). We also identified 5,601 non-coding RNAs with a high ratio of tRNAs (4,271). Gene family evolution analysis revealed 350 rapidly evolving gene families. Combining function enrichment analyses of gene ontology (GO) and KEGG pathways, the 255 families experiencing significant expansions are mainly involved in cuticle synthesis, digestion and detoxification.
Conclusions: The new genome assembly, annotation and comparative genomic analyses provide a valuable resource for insights into parasitic life mode of the longhorned tick.


2021 ◽  
Author(s):  
Kim Vertacnik ◽  
Danielle Herrig ◽  
R Keating Godfrey ◽  
Tom Hill ◽  
Scott Geib ◽  
...  

A central goal in evolutionary biology is to determine the predictability of adaptive genetic changes. Despite many documented cases of convergent evolution at individual loci, little is known about the repeatability of gene family expansions and contractions. To address this void, we examined gene family evolution in the redheaded pine sawfly Neodiprion lecontei, a non-eusocial hymenopteran and exemplar of a pine-specialized lineage evolved from angiosperm-feeding ancestors. After assembling and annotating a draft genome, we manually annotated multiple gene families with chemosensory, detoxification, or immunity functions and characterized their genomic distributions and evolutionary history. Our results suggest that expansions of bitter gustatory receptor (GR), clan 3 cytochrome P450 (CYP3), and antimicrobial peptide (AMP) subfamilies may have contributed to pine adaptation. By contrast, there was no evidence of recent gene family contraction via pseudogenization. Next, we compared the number of genes in these same families across insect taxa that vary in diet, dietary specialization, and social behavior. In Hymenoptera, herbivory was associated with large GR and small olfactory receptor (OR) families, eusociality was associated with large OR and small AMP families, and, unlike in investigations among more closely related taxa, ecological specialization was not related to gene family size. Overall, our results suggest that gene families that mediate ecological interactions may expand and contract predictably in response to particular selection pressures; however, the ecological drivers and temporal pace of gene gain and loss likely vary considerably across gene families.


2016 ◽  
Author(s):  
Kassian Kobert ◽  
Alexandros Stamatakis ◽  
Tomáš Flouri

The phylogenetic likelihood function is the major computational bottleneck in several applications of evolutionary biology such as phylogenetic inference, species delimitation, model selection and divergence times estimation. Given the alignment, a tree and the evolutionary model parameters, the likelihood function computes the conditional likelihood vectors for every node of the tree. Vector entries for which all input data are identical result in redundant likelihood operations which, in turn, yield identical conditional values. Such operations can be omitted for improving run-time and, using appropriate data structures, reducing memory usage. We present a fast, novel method for identifying and omitting such redundant operations in phylogenetic likelihood calculations, and assess the performance improvement and memory saving attained by our method. Using empirical and simulated data sets, we show that a prototype implementation of our method yields up to 10-fold speedups and uses up to 78% less memory than one of the fastest and most highly tuned implementations of the phylogenetic likelihood function currently available. Our method is generic and can seamlessly be integrated into any phylogenetic likelihood implementation.
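The idea of redundant likelihood operations can be illustrated with a toy example. The sketch below is not the authors' method or data structure. It only shows the general principle, under the assumption of a rooted binary tree written as nested tuples of tip names: two alignment sites that show the same pattern of tip states below a node would give the same conditional likelihood vector at that node, so one computation per distinct pattern would suffice. The function name `repeat_classes` and the input layout are hypothetical.

```python
def repeat_classes(node, tips, counts):
    """Assign each alignment site a class id at `node`.

    Two sites get the same id exactly when their tip states are identical
    throughout the subtree below `node`. For every inner node, the pair
    (distinct classes, total sites) is appended to `counts`.
    """
    if isinstance(node, str):
        # Tip: the class of a site is simply its observed state.
        return list(tips[node])
    left = repeat_classes(node[0], tips, counts)
    right = repeat_classes(node[1], tips, counts)
    ids = {}
    classes = []
    for pair in zip(left, right):
        # A new (left class, right class) pair gets the next free id.
        classes.append(ids.setdefault(pair, len(ids)))
    counts.append((len(ids), len(classes)))
    return classes


if __name__ == "__main__":
    tree = (("A", "B"), "C")
    tips = {"A": "AACA", "B": "CCGC", "C": "TGTT"}
    counts = []
    print(repeat_classes(tree, tips, counts))  # class id per site at the root
    print(counts)  # (distinct, total) per inner node, children before parents
```

In this toy alignment the node joining A and B sees only two distinct patterns across four sites, so half of the per-site work at that node would be redundant.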


2015 ◽  
Vol 11 (A29A) ◽  
pp. 205-207
Author(s):  
Philip C. Gregory

Abstract A new apodized Keplerian model is proposed for the analysis of precision radial velocity (RV) data to model both planetary and stellar activity (SA) induced RV signals. A symmetrical Gaussian apodization function with unknown width and center can distinguish planetary signals from SA signals on the basis of the width of the apodization function. The general model for m apodized Keplerian signals also includes a linear regression term between RV and the stellar activity diagnostic ln(R'hk), as well as an extra Gaussian noise term with unknown standard deviation. The model parameters are explored using a Bayesian fusion MCMC code. A differential version of the Generalized Lomb-Scargle periodogram provides an additional way of distinguishing SA signals and helps guide the choice of new periods. Sample results are reported for a recent international RV blind challenge which included multiple state-of-the-art simulated data sets supported by a variety of stellar activity diagnostics.


2017 ◽  
Author(s):  
Daniel S. Carvalho ◽  
James C. Schnable ◽  
Ana Maria R. Almeida

Abstract The study of gene family evolution has benefited from the use of phylogenetic tools, which can greatly inform studies of both relationships within gene families and functional divergence. Here, we propose the use of a network-based approach that, in combination with phylogenetic methods, can provide additional support for models of gene family evolution. We dissect the contributions of each method to the improved understanding of relationships and functions within the well-characterized family of AGAMOUS floral development genes. The results obtained with the two methods largely agreed with one another. In particular, we show how network approaches can provide improved interpretations of branches with low support in a conventional gene tree. The network approach used here may also better reflect known and suspected patterns of functional divergence relative to phylogenetic methods. Overall, we believe that the combined use of phylogenetic and network tools provides more robust assessments of gene family evolution.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Milton Tan ◽  
Anthony K Redmond ◽  
Helen Dooley ◽  
Ryo Nozu ◽  
Keiichi Sato ◽  
...  

Chondrichthyes (cartilaginous fishes) are fundamental for understanding vertebrate evolution, yet their genomes are understudied. We report long-read sequencing of the whale shark genome to generate the best gapless chondrichthyan genome assembly yet with higher contig contiguity than all other cartilaginous fish genomes, and studied vertebrate genomic evolution of ancestral gene families, immunity, and gigantism. We found a major increase in gene families at the origin of gnathostomes (jawed vertebrates) independent of their genome duplication. We studied vertebrate pathogen recognition receptors (PRRs), which are key in initiating innate immune defense, and found diverse patterns of gene family evolution, demonstrating that adaptive immunity in gnathostomes did not fully displace germline-encoded PRR innovation. We also discovered a new Toll-like receptor (TLR29) and three NOD1 copies in the whale shark. We found chondrichthyan and giant vertebrate genomes had decreased substitution rates compared to other vertebrates, but gene family expansion rates varied among vertebrate giants, suggesting substitution and expansion rates of gene families are decoupled in vertebrate genomes. Finally, we found gene families that shifted in expansion rate in vertebrate giants were enriched for human cancer-related genes, consistent with gigantism requiring adaptations to suppress cancer.


Author(s):  
Diana Moreno Santillan ◽  
Tanya Lama ◽  
Yocelyn Gutiérrez Guerrero ◽  
Alexis Brown ◽  
Paul Donat ◽  
...  

Comprising more than 1400 species, bats possess adaptations unique among mammals including powered flight, unexpected longevity given small body size, and extraordinary immunity. Some of the molecular mechanisms underlying these unique adaptations include DNA repair, metabolism and immunity. However, analyses have been limited to a few divergent lineages, reducing the scope of inferences on gene family evolution across the Order Chiroptera. We conducted an exhaustive comparative genomic study of 37 bat species encompassing a large number of lineages, with a particular emphasis on multi-gene family evolution across immune system and metabolic genes. In agreement with previous analyses, we found lineage-specific expansions of the APOBEC3 and MHC-I gene families, and loss of the proinflammatory PYHIN gene family. We inferred more than 1,000 gene losses unique to bats, including genes involved in the regulation of inflammasome pathways such as epithelial defense receptors, the natural killer gene complex and the interferon-gamma induced pathway. Gene set enrichment analyses revealed that genes lost in bats are involved in defense response against pathogen-associated molecular patterns and damage-associated molecular patterns. Gene family evolution and selection analyses indicate that bats have evolved fundamental functional differences compared to other mammals in both the innate and adaptive immune systems, with the potential to enhance anti-viral immune response while dampening inflammatory signaling. In addition, metabolic genes have experienced repeated expansions related to convergent shifts to plant-based diets. Our analyses support the hypothesis that, in tandem with flight, ancestral bats had evolved a unique set of immune adaptations whose functional implications remain to be explored.


2021 ◽  
Vol 2021 ◽  
pp. 1-27
Author(s):  
Awad A. Bakery ◽  
Wael Zakaria ◽  
OM Kalthum S. K. Mohamed

The generalized Gamma model has been applied in a variety of research fields, including reliability engineering and lifetime analysis. Its support, however, is unbounded from above, whereas data in many applications lie in a bounded support region. This paper presents a new five-parameter bounded generalized Gamma model, with the bounded Weibull model with four parameters, the bounded Gamma model with four parameters, the bounded generalized Gaussian model with three parameters, the bounded exponential model with three parameters, and the bounded Rayleigh model with two parameters as special cases. This approach, which uses a bounded support region, allows a great deal of versatility in fitting various shapes of observed data. Numerous properties of the proposed distribution are derived, including explicit expressions for the moments, quantiles, mode, moment generating function, mean variance, mean residual lifespan, entropies, skewness, kurtosis, hazard function, survival function, rth order statistic, and median distributions. The distribution has hazard rates that are monotonically increasing or decreasing, bathtub-shaped, or upside-down bathtub-shaped. We use the Newton-Raphson method to approximate the model parameters that maximize the log-likelihood function; some of the parameters have a closed iterative structure. Six real data sets and six simulated data sets were used to demonstrate how the proposed model works in practice. We illustrate why the model is more stable and less affected by sample size. Additionally, the suggested model is very accurate for wavelet histogram fitting of images and sounds.


2020 ◽  
Vol 69 (5) ◽  
pp. 973-986 ◽  
Author(s):  
Joëlle Barido-Sottani ◽  
Timothy G Vaughan ◽  
Tanja Stadler

Abstract Heterogeneous populations can lead to important differences in birth and death rates across a phylogeny. Taking this heterogeneity into account is necessary to obtain accurate estimates of the underlying population dynamics. We present a new multitype birth–death model (MTBD) that can estimate lineage-specific birth and death rates. This corresponds to estimating lineage-dependent speciation and extinction rates for species phylogenies, and lineage-dependent transmission and recovery rates for pathogen transmission trees. In contrast with previous models, we do not presume to know the trait driving the rate differences, nor do we prohibit the same rates from appearing in different parts of the phylogeny. Using simulated data sets, we show that the MTBD model can reliably infer the presence of multiple evolutionary regimes, their positions in the tree, and the birth and death rates associated with each. We also present a reanalysis of two empirical data sets and compare the results obtained by MTBD and by the existing software BAMM. We compare two implementations of the model, one exact and one approximate (assuming that no rate changes occur in the extinct parts of the tree), and show that the approximation only slightly affects results. The MTBD model is implemented as a package in the Bayesian inference software BEAST 2 and allows joint inference of the phylogeny and the model parameters. [Birth–death; lineage-specific rates; multi-type model.]


2020 ◽  
Vol 12 (3) ◽  
pp. 185-202
Author(s):  
Xia Han ◽  
Jindan Guo ◽  
Erli Pang ◽  
Hongtao Song ◽  
Kui Lin

Abstract How have genes evolved within a well-known genome phylogeny? Many protein-coding genes should have evolved as a whole at the gene level, and some should have evolved partly through fragments at the subgene level. To comprehensively explore such complex homologous relationships and better understand gene family evolution, here, with de novo-identified modules, the subgene units which could consecutively cover proteins within a set of closely related species, we applied a new phylogeny-based approach that considers evolutionary models with partial homology to classify all protein-coding genes in nine Drosophila genomes. Compared with two other popular methods for gene family construction, our approach improved practical gene family classifications with a more reasonable view of homology and provided a much more complete landscape of gene family evolution at the gene and subgene levels. In the case study, we found that most expanded gene families might have evolved mainly through module rearrangements rather than gene duplications and mainly generated single-module genes through partial gene duplication, suggesting that there might be pervasive subgene rearrangement in the evolution of protein-coding gene families. The use of a phylogeny-based approach with partial homology to classify and analyze protein-coding gene families may provide us with a more comprehensive landscape depicting how genes evolve within a well-known genome phylogeny.

