GHOST: Recovering Historical Signal from Heterotachously Evolved Sequence Alignments

Author(s):  
Stephen M Crotty ◽  
Bui Quang Minh ◽  
Nigel G Bean ◽  
Barbara R Holland ◽  
Jonathan Tuke ◽  
...  

Abstract Molecular sequence data that have evolved under the influence of heterotachous evolutionary processes are known to mislead phylogenetic inference. We introduce the General Heterogeneous evolution On a Single Topology (GHOST) model of sequence evolution, implemented under a maximum-likelihood framework in the phylogenetic program IQ-TREE (http://www.iqtree.org). Simulations show that using the GHOST model, IQ-TREE can accurately recover the tree topology, branch lengths, and substitution model parameters from heterotachously evolved sequences. We investigate the performance of the GHOST model on empirical data by sampling phylogenomic alignments of varying lengths from a plastome alignment. We then carry out inference under the GHOST model on a phylogenomic data set composed of 248 genes from 16 taxa, where we find the GHOST model concurs with the currently accepted view, placing turtles as a sister lineage of archosaurs, in contrast to results obtained using traditional variable rates-across-sites models. Finally, we apply the model to a data set composed of a sodium channel gene of 11 fish taxa, finding that the GHOST model is able to elucidate a subtle component of the historical signal, linked to the previously established convergent evolution of the electric organ in two geographically distinct lineages of electric fish. We compare inference under the GHOST model to partitioning by codon position and show that, owing to the minimization of model constraints, the GHOST model offers unique biological insights when applied to empirical data.
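
As an illustration of the idea of heterogeneous evolution on a single topology, the toy sketch below computes a site likelihood as a weighted sum over classes, each class having its own branch length. It is not the GHOST implementation in IQ-TREE. It assumes only two taxa (so the tree is a single branch), the Jukes-Cantor (JC69) model in every class, and hypothetical function names; the abstract indicates that GHOST also infers substitution model parameters, which this sketch keeps fixed.

```python
import math

def jc69_prob(same: bool, d: float) -> float:
    """JC69 probability of ending in the same (or one specific different)
    nucleotide after a branch of length d (expected substitutions per site)."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def pair_site_likelihood(x: str, y: str, d: float) -> float:
    """Likelihood of one aligned site (x in taxon 1, y in taxon 2) under JC69.
    0.25 is the stationary frequency of the starting nucleotide."""
    return 0.25 * jc69_prob(x == y, d)

def mixture_site_likelihood(x, y, weights, distances):
    """Assumed mixture form: weighted sum over classes, where class k has
    its own branch length distances[k] and weight weights[k] (weights sum to 1)."""
    return sum(w * pair_site_likelihood(x, y, d)
               for w, d in zip(weights, distances))

def mixture_log_likelihood(seq_a, seq_b, weights, distances):
    """Log-likelihood of a two-sequence alignment, sites treated as independent."""
    return sum(math.log(mixture_site_likelihood(x, y, weights, distances))
               for x, y in zip(seq_a, seq_b))
```

For example, `mixture_log_likelihood("ACGT", "ACGA", [0.5, 0.5], [0.1, 1.0])` scores a four-site alignment under two equally weighted classes, one slow and one fast. No site is assigned to a class in advance, which is the contrast with partitioning by codon position drawn in the abstract.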

2017 ◽  
Author(s):  
Stephen M Crotty ◽  
Bui Quang Minh ◽  
Nigel G Bean ◽  
Barbara R Holland ◽  
Jonathan Tuke ◽  
...  

Abstract Molecular sequence data that have evolved under the influence of heterotachous evolutionary processes are known to mislead phylogenetic inference. We introduce the General Heterogeneous evolution On a Single Topology (GHOST) model of sequence evolution, implemented under a maximum-likelihood framework in the phylogenetic program IQ-TREE (http://www.iqtree.org). Simulations show that using the GHOST model, IQ-TREE can accurately recover the tree topology, branch lengths and substitution model parameters from heterotachously-evolved sequences. We develop a model selection algorithm based on simulation results, and investigate the performance of the GHOST model on empirical data by sampling phylogenomic alignments of varying lengths from a plastome alignment. We then carry out inference under the GHOST model on a phylogenomic dataset composed of 248 genes from 16 taxa, where we find the GHOST model concurs with the currently accepted view, placing turtles as a sister lineage of archosaurs, in contrast to results obtained using traditional variable rates-across-sites models. Finally, we apply the model to a dataset composed of a sodium channel gene of 11 fish taxa, finding that the GHOST model is able to infer a subtle component of the historical signal, linked to the previously established convergent evolution of the electric organ in two geographically distinct lineages of electric fish. We compare inference under the GHOST model to partitioning by codon position and show that, owing to the minimization of model constraints, the GHOST model is able to offer unique biological insights when applied to empirical data.


2020 ◽  
Vol 20 (4) ◽  
pp. 410-436
Author(s):  
Sarah E Heaps ◽  
Tom MW Nye ◽  
Richard J Boys ◽  
Tom A Williams ◽  
Svetlana Cherlin ◽  
...  

Phylogenetics uses alignments of molecular sequence data to learn about evolutionary trees relating species. Along branches, sequence evolution is modelled using a continuous-time Markov process characterized by an instantaneous rate matrix. Early models assumed the same rate matrix governed substitutions at all sites of the alignment, ignoring variation in evolutionary pressures. Substantial improvements in phylogenetic inference and model fit were achieved by augmenting these models with multiplicative random effects that describe the result of variation in selective constraints and allow sites to evolve at different rates which linearly scale a baseline rate matrix. Motivated by this pioneering work, we consider an extension using a quadratic, rather than linear, transformation. The resulting models allow for variation in the selective coefficients of different types of point mutation at a site in addition to variation in selective constraints. We derive properties of the extended models. For certain non-stationary processes, the extension gives a model that allows variation in sequence composition, both across sites and taxa. We adopt a Bayesian approach, describe an MCMC algorithm for posterior inference and provide software. Our quadratic models are applied to alignments spanning the tree of life and compared with site-homogeneous and linear models.
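
The linear rates-across-sites construction described above can be written out as follows. This is a sketch in assumed notation (the symbols $Q$, $r_i$ and $P_i(t)$ are not taken from the paper): a baseline rate matrix $Q$ is scaled by a positive site-specific random effect $r_i$, and the transition probabilities along a branch of length $t$ follow from the matrix exponential. The quadratic transformation proposed by the authors is not specified in the abstract and is therefore not shown.

```latex
Q_i = r_i \, Q, \qquad
P_i(t) = \exp(Q_i t) = \exp(r_i \, Q \, t), \qquad r_i > 0 .
```

Under this form, a site with $r_i = 2$ behaves exactly like a site with $r_i = 1$ on a branch twice as long, which is why a purely linear scaling changes the speed of evolution at a site but not the relative rates of different substitution types.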


2020 ◽  
Author(s):  
Thomas KF Wong ◽  
Subha Kalyaanamoorthy ◽  
Karen Meusemann ◽  
David K Yeates ◽  
Bernhard Misof ◽  
...  

Abstract Multiple sequence alignments (MSAs) play a pivotal role in studies of molecular sequence data, but nobody has developed a minimum reporting standard (MRS) to quantify the completeness of MSAs in terms of completely-specified nucleotides or amino acids. We present an MRS that relies on four simple completeness metrics. The metrics are implemented in AliStat, a program developed to support the MRS. A survey of published MSAs illustrates the benefits and unprecedented transparency offered by the MRS.


2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Thomas K F Wong ◽  
Subha Kalyaanamoorthy ◽  
Karen Meusemann ◽  
David K Yeates ◽  
Bernhard Misof ◽  
...  

Abstract Multiple sequence alignments (MSAs) play a pivotal role in studies of molecular sequence data, but nobody has developed a minimum reporting standard (MRS) to quantify the completeness of MSAs in terms of completely specified nucleotides or amino acids. We present an MRS that relies on four simple completeness metrics. The metrics are implemented in AliStat, a program developed to support the MRS. A survey of published MSAs illustrates the benefits and unprecedented transparency offered by the MRS.
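
The abstract does not define the four completeness metrics, so the sketch below is only a guess at what such metrics could look like, not AliStat's code. It assumes a nucleotide alignment given as equal-length strings, treats only A, C, G and T as completely specified, and computes proportions over the whole alignment, per sequence, per site, and per pair of sequences. All function names are hypothetical.

```python
from itertools import combinations

SPECIFIED = set("ACGT")  # assumption: gaps and ambiguity codes count as incomplete

def is_specified(ch: str) -> bool:
    return ch.upper() in SPECIFIED

def alignment_completeness(msa):
    """Proportion of completely specified characters in the whole alignment."""
    total = sum(len(seq) for seq in msa)
    return sum(is_specified(c) for seq in msa for c in seq) / total

def sequence_completeness(msa):
    """One proportion per sequence (row)."""
    return [sum(map(is_specified, seq)) / len(seq) for seq in msa]

def site_completeness(msa):
    """One proportion per site (column)."""
    return [sum(map(is_specified, col)) / len(msa) for col in zip(*msa)]

def pairwise_completeness(msa):
    """For each pair of sequences, the proportion of sites where both are specified."""
    n_sites = len(msa[0])
    return {
        (i, j): sum(is_specified(a) and is_specified(b)
                    for a, b in zip(msa[i], msa[j])) / n_sites
        for i, j in combinations(range(len(msa)), 2)
    }
```

For the three-sequence alignment `["ACGT", "A-GN", "ACG-"]`, `alignment_completeness` gives 0.75 and `sequence_completeness` gives `[1.0, 0.5, 0.75]`.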


2019 ◽  
Vol 37 (4) ◽  
pp. 1202-1210 ◽  
Author(s):  
David A Duchêne ◽  
K Jun Tong ◽  
Charles S P Foster ◽  
Sebastián Duchêne ◽  
Robert Lanfear ◽  
...  

Abstract Evolution leaves heterogeneous patterns of nucleotide variation across the genome, with different loci subject to varying degrees of mutation, selection, and drift. In phylogenetics, the potential impacts of partitioning sequence data for the assignment of substitution models are well appreciated. In contrast, the treatment of branch lengths has received far less attention. In this study, we examined the effects of linking and unlinking branch-length parameters across loci or subsets of loci. By analyzing a range of empirical data sets, we find consistent support for a model in which branch lengths are proportionate between subsets of loci: gene trees share the same pattern of branch lengths, but form subsets that vary in their overall tree lengths. These models had substantially better statistical support than models that assume identical branch lengths across gene trees, or those in which genes form subsets with distinct branch-length patterns. We show using simulations and empirical data that the complexity of the branch-length model with the highest support depends on the length of the sequence alignment and on the numbers of taxa and loci in the data set. Our findings suggest that models in which branch lengths are proportionate between subsets have the highest statistical support under the conditions that are most commonly seen in practice. The results of our study have implications for model selection, computational efficiency, and experimental design in phylogenomics.
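
To make the difference in model complexity concrete, the sketch below counts branch-length parameters under the three treatments contrasted in the abstract. It is a sketch under stated assumptions rather than the authors' procedure: the tree is unrooted and fully bifurcating (so it has 2n - 3 branches for n taxa), and in the proportional model one subset's scaling factor is fixed to 1 for identifiability. Function names are hypothetical.

```python
def n_branches(n_taxa: int) -> int:
    """Branches in an unrooted, fully bifurcating tree (assumes n_taxa >= 3)."""
    return 2 * n_taxa - 3

def branch_length_parameters(n_taxa: int, n_subsets: int) -> dict:
    """Free branch-length parameters under each linking scheme."""
    b = n_branches(n_taxa)
    return {
        "linked": b,                        # one set of branch lengths for all subsets
        "proportional": b + n_subsets - 1,  # shared lengths plus a scale per extra subset
        "unlinked": b * n_subsets,          # a separate set of lengths per subset
    }

def scaled_branch_lengths(shared, scale):
    """Proportional model: a subset's branch lengths are the shared ones times its scale."""
    return [scale * x for x in shared]
```

With 16 taxa and 3 subsets, the counts are 29 (linked), 31 (proportional) and 87 (unlinked), which shows how little the proportional model adds over fully linked branch lengths.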


2018 ◽  
Author(s):  
Huw A. Ogilvie ◽  
Timothy G. Vaughan ◽  
Nicholas J. Matzke ◽  
Graham J. Slater ◽  
Tanja Stadler ◽  
...  

Abstract Bayesian methods can be used to accurately estimate species tree topologies, times and other parameters, but only when the models of evolution which are available and utilized sufficiently account for the underlying evolutionary processes. Multispecies coalescent (MSC) models have been shown to accurately account for the evolution of genes within species in the absence of strong gene flow between lineages, and fossilized birth-death (FBD) models have been shown to estimate divergence times from fossil data in good agreement with expert opinion. Until now, dating analyses using the MSC have been based on a fixed clock or informally derived node priors instead of the FBD. On the other hand, dating analyses using an FBD process have concatenated all gene sequences and ignored coalescence processes. To address these mirror-image deficiencies in evolutionary models, we have developed an integrative model of evolution which combines both the FBD and MSC models. By applying concatenation and the MSC (without employing the FBD process) to an exemplar data set consisting of molecular sequence data and morphological characters from the dog and fox subfamily Caninae, we show that concatenation causes predictable biases in estimated branch lengths. We then applied concatenation using the FBD process and the combined FBD-MSC model to show that the same biases are still observed when the FBD process is employed. These biases can be avoided by using the FBD-MSC model, which coherently models fossilization and gene evolution, and does not require an a priori substitution rate estimate to calibrate the molecular clock. We have implemented the FBD-MSC in a new version of StarBEAST2, a package developed for the BEAST2 phylogenetic software.


2018 ◽  
Author(s):  
Qiqing Tao ◽  
Koichiro Tamura ◽  
Fabia Battistuzzi ◽  
Sudhir Kumar

Abstract New species arise from pre-existing species and inherit similar genomes and environments. This predicts greater similarity of mutation rates and the tempo of molecular evolution between direct ancestors and descendants, resulting in autocorrelation of evolutionary rates within lineages in the tree of life. Surprisingly, molecular sequence data have not confirmed this expectation, possibly because available methods lack power to detect autocorrelated rates. Here we present a machine learning method to detect the presence of evolutionary rate autocorrelation in large phylogenies. The new method is computationally efficient and performs better than the available state-of-the-art methods. Application of the new method reveals extensive rate autocorrelation in DNA and amino acid sequence evolution of mammals, birds, insects, metazoans, plants, fungi, and prokaryotes. Therefore, rate autocorrelation is a common phenomenon throughout the tree of life. These findings suggest concordance between molecular and non-molecular evolutionary patterns and will foster unbiased and precise dating of the tree of life.


Zootaxa ◽  
2010 ◽  
Vol 2665 (1) ◽  
pp. 51 ◽  
Author(s):  
ELENA K. KUPRIYANOVA ◽  
EIJIROH NISHI

A collection of Serpulidae (Annelida, Polychaeta) from the Patton-Murray Seamounts, Gulf of Alaska, USA contained three species: Apomatus voightae n. sp., Bathyvermilia eliasoni n. comb., and Hyalopomatus biformis (Hartman, 1960). Apomatus voightae n. sp. differed from all other Apomatus spp. and from all known serpulid species by very unusual flat and ribbon-like branchial radioles as well as by details of chaetal structure. Vermiliopsis eliasoni Zibrowius (1970), previously known from the Atlantic and Mediterranean, was transferred to the genus Bathyvermilia Zibrowius, 1973. Hyalopomatus biformis is a deep-sea species distributed in the north-eastern Pacific from Alaska to California, USA. All serpulids were described in detail and their chaetal structure elucidated with the help of scanning electron microscopy. Molecular sequence data (18S rDNA) were aligned to a recently published serpulid data set and maximum parsimony analysis was performed to examine the phylogenetic position of the species and confirm their identification. Hyalopomatus biformis formed a sister group with Laminatubus alvini, Apomatus voightae n. sp. formed a sister group with Apomatus globifer, and Bathyvermilia eliasoni formed a weakly supported polytomy with Chitinopoma serrula, Protula tubularia and Apomatus spp. We briefly discussed biogeographic affinities of the serpulids from the Patton-Murray Seamounts in the light of seamount ecology and biogeography.


2019 ◽  
Vol 36 (4) ◽  
pp. 811-824 ◽  
Author(s):  
Qiqing Tao ◽  
Koichiro Tamura ◽  
Fabia U. Battistuzzi ◽  
Sudhir Kumar

Abstract New species arise from pre-existing species and inherit similar genomes and environments. This predicts greater similarity of the tempo of molecular evolution between direct ancestors and descendants, resulting in autocorrelation of evolutionary rates in the tree of life. Surprisingly, molecular sequence data have not confirmed this expectation, possibly because available methods lack the power to detect autocorrelated rates. Here, we present a machine learning method, CorrTest, to detect the presence of rate autocorrelation in large phylogenies. CorrTest is computationally efficient and performs better than the available state-of-the-art method. Application of CorrTest reveals extensive rate autocorrelation in DNA and amino acid sequence evolution of mammals, birds, insects, metazoans, plants, fungi, parasitic protozoans, and prokaryotes. Therefore, rate autocorrelation is a common phenomenon throughout the tree of life. These findings suggest concordance between molecular and nonmolecular evolutionary patterns, and they will foster unbiased and precise dating of the tree of life.


2018 ◽  
Vol 50 (3) ◽  
pp. 299-312 ◽  
Author(s):  
Steven D. LEAVITT ◽  
Paul M. KIRIKA ◽  
Guillermo AMO DE PAZ ◽  
Jen-Pan HUANG ◽  
Jae-Seoun HUR ◽  
...  

Abstract Species richness is not evenly distributed across the tree of life and a limited number of lineages comprise an extraordinarily large number of species. In lichen-forming fungi, only two genera are known to be ‘ultradiverse’ (>500 species), with the most diverse genus, Xanthoparmelia, consisting of c. 820 species. While Australia and South Africa are known as current centres of diversity for Xanthoparmelia, it is not well known when and where this massive diversity arose. To better understand the geographical and temporal context of diversification in this diverse genus, we sampled 191 Xanthoparmelia specimens representing c. 124 species/species-level lineages from populations worldwide. From these specimens, we generated a multi-locus sequence data set using Sanger and high-throughput sequencing to reconstruct evolutionary relationships in Xanthoparmelia, estimate divergence times and reconstruct biogeographical histories in a maximum likelihood and Bayesian framework. This study corroborated the phylogenetic placement of several morphologically or chemically diverse taxa within Xanthoparmelia, such as Almbornia, Chondropsis, Karoowia, Namakwa, Neofuscelia, Omphalodiella, Paraparmelia, Placoparmelia and Xanthomaculina, in addition to improved phylogenetic resolution and reconstruction of previously unsampled lineages within Xanthoparmelia. Our data indicate that Xanthoparmelia most likely originated in Africa during the early Miocene, coinciding with global aridification and development of open habitats. Reconstructed biogeographical histories of Xanthoparmelia reveal diversification restricted to continents with infrequent intercontinental exchange by long-distance dispersal. While likely mechanisms by which Xanthoparmelia obtained strikingly high levels of species richness in Australia and South Africa remain uncertain, this study provides a framework for ongoing research into diverse lineages of lichen-forming fungi. Finally, our study highlights a novel approach for generating locus-specific molecular sequence data sets from high-throughput metagenomic reads.

