Robustness of phylogenetic inference to model misspecification caused by pairwise epistasis

Abstract: Likelihood-based phylogenetic inference posits a probabilistic model of character state change along the branches of a phylogenetic tree. These models typically assume statistical independence of sites in the sequence alignment. This restrictive assumption makes computation tractable, but ignores how epistasis, the effect of genetic background on mutational effects, influences the evolution of functional sequences. We consider the effect of using a misspecified site-independent model on the accuracy of Bayesian phylogenetic inference in the setting of pairwise-site epistasis. Previous work has shown that tree reconstruction accuracy increases with alignment length. Here, we present a simulation study demonstrating that accuracy increases with alignment size even if the additional sites are epistatically coupled. We introduce an alignment-based test statistic that is a diagnostic for pairwise epistasis and can be used in posterior predictive checks.
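The abstract does not define its test statistic, so the following is only a rough sketch of how a posterior predictive check on an alignment-based statistic could be organised. The statistic used here (mean pairwise mutual information between columns) is a stand-in chosen for illustration, not the authors' statistic, and the function names are hypothetical. The replicate alignments are assumed to have been simulated elsewhere from posterior samples under the site-independent model.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(col_a, col_b):
    """Empirical mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log(p_ab / (p_a * p_b)), written in terms of raw counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def mean_pairwise_mi(alignment):
    """Average mutual information over all pairs of columns.

    `alignment` is a list of equal-length strings, one per taxon,
    and is assumed to have at least two columns.
    """
    columns = list(zip(*alignment))
    pairs = list(combinations(range(len(columns)), 2))
    return sum(mutual_information(columns[i], columns[j]) for i, j in pairs) / len(pairs)


def posterior_predictive_p_value(observed, replicates, statistic=mean_pairwise_mi):
    """Fraction of replicate alignments whose statistic is >= the observed one."""
    t_obs = statistic(observed)
    return sum(statistic(rep) >= t_obs for rep in replicates) / len(replicates)
```

A small p-value would suggest that the observed alignment shows more between-site dependence than the site-independent model tends to generate, which is the kind of signal a diagnostic for pairwise epistasis is meant to pick up.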

An Examination of the Monophyly of Morning Glory Taxa Using Bayesian Phylogenetic Inference

Systematic Biology ◽

10.1080/10635150290102401 ◽

2002 ◽

Vol 51 (5) ◽

pp. 740-753 ◽

Cited By ~ 51

Author(s):

Richard E. Miller ◽

Thomas R. Buckley ◽

Paul S. Manos

Keyword(s):

Phylogenetic Inference ◽

Morning Glory ◽

Bayesian Phylogenetic Inference

MrBayes 3: Bayesian phylogenetic inference under mixed models

Bioinformatics ◽

10.1093/bioinformatics/btg180 ◽

2003 ◽

Vol 19 (12) ◽

pp. 1572-1574 ◽

Cited By ~ 18477

Author(s):

F. Ronquist ◽

J. P. Huelsenbeck

Keyword(s):

Mixed Models ◽

Phylogenetic Inference ◽

Bayesian Phylogenetic Inference

Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo Method

Molecular Biology and Evolution ◽

10.1093/oxfordjournals.molbev.a025811 ◽

1997 ◽

Vol 14 (7) ◽

pp. 717-724 ◽

Cited By ~ 733

Author(s):

Z. Yang ◽

B. Rannala

Keyword(s):

Monte Carlo ◽

Markov Chain ◽

Markov Chain Monte Carlo ◽

Monte Carlo Method ◽

Dna Sequences ◽

Phylogenetic Inference ◽

Bayesian Phylogenetic Inference

Sequentially Estimating the Approximate Conditional Mean Using Extreme Learning Machines

Entropy ◽

10.3390/e22111294 ◽

2020 ◽

Vol 22 (11) ◽

pp. 1294

Author(s):

Lijuan Huo ◽

Jin Seo Cho

Keyword(s):

Model Misspecification ◽

Wald Test ◽

Testing Procedure ◽

Model Specification ◽

Test Statistic ◽

Omnibus Test ◽

Gaussian Stochastic Process ◽

Conditional Mean ◽

Polynomial Models ◽

Learning Machine

This study examined the extreme learning machine (ELM) applied to the Wald test statistic for the model specification of the conditional mean, which we call the WELM testing procedure. The omnibus test statistics available in the literature converge weakly to a Gaussian stochastic process under the null that the model is correct, which makes them inconvenient to apply. By contrast, the WELM testing procedure is straightforwardly applicable when detecting model misspecification. We applied the WELM testing procedure to the sequential testing procedure formed by a set of polynomial models and estimated an approximate conditional expectation. We then conducted extensive Monte Carlo experiments to evaluate the performance of the sequential WELM testing procedure and verified that it consistently estimates the most parsimonious conditional mean when the set of polynomial models contains a correctly specified model. Otherwise, it consistently rejects all the models in the set.

Integration of Anatomy Ontologies and Evo-Devo Using Structured Markov Models Suggests a New Framework for Modeling Discrete Phenotypic Traits

Systematic Biology ◽

10.1093/sysbio/syz005 ◽

2019 ◽

Vol 68 (5) ◽

pp. 698-716 ◽

Cited By ~ 19

Author(s):

Sergei Tarasov

Keyword(s):

Regulatory Networks ◽

Markov Models ◽

Phylogenetic Inference ◽

Phenotypic Traits ◽

Character State ◽

Body Parts ◽

Ancestral Character State ◽

State Models ◽

Hidden States ◽

New Framework

Abstract: Modeling discrete phenotypic traits for either ancestral character state reconstruction or morphology-based phylogenetic inference suffers from ambiguities of character coding, homology assessment, dependencies, and selection of adequate models. These drawbacks occur because trait evolution is driven by two key processes, hierarchical and hidden, which are not accommodated simultaneously by the available phylogenetic methods. The hierarchical process refers to the dependencies between anatomical body parts, while the hidden process refers to the evolution of gene regulatory networks (GRNs) underlying trait development. Herein, I demonstrate that these processes can be efficiently modeled using structured Markov models (SMM) equipped with hidden states, which resolves the majority of the problems associated with discrete traits. Integration of SMM with anatomy ontologies can adequately incorporate the hierarchical dependencies, while the use of the hidden states accommodates hidden evolution of GRNs and substitution rate heterogeneity. I assess the new models using simulations and theoretical synthesis. The new approach solves the long-standing “tail color problem,” in which the trait is scored for species with tails of different colors or no tails. It also presents a previously unknown issue called the “two-scientist paradox,” in which the nature of coding the trait and the hidden processes driving the trait’s evolution are confounded; failing to account for the hidden process may result in a bias, which can be avoided by using hidden state models. All this provides a clear guideline for coding traits into characters. This article gives practical examples of using the new framework for phylogenetic inference and comparative analysis.

MrBayes sMC3

The International Journal of High Performance Computing Applications ◽

10.1177/1094342016652461 ◽

2016 ◽

Vol 32 (2) ◽

pp. 246-265 ◽

Cited By ~ 3

Author(s):

Lídia Kuan ◽

Frederico Pratas ◽

Leonel Sousa ◽

Pedro Tomás

Keyword(s):

Dna Sequences ◽

Software Package ◽

State Of The Art ◽

Phylogenetic Inference ◽

Iterative Approach ◽

Computational Power ◽

Data Transfers ◽

Bayesian Phylogenetic Inference ◽

Level Parallelism ◽

Number Of Iterations

MrBayes is a popular software package for Bayesian phylogenetic inference, which uses an iterative approach to derive an evolutionary tree for a collection of species whose DNA sequences are known. Computationally, MrBayes is characterized by a large number of iterations, each composed of a set of tasks that are not very time-consuming in isolation but are computationally demanding in aggregate. To accelerate the latest MrBayes 3.2, this paper presents MrBayes sMC3, which relies on the computational power of a heterogeneous CPU+GPU platform. To this end, MrBayes sMC3 exploits both task-level and data-level parallelism while minimizing the overheads associated with kernel launches and CPU-GPU data transfers. Experimental results indicate that the proposed parallel approach, together with the proposed set of optimizations, allows for an application acceleration of up to 10× relative to the original MrBayes, and up to 3× relative to the Beagle Library. Furthermore, comparing the convergence rate of MrBayes sMC3 with that of the state-of-the-art approaches shows a significant reduction in execution time.

A Reversible Jump Method for Bayesian Phylogenetic Inference with a Nonhomogeneous Substitution Model

Molecular Biology and Evolution ◽

10.1093/molbev/msm046 ◽

2007 ◽

Vol 24 (6) ◽

pp. 1286-1299 ◽

Cited By ~ 37

Author(s):

V. Gowri-Shankar ◽

M. Rattray

Keyword(s):

Phylogenetic Inference ◽

Reversible Jump ◽

Substitution Model ◽

Bayesian Phylogenetic Inference

siMBa—a simple graphical user interface for the Bayesian phylogenetic inference program MrBayes

Mycological Progress ◽

10.1007/s11557-014-1010-2 ◽

2014 ◽

Vol 13 (4) ◽

Cited By ~ 18

Author(s):

Bagdevi Mishra ◽

Marco Thines

Keyword(s):

User Interface ◽

Graphical User Interface ◽

Phylogenetic Inference ◽

Bayesian Phylogenetic Inference

Stepwise Bayesian Phylogenetic Inference

10.1101/2020.11.11.376459 ◽

2020 ◽

Author(s):

Sebastian Höhna ◽

Allison Y. Hsiang

Keyword(s):

Single Point ◽

Gene Tree ◽

Computational Cost ◽

Phylogenetic Inference ◽

Point Estimate ◽

Sufficient Information ◽

Analysis Pipeline ◽

Stepwise Approach ◽

Joint Approach ◽

Bayesian Phylogenetic Inference

Abstract: The ideal approach to Bayesian phylogenetic inference is to estimate all parameters of interest jointly in a single hierarchical model. However, this is often not feasible in practice due to the high computational cost that would be incurred. Instead, phylogenetic pipelines generally consist of chained analyses, whereby a single point estimate from a given analysis is used as input for the next analysis in the chain (e.g., a single multiple sequence alignment is used to estimate a gene tree). In this framework, uncertainty is not propagated from step to step in the chain, which can lead to inaccurate or spuriously certain results. Here, we formally develop and test the stepwise approach to Bayesian inference, which uses importance sampling to generate observations for the next step of an analysis pipeline from the posterior produced in the previous step. We show that this approach is identical to the joint approach given sufficient information in the data and in the importance sample. This is demonstrated using both a toy example and an analysis pipeline for inferring divergence times using a relaxed clock model. The stepwise approach presented here not only accounts for uncertainty between analysis steps, but also allows for greater flexibility in program choice (and hence model availability) and can be more computationally efficient than the traditional joint approach when multiple models are being tested.
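To make the importance-sampling idea concrete, here is a minimal sketch on a one-parameter toy problem. It is not the authors' implementation or their toy example. It assumes a step-1 posterior that is Normal(1, 1), represented by samples, and a hypothetical step-2 observation y = 2 with unit Gaussian noise. Reweighting the step-1 samples by the step-2 likelihood should approximately recover the joint posterior mean, which for this conjugate setup is 1.5.

```python
import math
import random


def stepwise_posterior_mean(step1_samples, log_likelihood_step2):
    """Reweight step-1 posterior samples by the step-2 likelihood.

    Returns the self-normalised importance-sampling estimate of the
    posterior mean and the effective sample size of the weights.
    """
    log_w = [log_likelihood_step2(x) for x in step1_samples]
    max_log_w = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - max_log_w) for lw in log_w]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, step1_samples)) / total
    ess = total ** 2 / sum(w * w for w in weights)
    return mean, ess


# Step 1: samples standing in for the posterior from the first analysis.
rng = random.Random(0)
step1 = [rng.gauss(1.0, 1.0) for _ in range(20000)]

# Step 2: a single assumed observation y with unit-variance Gaussian noise.
y = 2.0
mean, ess = stepwise_posterior_mean(step1, lambda x: -0.5 * (y - x) ** 2)
```

The effective sample size gives a rough indication of whether the importance sample carries enough information: when it is small relative to the number of step-1 samples, the stepwise estimate can be expected to drift away from the joint one.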

Two C++ Libraries for Counting Trees on a Phylogenetic Terrace

10.1101/211276 ◽

2017 ◽

Cited By ~ 2

Author(s):

R. Biczok ◽

P. Bozsoky ◽

P. Eisenmann ◽

J. Ernst ◽

T. Ribizel ◽

...

Keyword(s):

Maximum Likelihood ◽

Phylogenetic Tree ◽

Phylogenetic Inference ◽

Source Codes ◽

Likelihood Score ◽

Order Of Magnitude ◽

Tree Topologies ◽

Tree Space ◽

Bayesian Phylogenetic Inference ◽

Counting Trees

Abstract: Motivation: The presence of terraces in phylogenetic tree space, that is, a potentially large number of distinct tree topologies that have exactly the same analytical likelihood score, was first described by Sanderson et al. (2011). However, popular software tools for maximum likelihood and Bayesian phylogenetic inference do not yet routinely report whether inferred phylogenies reside on a terrace. We believe this is due to the unavailability of an efficient library implementation to (i) determine if a tree resides on a terrace, (ii) calculate how many trees reside on a terrace, and (iii) enumerate all trees on a terrace. Results: In our bioinformatics programming practical we developed two efficient and independent C++ implementations of the SUPERB algorithm by Constantinescu and Sankoff (1995) for counting and enumerating the trees on a terrace. Both implementations yield exactly the same results, are more than one order of magnitude faster, and require one order of magnitude less memory than a previous third-party Python implementation. Availability: The source codes are available under GNU GPL at https://github.com/[email protected]
