scholarly journals Machine learning methods are useful for Approximate Bayesian Computation in evolution and ecology

Author(s):  
Michael Blum
2020 ◽  
Author(s):  
Marcelo Gehara ◽  
Guilherme G. Mazzochinni ◽  
Frank Burbrink

AbstractUnderstanding population divergence involves testing diversification scenarios and estimating historical parameters, such as divergence time, population size and migration rate. There is, however, an immense space of possible highly parameterized scenarios that are difsficult or impossible to solve analytically. To overcome this problem researchers have used alternative simulation-based approaches, such as approximate Bayesian computation (ABC) and supervised machine learning (SML), to approximate posterior probabilities of hypotheses. In this study we demonstrate the utility of our newly developed R-package to simulate summary statistics to perform ABC and SML inferences. We compare the power of both ABC and SML methods and the influence of the number of loci in the accuracy of inferences; and we show three empirical examples: (i) the Muller’s termite frog genomic data from Southamerica; (ii) the cottonmouth and (iii) and the copperhead snakes sanger data from Northamerica. We found that SML is more efficient than ABC. It is generally more accurate and needs fewer simulations to perform an inference. We found support for a divergence model without migration, with a recent bottleneck for one of the populations of the southamerican frog. For the cottonmouth we found support for divergence with migration and recent expansion and for the copperhead we found support for a model of divergence with migration and recent bottleneck. Interestingly, by using an SML method it was possible to achieve high accuracy in model selection even when several models were compared in a single inference. We also found a higher accuracy when inferring parameters with SML.


2016 ◽  
Author(s):  
Emma Saulnier ◽  
Olivier Gascuel ◽  
Samuel Alizon

AbstractPhylodynamics typically rely on likelihood-based methods to infer epidemiological parameters from dated phylogenies. These methods are essentially based on simple epidemiological models because of the difficulty in expressing the likelihood function analytically. Computing this function numerically raises additional challenges, especially for large phylogenies. Here, we use Approximate Bayesian Computation (ABC) to circumvent these problems. ABC is a likelihood-free method of parameter inference, based on simulation and comparison between target data and simulated data, using summary statistics. We simulated target trees under several epidemiological scenarios in order to assess the accuracy of ABC methods for inferring epidemiological parameter such as the basic reproduction number (R0), the mean duration of infection, and the effective host population size. We designed many summary statistics to capture the information in a phylogeny and its corresponding lineage-through-time plot. We then used the simplest ABC method, called rejection, and its modern derivative complemented with adjustment of the posterior distribution by regression. The availability of machine learning techniques including variable selection, motivated us to compute many summary statistics on the phylogeny. We found that ABC-based inference reaches an accuracy comparable to that of likelihood-based methods for birth-death models and can even outperform existing methods for more refined models and large trees. By re-analysing data from the early stages of the recent Ebola epidemic in Sierra Leone, we also found that ABC provides more realistic estimates than the likelihood-based methods, for some parameters. This work shows that the combination of ABC-based inference using many summary statistics and sophisticated machine learning methods able to perform variable selection is a promising approach to analyse large phylogenies and non-trivial models.


Sign in / Sign up

Export Citation Format

Share Document