Convolutional Neural Networks as Summary Statistics for Approximate Bayesian Computation

Author(s):  
Mattias Akesson ◽  
Prashant Singh ◽  
Fredrik Wrede ◽  
Andreas Hellander
2020 ◽  
Author(s):  
Manolo F. Perez ◽  
Isabel A. S. Bonatelli ◽  
Monique Romeiro-Brito ◽  
Fernando F. Franco ◽  
Nigel P. Taylor ◽  
...  

Abstract Delimiting species boundaries is a major goal in evolutionary biology. An increasing body of literature has focused on the challenges of investigating cryptic diversity within complex evolutionary scenarios of speciation, including gene flow and demographic fluctuations. New methods based on model selection, such as approximate Bayesian computation, approximate likelihood, and machine learning approaches, are promising tools arising in this field. Here, we introduce a framework for species delimitation using the multispecies coalescent model coupled with a deep learning algorithm based on convolutional neural networks (CNNs). We compared this strategy with a similar ABC approach. We applied both methods to test species boundary hypotheses based on current and previous taxonomic delimitations as well as genetic data (sequences from 41 loci) in Pilosocereus aurisetus, a cactus species with a sky-island distribution and taxonomic uncertainty. To validate our proposed method, we also applied the same strategy on sequence data from widely accepted species from the genus Drosophila. The results show that our CNN approach has high capacity to distinguish among the simulated species delimitation scenarios, with higher accuracy than the ABC procedure. For Pilosocereus, the delimitation hypothesis based on a splitter taxonomic arrangement without migration showed the highest probability in both CNN and ABC approaches. The splits observed within P. aurisetus agree with previous taxonomic conjectures considering more taxonomic entities within currently accepted species. Our results highlight the cryptic diversity within P. aurisetus and show that CNNs are a promising approach for distinguishing divergent and complex evolutionary histories, even outperforming the accuracy of other model-based approaches such as ABC.
Keywords: Species delimitation, fragmented systems, recent diversification, deep learning, Convolutional Neural Networks, Approximate Bayesian Computation


Author(s):  
Hsuan Jung ◽  
Paul Marjoram

In this paper, we develop a Genetic Algorithm that can address the fundamental problem of how one should weight the summary statistics included in an approximate Bayesian computation analysis built around an accept/reject algorithm, and how one might choose the tolerance for that analysis. We then demonstrate that using weighted statistics, and a well-chosen tolerance, in such an approximate Bayesian computation approach can result in improved performance, when compared to unweighted analyses, using one example drawn purely from statistics and two drawn from the estimation of population genetics parameters.
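The role of statistic weights and the tolerance in an accept/reject analysis can be illustrated with a minimal sketch. It is not the authors' Genetic Algorithm: the weights and tolerance are fixed by hand, and the toy Normal model, the uniform prior, and all function names (`simulate`, `summaries`, `weighted_distance`, `abc_reject`) are assumptions made for illustration.

```python
import math
import random

def simulate(mu, n, rng):
    """Toy model (assumed): n draws from Normal(mu, 1)."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]

def summaries(data):
    """Two summary statistics: sample mean and sample variance."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    return (mean, var)

def weighted_distance(s_sim, s_obs, weights):
    """Weighted Euclidean distance between two summary vectors."""
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, s_sim, s_obs)))

def abc_reject(s_obs, weights, tolerance, n_draws, n_obs, rng):
    """Keep prior draws whose simulated summaries fall within the tolerance."""
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)  # assumed prior on mu
        s_sim = summaries(simulate(mu, n_obs, rng))
        if weighted_distance(s_sim, s_obs, weights) <= tolerance:
            accepted.append(mu)
    return accepted

rng = random.Random(0)
s_obs = summaries(simulate(2.0, 50, rng))
# The variance carries little information about mu, so it gets a small weight.
posterior = abc_reject(s_obs, weights=(1.0, 0.1), tolerance=0.3,
                       n_draws=5000, n_obs=50, rng=rng)
```

Changing `weights` or `tolerance` changes which draws are accepted, which is the choice the paper proposes to automate.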


2016 ◽  
Vol 43 (12) ◽  
pp. 2191-2202 ◽  
Author(s):  
Muhammad Faisal ◽  
Andreas Futschik ◽  
Ijaz Hussain ◽  
Mitwali Abd-el.Moemen

Author(s):  
Théophile Sanchez ◽  
Jean Cury ◽  
Guillaume Charpiat ◽  
Flora Jay

Abstract For the past decades, simulation-based likelihood-free inference methods have enabled researchers to address numerous population genetics problems. As the richness and amount of simulated and real genetic data keep increasing, the field has a strong opportunity to tackle tasks that current methods hardly solve. However, high data dimensionality forces most methods to summarize large genomic datasets into a relatively small number of handcrafted features (summary statistics). Here we propose an alternative to summary statistics, based on the automatic extraction of relevant information using deep learning techniques. Specifically, we design artificial neural networks (ANNs) that take as input single nucleotide polymorphic sites (SNPs) found in individuals sampled from a single population and infer the past effective population size history. First, we provide guidelines to construct artificial neural networks that comply with the intrinsic properties of SNP data such as invariance to permutation of haplotypes, long scale interactions between SNPs and variable genomic length. Thanks to a Bayesian hyperparameter optimization procedure, we evaluate the performance of multiple networks and compare them to well established methods like Approximate Bayesian Computation (ABC). Even without the expert knowledge of summary statistics, our approach compares fairly well to an ABC based on handcrafted features. Furthermore we show that combining deep learning and ABC can improve performance while taking advantage of both frameworks. Finally, we apply our approach to reconstruct the effective population size history of cattle breed populations.
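The invariance to permutation of haplotypes mentioned above can be illustrated with one common construction: apply the same feature map to every haplotype, then pool across haplotypes with a symmetric operation such as the mean. The sketch below is an assumption for illustration only, not the architecture from the paper; the random weights stand in for learned parameters, and the names `haplotype_features` and `invariant_summary` are hypothetical.

```python
import math
import random

def haplotype_features(haplotype, weights):
    """Shared feature map applied to one haplotype (a list of 0/1 alleles)."""
    return [math.tanh(sum(w * a for w, a in zip(row, haplotype)))
            for row in weights]

def invariant_summary(haplotypes, weights):
    """Mean-pool per-haplotype features; the result ignores row order."""
    per_hap = [haplotype_features(h, weights) for h in haplotypes]
    n = len(per_hap)
    # math.fsum gives a correctly rounded sum regardless of summation order.
    return [math.fsum(f[k] for f in per_hap) / n for k in range(len(weights))]

rng = random.Random(1)
n_snps, n_features = 6, 3
# Random weights stand in for parameters a network would learn.
weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_snps)]
           for _ in range(n_features)]
haplotypes = [[rng.randint(0, 1) for _ in range(n_snps)] for _ in range(8)]

shuffled = haplotypes[:]
rng.shuffle(shuffled)

original_summary = invariant_summary(haplotypes, weights)
shuffled_summary = invariant_summary(shuffled, weights)
```

Because the pooling step is symmetric, `original_summary` and `shuffled_summary` agree, so the ordering of sampled individuals cannot influence the inferred quantity.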


Biometrika ◽  
2020 ◽  
Author(s):  
Grégoire Clarté ◽  
Christian P Robert ◽  
Robin J Ryder ◽  
Julien Stoehr

Abstract Approximate Bayesian computation methods are useful for generative models with intractable likelihoods. These methods are, however, sensitive to the dimension of the parameter space, requiring exponentially increasing resources as this dimension grows. To tackle this difficulty, we explore a Gibbs version of the approximate Bayesian computation approach that runs component-wise approximate Bayesian computation steps aimed at the corresponding conditional posterior distributions, and based on summary statistics of reduced dimensions. While lacking the standard justifications for the Gibbs sampler, the resulting Markov chain is shown to converge in distribution under some partial independence conditions. The associated stationary distribution can further be shown to be close to the true posterior distribution, and some hierarchical versions of the proposed mechanism enjoy a closed-form limiting distribution. Experiments also demonstrate the gain in efficiency brought by the Gibbs version over the standard solution.
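The component-wise idea can be sketched on a toy hierarchical model. Everything specific here is an assumption for illustration: the model (a hyperparameter `alpha` with a uniform prior, group means drawn from Normal(alpha, 1), observations drawn from Normal(mu_j, 1)), the one-dimensional summaries, and the choice to keep the closest of `n_try` proposals in place of a fixed tolerance. It is not presented as the authors' implementation.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def abc_step(propose, simulate_summary, target, n_try, rng):
    """Return the proposal whose simulated summary lands closest to target."""
    best, best_dist = None, float("inf")
    for _ in range(n_try):
        candidate = propose(rng)
        dist = abs(simulate_summary(candidate, rng) - target)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best

def abc_gibbs(obs_means, n_obs, n_iter, n_try, rng):
    """Alternate ABC updates of alpha given mus, then each mu_j given alpha."""
    k = len(obs_means)
    alpha, mus, chain = 0.0, [0.0] * k, []
    for _ in range(n_iter):
        # Update alpha | mus: the summary is the mean of the k group means.
        alpha = abc_step(
            propose=lambda r: r.uniform(-4.0, 4.0),
            simulate_summary=lambda a, r: mean([r.gauss(a, 1.0) for _ in range(k)]),
            target=mean(mus), n_try=n_try, rng=rng)
        # Update each mu_j | alpha, x_j: the summary is the mean of group j only.
        for j in range(k):
            mus[j] = abc_step(
                propose=lambda r, a=alpha: r.gauss(a, 1.0),
                simulate_summary=lambda m, r: mean([r.gauss(m, 1.0) for _ in range(n_obs)]),
                target=obs_means[j], n_try=n_try, rng=rng)
        chain.append((alpha, list(mus)))
    return chain

rng = random.Random(2)
obs_means = [1.2, 0.8, 1.5]  # made-up observed group means
chain = abc_gibbs(obs_means, n_obs=20, n_iter=50, n_try=30, rng=rng)
```

Each update compares a single scalar summary, which is the reduced-dimension property the abstract describes; a joint ABC step would have to match all group summaries at once.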

