scholarly journals Universal probabilistic programming offers a powerful approach to statistical phylogenetics

Author(s):  
Fredrik Ronquist ◽  
Jan Kudlicka ◽  
Viktor Senderov ◽  
Johannes Borgström ◽  
Nicolas Lartillot ◽  
...  

Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here we show that universal probabilistic programming languages (PPLs) solve the expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient estimation of Bayes factors that are used in model testing. We take advantage of this in automatically generating SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.

2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Fredrik Ronquist ◽  
Jan Kudlicka ◽  
Viktor Senderov ◽  
Johannes Borgström ◽  
Nicolas Lartillot ◽  
...  

AbstractStatistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here, we show that universal probabilistic programming languages (PPLs) solve the expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient estimation of Bayes factors that are used in model testing. We take advantage of this in automatically generating SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.


Author(s):  
Daniel Lundén ◽  
Johannes Borgström ◽  
David Broman

AbstractProbabilistic programming is an approach to reasoning under uncertainty by encoding inference problems as programs. In order to solve these inference problems, probabilistic programming languages (PPLs) employ different inference algorithms, such as sequential Monte Carlo (SMC), Markov chain Monte Carlo (MCMC), or variational methods. Existing research on such algorithms mainly concerns their implementation and efficiency, rather than the correctness of the algorithms themselves when applied in the context of expressive PPLs. To remedy this, we give a correctness proof for SMC methods in the context of an expressive PPL calculus, representative of popular PPLs such as WebPPL, Anglican, and Birch. Previous work have studied correctness of MCMC using an operational semantics, and correctness of SMC and MCMC in a denotational setting without term recursion. However, for SMC inference—one of the most commonly used algorithms in PPLs as of today—no formal correctness proof exists in an operational setting. In particular, an open question is if the resample locations in a probabilistic program affects the correctness of SMC. We solve this fundamental problem, and make four novel contributions: (i) we extend an untyped PPL lambda calculus and operational semantics to include explicit resample terms, expressing synchronization points in SMC inference; (ii) we prove, for the first time, that subject to mild restrictions, any placement of the explicit resample terms is valid for a generic form of SMC inference; (iii) as a result of (ii), our calculus benefits from classic results from the SMC literature: a law of large numbers and an unbiased estimate of the model evidence; and (iv) we formalize the bootstrap particle filter for the calculus and discuss how our results can be further extended to other SMC algorithms.


Author(s):  
Christopher M. Bishop

Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET , which has been widely used in practical applications.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Fredrik Ronquist ◽  
Jan Kudlicka ◽  
Viktor Senderov ◽  
Johannes Borgström ◽  
Nicolas Lartillot ◽  
...  

A Correction to this paper has been published: https://doi.org/10.1038/s42003-021-01922-8


2010 ◽  
Vol 19 (01) ◽  
pp. 65-99 ◽  
Author(s):  
MARC POULY

Computing inference from a given knowledgebase is one of the key competences of computer science. Therefore, numerous formalisms and specialized inference routines have been introduced and implemented for this task. Typical examples are Bayesian networks, constraint systems or different kinds of logic. It is known today that these formalisms can be unified under a common algebraic roof called valuation algebra. Based on this system, generic inference algorithms for the processing of arbitrary valuation algebras can be defined. Researchers benefit from this high level of abstraction to address open problems independently of the underlying formalism. It is therefore all the more astonishing that this theory did not find its way into concrete software projects. Indeed, all modern programming languages for example provide generic sorting procedures, but generic inference algorithms are still mythical creatures. NENOK breaks a new ground and offers an extensive library of generic inference tools based on the valuation algebra framework. All methods are implemented as distributed algorithms that process local and remote knowledgebases in a transparent manner. Besides its main purpose as software library, NENOK also provides a sophisticated graphical user interface to inspect the inference process and the involved graphical structures. This can be used for educational purposes but also as a fast prototyping architecture for inference formalisms.


2021 ◽  
pp. 293-322
Author(s):  
Osvaldo A. Martin ◽  
Ravin Kumar ◽  
Junpeng Lao

2021 ◽  
pp. 67-106
Author(s):  
Osvaldo A. Martin ◽  
Ravin Kumar ◽  
Junpeng Lao

Sign in / Sign up

Export Citation Format

Share Document