Correctness of Sequential Monte Carlo Inference for Probabilistic Programming Languages

2021 ◽

pp. 404-431

Author(s):

Daniel Lundén ◽

Johannes Borgström ◽

David Broman

Keyword(s):

Monte Carlo ◽

Programming Languages ◽

Sequential Monte Carlo ◽

Fundamental Problem ◽

Operational Semantics ◽

Correctness Proof ◽

Probabilistic Programming ◽

Inference Problems ◽

Large Numbers ◽

Open Question

Probabilistic programming is an approach to reasoning under uncertainty by encoding inference problems as programs. In order to solve these inference problems, probabilistic programming languages (PPLs) employ different inference algorithms, such as sequential Monte Carlo (SMC), Markov chain Monte Carlo (MCMC), or variational methods. Existing research on such algorithms mainly concerns their implementation and efficiency, rather than the correctness of the algorithms themselves when applied in the context of expressive PPLs. To remedy this, we give a correctness proof for SMC methods in the context of an expressive PPL calculus, representative of popular PPLs such as WebPPL, Anglican, and Birch. Previous work has studied correctness of MCMC using an operational semantics, and correctness of SMC and MCMC in a denotational setting without term recursion. However, for SMC inference, one of the most commonly used algorithms in PPLs as of today, no formal correctness proof exists in an operational setting. In particular, an open question is whether the resample locations in a probabilistic program affect the correctness of SMC. We solve this fundamental problem, and make four novel contributions: (i) we extend an untyped PPL lambda calculus and operational semantics to include explicit resample terms, expressing synchronization points in SMC inference; (ii) we prove, for the first time, that subject to mild restrictions, any placement of the explicit resample terms is valid for a generic form of SMC inference; (iii) as a result of (ii), our calculus benefits from classic results from the SMC literature: a law of large numbers and an unbiased estimate of the model evidence; and (iv) we formalize the bootstrap particle filter for the calculus and discuss how our results can be further extended to other SMC algorithms.
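To make the terms in this abstract concrete, the following is a minimal sketch of a bootstrap particle filter with a resampling step after every observation. It is not the calculus or the formalization from the paper: the model (a 1-D Gaussian random walk starting at 0, observed with Gaussian noise), the function names, and the default parameters are all assumptions chosen for illustration. The running product of average weights is the usual SMC estimate of the model evidence; the code accumulates its logarithm.

```python
import math
import random


def normal_logpdf(x, mean, std):
    """Log density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def bootstrap_particle_filter(observations, n_particles, rng,
                              process_std=1.0, obs_std=1.0):
    """Hypothetical model: x_0 = 0, x_t = x_{t-1} + N(0, process_std^2),
    y_t ~ N(x_t, obs_std^2). Returns the final particles and the log of
    the evidence estimate."""
    particles = [0.0] * n_particles
    log_evidence = 0.0
    for y in observations:
        # Propagate each particle through the transition (the "bootstrap" proposal).
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # Weight each particle by the likelihood of the observation.
        log_w = [normal_logpdf(y, x, obs_std) for x in particles]
        max_lw = max(log_w)
        w = [math.exp(lw - max_lw) for lw in log_w]  # rescaled to avoid underflow
        # Accumulate the log of the average unnormalized weight.
        log_evidence += max_lw + math.log(sum(w) / n_particles)
        # Resample (multinomial): the synchronization point between particles.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return particles, log_evidence


if __name__ == "__main__":
    rng = random.Random(0)
    particles, log_z = bootstrap_particle_filter([0.5, 1.0, 0.2], 5000, rng)
    print("log evidence estimate:", log_z)
    print("posterior mean estimate:", sum(particles) / len(particles))
```

For a single observation this model has a closed-form evidence (the observation is distributed as N(0, process_std^2 + obs_std^2)), which gives a simple sanity check for the estimate as the number of particles grows.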


Universal probabilistic programming offers a powerful approach to statistical phylogenetics

Communications Biology ◽

10.1038/s42003-021-01753-7 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Fredrik Ronquist ◽

Jan Kudlicka ◽

Viktor Senderov ◽

Johannes Borgström ◽

Nicolas Lartillot ◽

...

Keyword(s):

Programming Languages ◽

Graphical Models ◽

Sequential Monte Carlo ◽

Full Range ◽

Efficient Estimation ◽

Probabilistic Programming ◽

Automated Generation ◽

Inference Algorithms ◽

Powerful Approach ◽

Inference Strategy

Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here, we show that universal probabilistic programming languages (PPLs) solve the expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient estimation of Bayes factors that are used in model testing. We take advantage of this in automatically generating SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.


Universal probabilistic programming offers a powerful approach to statistical phylogenetics

10.1101/2020.06.16.154443 ◽

2020 ◽

Cited By ~ 1

Author(s):

Fredrik Ronquist ◽

Jan Kudlicka ◽

Viktor Senderov ◽

Johannes Borgström ◽

Nicolas Lartillot ◽

...

Keyword(s):

Programming Languages ◽

Graphical Models ◽

Sequential Monte Carlo ◽

Full Range ◽

Efficient Estimation ◽

Probabilistic Programming ◽

Automated Generation ◽

Inference Algorithms ◽

Powerful Approach ◽

Inference Strategy

Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here we show that universal probabilistic programming languages (PPLs) solve the expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient estimation of Bayes factors that are used in model testing. We take advantage of this in automatically generating SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.


Probabilistic programming in Python using PyMC3

PeerJ Computer Science ◽

10.7717/peerj-cs.55 ◽

2016 ◽

Vol 2 ◽

pp. e55 ◽

Cited By ~ 510

Author(s):

John Salvatier ◽

Thomas V. Wiecki ◽

Christopher Fonnesbeck

Keyword(s):

Monte Carlo ◽

Programming Languages ◽

Probabilistic Models ◽

Automatic Differentiation ◽

Direct Interaction ◽

Model Specification ◽

Probabilistic Programming ◽

Domain Specific ◽

Complex Models ◽

Probabilistic Programs

Probabilistic programming allows for automatic Bayesian inference on user-defined probabilistic models. Recent advances in Markov chain Monte Carlo (MCMC) sampling allow inference on increasingly complex models. This class of MCMC, known as Hamiltonian Monte Carlo, requires gradient information which is often not readily available. PyMC3 is a new open source probabilistic programming framework written in Python that uses Theano to compute gradients via automatic differentiation as well as compile probabilistic programs on-the-fly to C for increased speed. Contrary to other probabilistic programming languages, PyMC3 allows model specification directly in Python code. The lack of a domain specific language allows for great flexibility and direct interaction with the model. This paper is a tutorial-style introduction to this software package.
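The abstract's remark that Hamiltonian Monte Carlo needs gradient information can be illustrated with a small, library-free sketch. This is not PyMC3's API or its sampler: the target (a 1-D normal with mean 3 and standard deviation 2), the hand-written gradient, and all names and tuning values are assumptions for illustration. In a framework with automatic differentiation, the gradient function would be derived from the model rather than written by hand.

```python
import math
import random


def target_log_prob(x):
    """Unnormalized log density of the assumed target N(3, 2^2)."""
    return -0.5 * ((x - 3.0) / 2.0) ** 2


def target_grad(x):
    """Hand-written gradient of target_log_prob with respect to x."""
    return -(x - 3.0) / 4.0


def hmc_sample(log_prob, grad_log_prob, x0, n_samples, rng,
               step_size=0.2, n_leapfrog=10):
    """Basic 1-D Hamiltonian Monte Carlo with a unit-mass momentum."""
    samples = []
    x = x0
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)  # draw a fresh momentum
        x_new, p_new = x, p
        # Leapfrog integration: the gradient drives every momentum update.
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        for i in range(n_leapfrog):
            x_new += step_size * p_new
            if i < n_leapfrog - 1:
                p_new += step_size * grad_log_prob(x_new)
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        # Metropolis correction for the integration error.
        h_old = -log_prob(x) + 0.5 * p * p
        h_new = -log_prob(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


if __name__ == "__main__":
    rng = random.Random(1)
    draws = hmc_sample(target_log_prob, target_grad, 0.0, 2000, rng)
    kept = draws[200:]  # discard a short warm-up
    print("sample mean:", sum(kept) / len(kept))
```

The sampler only ever touches the model through `log_prob` and `grad_log_prob`, which is why automatic differentiation removes the main obstacle to applying this class of methods to user-defined models.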


Probabilistic programming in Python using PyMC3

10.7287/peerj.preprints.1686v1 ◽

2016 ◽

Cited By ~ 7

Author(s):

John Salvatier ◽

Thomas V Wiecki ◽

Christopher Fonnesbeck

Keyword(s):

Monte Carlo ◽

Programming Languages ◽

Probabilistic Models ◽

Automatic Differentiation ◽

Direct Interaction ◽

Model Specification ◽

Probabilistic Programming ◽

Domain Specific ◽

Complex Models ◽

Probabilistic Programs

Probabilistic Programming allows for automatic Bayesian inference on user-defined probabilistic models. Recent advances in Markov chain Monte Carlo (MCMC) sampling allow inference on increasingly complex models. This class of MCMC, known as Hamiltonian Monte Carlo, requires gradient information which is often not readily available. PyMC3 is a new open source Probabilistic Programming framework written in Python that uses Theano to compute gradients via automatic differentiation as well as compile probabilistic programs on-the-fly to C for increased speed. Contrary to other Probabilistic Programming languages, PyMC3 allows model specification directly in Python code. The lack of a domain specific language allows for great flexibility and direct interaction with the model. This paper is a tutorial-style introduction to this software package.


Probabilistic programming in Python using PyMC3

10.7287/peerj.preprints.1686 ◽

2016 ◽

Cited By ~ 8

Author(s):

John Salvatier ◽

Thomas V Wiecki ◽

Christopher Fonnesbeck

Keyword(s):

Monte Carlo ◽

Programming Languages ◽

Probabilistic Models ◽

Automatic Differentiation ◽

Direct Interaction ◽

Model Specification ◽

Probabilistic Programming ◽

Domain Specific ◽

Complex Models ◽

Probabilistic Programs

Probabilistic Programming allows for automatic Bayesian inference on user-defined probabilistic models. Recent advances in Markov chain Monte Carlo (MCMC) sampling allow inference on increasingly complex models. This class of MCMC, known as Hamiltonian Monte Carlo, requires gradient information which is often not readily available. PyMC3 is a new open source Probabilistic Programming framework written in Python that uses Theano to compute gradients via automatic differentiation as well as compile probabilistic programs on-the-fly to C for increased speed. Contrary to other Probabilistic Programming languages, PyMC3 allows model specification directly in Python code. The lack of a domain specific language allows for great flexibility and direct interaction with the model. This paper is a tutorial-style introduction to this software package.


Bayesian Estimation of DSGE Models

10.23943/princeton/9780691161082.001.0001 ◽

2015 ◽

Cited By ~ 45

Author(s):

Edward P. Herbst ◽

Frank Schorfheide

Keyword(s):

Monte Carlo ◽

Sequential Monte Carlo ◽

Likelihood Function ◽

Academic Research ◽

Dynamic Stochastic General Equilibrium ◽

Computational Techniques ◽

Dsge Models ◽

Monte Carlo Techniques ◽

Dynamic Stochastic ◽

Theoretical Foundations

Dynamic stochastic general equilibrium (DSGE) models have become one of the workhorses of modern macroeconomics and are extensively used for academic research as well as forecasting and policy analysis at central banks. This book introduces readers to state-of-the-art computational techniques used in the Bayesian analysis of DSGE models. The book covers Markov chain Monte Carlo techniques for linearized DSGE models, novel sequential Monte Carlo methods that can be used for parameter inference, and the estimation of nonlinear DSGE models based on particle filter approximations of the likelihood function. The theoretical foundations of the algorithms are discussed in depth, and detailed empirical applications and numerical illustrations are provided. The book also gives invaluable advice on how to tailor these algorithms to specific applications and assess the accuracy and reliability of the computations. The book is essential reading for graduate students, academic researchers, and practitioners at policy institutions.
