Correctness of Sequential Monte Carlo Inference for Probabilistic Programming Languages

Programming Languages and Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-030-72019-3_15 ◽

2021 ◽

pp. 404-431

Author(s):

Daniel Lundén ◽

Johannes Borgström ◽

David Broman

Keyword(s):

Monte Carlo ◽

Programming Languages ◽

Sequential Monte Carlo ◽

Fundamental Problem ◽

Operational Semantics ◽

Correctness Proof ◽

Probabilistic Programming ◽

Inference Problems ◽

Large Numbers ◽

Open Question

AbstractProbabilistic programming is an approach to reasoning under uncertainty by encoding inference problems as programs. In order to solve these inference problems, probabilistic programming languages (PPLs) employ different inference algorithms, such as sequential Monte Carlo (SMC), Markov chain Monte Carlo (MCMC), or variational methods. Existing research on such algorithms mainly concerns their implementation and efficiency, rather than the correctness of the algorithms themselves when applied in the context of expressive PPLs. To remedy this, we give a correctness proof for SMC methods in the context of an expressive PPL calculus, representative of popular PPLs such as WebPPL, Anglican, and Birch. Previous work have studied correctness of MCMC using an operational semantics, and correctness of SMC and MCMC in a denotational setting without term recursion. However, for SMC inference—one of the most commonly used algorithms in PPLs as of today—no formal correctness proof exists in an operational setting. In particular, an open question is if the resample locations in a probabilistic program affects the correctness of SMC. We solve this fundamental problem, and make four novel contributions: (i) we extend an untyped PPL lambda calculus and operational semantics to include explicit resample terms, expressing synchronization points in SMC inference; (ii) we prove, for the first time, that subject to mild restrictions, any placement of the explicit resample terms is valid for a generic form of SMC inference; (iii) as a result of (ii), our calculus benefits from classic results from the SMC literature: a law of large numbers and an unbiased estimate of the model evidence; and (iv) we formalize the bootstrap particle filter for the calculus and discuss how our results can be further extended to other SMC algorithms.

Download Full-text

Correctness of Sequential Monte Carlo Inference for Probabilistic Programming Languages

10.26226/morressier.604907f41a80aac83ca25d23 ◽

2021 ◽

Cited By ~ 1

Author(s):

Daniel Lundén ◽

Johannes Borgström ◽

David Broman

Keyword(s):

Monte Carlo ◽

Programming Languages ◽

Sequential Monte Carlo ◽

Probabilistic Programming

Download Full-text

Biological Network Inference with GRASP: A Bayesian Network Structure Learning Method Using Adaptive Sequential Monte Carlo

10.21203/rs.3.rs-148701/v1 ◽

2021 ◽

Author(s):

Kaixian Yu ◽

Zihan Cui ◽

Xin Sui ◽

Xing Qiu ◽

Jinfeng Zhang

Keyword(s):

Monte Carlo ◽

Network Structure ◽

Network Inference ◽

Sequential Monte Carlo ◽

Structure Learning ◽

Biomedical Science ◽

Optimal Network ◽

Genomic Studies ◽

Complex Correlation ◽

Open Question

Abstract Bayesian networks (BNs) provide a probabilistic, graphical framework for modeling high-dimensional joint distributions with complex correlation structures. BNs have wide applications in many disciplines, including biology, social science, finance and biomedical science. Despite extensive studies in the past, network structure learning from data is still a challenging open question in BN research. In this study, we present a sequential Monte Carlo (SMC)-based three-stage approach, GRowth-based Approach with Staged Pruning (GRASP). A double filtering strategy was first used for discovering the overall skeleton of the target BN. To search for the optimal network structures we designed an adaptive SMC (adSMC) algorithm to increase the quality and diversity of sampled networks which were further improved by a third stage to reclaim edges missed in the skeleton discovery step. GRASP gave very satisfactory results when tested on benchmark networks. Finally, BN structure learning using multiple types of genomics data illustrates GRASP’s potential in discovering novel biological relationships in integrative genomic studies.

Download Full-text

Universal probabilistic programming offers a powerful approach to statistical phylogenetics

Communications Biology ◽

10.1038/s42003-021-01753-7 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Fredrik Ronquist ◽

Jan Kudlicka ◽

Viktor Senderov ◽

Johannes Borgström ◽

Nicolas Lartillot ◽

...

Keyword(s):

Programming Languages ◽

Graphical Models ◽

Sequential Monte Carlo ◽

Full Range ◽

Efficient Estimation ◽

Probabilistic Programming ◽

Automated Generation ◽

Inference Algorithms ◽

Powerful Approach ◽

Inference Strategy

AbstractStatistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here, we show that universal probabilistic programming languages (PPLs) solve the expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient estimation of Bayes factors that are used in model testing. We take advantage of this in automatically generating SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.

Download Full-text

Universal probabilistic programming offers a powerful approach to statistical phylogenetics

10.1101/2020.06.16.154443 ◽

2020 ◽

Cited By ~ 1

Author(s):

Fredrik Ronquist ◽

Jan Kudlicka ◽

Viktor Senderov ◽

Johannes Borgström ◽

Nicolas Lartillot ◽

...

Keyword(s):

Programming Languages ◽

Graphical Models ◽

Sequential Monte Carlo ◽

Full Range ◽

Efficient Estimation ◽

Probabilistic Programming ◽

Automated Generation ◽

Inference Algorithms ◽

Powerful Approach ◽

Inference Strategy

Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here we show that universal probabilistic programming languages (PPLs) solve the expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient estimation of Bayes factors that are used in model testing. We take advantage of this in automatically generating SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.

Download Full-text

Probabilistic programming in Python using PyMC3

PeerJ Computer Science ◽

10.7717/peerj-cs.55 ◽

2016 ◽

Vol 2 ◽

pp. e55 ◽

Cited By ~ 510

Author(s):

John Salvatier ◽

Thomas V. Wiecki ◽

Christopher Fonnesbeck

Keyword(s):

Monte Carlo ◽

Programming Languages ◽

Probabilistic Models ◽

Automatic Differentiation ◽

Direct Interaction ◽

Model Specification ◽

Probabilistic Programming ◽

Domain Specific ◽

Complex Models ◽

Probabilistic Programs

Probabilistic programming allows for automatic Bayesian inference on user-defined probabilistic models. Recent advances in Markov chain Monte Carlo (MCMC) sampling allow inference on increasingly complex models. This class of MCMC, known as Hamiltonian Monte Carlo, requires gradient information which is often not readily available. PyMC3 is a new open source probabilistic programming framework written in Python that uses Theano to compute gradients via automatic differentiation as well as compile probabilistic programs on-the-fly to C for increased speed. Contrary to other probabilistic programming languages, PyMC3 allows model specification directly in Python code. The lack of a domain specific language allows for great flexibility and direct interaction with the model. This paper is a tutorial-style introduction to this software package.

Download Full-text

Limit theorems for sequential MCMC methods

Advances in Applied Probability ◽

10.1017/apr.2020.9 ◽

2020 ◽

Vol 52 (2) ◽

pp. 377-403 ◽

Cited By ~ 2

Author(s):

Axel Finke ◽

Arnaud Doucet ◽

Adam M. Johansen

Keyword(s):

Monte Carlo ◽

Markov Chain ◽

Markov Chain Monte Carlo ◽

Sequential Monte Carlo ◽

Probability Distributions ◽

Asymptotic Variance ◽

Mcmc Methods ◽

Time Step ◽

Strong Law ◽

Large Numbers

AbstractBoth sequential Monte Carlo (SMC) methods (a.k.a. ‘particle filters’) and sequential Markov chain Monte Carlo (sequential MCMC) methods constitute classes of algorithms which can be used to approximate expectations with respect to (a sequence of) probability distributions and their normalising constants. While SMC methods sample particles conditionally independently at each time step, sequential MCMC methods sample particles according to a Markov chain Monte Carlo (MCMC) kernel. Introduced over twenty years ago in [6], sequential MCMC methods have attracted renewed interest recently as they empirically outperform SMC methods in some applications. We establish an $\mathbb{L}_r$ -inequality (which implies a strong law of large numbers) and a central limit theorem for sequential MCMC methods and provide conditions under which errors can be controlled uniformly in time. In the context of state-space models, we also provide conditions under which sequential MCMC methods can indeed outperform standard SMC methods in terms of asymptotic variance of the corresponding Monte Carlo estimators.

Download Full-text

Biological Network Inference With GRASP: A Bayesian Network Structure Learning Method Using Adaptive Sequential Monte Carlo

Frontiers in Genetics ◽

10.3389/fgene.2021.764020 ◽

2021 ◽

Vol 12 ◽

Author(s):

Kaixian Yu ◽

Zihan Cui ◽

Xin Sui ◽

Xing Qiu ◽

Jinfeng Zhang

Keyword(s):

Monte Carlo ◽

Network Structure ◽

Network Inference ◽

Sequential Monte Carlo ◽

Structure Learning ◽

Biomedical Science ◽

Optimal Network ◽

Genomic Studies ◽

Complex Correlation ◽

Open Question

Bayesian networks (BNs) provide a probabilistic, graphical framework for modeling high-dimensional joint distributions with complex correlation structures. BNs have wide applications in many disciplines, including biology, social science, finance and biomedical science. Despite extensive studies in the past, network structure learning from data is still a challenging open question in BN research. In this study, we present a sequential Monte Carlo (SMC)-based three-stage approach, GRowth-based Approach with Staged Pruning (GRASP). A double filtering strategy was first used for discovering the overall skeleton of the target BN. To search for the optimal network structures we designed an adaptive SMC (adSMC) algorithm to increase the quality and diversity of sampled networks which were further improved by a third stage to reclaim edges missed in the skeleton discovery step. GRASP gave very satisfactory results when tested on benchmark networks. Finally, BN structure learning using multiple types of genomics data illustrates GRASP’s potential in discovering novel biological relationships in integrative genomic studies.

Download Full-text

Probabilistic programming in Python using PyMC3

10.7287/peerj.preprints.1686v1 ◽

2016 ◽

Cited By ~ 7

Author(s):

John Salvatier ◽

Thomas V Wiecki ◽

Christopher Fonnesbeck

Keyword(s):

Monte Carlo ◽

Programming Languages ◽

Probabilistic Models ◽

Automatic Differentiation ◽

Direct Interaction ◽

Model Specification ◽

Probabilistic Programming ◽

Domain Specific ◽

Complex Models ◽

Probabilistic Programs

Probabilistic Programming allows for automatic Bayesian inference on user-defined probabilistic models. Recent advances in Markov chain Monte Carlo (MCMC) sampling allow inference on increasingly complex models. This class of MCMC, known as Hamliltonian Monte Carlo, requires gradient information which is often not readily available. PyMC3 is a new open source Probabilistic Programming framework written in Python that uses Theano to compute gradients via automatic differentiation as well as compile probabilistic programs on-the-fly to C for increased speed. Contrary to other Probabilistic Programming languages, PyMC3 allows model specification directly in Python code. The lack of a domain specific language allows for great flexibility and direct interaction with the model. This paper is a tutorial-style introduction to this software package.

Download Full-text

GRASP: a Bayesian network structure learning method using adaptive sequential Monte Carlo

10.1101/767327 ◽

2019 ◽

Author(s):

Kaixian Yu ◽

Zihan Cui ◽

Xing Qiu ◽

Jinfeng Zhang

Keyword(s):

Monte Carlo ◽

Network Structure ◽

Biological Networks ◽

Missing Values ◽

Sequential Monte Carlo ◽

Structure Learning ◽

Heterogeneous Data ◽

Optimal Network ◽

Genomic Studies ◽

Open Question

AbstractBayesian networks (BNs) provide a probabilistic, graphical framework for modeling high-dimensional joint distributions with complex dependence structures. BNs can be used to infer complex biological networks using heterogeneous data from different sources with missing values. Despite extensive studies in the past, network structure learning from data is still a challenging open question in BN research. In this study, we present a sequential Monte Carlo (SMC) based three-stage approach, GRowth-based Approach with Staged Pruning (GRASP). A double filtering strategy was first used for discovering the overall skeleton of the target BN. To search for the optimal network structures we designed an adaptive SMC (adSMC) algorithm to increase the diversity of sampled networks which were further improved by a new stage to reclaim edges missed in the skeleton discovery step. GRASP gave very satisfactory results when tested on benchmark networks. Finally, BN structure learning using multiple types of genomics data illustrates GRASP’s potential in discovering novel biological relationships in integrative genomic studies.

Download Full-text

Probabilistic programming in Python using PyMC3

10.7287/peerj.preprints.1686 ◽

2016 ◽

Cited By ~ 8

Author(s):

John Salvatier ◽

Thomas V Wiecki ◽

Christopher Fonnesbeck

Keyword(s):

Monte Carlo ◽

Programming Languages ◽

Probabilistic Models ◽

Automatic Differentiation ◽

Direct Interaction ◽

Model Specification ◽

Probabilistic Programming ◽

Domain Specific ◽

Complex Models ◽

Probabilistic Programs

Download Full-text