Evaluating probabilistic programming and fast variational Bayesian inference in phylogenetics

Mapping Intimacies ◽

10.1101/702944 ◽

2019 ◽

Cited By ~ 1

Author(s):

Mathieu Fourment ◽

Aaron E. Darling

Keyword(s):

Probabilistic Models ◽

Probability Distributions ◽

Mean Field ◽

Black Box ◽

Variational Inference ◽

Machine Learning Techniques ◽

Mcmc Methods ◽

Substitution Model ◽

Probabilistic Programming ◽

Phylogenetic Models

Recent advances in statistical machine learning techniques have led to the creation of probabilistic programming frameworks. These frameworks enable probabilistic models to be rapidly prototyped and fit to data using scalable approximation methods such as variational inference. In this work, we explore the use of the Stan language for probabilistic programming in application to phylogenetic models. We show that many commonly used phylogenetic models including the general time reversible (GTR) substitution model, rate heterogeneity among sites, and a range of coalescent models can be implemented using a probabilistic programming language. The posterior probability distributions obtained via the black box variational inference engine in Stan were compared to those obtained with reference implementations of Markov chain Monte Carlo (MCMC) for phylogenetic inference. We find that black box variational inference in Stan is less accurate than MCMC methods for phylogenetic models, but requires far less compute time. Finally, we evaluate a custom implementation of mean-field variational inference on the Jukes-Cantor substitution model and show that a specialized implementation of variational inference can be two orders of magnitude faster and more accurate than a general purpose probabilistic implementation.
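
For readers unfamiliar with the Jukes-Cantor model named in the abstract, the following is a minimal sketch of its standard transition probabilities and of the log-likelihood of two aligned sequences joined by a single branch. It is not the authors' implementation and involves no variational inference; the function names and the two-sequence setup are assumptions made for illustration.

```python
import math


def jc69_transition_probability(same_state: bool, branch_length: float) -> float:
    """Jukes-Cantor probability of ending in the same nucleotide, or in one
    specific different nucleotide, after `branch_length` expected
    substitutions per site."""
    decay = math.exp(-4.0 * branch_length / 3.0)
    if same_state:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def jc69_pair_log_likelihood(seq_a: str, seq_b: str, branch_length: float) -> float:
    """Log-likelihood of two aligned sequences separated by one branch.

    Assumes branch_length > 0 whenever the sequences differ at any site,
    otherwise the probability of a mismatch is zero and the log is undefined.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    log_lik = 0.0
    for a, b in zip(seq_a, seq_b):
        # 0.25 is the stationary frequency of the starting nucleotide.
        log_lik += math.log(0.25 * jc69_transition_probability(a == b, branch_length))
    return log_lik
```

Because the model has a single free parameter per branch, a likelihood of this form is cheap to evaluate, which is one plausible reason a specialized inference routine can outperform a general-purpose engine on it.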

Evaluating probabilistic programming and fast variational Bayesian inference in phylogenetics

PeerJ ◽

10.7717/peerj.8272 ◽

2019 ◽

Vol 7 ◽

pp. e8272 ◽

Cited By ~ 1

Author(s):

Mathieu Fourment ◽

Aaron E. Darling

Keyword(s):

Probabilistic Models ◽

Probability Distributions ◽

Mean Field ◽

Black Box ◽

Variational Inference ◽

Machine Learning Techniques ◽

Mcmc Methods ◽

Substitution Model ◽

Probabilistic Programming ◽

Phylogenetic Models

Recent advances in statistical machine learning techniques have led to the creation of probabilistic programming frameworks. These frameworks enable probabilistic models to be rapidly prototyped and fit to data using scalable approximation methods such as variational inference. In this work, we explore the use of the Stan language for probabilistic programming in application to phylogenetic models. We show that many commonly used phylogenetic models including the general time reversible substitution model, rate heterogeneity among sites, and a range of coalescent models can be implemented using a probabilistic programming language. The posterior probability distributions obtained via the black box variational inference engine in Stan were compared to those obtained with reference implementations of Markov chain Monte Carlo (MCMC) for phylogenetic inference. We find that black box variational inference in Stan is less accurate than MCMC methods for phylogenetic models, but requires far less compute time. Finally, we evaluate a custom implementation of mean-field variational inference on the Jukes–Cantor substitution model and show that a specialized implementation of variational inference can be two orders of magnitude faster and more accurate than a general purpose probabilistic implementation.

Geometric Variational Inference

Entropy ◽

10.3390/e23070853 ◽

2021 ◽

Vol 23 (7) ◽

pp. 853

Author(s):

Philipp Frank ◽

Reimar Leike ◽

Torsten A. Enßlin

Keyword(s):

Probability Distributions ◽

Variational Inference ◽

Mcmc Methods ◽

Variational Approximation ◽

Bayesian Inverse Problems ◽

Fisher Information Metric ◽

Non Linear ◽

Information Metric ◽

Low Dimensional ◽

Algorithmic Structure

Efficiently accessing the information contained in non-linear and high dimensional probability distributions remains a core challenge in modern statistics. Traditionally, estimators that go beyond point estimates are either categorized as Variational Inference (VI) or Markov-Chain Monte-Carlo (MCMC) techniques. While MCMC methods that utilize the geometric properties of continuous probability distributions to increase their efficiency have been proposed, VI methods rarely use the geometry. This work aims to fill this gap and proposes geometric Variational Inference (geoVI), a method based on Riemannian geometry and the Fisher information metric. It is used to construct a coordinate transformation that relates the Riemannian manifold associated with the metric to Euclidean space. The distribution, expressed in the coordinate system induced by the transformation, takes a particularly simple form that allows for an accurate variational approximation by a normal distribution. Furthermore, the algorithmic structure allows for an efficient implementation of geoVI which is demonstrated on multiple examples, ranging from low-dimensional illustrative ones to non-linear, hierarchical Bayesian inverse problems in thousands of dimensions.

SAR Target Recognition via Meta-Learning and Amortized Variational Inference

Sensors ◽

10.3390/s20205966 ◽

2020 ◽

Vol 20 (20) ◽

pp. 5966

Author(s):

Ke Wang ◽

Gong Zhang

Keyword(s):

Target Recognition ◽

Probability Distributions ◽

Automatic Target Recognition ◽

Variational Inference ◽

Training Data ◽

Superior Performance ◽

Small Data ◽

Meta Learning ◽

Radar Automatic Target Recognition ◽

Global Parameters

The challenge of small data has emerged in synthetic aperture radar automatic target recognition (SAR-ATR) problems. Most SAR-ATR methods are data-driven and require a lot of training data that are expensive to collect. To address this challenge, we propose a recognition model that incorporates meta-learning and amortized variational inference (AVI). Specifically, the model consists of global parameters and task-specific parameters. The global parameters, trained by meta-learning, construct a common feature extractor shared between all recognition tasks. The task-specific parameters, modeled by probability distributions, can adapt to new tasks with a small amount of training data. To reduce the computation and storage cost, the task-specific parameters are inferred by AVI implemented with set-to-set functions. Extensive experiments were conducted on a real SAR dataset to evaluate the effectiveness of the model. The results of the proposed approach compared with those of the latest SAR-ATR methods show the superior performance of our model, especially on recognition tasks with limited data.

Identifying graph clusters using variational inference and links to covariance parametrization

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rsta.2009.0117 ◽

2009 ◽

Vol 367 (1906) ◽

pp. 4407-4426

Author(s):

David Barber

Keyword(s):

Mean Field ◽

Matrix Decomposition ◽

Difficult Problem ◽

Positive Definite ◽

Variational Inference ◽

Field Theories ◽

Variational Approximation ◽

Positive Definite Matrices ◽

The Matrix ◽

Non Gaussian

Finding clusters of well-connected nodes in a graph is a problem common to many domains, including social networks, the Internet and bioinformatics. From a computational viewpoint, finding these clusters or graph communities is a difficult problem. We use a clique matrix decomposition based on a statistical description that encourages clusters to be well connected and few in number. The formal intractability of inferring the clusters is addressed using a variational approximation inspired by mean-field theories in statistical mechanics. Clique matrices also play a natural role in parametrizing positive definite matrices under zero constraints on elements of the matrix. We show that clique matrices can parametrize all positive definite matrices restricted according to a decomposable graph and form a structured factor analysis approximation in the non-decomposable case. Extensions to conjugate Bayesian covariance priors and more general non-Gaussian independence models are briefly discussed.

STOCHASTICALLY SCALABLE FLOW CONTROL

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964809990076 ◽

2009 ◽

Vol 23 (4) ◽

pp. 675-698 ◽

Cited By ~ 1

Author(s):

Thomas Voice

Keyword(s):

Flow Control ◽

Probabilistic Models ◽

Mean Field ◽

Parameter Choice ◽

Modeling Framework ◽

Flow Rates ◽

Coefficients Of Variation ◽

Scalable Tcp ◽

Exponential Red ◽

The Mean

Recent advances in the mathematical analysis of flow control have prompted the creation of the Scalable TCP (STCP) and Exponential RED (E-RED) algorithms. These are designed to be scalable under the popular deterministic delay stability modeling framework. In this article, we analyze stochastic models of STCP and STCP combined with E-RED link behavior. We find that under certain plausible network conditions, these probabilistic models also exhibit scalable behavior. In particular, we derive parameter choice schemes for which the equilibrium coefficients of variation of flow rates are bounded, however large, fast, or complex the network. Our model is shown to exhibit behavior similar to the mean field convergence that has recently been observed in TCP.

Exploring multi-modalities in weather prediction using a univariate graph based on machine learning techniques

10.5194/egusphere-egu21-11747 ◽

2021 ◽

Author(s):

Natacha Galmiche ◽

Nello Blaser ◽

Morten Brun ◽

Helwig Hauser ◽

Thomas Spengler ◽

...

Keyword(s):

Machine Learning ◽

Standard Deviation ◽

Probability Distributions ◽

Weather Prediction ◽

A Priori ◽

Clustering Algorithms ◽

Quantitative Information ◽

Machine Learning Techniques ◽

Topological Data Analysis ◽

Learning Techniques

Probability distributions based on ensemble forecasts are commonly used to assess uncertainty in weather prediction. However, interpreting these distributions is not trivial, especially in the case of multimodality with distinct likely outcomes. The conventional summary employs mean and standard deviation across ensemble members, which works well for unimodal, Gaussian-like distributions. In the case of multimodality this misleads, discarding crucial information. We aim at combining previously developed clustering algorithms in machine learning and topological data analysis to extract useful information such as the number of clusters in an ensemble. Given the chaotic behaviour of the atmosphere, machine learning techniques can provide relevant results even if no, or very little, a priori information about the data is available. In addition, topological methods that analyse the shape of the data can make results explainable. Given an ensemble of univariate time series, a graph is generated whose edges and vertices represent clusters of members, including additional information for each cluster such as the members belonging to them, their uncertainty, and their relevance according to the graph. In the case of multimodality, this approach provides relevant and quantitative information beyond the commonly used mean and standard deviation approach that helps to further characterise the predictability.
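
The claim that a mean and standard deviation mislead for multimodal ensembles can be illustrated with a small sketch. The ensemble values below are invented for illustration and are not data from the work; the helper names are likewise hypothetical.

```python
import statistics


def summarize(ensemble):
    """Return the conventional (mean, population standard deviation) summary."""
    return statistics.mean(ensemble), statistics.pstdev(ensemble)


def nearest_member_distance(ensemble, value):
    """Distance from `value` to the closest ensemble member."""
    return min(abs(member - value) for member in ensemble)


# Hypothetical bimodal ensemble: two tight clusters near -5 and +5.
ensemble = [-5.2, -5.0, -4.8, 4.8, 5.0, 5.2]
mean, spread = summarize(ensemble)
# The mean falls between the two clusters, far from every member,
# and the spread describes neither cluster's internal variability.
gap = nearest_member_distance(ensemble, mean)
```

In this toy case no member lies anywhere near the reported mean, which is the kind of information loss that a cluster-aware summary is meant to avoid.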

Optimal foraging and the information theory of gambling

Journal of The Royal Society Interface ◽

10.1098/rsif.2019.0162 ◽

2019 ◽

Vol 16 (157) ◽

pp. 20190162 ◽

Cited By ~ 4

Author(s):

Roland J. Baddeley ◽

Nigel R. Franks ◽

Edmund R. Hunt

Keyword(s):

Information Theory ◽

Monte Carlo ◽

Optimal Foraging ◽

Foraging Behaviour ◽

Stochastic Dynamics ◽

Probability Distributions ◽

Movement Ecology ◽

Mcmc Methods ◽

Long Run ◽

Resource Gradient

At a macroscopic level, part of the ant colony life cycle is simple: a colony collects resources; these resources are converted into more ants, and these ants in turn collect more resources. Because more ants collect more resources, this is a multiplicative process, and the expected logarithm of the amount of resources determines how successful the colony will be in the long run. Over 60 years ago, Kelly showed, using information theoretic techniques, that the rate of growth of resources for such a situation is optimized by a strategy of betting in proportion to the probability of pay-off. Thus, in the case of ants, the fraction of the colony foraging at a given location should be proportional to the probability that resources will be found there, a result widely applied in the mathematics of gambling. This theoretical optimum leads to predictions as to which collective ant movement strategies might have evolved. Here, we show how colony-level optimal foraging behaviour can be achieved by mapping movement to Markov chain Monte Carlo (MCMC) methods, specifically Hamiltonian Monte Carlo (HMC). This can be done by the ants following a (noisy) local measurement of the (logarithm of) resource probability gradient (possibly supplemented with momentum, i.e. a propensity to move in the same direction). This maps the problem of foraging (via the information theory of gambling, stochastic dynamics and techniques employed within Bayesian statistics to efficiently sample from probability distributions) to simple models of ant foraging behaviour. This identification has broad applicability, facilitates the application of information theory approaches to understand movement ecology and unifies insights from existing biomechanical, cognitive, random and optimality movement paradigms. At the cost of requiring ants to obtain (noisy) resource gradient information, we show that this model is both efficient and matches a number of characteristics of real ant exploration.
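
Kelly's result cited in the abstract can be checked numerically with a short sketch. It assumes the simplest setting, in which all resources are allocated across mutually exclusive outcomes with fixed pay-off odds; the probabilities and odds are invented for illustration, and the mapping of outcomes to foraging sites is only an analogy to the abstract's argument.

```python
import math


def expected_log_growth(probabilities, fractions, odds):
    """Expected log growth rate when a fraction f_i of resources is placed on
    outcome i, which occurs with probability p_i and pays odds o_i.

    Every fraction must be positive wherever the probability is positive.
    """
    return sum(
        p * math.log(f * o)
        for p, f, o in zip(probabilities, fractions, odds)
        if p > 0
    )


# Hypothetical probabilities of finding resources at three sites, and pay-offs.
probabilities = [0.5, 0.3, 0.2]
odds = [2.0, 3.0, 6.0]

proportional = expected_log_growth(probabilities, probabilities, odds)
uniform = expected_log_growth(probabilities, [1 / 3, 1 / 3, 1 / 3], odds)
skewed = expected_log_growth(probabilities, [0.8, 0.1, 0.1], odds)
```

The gap between the proportional allocation and any other allocation equals the Kullback-Leibler divergence between the true probabilities and that allocation, so it is never negative and does not depend on the odds.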

Black-Box Marine Vehicle Identification with Regression Techniques for Random Manoeuvres

Electronics ◽

10.3390/electronics8050492 ◽

2019 ◽

Vol 8 (5) ◽

pp. 492 ◽

Cited By ~ 3

Author(s):

Raul Moreno ◽

David Moreno-Salinas ◽

Joaquin Aranda

Keyword(s):

System Identification ◽

Symbolic Regression ◽

Black Box ◽

Machine Learning Techniques ◽

Control Structures ◽

Marine Vehicles ◽

Building Models ◽

Learning Techniques ◽

Marine System ◽

Regression Techniques

As a critical step to efficiently design control structures, system identification is concerned with building models of dynamical systems from observed input–output data. In this paper, a number of regression techniques are used for black-box marine system identification of a scale ship. Unlike other works that train the models using specific manoeuvres, in this work the data have been collected from several random manoeuvres and trajectories. Therefore, the aim is to develop general and robust mathematical models using real experimental data from random movements. The techniques used in this work are ridge, kernel ridge and symbolic regression, and the results show that machine learning techniques are robust approaches to model surface marine vehicles, even providing interpretable results in closed form equations using techniques such as symbolic regression.
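
As a reminder of what the simplest of the named techniques computes, here is a minimal sketch of ridge regression for a single input feature with no intercept. It is not the models fitted in the paper, which involve real manoeuvre data; the function name and the toy numbers are assumptions.

```python
def ridge_slope(xs, ys, penalty):
    """Closed-form ridge estimate for y = w * x with an L2 penalty on w.

    Minimizes sum((y - w * x) ** 2) + penalty * w ** 2.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Toy input-output data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
unpenalized = ridge_slope(xs, ys, penalty=0.0)
shrunk = ridge_slope(xs, ys, penalty=14.0)
```

A larger penalty shrinks the coefficient toward zero, which trades some bias for robustness to noisy measurements.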

Probabilistic Models Applicable to the Short-Term Extreme Response Analysis of Jack-Up Platforms

Journal of Offshore Mechanics and Arctic Engineering ◽

10.1115/1.1600470 ◽

2003 ◽

Vol 125 (4) ◽

pp. 249-263 ◽

Cited By ~ 1

Author(s):

M. J. Cassidy ◽

G. T. Houlsby ◽

R. Eatock Taylor

Keyword(s):

Probabilistic Models ◽

Probability Distributions ◽

Response Analysis ◽

Wave Loading ◽

Physical Processes ◽

Short Term ◽

Analysis Techniques ◽

Extreme Response ◽

Increasing Demand ◽

Jack Up

There is a steadily increasing demand for the use of jack-up units in deeper water and harsher conditions. Confidence in their use in these environments requires jack-up analysis techniques to reflect accurately the physical processes occurring. However, nearly all analyses are deterministic in nature and do not account for the inherent variability in governing parameters and models. In this paper, probabilistic models are used to develop an understanding of the response behavior of jack-ups, with particular emphasis placed on the extreme deck displacement due to a short-term event. Variables within the structural, foundation and wave loading models are assigned probability distributions and their influence on the response statistics is quantified using a response surface methodology.

Conditional Independence by Typing

ACM Transactions on Programming Languages and Systems ◽

10.1145/3490421 ◽

2022 ◽

Vol 44 (1) ◽

pp. 1-54

Author(s):

Maria I. Gorinova ◽

Andrew D. Gordon ◽

Charles Sutton ◽

Matthijs Vákár

Keyword(s):

Programming Languages ◽

Conditional Independence ◽

Probabilistic Models ◽

Type System ◽

Type Inference ◽

Practical Application ◽

Probabilistic Programming ◽

Variable Elimination ◽

Gradient Based ◽

Inference Methods

A central goal of probabilistic programming languages (PPLs) is to separate modelling from inference. However, this goal is hard to achieve in practice. Users are often forced to re-write their models to improve efficiency of inference or meet restrictions imposed by the PPL. Conditional independence (CI) relationships among parameters are a crucial aspect of probabilistic models that capture a qualitative summary of the specified model and can facilitate more efficient inference. We present an information flow type system for probabilistic programming that captures conditional independence (CI) relationships and show that, for a well-typed program in our system, the distribution it implements is guaranteed to have certain CI-relationships. Further, by using type inference, we can statically deduce which CI-properties are present in a specified model. As a practical application, we consider the problem of how to perform inference on models with mixed discrete and continuous parameters. Inference on such models is challenging in many existing PPLs, but can be improved through a workaround, where the discrete parameters are used implicitly, at the expense of manual model re-writing. We present a source-to-source semantics-preserving transformation, which uses our CI-type system to automate this workaround by eliminating the discrete parameters from a probabilistic program. The resulting program can be seen as a hybrid inference algorithm on the original program, where continuous parameters can be drawn using efficient gradient-based inference methods, while the discrete parameters are inferred using variable elimination. We implement our CI-type system and its example application in SlicStan: a compositional variant of Stan.
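
The manual workaround the abstract refers to, eliminating a discrete parameter by summing over its values, can be sketched for a Gaussian mixture. This is plain Python rather than SlicStan or Stan, it is not the paper's transformation, and all function names and numbers are assumptions made for illustration.

```python
import math


def normal_log_pdf(x, mean, sd):
    """Log-density of a normal distribution at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def marginal_log_likelihood(x, weights, means, sds):
    """Log p(x) with the discrete component label z summed out:
    log sum_z p(z) * p(x | z)."""
    terms = [
        math.log(w) + normal_log_pdf(x, m, s)
        for w, m, s in zip(weights, means, sds)
    ]
    return log_sum_exp(terms)


# Hypothetical two-component mixture evaluated at one observation.
value = marginal_log_likelihood(0.3, [0.4, 0.6], [-1.0, 2.0], [1.0, 0.5])
```

Once the label is summed out, the remaining density depends only on continuous parameters, so gradient-based inference methods can be applied to it.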