Geometric Variational Inference

Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 853
Author(s):  
Philipp Frank ◽  
Reimar Leike ◽  
Torsten A. Enßlin

Efficiently accessing the information contained in non-linear and high-dimensional probability distributions remains a core challenge in modern statistics. Traditionally, estimators that go beyond point estimates are either categorized as Variational Inference (VI) or Markov-Chain Monte-Carlo (MCMC) techniques. While MCMC methods that utilize the geometric properties of continuous probability distributions to increase their efficiency have been proposed, VI methods rarely use the geometry. This work aims to fill this gap and proposes geometric Variational Inference (geoVI), a method based on Riemannian geometry and the Fisher information metric. The metric is used to construct a coordinate transformation that relates the Riemannian manifold associated with the metric to Euclidean space. The distribution, expressed in the coordinate system induced by the transformation, takes a particularly simple form that allows for an accurate variational approximation by a normal distribution. Furthermore, the algorithmic structure allows for an efficient implementation of geoVI, which is demonstrated on multiple examples, ranging from low-dimensional illustrative ones to non-linear, hierarchical Bayesian inverse problems in thousands of dimensions.
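
The core construction, a coordinate transformation that maps the metric's Riemannian manifold to Euclidean space so the transformed density becomes approximately Gaussian, can be illustrated in one dimension. The sketch below is not the geoVI algorithm itself; the lognormal target, the metric M(x) = 1/x², and the resulting transform y = log x are hypothetical choices for which the idea works out in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: lognormal(0, 0.5), clearly non-Gaussian in the original coordinates.
x = rng.lognormal(mean=0.0, sigma=0.5, size=100_000)

def transform(x):
    # With the (hypothetical) metric M(x) = 1/x**2, the coordinate
    # transformation y(x) = integral of sqrt(M(t)) dt equals log(x),
    # and the transformed density is exactly Gaussian.
    return np.log(x)

y = transform(x)

def skew(z):
    # Sample skewness: zero for a Gaussian.
    z = (z - z.mean()) / z.std()
    return float(np.mean(z**3))

print(f"skewness before: {skew(x):+.2f}")  # strongly positive
print(f"skewness after:  {skew(y):+.2f}")  # near zero: a normal fit is accurate
```

In more than one dimension the transform has no closed form, which is where the algorithmic machinery of the paper comes in.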


2010 ◽  
Vol 58 (1) ◽  
pp. 183-195 ◽  
Author(s):  
S. Amari ◽  
A. Cichocki

Information geometry of divergence functions

Measures of divergence between two points play a key role in many engineering problems. One such measure is a distance function, but there are many important measures which do not satisfy the properties of a distance. The Bregman divergence, Kullback-Leibler divergence and f-divergence are such measures. In the present article, we study the differential-geometrical structure of a manifold induced by a divergence function. It consists of a Riemannian metric and a pair of dually coupled affine connections, which are studied in information geometry. The class of Bregman divergences is characterized by a dually flat structure, which originates from the Legendre duality. A dually flat space admits a generalized Pythagorean theorem. The class of f-divergences, defined on a manifold of probability distributions, is characterized by information monotonicity, and the Kullback-Leibler divergence belongs to the intersection of both classes. The f-divergence always gives the α-geometry, which consists of the Fisher information metric and a dual pair of ±α-connections. The α-divergence is a special class of f-divergences. It is unique, sitting at the intersection of the f-divergence and Bregman divergence classes in a manifold of positive measures. The geometry derived from the Tsallis q-entropy and related divergences is also addressed.
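
The claim that the Kullback-Leibler divergence sits in the intersection of the Bregman and f-divergence classes can be checked numerically. In this sketch the probability vectors are illustrative; the two constructions use f(t) = t log t and the negative-entropy generator φ(p) = Σᵢ pᵢ log pᵢ.

```python
import numpy as np

def kl_as_f_divergence(p, q):
    # D_f(p||q) = sum_i q_i * f(p_i / q_i) with f(t) = t * log(t)
    t = p / q
    return float(np.sum(q * t * np.log(t)))

def kl_as_bregman(p, q):
    # Bregman divergence of phi(p) = sum_i p_i log p_i (negative entropy):
    # D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>
    phi = lambda r: np.sum(r * np.log(r))
    grad = np.log(q) + 1.0
    return float(phi(p) - phi(q) - grad @ (p - q))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
print(kl_as_f_divergence(p, q), kl_as_bregman(p, q))  # the two values agree
```

The agreement is exact (up to floating point), since both expressions reduce algebraically to Σᵢ pᵢ log(pᵢ/qᵢ).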


Author(s):  
Michael J. Rothenberger ◽  
Hosam K. Fathy

This paper examines the challenge of shaping a battery’s input trajectory to (i) maximize its Fisher parameter identifiability while (ii) achieving robustness to parameter uncertainties. The paper is motivated by earlier research showing that the speed and accuracy with which battery parameters can be estimated both improve significantly when battery inputs are optimized for Fisher identifiability. Previous research performs this trajectory optimization for a known nominal parameter set. This creates a tautology where accurate parameter identification is a prerequisite for Fisher identifiability optimization. In contrast, this paper presents an iterative scheme that: (i) uses prior parameter probability distributions to create a weighted Fisher metric; (ii) optimizes the battery input trajectory for this metric using a genetic algorithm; (iii) applies the resulting input trajectory to the battery; (iv) estimates battery parameters using a Bayesian particle filter; (v) re-computes the weighted Fisher information metric using the resulting posterior parameter distribution; and (vi) repeats this process until convergence. This approach builds on well-established ideas from the estimation literature, and applies them to the battery domain for the first time. Simulation studies highlight the ability of this iterative algorithm to converge quickly towards the correct battery parameter values, despite large initial parameter uncertainties.
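
Steps (i)-(ii) of the loop can be sketched on a stand-in model. The first-order system, the prior, the candidate input frequencies, and the grid search (in place of the paper's genetic algorithm) below are all illustrative assumptions, not the battery model from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta, u, dt=0.1):
    # First-order stand-in for the battery model: x' = -a*x + b*u(t)
    a, b = theta
    x, out = 0.0, []
    for uk in u:
        x += dt * (-a * x + b * uk)
        out.append(x)
    return np.array(out)

def fisher(theta, u, sigma=0.05, eps=1e-5):
    # Fisher information for additive Gaussian output noise:
    # F = S^T S / sigma^2, where S holds the output sensitivities
    # to the parameters (computed here by finite differences).
    y0 = simulate(theta, u)
    S = np.stack([(simulate(theta + eps * np.eye(2)[i], u) - y0) / eps
                  for i in range(2)], axis=1)
    return S.T @ S / sigma**2

t = np.arange(0.0, 10.0, 0.1)

# Step (i): weight the Fisher metric by samples from the prior over (a, b).
prior = rng.normal([1.0, 0.5], 0.1, size=(20, 2))

# Step (ii): the paper optimizes the input with a genetic algorithm;
# a grid over sinusoid frequencies stands in for it here.
candidates = [0.2, 0.5, 1.0, 2.0]

def score(omega):
    u = np.sin(omega * t)
    return np.mean([np.linalg.slogdet(fisher(th, u))[1] for th in prior])

best = max(candidates, key=score)
print("most informative input frequency:", best)
```

Steps (iii)-(vi) would then apply the chosen input, update the parameter posterior with a particle filter, and recompute the weights.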


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e8272 ◽  
Author(s):  
Mathieu Fourment ◽  
Aaron E. Darling

Recent advances in statistical machine learning techniques have led to the creation of probabilistic programming frameworks. These frameworks enable probabilistic models to be rapidly prototyped and fit to data using scalable approximation methods such as variational inference. In this work, we explore the use of the Stan language for probabilistic programming in application to phylogenetic models. We show that many commonly used phylogenetic models including the general time reversible substitution model, rate heterogeneity among sites, and a range of coalescent models can be implemented using a probabilistic programming language. The posterior probability distributions obtained via the black box variational inference engine in Stan were compared to those obtained with reference implementations of Markov chain Monte Carlo (MCMC) for phylogenetic inference. We find that black box variational inference in Stan is less accurate than MCMC methods for phylogenetic models, but requires far less compute time. Finally, we evaluate a custom implementation of mean-field variational inference on the Jukes–Cantor substitution model and show that a specialized implementation of variational inference can be two orders of magnitude faster and more accurate than a general purpose probabilistic implementation.
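
The flavor of black box variational inference used by Stan's engine can be illustrated on a toy target. The Normal(2, 0.5) "posterior", learning rate, and sample counts below are hypothetical; a real phylogenetic model would only change the log-density whose gradient is plugged in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unnormalized "posterior": log p(x) = -(x - 2)^2 / (2 * 0.5^2) + const,
# so the optimal mean-field fit is exactly Normal(2, 0.5).
logp_grad = lambda x: -(x - 2.0) / 0.25

mu, log_sigma = 0.0, 0.0             # variational parameters of q
lr, S = 0.05, 32                     # step size, Monte Carlo samples per step
for _ in range(2000):
    eps = rng.normal(size=S)
    x = mu + np.exp(log_sigma) * eps # reparameterized draws from q
    g = logp_grad(x)
    mu += lr * g.mean()              # ELBO gradient w.r.t. mu
    # ELBO gradient w.r.t. log sigma: reparameterization term + entropy term
    log_sigma += lr * ((g * eps).mean() * np.exp(log_sigma) + 1.0)

print(f"mu={mu:.2f} sigma={np.exp(log_sigma):.2f}")  # close to 2.0 and 0.5
```

Only stochastic gradients of the log-density are needed, which is what makes the method "black box".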


Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5966
Author(s):  
Ke Wang ◽  
Gong Zhang

The challenge of small data has emerged in synthetic aperture radar automatic target recognition (SAR-ATR) problems. Most SAR-ATR methods are data-driven and require large amounts of training data that are expensive to collect. To address this challenge, we propose a recognition model that incorporates meta-learning and amortized variational inference (AVI). Specifically, the model consists of global parameters and task-specific parameters. The global parameters, trained by meta-learning, construct a common feature extractor shared between all recognition tasks. The task-specific parameters, modeled by probability distributions, can adapt to new tasks with a small amount of training data. To reduce the computation and storage cost, the task-specific parameters are inferred by AVI implemented with set-to-set functions. Extensive experiments were conducted on a real SAR dataset to evaluate the effectiveness of the model. Compared with the latest SAR-ATR methods, the proposed model shows superior performance, especially on recognition tasks with limited data.


Author(s):  
David Barber

Finding clusters of well-connected nodes in a graph is a problem common to many domains, including social networks, the Internet and bioinformatics. From a computational viewpoint, finding these clusters or graph communities is a difficult problem. We use a clique matrix decomposition based on a statistical description that encourages clusters to be well connected and few in number. The formal intractability of inferring the clusters is addressed using a variational approximation inspired by mean-field theories in statistical mechanics. Clique matrices also play a natural role in parametrizing positive definite matrices under zero constraints on elements of the matrix. We show that clique matrices can parametrize all positive definite matrices restricted according to a decomposable graph and form a structured factor analysis approximation in the non-decomposable case. Extensions to conjugate Bayesian covariance priors and more general non-Gaussian independence models are briefly discussed.


2018 ◽  
Vol 25 (3) ◽  
pp. 565-587 ◽  
Author(s):  
Mohamed Jardak ◽  
Olivier Talagrand

Abstract. Data assimilation is considered as a problem in Bayesian estimation, viz. determine the probability distribution for the state of the observed system, conditioned by the available data. In the linear and additive Gaussian case, a Monte Carlo sample of the Bayesian probability distribution (which is Gaussian and known explicitly) can be obtained by a simple procedure: perturb the data according to the probability distribution of their own errors, and perform an assimilation on the perturbed data. The performance of that approach, called here ensemble variational assimilation (EnsVAR), also known as ensemble of data assimilations (EDA), is studied in this two-part paper on the non-linear low-dimensional Lorenz-96 chaotic system, with the assimilation being performed by the standard variational procedure. In this first part, EnsVAR is implemented first, for reference, in a linear and Gaussian case, and then in a weakly non-linear case (assimilation over 5 days of the system). The performances of the algorithm, considered either as a probabilistic or a deterministic estimator, are very similar in the two cases. Additional comparison shows that the performance of EnsVAR is better, both in the assimilation and forecast phases, than that of standard algorithms for the ensemble Kalman filter (EnKF) and particle filter (PF), although at a higher cost. Globally similar results are obtained with the Kuramoto–Sivashinsky (K–S) equation.
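
In the linear Gaussian case described above, each variational analysis has a closed form, so the perturb-and-assimilate recipe can be verified directly: the analysis ensemble should be a sample from the exact Bayesian posterior. The dimensions, covariances, and observation operator in this sketch are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear Gaussian toy problem (all sizes and covariances are illustrative)
n, m, N = 3, 2, 50_000               # state dim, obs dim, ensemble size
H = rng.normal(size=(m, n))          # observation operator
B = np.diag([1.0, 0.5, 2.0])         # background-error covariance
R = 0.2 * np.eye(m)                  # observation-error covariance
xb = np.zeros(n)                     # background state
y = H @ np.array([1.0, -1.0, 0.5]) + rng.multivariate_normal(np.zeros(m), R)

# Each variational analysis has the closed form xa = xb + K (y - H xb)
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)

# EnsVAR/EDA: perturb background and observations by their own errors,
# assimilate each perturbed pair, and collect the analyses.
Xb = rng.multivariate_normal(xb, B, size=N)
Y = y + rng.multivariate_normal(np.zeros(m), R, size=N)
Xa = Xb + (Y - Xb @ H.T) @ K.T

# The ensemble covariance matches the exact Bayesian posterior (I - KH) B.
A_exact = (np.eye(n) - K @ H) @ B
print(np.allclose(np.cov(Xa.T), A_exact, atol=0.05))  # True
```

The identity behind the check is (I - KH) B (I - KH)ᵀ + K R Kᵀ = (I - KH) B, which holds for the Kalman gain K.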


2000 ◽  
Author(s):  
Paulo B. Gonçalves ◽  
Zenón J. G. N. Del Prado

Abstract This paper discusses the dynamic instability of circular cylindrical shells subjected to time-dependent axial edge loads of the form P(t) = P0 + P1(t), where the dynamic component P1(t) is periodic in time and P0 is a uniform compressive load. In the present paper a low-dimensional model, which retains the essential non-linear terms, is used to study the non-linear oscillations and instabilities of the shell. For this, Donnell’s shallow shell equations are used together with the Galerkin method to derive a set of coupled non-linear ordinary differential equations of motion, which are, in turn, solved by the Runge-Kutta method. To study the non-linear behavior of the shell, several numerical strategies were used to obtain Poincaré maps, stable and unstable fixed points, bifurcation diagrams and basins of attraction. Particular attention is paid to two dynamic instability phenomena that may arise under these loading conditions: parametric instability and escape from the pre-buckling potential well. The numerical results obtained from this investigation clarify the conditions that control whether or not instability occurs. This may help in establishing proper design criteria for these shells under dynamic loads, a topic practically unexplored in the literature.


Author(s):  
Diana Mateus ◽  
Christian Wachinger ◽  
Selen Atasoy ◽  
Loren Schwarz ◽  
Nassir Navab

Computer aided diagnosis is often confronted with processing and analyzing high-dimensional data. One alternative to deal with such data is dimensionality reduction. This chapter focuses on manifold learning methods to create low-dimensional data representations adapted to a given application. From pairwise non-linear relations between neighboring data-points, manifold learning algorithms first approximate the low-dimensional manifold where the data live with a graph; then, they find a non-linear map to embed this graph into a low-dimensional space. Since the explicit pairwise relations and the neighborhood system can be designed according to the application, manifold learning methods are very flexible and allow easy incorporation of domain knowledge. The authors describe different assumptions and design elements that are crucial to building successful low-dimensional data representations with manifold learning for a variety of applications. In particular, they discuss examples for visualization, clustering, classification, registration, and human-motion modeling.
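
The two-step recipe (approximate the manifold with a graph, then embed the graph non-linearly) can be sketched with Laplacian eigenmaps as one concrete instance. The noisy-circle data, neighborhood size, and Gaussian edge weights are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data on a 1-D manifold (a noisy circle) embedded in 2-D.
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.01 * rng.normal(size=(200, 2))

# Step 1: approximate the manifold with a k-nearest-neighbour graph
# whose edges carry Gaussian weights built from pairwise distances.
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
k = 8
W = np.zeros_like(D)
for i in range(len(X)):
    for j in np.argsort(D[i])[1:k + 1]:   # skip index 0 (the point itself)
        W[i, j] = W[j, i] = np.exp(-D[i, j] ** 2)

# Step 2: embed the graph using the eigenvectors of the graph Laplacian
# with the smallest non-zero eigenvalues.
L = np.diag(W.sum(axis=1)) - W
vals, vecs = np.linalg.eigh(L)
embedding = vecs[:, 1:3]   # 2-D embedding (skip the constant eigenvector)
print(embedding.shape)     # (200, 2)
```

Swapping the weights, the neighborhood system, or the embedding criterion yields the other members of the manifold-learning family the chapter surveys.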

