Information geometry connecting Wasserstein distance and Kullback–Leibler divergence via the entropy-relaxed transportation problem

2018 ◽  
Vol 1 (1) ◽  
pp. 13-37 ◽  
Author(s):  
Shun-ichi Amari ◽  
Ryo Karakida ◽  
Masafumi Oizumi


2019 ◽  
Vol 31 (5) ◽  
pp. 827-848 ◽  
Author(s):  
Shun-ichi Amari ◽  
Ryo Karakida ◽  
Masafumi Oizumi ◽  
Marco Cuturi

We propose a new divergence on the manifold of probability distributions, building on the entropic regularization of optimal transportation problems. As Cuturi (2013) showed, regularizing the optimal transport problem with an entropic term brings several computational benefits. However, because of that regularization, the resulting approximation of the optimal transport cost does not define a proper distance or divergence between probability distributions. We recently tried to introduce a family of divergences connecting the Wasserstein distance and the Kullback-Leibler divergence from an information geometry point of view (see Amari, Karakida, & Oizumi, 2018). However, that proposal was not able to retain key intuitive aspects of the Wasserstein geometry, such as translation invariance, which plays a key role in the more general problem of computing optimal transport barycenters. The divergence we propose in this work retains such properties and admits an intuitive interpretation.
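As background on the entropic regularization mentioned above, the regularized transport plan can be computed with Sinkhorn's matrix-scaling iterations (Cuturi, 2013). The sketch below is a generic illustration of that scheme, not the divergence proposed in this paper; the function name, toy histograms, and parameter values are all hypothetical.

```python
import numpy as np

def sinkhorn_plan(p, q, C, eps=0.1, n_iter=500):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    p, q : source and target probability vectors (each sums to 1)
    C    : cost matrix, C[i, j] = cost of moving mass from bin i to bin j
    eps  : strength of the entropic regularizer
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = q / (K.T @ u)                # scale columns to match q
        u = p / (K @ v)                  # scale rows to match p
    return u[:, None] * K * v[None, :]   # plan P = diag(u) K diag(v)

# Toy example: two histograms on a 1-D grid with squared-distance cost.
x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2
p = np.exp(-((x - 0.3) ** 2) / 0.01); p /= p.sum()
q = np.exp(-((x - 0.7) ** 2) / 0.02); q /= q.sum()
P = sinkhorn_plan(p, q, C)
print("regularized transport cost:", float(np.sum(P * C)))
```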


Entropy ◽  
2020 ◽  
Vol 22 (7) ◽  
pp. 713 ◽  
Author(s):  
Frank Nielsen

We study the Voronoi diagrams of a finite set of Cauchy distributions and their dual complexes from the viewpoint of information geometry by considering the Fisher-Rao distance, the Kullback-Leibler divergence, the chi-square divergence, and a flat divergence derived from Tsallis entropy related to the conformal flattening of the Fisher-Rao geometry. We prove that the Voronoi diagrams of the Fisher-Rao distance, the chi-square divergence, and the Kullback-Leibler divergences all coincide with a hyperbolic Voronoi diagram on the corresponding Cauchy location-scale parameters, and that the dual Cauchy hyperbolic Delaunay complexes are Fisher orthogonal to the Cauchy hyperbolic Voronoi diagrams. The dual Voronoi diagrams with respect to the dual flat divergences amount to dual Bregman Voronoi diagrams, and their dual complexes are regular triangulations. The primal Bregman Voronoi diagram is the Euclidean Voronoi diagram and the dual Bregman Voronoi diagram coincides with the Cauchy hyperbolic Voronoi diagram. In addition, we prove that the square root of the Kullback-Leibler divergence between Cauchy distributions yields a metric distance which is Hilbertian for the Cauchy scale families.
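The metric result for Cauchy scale families rests on the closed form of the Kullback-Leibler divergence between two Cauchy distributions, which is symmetric in its arguments. Below is a minimal numerical check of that closed form, assuming the location-scale parameterization (l, s); the helper names and example values are illustrative.

```python
import numpy as np
from scipy.integrate import quad

def cauchy_pdf(x, l, s):
    return s / (np.pi * (s ** 2 + (x - l) ** 2))

def kl_numeric(l1, s1, l2, s2):
    # Kullback-Leibler divergence computed by numerical integration.
    f = lambda x: cauchy_pdf(x, l1, s1) * np.log(cauchy_pdf(x, l1, s1) / cauchy_pdf(x, l2, s2))
    val, _ = quad(f, -np.inf, np.inf)
    return val

def kl_closed_form(l1, s1, l2, s2):
    # Closed form; note it is symmetric in (l1, s1) and (l2, s2).
    return np.log(((s1 + s2) ** 2 + (l1 - l2) ** 2) / (4.0 * s1 * s2))

print(kl_numeric(0.0, 1.0, 2.0, 3.0))      # the two values agree
print(kl_closed_form(0.0, 1.0, 2.0, 3.0))
```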


2002 ◽  
Vol 14 (10) ◽  
pp. 2269-2316 ◽  
Author(s):  
Hiroyuki Nakahara ◽  
Shun-ichi Amari

This study introduces information-geometric measures to analyze neural firing patterns by taking not only second-order but also higher-order interactions among neurons into account. Information geometry provides useful tools and concepts for this purpose, including the orthogonality of coordinate parameters and the Pythagorean relation in the Kullback-Leibler divergence. Based on this orthogonality, we present a novel method for analyzing spike firing patterns that decomposes the interactions among neurons into contributions of different orders. As a result, purely pairwise, triple-wise, and higher-order interactions are singled out. We also demonstrate the benefits of our proposal with several examples.
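For the simplest case of two binary neurons, the orthogonal decomposition described above reduces to log-linear (theta) coordinates: the firing rates (eta coordinates) and the pairwise interaction parameter are orthogonal under the Fisher metric. A minimal sketch with a hypothetical joint distribution; the numbers are illustrative only.

```python
import numpy as np

# Hypothetical joint distribution of two binary neurons, p[x1][x2].
p = np.array([[0.35, 0.15],     # p(0,0), p(0,1)
              [0.10, 0.40]])    # p(1,0), p(1,1)

eta1 = p[1].sum()               # firing rate of neuron 1, E[x1]
eta2 = p[:, 1].sum()            # firing rate of neuron 2, E[x2]

# Log-linear model p(x1, x2) ∝ exp(th1*x1 + th2*x2 + th12*x1*x2):
# the pairwise interaction coordinate is the log odds ratio.
th12 = np.log(p[1, 1] * p[0, 0] / (p[1, 0] * p[0, 1]))

print(eta1, eta2, th12)         # th12 > 0 indicates excess coincident firing
```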


2010 ◽  
Vol 58 (1) ◽  
pp. 183-195 ◽  
Author(s):  
S. Amari ◽  
A. Cichocki

Information geometry of divergence functions

Measures of divergence between two points play a key role in many engineering problems. One such measure is a distance function, but there are many important measures which do not satisfy the properties of a distance. The Bregman divergence, Kullback-Leibler divergence and f-divergence are such measures. In the present article, we study the differential-geometrical structure of a manifold induced by a divergence function. It consists of a Riemannian metric and a pair of dually coupled affine connections, which are studied in information geometry. The class of Bregman divergences is characterized by a dually flat structure, which originates from the Legendre duality. A dually flat space admits a generalized Pythagorean theorem. The class of f-divergences, defined on a manifold of probability distributions, is characterized by information monotonicity, and the Kullback-Leibler divergence belongs to the intersection of both classes. The f-divergence always gives the α-geometry, which consists of the Fisher information metric and a dual pair of ±α-connections. The α-divergence is a special class of f-divergences. It is unique, sitting at the intersection of the f-divergence and Bregman divergence classes in a manifold of positive measures. The geometry derived from the Tsallis q-entropy and related divergences is also addressed.
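To make the relation between the two classes concrete, the sketch below (not from the article) builds a Bregman divergence from a convex generator and checks that the negative-entropy generator reproduces the Kullback-Leibler divergence on probability vectors; all names are illustrative.

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# Negative-entropy generator on positive vectors.
phi = lambda x: np.sum(x * np.log(x))
grad_phi = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.25, 0.25, 0.5])

# On probability vectors the induced Bregman divergence equals KL(x || y).
print(bregman(phi, grad_phi, x, y))
print(np.sum(x * np.log(x / y)))   # Kullback-Leibler divergence, same value
```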


2019 ◽  
Vol 51 (01) ◽  
pp. 136-167 ◽  
Author(s):  
Stephan Eckstein

We consider discrete-time Markov chains with Polish state space. The large deviations principle for empirical measures of a Markov chain can equivalently be stated in Laplace principle form, which builds on the convex dual pair of relative entropy (or Kullback–Leibler divergence) and the cumulant generating functional f ↦ ln ∫ exp(f). Following the approach by Lacker (2016) in the independent and identically distributed case, we generalize the Laplace principle to a greater class of convex dual pairs. We present in depth one application arising from this extension, which includes large deviation results and a weak law of large numbers for certain robust Markov chains, similar to Markov set chains, where we model robustness via the first Wasserstein distance. The setting and proof of the extended Laplace principle are based on the weak convergence approach to large deviations by Dupuis and Ellis (2011).
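The convex duality invoked here is the variational formula ln ∫ exp(f) dμ = sup_ν { E_ν[f] − KL(ν‖μ) }, with the supremum attained at the exponential tilt of μ by f. A minimal numerical check on a finite state space (the example is illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = rng.dirichlet(np.ones(5))    # reference distribution on 5 states
f = rng.normal(size=5)            # a bounded test function

# Cumulant generating functional ln ∫ exp(f) dmu on a finite space.
lhs = np.log(np.sum(mu * np.exp(f)))

# Dual side: sup over nu of E_nu[f] - KL(nu || mu),
# attained at the exponential tilt nu_i ∝ mu_i * exp(f_i).
nu = mu * np.exp(f)
nu /= nu.sum()
rhs = np.sum(nu * f) - np.sum(nu * np.log(nu / mu))

print(lhs, rhs)                   # the two values coincide
```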


Entropy ◽  
2018 ◽  
Vol 20 (9) ◽  
pp. 647 ◽  
Author(s):  
Stefano Gattone ◽  
Angela De Sanctis ◽  
Stéphane Puechmorel ◽  
Florence Nicol

In this paper, the problem of clustering rotationally invariant shapes is studied and a solution using information geometry tools is provided. Landmarks of a complex shape are defined as probability densities in a statistical manifold. Then, in the setting of shape clustering through a K-means algorithm, the discriminative power of two different shape distances is evaluated. The first, derived from the Fisher–Rao metric, is related to the minimization of information in the Fisher sense; the other is derived from the Wasserstein distance, which measures the minimal transportation cost. A modification of the K-means algorithm is also proposed which allows the variances to vary not only among the landmarks but also among the clusters.
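As an illustration of the two distances being compared, assume univariate Gaussian landmark densities (the paper's landmark model may differ); both distances then have closed forms. A minimal sketch, with illustrative function names:

```python
import numpy as np

def fisher_rao_gaussian(mu1, s1, mu2, s2):
    """Fisher-Rao distance between N(mu1, s1^2) and N(mu2, s2^2).

    Uses the hyperbolic (Poincare half-plane) form of the Fisher metric
    ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2.
    """
    num = (mu1 - mu2) ** 2 / 2.0 + (s1 - s2) ** 2
    return np.sqrt(2.0) * np.arccosh(1.0 + num / (2.0 * s1 * s2))

def wasserstein2_gaussian(mu1, s1, mu2, s2):
    """2-Wasserstein distance between univariate Gaussians (closed form)."""
    return np.sqrt((mu1 - mu2) ** 2 + (s1 - s2) ** 2)

# Two hypothetical landmark densities.
print(fisher_rao_gaussian(0.0, 1.0, 1.0, 2.0))
print(wasserstein2_gaussian(0.0, 1.0, 1.0, 2.0))
```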


2020 ◽  
Vol 34 (04) ◽  
pp. 4658-4666
Author(s):  
Shengxi Li ◽  
Zeyang Yu ◽  
Min Xiang ◽  
Danilo Mandic

We address the estimation problem for general finite mixture models, with a particular focus on elliptical mixture models (EMMs). Compared to the widely adopted Kullback–Leibler divergence, we show that the Wasserstein distance provides a more desirable optimisation space. We thus provide a stable solution to the EMMs that is both robust to initialisations and reaches a superior optimum by adaptively optimising along a manifold of an approximate Wasserstein distance. To this end, we first provide a unifying account of computable and identifiable EMMs, which serves as a basis for rigorously addressing the underpinning optimisation problem. Due to a probability constraint, solving this problem is extremely cumbersome and unstable, especially under the Wasserstein distance. To alleviate this issue, we introduce an efficient optimisation method on a statistical manifold defined under an approximate Wasserstein distance, which allows for explicit metrics and computable operations, thus significantly stabilising and improving the EMM estimation. We further propose an adaptive method to accelerate the convergence. Experimental results demonstrate the excellent performance of the proposed EMM solver.
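The optimisation here relies on an approximate Wasserstein distance tailored to elliptical mixtures; the paper's construction is not reproduced below. As a related building block, the exact 2-Wasserstein distance between two Gaussian components has a well-known closed form, sketched here with illustrative names:

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussian(m1, S1, m2, S2):
    """Closed-form 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    S1_half = sqrtm(S1)
    cross = np.real(sqrtm(S1_half @ S2 @ S1_half))   # discard tiny imaginary parts
    bures = np.trace(S1 + S2 - 2.0 * cross)
    return np.sqrt(np.sum((m1 - m2) ** 2) + bures)

m1, S1 = np.zeros(2), np.eye(2)
m2, S2 = np.array([1.0, 0.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
print(w2_gaussian(m1, S1, m2, S2))
```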


Author(s):  
Masaki Kobayashi

Information geometry is one of the most effective tools for investigating stochastic learning models. In it, stochastic learning models are regarded as manifolds from the viewpoint of differential geometry. Amari applied it to Boltzmann machines, which are one class of stochastic learning models. The purpose of this chapter is to apply information geometry to complex-valued Boltzmann machines. First, we construct the complex-valued Boltzmann machine. Next, we describe information geometry, reviewing its key notions: exponential families, mixture families, the Kullback-Leibler divergence, connections, geodesics, Fisher metrics, potential functions, and so on. Finally, we apply information geometry to complex-valued Boltzmann machines, investigating the structure of the complex-valued Boltzmann manifold together with its connections and Fisher metric. Moreover, we obtain an effective learning algorithm, the so-called em algorithm, for complex-valued Boltzmann machines with hidden neurons.
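For reference, the em algorithm mentioned here is usually stated as alternating e- and m-projections between the data manifold and the model manifold; a schematic statement under that standard formulation (the chapter's complex-valued specifics are omitted):

```latex
% One cycle of the information-geometric em algorithm.
% D is the data manifold (distributions consistent with the observed visible
% units) and M = \{ p_\theta \} is the model manifold.
\begin{align*}
  \text{e-step (e-projection):}\quad
    q^{(t+1)} &= \operatorname*{arg\,min}_{q \in D}
                 \mathrm{KL}\!\left(q \,\middle\|\, p_{\theta^{(t)}}\right),\\
  \text{m-step (m-projection):}\quad
    \theta^{(t+1)} &= \operatorname*{arg\,min}_{\theta}
                 \mathrm{KL}\!\left(q^{(t+1)} \,\middle\|\, p_{\theta}\right).
\end{align*}
```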


Optimization ◽  
1976 ◽  
Vol 7 (3) ◽  
pp. 395-403
Author(s):  
H.L. Bhatia ◽  
Kanti Swarup ◽  
M.C. Puri
