Information Geometry for Regularized Optimal Transport and Barycenters of Patterns

2019 ◽  
Vol 31 (5) ◽  
pp. 827-848 ◽  
Author(s):  
Shun-ichi Amari ◽  
Ryo Karakida ◽  
Masafumi Oizumi ◽  
Marco Cuturi

We propose a new divergence on the manifold of probability distributions, building on the entropic regularization of optimal transportation problems. As Cuturi (2013) showed, regularizing the optimal transport problem with an entropic term brings several computational benefits. However, because of that regularization, the resulting approximation of the optimal transport cost does not define a proper distance or divergence between probability distributions. We recently introduced a family of divergences connecting the Wasserstein distance and the Kullback-Leibler divergence from an information geometry point of view (Amari, Karakida, & Oizumi, 2018). However, that proposal did not retain key intuitive aspects of the Wasserstein geometry, such as translation invariance, which plays a key role in the more general problem of computing optimal transport barycenters. The divergence we propose in this work retains such properties and admits an intuitive interpretation.
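For reference, the entropic regularization in question (Cuturi, 2013) replaces the linear transport objective with a strictly convex one; a standard discrete form is

$$
W_\lambda(p,q) \;=\; \min_{P \in U(p,q)} \ \langle C, P\rangle \;-\; \lambda H(P),
\qquad
U(p,q) = \{\, P \ge 0 :\ P\mathbf{1} = p,\ P^{\top}\mathbf{1} = q \,\},
\qquad
H(P) = -\sum_{ij} P_{ij}\log P_{ij},
$$

where C is the ground cost matrix and λ > 0 the regularization strength. Since $W_\lambda(p,p) \neq 0$ for $\lambda > 0$, the regularized cost itself is not a divergence, which is the defect the proposed construction addresses.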

2021 ◽  
Author(s):  
Jacob Atticus Armstrong Goodall

Abstract A duality theorem is stated and proved for a minimax vector optimisation problem where the vectors are elements of the set of products of compact Polish spaces. A special case of this theorem is derived to show that two metrics on the space of probability distributions on countable products of Polish spaces are identical. The appendix includes a proof that, under the appropriate conditions, the function studied in the optimisation problem is indeed a metric. The optimisation problem is comparable to multi-commodity optimal transport where there is dependence between commodities. This paper builds on the work of R.S. MacKay, who introduced the metrics in the context of complexity science in [4] and [5]. The metrics have the advantage of measuring distance uniformly over the whole network, whereas other metrics on probability distributions (e.g., total variation or the Kullback–Leibler divergence; see [5]) fail to do so. This opens up the potential of mathematical optimisation in the setting of complexity science.


2010 ◽  
Vol 58 (1) ◽  
pp. 183-195 ◽  
Author(s):  
S. Amari ◽  
A. Cichocki

Information geometry of divergence functions

Measures of divergence between two points play a key role in many engineering problems. One such measure is a distance function, but there are many important measures which do not satisfy the properties of a distance. The Bregman divergence, Kullback-Leibler divergence and f-divergence are such measures. In the present article, we study the differential-geometrical structure of a manifold induced by a divergence function. It consists of a Riemannian metric and a pair of dually coupled affine connections, which are studied in information geometry. The class of Bregman divergences is characterized by a dually flat structure, which originates from the Legendre duality. A dually flat space admits a generalized Pythagorean theorem. The class of f-divergences, defined on a manifold of probability distributions, is characterized by information monotonicity, and the Kullback-Leibler divergence belongs to the intersection of both classes. The f-divergence always gives the α-geometry, which consists of the Fisher information metric and a dual pair of ±α-connections. The α-divergence is a special class of f-divergences. It is unique, sitting at the intersection of the f-divergence and Bregman divergence classes in a manifold of positive measures. The geometry derived from the Tsallis q-entropy and related divergences is also addressed.
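For concreteness, the two classes discussed above take the following standard forms on the probability simplex:

$$
D_\phi(p \,\|\, q) = \phi(p) - \phi(q) - \langle \nabla\phi(q),\, p - q\rangle
\qquad \text{(Bregman, $\phi$ convex)},
$$
$$
D_f(p \,\|\, q) = \sum_i q_i\, f\!\left(\frac{p_i}{q_i}\right)
\qquad \text{($f$ convex, } f(1) = 0\text{)}.
$$

Choosing $\phi(p) = \sum_i p_i \log p_i$ in the first, or $f(t) = t\log t$ in the second, yields the Kullback-Leibler divergence $\sum_i p_i \log(p_i/q_i)$, consistent with its position in the intersection of the two classes.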


Author(s):  
Nhan Dam ◽  
Quan Hoang ◽  
Trung Le ◽  
Tu Dinh Nguyen ◽  
Hung Bui ◽  
...  

We propose a new formulation for learning generative adversarial networks (GANs) using the optimal transport cost (the general form of the Wasserstein distance) as the objective criterion to measure the dissimilarity between the target distribution and the learned distribution. Our formulation is based on the general form of the Kantorovich duality, which is applicable to optimal transport with a wide range of cost functions that are not necessarily metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover, since it strategically shifts data points around. The resulting formulation is a sequential min-max-min game with three players: the generator, the critic, and the mover, where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be trained reasonably effectively via a simple alternating gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of the Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and performance comparable to the gradient penalty method.
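As background for the formulation sketched above, the Kantorovich duality for a general cost $c$ reads

$$
W_c(\mu,\nu) \;=\; \inf_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\, d\pi(x,y)
\;=\; \sup_{\phi}\ \int \phi\, d\mu + \int \phi^{c}\, d\nu,
\qquad
\phi^{c}(y) = \inf_{x}\ \bigl[c(x,y) - \phi(x)\bigr].
$$

On our reading, the innermost minimization inside the c-transform $\phi^{c}$ is the problem that the amortised optimiser (the "mover") approximates with a learned map, which is what produces the three-player min-max-min structure described in the abstract.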


Entropy ◽  
2021 ◽  
Vol 23 (3) ◽  
pp. 302
Author(s):  
Qijun Tong ◽  
Kei Kobayashi

Distances and divergences between probability measures play a central role in statistics, machine learning, and many other related fields. The Wasserstein distance has received much attention in recent years because of the ways it differs from other distances and divergences. Because computing the Wasserstein distance is costly, entropy-regularized optimal transport was proposed as a computationally efficient approximation. The purpose of this study is to understand the theoretical aspects of entropy-regularized optimal transport. In this paper, we focus on entropy-regularized optimal transport on multivariate normal distributions and q-normal distributions. We obtain the explicit form of the entropy-regularized optimal transport cost on multivariate normal and q-normal distributions; this provides a perspective from which to understand the effect of entropy regularization, which was previously known only experimentally. Furthermore, we obtain the entropy-regularized Kantorovich estimator for probability measures satisfying certain conditions. We also demonstrate experimentally how the Wasserstein distance, optimal coupling, geometric structure, and statistical efficiency are affected by entropy regularization. In particular, our results on the explicit form of the optimal coupling of the Tsallis entropy-regularized optimal transport on multivariate q-normal distributions and on the entropy-regularized Kantorovich estimator are novel and constitute a first step towards understanding a more general setting.
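For orientation, the unregularized 2-Wasserstein distance between multivariate normal distributions has the well-known closed form

$$
W_2^{2}\bigl(\mathcal N(m_1,\Sigma_1),\, \mathcal N(m_2,\Sigma_2)\bigr)
= \lVert m_1 - m_2\rVert^{2}
+ \operatorname{tr}\!\Bigl(\Sigma_1 + \Sigma_2 - 2\bigl(\Sigma_1^{1/2}\Sigma_2\,\Sigma_1^{1/2}\bigr)^{1/2}\Bigr);
$$

the paper's contribution is the analogous explicit form for the entropy-regularized cost (and its Tsallis-regularized counterpart on q-normal families), which we do not reproduce here.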


Author(s):  
Ting-Kam Leonard Wong ◽  
Jiaowen Yang

Abstract Optimal transport and information geometry both study geometric structures on spaces of probability distributions. Optimal transport characterizes the cost-minimizing movement from one distribution to another, while information geometry originates from the coordinate-invariant properties of statistical inference. Their relations and applications in statistics and machine learning have started to gain more attention. In this paper we give a new differential-geometric relation between the two fields. Namely, the pseudo-Riemannian framework of Kim and McCann, which provides a geometric perspective on the fundamental Ma–Trudinger–Wang (MTW) condition in the regularity theory of optimal transport maps, encodes the dualistic structure of a statistical manifold. This general relation is described using the framework of c-divergence, under which divergences are defined by optimal transport maps. As a by-product, we obtain a new information-geometric interpretation of the MTW tensor on the graph of the transport map. This relation sheds light on old and new aspects of information geometry. The dually flat geometry of the Bregman divergence corresponds to the quadratic cost and the pseudo-Euclidean space, and the logarithmic $L^{(\alpha)}$-divergence introduced by Pal and the first author has constant sectional curvature in a sense to be made precise. In these cases we give a geometric interpretation of the information-geometric curvature in terms of the divergence between a primal-dual pair of geodesics.
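As an illustrative special case of the correspondence stated above (and only that case), the quadratic cost $c(x,y) = \tfrac{1}{2}\lVert x - y\rVert^{2}$ recovers the familiar Bregman divergence

$$
B_\phi(x \,\|\, x') = \phi(x) - \phi(x') - \langle \nabla\phi(x'),\, x - x'\rangle,
$$

whose information geometry is dually flat; the general c-divergence framework of the paper extends this picture to non-quadratic costs, such as those behind the logarithmic $L^{(\alpha)}$-divergence.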


SLEEP ◽  
2021 ◽  
Vol 44 (Supplement_2) ◽  
pp. A111-A112
Author(s):  
Austin Vandegriffe ◽  
V A Samaranayake ◽  
Matthew Thimgan

Abstract

Introduction: Technological innovations have broadened the type and amount of activity data that can be captured in the home and under normal living conditions. Yet converting naturalistic activity patterns into sleep and wakefulness states has remained a challenge. Despite the successes of current algorithms, they do not fill all actigraphy needs. We have developed a novel statistical approach to determine sleep and wakefulness times, called the Wasserstein Algorithm for Classifying Sleep and Wakefulness (WACSAW), and validated the algorithm in a small cohort of healthy participants.

Methods: The WACSAW functional routines are: (1) conversion of the triaxial movement data into a univariate time series; (2) construction of a Wasserstein weighted sum (WSS) time series by measuring the Wasserstein distance between equidistant distributions of movement data before and after the time point of interest; (3) segmentation of the time series by identifying changepoints based on the behavior of the WSS series; (4) merging of segments deemed similar by the Levene test; (5) comparison of segments by optimal transport methodology to determine the difference from a flat, invariant distribution at zero. The resulting histogram can be used to determine sleep and wakefulness parameters around a threshold determined for each individual based on histogram properties. To validate the algorithm, participants wore the GENEActiv and a commercial-grade actigraphy watch for 48 hours. The accuracy of WACSAW was compared to a detailed activity log and benchmarked against the output of the commercial wrist actigraph.

Results: WACSAW performed with an average accuracy, sensitivity, and specificity of >95% compared to detailed activity logs in 10 healthy-sleeping individuals of mixed sexes and ages. We then compared WACSAW's performance against a common wrist-worn, commercial sleep monitor. WACSAW outperformed the commercial-grade system in each participant compared to activity logs, and the variability between subjects was cut substantially.

Conclusion: WACSAW demonstrates good performance in a small test cohort. In addition, WACSAW (1) is open source, (2) is individually adaptive, (3) indicates individual reliability, (4) is based on the activity data stream alone, and (5) requires little human intervention. WACSAW is worth validating against polysomnography and in patients with sleep disorders to determine its overall effectiveness.
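The following is a minimal sketch (not the authors' implementation) of step 2 of the pipeline, the Wasserstein weighted-sum series, using SciPy's one-dimensional Wasserstein distance on equal-width windows before and after each time point; the window length and the synthetic data are assumed for illustration only.

import numpy as np
from scipy.stats import wasserstein_distance

def wss_series(activity, window=60):
    # Wasserstein distance between the empirical distributions of movement
    # data in equal-width windows before and after each time point of a
    # univariate activity series (sketch of step 2 above).
    n = len(activity)
    wss = np.full(n, np.nan)
    for t in range(window, n - window):
        before = activity[t - window:t]
        after = activity[t:t + window]
        wss[t] = wasserstein_distance(before, after)
    return wss

# Synthetic check: a series whose activity level changes at t = 500.
rng = np.random.default_rng(0)
activity = np.concatenate([rng.exponential(1.0, 500), rng.exponential(3.0, 500)])
wss = wss_series(activity)
print(int(np.nanargmax(wss)))  # the WSS series peaks near the simulated change point

Changepoint detection, Levene-test merging, and the comparison against a flat distribution at zero (steps 3 to 5) would then operate on this series.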


Entropy ◽  
2018 ◽  
Vol 20 (11) ◽  
pp. 813 ◽  
Author(s):  
José Amigó ◽  
Sámuel Balogh ◽  
Sergio Hernández

Entropy appears in many contexts (thermodynamics, statistical mechanics, information theory, measure-preserving dynamical systems, topological dynamics, etc.) as a measure of different properties (energy that cannot produce work, disorder, uncertainty, randomness, complexity, etc.). In this review, we focus on the so-called generalized entropies, which from a mathematical point of view are nonnegative functions defined on probability distributions that satisfy the first three Shannon–Khinchin axioms: continuity, maximality, and expansibility. While these three axioms are expected to be satisfied by all macroscopic physical systems, the fourth axiom (separability or strong additivity) is in general violated by non-ergodic systems with long-range forces, which has been the main reason for exploring weaker axiomatic settings. Currently, non-additive generalized entropies are also being used to study new phenomena in complex dynamics (multifractality), quantum systems (entanglement), soft sciences, and more. Besides going through the axiomatic framework, we review the characterization of generalized entropies via two scaling exponents introduced by Hanel and Thurner. In turn, the first of these exponents is related to the diffusion scaling exponent of diffusion processes, as we also discuss. Applications are addressed as the description of the main generalized entropies advances.
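A canonical example of a non-additive generalized entropy satisfying the first three Shannon–Khinchin axioms is the Tsallis q-entropy,

$$
S_q(p) = \frac{1}{q-1}\Bigl(1 - \sum_i p_i^{\,q}\Bigr), \qquad q \neq 1,
\qquad
\lim_{q\to 1} S_q(p) = -\sum_i p_i \log p_i,
$$

which for independent subsystems satisfies $S_q(A\times B) = S_q(A) + S_q(B) + (1-q)\,S_q(A)\,S_q(B)$ and therefore violates the fourth (additivity) axiom whenever $q \neq 1$.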


Author(s):  
Pinar Demetci ◽  
Rebecca Santorella ◽  
Björn Sandstede ◽  
William Stafford Noble ◽  
Ritambhara Singh

Abstract Data integration of single-cell measurements is critical for understanding cell development and disease, but the lack of correspondence between different types of measurements makes such efforts challenging. Several unsupervised algorithms can align heterogeneous single-cell measurements in a shared space, enabling the creation of mappings between single cells in different data domains. However, these algorithms require hyperparameter tuning for high-quality alignments, which is difficult in an unsupervised setting without correspondence information for validation. We present Single-Cell alignment using Optimal Transport (SCOT), an unsupervised learning algorithm that uses Gromov–Wasserstein-based optimal transport to align single-cell multi-omics datasets. We compare the alignment performance of SCOT with state-of-the-art algorithms on four simulated and two real-world datasets. SCOT performs on par with state-of-the-art methods but is faster and requires tuning fewer hyperparameters. Furthermore, we provide an algorithm by which SCOT uses the Gromov–Wasserstein distance to guide hyperparameter selection. Thus, unlike previous methods, SCOT aligns well without using any orthogonal correspondence information to pick the hyperparameters. Our source code and scripts for replicating the results are available at https://github.com/rsinghlab/SCOT.
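Below is a minimal sketch of Gromov–Wasserstein alignment of two single-cell datasets in the spirit of SCOT; it is not the authors' implementation, and it assumes the POT library's entropic Gromov–Wasserstein solver (ot.gromov.entropic_gromov_wasserstein) plus a barycentric projection for the final alignment. The function name gw_align and the epsilon value are placeholders.

import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot); assumed available
from scipy.spatial.distance import cdist

def gw_align(X, Y, epsilon=5e-3):
    # Intra-domain distance matrices: each domain keeps its own geometry.
    Cx = cdist(X, X); Cx /= Cx.max()
    Cy = cdist(Y, Y); Cy /= Cy.max()
    p, q = ot.unif(X.shape[0]), ot.unif(Y.shape[0])  # uniform weights over cells
    # Entropic Gromov-Wasserstein coupling between the two domains.
    T = ot.gromov.entropic_gromov_wasserstein(Cx, Cy, p, q, 'square_loss', epsilon=epsilon)
    # Barycentric projection: place each cell of Y at the coupling-weighted
    # average of the X cells it is matched to.
    Y_on_X = (T / T.sum(axis=0)).T @ X
    return T, Y_on_X

The aligned coordinates Y_on_X then live in the same space as X, which is the sense in which heterogeneous measurements are brought into correspondence.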


Author(s):  
M. Vidyasagar

This chapter provides an introduction to some elementary aspects of information theory, including entropy in its various forms. Entropy refers to the level of uncertainty associated with a random variable (or more precisely, the probability distribution of the random variable). When there are two or more random variables, it is worthwhile to study the conditional entropy of one random variable with respect to another. The last concept is relative entropy, also known as the Kullback–Leibler divergence, which measures the “disparity” between two probability distributions. The chapter first considers convex and concave functions before discussing the properties of the entropy function, conditional entropy, uniqueness of the entropy function, and the Kullback–Leibler divergence.
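For reference (in our notation, not necessarily the chapter's), the discrete forms of the quantities discussed are

$$
H(X) = -\sum_x p(x)\log p(x), \qquad
H(X \mid Y) = -\sum_{x,y} p(x,y)\log p(x \mid y),
$$
$$
D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x)\log\frac{p(x)}{q(x)} \;\ge\; 0,
$$

with $D_{\mathrm{KL}}(p\,\|\,q) = 0$ exactly when $p = q$, which is the sense in which relative entropy measures the "disparity" between two probability distributions.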

