A measure of mutual divergence among a number of probability distributions

1987 ◽  
Vol 10 (3) ◽  
pp. 597-607 ◽  
Author(s):  
J. N. Kapur ◽  
Vinod Kumar ◽  
Uma Kumar

The principle of optimality of dynamic programming is used to prove three major inequalities due to Shannon, Rényi and Hölder. These inequalities are then used to obtain some useful results in information theory. In particular, measures of the mutual divergence among two or more probability distributions are obtained.
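To fix notation, the following block states the Shannon (Gibbs) inequality together with one familiar symmetric measure of mutual divergence among several distributions; this is only an illustration of the kind of measure the paper develops, not necessarily its exact construction.

```latex
% Shannon (Gibbs) inequality for distributions P = (p_1,...,p_n), Q = (q_1,...,q_n):
\sum_{i=1}^{n} p_i \log \frac{p_i}{q_i} \;\ge\; 0,
\qquad \text{with equality iff } p_i = q_i \text{ for all } i.

% One familiar symmetric measure of mutual divergence among k distributions
% P^{(1)},\dots,P^{(k)} (an illustration only):
D\bigl(P^{(1)},\dots,P^{(k)}\bigr)
  = \frac{1}{k}\sum_{j=1}^{k} \sum_{i=1}^{n} p^{(j)}_i
    \log \frac{p^{(j)}_i}{\bar p_i},
\qquad \bar p_i = \frac{1}{k}\sum_{j=1}^{k} p^{(j)}_i .
```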

Author(s):  
M. Vidyasagar

This chapter provides an introduction to some elementary aspects of information theory, including entropy in its various forms. Entropy refers to the level of uncertainty associated with a random variable (or more precisely, the probability distribution of the random variable). When there are two or more random variables, it is worthwhile to study the conditional entropy of one random variable with respect to another. The last concept is relative entropy, also known as the Kullback–Leibler divergence, which measures the “disparity” between two probability distributions. The chapter first considers convex and concave functions before discussing the properties of the entropy function, conditional entropy, uniqueness of the entropy function, and the Kullback–Leibler divergence.
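A minimal Python sketch (not from the chapter) of the three quantities discussed here, entropy, conditional entropy and the Kullback–Leibler divergence, for finite discrete distributions; the example distributions are invented.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X) in bits of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # 0 * log 0 is taken as 0
    return -np.sum(p * np.log2(p))

def conditional_entropy(joint):
    """H(Y|X) from a joint distribution p(x, y) given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)               # marginal of X
    return entropy(joint.ravel()) - entropy(p_x)   # H(Y|X) = H(X,Y) - H(X)

def kl_divergence(p, q):
    """Relative entropy D(P || Q) in bits; assumes q_i > 0 wherever p_i > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Toy examples.
print(entropy([0.5, 0.5]))                       # 1.0 bit for a fair coin
print(kl_divergence([0.9, 0.1], [0.5, 0.5]))     # > 0; zero only when P == Q
joint_xy = np.array([[0.4, 0.1],
                     [0.1, 0.4]])
print(conditional_entropy(joint_xy))             # H(Y|X) < H(Y) when X and Y are dependent
```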


2020 ◽  
pp. 464-490
Author(s):  
Miquel Feixas ◽  
Mateu Sbert

Around seventy years ago, Claude Shannon, who was working at Bell Laboratories, introduced information theory with the main purpose of dealing with the communication channel between source and receiver. The communication channel, or information channel as it later became known, establishes the shared information between the source or input and the receiver or output, both of which are represented by random variables, that is, by probability distributions over their possible states. Thanks to its generality and flexibility, the information channel concept can be applied robustly in many different areas of science and technology, and even in the social sciences. In this chapter, we present examples of its application to selecting the best viewpoints of an object, segmenting an image, and computing the global illumination of a three-dimensional virtual scene. We hope these examples will illustrate how practitioners of different disciplines can use it to organize and understand the interplay of information between the corresponding source and receiver.
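As a concrete illustration of the information-channel idea (not taken from the chapter), the sketch below computes the mutual information shared between the input and output of a channel from their joint distribution; the channel matrix is invented for the example.

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits from a joint distribution p(x, y) given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1, keepdims=True)   # input marginal
    p_y = joint.sum(axis=0, keepdims=True)   # output marginal
    mask = joint > 0
    return np.sum(joint[mask] * np.log2(joint[mask] / (p_x @ p_y)[mask]))

# Hypothetical binary channel: uniform input, 10% chance of flipping the bit.
p_input = np.array([0.5, 0.5])
channel = np.array([[0.9, 0.1],              # p(y | x = 0)
                    [0.1, 0.9]])             # p(y | x = 1)
joint = p_input[:, None] * channel           # p(x, y) = p(x) p(y | x)
print(mutual_information(joint))             # about 0.53 bits for this channel
```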


2019 ◽  
Vol 16 (157) ◽  
pp. 20190162 ◽  
Author(s):  
Roland J. Baddeley ◽  
Nigel R. Franks ◽  
Edmund R. Hunt

At a macroscopic level, part of the ant colony life cycle is simple: a colony collects resources; these resources are converted into more ants, and these ants in turn collect more resources. Because more ants collect more resources, this is a multiplicative process, and the expected logarithm of the amount of resources determines how successful the colony will be in the long run. Over 60 years ago, Kelly showed, using information-theoretic techniques, that the rate of growth of resources in such a situation is optimized by a strategy of betting in proportion to the probability of pay-off, a result widely applied in the mathematics of gambling. Thus, in the case of ants, the fraction of the colony foraging at a given location should be proportional to the probability that resources will be found there. This theoretical optimum leads to predictions as to which collective ant movement strategies might have evolved. Here, we show how colony-level optimal foraging behaviour can be achieved by mapping movement to Markov chain Monte Carlo (MCMC) methods, specifically Hamiltonian Monte Carlo (HMC). This can be done by the ants following a (noisy) local measurement of the (logarithm of) resource probability gradient (possibly supplemented with momentum, i.e. a propensity to move in the same direction). This maps the problem of foraging (via the information theory of gambling, stochastic dynamics and techniques employed within Bayesian statistics to efficiently sample from probability distributions) to simple models of ant foraging behaviour. This identification has broad applicability, facilitates the application of information theory approaches to understand movement ecology and unifies insights from existing biomechanical, cognitive, random and optimality movement paradigms. At the cost of requiring ants to obtain (noisy) resource gradient information, we show that this model is both efficient and matches a number of characteristics of real ant exploration.
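The Kelly result invoked here can be illustrated with a short simulation (a sketch under invented probabilities and pay-offs, not the authors' model): resources are repeatedly split across two foraging sites, and allocating in proportion to the true pay-off probabilities maximizes the long-run growth of the logarithm of resources.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each round, exactly one of two sites pays off.
p_true = np.array([0.7, 0.3])        # probability that site 0 / site 1 pays off

def log_growth(allocation, n_rounds=100_000):
    """Average per-round log growth when 'allocation' of resources goes to each site."""
    winners = rng.choice(2, size=n_rounds, p=p_true)
    # Each round the colony keeps only what it placed on the winning site,
    # scaled by a fixed pay-off factor of 2, so an even 50/50 split exactly breaks even.
    returns = 2.0 * allocation[winners]
    return np.mean(np.log(returns))

print(log_growth(np.array([0.7, 0.3])))   # Kelly: bet in proportion to p_true (highest growth)
print(log_growth(np.array([0.5, 0.5])))   # roughly zero growth
print(log_growth(np.array([0.9, 0.1])))   # over-betting the likely site loses growth
```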


2020 ◽  
Author(s):  
Milan Palus

The mathematical formulation of causality in measurable terms of predictability was given by the father of cybernetics N. Wiener [1] and formulated for time series by C. W. J. Granger [2]. Granger causality is based on the evaluation of predictability in bivariate autoregressive models. This concept has been generalized for nonlinear systems using methods rooted in information theory [3, 4]. The information-theoretic approach, defining causality as information transfer, has been successful in many applications and has been generalized to multivariate data and causal networks [e.g., 5]. This approach, rooted in Shannon's information theory, usually ignores two important properties of complex systems such as the Earth's climate: the systems evolve on multiple time scales, and their variables have heavy-tailed probability distributions. While the multiscale character of complex dynamics, such as air temperature variability, can be studied within the Shannonian framework [6, 7], the entropy concepts of Rényi and Tsallis have been proposed to cope with variables with heavy-tailed probability distributions. We will discuss how such non-Shannonian entropy concepts can be applied in the inference of causality in systems with heavy-tailed probability distributions and extreme events, using examples from the climate system.

This study was supported by the Czech Science Foundation, project GA19-16066S.

[1] N. Wiener, in: E. F. Beckenbach (Editor), Modern Mathematics for Engineers (McGraw-Hill, New York, 1956)
[2] C. W. J. Granger, Econometrica 37 (1969) 424
[3] K. Hlaváčková-Schindler et al., Phys. Rep. 441 (2007) 1
[4] M. Paluš, M. Vejmelka, Phys. Rev. E 75 (2007) 056211
[5] J. Runge et al., Nature Communications 6 (2015) 8502
[6] M. Paluš, Phys. Rev. Lett. 112 (2014) 078702
[7] N. Jajcay, J. Hlinka, S. Kravtsov, A. A. Tsonis, M. Paluš, Geophys. Res. Lett. 43(2) (2016) 902–909
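The Granger idea referred to in this abstract, causality as improvement in predictability from bivariate autoregressive models, can be sketched in a few lines of Python. This is only an illustration of the concept on an invented coupled system, not the (non-Shannonian, information-theoretic) methods the abstract announces; real analyses add significance testing and model-order selection.

```python
import numpy as np

def granger_improvement(x, y, lag=2):
    """How much does adding past values of x reduce the residual variance when
    predicting y from its own past?  Returns the log variance ratio (> 0 means
    x helps predict y, i.e. x 'Granger-causes' y in the linear sense)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    Y_own  = np.column_stack([y[lag - k - 1: n - k - 1] for k in range(lag)])
    X_past = np.column_stack([x[lag - k - 1: n - k - 1] for k in range(lag)])
    target = y[lag:]
    def resid_var(design):
        design = np.column_stack([np.ones(len(target)), design])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        return np.var(target - design @ coef)
    return np.log(resid_var(Y_own) / resid_var(np.column_stack([Y_own, X_past])))

# Toy coupled system (invented): x drives y with a one-step delay.
rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
y = np.zeros_like(x)
for t in range(1, len(x)):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.standard_normal()
print(granger_improvement(x, y))   # clearly positive: x Granger-causes y
print(granger_improvement(y, x))   # near zero: y does not Granger-cause x
```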


2021 ◽  
Author(s):  
Uwe Ehret

In this contribution, I will, with examples from hydrology, make the case for information theory as a general language and framework for i) characterizing systems, ii) quantifying the information content in data, iii) evaluating how well models can learn from data, and iv) measuring how well models do in prediction. In particular, I will discuss how information measures can be used to characterize systems by the state space volume they occupy, their dynamical complexity, and their distance from equilibrium. Likewise, I will discuss how we can measure the information content of data through systematic perturbations, and how much information a model absorbs (or ignores) from data during learning. This can help build hybrid models that optimally combine information in data with general knowledge from physical and other laws, which is currently among the key challenges in machine learning applied to earth science problems.

While I will try my best to convince everybody to take an information perspective henceforth, I will also name the related challenges: data demands, binning choices, estimation of probability distributions from limited data, and issues with excessive data dimensionality.
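One of the named challenges, binning and estimating probability distributions from limited data, can be made concrete with a small sketch (mine, not the author's): the plug-in entropy of a continuous variable shifts visibly with the number of bins, and it would shift again with sample size.

```python
import numpy as np

def binned_entropy(samples, n_bins):
    """Plug-in Shannon entropy (in bits) of the histogram of 'samples'."""
    counts, _ = np.histogram(samples, bins=n_bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
data = rng.normal(size=500)            # a modest invented sample

# The estimate depends on the binning choice, exactly the sensitivity named above.
for n_bins in (5, 20, 100):
    print(n_bins, round(binned_entropy(data, n_bins), 3))
```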


Author(s):  
R. Giancarlo

In this chapter we present some general algorithmic techniques that have proved useful in speeding up the computation of some families of dynamic programming recurrences with applications in sequence alignment, paragraph formation and prediction of RNA secondary structure. The material presented in this chapter is related to the computation of Levenshtein distances and approximate string matching, discussed in the previous three chapters. Dynamic programming is a general technique for solving discrete optimization (minimization or maximization) problems that can be represented by decision processes and for which the principle of optimality holds. We can view a decision process as a directed graph in which nodes represent the states of the process and edges represent decisions. The optimization problem at hand is represented as a decision process by decomposing it into a set of subproblems of smaller size. Such recursive decomposition is continued until we get only trivial subproblems, which can be solved directly. Each node in the graph corresponds to a subproblem, and each edge (a, b) indicates that one way to solve subproblem a optimally is to first solve subproblem b optimally. An optimal solution, or policy, is then typically given by a path in the graph that minimizes or maximizes some objective function. The correctness of this approach is guaranteed by the principle of optimality, which must be satisfied by the optimization problem: an optimal policy has the property that whatever the initial node (state) and initial edge (decision) are, the remaining edges (decisions) must form an optimal policy with regard to the node (state) resulting from the first transition. Another consequence of the principle of optimality is that we can express the optimal cost (and solution) of a subproblem in terms of the optimal costs (and solutions) of problems of smaller size; that is, we can express optimal costs through a recurrence relation. This is a key component of dynamic programming, since we can compute the optimal cost of a subproblem only once, store the result in a table, and look it up when needed.
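Since the chapter ties dynamic programming to Levenshtein distance, a minimal tabular example (not the chapter's sped-up algorithms) shows the pattern it describes: a recurrence expresses each subproblem's optimal cost in terms of smaller subproblems, and each cell is computed once and stored in a table.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic O(len(a) * len(b)) dynamic-programming edit distance.

    dist[i][j] holds the optimal cost of the subproblem 'align a[:i] with b[:j]';
    the principle of optimality lets each cell be filled from three smaller cells."""
    m, n = len(a), len(b)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i                      # delete all of a[:i]
    for j in range(n + 1):
        dist[0][j] = j                      # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,          # deletion
                             dist[i][j - 1] + 1,          # insertion
                             dist[i - 1][j - 1] + substitution)
    return dist[m][n]

print(levenshtein("kitten", "sitting"))      # 3
```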


2018 ◽  
Vol 51 (1) ◽  
pp. 112-130
Author(s):  
Nasir Mehmood ◽  
Saad Ihsan Butt ◽  
Ðilda Pečarić ◽  
Josip Pečarić

To procure inequalities for divergences between probability distributions, Jensen's inequality is the key to success. The Shannon, relative and Zipf–Mandelbrot entropies have many applications in the applied sciences, such as information theory, biology and economics. We consider discrete and continuous cyclic refinements of Jensen's inequality and extend them from convex functions to higher-order convex functions by means of different new Green functions, employing the Hermite interpolating polynomial whose error term is approximated by Peano's kernel. As an application of our results, we give new bounds for the Shannon, relative and Zipf–Mandelbrot entropies.
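The role Jensen's inequality plays here can be seen in its simplest information-theoretic consequence (a standard illustration, not a result of the paper): applied to the concave logarithm, it bounds the Shannon entropy of any distribution on n outcomes.

```latex
% Jensen's inequality for a concave function f and weights p_i >= 0 with sum p_i = 1:
f\!\left(\sum_{i=1}^{n} p_i x_i\right) \;\ge\; \sum_{i=1}^{n} p_i f(x_i).

% With f = \log and x_i = 1/p_i this gives the classical bound on Shannon entropy:
H(P) = \sum_{i=1}^{n} p_i \log \frac{1}{p_i}
     \;\le\; \log\!\left(\sum_{i=1}^{n} p_i \cdot \frac{1}{p_i}\right) = \log n .
```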

