A measure of mutual divergence among a number of probability distributions

1987 ◽  
Vol 10 (3) ◽  
pp. 597-607 ◽  
Author(s):  
J. N. Kapur ◽  
Vinod Kumar ◽  
Uma Kumar

The principle of optimality of dynamic programming is used to prove three major inequalities due to Shannon, Rényi and Hölder. These inequalities are then used to obtain some useful results in information theory. In particular, measures of the mutual divergence among two or more probability distributions are obtained.
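To fix notation, the following block states the Shannon (Gibbs) inequality together with one familiar symmetric measure of mutual divergence among several distributions; this is only an illustration of the kind of measure the paper develops, not necessarily its exact construction.

```latex
% Shannon (Gibbs) inequality for distributions P = (p_1,...,p_n), Q = (q_1,...,q_n):
\sum_{i=1}^{n} p_i \log \frac{p_i}{q_i} \;\ge\; 0,
\qquad \text{with equality iff } p_i = q_i \text{ for all } i.

% One familiar symmetric measure of mutual divergence among k distributions
% P^{(1)},\dots,P^{(k)} (an illustration only):
D\bigl(P^{(1)},\dots,P^{(k)}\bigr)
  = \frac{1}{k}\sum_{j=1}^{k} \sum_{i=1}^{n} p^{(j)}_i
    \log \frac{p^{(j)}_i}{\bar p_i},
\qquad \bar p_i = \frac{1}{k}\sum_{j=1}^{k} p^{(j)}_i .
```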

Author(s):  
M. Vidyasagar

This chapter provides an introduction to some elementary aspects of information theory, including entropy in its various forms. Entropy refers to the level of uncertainty associated with a random variable (or more precisely, the probability distribution of the random variable). When there are two or more random variables, it is worthwhile to study the conditional entropy of one random variable with respect to another. The last concept is relative entropy, also known as the Kullback–Leibler divergence, which measures the “disparity” between two probability distributions. The chapter first considers convex and concave functions before discussing the properties of the entropy function, conditional entropy, uniqueness of the entropy function, and the Kullback–Leibler divergence.
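A minimal Python sketch (not from the chapter) of the three quantities discussed here, entropy, conditional entropy and the Kullback–Leibler divergence, for finite discrete distributions; the example distributions are invented.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X) in bits of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # 0 * log 0 is taken as 0
    return -np.sum(p * np.log2(p))

def conditional_entropy(joint):
    """H(Y|X) from a joint distribution p(x, y) given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)               # marginal of X
    return entropy(joint.ravel()) - entropy(p_x)   # H(Y|X) = H(X,Y) - H(X)

def kl_divergence(p, q):
    """Relative entropy D(P || Q) in bits; assumes q_i > 0 wherever p_i > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Toy examples.
print(entropy([0.5, 0.5]))                       # 1.0 bit for a fair coin
print(kl_divergence([0.9, 0.1], [0.5, 0.5]))     # > 0; zero only when P == Q
joint_xy = np.array([[0.4, 0.1],
                     [0.1, 0.4]])
print(conditional_entropy(joint_xy))             # H(Y|X) < H(Y) when X and Y are dependent
```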


2020 ◽  
pp. 464-490
Author(s):  
Miquel Feixas ◽  
Mateu Sbert

Around seventy years ago, Claude Shannon, who was working at Bell Laboratories, introduced information theory with the main purpose of dealing with the communication channel between source and receiver. The communication channel, or information channel as it later became known, establishes the shared information between the source or input and the receiver or output, both of which are represented by random variables, that is, by probability distributions over their possible states. Thanks to its generality and flexibility, the information channel concept can be applied robustly in many different areas of science and technology, and even in the social sciences. In this chapter, we present examples of its application to selecting the best viewpoints of an object, segmenting an image, and computing the global illumination of a three-dimensional virtual scene. We hope these examples will illustrate how practitioners of different disciplines can use it to organize and understand the interplay of information between the corresponding source and receiver.
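As a concrete illustration of the information-channel idea (not taken from the chapter), the sketch below computes the mutual information shared between the input and output of a channel from their joint distribution; the channel matrix is invented for the example.

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits from a joint distribution p(x, y) given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1, keepdims=True)   # input marginal
    p_y = joint.sum(axis=0, keepdims=True)   # output marginal
    mask = joint > 0
    return np.sum(joint[mask] * np.log2(joint[mask] / (p_x @ p_y)[mask]))

# Hypothetical binary channel: uniform input, 10% chance of flipping the bit.
p_input = np.array([0.5, 0.5])
channel = np.array([[0.9, 0.1],              # p(y | x = 0)
                    [0.1, 0.9]])             # p(y | x = 1)
joint = p_input[:, None] * channel           # p(x, y) = p(x) p(y | x)
print(mutual_information(joint))             # about 0.53 bits for this channel
```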


2019 ◽  
Vol 16 (157) ◽  
pp. 20190162 ◽  
Author(s):  
Roland J. Baddeley ◽  
Nigel R. Franks ◽  
Edmund R. Hunt

At a macroscopic level, part of the ant colony life cycle is simple: a colony collects resources; these resources are converted into more ants, and these ants in turn collect more resources. Because more ants collect more resources, this is a multiplicative process, and the expected logarithm of the amount of resources determines how successful the colony will be in the long run. Over 60 years ago, Kelly showed, using information-theoretic techniques, that the rate of growth of resources in such a situation is optimized by a strategy of betting in proportion to the probability of pay-off, a result widely applied in the mathematics of gambling. Thus, in the case of ants, the fraction of the colony foraging at a given location should be proportional to the probability that resources will be found there. This theoretical optimum leads to predictions as to which collective ant movement strategies might have evolved. Here, we show how colony-level optimal foraging behaviour can be achieved by mapping movement to Markov chain Monte Carlo (MCMC) methods, specifically Hamiltonian Monte Carlo (HMC). This can be done by the ants following a (noisy) local measurement of the (logarithm of) resource probability gradient (possibly supplemented with momentum, i.e. a propensity to move in the same direction). This maps the problem of foraging (via the information theory of gambling, stochastic dynamics and techniques employed within Bayesian statistics to efficiently sample from probability distributions) to simple models of ant foraging behaviour. This identification has broad applicability, facilitates the application of information theory approaches to understand movement ecology and unifies insights from existing biomechanical, cognitive, random and optimality movement paradigms. At the cost of requiring ants to obtain (noisy) resource gradient information, we show that this model is both efficient and matches a number of characteristics of real ant exploration.
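The Kelly result invoked here can be illustrated with a short simulation (a sketch under invented probabilities and pay-offs, not the authors' model): resources are repeatedly split across two foraging sites, and allocating in proportion to the true pay-off probabilities maximizes the long-run growth of the logarithm of resources.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each round, exactly one of two sites pays off.
p_true = np.array([0.7, 0.3])        # probability that site 0 / site 1 pays off

def log_growth(allocation, n_rounds=100_000):
    """Average per-round log growth when 'allocation' of resources goes to each site."""
    winners = rng.choice(2, size=n_rounds, p=p_true)
    # Each round the colony keeps only what it placed on the winning site,
    # scaled by a fixed pay-off factor of 2, so an even 50/50 split exactly breaks even.
    returns = 2.0 * allocation[winners]
    return np.mean(np.log(returns))

print(log_growth(np.array([0.7, 0.3])))   # Kelly: bet in proportion to p_true (highest growth)
print(log_growth(np.array([0.5, 0.5])))   # roughly zero growth
print(log_growth(np.array([0.9, 0.1])))   # over-betting the likely site loses growth
```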


2020 ◽  
Author(s):  
Milan Palus

The mathematical formulation of causality in measurable terms of predictability was given by the father of cybernetics N. Wiener [1] and formulated for time series by C. W. J. Granger [2]. Granger causality is based on the evaluation of predictability in bivariate autoregressive models. This concept has been generalized for nonlinear systems using methods rooted in information theory [3, 4]. The information-theoretic approach, defining causality as information transfer, has been successful in many applications and has been generalized to multivariate data and causal networks [e.g., 5]. This approach, rooted in Shannon's information theory, usually ignores two important properties of complex systems such as the Earth's climate: the systems evolve on multiple time scales, and their variables have heavy-tailed probability distributions. While the multiscale character of complex dynamics, such as air temperature variability, can be studied within the Shannonian framework [6, 7], the entropy concepts of Rényi and Tsallis have been proposed to cope with variables with heavy-tailed probability distributions. We will discuss how such non-Shannonian entropy concepts can be applied in the inference of causality in systems with heavy-tailed probability distributions and extreme events, using examples from the climate system.

This study was supported by the Czech Science Foundation, project GA19-16066S.

[1] N. Wiener, in: E. F. Beckenbach (Editor), Modern Mathematics for Engineers (McGraw-Hill, New York, 1956)
[2] C. W. J. Granger, Econometrica 37 (1969) 424
[3] K. Hlaváčková-Schindler et al., Phys. Rep. 441 (2007) 1
[4] M. Paluš, M. Vejmelka, Phys. Rev. E 75 (2007) 056211
[5] J. Runge et al., Nature Communications 6 (2015) 8502
[6] M. Paluš, Phys. Rev. Lett. 112 (2014) 078702
[7] N. Jajcay, J. Hlinka, S. Kravtsov, A. A. Tsonis, M. Paluš, Geophys. Res. Lett. 43(2) (2016) 902–909
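The Granger idea referred to in this abstract, causality as improvement in predictability from bivariate autoregressive models, can be sketched in a few lines of Python. This is only an illustration of the concept on an invented coupled system, not the (non-Shannonian, information-theoretic) methods the abstract announces; real analyses add significance testing and model-order selection.

```python
import numpy as np

def granger_improvement(x, y, lag=2):
    """How much does adding past values of x reduce the residual variance when
    predicting y from its own past?  Returns the log variance ratio (> 0 means
    x helps predict y, i.e. x 'Granger-causes' y in the linear sense)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    Y_own  = np.column_stack([y[lag - k - 1: n - k - 1] for k in range(lag)])
    X_past = np.column_stack([x[lag - k - 1: n - k - 1] for k in range(lag)])
    target = y[lag:]
    def resid_var(design):
        design = np.column_stack([np.ones(len(target)), design])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        return np.var(target - design @ coef)
    return np.log(resid_var(Y_own) / resid_var(np.column_stack([Y_own, X_past])))

# Toy coupled system (invented): x drives y with a one-step delay.
rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
y = np.zeros_like(x)
for t in range(1, len(x)):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.standard_normal()
print(granger_improvement(x, y))   # clearly positive: x Granger-causes y
print(granger_improvement(y, x))   # near zero: y does not Granger-cause x
```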


2021 ◽  
Author(s):  
Uwe Ehret

In this contribution, I will, with examples from hydrology, make the case for information theory as a general language and framework for i) characterizing systems, ii) quantifying the information content in data, iii) evaluating how well models can learn from data, and iv) measuring how well models do in prediction. In particular, I will discuss how information measures can be used to characterize systems by the state space volume they occupy, their dynamical complexity, and their distance from equilibrium. Likewise, I will discuss how we can measure the information content of data through systematic perturbations, and how much information a model absorbs (or ignores) from data during learning. This can help build hybrid models that optimally combine information in data with general knowledge from physical and other laws, which is currently among the key challenges in machine learning applied to earth science problems.

While I will try my best to convince everybody to take an information perspective henceforth, I will also name the related challenges: data demands, binning choices, estimation of probability distributions from limited data, and issues with excessive data dimensionality.
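One of the named challenges, binning and estimating probability distributions from limited data, can be made concrete with a small sketch (mine, not the author's): the plug-in entropy of a continuous variable shifts visibly with the number of bins, and it would shift again with sample size.

```python
import numpy as np

def binned_entropy(samples, n_bins):
    """Plug-in Shannon entropy (in bits) of the histogram of 'samples'."""
    counts, _ = np.histogram(samples, bins=n_bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
data = rng.normal(size=500)            # a modest invented sample

# The estimate depends on the binning choice, exactly the sensitivity named above.
for n_bins in (5, 20, 100):
    print(n_bins, round(binned_entropy(data, n_bins), 3))
```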


Author(s):  
R. Giancarlo

In this chapter we present some general algorithmic techniques that have proved useful in speeding up the computation of some families of dynamic programming recurrences with applications in sequence alignment, paragraph formation and prediction of RNA secondary structure. The material presented in this chapter is related to the computation of Levenshtein distances and approximate string matching, discussed in the previous three chapters. Dynamic programming is a general technique for solving discrete optimization (minimization or maximization) problems that can be represented by decision processes and for which the principle of optimality holds. We can view a decision process as a directed graph in which nodes represent the states of the process and edges represent decisions. The optimization problem at hand is represented as a decision process by decomposing it into a set of subproblems of smaller size. Such recursive decomposition is continued until we get only trivial subproblems, which can be solved directly. Each node in the graph corresponds to a subproblem, and each edge (a, b) indicates that one way to solve subproblem a optimally is to first solve subproblem b optimally. An optimal solution, or policy, is then typically given by a path in the graph that minimizes or maximizes some objective function. The correctness of this approach is guaranteed by the principle of optimality, which must be satisfied by the optimization problem: an optimal policy has the property that whatever the initial node (state) and initial edge (decision) are, the remaining edges (decisions) must form an optimal policy with regard to the node (state) resulting from the first transition. Another consequence of the principle of optimality is that we can express the optimal cost (and solution) of a subproblem in terms of the optimal costs (and solutions) of problems of smaller size; that is, we can express optimal costs through a recurrence relation. This is a key component of dynamic programming, since we can compute the optimal cost of a subproblem only once, store the result in a table, and look it up when needed.
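Since the chapter ties dynamic programming to Levenshtein distance, a minimal tabular example (not the chapter's sped-up algorithms) shows the pattern it describes: a recurrence expresses each subproblem's optimal cost in terms of smaller subproblems, and each cell is computed once and stored in a table.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic O(len(a) * len(b)) dynamic-programming edit distance.

    dist[i][j] holds the optimal cost of the subproblem 'align a[:i] with b[:j]';
    the principle of optimality lets each cell be filled from three smaller cells."""
    m, n = len(a), len(b)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i                      # delete all of a[:i]
    for j in range(n + 1):
        dist[0][j] = j                      # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,          # deletion
                             dist[i][j - 1] + 1,          # insertion
                             dist[i - 1][j - 1] + substitution)
    return dist[m][n]

print(levenshtein("kitten", "sitting"))      # 3
```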


2018 ◽  
Vol 51 (1) ◽  
pp. 112-130
Author(s):  
Nasir Mehmood ◽  
Saad Ihsan Butt ◽  
Ðilda Pečarić ◽  
Josip Pečarić

To procure inequalities for divergences between probability distributions, Jensen's inequality is the key to success. The Shannon, relative and Zipf–Mandelbrot entropies have many applications in the applied sciences, such as information theory, biology and economics. We consider discrete and continuous cyclic refinements of Jensen's inequality and extend them from convex functions to higher-order convex functions by means of different new Green functions, employing the Hermite interpolating polynomial whose error term is approximated by Peano's kernel. As an application of our results, we give new bounds for the Shannon, relative and Zipf–Mandelbrot entropies.
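The role Jensen's inequality plays here can be seen in its simplest information-theoretic consequence (a standard illustration, not a result of the paper): applied to the concave logarithm, it bounds the Shannon entropy of any distribution on n outcomes.

```latex
% Jensen's inequality for a concave function f and weights p_i >= 0 with sum p_i = 1:
f\!\left(\sum_{i=1}^{n} p_i x_i\right) \;\ge\; \sum_{i=1}^{n} p_i f(x_i).

% With f = \log and x_i = 1/p_i this gives the classical bound on Shannon entropy:
H(P) = \sum_{i=1}^{n} p_i \log \frac{1}{p_i}
     \;\le\; \log\!\left(\sum_{i=1}^{n} p_i \cdot \frac{1}{p_i}\right) = \log n .
```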

