Computation of Kullback–Leibler Divergence in Bayesian Networks

Entropy ◽  
2021 ◽  
Vol 23 (9) ◽  
pp. 1122
Author(s):  
Serafín Moral ◽  
Andrés Cano ◽  
Manuel Gómez-Olmedo

Kullback–Leibler divergence KL(p,q) is the standard measure of error when a true probability distribution p is approximated by a probability distribution q. Its efficient computation is essential in many tasks, such as approximate inference or measuring error when learning a distribution. For high-dimensional distributions, such as those associated with Bayesian networks, direct computation can be unfeasible. This paper considers the problem of efficiently computing the Kullback–Leibler divergence between two probability distributions, each encoded by a different Bayesian network, possibly with different structures. The approach is based on an auxiliary deletion algorithm that computes the necessary marginal distributions, using a cache of operations with potentials in order to reuse past computations whenever possible. The algorithms are tested with Bayesian networks from the bnlearn repository. Computer code in Python is provided, built on pgmpy, a library for working with probabilistic graphical models.
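The paper's cache-based deletion algorithm is what makes this tractable for large networks; as a point of reference only, the following is a minimal brute-force sketch of KL(p,q) between two small pgmpy networks. It enumerates the full joint, so it merely illustrates the quantity being computed, not the authors' method; the toy networks and all numbers are invented, and a recent pgmpy version is assumed.

```python
# Brute-force KL(p, q) between two tiny Bayesian networks (assumes pgmpy;
# in pgmpy 1.x the class is named DiscreteBayesianNetwork instead).
from functools import reduce
from itertools import product as cartesian
from math import log

from pgmpy.factors.discrete import TabularCPD
from pgmpy.models import BayesianNetwork


def joint_factor(model):
    """Multiply all CPDs of a network into a single joint factor."""
    return reduce(lambda a, b: a * b, (cpd.to_factor() for cpd in model.get_cpds()))


def kl_divergence(p_model, q_model):
    """KL(p, q) = sum_x p(x) log(p(x) / q(x)) over all joint states x."""
    p_joint, q_joint = joint_factor(p_model), joint_factor(q_model)
    variables = sorted(p_joint.state_names)          # both nets share variables
    states = [p_joint.state_names[v] for v in variables]
    kl = 0.0
    for assignment in cartesian(*states):
        kwargs = dict(zip(variables, assignment))
        p = p_joint.get_value(**kwargs)
        q = q_joint.get_value(**kwargs)
        if p > 0:                                    # 0 log 0 taken as 0
            kl += p * log(p / q)
    return kl


# Two toy networks over the same variables but with different structures.
p_net = BayesianNetwork([("A", "B")])
p_net.add_cpds(
    TabularCPD("A", 2, [[0.6], [0.4]]),
    TabularCPD("B", 2, [[0.7, 0.2], [0.3, 0.8]], evidence=["A"], evidence_card=[2]),
)
q_net = BayesianNetwork([("B", "A")])
q_net.add_cpds(
    TabularCPD("B", 2, [[0.5], [0.5]]),
    TabularCPD("A", 2, [[0.5, 0.5], [0.5, 0.5]], evidence=["B"], evidence_card=[2]),
)
assert p_net.check_model() and q_net.check_model()
print(kl_divergence(p_net, q_net))
```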

Author(s):  
M. Vidyasagar

This chapter provides an introduction to some elementary aspects of information theory, including entropy in its various forms. Entropy refers to the level of uncertainty associated with a random variable (or more precisely, the probability distribution of the random variable). When there are two or more random variables, it is worthwhile to study the conditional entropy of one random variable with respect to another. The last concept is relative entropy, also known as the Kullback–Leibler divergence, which measures the “disparity” between two probability distributions. The chapter first considers convex and concave functions before discussing the properties of the entropy function, conditional entropy, uniqueness of the entropy function, and the Kullback–Leibler divergence.
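To make these definitions concrete, here is a small illustrative computation (not from the chapter) of entropy, conditional entropy, and relative entropy for binary distributions, using the standard formulas H(X) = -Σ p log p, H(Y|X) = Σ_x p(x) H(Y|X=x), and KL(p,q) = Σ p log(p/q):

```python
from math import log2


def entropy(dist):
    """H(X) = -sum_x p(x) log2 p(x), with 0 log 0 taken as 0."""
    return -sum(p * log2(p) for p in dist if p > 0)


def kl(p, q):
    """KL(p, q) = sum_x p(x) log2(p(x) / q(x))."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# A joint distribution over (X, Y) given as p(x) and p(y | x).
p_x = [0.5, 0.5]
p_y_given_x = [[0.9, 0.1], [0.3, 0.7]]

h_y_given_x = sum(px * entropy(row) for px, row in zip(p_x, p_y_given_x))
print(entropy(p_x))                # H(X) = 1 bit
print(h_y_given_x)                 # 0.5*H(0.9,0.1) + 0.5*H(0.3,0.7)
print(kl([0.9, 0.1], [0.5, 0.5]))  # "disparity" between two Bernoullis
```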


2011 ◽  
Vol 2011 ◽  
pp. 1-13 ◽  
Author(s):  
Linda Smail

Bayesian networks are graphical probabilistic models through which we can acquire, capitalize on, and exploit knowledge. Over the last decade they have become an important tool for research and applications in artificial intelligence and many other fields. This paper presents Bayesian networks and discusses the inference problem in such models. It states the problem and proposes a method to compute probability distributions, and it uses D-separation to simplify the computation of probabilities in Bayesian networks. Given a Bayesian network over a family of random variables, the paper presents a result on the computation of the probability distribution of a subset of these variables, using a computation algorithm and D-separation properties separately. It also shows the uniqueness of the obtained result.
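As an illustrative sketch (assuming the pgmpy library; the toy network and numbers are invented), the distribution of a subset of variables can be computed by variable elimination, and d-separation tells us which evidence is irrelevant to a query:

```python
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination
from pgmpy.models import BayesianNetwork

# Toy chain A -> B -> C: given B, the variables A and C are d-separated.
model = BayesianNetwork([("A", "B"), ("B", "C")])
model.add_cpds(
    TabularCPD("A", 2, [[0.7], [0.3]]),
    TabularCPD("B", 2, [[0.8, 0.4], [0.2, 0.6]], evidence=["A"], evidence_card=[2]),
    TabularCPD("C", 2, [[0.9, 0.5], [0.1, 0.5]], evidence=["B"], evidence_card=[2]),
)
assert model.check_model()

infer = VariableElimination(model)
# Marginal of the subset {C}; d-separation implies A is irrelevant once B is known,
# so the answer would be unchanged by additionally conditioning on A.
print(infer.query(variables=["C"], evidence={"B": 0}))
```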


Author(s):  
Marco F. Ramoni ◽  
Paola Sebastiani

Born at the intersection of artificial intelligence, statistics, and probability, Bayesian networks (Pearl, 1988) are a representation formalism at the cutting edge of knowledge discovery and data mining (Heckerman, 1997). Bayesian networks belong to a more general class of models called probabilistic graphical models (Whittaker, 1990; Lauritzen, 1996) that arise from the combination of graph theory and probability theory, and their success rests on their ability to handle complex probabilistic models by decomposing them into smaller, amenable components. A probabilistic graphical model is defined by a graph, where nodes represent stochastic variables and arcs represent dependencies among such variables. These arcs are annotated with probability distributions shaping the interaction between the linked variables. A probabilistic graphical model is called a Bayesian network when the graph connecting its variables is a directed acyclic graph (DAG). This graph represents conditional independence assumptions that are used to factorize the joint probability distribution of the network variables, thus making the process of learning from a large database amenable to computation. A Bayesian network induced from data can be used to investigate distant relationships between variables, as well as to make predictions and explanations, by computing the conditional probability distribution of one variable given the values of some others.
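For instance, for the DAG A → B → C the factorization reads p(A, B, C) = p(A) p(B|A) p(C|B). A minimal sketch (all numbers invented) of computing the joint and a conditional from the local tables:

```python
from itertools import product

# Local tables for the DAG A -> B -> C (binary variables, invented numbers).
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.4, (1, 1): 0.6}  # keyed (b, a)
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # keyed (c, b)


def joint(a, b, c):
    """p(a, b, c) = p(a) p(b|a) p(c|b): the DAG factorization."""
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]


# Conditional p(C=1 | A=0): sum out B, then normalize over C.
num = sum(joint(0, b, 1) for b in (0, 1))
den = sum(joint(0, b, c) for b, c in product((0, 1), repeat=2))
print(num / den)
```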


Author(s):  
Baisravan HomChaudhuri

Abstract This paper focuses on distributionally robust controller design for avoiding dynamic and stochastic obstacles whose exact probability distribution is unknown. The true probability distribution of the disturbance associated with an obstacle, although unknown, is assumed to belong to an ambiguity set that includes all the probability distributions sharing the same first two moments. The controller thus focuses on ensuring the satisfaction of the probabilistic collision avoidance constraints for all probability distributions in the ambiguity set, making the solution robust to the true probability distribution of the stochastic obstacles. Techniques from robust optimization are used to model the distributionally robust probabilistic (chance) constraints as a semi-definite programming (SDP) problem with linear matrix inequality (LMI) constraints that can be solved in a computationally tractable fashion. Simulation results for a robot obstacle avoidance problem show the efficacy of the method.
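As a hedged illustration of the idea (not the paper's SDP/LMI formulation), a single moment-based distributionally robust chance constraint on a halfspace reduces, via the one-sided Chebyshev bound, to a deterministic safety margin: requiring aᵀx ≥ aᵀμ + κ‖Σ^{1/2}a‖ with κ = √((1-ε)/ε) guarantees P(aᵀw > aᵀx) ≤ ε for every distribution of the obstacle position w with mean μ and covariance Σ. A minimal cvxpy sketch with invented numbers:

```python
import cvxpy as cp
import numpy as np

# Invented obstacle moments and avoidance direction.
mu = np.array([2.0, 0.0])                 # mean obstacle position
Sigma = np.array([[0.3, 0.0], [0.0, 0.3]])  # obstacle position covariance
a = np.array([1.0, 0.0])                  # halfspace normal (keep-away direction)
eps = 0.05                                # allowed collision probability
kappa = np.sqrt((1 - eps) / eps)          # one-sided Chebyshev coefficient

# Deterministic margin equivalent to the worst-case chance constraint.
margin = a @ mu + kappa * np.linalg.norm(np.linalg.cholesky(Sigma) @ a)

x = cp.Variable(2)                        # robot position (decision variable)
goal = np.array([4.0, 0.0])
prob = cp.Problem(cp.Minimize(cp.sum_squares(x - goal)), [a @ x >= margin])
prob.solve()
print(x.value)  # closest point to the goal that is robustly safe
```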


Author(s):  
Tahrima Rahman ◽  
Shasha Jin ◽  
Vibhav Gogate

Recently there has been growing interest in learning, from data, probabilistic models that admit poly-time inference, called tractable probabilistic models. Although they generalize poorly compared to intractable models, they often yield more accurate estimates at prediction time. In this paper, we seek to further explore this trade-off between generalization performance and inference accuracy by proposing a novel, partially tractable representation called cutset Bayesian networks (CBNs). The main idea in CBNs is to partition the variables into two subsets X and Y, learn an (intractable) Bayesian network that represents P(X) and a tractable conditional model that represents P(Y|X). The hope is that the intractable model will help improve generalization while the tractable model, by leveraging Rao-Blackwellised sampling which combines exact inference and sampling, will help improve the prediction accuracy. To compactly model P(Y|X), we introduce a novel tractable representation called conditional cutset networks (CCNs) in which all conditional probability distributions are represented using calibrated classifiers, that is, classifiers which typically yield higher-quality probability estimates than conventional classifiers. We show via a rigorous experimental evaluation that CBNs and CCNs yield more accurate posterior estimates than their tractable as well as intractable counterparts.
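A hedged sketch of the Rao-Blackwellised idea (toy model with invented numbers, not the authors' CCN implementation): sample the cutset X from its model and, for each sample, evaluate the tractable conditional exactly, so the estimate of P(y) averages exact conditionals rather than sampled indicator variables, which reduces variance:

```python
import random

random.seed(0)

# Toy cutset model: X ~ Bernoulli(0.3) stands in for the intractable P(X);
# p_y_given_x stands in for the tractable conditional model P(Y=1 | X).
def sample_x():
    return 1 if random.random() < 0.3 else 0

p_y_given_x = {0: 0.2, 1: 0.9}

# Rao-Blackwellised estimate of P(Y=1): average the *exact* conditional
# P(Y=1 | x) over samples of x, instead of also sampling Y.
n = 10_000
estimate = sum(p_y_given_x[sample_x()] for _ in range(n)) / n
print(estimate)  # close to 0.7*0.2 + 0.3*0.9 = 0.41
```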




2011 ◽  
Vol 09 (supp01) ◽  
pp. 39-47
Author(s):  
ALESSIA ALLEVI ◽  
MARIA BONDANI ◽  
ALESSANDRA ANDREONI

We present the experimental reconstruction of the Wigner function of some optical states. The method is based on direct intensity measurements by non-ideal photodetectors operated in the linear regime. The signal state is mixed at a beam-splitter with a set of coherent probes of known complex amplitudes, and the probability distribution of the detected photons is measured. The Wigner function is given by a suitable sum of these probability distributions measured for different values of the probe. For comparison, the same data are analyzed to obtain the photon-number distributions and the corresponding Wigner functions.
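As a rough illustration (the ideal-detector special case with an invented distribution; the paper's reconstruction additionally corrects for non-unit detector efficiency), the Wigner function at a phase-space point probed by a coherent displacement α is an alternating-sign sum over the measured photon-number distribution, W(α) = (2/π) Σₙ (-1)ⁿ p(n; α):

```python
from math import exp, factorial, pi


def poisson_pn(mean, n):
    """Photon-number distribution of a coherent state (Poissonian)."""
    return exp(-mean) * mean**n / factorial(n)


def wigner_from_counts(pn):
    """W = (2/pi) * sum_n (-1)^n p(n): the photon-number parity formula."""
    return (2.0 / pi) * sum((-1) ** n * p for n, p in enumerate(pn))


# Sanity check: for a coherent state with mean photon number |alpha|^2 after
# displacement, the sum reproduces the Gaussian W = (2/pi) exp(-2|alpha|^2).
for mean in (0.0, 0.5, 1.0):
    pn = [poisson_pn(mean, n) for n in range(60)]
    print(mean, wigner_from_counts(pn), (2.0 / pi) * exp(-2.0 * mean))
```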


2021 ◽  
Vol 5 (1) ◽  
pp. 1-11
Author(s):  
Vitthal Anwat ◽  
Pramodkumar Hire ◽  
Uttam Pawar ◽  
Rajendra Gunjal

The Flood Frequency Analysis (FFA) method was introduced by Fuller in 1914 to understand the magnitude and frequency of floods. The present study is carried out using the two most widely accepted probability distributions for FFA, namely the Gumbel Extreme Value type I (GEVI) and the Log Pearson type III (LP-III). The Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests were used to select the most suitable probability distribution at sites in the Damanganga Basin. Moreover, discharges were estimated for various return periods using GEVI and LP-III. The recurrence interval of the largest peak flood on record (Qmax) is 107 years (at Nanipalsan) and 146 years (at Ozarkhed) as per LP-III. The Flood Frequency Curves (FFC) indicate that LP-III is the best-fitted probability distribution for FFA of the Damanganga Basin. Therefore, discharges and return periods estimated by the LP-III probability distribution are more reliable and can be used for designing hydraulic structures.
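A minimal sketch of this workflow (invented discharge values and scipy's parameterizations, not the study's data or code): fit Gumbel to the annual peaks and Pearson III to their logarithms, test the fit, and read off return-period discharges from the fitted quantile functions.

```python
import numpy as np
from scipy import stats

# Hypothetical annual peak discharges (m^3/s); a real series comes from gauges.
peaks = np.array([850., 1200., 640., 2300., 980., 1500., 770., 3100., 1100., 1900.])

# Gumbel (EV-I) fit and Kolmogorov-Smirnov goodness of fit
# (the Anderson-Darling test is similarly available via stats.anderson).
loc, scale = stats.gumbel_r.fit(peaks)
print(stats.kstest(peaks, "gumbel_r", args=(loc, scale)))

# Log-Pearson III: fit Pearson III to log10 of the peaks.
logs = np.log10(peaks)
skew, lloc, lscale = stats.pearson3.fit(logs)

# Discharge for return period T corresponds to non-exceedance prob. 1 - 1/T.
for T in (10, 50, 100):
    p = 1 - 1 / T
    q_gumbel = stats.gumbel_r.ppf(p, loc, scale)
    q_lp3 = 10 ** stats.pearson3.ppf(p, skew, lloc, lscale)
    print(T, round(q_gumbel), round(q_lp3))
```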


Author(s):  
J. L. Cagney ◽  
S. S. Rao

Abstract The modeling of manufacturing errors in mechanisms is a significant task in validating practical designs. The use of probability distributions for errors can simulate manufacturing variations and real-world operations. This paper presents the mechanical error analysis of universal joint drivelines. Each error is simulated using a probability distribution; that is, a design of the mechanism is created by assigning random values to the errors. Each design is then evaluated by comparing the output error with a limiting value, and the reliability of the universal joint is estimated. For this, the design is considered a failure whenever the output error exceeds the specified limit. In addition, the problem of synthesis, which involves the allocation of tolerances (errors) for minimum manufacturing cost without violating a specified accuracy requirement of the output, is also considered. Three probability distributions (normal, Weibull, and beta) were used to simulate the random values of the errors. The similarity of the results given by the three distributions suggests that the use of the normal distribution would be acceptable for modeling the tolerances in most cases.
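A minimal Monte Carlo sketch of this kind of reliability estimate (toy output function and invented tolerances, not the paper's driveline model): sample the errors from each distribution, propagate them to an output error, and count the fraction of designs within the limit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Invented tolerances on three component errors (mm), one array per distribution.
normal_e = rng.normal(0.0, 0.05, size=(n, 3))
weibull_e = 0.05 * rng.weibull(2.0, size=(n, 3)) - 0.044  # shifted near zero mean
beta_e = 0.2 * (rng.beta(2.0, 2.0, size=(n, 3)) - 0.5)    # symmetric on [-0.1, 0.1]

LIMIT = 0.1  # specified accuracy requirement on the output error (mm)


def reliability(errors):
    """Fraction of sampled designs whose output error stays within LIMIT.

    The output error here is a toy linear combination of the individual errors;
    a real analysis would propagate them through the mechanism's kinematics.
    """
    output_error = errors @ np.array([1.0, 0.8, 1.2])
    return np.mean(np.abs(output_error) <= LIMIT)


for name, e in [("normal", normal_e), ("weibull", weibull_e), ("beta", beta_e)]:
    print(name, reliability(e))
```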

