Efficient estimation and model selection in large graphical models

1996 ◽  
Vol 6 (4) ◽  
pp. 313-323 ◽  
Author(s):  
Dag Wedelin
2019 ◽  
Author(s):  
Donald Ray Williams ◽  
Philippe Rast ◽  
Luis Pericchi ◽  
Joris Mulder

Gaussian graphical models (GGMs) are commonly used to characterize conditional independence structures (i.e., networks) of psychological constructs. Recently, attention has shifted from estimating single networks to comparing networks from various sub-populations. The focus is primarily on detecting differences or demonstrating replicability. We introduce two novel Bayesian methods for comparing networks that explicitly address these aims. The first is based on the posterior predictive distribution, with Kullback-Leibler divergence as the discrepancy measure, and tests for differences between two multivariate normal distributions. The second uses Bayesian model selection, with the Bayes factor, and allows for gaining evidence for invariant network structures. This overcomes a limitation of current approaches in the literature that use classical hypothesis testing, where it is only possible to determine whether groups are significantly different from each other. With simulation, we show that the posterior predictive method is approximately calibrated under the null hypothesis ($\alpha = 0.05$) and has more power to detect differences than alternative approaches. We then examine the sample sizes necessary for detecting invariant network structures with Bayesian hypothesis testing, and how this is influenced by the choice of prior distribution. The methods are applied to post-traumatic stress disorder symptoms that were measured in four groups. We end by summarizing our major contribution, namely two novel methods for comparing GGMs, whose applicability extends beyond the social-behavioral sciences. The methods have been implemented in the R package BGGM.
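The discrepancy measure named in this abstract, the Kullback-Leibler divergence between two multivariate normal distributions, has a standard closed form. The following is a minimal sketch of that formula only, not the posterior predictive procedure or the BGGM implementation; the function name `kl_mvn` and the example inputs are hypothetical.

```python
import numpy as np

def kl_mvn(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) ), in nats.

    Illustrative helper (hypothetical name); assumes both covariance
    matrices are symmetric positive definite.
    """
    mu0 = np.asarray(mu0, dtype=float)
    mu1 = np.asarray(mu1, dtype=float)
    cov0 = np.asarray(cov0, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    k = mu0.shape[0]
    diff = mu1 - mu0
    # tr(cov1^{-1} cov0), computed with a linear solve instead of an explicit inverse
    trace_term = np.trace(np.linalg.solve(cov1, cov0))
    # Mahalanobis-type term (mu1 - mu0)^T cov1^{-1} (mu1 - mu0)
    maha_term = diff @ np.linalg.solve(cov1, diff)
    # log-determinants via slogdet for numerical stability
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (trace_term + maha_term - k + logdet1 - logdet0)

# Two hypothetical groups with equal means and different correlation.
cov_a = np.array([[1.0, 0.3], [0.3, 1.0]])
cov_b = np.array([[1.0, 0.6], [0.6, 1.0]])
print(kl_mvn(np.zeros(2), cov_a, np.zeros(2), cov_b))
```

The divergence is asymmetric, so swapping the two groups generally gives a different value; a symmetric comparison would need both directions.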


2008 ◽  
Vol 38 (1) ◽  
pp. 131-143
Author(s):  
Kohtaro Hitomi ◽  
Qing-Feng Liu ◽  
Yoshihiko Nishiyama ◽  
Naoya Sueishi

2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Fredrik Ronquist ◽  
Jan Kudlicka ◽  
Viktor Senderov ◽  
Johannes Borgström ◽  
Nicolas Lartillot ◽  
...  

Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here, we show that universal probabilistic programming languages (PPLs) solve the expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient estimation of Bayes factors that are used in model testing. We take advantage of this in automatically generating SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.


Author(s):  
Navid Tafaghodi Khajavi ◽  
Anthony Kuh

Graphical models are increasingly used in complex engineering problems to model the dynamics between states of the graph. These graphs are often very large, and approximation models are needed to reduce the computational complexity. This paper considers the problem of quantifying the quality of an approximation model for a graphical model (the model selection problem). Model selection often uses a distance measure such as the Kullback–Leibler (KL) divergence between the original distribution and the model distribution to quantify the quality of the approximation. We extend this body of research by formulating the model approximation as a detection problem between the original distribution and the model distribution. We focus on Gaussian random vectors, introduce the Correlation Approximation Matrix (CAM), and use the Area Under the Curve (AUC) for the formulated detection problem. Closeness measures such as the KL divergence, the log-likelihood ratio, and the AUC are functions of the eigenvalues of the CAM. Easily computable upper and lower bounds are found for the AUC. The paper concludes by computing these measures for real and synthetic simulation data. Tree approximations and more complex graphical models are considered as approximation models.
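For zero-mean Gaussian vectors, the KL divergence between an original covariance and its approximation depends only on the eigenvalues of the product of the inverse approximate covariance with the original covariance. The sketch below illustrates that standard identity; it assumes this product plays the role of the CAM, whose exact definition is not given in the abstract, and the function name `kl_from_eigenvalues` is hypothetical.

```python
import numpy as np

def kl_from_eigenvalues(cov, cov_approx):
    """KL( N(0, cov) || N(0, cov_approx) ) from eigenvalues, in nats.

    Assumption: the matrix of interest is cov_approx^{-1} @ cov; the
    paper's own CAM definition may differ. Both inputs are assumed to be
    symmetric positive definite, so the eigenvalues are real and positive.
    """
    cov = np.asarray(cov, dtype=float)
    cov_approx = np.asarray(cov_approx, dtype=float)
    lam = np.linalg.eigvals(np.linalg.solve(cov_approx, cov)).real
    # KL = 0.5 * sum_i (lambda_i - 1 - ln lambda_i)
    return 0.5 * np.sum(lam - 1.0 - np.log(lam))

# Hypothetical example: approximate a correlated pair by an independence
# model that keeps the variances and drops the correlation.
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
cov_indep = np.diag(np.diag(cov))
print(kl_from_eigenvalues(cov, cov_indep))
```

In this two-variable case the result equals -0.5 * ln(1 - 0.5**2), the mutual information between the two variables, which gives a quick check on the eigenvalue form.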

