Bayesian Hypothesis Testing for Gaussian Graphical Models: Conditional Independence and Order Constraints

Author(s):  
Donald Ray Williams ◽  
Joris Mulder

Gaussian graphical models (GGM; partial correlation networks) have become increasingly popular in the social and behavioral sciences for studying conditional (in)dependencies between variables. In this work, we introduce exploratory and confirmatory Bayesian tests for partial correlations. For the former, we first extend the customary GGM formulation that focuses on conditional dependence to also consider the null hypothesis of conditional independence for each partial correlation. Here a novel testing strategy is introduced that can provide evidence for a null, negative, or positive effect. We then introduce a test for hypotheses with order constraints on partial correlations. This allows for testing theoretical and clinical expectations in GGMs. We describe the novel matrix-$F$ prior distribution, which provides increased flexibility in specification compared to the Wishart prior. The methods are applied to PTSD symptoms. In several applications, we demonstrate how the exploratory and confirmatory approaches can work in tandem: hypotheses are formulated from an initial analysis and then tested in an independent dataset. The methodology is implemented in the R package BGGM.

2020 ◽  
Author(s):  
Victor Bernal ◽  
Rainer Bischoff ◽  
Peter Horvatovich ◽  
Victor Guryev ◽  
Marco Grzegorczyk

Abstract Background: In systems biology, it is important to reconstruct regulatory networks from quantitative molecular profiles. Gaussian graphical models (GGMs) are one of the most popular methods to this end. A GGM consists of nodes (representing the transcripts, metabolites or proteins) inter-connected by edges (reflecting their partial correlations). Learning the edges from quantitative molecular profiles is statistically challenging, as there are usually fewer samples than nodes (‘high dimensional problem’). Shrinkage methods address this issue by learning a regularized GGM. However, it is an open question how the shrinkage affects the final result and its interpretation. Results: We show that the shrinkage biases the partial correlation in a non-linear way. This bias does not only change the magnitudes of the partial correlations but also affects their order. Furthermore, it makes networks obtained from different experiments incomparable and hinders their biological interpretation. We propose a method, referred to as the ‘un-shrunk’ partial correlation, which corrects for this non-linear bias. Unlike traditional methods, which use a fixed shrinkage value, the new approach provides partial correlations that are closer to the actual (population) values and that are easier to interpret. We apply the ‘un-shrunk’ method to two gene expression datasets from Escherichia coli and Mus musculus. Conclusions: GGMs are popular undirected graphical models based on partial correlations. The application of GGMs to reconstruct regulatory networks is commonly performed using shrinkage to overcome the “high-dimensional” problem. Despite its advantages, we found that the shrinkage introduces a non-linear bias in the partial correlations. Ignoring these shrinkage-induced effects can obscure the interpretation of the network and impede the validation of earlier reported results.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Victor Bernal ◽  
Rainer Bischoff ◽  
Peter Horvatovich ◽  
Victor Guryev ◽  
Marco Grzegorczyk

Abstract Background: In systems biology, it is important to reconstruct regulatory networks from quantitative molecular profiles. Gaussian graphical models (GGMs) are one of the most popular methods to this end. A GGM consists of nodes (representing the transcripts, metabolites or proteins) inter-connected by edges (reflecting their partial correlations). Learning the edges from quantitative molecular profiles is statistically challenging, as there are usually fewer samples than nodes (‘high dimensional problem’). Shrinkage methods address this issue by learning a regularized GGM. However, how the shrinkage affects the final result and its interpretation remains an open question. Results: We show that the shrinkage biases the partial correlation in a non-linear way. This bias does not only change the magnitudes of the partial correlations but also affects their order. Furthermore, it makes networks obtained from different experiments incomparable and hinders their biological interpretation. We propose a method, referred to as ‘un-shrinking’ the partial correlation, which corrects for this non-linear bias. Unlike traditional methods, which use a fixed shrinkage value, the new approach provides partial correlations that are closer to the actual (population) values and that are easier to interpret. This is demonstrated on two gene expression datasets from Escherichia coli and Mus musculus. Conclusions: GGMs are popular undirected graphical models based on partial correlations. The application of GGMs to reconstruct regulatory networks is commonly performed using shrinkage to overcome the ‘high-dimensional problem’. Despite its advantages, we found that the shrinkage introduces a non-linear bias in the partial correlations. Ignoring these shrinkage-induced effects can obscure the interpretation of the network and impede the validation of earlier reported results.
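The non-linear effect of shrinkage on partial correlations can be illustrated with a small numerical sketch. It assumes the shrunk correlation matrix has the form (1 - λ)R + λI, that is, shrinkage toward the identity; the abstract does not state the estimator, and the correlation values and the names `pcor_from_corr` and `shrunk_pcor` are hypothetical. The sketch only shows the bias and does not implement the authors' 'un-shrinking' correction.

```python
import numpy as np

def pcor_from_corr(corr):
    """Partial correlations from a correlation matrix via its inverse."""
    precision = np.linalg.inv(corr)
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)  # rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def shrunk_pcor(corr, lam):
    """Partial correlations after shrinking corr toward the identity (assumed form)."""
    p = corr.shape[0]
    shrunk = (1.0 - lam) * corr + lam * np.eye(p)
    return pcor_from_corr(shrunk)

# Hypothetical 3-variable correlation matrix, for illustration only.
corr = np.array([[1.0, 0.6, 0.3],
                 [0.6, 1.0, 0.5],
                 [0.3, 0.5, 1.0]])

# Partial correlation between variables 0 and 1 at several shrinkage values.
edge_by_lambda = {lam: shrunk_pcor(corr, lam)[0, 1] for lam in (0.0, 0.3, 0.6, 1.0)}
```

Under this assumed form, the shrunk partial correlation is not simply (1 - λ) times the unshrunk one, which is one way to see why networks estimated with different λ are hard to compare directly.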


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Ginette Lafit ◽  
Francis Tuerlinckx ◽  
Inez Myin-Germeys ◽  
Eva Ceulemans

Abstract Gaussian Graphical Models (GGMs) are extensively used in many research areas, such as genomics, proteomics, neuroimaging, and psychology, to study the partial correlation structure of a set of variables. This structure is visualized by drawing an undirected network, in which the variables constitute the nodes and the partial correlations the edges. In many applications, it makes sense to impose sparsity (i.e., some of the partial correlations are forced to zero) as sparsity is theoretically meaningful and/or because it improves the predictive accuracy of the fitted model. However, as we will show by means of extensive simulations, state-of-the-art estimation approaches for imposing sparsity on GGMs, such as the Graphical lasso, ℓ1 regularized nodewise regression, and joint sparse regression, fall short because they often yield too many false positives (i.e., partial correlations that are not properly set to zero). In this paper we present a new estimation approach that allows better control of the false positive rate. Our approach consists of two steps: First, we estimate an undirected network using one of the three state-of-the-art estimation approaches. Second, we try to detect the false positives by flagging the partial correlations that are smaller in absolute value than a given threshold, which is determined through cross-validation; the flagged correlations are set to zero. Applying this new approach to the same simulated data shows that it indeed performs better. We also illustrate our approach by using it to estimate (1) a gene regulatory network for breast cancer data, (2) a symptom network of patients with a diagnosis within the nonaffective psychotic spectrum and (3) a symptom network of patients with PTSD.
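The second step described above, setting small partial correlations to zero, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `threshold_network`, the example matrix, and the fixed threshold `tau` are assumptions, and the cross-validation that selects the threshold in the paper is omitted.

```python
import numpy as np

def threshold_network(pcor, tau):
    """Set off-diagonal partial correlations with |value| < tau to zero."""
    pcor = np.asarray(pcor, dtype=float)
    out = pcor.copy()
    off_diag = ~np.eye(pcor.shape[0], dtype=bool)  # leave the diagonal untouched
    out[off_diag & (np.abs(pcor) < tau)] = 0.0
    return out

# Hypothetical first-step estimate (e.g., from a regularized estimator).
estimated = np.array([[1.00, 0.30, 0.04],
                      [0.30, 1.00, -0.02],
                      [0.04, -0.02, 1.00]])

# tau is fixed here for illustration; the paper chooses it by cross-validation.
sparse = threshold_network(estimated, tau=0.05)
```

Here the two weak edges are removed while the stronger edge between the first two variables is kept unchanged.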


2021 ◽  
Author(s):  
Donald Ray Williams

Studying complex relations in multivariate datasets is a common task across the sciences. Cognitive neuroscientists model brain connectivity with the goal of unearthing functional and structural associations between cortical regions. In clinical psychology, researchers wish to better understand the intricate web of symptom interrelations that underlie mental health disorders. To this end, graphical modeling has emerged as an oft-used tool in the chest of scientific inquiry. The basic idea is to characterize multivariate relations by learning the conditional dependence structure. The cortical regions or symptoms are nodes and the featured connections linking nodes are edges that graphically represent the conditional dependence structure. Graphical modeling is quite common in fields with wide data, that is, when there are more variables (p) than observations (n). Accordingly, many regularization-based approaches have been developed for those kinds of data. More recently, graphical modeling has emerged in psychology, where the data is typically long or low-dimensional. The primary purpose of GGMnonreg is to provide methods that were specifically designed for low-dimensional data (e.g., those common in the social-behavioral sciences), for which there is a dearth of methodology.


2020 ◽  
Author(s):  
Donald Ray Williams

The topic of replicability has recently captivated the emerging field of network psychometrics. Although methodological practice (e.g., p-hacking) has been identified as a root cause of unreliable research findings in psychological science, the statistical model itself has come under attack in the partial correlation network literature. In a motivating example, I first describe how sampling variability inherent to partial correlations can merely give the appearance of unreliability. For example, when going from zero-order to partial correlations there is necessarily more sampling variability that translates into reduced statistical power. I then introduce novel methodology for deriving expected network replicability (ENR), wherein replication is modeled with the Poisson-binomial distribution. This analytic solution can be used with the Pearson, Spearman, Kendall, and polychoric partial correlation coefficient. I first employed the method to estimate ENR for a variety of datasets from the network literature. Here it was determined that partial correlation networks do not have inherent limitations, given current estimates of replicability were consistent with ENR. I then highlighted sources that can reduce replicability, that is, when going from continuous to ordinal data with few categories and employing a multiple comparisons correction. To address these challenges, I described a strategy for using the proposed method to plan for network replication. I end with recommendations that include the importance of the network literature repositioning itself with gold-standard approaches for assessing replication, including explicit consideration of type I and type II error rates. The method for computing ENR is implemented in the R package GGMnonreg.
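The Poisson-binomial idea can be sketched in a few lines. The sketch assumes that each edge replicates independently with a success probability equal to its statistical power, which is one reading of the abstract and not a reproduction of the GGMnonreg code; the function name `poisson_binomial_pmf` and the power values are hypothetical.

```python
def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent Bernoulli trials
    with unequal success probabilities, built up one trial at a time."""
    pmf = [1.0]  # with zero trials, zero successes has probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this edge fails to replicate
            nxt[k + 1] += mass * p       # this edge replicates
        pmf = nxt
    return pmf

# Hypothetical per-edge power values, for illustration only.
power = [0.9, 0.8, 0.5, 0.3]

pmf = poisson_binomial_pmf(power)               # P(k edges replicate), k = 0..4
expected_replicated = sum(power)                # mean of the Poisson-binomial
expected_proportion = expected_replicated / len(power)
```

Under these assumptions the expected proportion of replicated edges is simply the average power, so low power alone can produce a modest replication rate even when the model is correct.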


2019 ◽  
Author(s):  
Donald Ray Williams ◽  
Joris Mulder

Gaussian graphical models (GGM) allow for learning conditional independence structures that are encoded by partial correlations. Whereas there are several R packages for classical (i.e., frequentist) methods, there are only two that implement a Bayesian approach. These are exclusively focused on identifying the graphical structure; that is, detecting non-zero effects. The R package BGGM not only fills this gap, but it also includes novel Bayesian methodology for extending inference beyond identifying non-zero relations. BGGM is built around two Bayesian approaches for inference: estimation and hypothesis testing. The former focuses on the posterior distribution and includes extensions to assess predictability, as well as methodology to compare partial correlations. The latter includes methods for Bayesian hypothesis testing, in both exploratory and confirmatory contexts, with the novel matrix-$F$ prior distribution. This allows for testing order and equality constrained hypotheses, as well as a combination of both with the Bayes factor. Further, there are two approaches for comparing any number of GGMs with either the posterior predictive distribution or Bayesian hypothesis testing. This work describes the software implementation of these methods. We end by discussing future directions for BGGM.


2018 ◽  
Author(s):  
Donald Ray Williams

Gaussian graphical models (GGM; "networks") allow for estimating conditional independence structures that are encoded by partial correlations. This is accomplished by identifying non-zero relations in the inverse of the covariance matrix. In psychology the default estimation method uses $\ell_1$-regularization, where the accompanying inferences are restricted to frequentist objectives. Bayesian methods remain relatively uncommon in practice and methodological literatures. To date, they have not yet been used for estimation and inference in the psychological network literature. In this work, I introduce Bayesian methodology that is specifically designed for the most common psychological applications. The graphical structure is determined with posterior probabilities, which allow for assessing conditional dependent and independent relations. Additional methods are provided for extending inference to specific aspects within- and between-networks, including partial correlation differences and Bayesian methodology to quantify network predictability. I first demonstrate that the decision rule based on posterior probabilities can be calibrated to the desired level of specificity. The proposed techniques are then demonstrated in several illustrative examples. The methods have been implemented in the R package BGGM.
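The link between partial correlations and the inverse covariance matrix can be made concrete with a short sketch. It uses the standard identity ρ_ij = -θ_ij / sqrt(θ_ii θ_jj), where θ is the precision matrix; the function name `partial_correlations` and the example covariance matrix are illustrative and not taken from BGGM.

```python
import numpy as np

def partial_correlations(cov):
    """Partial correlation matrix computed from the inverse covariance matrix."""
    precision = np.linalg.inv(cov)                 # Theta = Sigma^{-1}
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)             # standardize and flip sign
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Illustrative chain X1 - X2 - X3: X1 and X3 are related only through X2,
# so their marginal correlation (0.25) is non-zero but the partial one vanishes.
cov = np.array([[1.00, 0.50, 0.25],
                [0.50, 1.00, 0.50],
                [0.25, 0.50, 1.00]])

pcor = partial_correlations(cov)
```

In this example the zero entry in the precision matrix corresponds to conditional independence of X1 and X3 given X2, which is the kind of relation a GGM encodes as a missing edge.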


2021 ◽  
Author(s):  
Joran Jongerling ◽  
Sacha Epskamp ◽  
Donald Ray Williams

Gaussian Graphical Models (GGMs) are often estimated using regularized estimation and the graphical LASSO (GLASSO). However, the GLASSO has difficulty estimating (uncertainty in) centrality indices of nodes. Regularized Bayesian estimation might provide a solution, as it is better suited to deal with bias in the sampling distribution of centrality indices. This study therefore compares estimation of GGMs with a Bayesian GLASSO prior and a Horseshoe prior to estimation using the frequentist GLASSO in an extensive simulation study. Results showed that out of the two Bayesian estimation methods, the Bayesian GLASSO performed best. In addition, the Bayesian GLASSO performed better than the frequentist GLASSO with respect to bias in edge weights, centrality measures, correlation between estimated and true partial correlations, and specificity. With respect to sensitivity the frequentist GLASSO performs better. However, sensitivity of the Bayesian GLASSO is close to that of the frequentist GLASSO (except for the smallest N used in the simulations) and tends to be favored over the frequentist GLASSO in terms of F1. With respect to uncertainty in the centrality measures, the Bayesian GLASSO shows good coverage for strength and closeness centrality. Uncertainty in betweenness centrality is estimated less well, and typically overestimated by the Bayesian GLASSO.


2021 ◽  
Vol 11 (2) ◽  
pp. 31-50
Author(s):  
S.L. Artemenkov

Network modeling, which has emerged in recent years, can be successfully applied to the consideration of relationships between measurable psychological variables. In this context, psychological variables are understood as directly affecting each other, and not as a consequence of a latent construct. The article describes regularization methods that can be used to effectively assess the sparse and interpretable network structure based on partial correlations of psychological indicators. An overview of the glasso regularization procedure using EBIC model selection for evaluating an ordered sparse network of partial correlations is presented. The issues of performing this analysis in R in the presence of normal and non-normal data distribution are considered, taking into account the influence of the hyperparameter, which is manually set by the researcher. The considered approach is also interesting as a way to visualize possible causal connections between variables. This review bridges the gap related to the lack of an accessible description in Russian of this approach, which is still uncommon in Russia and at the same time promising.


2019 ◽  
Vol 35 (17) ◽  
pp. 3184-3186
Author(s):  
Xiao-Fei Zhang ◽  
Le Ou-Yang ◽  
Shuo Yang ◽  
Xiaohua Hu ◽  
Hong Yan

Abstract Summary: To identify biological network rewiring under different conditions, we develop a user-friendly R package, named DiffNetFDR, to implement two methods developed for testing the difference in different Gaussian graphical models. Compared to existing tools, our methods have the following features: (i) they are based on Gaussian graphical models which can capture the changes of conditional dependencies; (ii) they determine the tuning parameters in a data-driven manner; (iii) they take a multiple testing procedure to control the overall false discovery rate; and (iv) our approach defines the differential network based on partial correlation coefficients so that the spurious differential edges caused by the variants of conditional variances can be excluded. We also develop a Shiny application to provide easier analysis and visualization. Simulation studies are conducted to evaluate the performance of our methods. We also apply our methods to two real gene expression datasets. The effectiveness of our methods is validated by the biological significance of the identified differential networks. Availability and implementation: R package and Shiny app are available at https://github.com/Zhangxf-ccnu/DiffNetFDR. Supplementary information: Supplementary data are available at Bioinformatics online.

