Sandpiper: Scaling probabilistic inferencing to large scale graphical models

Abstract: Gene and protein networks are important tools for modeling complex, large-scale systems in molecular biology. Inferring, or reverse-engineering, such networks can be defined as the process of identifying gene/protein interactions from experimental data through computational analysis. However, this task is typically complicated by the very large number of unknowns relative to the small sample size. Furthermore, when the goal is to study causal relationships within the network, tools capable of overcoming the limitations of correlation networks are required. In this work, we use Bayesian graphical models to attack this problem; specifically, we perform a comparative study of different state-of-the-art heuristics, analyzing their performance in inferring the structure of a Bayesian network from breast cancer data.
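The abstract does not name the heuristics that were compared. As a hedged illustration of what score-based structure search involves, the sketch below implements greedy hill-climbing (edge additions only) with a BIC score for binary variables. The function names `family_bic`, `reaches` and `hill_climb` and the synthetic data are hypothetical; practical tools also try edge deletions and reversals.

```python
import math
import random
from collections import Counter

def family_bic(data, child, parents):
    """BIC contribution of one binary node given its parent list (higher is better)."""
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / parent_counts[cfg]) for (cfg, _), c in joint.items())
    n_params = 2 ** len(parents)  # one free probability per parent configuration
    return loglik - 0.5 * math.log(len(data)) * n_params

def reaches(edges, start, goal):
    """True if goal can be reached from start by following directed edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(v for (u, v) in edges if u == node)
    return False

def hill_climb(data, n_vars):
    """Greedy edge-addition search; returns a set of directed edges (u, v)."""
    edges = set()
    parents = {v: [] for v in range(n_vars)}
    while True:
        best_gain, best_edge = 0.0, None
        for u in range(n_vars):
            for v in range(n_vars):
                # Skip self-loops, existing edges and edges that would close a cycle.
                if u == v or (u, v) in edges or reaches(edges, v, u):
                    continue
                gain = (family_bic(data, v, parents[v] + [u])
                        - family_bic(data, v, parents[v]))
                if gain > best_gain:
                    best_gain, best_edge = gain, (u, v)
        if best_edge is None:  # no addition improves the score: local optimum
            return edges
        u, v = best_edge
        edges.add(best_edge)
        parents[v].append(u)

# Hypothetical data: B is a noisy copy of A, C is independent noise.
rng = random.Random(0)
data = []
for _ in range(500):
    a = rng.randint(0, 1)
    b = a if rng.random() < 0.9 else 1 - a
    c = rng.randint(0, 1)
    data.append((a, b, c))

edges = hill_climb(data, 3)
print(edges)
```

Because BIC gives A→B and B→A the same score, data alone cannot orient this edge; that limitation is one reason causal questions need more than correlation structure.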


Cortical Circuitry Implementing Graphical Models

Neural Computation, Vol 21 (11), pp. 3010-3056 (2009). DOI: 10.1162/neco.2009.05-08-783. Cited By ~ 29

Author(s): Shai Litvak, Shimon Ullman

Keyword(s): Graphical Models; Large Scale; Graphical Model; Current Model; Building Blocks; Population Based; Spiking Neurons; Inhibitory Neurons; Basket Cells; Local Circuitry

In this letter, we develop and simulate a large-scale network of spiking neurons that approximates the inference computations performed by graphical models. Unlike previous related schemes, which used sum and product operations in either the log or linear domains, the current model uses an inference scheme based on the sum and maximization operations in the log domain. Simulations show that using these operations, a large-scale circuit, which combines populations of spiking neurons as basic building blocks, is capable of finding close approximations to the full mathematical computations performed by graphical models within a few hundred milliseconds. The circuit is general in the sense that it can be wired for any graph structure, it supports multistate variables, and it uses standard leaky integrate-and-fire neuronal units. Following previous work, which proposed relations between graphical models and the large-scale cortical anatomy, we focus on the cortical microcircuitry and propose how anatomical and physiological aspects of the local circuitry may map onto elements of the graphical model implementation. We discuss in particular the roles of three major types of inhibitory neurons (small fast-spiking basket cells, large layer 2/3 basket cells, and double-bouquet neurons), subpopulations of strongly interconnected neurons with their unique connectivity patterns in different cortical layers, and the possible role of minicolumns in the realization of the population-based maximum operation.
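The central algorithmic choice here is to replace sum-product with sum and max in the log domain, i.e. max-sum inference, which recovers the most probable joint assignment. A minimal sketch on a chain-structured model, assuming one shared pairwise table; it shows the mathematical operations the circuit approximates, not the spiking-neuron implementation. The names `max_sum_chain` and `brute_force` are hypothetical.

```python
import itertools
import math

def max_sum_chain(unary, pairwise):
    """MAP assignment of a chain model via max-sum messages in the log domain.

    unary[i][s]    : log-potential of variable i in state s
    pairwise[a][b] : log-potential shared by every neighbouring pair (a, b)
    """
    n, k = len(unary), len(unary[0])
    msg = [0.0] * k      # incoming message to variable 0 is empty
    back = []            # back[i-1][b]: best state of variable i-1 given state b at i
    for i in range(1, n):
        new_msg, ptr = [], []
        for b in range(k):
            # Products become sums in the log domain; marginal sums become a max.
            scores = [msg[a] + unary[i - 1][a] + pairwise[a][b] for a in range(k)]
            best = max(range(k), key=scores.__getitem__)
            new_msg.append(scores[best])
            ptr.append(best)
        msg = new_msg
        back.append(ptr)
    final = [msg[s] + unary[n - 1][s] for s in range(k)]
    state = max(range(k), key=final.__getitem__)
    path = [state]
    for ptr in reversed(back):  # follow argmax pointers from the last variable back
        state = ptr[state]
        path.append(state)
    return path[::-1], max(final)

def brute_force(unary, pairwise):
    """Exhaustive check: score every joint assignment."""
    n, k = len(unary), len(unary[0])
    def score(x):
        return (sum(unary[i][x[i]] for i in range(n))
                + sum(pairwise[x[i]][x[i + 1]] for i in range(n - 1)))
    best = max(itertools.product(range(k), repeat=n), key=score)
    return list(best), score(best)

unary = [[math.log(p) for p in row] for row in
         [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7], [0.5, 0.3, 0.2]]]
pairwise = [[math.log(p) for p in row] for row in
            [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]]
path, value = max_sum_chain(unary, pairwise)
ref_path, ref_value = brute_force(unary, pairwise)
print(path, value)
```

Max-sum needs only additions and comparisons, with no exponentials or normalizations, which is plausibly why it suits a neural implementation.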


Multiscale Gaussian Graphical Models and Algorithms for Large-Scale Inference

2007 IEEE/SP 14th Workshop on Statistical Signal Processing (2007). DOI: 10.1109/ssp.2007.4301253. Cited By ~ 10

Author(s): Myung Jin Choi, Alan S. Willsky

Keyword(s): Graphical Models; Large Scale; Gaussian Graphical Models


Meta-analytic Gaussian Network Aggregation

DOI: 10.31234/osf.io/236w8 (2020)

Author(s): Sacha Epskamp, Adela-Maria Isvoranu, Mike W.-L. Cheung

Keyword(s): Graphical Models; Fixed Effects; Large Scale; Correlation Coefficients; Likelihood Estimation; Post Traumatic Stress; Modeling Framework; Network Aggregation; Study Heterogeneity; Gaussian Network

A growing number of publications focus on estimating Gaussian graphical models (GGMs, networks of partial correlation coefficients). At the same time, the generalizability and replicability of these highly parameterized models are debated, and the sample sizes typically found in datasets may not be sufficient for estimating the underlying network structure. In addition, while recent work has emerged that aims to compare networks based on different samples, these studies do not take potential cross-study heterogeneity into account. To this end, this paper introduces methods for estimating GGMs by aggregating over multiple datasets. We first introduce a general maximum likelihood estimation modeling framework in which all discussed models are embedded. This modeling framework is subsequently used to introduce meta-analytic Gaussian network aggregation (MAGNA). We discuss two variants: fixed-effects MAGNA, in which heterogeneity across studies is not taken into account, and random-effects MAGNA, which models sample correlations and takes heterogeneity into account. We exemplify the method using four datasets of post-traumatic stress disorder symptoms, as well as one large dataset of depression, anxiety and stress symptoms. Finally, we assess the performance of MAGNA in large-scale simulation studies.
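GGMs are described here as networks of partial correlation coefficients. A short sketch of that definition, assuming a known covariance (or correlation) matrix: invert it to obtain the precision matrix K and standardize its off-diagonal entries, rho_ij = -K_ij / sqrt(K_ii K_jj). Estimation from data (regularization, model selection, MAGNA's aggregation) is not shown, and the helper names are hypothetical.

```python
import math

def invert(matrix):
    """Gauss-Jordan inverse of a small nonsingular matrix given as nested lists."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(cov):
    """GGM edge weights: rho_ij = -K_ij / sqrt(K_ii * K_jj), where K = inverse(cov)."""
    k = invert(cov)
    n = len(k)
    return [[1.0 if i == j else -k[i][j] / math.sqrt(k[i][i] * k[j][j])
             for j in range(n)] for i in range(n)]

# Correlations of a Markov chain X1 - X2 - X3, where r13 = r12 * r23.
cov = [[1.0, 0.5, 0.25],
       [0.5, 1.0, 0.5],
       [0.25, 0.5, 1.0]]
pc = partial_correlations(cov)
print(round(pc[0][1], 4), round(pc[0][2], 4))
```

Although X1 and X3 are marginally correlated (0.25), their partial correlation is numerically zero, so the GGM has no X1-X3 edge: the network keeps only direct conditional dependencies.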


MAP-Inference on Large Scale Higher-Order Discrete Graphical Models by Fusion Moves

Computer Vision - ECCV 2014 Workshops, Lecture Notes in Computer Science, pp. 469-484 (2015). DOI: 10.1007/978-3-319-16181-5_37

Author(s): Jörg Hendrik Kappes, Thorsten Beier, Christoph Schnörr

Keyword(s): Graphical Models; Large Scale; Higher Order; Map Inference


Distributed Bayesian Networks Reconstruction on the Whole Genome Scale

DOI: 10.1101/016683 (2015)

Author(s): Alina Frolova, Bartek Wilczynski

Keyword(s): Experimental Data; Bayesian Networks; Graphical Models; Polynomial Time; Protein Interactions; Regulatory Networks; Large Scale; External Information; Whole Genome; Wide Audience

Abstract

Background: Bayesian networks are directed acyclic graphical models widely used to represent the probabilistic relationships between random variables. They have been applied in various biological contexts, including the inference of gene regulatory networks and protein-protein interactions. In general, learning Bayesian networks from experimental data is NP-hard, which has led to the widespread use of heuristic search methods that give suboptimal results. However, when the acyclicity of the graph can be ensured externally, the optimal network can be found in polynomial time. Although our previously developed tool BNFinder implements a polynomial-time algorithm, reconstructing networks from large amounts of experimental data still leads to single-CPU computation times that grow exceedingly large.

Results: In the present paper we propose a parallelized algorithm designed for multi-core and distributed systems, implemented in an improved version of BNFinder, a tool for learning optimal Bayesian networks. The new algorithm has been tested on different simulated and experimental datasets, showing much better parallelization efficiency than the previous version. BNFinder gives accuracy comparable to current state-of-the-art inference methods, with a significant advantage in cases where external information, such as a list of regulators or prior edge probabilities, can be introduced.

Conclusions: We show that the new method can be used to reconstruct networks with thousands of genes, making it practically applicable to whole-genome datasets of prokaryotic systems and to large components of eukaryotic genomes. Our benchmarking results on realistic datasets indicate that the tool should be useful to a wide audience of researchers interested in discovering dependencies in their large-scale transcriptomic datasets.
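The key observation is that once acyclicity is guaranteed from outside (for example, by a list of allowed regulators), each gene's optimal parent set can be chosen independently. That makes the search both exact and embarrassingly parallel. A hedged sketch of this decomposition follows. It uses a BIC score on binary data as a stand-in for BNFinder's own scoring functions, and all names and data are hypothetical. It is not BNFinder's code.

```python
import itertools
import math
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def bic(data, child, parents):
    """BIC score of a binary child given a tuple of parents (higher is better)."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
    return loglik - 0.5 * math.log(len(data)) * 2 ** len(parents)

def best_parents(job):
    """Exhaustive, hence optimal, parent-set search for one target variable."""
    data, target, regulators, max_parents = job
    subsets = (s for size in range(max_parents + 1)
               for s in itertools.combinations(regulators, size))
    return target, max(subsets, key=lambda s: bic(data, target, s))

def learn_network(data, allowed, max_parents=2, workers=4):
    """allowed[t] lists the regulators permitted for target t.

    Acyclicity must already be guaranteed by `allowed` (here: regulators are never
    targets), so every target is an independent job. Threads only illustrate the
    decomposition; CPU-bound speed-ups would need processes or a cluster.
    """
    jobs = [(data, t, regs, max_parents) for t, regs in allowed.items()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(best_parents, jobs))

# Hypothetical data: gene 2 follows regulator 0, gene 3 follows regulator 1 (90% of the time).
rng = random.Random(1)
data = []
for _ in range(400):
    g0, g1 = rng.randint(0, 1), rng.randint(0, 1)
    g2 = g0 if rng.random() < 0.9 else 1 - g0
    g3 = g1 if rng.random() < 0.9 else 1 - g1
    data.append((g0, g1, g2, g3))

network = learn_network(data, {2: [0, 1], 3: [0, 1]})
print(network)
```

Bounding `max_parents` keeps each per-gene search polynomial in the number of regulators, and the independence of the jobs is what lets the work spread across cores or machines.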


Rapid inference of direct interactions in large-scale ecological networks from heterogeneous microbial sequencing data

DOI: 10.1101/390195 (2018). Cited By ~ 4

Author(s): Janko Tackmann, João Frederico Matias Rodrigues, Christian von Mering

Keyword(s): Graphical Models; Large Scale; Study Data; Microbial Interactions; Data Sets; Metagenomic Sequencing; Sequencing Data; Human Gut; Data Set; Seamless Integration

Abstract: The recent explosion of metagenomic sequencing data opens the door to modeling microbial ecosystems in unprecedented detail. In particular, co-occurrence-based prediction of ecological interactions could benefit strongly from this development. However, current methods fall short on several fronts: univariate tools do not distinguish between direct and indirect interactions, resulting in excessive false positives, while approaches with better resolution are so far highly limited computationally. Furthermore, confounding variables typical of cross-study data sets are rarely addressed. We present FlashWeave, a new approach based on a flexible probabilistic graphical model framework, to infer highly resolved direct microbial interactions from massive heterogeneous microbial abundance data sets with seamless integration of metadata. On a variety of benchmarks, FlashWeave outperforms state-of-the-art methods by several orders of magnitude in speed while generally providing increased accuracy. We apply FlashWeave to a cross-study data set of 69,818 publicly available human gut samples, resulting in one of the largest and most diverse models of microbial interactions in the human gut to date.
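The distinction drawn between univariate tools and direct-interaction inference comes down to conditioning. A minimal sketch, assuming Gaussian-like abundances and a single conditioning variable: X and Y are linked only through Z, so their marginal correlation is strong while their partial correlation given Z is near zero. FlashWeave's actual tests, data transformations and conditioning-set search differ; the names here are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def partial_corr(x, y, z):
    """Correlation of x and y after conditioning on a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for H0: the (partial) correlation is zero."""
    stat = math.atanh(r) * math.sqrt(n - n_cond - 3)
    return 2 * (1 - NormalDist().cdf(abs(stat)))

# Hypothetical indirect association: X drives Z, Z drives Y; X and Y never interact.
rng = random.Random(42)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [xi + rng.gauss(0, 0.5) for xi in x]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_marginal = pearson(x, y)
r_partial = partial_corr(x, y, z)
p_marginal = fisher_z_pvalue(r_marginal, n, 0)
p_partial = fisher_z_pvalue(r_partial, n, 1)
print(r_marginal, p_marginal, r_partial, p_partial)
```

A univariate tool would report the strongly significant X-Y association as an edge. Conditioning on Z removes it, which is how direct-interaction methods cut false positives.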


Adding Extra Knowledge in Scalable Learning of Sparse Differential Gaussian Graphical Models

DOI: 10.1101/716852 (2019). Cited By ~ 1

Author(s): Arshdeep Sekhon, Beilun Wang, Yanjun Qi

Keyword(s): Graphical Models; Large Scale; Scale Up; Gaussian Graphical Models; Edge Information; Group Knowledge; Scalable Learning; Novel Method; Synthetic Datasets; Brain Data

Abstract: We focus on integrating different types of extra knowledge (other than the observed samples) when estimating the sparse structural change between two p-dimensional Gaussian graphical models (i.e., differential GGMs). Previous differential GGM estimators either fail to include additional knowledge or cannot scale up to high-dimensional (large p) settings. This paper proposes a novel method, KDiffNet, that incorporates Additional Knowledge in identifying Differential Networks via an Elementary Estimator. We design a novel hybrid norm as a superposition of two structured norms guided by the extra edge information and the additional node group knowledge. KDiffNet is solved through a fast parallel proximal algorithm, enabling it to work in large-scale settings. KDiffNet can incorporate various combinations of existing knowledge without redesigning the optimization. Through rigorous statistical analysis we show that, while considering more evidence, KDiffNet achieves the same convergence rate as the state of the art. Empirically, on multiple synthetic datasets and one real-world fMRI brain dataset, KDiffNet significantly outperforms cutting-edge baselines in prediction performance while achieving the same or lower time cost.
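An elementary estimator starts from a closed-form proxy for the difference of the two precision matrices, obtained by thresholding and inverting the two covariance estimates. It then solves a simple norm-constrained problem around that proxy. Assuming a weighted l1 norm alone, that last step reduces to entrywise soft-thresholding. The sketch below follows this simplified route, with a per-entry weight matrix standing in for extra edge knowledge. It is not KDiffNet's hybrid norm or its proximal solver, and all names are hypothetical.

```python
import math

def invert(matrix):
    """Gauss-Jordan inverse of a small nonsingular matrix given as nested lists."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def soft(x, t):
    """Soft-thresholding operator S_t(x) = sign(x) * max(|x| - t, 0)."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def threshold_offdiag(cov, v):
    """Soft-threshold off-diagonal covariance entries (keeps the matrix usable when p > n)."""
    n = len(cov)
    return [[cov[i][j] if i == j else soft(cov[i][j], v) for j in range(n)]
            for i in range(n)]

def differential_network(cov_c, cov_d, lam, weights, v=0.0):
    """delta = S_{lam * W}( inv(T_v(cov_d)) - inv(T_v(cov_c)) ), applied entrywise.

    weights[i][j] is a hypothetical per-edge prior: values below 1 shrink a likely
    edge less, values above 1 shrink an unlikely edge more.
    """
    kc = invert(threshold_offdiag(cov_c, v))
    kd = invert(threshold_offdiag(cov_d, v))
    n = len(kc)
    return [[soft(kd[i][j] - kc[i][j], lam * weights[i][j]) for j in range(n)]
            for i in range(n)]

# Hypothetical precision matrices: the second condition gains one edge, (0, 1).
k_control = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
k_disease = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = [[1.0, 0.5, 1.0], [0.5, 1.0, 1.0], [1.0, 1.0, 1.0]]  # prior favours edge (0, 1)
delta = differential_network(invert(k_control), invert(k_disease), lam=0.1, weights=weights)
print(delta)
```

Every step is closed-form (two inversions and an entrywise shrinkage), which is what makes elementary estimators cheap enough for large p.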


Testing Differential Gene Networks under Nonparanormal Graphical Models with False Discovery Rate Control

Genes, Vol 11 (2), pp. 167 (2020). DOI: 10.3390/genes11020167. Cited By ~ 1

Author(s): Qingyang Zhang

Keyword(s): False Discovery Rate; Graphical Models; Rate Control; Gene Networks; Large Scale; Graphical Model; Test Statistic; False Discovery Rate Control; False Discovery; Sample Covariance

The nonparanormal graphical model has emerged as an important tool for modeling the dependency structure between variables because it accommodates non-Gaussian data while retaining the interpretability and computational convenience of Gaussian graphical models. In this paper, we consider the problem of detecting differential substructure between two nonparanormal graphical models with false discovery rate control. We construct a new test statistic based on a truncated estimator of the unknown transformation functions, together with a bias-corrected sample covariance. Furthermore, we show that the new test statistic converges to the same distribution as its oracle counterpart. Both synthetic data and real cancer genomic data are used to illustrate the promise of the new method. Our proposed testing framework is simple and scalable, facilitating its application to large-scale data. The computational pipeline has been implemented in the R package DNetFinder, which is freely available through the Comprehensive R Archive Network.
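One ingredient of the test statistic is a truncated estimator of the unknown marginal transformations. A hedged sketch of the usual idea: replace each value by the normal quantile of its clipped empirical CDF. The truncation level delta_n below is one commonly used for nonparanormal estimators, and it is an assumption here. The paper's own truncation, its bias-corrected covariance and the DNetFinder implementation may differ, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def normal_scores(column):
    """Truncated (Winsorized) normal-score transform of one variable (n >= 2, no ties)."""
    n = len(column)
    delta = 1.0 / (4 * n ** 0.25 * math.sqrt(math.pi * math.log(n)))
    ranks = [0] * n
    for rank, idx in enumerate(sorted(range(n), key=column.__getitem__), start=1):
        ranks[idx] = rank
    inv_cdf = NormalDist().inv_cdf
    # Empirical CDF rank/n, clipped to [delta, 1 - delta] so inv_cdf stays finite.
    return [inv_cdf(min(max(r / n, delta), 1 - delta)) for r in ranks]

def sample_cov(a, b):
    """Ordinary sample covariance (no bias correction for the truncation)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

x = [0.3, -1.2, 2.5, 0.0, 1.1, -0.4, 0.8, -2.0]
skewed = [math.exp(v) for v in x]  # monotone map to a clearly non-Gaussian marginal
fx, fs = normal_scores(x), normal_scores(skewed)
print(fx == fs, sample_cov(fx, fs))
```

The transform depends only on ranks, so a monotone distortion such as `exp` leaves the scores unchanged. This invariance is what lets Gaussian graphical model machinery run on non-Gaussian data.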


Meta-analytic Gaussian Network Aggregation

Psychometrika (2021). DOI: 10.1007/s11336-021-09764-3

Author(s): Sacha Epskamp, Adela-Maria Isvoranu, Mike W.-L. Cheung

Keyword(s): Graphical Models; Fixed Effects; Large Scale; Meta Analysis; Correlation Coefficients; Likelihood Estimation; Post Traumatic Stress; Modeling Framework; Network Aggregation; Gaussian Network

Abstract: A growing number of publications focus on estimating Gaussian graphical models (GGMs, networks of partial correlation coefficients). At the same time, the generalizability and replicability of these highly parameterized models are debated, and the sample sizes typically found in datasets may not be sufficient for estimating the underlying network structure. In addition, while recent work has emerged that aims to compare networks based on different samples, these studies do not take potential cross-study heterogeneity into account. To this end, this paper introduces methods for estimating GGMs by aggregating over multiple datasets. We first introduce a general maximum likelihood estimation modeling framework in which all discussed models are embedded. This modeling framework is subsequently used to introduce meta-analytic Gaussian network aggregation (MAGNA). We discuss two variants: fixed-effects MAGNA, in which heterogeneity across studies is not taken into account, and random-effects MAGNA, which models sample correlations and takes heterogeneity into account. We assess the performance of MAGNA in large-scale simulation studies. Finally, we exemplify the method using four datasets of post-traumatic stress disorder (PTSD) symptoms, and summarize findings from a larger meta-analysis of PTSD symptoms.
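Fixed-effects MAGNA assumes that all studies share one correlation structure, whereas random-effects MAGNA lets that structure vary. MAGNA fits full correlation matrices by maximum likelihood. The sketch below only illustrates the same distinction for a single variable pair, using classic inverse-variance pooling of Fisher-z correlations and Cochran's Q as a heterogeneity check. It is a swapped-in univariate technique rather than MAGNA itself, and the names and numbers are hypothetical.

```python
import math

def pooled_correlation(rs, ns):
    """Fixed-effect (inverse-variance) pooling of one correlation across studies.

    Returns the pooled correlation and Cochran's Q heterogeneity statistic.
    """
    zs = [math.atanh(r) for r in rs]  # Fisher z-transform
    ws = [n - 3 for n in ns]          # approximate inverse variance of each z
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return math.tanh(z_bar), q

homogeneous = pooled_correlation([0.3, 0.3, 0.3], [100, 250, 400])
heterogeneous = pooled_correlation([0.05, 0.3, 0.6], [100, 250, 400])
print(homogeneous, heterogeneous)
```

A Q well above its degrees of freedom (number of studies minus one) signals cross-study heterogeneity. A single pooled estimate then misrepresents the studies, and random-effects models are designed to handle that situation.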
