Lifted Message Passing for Hybrid Probabilistic Inference

Author(s):  
Yuqiao Chen ◽  
Nicholas Ruozzi ◽  
Sriraam Natarajan

Lifted inference algorithms for first-order logic models, e.g., Markov logic networks (MLNs), have been of significant interest in recent years. Lifted inference methods exploit model symmetries in order to reduce the size of the model and, consequently, the computational cost of inference. In this work, we consider the problem of lifted inference in MLNs with continuous or both discrete and continuous groundings. Existing work on lifting with continuous groundings has mostly been limited to special classes of models, e.g., Gaussian models, for which variable elimination or message-passing updates can be computed exactly. Here, we develop approximate lifted inference schemes based on particle sampling. We demonstrate empirically that our approximate lifting schemes perform comparably to the existing state-of-the-art for Gaussian MLNs, while having the flexibility to be applied to models with arbitrary potential functions.
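A minimal sketch of the particle-based lifting idea, using only numpy and hypothetical potentials: the message from a continuous neighbor is approximated with importance-weighted particles, and because the K neighbors are interchangeable, the message is computed once and raised to the K-th power rather than recomputed per grounding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hybrid fragment: one continuous query variable X connected to K
# interchangeable continuous neighbors Y_1..Y_K through the same pairwise
# potential.  Symmetry means one particle message serves all K neighbors.

def unary(y):
    # hypothetical evidence potential on each neighbor Y_k
    return np.exp(-0.5 * (y - 1.0) ** 2)

def pairwise(x, y):
    # hypothetical soft-equality potential between X and Y_k
    return np.exp(-0.5 * (x - y) ** 2)

K = 50                                        # interchangeable groundings
S = 2000                                      # particles per message
y_particles = rng.normal(1.0, 1.0, size=S)    # proposal q(y) = N(1, 1)
q_density = np.exp(-0.5 * (y_particles - 1.0) ** 2) / np.sqrt(2 * np.pi)

def message_to_x(x):
    # importance-weighted estimate of  integral pairwise(x, y) * unary(y) dy
    w = unary(y_particles) / q_density
    return np.mean(w * pairwise(x, y_particles))

# Approximate (unnormalised) belief of X on a grid: prior * message^K.
xs = np.linspace(-2.0, 4.0, 200)
dx = xs[1] - xs[0]
prior = np.exp(-0.5 * xs ** 2)
belief = prior * np.array([message_to_x(x) for x in xs]) ** K
belief /= belief.sum() * dx
print("approximate posterior mean of X:", float((xs * belief).sum() * dx))
```

The star-shaped model keeps this example exact up to the particle approximation; in a full algorithm the same particle messages would simply be reused inside a lifted message-passing schedule.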

Author(s):  
Yuqiao Chen ◽  
Yibo Yang ◽  
Sriraam Natarajan ◽  
Nicholas Ruozzi

Lifted inference algorithms exploit model symmetry to reduce computational cost in probabilistic inference. However, most existing lifted inference algorithms operate only over discrete domains or continuous domains with restricted potential functions. We investigate two approximate lifted variational approaches that apply to domains with general hybrid potentials, and are expressive enough to capture multi-modality. We demonstrate that the proposed variational methods are highly scalable and can exploit approximate model symmetries even in the presence of a large amount of continuous evidence, outperforming existing message-passing-based approaches in a variety of settings. Additionally, we present a sufficient condition for the Bethe variational approximation to yield a non-trivial estimate over the marginal polytope.
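A toy illustration of the multi-modality point (not the paper's variational algorithm, and assuming scikit-learn): a single Gaussian fit to a bimodal continuous marginal collapses onto the gap between the modes, whereas a two-component mixture recovers both, which is why a mixture family is needed for expressive hybrid approximations.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Samples from a bimodal continuous marginal (two well-separated modes).
samples = np.concatenate([rng.normal(-3.0, 0.5, 500),
                          rng.normal(+3.0, 0.5, 500)]).reshape(-1, 1)

for k in (1, 2):
    gm = GaussianMixture(n_components=k, random_state=0).fit(samples)
    print(f"{k}-component fit: means = {gm.means_.ravel().round(2)}")
# The 1-component fit puts its mean near 0, where the target has almost no
# mass; the 2-component fit recovers means near -3 and +3.
```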


Author(s):  
Mohammad Maminur Islam ◽  
Somdeb Sarkhel ◽  
Deepak Venugopal

We present a dense representation for Markov Logic Networks (MLNs) called Obj2Vec that encodes symmetries in the MLN structure. Identifying symmetries is a key challenge for lifted inference algorithms and we leverage advances in neural networks to learn symmetries which are hard to specify using hand-crafted features. Specifically, we learn an embedding for MLN objects that predicts the context of an object, i.e., objects that appear along with it in formulas of the MLN, since common contexts indicate symmetry in the distribution. Importantly, our formulation leverages well-known skip-gram models that allow us to learn the embedding efficiently. Finally, to reduce the size of the ground MLN, we sample objects based on their learned embeddings. We integrate Obj2Vec with several inference algorithms, and show the scalability and accuracy of our approach compared to other state-of-the-art methods.
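A minimal sketch of the Obj2Vec recipe, assuming gensim and scikit-learn are available; the domain objects and ground formulas below are hypothetical. Each "sentence" is the set of objects co-occurring in a ground formula, a skip-gram model embeds objects from these contexts, and clustering the embeddings yields groups of approximately symmetric objects from which representatives can be sampled.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

# Each "sentence" is the multiset of objects appearing together in one ground
# formula, e.g. groundings of Friends(x, y) ^ Smokes(x) => Smokes(y).
ground_formula_objects = [
    ["anna", "bob"], ["anna", "carl"], ["bob", "carl"],
    ["dan", "eve"], ["dan", "frank"], ["eve", "frank"],
]

# Skip-gram embedding: an object's vector is trained to predict its contexts.
model = Word2Vec(ground_formula_objects, vector_size=8, window=5,
                 min_count=1, sg=1, epochs=200, seed=0)

objects = sorted(model.wv.key_to_index)
emb = np.stack([model.wv[o] for o in objects])

# Objects with similar contexts get similar embeddings; cluster them to find
# (approximately) symmetric objects and keep one representative per cluster.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
for c in range(2):
    members = [o for o, k in zip(objects, clusters) if k == c]
    print(f"cluster {c}: {members} -> representative {members[0]}")
```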


Author(s):  
Somdeb Sarkhel ◽  
Deepak Venugopal ◽  
Nicholas Ruozzi ◽  
Vibhav Gogate

We address the problem of scaling up local-search or sampling-based inference in Markov logic networks (MLNs) that have large shared sub-structures but no (or few) tied weights. Such untied MLNs are ubiquitous in practical applications. However, they have very few symmetries, and as a result lifted inference algorithms, the dominant approach for scaling up inference, perform poorly on them. The key idea in our approach is to reduce the hard, time-consuming sub-task in sampling algorithms, namely computing the sum of weights of the features satisfied by a full assignment, to the problem of computing a set of partition functions of graphical models, each defined over the logical variables in a first-order formula. The importance of this reduction is that when the treewidth of all the graphical models is small, it yields an order-of-magnitude speedup. When the treewidth is large, we propose an over-symmetric approximation and experimentally demonstrate that it is both fast and accurate.
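A minimal sketch of the reduction for a single clause, assuming a toy domain and a randomly generated full assignment: the number of groundings of Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y) violated by the assignment equals the partition function of a small graphical model over the logical variables x and y, which variable elimination computes without enumerating all groundings.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6                                      # domain size
smokes = rng.integers(0, 2, size=n)        # full assignment to Smokes(.)
friends = rng.integers(0, 2, size=(n, n))  # full assignment to Friends(.,.)

# The clause is violated iff Smokes(x) ^ Friends(x,y) ^ !Smokes(y); each
# literal induces a factor over the logical variables it mentions.
f_x = smokes.astype(float)             # [Smokes(x) = true]
f_xy = friends.astype(float)           # [Friends(x,y) = true]
f_y = 1.0 - smokes                     # [Smokes(y) = false]

# Variable elimination: eliminate y, then x (treewidth 1 here, so it is fast).
msg_to_x = f_xy @ f_y                  # sum_y [F(x,y)] [!S(y)]
violated = float(f_x @ msg_to_x)       # sum_x [S(x)] * msg
satisfied = n * n - violated

# Brute-force check over all groundings.
brute = sum(1 for a in range(n) for b in range(n)
            if not (smokes[a] and friends[a, b] and not smokes[b]))
assert satisfied == brute
print("satisfied groundings:", satisfied)
```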


2013 ◽  
Vol 47 ◽  
pp. 393-439 ◽  
Author(s):  
N. Taghipour ◽  
D. Fierens ◽  
J. Davis ◽  
H. Blockeel

Lifted probabilistic inference algorithms exploit regularities in the structure of graphical models to perform inference more efficiently. More specifically, they identify groups of interchangeable variables and perform inference once per group, as opposed to once per variable. The groups are defined by means of constraints, so the flexibility of the grouping is determined by the expressivity of the constraint language. Existing approaches for exact lifted inference use specific languages for (in)equality constraints, which often have limited expressivity. In this article, we decouple lifted inference from the constraint language. We define operators for lifted inference in terms of relational algebra operators, so that they operate on the semantic level (the constraints' extension) rather than on the syntactic level, making them language-independent. As a result, lifted inference can be performed using more powerful constraint languages, which provide more opportunities for lifting. We empirically demonstrate that this can improve inference efficiency by orders of magnitude, allowing exact inference where until now only approximate inference was feasible.
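A minimal sketch of a lifting operator defined on the semantic level, assuming pandas and hypothetical relations: the extension of a parfactor's constraint is a table of tuples, and a "split" against evidence is just a semi-join and an anti-join, independent of whichever syntactic constraint language produced the table.

```python
import pandas as pd

# Extension of the constraint attached to a parfactor over logical vars (X, Y).
constraint = pd.DataFrame(
    [("anna", "bob"), ("anna", "carl"), ("bob", "carl"), ("dan", "eve")],
    columns=["X", "Y"])

# Evidence picks out groundings whose potential differs (e.g. observed atoms).
evidence = pd.DataFrame([("anna", "bob"), ("dan", "eve")], columns=["X", "Y"])

# Split = (semi-join, anti-join) of the constraint extension with the evidence.
joined = constraint.merge(evidence, on=["X", "Y"], how="left", indicator=True)
part_in = joined[joined["_merge"] == "both"][["X", "Y"]]
part_out = joined[joined["_merge"] == "left_only"][["X", "Y"]]

print("groundings with evidence:\n", part_in)
print("groundings without evidence:\n", part_out)
```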


2020 ◽  
Author(s):  
Ali Raza ◽  
Arni Sturluson ◽  
Cory Simon ◽  
Xiaoli Fern

Virtual screenings can accelerate and reduce the cost of discovering metal-organic frameworks (MOFs) for their applications in gas storage, separation, and sensing. In molecular simulations of gas adsorption/diffusion in MOFs, the adsorbate-MOF electrostatic interaction is typically modeled by placing partial point charges on the atoms of the MOF. For the virtual screening of large libraries of MOFs, it is critical to develop computationally inexpensive methods to assign atomic partial charges to MOFs that accurately reproduce the electrostatic potential in their pores. Herein, we design and train a message passing neural network (MPNN) to predict the atomic partial charges on MOFs under a charge neutral constraint. A set of ca. 2,250 MOFs labeled with high-fidelity partial charges, derived from periodic electronic structure calculations, serves as training examples. In an end-to-end manner, from charge-labeled crystal graphs representing MOFs, our MPNN machine-learns features of the local bonding environments of the atoms and learns to predict partial atomic charges from these features. Our trained MPNN assigns high-fidelity partial point charges to MOFs with orders of magnitude lower computational cost than electronic structure calculations. To enhance the accuracy of virtual screenings of large libraries of MOFs for their adsorption-based applications, we make our trained MPNN model and MPNN-charge-assigned computation-ready, experimental MOF structures publicly available.
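A minimal sketch, assuming PyTorch, of the two ingredients described above: a small message passing network that produces per-atom charges from a crystal graph, and a charge-neutrality correction that redistributes the excess charge uniformly. The toy graph, feature sizes, and the uniform correction are illustrative assumptions, not the architecture or constraint scheme used in the paper.

```python
import torch
import torch.nn as nn

class TinyChargeMPNN(nn.Module):
    def __init__(self, n_feat=8, n_hidden=16, n_rounds=3):
        super().__init__()
        self.embed = nn.Linear(n_feat, n_hidden)
        self.update = nn.GRUCell(n_hidden, n_hidden)
        self.readout = nn.Linear(n_hidden, 1)
        self.n_rounds = n_rounds

    def forward(self, x, adj):
        h = torch.relu(self.embed(x))
        for _ in range(self.n_rounds):
            msg = adj @ h                    # sum messages from bonded neighbours
            h = self.update(msg, h)          # GRU-style node update
        raw = self.readout(h).squeeze(-1)    # unconstrained per-atom charges
        return raw - raw.mean()              # enforce sum(charges) == 0

# Toy "crystal graph": 5 atoms, random features, symmetric adjacency.
torch.manual_seed(0)
x = torch.randn(5, 8)
adj = torch.tensor([[0, 1, 0, 0, 1], [1, 0, 1, 0, 0], [0, 1, 0, 1, 0],
                    [0, 0, 1, 0, 1], [1, 0, 0, 1, 0]], dtype=torch.float32)
charges = TinyChargeMPNN()(x, adj)
print(charges, charges.sum())   # charges sum to (numerically) zero
```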


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xinyu Li ◽  
Wei Zhang ◽  
Jianming Zhang ◽  
Guang Li

Abstract Background Given expression data, gene regulatory network (GRN) inference approaches try to determine regulatory relations. However, current inference methods ignore the inherent topological characteristics of GRNs to some extent, leading to structures that lack a clear biological explanation. To increase the biophysical meaning of inferred networks, this study performed data-driven module detection before network inference. Gene modules were identified by decomposition-based methods. Results ICA-decomposition-based module detection methods were used to detect functional modules directly from transcriptomic data. Experiments on time-series expression, curated, and scRNA-seq datasets demonstrated the advantages of the proposed ModularBoost method over established methods, especially in efficiency and accuracy. For scRNA-seq datasets, the ModularBoost method outperformed other candidate inference algorithms. Conclusions As a complicated task, GRN inference can be decomposed into several sub-tasks of reduced complexity. Using identified gene modules as topological constraints, the initial inference problem can be solved by inferring intra-modular and inter-modular interactions separately. Experimental outcomes suggest that the proposed ModularBoost method can improve the accuracy and efficiency of inference algorithms by introducing topological constraints.
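A minimal sketch of the two-stage idea, assuming scikit-learn: FastICA loadings assign genes to modules, and regulations are then inferred only within each module with a tree-ensemble regressor. The random forest below is a stand-in for the boosting-based inference step of ModularBoost; the expression matrix and module count are synthetic.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 30))          # 200 cells/samples x 30 genes

# Stage 1: detect gene modules from ICA loadings (gene -> strongest component).
ica = FastICA(n_components=3, random_state=0)
ica.fit(expr)
modules = np.abs(ica.components_).argmax(axis=0)   # one module id per gene

# Stage 2: infer intra-module regulations only (GENIE3-style importances).
edges = []
for m in np.unique(modules):
    genes = np.flatnonzero(modules == m)
    if len(genes) < 2:
        continue
    for target in genes:
        regulators = [g for g in genes if g != target]
        rf = RandomForestRegressor(n_estimators=50, random_state=0)
        rf.fit(expr[:, regulators], expr[:, target])
        for reg, score in zip(regulators, rf.feature_importances_):
            edges.append((reg, target, score))

edges.sort(key=lambda e: -e[2])
print("top intra-module edges:", edges[:5])
```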


2020 ◽  
Author(s):  
Yoonjee Kang ◽  
Denis Thieffry ◽  
Laura Cantini

Networks are powerful tools to represent and investigate biological systems. The development of algorithms inferring regulatory interactions from functional genomics data has been an active area of research. With the advent of single-cell RNA-seq data (scRNA-seq), numerous methods specifically designed to take advantage of single-cell datasets have been proposed. However, published benchmarks on single-cell network inference are mostly based on simulated data. Once applied to real data, these benchmarks take into account only a small set of genes and only compare the inferred networks with an imposed ground truth. Here, we benchmark four single-cell network inference methods based on their reproducibility, i.e., their ability to infer similar networks when applied to two independent datasets for the same biological condition. We tested each of these methods on real data from three biological conditions: human retina, T-cells in colorectal cancer, and human hematopoiesis. GENIE3 proves to be the most reproducible algorithm, independently of the single-cell sequencing platform, the cell type annotation system, the number of cells constituting the dataset, or the thresholding applied to the links of the inferred networks. To ensure the reproducibility of this benchmark study and to ease its extension, we implemented all the analyses in scNET, a Jupyter notebook available at https://github.com/ComputationalSystemsBiology/scNET.
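A minimal sketch of the reproducibility criterion used in this benchmark: infer a network independently on two datasets for the same condition, keep the top-k edges of each ranking, and measure their overlap. The edge lists below are hypothetical stand-ins for the ranked output of any inference method (e.g., GENIE3), and Jaccard overlap is one reasonable choice of agreement score.

```python
def top_k_edges(ranked_edges, k):
    # ranked_edges: list of (regulator, target, score) tuples, any order
    ranked = sorted(ranked_edges, key=lambda e: -e[2])[:k]
    return {(r, t) for r, t, _ in ranked}

def reproducibility(edges_a, edges_b, k=100):
    a, b = top_k_edges(edges_a, k), top_k_edges(edges_b, k)
    return len(a & b) / len(a | b)          # Jaccard index of the two top-k sets

# Hypothetical rankings from two independent datasets of the same condition.
net_rep1 = [("TF1", "G1", 0.9), ("TF1", "G2", 0.7), ("TF2", "G3", 0.4)]
net_rep2 = [("TF1", "G1", 0.8), ("TF2", "G3", 0.6), ("TF3", "G4", 0.5)]
print(reproducibility(net_rep1, net_rep2, k=2))   # 1 shared edge of 3 -> 0.333
```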


2020 ◽  
Vol 11 ◽  
Author(s):  
Shuhei Kimura ◽  
Ryo Fukutomi ◽  
Masato Tokuhisa ◽  
Mariko Okada

Several researchers have focused on random-forest-based inference methods because of their excellent performance. Some of these inference methods also have the useful ability to analyze both time-series and static gene expression data. However, they are only of use in ranking all of the candidate regulations by assigning them confidence values. None have been capable of detecting the regulations that actually affect a gene of interest. In this study, we propose a method to remove unpromising candidate regulations by combining the random-forest-based inference method with a series of feature selection methods. In addition to detecting unpromising regulations, our proposed method uses the outputs of the feature selection methods to adjust the confidence values of all of the candidate regulations computed by the random-forest-based inference method. Numerical experiments showed that the combined application of the feature selection methods improved the performance of the random-forest-based inference method on 99 of the 100 trials performed on the artificial problems. However, the improvement tends to be small, since our combined method succeeded in removing only 19% of the candidate regulations at most. Moreover, the combined application of the feature selection methods increases the computational cost. While a bigger improvement at a lower computational cost would be ideal, we see no impediments to our investigation, given that our aim is to extract as much useful information as possible from a limited amount of gene expression data.
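A minimal sketch of combining random-forest confidences with a feature selection pass, assuming scikit-learn: the forest ranks candidate regulators of one target gene, recursive feature elimination (a stand-in for the series of feature selection methods in the paper) prunes the unpromising ones, and pruned candidates have their confidence set to zero. Data and gene indices are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(1)
expr = rng.normal(size=(150, 10))           # samples x genes
target = 0
regulators = [g for g in range(expr.shape[1]) if g != target]

# Random-forest-based confidences (GENIE3-style importances) for one target.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(expr[:, regulators], expr[:, target])
confidence = dict(zip(regulators, rf.feature_importances_))

# Feature selection pass: keep only the regulators RFE retains.
selector = RFE(RandomForestRegressor(n_estimators=100, random_state=0),
               n_features_to_select=3).fit(expr[:, regulators], expr[:, target])
kept = {int(g) for g in np.array(regulators)[selector.support_]}

# Adjust confidences: pruned candidates are no longer reported as regulations.
adjusted = {g: (c if g in kept else 0.0) for g, c in confidence.items()}
print(sorted(adjusted.items(), key=lambda kv: -kv[1])[:5])
```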


2021 ◽  
Author(s):  
Kevin Greenman ◽  
William Green ◽  
Rafael Gómez-Bombarelli

Optical properties are central to molecular design for many applications, including solar cells and biomedical imaging. A variety of ab initio and statistical methods have been developed for their prediction, each with a trade-off between accuracy, generality, and cost. Existing theoretical methods such as time-dependent density functional theory (TD-DFT) are generalizable across chemical space because of their robust physics-based foundations but still exhibit random and systematic errors with respect to experiment despite their high computational cost. Statistical methods can achieve high accuracy at a lower cost, but data sparsity and unoptimized molecule and solvent representations often limit their ability to generalize. Here, we utilize directed message passing neural networks (D-MPNNs) to represent both dye molecules and solvents for predictions of molecular absorption peaks in solution. Additionally, we demonstrate a multi-fidelity approach based on an auxiliary model trained on over 28,000 TD-DFT calculations that further improves accuracy and generalizability, as shown through rigorous splitting strategies. Combining several openly-available experimental datasets, we benchmark these methods against a state-of-the-art regression tree algorithm and compare the D-MPNN solvent representation to several alternatives. Finally, we explore the interpretability of the learned representations using dimensionality reduction and evaluate the use of ensemble variance as an estimator of the epistemic uncertainty in our predictions of molecular peak absorption in solution. The prediction methods proposed herein can be integrated with active learning, generative modeling, and experimental workflows to enable the more rapid design of molecules with targeted optical properties.
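A minimal sketch of the ensemble-variance uncertainty estimate: train several models that differ only in their random seed and use the spread of their predictions as a proxy for epistemic uncertainty. The scikit-learn regressors and synthetic data stand in for the D-MPNN ensemble used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(200, 1))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=200)
X_test = np.array([[0.0], [2.5], [6.0]])   # last point lies outside the training range

# Ensemble of identically configured models with different random seeds.
ensemble = [MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                         random_state=seed).fit(X_train, y_train)
            for seed in range(5)]
preds = np.stack([m.predict(X_test) for m in ensemble])

print("mean prediction:", preds.mean(axis=0))
# The ensemble standard deviation is typically largest at the out-of-range point.
print("epistemic proxy (ensemble std):", preds.std(axis=0))
```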

