Learning causal biological networks with the principle of Mendelian randomization

Although large amounts of genomic data are available, it remains a challenge to reliably infer causal (i.e., regulatory) relationships among molecular phenotypes (such as gene expression), especially when many phenotypes are involved. We extend the interpretation of the principle of Mendelian randomization (PMR) and present MRPC, a novel machine learning algorithm that incorporates the PMR into classical algorithms for learning causal graphs in computer science. MRPC learns a causal biological network efficiently and robustly by integrating genotype and molecular phenotype data; directed edges in the learned network indicate causal directions. We demonstrate through simulation that MRPC outperforms existing general-purpose network inference methods and other PMR-based methods. We apply MRPC to distinguish direct and indirect targets among multiple genes associated with expression quantitative trait loci.

MRPC: An R Package for Inference of Causal Graphs

Frontiers in Genetics ◽

10.3389/fgene.2021.651812 ◽

2021 ◽

Vol 12 ◽

Author(s):

Md. Bahadur Badsha ◽

Evan A. Martin ◽

Audrey Qiuyan Fu

Keyword(s):

Regulatory Networks ◽

Mendelian Randomization ◽

R Package ◽

General Purpose ◽

Directed Acyclic Graphs ◽

Biomedical Data ◽

Causal Relationships ◽

PC Algorithm ◽

Causal Graphs ◽

Inference Methods

Understanding the causal relationships between variables is a central goal of many scientific inquiries. Causal relationships may be represented by directed edges in a graph (or equivalently, a network). In biology, for example, gene regulatory networks may be viewed as a type of causal network, where X→Y represents gene X regulating (i.e., being causal to) gene Y. However, existing general-purpose graph inference methods often result in a high number of false edges, whereas current causal inference methods developed for observational data in genomics can handle only limited types of causal relationships. We present MRPC (a PC algorithm with the principle of Mendelian Randomization), an R package that learns causal graphs with improved accuracy over existing methods. Our algorithm builds on the powerful PC algorithm (named after its developers Peter Spirtes and Clark Glymour), a canonical algorithm in computer science for learning directed acyclic graphs. The improvements in MRPC result in increased accuracy in identifying v-structures (i.e., X→Y←Z), and robustness to how the nodes are arranged in the input data. In the special case of genomic data that contain genotypes and phenotypes (e.g., gene expression) at the individual level, MRPC incorporates the principle of Mendelian randomization as constraints on edge direction to help orient the edges. MRPC allows for inference of causal graphs not only for general purposes, but also for biomedical data where multiple types of data may be input to provide evidence for causality. The R package is available on CRAN and is free open-source software under a GPL (≥2) license.
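
The conditional-independence reasoning that PC-style algorithms rely on, and that the principle of Mendelian randomization uses to orient edges, can be illustrated with a toy simulation. The sketch below is not the MRPC implementation and does not use the R package; it assumes a linear model with Gaussian noise, and the names `simulate_chain`, `pearson`, and `partial_corr` are hypothetical helpers introduced only for this example. It simulates a genotype V that affects phenotype T1, which in turn affects T2 (V→T1→T2), and shows that V and T2 are correlated marginally but approximately uncorrelated once T1 is conditioned on.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate_chain(n=2000, seed=0):
    """Simulate V -> T1 -> T2 (assumed linear effects, unit Gaussian noise)."""
    rng = random.Random(seed)
    # Genotype coded as 0/1/2 copies of an allele with frequency 0.5.
    v = [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n)]
    t1 = [g + rng.gauss(0, 1) for g in v]
    t2 = [e + rng.gauss(0, 1) for e in t1]
    return v, t1, t2


v, t1, t2 = simulate_chain()
marginal = pearson(v, t2)               # V and T2 are associated...
conditional = partial_corr(v, t2, t1)   # ...but not once T1 is accounted for
print(f"corr(V, T2)      = {marginal:.3f}")
print(f"corr(V, T2 | T1) = {conditional:.3f}")
```

Because the genotype is fixed before the phenotypes are measured, an edge involving V can only point away from V; the vanishing partial correlation is then consistent with T1 mediating the effect of V on T2. A real analysis would replace the informal comparison with a statistical test and a multiple-testing correction.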

Improved High Dimensional Discrete Bayesian Network Inference using Triplet Region Construction

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12198 ◽

2020 ◽

Vol 69 ◽

pp. 231-295

Author(s):

Peng Lin ◽

Martin Neil ◽

Norman Fenton

Keyword(s):

Network Inference ◽

General Purpose ◽

Space Complexity ◽

High Dimensional ◽

Choice Problem ◽

Worst Case ◽

Exact Inference ◽

Tree Width ◽

Inference Methods ◽

Bayesian Network Inference

Performing efficient inference on high dimensional discrete Bayesian Networks (BNs) is challenging. When using exact inference methods the space complexity can grow exponentially with the tree-width, thus making computation intractable. This paper presents a general purpose approximate inference algorithm, based on a new region belief approximation method, called Triplet Region Construction (TRC). TRC reduces the cluster space complexity for factorized models from worst-case exponential to polynomial by performing graph factorization and producing clusters of limited size. Unlike previous generations of region-based algorithms, TRC is guaranteed to converge and effectively addresses the region choice problem that bedevils other region-based algorithms used for BN inference. Our experiments demonstrate that it also achieves significantly more accurate results than competing algorithms.

Gaining confidence in inferred networks

10.1101/2020.09.19.304980 ◽

2020 ◽

Author(s):

Léo P.M. Diaz ◽

Michael P.H. Stumpf

Keyword(s):

Biological Networks ◽

Regulatory Networks ◽

Network Inference ◽

False Negative ◽

Simulated Data ◽

Real Data ◽

Point Interactions ◽

Inference Algorithms ◽

Starting Point ◽

Inference Methods

Network inference is a notoriously challenging problem. Inferred networks are associated with high uncertainty and are likely riddled with false positive and false negative interactions. Especially for biological networks, we do not have good ways of judging the performance of inference methods against real networks, and instead we often rely solely on the performance against simulated data. Gaining confidence in networks inferred from real data therefore requires establishing reliable validation methods. Here, we argue that the expectation of mixing patterns in biological networks such as gene regulatory networks offers a reasonable starting point: interactions are more likely to occur between nodes with similar biological functions. We can quantify this behaviour using the assortativity coefficient, and here we show that the resulting heuristic, functional assortativity, offers a reliable and informative route for comparing different inference algorithms.
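
As a rough illustration of the quantity mentioned above, the following sketch computes an assortativity coefficient for a categorical node attribute on an undirected graph, using the standard mixing-matrix formula r = (Σ e_ii − Σ a_i²) / (1 − Σ a_i²). How the authors assign functional categories to genes is not described in the abstract, so the labels and the function name `attribute_assortativity` are assumptions made only for this example.

```python
from collections import defaultdict


def attribute_assortativity(edges, label):
    """Assortativity of a categorical attribute on an undirected graph.

    edges: iterable of (u, v) node pairs.
    label: dict mapping each node to its category.
    """
    # Count edge ends; each undirected edge contributes in both directions
    # so that the mixing matrix is symmetric.
    counts = defaultdict(float)
    for u, v in edges:
        counts[(label[u], label[v])] += 1.0
        counts[(label[v], label[u])] += 1.0
    total = sum(counts.values())
    if total == 0:
        raise ValueError("graph has no edges")
    cats = {c for pair in counts for c in pair}
    # a[c]: fraction of edge ends attached to nodes of category c.
    a = {c: sum(counts.get((c, d), 0.0) for d in cats) / total for c in cats}
    trace = sum(counts.get((c, c), 0.0) for c in cats) / total
    expected = sum(a[c] ** 2 for c in cats)
    if expected == 1.0:
        raise ValueError("assortativity is undefined with a single category")
    return (trace - expected) / (1.0 - expected)


# Hypothetical labels: two genes per functional category.
functions = {"g1": "metabolism", "g2": "metabolism",
             "g3": "signalling", "g4": "signalling"}
within = attribute_assortativity([("g1", "g2"), ("g3", "g4")], functions)
between = attribute_assortativity([("g1", "g3"), ("g2", "g4")], functions)
print(within, between)
```

A network whose edges connect only genes of the same category scores at the top of the range, while one whose edges connect only genes of different categories scores at the bottom, which is what makes the coefficient usable as a heuristic for comparing inferred networks.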

ModularBoost: an efficient network inference algorithm based on module decomposition

BMC Bioinformatics ◽

10.1186/s12859-021-04074-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Xinyu Li ◽

Wei Zhang ◽

Jianming Zhang ◽

Guang Li

Keyword(s):

Network Inference ◽

Detection Methods ◽

Inference Problem ◽

Topological Constraints ◽

Inference Algorithms ◽

Module Detection ◽

Series Expression ◽

Gene Modules ◽

Inference Methods ◽

Complicated Task

Background: Given expression data, gene regulatory network (GRN) inference approaches try to determine regulatory relations. However, current inference methods ignore the inherent topological characteristics of GRNs to some extent, leading to structures that lack a clear biological explanation. To increase the biophysical meaning of inferred networks, this study performed data-driven module detection before network inference. Gene modules were identified by decomposition-based methods. Results: ICA-decomposition-based module detection methods have been used to detect functional modules directly from transcriptomic data. Experiments on time-series expression, curated, and scRNA-seq datasets demonstrated the advantages of the proposed ModularBoost method over established methods, especially in efficiency and accuracy. For scRNA-seq datasets, the ModularBoost method outperformed other candidate inference algorithms. Conclusions: As a complicated task, GRN inference can be decomposed into several tasks of reduced complexity. Using identified gene modules as topological constraints, the initial inference problem can be solved by inferring intra-modular and inter-modular interactions separately. Experimental outcomes suggest that the proposed ModularBoost method can improve the accuracy and efficiency of inference algorithms by introducing topological constraints.

Evaluating the reproducibility of single-cell gene regulatory network inference algorithms

10.1101/2020.11.10.375923 ◽

2020 ◽

Author(s):

Yoonjee Kang ◽

Denis Thieffry ◽

Laura Cantini

Keyword(s):

Single Cell ◽

Network Inference ◽

Simulated Data ◽

Ground Truth ◽

Real Data ◽

Gene Regulatory Network Inference ◽

Sequencing Platform ◽

Cell Network ◽

Inference Algorithms ◽

Inference Methods

Networks are powerful tools to represent and investigate biological systems. The development of algorithms inferring regulatory interactions from functional genomics data has been an active area of research. With the advent of single-cell RNA-seq data (scRNA-seq), numerous methods specifically designed to take advantage of single-cell datasets have been proposed. However, published benchmarks on single-cell network inference are mostly based on simulated data. Once applied to real data, these benchmarks take into account only a small set of genes and only compare the inferred networks with an imposed ground truth. Here, we benchmark four single-cell network inference methods based on their reproducibility, i.e. their ability to infer similar networks when applied to two independent datasets for the same biological condition. We tested each of these methods on real data from three biological conditions: human retina, T-cells in colorectal cancer, and human hematopoiesis. GENIE3 proves to be the most reproducible algorithm, independently of the single-cell sequencing platform, the cell type annotation system, the number of cells constituting the dataset, or the thresholding applied to the links of the inferred networks. To ensure the reproducibility of this benchmark study and to ease its extension, we implemented all the analyses in scNET, a Jupyter notebook available at https://github.com/ComputationalSystemsBiology/scNET.
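
One simple way to make "inferring similar networks from two independent datasets" concrete is to compare the top-scoring edges from each run. The sketch below is an assumption for illustration only: the abstract does not state which similarity measure the benchmark uses, and the Jaccard index, the function names, and the gene names here are all hypothetical choices.

```python
def top_k_edges(scores, k):
    """Return the k highest-scoring edges (ties broken by edge name)."""
    ranked = sorted(scores, key=lambda edge: (-scores[edge], edge))
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def reproducibility(scores_a, scores_b, k):
    """Overlap of the top-k edges inferred from two independent datasets."""
    return jaccard(top_k_edges(scores_a, k), top_k_edges(scores_b, k))


# Hypothetical edge weights, (regulator, target) -> score, from two datasets
# covering the same biological condition.
scores_a = {("TF1", "G1"): 0.9, ("TF1", "G2"): 0.8, ("TF2", "G1"): 0.1}
scores_b = {("TF1", "G1"): 0.7, ("TF2", "G1"): 0.6, ("TF1", "G2"): 0.2}
result = reproducibility(scores_a, scores_b, k=2)
print(result)
```

Varying `k` corresponds to changing the threshold applied to the links of the inferred networks, one of the factors the study reports as not affecting the ranking of methods.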

BRANE Cut: Biologically-Related A priori Network Enhancement with Graph cuts for Gene Regulatory Network Inference

10.1101/032383 ◽

2015 ◽

Author(s):

Aurélie Pirayre ◽

Camille Couprie ◽

Frédérique Bidard ◽

Laurent Duval ◽

Jean-Christophe Pesquet

Keyword(s):

Gene Regulatory Network ◽

Regulatory Network ◽

Gene Networks ◽

Network Inference ◽

State Of The Art ◽

A Priori ◽

Graph Cuts ◽

Gene Regulatory Network Inference ◽

Gene Regulatory ◽

Inference Methods

Background: Inferring gene networks from high-throughput data constitutes an important step in the discovery of relevant regulatory relationships in organism cells. Despite the large number of available Gene Regulatory Network inference methods, the problem remains challenging: the underdetermination in the space of possible solutions requires additional constraints that incorporate a priori information on gene interactions. Methods: Weighting all possible pairwise gene relationships by a probability of edge presence, we formulate the regulatory network inference as a discrete variational problem on graphs. We enforce biologically plausible coupling between groups and types of genes by minimizing an edge labeling functional coding for a priori structures. The optimization is carried out with Graph cuts, an approach popular in image processing and computer vision. We compare the inferred regulatory networks to results achieved by the mutual-information-based Context Likelihood of Relatedness (CLR) method and by the state-of-the-art GENIE3, winner of the DREAM4 multifactorial challenge. Results: Our BRANE Cut approach infers the five DREAM4 in silico networks more accurately (with improvements from 6% to 11%). On a real Escherichia coli compendium, an improvement of 11.8% compared to CLR and 3% compared to GENIE3 is obtained in terms of Area Under Precision-Recall curve. Up to 48 additional verified interactions are obtained over GENIE3 for a given precision. On this dataset involving 4345 genes, our method achieves a performance similar to that of GENIE3, while being more than seven times faster. The BRANE Cut code is available at: http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-cut.html Conclusions: BRANE Cut is a weighted graph thresholding method. Using biologically sound penalties and data-driven parameters, it improves three state-of-the-art GRN inference methods. It is applicable as a generic network inference post-processing step, due to its computational efficiency.

SimiC: A Single Cell Gene Regulatory Network Inference method with Similarity Constraints

10.1101/2020.04.03.023002 ◽

2020 ◽

Author(s):

Jianhao Peng ◽

Ullas V. Chembazhi ◽

Sushant Bangru ◽

Ian M. Traniello ◽

Auinash Kalsotra ◽

...

Keyword(s):

Single Cell ◽

Network Inference ◽

Regional Analysis ◽

Supplementary Information ◽

Inference Method ◽

Gene Regulatory Network Inference ◽

Inference Problem ◽

Cell State ◽

Gene Regulatory ◽

Inference Methods

Motivation: With the use of single-cell RNA sequencing (scRNA-Seq) technologies, it is now possible to acquire gene expression data for each individual cell in samples containing up to millions of cells. These cells can be further grouped into different states along an inferred cell differentiation path, which are potentially characterized by similar, but distinct enough, gene regulatory networks (GRNs). Hence, it would be desirable for scRNA-Seq GRN inference methods to capture the GRN dynamics across cell states. However, current GRN inference methods produce a unique GRN per input dataset (or independent GRNs per cell state), failing to capture these regulatory dynamics. Results: We propose a novel single-cell GRN inference method, named SimiC, that jointly infers the GRNs corresponding to each state. SimiC models the GRN inference problem as a LASSO optimization problem with an added similarity constraint on the GRNs associated with contiguous cell states, which captures the inter-cell-state homogeneity. We show on mouse hepatocyte single-cell data generated after partial hepatectomy that, contrary to previous GRN methods for scRNA-Seq data, SimiC is able to capture the transcription factor (TF) dynamics across liver regeneration, as well as the cell-level behavior for the regulatory program of each TF across cell states. In addition, on a honey bee scRNA-Seq experiment, SimiC is able to capture the increased heterogeneity of cells in whole-brain tissue with respect to regionally dissected tissue, and the TFs associated specifically with each sequenced tissue. Availability: SimiC is written in Python and includes an R API. It can be downloaded from https://github.com/jianhao2016/[email protected], [email protected] Supplementary information: Supplementary data are available at the code repository.
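
The abstract describes SimiC as a LASSO problem with an added similarity constraint on the GRNs of contiguous cell states. One plausible way to write such an objective is sketched below. The notation and the exact form of the penalties are assumptions made for illustration and are not taken from the paper: for cell states k = 1, ..., K ordered along the differentiation path, X_k holds TF expression and Y_k holds target-gene expression for the n_k cells in state k, W_k is the matrix of regulatory weights for that state, and λ_1, λ_2 ≥ 0 are tuning parameters.

```latex
\min_{W_1,\dots,W_K}\;
  \sum_{k=1}^{K} \frac{1}{n_k}\,\bigl\lVert Y_k - X_k W_k \bigr\rVert_F^2
  \;+\; \lambda_1 \sum_{k=1}^{K} \bigl\lVert W_k \bigr\rVert_1
  \;+\; \lambda_2 \sum_{k=1}^{K-1} \bigl\lVert W_k - W_{k+1} \bigr\rVert_F^2
```

Under this reading, the first term fits each state's expression data, the second is the usual LASSO sparsity penalty, and the third ties together the networks of neighbouring states. Setting λ_2 = 0 would recover independent per-state LASSO fits, which corresponds to the "independent GRNs per cell state" baseline the abstract contrasts against.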

A Transfer Learning Approach and Selective Integration of Multiple Types of Assays for Biological Network Inference

Computational Knowledge Discovery for Bioinformatics Research ◽

10.4018/978-1-4666-1785-8.ch011 ◽

2013 ◽

pp. 188-202

Author(s):

Tsuyoshi Kato ◽

Kinya Okada ◽

Hisashi Kashima ◽

Masashi Sugiyama

Keyword(s):

Metabolic Network ◽

Transfer Learning ◽

Protein Interaction ◽

Biological Networks ◽

Protein Interaction Network ◽

Network Inference ◽

Interaction Network ◽

Statistical Test ◽

Learning Approach ◽

Selective Integration

The authors’ algorithm was evaluated favorably on two kinds of biological networks: a metabolic network and a protein interaction network. A statistical test confirmed that the weight the algorithm assigned to each assay was meaningful.

A Framework for an Artificial-Neural-Network-Based Electronic Nose

10.4018/978-1-6684-2408-7.ch017 ◽

2022 ◽

pp. 350-374

Author(s):

Mudassir Ismail ◽

Ahmed Abdul Majeed ◽

Yousif Abdullatif Albastaki

Keyword(s):

Neural Network ◽

Machine Learning ◽

Artificial Neural Network ◽

Electronic Nose ◽

Learning Algorithm ◽

General Purpose ◽

Food Spoilage ◽

Standard Data ◽

Semiconductor Gas Sensor ◽

Artificial Neural

Machine odor detection has become an important part of our lives, with a wide range of applications. From detecting food spoilage to diagnosing diseases, it has been developed and tested in various fields and industries for specific purposes. This project, artificial-neural-network-based electronic nose (ANNeNose), is a machine-learning-based e-nose system developed for general-purpose detection of various types of odors. The system can be trained on any odor using various e-nose sensor types. It uses an artificial neural network as its machine learning algorithm along with an OMX-GR semiconductor gas sensor for collecting odor data. The system was trained and tested with five different types of odors, collected through a standard data collection method and then purified, and achieved accuracy ranging from 93% to 100%.

A portable method for acquiring information extraction patterns without annotated corpora

Natural Language Engineering ◽

10.1017/s1351324902003042 ◽

2003 ◽

Vol 9 (2) ◽

pp. 151-179 ◽

Cited By ~ 2

Author(s):

NEUS CATALÀ ◽

NÚRIA CASTELL ◽

MARIO MARTÍN

Keyword(s):

Information Extraction ◽

Learning Algorithm ◽

Relevant Information ◽

General Purpose ◽

Human Intervention ◽

Lexical Knowledge ◽

Distinctive Features ◽

Domain Specific ◽

Building Information ◽

Automatic Acquisition

The main issue when building Information Extraction (IE) systems is how to obtain the knowledge needed to identify relevant information in a document. Most approaches require expert human intervention in many steps of the acquisition process. In this paper we describe ESSENCE, a new method for acquiring IE patterns that significantly reduces the need for human intervention. The method is based on ELA, a specifically designed learning algorithm for acquiring IE patterns without tagged examples. The distinctive features of ESSENCE and ELA are that (1) they permit the automatic acquisition of IE patterns from unrestricted and untagged text representative of the domain, due to (2) their ability to identify regularities around semantically relevant concept-words for the IE task by (3) using non-domain-specific lexical knowledge tools such as WordNet, and (4) restricting the human intervention to defining the task, and validating and typifying the set of IE patterns obtained. Since ESSENCE does not require a corpus annotated with the type of information to be extracted and it uses a general purpose ontology and widely applied syntactic tools, it reduces the expert effort required to build an IE system and therefore also reduces the effort of porting the method to any domain. The results of the application of ESSENCE to the acquisition of IE patterns in an MUC-like task are shown.
