BENIN: Biologically enhanced network inference

2020 ◽  
Vol 18 (03) ◽  
pp. 2040007
Author(s):  
Stephanie Kamgnia Wonkap ◽  
Gregory Butler

Gene regulatory network inference is one of the central problems in computational biology. We need models that integrate the variety of data available in order to use their complementary information to overcome the issues of noisy and limited data. BENIN: Biologically Enhanced Network INference is our proposal to integrate data and infer more accurate networks. BENIN is a general framework that jointly considers different types of prior knowledge with expression datasets to improve the network inference. The method frames network inference as a feature selection problem and solves it with a popular penalized regression method, the Elastic net, combined with bootstrap resampling. BENIN significantly outperforms the state-of-the-art methods on the simulated data from the DREAM 4 challenge when combining genome-wide location data, knockout gene expression data, and time series expression data.
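
The idea of treating network inference as feature selection with a bootstrapped Elastic net can be illustrated with a minimal sketch. This is not the BENIN implementation: the function names, the hand-written coordinate-descent solver, the penalty settings, and the synthetic data are all assumptions made for illustration, and the integration of prior knowledge described in the abstract is omitted. For one target gene, candidate regulators are scored by how often they receive a nonzero coefficient across bootstrap resamples.

```python
import numpy as np

def elastic_net_cd(X, y, alpha=0.1, l1_ratio=0.5, n_iter=100):
    """Elastic net by coordinate descent (no intercept). Minimizes
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * w[j]  # remove feature j's current contribution
            rho = X[:, j] @ resid / n
            shrunk = np.sign(rho) * max(abs(rho) - alpha * l1_ratio, 0.0)
            w[j] = shrunk / (col_sq[j] + alpha * (1.0 - l1_ratio))
            resid -= X[:, j] * w[j]  # add the updated contribution back
    return w

def bootstrap_selection_frequency(X, y, n_boot=30, seed=0, alpha=0.1, l1_ratio=0.5):
    """Fraction of bootstrap resamples in which each candidate regulator
    (column of X) gets a nonzero coefficient for the target gene y."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        Xb = X[idx]
        Xb = (Xb - Xb.mean(axis=0)) / (Xb.std(axis=0) + 1e-12)
        yb = y[idx] - y[idx].mean()
        counts += (elastic_net_cd(Xb, yb, alpha, l1_ratio) != 0)
    return counts / n_boot

# Synthetic example (hypothetical): regulators 0 and 3 drive the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * rng.normal(size=100)
freq = bootstrap_selection_frequency(X, y)
```

Repeating this for every gene as the target and keeping the regulators with high selection frequency gives a ranked edge list; how the frequencies are thresholded or combined with prior knowledge is a design choice this sketch leaves open.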

2020 ◽  
Author(s):  
Yoonjee Kang ◽  
Denis Thieffry ◽  
Laura Cantini

Networks are powerful tools to represent and investigate biological systems. The development of algorithms inferring regulatory interactions from functional genomics data has been an active area of research. With the advent of single-cell RNA-seq data (scRNA-seq), numerous methods specifically designed to take advantage of single-cell datasets have been proposed. However, published benchmarks on single-cell network inference are mostly based on simulated data. Once applied to real data, these benchmarks take into account only a small set of genes and only compare the inferred networks with an imposed ground-truth. Here, we benchmark four single-cell network inference methods based on their reproducibility, i.e. their ability to infer similar networks when applied to two independent datasets for the same biological condition. We tested each of these methods on real data from three biological conditions: human retina, T-cells in colorectal cancer, and human hematopoiesis. GENIE3 proves to be the most reproducible algorithm, independently of the single-cell sequencing platform, the cell type annotation system, the number of cells constituting the dataset, or the thresholding applied to the links of the inferred networks. In order to ensure the reproducibility and ease the extension of this benchmark study, we implemented all the analyses in scNET, a Jupyter notebook available at https://github.com/ComputationalSystemsBiology/scNET.


2014 ◽  
Vol 989-994 ◽  
pp. 2426-2430
Author(s):  
Zhi Hui Zhou ◽  
Gui Xia Liu ◽  
Ling Tao Su ◽  
Liang Han ◽  
Lun Yan

Extensive studies have shown that many complex diseases are influenced by interactions among certain genes. Because logistic regression (LR) has limitations and drawbacks for detecting epistasis in human genome-wide association studies (GWAS), we propose a new method, the LASSO-penalized-model search algorithm (LPMA). It restricts the model to a tuning constant, combines it with a penalty on the L1-norm of the complexity parameter, and is implemented using a multi-step strategy. LASSO penalized regression is particularly advantageous when the number of factors far exceeds the number of samples. We compare the performance of LPMA with that of its competitors. In experiments on simulated data, LPMA performs better in both the identification of epistasis and prediction accuracy.


Author(s):  
Gourab Ghosh Roy ◽  
Nicholas Geard ◽  
Karin Verspoor ◽  
Shan He

Motivation: Inferring gene regulatory networks (GRNs) from expression data is a significant systems biology problem. A useful inference algorithm should not only unveil the global structure of the regulatory mechanisms but also the details of regulatory interactions such as edge direction (from regulator to target) and sign (activation/inhibition). Many popular GRN inference algorithms cannot infer edge signs, and those that can infer signed GRNs cannot simultaneously infer edge directions or network cycles. Results: To address these limitations of existing algorithms, we propose Polynomial Lasso Bagging (PoLoBag) for signed GRN inference with both edge directions and network cycles. PoLoBag is an ensemble regression algorithm in a bagging framework where Lasso weights estimated on bootstrap samples are averaged. These bootstrap samples incorporate polynomial features to capture higher-order interactions. Results demonstrate that PoLoBag is consistently more accurate for signed inference than state-of-the-art algorithms on simulated and real-world expression datasets. Availability and implementation: Algorithm and data are freely available at https://github.com/gourabghoshroy/PoLoBag. Supplementary information: Supplementary data are available at Bioinformatics online.
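
A rough sketch of the bagging idea, under stated assumptions: Lasso weights are estimated on bootstrap samples whose feature matrix is extended with polynomial terms, and the weights on the linear terms are averaged so that their sign suggests activation or inhibition. The pairwise-product features, the ISTA solver, the penalty value, and all function names here are illustrative stand-ins; PoLoBag's actual feature construction and weighting may differ.

```python
import numpy as np
from itertools import combinations

def lasso_ista(X, y, lam=0.05, n_iter=300):
    """Lasso via proximal gradient (ISTA): (1/2n)||y - Xw||^2 + lam*||w||_1."""
    n = X.shape[0]
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = w - step * (X.T @ (X @ w - y) / n)
        w = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return w

def with_pairwise_products(X):
    """Append all pairwise products x_i * x_j (i < j) to the linear features."""
    pairs = list(combinations(range(X.shape[1]), 2))
    prods = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    return np.hstack([X, prods])

def bagged_signed_weights(X, y, n_boot=20, seed=0, lam=0.05):
    """Average, over bootstrap samples, the Lasso weight of each linear feature."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    total = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        F = with_pairwise_products(X[idx])
        F = (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-12)
        yb = y[idx] - y[idx].mean()
        total += lasso_ista(F, yb, lam)[:p]  # keep only the linear-term weights
    return total / n_boot

# Synthetic example (hypothetical): gene 0 activates, gene 2 inhibits the target.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)
avg_w = bagged_signed_weights(X, y)
```

In this toy setting the averaged weight is positive for the activator and negative for the inhibitor, which is the kind of signed, directed evidence (regulator to target) the abstract describes.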


2017 ◽  
Author(s):  
Ling-Hong Hung ◽  
Kaiyuan Shi ◽  
Migao Wu ◽  
William Chad Young ◽  
Adrian E. Raftery ◽  
...  

Background: Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a novel and computationally efficient method for eliminating redundant indirect edges in the network. Findings: We evaluated the performance of fastBMA on synthetic data and experimental genome-wide yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel and distributed application that scales to human genome-wide expression data. A 10,000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster. Conclusions: fastBMA is a significant improvement over its predecessor ScanBMA. It is orders of magnitude faster and more accurate than other fast network inference methods such as LASSO. The improved scalability allows it to calculate networks from genome-scale data in a reasonable timeframe. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (MIT license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/).


2016 ◽  
Author(s):  
Jigar S. Desai ◽  
Ryan C. Sartor ◽  
Lovely Mae Lawas ◽  
SV Krishna Jagadish ◽  
Colleen J. Doherty

Organisms respond to changes in their environment through transcriptional regulatory networks (TRNs). The regulatory hierarchy of these networks can be inferred from expression data. Computational approaches to identify TRNs can be applied in any species where quality RNA can be acquired. However, ChIP-Seq and similar validation methods are challenging to employ in non-model species. Improving the accuracy of computational inference methods can significantly reduce the cost and time of subsequent validation experiments. We have developed ExRANGES, an approach that improves the ability to computationally infer TRNs from time series expression data. ExRANGES utilizes both the rate of change in expression and the absolute expression level to identify TRN connections. We evaluated ExRANGES in five data sets from different model systems. ExRANGES improved the identification of experimentally validated transcription factor targets for all species tested, even in unevenly spaced and sparse data sets. This improved ability to predict known regulator-target relationships enhances the utility of network inference approaches in non-model species where experimental validation is challenging. We integrated ExRANGES with two different network construction approaches, and it has been implemented as an R package available here: http://github.com/DohertyLab/ExRANGES. To install the package, type: devtools::install_github("DohertyLab/ExRANGES")
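
To make "rate of change in expression" concrete for unevenly spaced time series, a minimal sketch follows. It only computes per-interval slopes by finite differences; it does not reproduce how ExRANGES weights or combines the rate of change with the absolute expression level. The function name and the toy data are assumptions.

```python
import numpy as np

def rate_of_change(expr, times):
    """Slope of expression between consecutive time points.

    expr  : array of shape (genes, timepoints)
    times : array of shape (timepoints,), strictly increasing, possibly uneven
    Returns an array of shape (genes, timepoints - 1).
    """
    expr = np.asarray(expr, dtype=float)
    times = np.asarray(times, dtype=float)
    # Dividing by the interval length keeps slopes comparable across uneven gaps.
    return np.diff(expr, axis=1) / np.diff(times)

# Hypothetical gene sampled at uneven times 0, 2, 3 and 6 hours.
slopes = rate_of_change([[1.0, 3.0, 3.0, 9.0]], [0.0, 2.0, 3.0, 6.0])
```

Dividing by the interval length is what makes the measure usable on unevenly spaced data: a jump of 6 units over 3 hours and a jump of 2 units over 1 hour yield the same slope.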


2019 ◽  
Author(s):  
Atul Deshpande ◽  
Li-Fang Chu ◽  
Ron Stewart ◽  
Anthony Gitter

Advances in single-cell transcriptomics enable measuring the gene expression of individual cells, allowing cells to be ordered by their state in a dynamic biological process. Many algorithms assign ‘pseudotimes’ to each cell, representing the progress along the biological process. Ordering the expression data according to such pseudotimes can be valuable for understanding the underlying regulator-gene interactions in a biological process, such as differentiation. However, the distribution of cells sampled along a transitional process, and hence that of the pseudotimes assigned to them, is not uniform. This prevents using many standard mathematical methods for analyzing the ordered gene expression states. We present Single-Cell Inference of Networks using Granger Ensembles (SCINGE), an algorithm for gene regulatory network inference from single-cell gene expression data. Given ordered single-cell data, SCINGE uses kernel-based Granger Causality regression, which smooths the irregular pseudotimes and missing expression values. It then aggregates the predictions from an ensemble of regression analyses with a modified Borda method to compile a ranked list of candidate interactions between transcriptional regulators and their target genes. In two mouse embryonic stem cell differentiation case studies, SCINGE outperforms other contemporary algorithms for gene network reconstruction. However, a more detailed examination reveals caveats about transcriptional network reconstruction with single-cell RNA-seq data. Network inference methods, including SCINGE, may have near random performance for predicting the targets of many individual regulators even if the aggregate performance is good. In addition, in some cases including cells’ pseudotime values can hurt the performance of network reconstruction methods. A MATLAB implementation of SCINGE is available at https://github.com/gitter-lab/SCINGE.
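
The aggregation step can be illustrated with a plain Borda count over the ranked edge lists produced by an ensemble. This sketch shows only the standard Borda rule; the specific modification used by SCINGE is not reproduced, and the function name and example edges are hypothetical.

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Combine ranked lists of (regulator, target) edges with a plain Borda count.

    Each list awards len(list) points to its top edge, one fewer to the next,
    and so on. Edges are returned from highest to lowest total score, with
    ties broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, edge in enumerate(ranking):
            scores[edge] += n - position
    return sorted(scores, key=lambda edge: (-scores[edge], edge))

# Three hypothetical ensemble members ranking the same candidate edges.
rankings = [
    [("G1", "G2"), ("G1", "G3"), ("G2", "G3")],
    [("G1", "G3"), ("G1", "G2"), ("G2", "G3")],
    [("G1", "G2"), ("G2", "G3"), ("G1", "G3")],
]
consensus = borda_aggregate(rankings)
```

Here ("G1", "G2") collects 8 points, ("G1", "G3") 6 and ("G2", "G3") 4, so the consensus ranking follows that order.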


2021 ◽  
Vol 12 ◽  
Author(s):  
Yoonjee Kang ◽  
Denis Thieffry ◽  
Laura Cantini

Networks are powerful tools to represent and investigate biological systems. The development of algorithms inferring regulatory interactions from functional genomics data has been an active area of research. With the advent of single-cell RNA-seq data (scRNA-seq), numerous methods specifically designed to take advantage of single-cell datasets have been proposed. However, published benchmarks on single-cell network inference are mostly based on simulated data. Once applied to real data, these benchmarks take into account only a small set of genes and only compare the inferred networks with an imposed ground-truth. Here, we benchmark six single-cell network inference methods based on their reproducibility, i.e., their ability to infer similar networks when applied to two independent datasets for the same biological condition. We tested each of these methods on real data from three biological conditions: human retina, T-cells in colorectal cancer, and human hematopoiesis. When networks with up to 100,000 links are considered, GENIE3 proves to be the most reproducible algorithm and, together with GRNBoost2, shows higher intersection with ground-truth biological interactions. These results are independent of the single-cell sequencing platform, the cell type annotation system and the number of cells constituting the dataset. Finally, GRNBoost2 and CLR show more reproducible performance once a more stringent thresholding is applied to the networks (1,000–100 links). In order to ensure the reproducibility and ease the extension of this benchmark study, we implemented all the analyses in scNET, a Jupyter notebook available at https://github.com/ComputationalSystemsBiology/scNET.
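
Reproducibility in the sense described above, i.e. inferring similar networks from two independent datasets, can be sketched as the overlap between the top-k links of two inferred networks. The two measures below (fraction of shared links and Jaccard index) are one plausible choice and not necessarily the metrics used in scNET; the function names and toy scores are assumptions.

```python
def top_links(scores, k):
    """Return the k highest-scoring links from a {(regulator, target): score} dict."""
    ranked = sorted(scores, key=lambda link: (-scores[link], link))
    return set(ranked[:k])

def reproducibility(scores_a, scores_b, k):
    """Overlap between the top-k links inferred from two independent datasets.

    Assumes both networks contain at least k links.
    """
    a, b = top_links(scores_a, k), top_links(scores_b, k)
    shared = len(a & b)
    union = len(a | b)
    return {"overlap": shared / k, "jaccard": shared / union if union else 0.0}

# Hypothetical link scores from the same method run on two datasets.
net_a = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.1}
net_b = {("A", "B"): 0.7, ("B", "C"): 0.6, ("A", "C"): 0.2}
result = reproducibility(net_a, net_b, k=2)
```

Varying k corresponds to the thresholding discussed in the abstract: a stringent threshold keeps only a few hundred links, a permissive one up to 100,000.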

