Classification in biological networks with hypergraphlet kernels

Bioinformatics ◽

10.1093/bioinformatics/btaa768 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jose Lugo-Martinez ◽

Daniel Zeiberg ◽

Thomas Gaudelet ◽

Noël Malod-Dognin ◽

Natasa Przulj ◽

...

Keyword(s):

Biological Networks ◽

Kernel Method ◽

Information Loss ◽

Cellular Systems ◽

Supplementary Information ◽

Edge Classification ◽

Vertex Classification ◽

Prediction Problems ◽

Potential Use ◽

Modeling Physical Systems

Abstract Motivation Biological and cellular systems are often modeled as graphs in which vertices represent objects of interest (genes, proteins and drugs) and edges represent relational ties between these objects (binds-to, interacts-with and regulates). This approach has been highly successful owing to the theory, methodology and software that support analysis and learning on graphs. Graphs, however, suffer from information loss when modeling physical systems due to their inability to accurately represent multiobject relationships. Hypergraphs, a generalization of graphs, provide a framework to mitigate information loss and unify disparate graph-based methodologies. Results We present a hypergraph-based approach for modeling biological systems and formulate vertex classification, edge classification and link prediction problems on (hyper)graphs as instances of vertex classification on (extended, dual) hypergraphs. We then introduce a novel kernel method on vertex- and edge-labeled (colored) hypergraphs for analysis and learning. The method is based on exact and inexact (via hypergraph edit distances) enumeration of hypergraphlets; i.e. small hypergraphs rooted at a vertex of interest. We empirically evaluate this method on fifteen biological networks and show its potential use in a positive-unlabeled setting to estimate the interactome sizes in various species. Availability and implementation https://github.com/jlugomar/hypergraphlet-kernels Supplementary information Supplementary data are available at Bioinformatics online.
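The reduction of edge classification to vertex classification on a dual hypergraph can be illustrated with a minimal sketch. It assumes a hypergraph stored as a dictionary from hyperedge id to its member vertices; the function name `dual_hypergraph` and the toy data are hypothetical and are not taken from the hypergraphlet-kernels repository.

```python
from collections import defaultdict


def dual_hypergraph(hyperedges):
    """Build the dual of a hypergraph.

    `hyperedges` maps each hyperedge id to the set of vertices it contains.
    In the dual, every original hyperedge becomes a vertex and every original
    vertex becomes a hyperedge holding the ids of the hyperedges incident to it.
    """
    dual = defaultdict(set)
    for edge_id, members in hyperedges.items():
        for vertex in members:
            dual[vertex].add(edge_id)
    return dict(dual)


# Toy example (illustrative only): two protein complexes sharing protein B.
complexes = {"e1": {"A", "B", "C"}, "e2": {"B", "D"}}
dual = dual_hypergraph(complexes)
print(dual)
```

Labeling the hyperedges `e1` and `e2` of the original hypergraph then amounts to labeling vertices of the dual, which is the sense in which edge classification becomes vertex classification.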

SMILE: Mutual Information Learning for Integration of Single-cell Omics Data

Bioinformatics ◽

10.1093/bioinformatics/btab706 ◽

2021 ◽

Author(s):

Yang Xu ◽

Priyojit Das ◽

Rachel Patton McCord

Keyword(s):

Deep Learning ◽

Mutual Information ◽

Single Cell ◽

Learning Algorithm ◽

Cellular Systems ◽

Supplementary Information ◽

Omics Data ◽

Learning Approaches ◽

Rna Seq ◽

Integrate Data

Abstract Motivation Deep learning approaches have empowered single-cell omics data analysis in many ways and generated new insights from complex cellular systems. As there is an increasing need for single-cell omics data to be integrated across sources, types, and features of data, the challenges of integrating single-cell omics data are rising. Here, we present an unsupervised deep learning algorithm that learns discriminative representations for single-cell data via maximizing mutual information, SMILE (Single-cell Mutual Information Learning). Results Using a unique cell-pairing design, SMILE successfully integrates multi-source single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into the shared space. SMILE can also integrate data from two or more modalities, such as joint profiling technologies using single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C, and ChIP data. When paired cells are known, SMILE can integrate data with unmatched features, such as genes for RNA-seq and genome-wide peaks for ATAC-seq. Integrated representations learned from joint profiling technologies can then be used as a framework for comparing independent single-source data. Supplementary information Supplementary data are available at Bioinformatics online. The source code of SMILE, including analyses of key results in the study, can be found at: https://github.com/rpmccordlab/SMILE.

SigHotSpotter: scRNA-seq-based computational tool to control cell subpopulation phenotypes for cellular rejuvenation strategies

Bioinformatics ◽

10.1093/bioinformatics/btz827 ◽

2019 ◽

Cited By ~ 2

Author(s):

Srikanth Ravichandran ◽

András Hartmann ◽

Antonio del Sol

Keyword(s):

Single Cell ◽

Control Cell ◽

General Purpose ◽

Cellular Systems ◽

Supplementary Information ◽

Cell Subpopulation ◽

Transcriptional Networks ◽

Data Generation ◽

Computational Tool ◽

Precise Control

Abstract Summary Single-cell RNA-sequencing is increasingly employed to characterize disease or ageing cell subpopulation phenotypes. Despite the exponential increase in data generation, systematic identification of key regulatory factors for controlling cellular phenotype to enable cell rejuvenation in disease or ageing remains a challenge. Here, we present SigHotSpotter, a computational tool to predict hotspots of signaling pathways responsible for the stable maintenance of cell subpopulation phenotypes, by integrating signaling and transcriptional networks. Targeted perturbation of these signaling hotspots can enable precise control of cell subpopulation phenotypes. SigHotSpotter correctly predicts the signaling hotspots with known experimental validations in different cellular systems. The tool is simple and user-friendly, and is available as a web server or as stand-alone software. We believe SigHotSpotter will serve as a general-purpose tool for the systematic prediction of signaling hotspots based on single-cell RNA-seq data, and potentiate novel cell rejuvenation strategies in the context of disease and ageing. Availability and implementation SigHotSpotter is available at https://SigHotSpotter.lcsb.uni.lu as a web tool. Source code, example datasets and other information are available at https://gitlab.com/srikanth.ravichandran/sighotspotter. Supplementary information Supplementary data are available at Bioinformatics online.

RMTL: an R library for multi-task learning

Bioinformatics ◽

10.1093/bioinformatics/bty831 ◽

2018 ◽

Vol 35 (10) ◽

pp. 1797-1798 ◽

Cited By ~ 2

Author(s):

Han Cao ◽

Jiayu Zhou ◽

Emanuel Schwarz

Keyword(s):

Biological Networks ◽

Simulated Data ◽

R Package ◽

Low Rank ◽

Supplementary Information ◽

Supplementary Data ◽

Software Environment ◽

Machine Learning Technique ◽

Task Learning ◽

Learning Technique

Abstract Motivation Multi-task learning (MTL) is a machine learning technique for simultaneous learning of multiple related classification or regression tasks. Despite its increasing popularity, MTL algorithms are currently not available in the widely used software environment R, creating a bottleneck for their application in biomedical research. Results We developed an efficient, easy-to-use R library for MTL (www.r-project.org) comprising 10 algorithms applicable for regression, classification, joint predictor selection, task clustering, low-rank learning and incorporation of biological networks. We demonstrate the utility of the algorithms using simulated data. Availability and implementation The RMTL package is an open source R package and is freely available at https://github.com/transbioZI/RMTL. RMTL will also be available on cran.r-project.org. Supplementary information Supplementary data are available at Bioinformatics online.

GLIDE: combining local methods and diffusion state embeddings to predict missing interactions in biological networks

Bioinformatics ◽

10.1093/bioinformatics/btaa459 ◽

2020 ◽

Vol 36 (Supplement_1) ◽

pp. i464-i473

Author(s):

Kapil Devkota ◽

James M Murphy ◽

Lenore J Cowen

Keyword(s):

Network Structure ◽

Biological Networks ◽

Link Prediction ◽

Prediction Method ◽

Global Network ◽

Local Network ◽

Supplementary Information ◽

Network Data ◽

Ppi Network ◽

Diffusion State

Abstract Motivation One of the core problems in the analysis of biological networks is the link prediction problem. In particular, existing interaction networks are noisy and incomplete snapshots of the true network, with many true links missing because those interactions have not yet been experimentally observed. Methods to predict missing links have been more extensively studied for social than for biological networks; it was recently argued that there is some special structure in protein–protein interaction (PPI) network data, so that alternate methods may outperform the best methods for social networks. Based on a generalization of the diffusion state distance, we design a new embedding-based link prediction method called global and local integrated diffusion embedding (GLIDE). GLIDE is designed to effectively capture global network structure, combined with alternative network type-specific customized measures that capture local network structure. We test GLIDE on a collection of three recently curated human biological networks derived from the 2016 DREAM disease module identification challenge as well as a classical version of the yeast PPI network in rigorous cross-validation experiments. Results We indeed find that different local network structure is dominant in different types of biological networks. We find that the simple local network measures are dominant in the highly connected network core between hub genes, but that GLIDE’s global embedding measure adds value in the rest of the network. For example, we make GLIDE-based link predictions from genes known to be involved in Crohn’s disease, to genes that are not known to have an association, and make some new predictions, finding support in other network data and the literature. Availability and implementation GLIDE can be downloaded at https://bitbucket.org/kap_devkota/glide. Supplementary information Supplementary data are available at Bioinformatics online.

HiSCF: leveraging higher-order structures for clustering analysis in biological networks

Bioinformatics ◽

10.1093/bioinformatics/btaa775 ◽

2020 ◽

Cited By ~ 1

Author(s):

Lun Hu ◽

Jun Zhang ◽

Xiangyu Pan ◽

Hong Yan ◽

Zhu-Hong You

Keyword(s):

Biological Networks ◽

Clustering Analysis ◽

Biological Network ◽

Higher Order ◽

Supplementary Information ◽

Network Motifs ◽

Functional Modules ◽

Connectivity Patterns ◽

Biological Entities ◽

Insight Into

Abstract Motivation Clustering analysis in a biological network groups biological entities into functional modules, thus providing valuable insight into complex biological systems. Existing clustering techniques make use of lower-order connectivity patterns at the level of individual biological entities and their connections, but few of them can take into account higher-order connectivity patterns at the level of small network motifs. Results Here, we present a novel clustering framework, namely HiSCF, to identify functional modules based on the higher-order structure information available in a biological network. Taking advantage of a higher-order Markov stochastic process, HiSCF is able to perform the clustering analysis by exploiting a variety of network motifs. When compared with several state-of-the-art clustering models, HiSCF yields the best performance in terms of accuracy for two practical clustering applications, i.e. protein complex identification and gene co-expression module detection. The promising performance of HiSCF demonstrates that the consideration of higher-order network motifs yields new insight into the analysis of biological networks, such as the identification of overlapping protein complexes and the inference of new signaling pathways, and also reveals the rich higher-order organizational structures present in biological networks. Availability and implementation HiSCF is available at https://github.com/allenv5/HiSCF. Supplementary information Supplementary data are available at Bioinformatics online.
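To make the contrast between lower-order and higher-order connectivity concrete, the following sketch weights each edge by the number of triangles it participates in. The triangle is chosen here only as the simplest network motif; this is an illustrative assumption, not HiSCF's Markov-process formulation, and the function name `triangle_motif_weights` is hypothetical.

```python
def triangle_motif_weights(edges):
    """Weight every undirected edge by the number of triangles containing it.

    An edge (u, v) lies in one triangle for each common neighbour of u and v,
    so the weight is the size of the intersection of their neighbour sets.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return {
        tuple(sorted((u, v))): len(neighbours[u] & neighbours[v])
        for u, v in edges
    }


# Toy network (illustrative only): a triangle a-b-c with a pendant node d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
weights = triangle_motif_weights(toy_edges)
print(weights)
```

Edges inside the triangle receive weight 1 while the pendant edge receives weight 0, so a clustering method that operates on these weights would treat the triangle as a denser unit than plain edge counts suggest.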

Controlling large Boolean networks with single-step perturbations

Bioinformatics ◽

10.1093/bioinformatics/btz371 ◽

2019 ◽

Vol 35 (14) ◽

pp. i558-i567 ◽

Cited By ~ 2

Author(s):

Alexis Baudin ◽

Soumya Paul ◽

Cui Su ◽

Jun Pang

Keyword(s):

Biological Networks ◽

Boolean Network ◽

Real Life ◽

Extended Period ◽

Boolean Networks ◽

Single Step ◽

Divide And Conquer ◽

Supplementary Information ◽

Initial State ◽

Minimal Subset

Abstract Motivation The control of Boolean networks has traditionally focussed on strategies where the perturbations are applied to the nodes of the network for an extended period of time. In this work, we study if and how a Boolean network can be controlled by perturbing a minimal set of nodes for a single step and letting the system evolve afterwards according to its original dynamics. More precisely, given a Boolean network (BN), we compute a minimal subset Cmin of the nodes such that BN can be driven from any initial state in an attractor to another ‘desired’ attractor by perturbing some or all of the nodes of Cmin for a single step. This kind of control is attractive for biological systems because it is less time consuming than the traditional strategies for control while also being financially more viable. However, due to the phenomenon of state-space explosion, computing such a minimal subset is computationally expensive, and an approach that deals with the entire network in one go does not scale well for large networks. Results We develop a ‘divide-and-conquer’ approach by decomposing the network into smaller partitions, computing the minimal control on the projection of the attractors to these partitions and then composing the results to obtain Cmin for the whole network. We implement our method and test it on various real-life biological networks to demonstrate its applicability and efficiency. Supplementary information Supplementary data are available at Bioinformatics online.
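The single-step control problem can be sketched by brute force on a tiny synchronous Boolean network. This is an exhaustive search over one source state, written only to illustrate the problem statement; it is not the divide-and-conquer method of the paper, and the three-node network, the update rules, and all function names are hypothetical.

```python
from itertools import combinations


def step(state, rules):
    """Apply one synchronous update: every node is recomputed from `state`."""
    return tuple(rule(state) for rule in rules)


def attractor_of(state, rules):
    """Follow the trajectory from `state` until it repeats; return the cycle."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state, rules)
    return frozenset(seen[seen.index(state):])


def minimal_single_step_control(source, target_attractor, rules):
    """Smallest set of nodes to flip once in `source` so that the unperturbed
    dynamics afterwards reach `target_attractor` (exhaustive search)."""
    n = len(source)
    for size in range(n + 1):
        for nodes in combinations(range(n), size):
            flipped = tuple(1 - x if i in nodes else x for i, x in enumerate(source))
            if attractor_of(flipped, rules) == target_attractor:
                return set(nodes)
    return None


# Hypothetical network: x0 sustains itself, x1 copies x0, x2 = x1 AND x0.
rules = [lambda s: s[0], lambda s: s[0], lambda s: s[1] & s[0]]
source = (0, 0, 0)                 # fixed point "all off"
target = frozenset({(1, 1, 1)})    # fixed point "all on"
control = minimal_single_step_control(source, target, rules)
print(control)
```

The search enumerates all subsets of nodes and all reachable states, which is exactly the state-space explosion the abstract describes; it is feasible only for networks of a handful of nodes.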

NetQuilt: Deep Multispecies Network-based Protein Function Prediction using Homology-informed Network Similarity

Bioinformatics ◽

10.1093/bioinformatics/btab098 ◽

2021 ◽

Author(s):

Meet Barot ◽

Vladimir Gligorijević ◽

Kyunghyun Cho ◽

Richard Bonneau

Keyword(s):

Biological Networks ◽

Protein Function ◽

Functional Annotation ◽

Sequence Similarity ◽

Function Prediction ◽

Supplementary Information ◽

Learning Sequence ◽

Network Information ◽

Ppi Networks ◽

Multiple Species

Abstract Motivation Transferring knowledge between species is challenging: different species contain distinct proteomes and cellular architectures, which cause their proteins to carry out different functions via different interaction networks. Many approaches to protein functional annotation use sequence similarity to transfer knowledge between species. These approaches cannot produce accurate predictions for proteins without homologues of known function, as many functions require cellular context for meaningful prediction. To supply this context, network-based methods use protein-protein interaction (PPI) networks as a source of information for inferring protein function and have demonstrated promising results in function prediction. However, most of these methods are tied to a network for a single species, and many species lack biological networks. Results In this work, we integrate sequence and network information across multiple species by computing IsoRank similarity scores to create a meta-network profile of the proteins of multiple species. We use this integrated multispecies meta-network as input to train a maxout neural network with Gene Ontology terms as target labels. Our multispecies approach takes advantage of more training examples, and consequently leads to significant improvements in function prediction performance compared to two network-based methods, a deep learning sequence-based method, and the BLAST annotation method used in the Critical Assessment of Functional Annotation. We are able to demonstrate that our approach performs well even in cases where a species has no network information available: when an organism’s PPI network is left out we can use our multi-species method to make predictions for the left-out organism with good performance. Availability The code is freely available at https://github.com/nowittynamesleft/NetQuilt. Supplementary information Supplementary data are available at Bioinformatics online.

Atlas: automatic modeling of regulation of bacterial gene expression and metabolism using rule-based languages

Bioinformatics ◽

10.1093/bioinformatics/btaa1040 ◽

2020 ◽

Author(s):

Rodrigo Santibáñez ◽

Daniel Garrido ◽

Alberto J M Martin

Keyword(s):

Gene Expression ◽

Biological Networks ◽

Regulatory Networks ◽

Dynamic Models ◽

Stochastic Dynamics ◽

Divide And Conquer ◽

Supplementary Information ◽

Bacterial Gene ◽

Rule Based ◽

Gene Regulatory

Abstract Motivation Cells are complex systems composed of hundreds of genes whose products interact to produce elaborate behaviors. To control such behaviors, cells rely on transcription factors to regulate gene expression, and gene regulatory networks (GRNs) are employed to describe and understand such behavior. However, GRNs are static models, and dynamic models are difficult to obtain due to their size, complexity, stochastic dynamics and interactions with other cell processes. Results We developed Atlas, a Python tool that converts genome graphs and gene regulatory, interaction and metabolic networks into dynamic models. The software employs these biological networks to write rule-based models for the PySB framework. The underlying method is a divide-and-conquer strategy to obtain sub-models and combine them later into an ensemble model. To exemplify the utility of Atlas, we used networks of varying size and complexity of Escherichia coli and evaluated in silico modifications, such as gene knockouts and the insertion of promoters and terminators. Moreover, the methodology could be applied to the dynamic modeling of natural and synthetic networks of any bacterium. Availability and implementation Code, models and tutorials are available online (https://github.com/networkbiolab/atlas). Supplementary information Supplementary data are available at Bioinformatics online.

PAFway: pairwise associations between functional annotations in biological networks and pathways

Bioinformatics ◽

10.1093/bioinformatics/btaa639 ◽

2020 ◽

Vol 36 (19) ◽

pp. 4963-4964

Author(s):

Mahiar Mahjoub ◽

Daphne Ezer

Keyword(s):

Arabidopsis Thaliana ◽

Biological Networks ◽

Gene Networks ◽

Gene Network ◽

Supplementary Information ◽

Supplementary Data ◽

Specific Function ◽

Biological Functions ◽

Functional Annotations ◽

Large Gene

Abstract Motivation Large gene networks can be dense and difficult to interpret in a biologically meaningful way. Results Here, we introduce PAFway, which estimates pairwise associations between functional annotations in biological networks and pathways. It answers the biological question: do genes that have a specific function tend to regulate genes that have a different specific function? The results can be visualized as a heatmap or a network of biological functions. We apply this package to reveal associations between functional annotations in an Arabidopsis thaliana gene network. Availability and implementation PAFway is submitted to CRAN. Currently available here: https://github.com/ezer/PAFway. Supplementary information Supplementary data are available at Bioinformatics online.
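The question of whether genes with one annotation tend to regulate genes with another can be sketched as a simple enrichment test on directed edges. The abstract does not state which statistical test PAFway uses, so the binomial null model below is an assumption made for illustration, and the function name, gene names, and annotation terms are all hypothetical.

```python
from math import comb


def pairwise_association(edges, annotations, src_term, dst_term):
    """Count edges going from a `src_term` gene to a `dst_term` gene and return
    (observed count, one-sided binomial p-value).

    Assumed null model: the source and target annotations of an edge are
    independent, so an edge matches with probability
    P(source has src_term) * P(target has dst_term).
    """
    n = len(edges)
    src_hits = sum(src_term in annotations.get(s, ()) for s, _ in edges)
    dst_hits = sum(dst_term in annotations.get(t, ()) for _, t in edges)
    observed = sum(
        src_term in annotations.get(s, ()) and dst_term in annotations.get(t, ())
        for s, t in edges
    )
    p_edge = (src_hits / n) * (dst_hits / n)
    # Upper tail of Binomial(n, p_edge): probability of >= observed matches.
    p_value = sum(
        comb(n, k) * p_edge**k * (1 - p_edge) ** (n - k)
        for k in range(observed, n + 1)
    )
    return observed, p_value


# Toy regulatory network (illustrative only): regulator -> target.
toy_edges = [("g1", "g2"), ("g1", "g3"), ("g4", "g2"), ("g5", "g6")]
toy_annotations = {
    "g1": {"TF"}, "g4": {"TF"}, "g5": {"kinase"},
    "g2": {"stress"}, "g3": {"stress"}, "g6": {"growth"},
}
observed, p_value = pairwise_association(toy_edges, toy_annotations, "TF", "stress")
print(observed, p_value)
```

Running the function for every ordered pair of annotation terms would give a matrix of p-values, which is the kind of result that can be displayed as a heatmap or as a network of functions.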

Maximum likelihood reconstruction of ancestral networks by integer linear programming

Bioinformatics ◽

10.1093/bioinformatics/btaa931 ◽

2020 ◽

Author(s):

Vaibhav Rajan ◽

Ziqi Zhang ◽

Carl Kingsford ◽

Xiuwei Zhang

Keyword(s):

Linear Programming ◽

Maximum Likelihood ◽

Integer Linear Programming ◽

Biological Networks ◽

Network Reconstruction ◽

Supplementary Information ◽

Greedy Heuristics ◽

Optimal Solutions ◽

Network Growth ◽

Ppi Networks

Abstract Motivation The study of the evolutionary history of biological networks enables deep functional understanding of various bio-molecular processes. Network growth models, such as the Duplication–Mutation with Complementarity (DMC) model, provide a principled approach to characterizing the evolution of protein–protein interactions (PPIs) based on duplication and divergence. Current methods for model-based ancestral network reconstruction primarily use greedy heuristics and yield sub-optimal solutions. Results We present a new Integer Linear Programming (ILP) solution for maximum likelihood reconstruction of ancestral PPI networks using the DMC model. We prove the correctness of our solution, which is designed to find the optimal solution. It can also use efficient heuristics from general-purpose ILP solvers to obtain multiple optimal and near-optimal solutions that may be useful in many applications. Experiments on synthetic data show that our ILP obtains solutions with higher likelihood than those from previous methods, and is robust to noise and model mismatch. We evaluate our algorithm on two real PPI networks, with proteins from the families of bZIP transcription factors and the Commander complex. On both networks, solutions from our ILP have higher likelihood and are in better agreement with independent biological evidence from other studies. Availability and implementation A Python implementation is available at https://bitbucket.org/cdal/network-reconstruction. Supplementary information Supplementary data are available at Bioinformatics online.
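The forward direction of the DMC growth model, whose history the ILP tries to reconstruct, can be sketched as a short simulation. The sketch follows the commonly described form of the model (duplicate a random node, lose one of each pair of duplicated edges with probability `q_mod`, link the two copies with probability `q_con`); the function name, parameter names, and the two-node seed network are assumptions, and nothing here is taken from the authors' implementation.

```python
import random


def grow_dmc(n_nodes, q_mod, q_con, seed=0):
    """Grow an undirected network by duplication, mutation, complementarity.

    Returns an adjacency dict {node: set(neighbours)} with `n_nodes` nodes,
    starting from a single edge between nodes 0 and 1.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_nodes:
        u = rng.choice(sorted(adj))       # anchor node to duplicate
        v = len(adj)                      # id of the new copy
        adj[v] = set(adj[u])              # duplication: v inherits u's edges
        for w in adj[v]:
            adj[w].add(v)
        for w in sorted(adj[u]):          # mutation with complementarity
            if rng.random() < q_mod:
                # Exactly one of the two copies loses its edge to w.
                loser = u if rng.random() < 0.5 else v
                adj[loser].discard(w)
                adj[w].discard(loser)
        if rng.random() < q_con:          # optionally connect the two copies
            adj[u].add(v)
            adj[v].add(u)
    return adj


network = grow_dmc(n_nodes=20, q_mod=0.4, q_con=0.7, seed=1)
print(len(network), sum(len(nb) for nb in network.values()) // 2)
```

Ancestral reconstruction runs this process in reverse: given only the final network, it asks which sequence of duplications most likely produced it.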
