Analyzing synergistic and non-synergistic interactions in signalling pathways using Boolean Nested Effect Models

Abstract Motivation: Understanding the structure and interplay of cellular signalling pathways is one of the great challenges in molecular biology. Boolean Networks can infer signalling networks from observations of protein activation. Where protein activation is difficult to assess directly, Nested Effect Models (NEMs) offer an alternative: they derive the network structure indirectly from the downstream effects of pathway perturbations. To date, however, NEMs cannot resolve signalling details such as the formation of signalling complexes or the activation of a protein by multiple alternative input signals. Here we introduce Boolean Nested Effect Models (B-NEMs), which combine the use of downstream effects with the higher resolution of signalling pathway structures offered by Boolean Networks. Results: We show that B-NEMs accurately reconstruct signal flows in simulated data. Using B-NEMs, we then resolve BCR signalling via the PI3K and TAK1 kinases in BL2 lymphoma cell lines. Availability and implementation: R code is available on GitHub at https://github.com/MartinFXP/B-NEM. The BCR signalling dataset is available from the GEO database (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE68761. Supplementary information: Supplementary data are available at Bioinformatics online.
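As a rough illustration of the AND/OR logic that a Boolean network adds over a plain NEM graph, the sketch below propagates a stimulus through a small toy pathway in which one node requires a signalling complex (AND) and another can be activated by alternative inputs (OR), then predicts which downstream reporters should respond to a knockdown. The clause encoding, node and reporter names, and the rule "a reporter responds when its parent node switches off" are assumptions made for illustration; this is not the B-NEM package's implementation or scoring function.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: each node maps to a list of input clauses.
# A node is active if ANY clause holds (OR: alternative input signals),
# and a clause holds if ALL of its inputs are active (AND: a signalling complex).
Network = Dict[str, List[Tuple[str, ...]]]


def steady_state(net: Network, stimuli: Set[str], inhibited: Set[str]) -> Dict[str, bool]:
    """Propagate a stimulus through a monotone (activating-only) Boolean network."""
    # Start from the least state: only stimulated, non-inhibited nodes are on.
    state = {node: node in stimuli and node not in inhibited for node in net}
    changed = True
    while changed:  # activating-only updates can only switch nodes on, so this terminates
        changed = False
        for node, clauses in net.items():
            if node in stimuli or node in inhibited:
                continue  # stimulated and knocked-down nodes are clamped
            active = any(all(state[i] for i in clause) for clause in clauses)
            if active != state[node]:
                state[node] = active
                changed = True
    return state


def expected_effects(net: Network, stimuli: Set[str], knockdowns: Set[str],
                     attachments: Dict[str, str]) -> Dict[str, bool]:
    """Predict which reporters change: a reporter responds when its parent
    signalling node is on in the stimulated control but off after the knockdown."""
    control = steady_state(net, stimuli, set())
    perturbed = steady_state(net, stimuli, knockdowns)
    return {rep: control[parent] and not perturbed[parent]
            for rep, parent in attachments.items()}


# Toy pathway: S activates A and B; C needs A AND B; D needs A OR B.
toy: Network = {
    "S": [],
    "A": [("S",)],
    "B": [("S",)],
    "C": [("A", "B")],
    "D": [("A",), ("B",)],
}
reporters = {"e1": "C", "e2": "D"}  # hypothetical reporter-to-node attachments

print(expected_effects(toy, {"S"}, {"A"}, reporters))       # {'e1': True, 'e2': False}
print(expected_effects(toy, {"S"}, {"A", "B"}, reporters))  # {'e1': True, 'e2': True}
```

Knocking down A alone silences the reporter of the complex C but not that of D, which still receives input from B; only the double knockdown silences both. A plain NEM graph cannot distinguish these two wiring patterns, which is the extra resolution the abstract refers to.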

Faculty Opinions recommendation of Signalling pathways mediating specific synergistic interactions between GDF9 and BMP15.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.717953554.793459155 ◽

2012 ◽

Author(s):

Alan McNeilly

Keyword(s):

Signalling Pathways ◽

Synergistic Interactions

Modularisation of published and novel models toward a complex KIR2DL4 pathway in pbNK cell

10.21203/rs.3.rs-593814/v1 ◽

2021 ◽

Author(s):

Nurul Izza Ismail

Keyword(s):

Amino Acid ◽

Natural Killer ◽

Quantitative Model ◽

Qualitative Description ◽

Signalling Pathways ◽

Amino Acid Residues ◽

Kegg Database ◽

Intracellular Signalling Pathways ◽

Downstream Effects ◽

New Models

Abstract KIR2DL4 is an interesting receptor expressed on peripheral blood natural killer (pbNK) cells because it can be either activating or inhibitory, depending on the amino acid residues in its domain. This study uses mathematical modelling to investigate the downstream effects of the NK cell activation receptor KIR2DL4 after stimulation by its key ligand, HLA-G, on pbNK cells. The large pathway is built on a comprehensive qualitative description of pbNK intracellular signalling pathways leading to chemokine and cytotoxin secretion, obtained from the KEGG database (https://www.genome.jp/kegg-bin/show_pathway?hsa04650). From this qualitative description we built a quantitative model of the pathway, reusing existing curated models where possible and implementing new models as needed. The pathway consists of two published sub-models, the Ca2+ model and the NFAT model, and a newly built FCeRIγ sub-model. The full pathway was fitted to a published HLA-G-KIR2DL4 pathway dataset, and the model fitted the data well for one of the two secreted cytokines. The model can be used to predict the production of the cytokines IFNγ and TNFα.

Haplotype assembly of autotetraploid potato using integer linear programing

Bioinformatics ◽

10.1093/bioinformatics/btz060 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3279-3286 ◽

Cited By ~ 4

Author(s):

Enrico Siragusa ◽

Niina Haiminen ◽

Richard Finkers ◽

Richard Visser ◽

Laxmi Parida

Keyword(s):

Experimental Data ◽

Experimental Studies ◽

Simulated Data ◽

Supplementary Information ◽

Linear Programs ◽

Plant Genomics ◽

Optimal Method ◽

Haplotype Assembly ◽

Open Issue ◽

Removal Model

Abstract Summary Haplotype assembly of polyploids is an open issue in plant genomics. Recent experimental studies on highly heterozygous autotetraploid potato have shown that available methods do not deliver satisfying results in practice. We propose an optimal method to assemble haplotypes of highly heterozygous polyploids from Illumina short sequencing reads. Our method is based on a generalization of the existing minimum fragment removal model to the polyploid case and on new integer linear programs to reconstruct optimal haplotypes. We validate our methods experimentally through a combined evaluation on simulated and experimental data based on 83 previously sequenced autotetraploid potato cultivars. Results on simulated data show that our methods produce highly accurate haplotype assemblies, while results on experimental data confirm a clear improvement over the state of the art. Availability and implementation Executables for Linux are available at http://github.com/ComputationalGenomics/HaplotypeAssembler. Supplementary information Supplementary data are available at Bioinformatics online.

A network of networks approach for modeling interconnected brain tissue-specific networks

Bioinformatics ◽

10.1093/bioinformatics/btz032 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3092-3101 ◽

Cited By ~ 1

Author(s):

Hideko Kawakubo ◽

Yusuke Matsui ◽

Itaru Kushima ◽

Norio Ozaki ◽

Teppei Shimamura

Keyword(s):

Learning Algorithm ◽

Simulated Data ◽

Autism Spectrum ◽

Supplementary Information ◽

Sparse Learning ◽

Topological Information ◽

Infinite Point ◽

Neurogenetic Disorders ◽

Information Matrices ◽

Network Of Networks

Abstract Motivation Recent sequence-based analyses have identified many gene variants that may contribute to neurogenetic disorders such as autism spectrum disorder and schizophrenia. Several state-of-the-art network-based analyses have been proposed for a mechanistic understanding of genetic variants in neurogenetic disorders. However, these methods were mainly designed for modeling and analyzing single networks that do not interact with or depend on other networks, and thus cannot capture the properties of interdependent systems in brain-specific tissues, circuits and regions, which are connected to each other and affect behavior and cognitive processes. Results We introduce a novel and efficient framework, called a ‘Network of Networks’ approach, to infer the interconnectivity structure between multiple networks, where the response and predictor variables are topological information matrices of the given networks. We also propose Graph-Oriented SParsE Learning (GOSPEL), a new sparse structural learning algorithm for network data that identifies the subset of the predictors' topological information matrices related to the response. We demonstrate on simulated data that the proposed Graph-Oriented SParsE Learning outperforms existing kernel-based algorithms in terms of F-measure. On real data from human brain region-specific functional networks associated with autism risk genes, we show that the ‘Network of Networks’ model provides insights into the autism-associated interconnectivity structure between functional interaction networks and a comprehensive understanding of the genetic basis of autism across diverse brain regions. Availability and implementation Our software is available from https://github.com/infinite-point/GOSPEL. Supplementary information Supplementary data are available at Bioinformatics online.

L1EM: a tool for accurate locus specific LINE-1 RNA quantification

Bioinformatics ◽

10.1093/bioinformatics/btz724 ◽

2019 ◽

Cited By ~ 3

Author(s):

Wilson McKerrow ◽

David Fenyö

Keyword(s):

Expectation Maximization Algorithm ◽

Simulated Data ◽

Cellular Damage ◽

Supplementary Information ◽

Genomic Locus ◽

Protein Coding ◽

Disease States ◽

Rna Quantification ◽

Long Read ◽

Specific Line

Abstract Motivation LINE-1 elements are retrotransposons that are capable of copying their sequence to new genomic loci. LINE-1 derepression is associated with a number of disease states and has the potential to cause significant cellular damage. Because LINE-1 elements are repetitive, it is difficult to quantify LINE-1 RNA at specific loci and to separate transcripts with protein-coding capability from other sources of LINE-1 RNA. Results We provide a tool, L1EM, that uses the expectation maximization algorithm to quantify LINE-1 RNA at each genomic locus, separating transcripts that are capable of generating retrotransposition from those that are not. We show the accuracy of L1EM on simulated data and against long-read sequencing from HEK cells. Availability and implementation L1EM is written in Python. The source code, along with the necessary annotations, is available at https://github.com/FenyoLab/L1EM and distributed under GPLv3. Supplementary information Supplementary data are available at Bioinformatics online.
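The abstract does not spell out L1EM's likelihood model (which also separates transcript classes), so the sketch below only illustrates the generic expectation maximization idea for reads that align to several repetitive loci. It assumes each read has been reduced to the set of loci it is compatible with and that a read aligns equally well to each of them; the function name and data are hypothetical.

```python
from typing import Dict, List, Set


def em_locus_abundance(reads: List[Set[str]], n_iter: int = 200) -> Dict[str, float]:
    """Estimate the fraction of reads originating from each locus when some
    reads align equally well to several loci (a generic EM sketch)."""
    loci = sorted(set().union(*reads))
    theta = {locus: 1.0 / len(loci) for locus in loci}  # uniform starting point
    for _ in range(n_iter):
        expected = {locus: 0.0 for locus in loci}
        # E-step: split each read over its compatible loci in proportion to theta.
        for compatible in reads:
            total = sum(theta[locus] for locus in compatible)
            for locus in compatible:
                expected[locus] += theta[locus] / total
        # M-step: the new abundances are the normalized expected read counts.
        theta = {locus: expected[locus] / len(reads) for locus in loci}
    return theta


# Hypothetical data: 6 reads unique to locus_1, 2 unique to locus_2,
# and 8 reads that align equally well to both.
reads = [{"locus_1"}] * 6 + [{"locus_2"}] * 2 + [{"locus_1", "locus_2"}] * 8
print(em_locus_abundance(reads))  # converges to about 0.75 / 0.25
```

The uniquely mapping reads break the symmetry: the fixed point satisfies theta_1 = (6 + 8 * theta_1) / 16, i.e. theta_1 = 0.75, so the ambiguous reads are assigned mostly to the locus with more unique evidence rather than split evenly.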

ADFinder: accurate detection of programmed DNA elimination using NGS high-throughput sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btaa226 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3632-3636 ◽

Cited By ~ 2

Author(s):

Weibo Zheng ◽

Jing Chen ◽

Thomas G Doak ◽

Weibo Song ◽

Ying Yan

Keyword(s):

High Throughput ◽

Large Scale ◽

High Throughput Sequencing ◽

Supplementary Information ◽

Sequencing Data ◽

Source Codes ◽

High Throughput Sequencing Data ◽

Dna Elimination ◽

Multiple Alternative ◽

Dna Splicing

Abstract Motivation Programmed DNA elimination (PDE) plays a crucial role in the transitions between germline and somatic genomes in diverse organisms ranging from unicellular ciliates to multicellular nematodes. However, software specifically designed to detect DNA splicing events is scarce. In this paper, we describe Accurate Deletion Finder (ADFinder), an efficient detector of PDEs using high-throughput sequencing data. ADFinder can predict PDEs with relatively low sequencing coverage, detect multiple alternative splicing forms at the same genomic location and calculate the frequency of each splicing event. This software will facilitate research on PDEs and downstream analyses. Results By analyzing genome-wide DNA splicing events in two micronuclear genomes, those of Oxytricha trifallax and Tetrahymena thermophila, we show that ADFinder is effective in predicting large-scale PDEs. Availability and implementation The source code and manual of ADFinder are available on our GitHub website: https://github.com/weibozheng/ADFinder. Supplementary information Supplementary data are available at Bioinformatics online.

RTNduals: an R/Bioconductor package for analysis of co-regulation and inference of dual regulons

Bioinformatics ◽

10.1093/bioinformatics/btz534 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5357-5358

Author(s):

Vinicius S Chagas ◽

Clarice S Groeneveld ◽

Kelin G Oliveira ◽

Sheyla Trefflich ◽

Rodrigo C de Almeida ◽

...

Keyword(s):

Regulatory Networks ◽

Target Genes ◽

Supplementary Information ◽

Transcriptional Networks ◽

Bioconductor Package ◽

Multiple Target ◽

R Language ◽

Downstream Effects ◽

Bioconductor Project ◽

General Method

Abstract Motivation Transcription factors (TFs) are key regulators of gene expression, and can activate or repress multiple target genes, forming regulatory units, or regulons. Understanding downstream effects of these regulators includes evaluating how TFs cooperate or compete within regulatory networks. Here we present RTNduals, an R/Bioconductor package that implements a general method for analyzing pairs of regulons. Results RTNduals identifies a dual regulon when the number of targets shared between a pair of regulators is statistically significant. The package extends the RTN (Reconstruction of Transcriptional Networks) package, and uses RTN transcriptional networks to identify significant co-regulatory associations between regulons. The Supplementary Information reports two case studies for TFs using the METABRIC and TCGA breast cancer cohorts. Availability and implementation RTNduals is written in the R language, and is available from the Bioconductor project at http://bioconductor.org/packages/RTNduals/. Supplementary information Supplementary data are available at Bioinformatics online.
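The abstract states only that a dual regulon is called when the number of shared targets is statistically significant; it does not name the test. One common way to ask whether two target sets overlap more than expected by chance is a one-sided hypergeometric (Fisher) test, sketched below under the assumption that both regulons are drawn from a common background of all genes in the network. The function name and numbers are hypothetical, and this is not RTNduals' API.

```python
from math import comb


def shared_target_pvalue(n_genes: int, n_targets_a: int, n_targets_b: int, n_shared: int) -> float:
    """One-sided hypergeometric P(overlap >= n_shared) for two target sets drawn
    from a common background of n_genes genes."""
    upper = min(n_targets_a, n_targets_b)
    # Sum the probability of every overlap size at least as large as the observed one.
    tail = sum(
        comb(n_targets_a, k) * comb(n_genes - n_targets_a, n_targets_b - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / comb(n_genes, n_targets_b)


# Hypothetical regulons: 300 and 250 targets among 20,000 genes, 40 shared.
# Under independence the expected overlap is 300 * 250 / 20000 = 3.75 genes.
print(shared_target_pvalue(20000, 300, 250, 40))
```

Exact integer arithmetic via `math.comb` avoids overflow in the binomial coefficients; with many regulator pairs, the resulting p-values would still need a multiple-testing correction.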

RMTL: an R library for multi-task learning

Bioinformatics ◽

10.1093/bioinformatics/bty831 ◽

2018 ◽

Vol 35 (10) ◽

pp. 1797-1798 ◽

Cited By ~ 2

Author(s):

Han Cao ◽

Jiayu Zhou ◽

Emanuel Schwarz

Keyword(s):

Biological Networks ◽

Simulated Data ◽

R Package ◽

Low Rank ◽

Supplementary Information ◽

Supplementary Data ◽

Software Environment ◽

Machine Learning Technique ◽

Task Learning ◽

Learning Technique

Abstract Motivation Multi-task learning (MTL) is a machine learning technique for simultaneous learning of multiple related classification or regression tasks. Despite its increasing popularity, MTL algorithms are currently not available in the widely used software environment R, creating a bottleneck for their application in biomedical research. Results We developed an efficient, easy-to-use R library for MTL (www.r-project.org) comprising 10 algorithms applicable to regression, classification, joint predictor selection, task clustering, low-rank learning and incorporation of biological networks. We demonstrate the utility of the algorithms using simulated data. Availability and implementation The RMTL package is an open-source R package and is freely available at https://github.com/transbioZI/RMTL. RMTL will also be available on cran.r-project.org. Supplementary information Supplementary data are available at Bioinformatics online.

Detecting and correcting misclassified sequences in the large-scale public databases

Bioinformatics ◽

10.1093/bioinformatics/btaa586 ◽

2020 ◽

Vol 36 (18) ◽

pp. 4699-4705

Author(s):

Hamid Bagheri ◽

Andrew J Severin ◽

Hridesh Rajan

Keyword(s):

Large Scale ◽

Sequence Similarity ◽

Heuristic Method ◽

Simulated Data ◽

Supplementary Information ◽

Small Subset ◽

Taxonomic Assignment ◽

User Input ◽

Public Repositories ◽

Taxonomic Assignments

Abstract Motivation As the cost of sequencing decreases, the amount of data being deposited into public repositories is increasing rapidly. Public databases rely on the user to provide metadata for each submission, and this metadata is prone to user error. Unfortunately, most public databases, such as the non-redundant (NR) database, rely on user input and do not have methods for identifying errors in the provided metadata, leading to the potential for error propagation. Previous research on a small subset of the NR database analyzed misclassification based on sequence similarity. To the best of our knowledge, the amount of misclassification in the entire database has not been quantified. We propose a heuristic method to detect potentially misclassified taxonomic assignments in the NR database. We applied a curation technique and quality control to find the most probable taxonomic assignment. Our method incorporates the provenance and frequency of each annotation from manually and computationally created databases, as well as clustering information at 95% similarity. Results We found more than two million potentially taxonomically misclassified proteins in the NR database. Using simulated data, we show a high precision of 97% and a recall of 87% for detecting taxonomically misclassified proteins. The proposed approach and findings could also be applied to other databases. Availability and implementation Source code, dataset, documentation, Jupyter notebooks and Docker container are available at https://github.com/boalang/nr. Supplementary information Supplementary data are available at Bioinformatics online.
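The precision and recall reported above are the standard ratios computed over a simulated benchmark in which the truly misclassified proteins are known. As a small sketch, assuming both the method's output and the ground truth are reduced to sets of protein IDs (the IDs and function name below are invented for illustration, not taken from the boalang/nr repository), they can be computed as follows.

```python
from typing import Set, Tuple


def precision_recall(flagged: Set[str], truly_misclassified: Set[str]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = len(flagged & truly_misclassified)  # correctly flagged proteins
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(truly_misclassified) if truly_misclassified else 0.0
    return precision, recall


# Hypothetical simulated benchmark with protein IDs invented for illustration.
flagged = {"p1", "p2", "p3", "p4"}
truth = {"p1", "p2", "p3", "p5", "p6"}
print(precision_recall(flagged, truth))  # (0.75, 0.6)
```

Read this way, a precision of 97% means that about 3 in 100 flagged proteins were not actually misclassified, while a recall of 87% means that about 13 in 100 misclassified proteins were missed.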

Summarizing the solution space in tumor phylogeny inference by multiple consensus trees

Bioinformatics ◽

10.1093/bioinformatics/btz312 ◽

2019 ◽

Vol 35 (14) ◽

pp. i408-i416 ◽

Cited By ~ 12

Author(s):

Nuraini Aguse ◽

Yuanyuan Qi ◽

Mohammed El-Kebir

Keyword(s):

Solution Space ◽

Simulated Data ◽

Exact Algorithm ◽

Real Data ◽

Supplementary Information ◽

Mixed Integer ◽

Consensus Tree ◽

Large Solution ◽

Consensus Trees ◽

Topological Features

Abstract Motivation Cancer phylogenies are key to studying tumorigenesis and have clinical implications. Due to the heterogeneous nature of cancer and limitations in current sequencing technology, current cancer phylogeny inference methods identify a large solution space of plausible phylogenies. To facilitate further downstream analyses, methods that accurately summarize such a set T of cancer phylogenies are imperative. However, current summary methods are limited to a single consensus tree or graph and may miss important topological features that are present in different subsets of candidate trees. Results We introduce the Multiple Consensus Tree (MCT) problem to simultaneously cluster T and infer a consensus tree for each cluster. We show that MCT is NP-hard, and present an exact algorithm based on mixed integer linear programming (MILP). In addition, we introduce a heuristic algorithm that efficiently identifies high-quality consensus trees, recovering all optimal solutions identified by the MILP in simulated data at a fraction of the time. We demonstrate the applicability of our methods on both simulated and real data, showing that our approach selects the number of clusters depending on the complexity of the solution space T. Availability and implementation https://github.com/elkebir-group/MCT. Supplementary information Supplementary data are available at Bioinformatics online.
