Automated inference of Boolean models from molecular interaction maps using CaSQ

Abstract Motivation Molecular interaction maps have emerged as a meaningful way of representing biological mechanisms in a comprehensive and systematic manner. However, their static nature provides limited insight into the emerging behaviour of the described biological system under different conditions. Computational modelling provides the means to study dynamic properties through in silico simulations and perturbations. We aim to bridge the gap between static and dynamic representations of biological systems with CaSQ, a software tool that infers Boolean rules based on the topology and semantics of molecular interaction maps built with CellDesigner. Results We developed CaSQ by defining conversion rules and logical formulas for inferred Boolean models according to the topology and the annotations of the starting molecular interaction maps. We used CaSQ to produce executable files of existing molecular maps that differ in size, complexity and the use of Systems Biology Graphical Notation (SBGN) standards. We also compared, where possible, the manually built logical models corresponding to a molecular map to the ones inferred by CaSQ. The tool is able to process large and complex maps built with CellDesigner (either following SBGN standards or not) and produce Boolean models in a standard output format, Systems Biology Markup Language-qualitative (SBML-qual), that can be further analyzed using popular modelling tools. References, annotations and layout of the CellDesigner molecular map are retained in the obtained model, facilitating interoperability and model reusability. Availability and implementation The present tool is available online: https://lifeware.inria.fr/~soliman/post/casq/ and distributed as a Python package under the GNU GPLv3 license. The code can be accessed here: https://gitlab.inria.fr/soliman/casq. Supplementary information Supplementary data are available at Bioinformatics online.
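As a rough illustration of what inferring a Boolean rule from map topology can look like, the sketch below builds a rule for one species from the reactions that produce it. The rule shape used here (all reactants ON, at least one catalyst ON when catalysts are listed, no inhibitor ON, alternative reactions combined with OR) is an assumption made for this example, and the `Reaction` class and `infer_rule` function are hypothetical names, not CaSQ's API.

```python
from dataclasses import dataclass, field


@dataclass
class Reaction:
    """One reaction of a (hypothetical) interaction map that produces a species."""
    reactants: list
    catalysts: list = field(default_factory=list)
    inhibitors: list = field(default_factory=list)


def infer_rule(reactions):
    """Return a predicate state -> bool for a species produced by `reactions`.

    Assumed rule shape: a reaction fires when all reactants are ON, at least
    one catalyst is ON (if any are listed) and no inhibitor is ON; the species
    is ON when any of its producing reactions fires.
    """
    def fires(r, state):
        has_reactants = all(state[s] for s in r.reactants)
        has_catalyst = any(state[s] for s in r.catalysts) if r.catalysts else True
        inhibited = any(state[s] for s in r.inhibitors)
        return has_reactants and has_catalyst and not inhibited

    return lambda state: any(fires(r, state) for r in reactions)


# Toy map: A is converted to B, catalysed by K1 or K2, inhibited by I.
rule_b = infer_rule([Reaction(["A"], catalysts=["K1", "K2"], inhibitors=["I"])])
state = {"A": True, "K1": False, "K2": True, "I": False}
b_on = rule_b(state)                        # reactant and one catalyst present
b_inhibited = rule_b({**state, "I": True})  # same state with the inhibitor ON
```

Representing each rule as a plain predicate over a state dictionary keeps the example short; a real tool would instead serialise the rules to SBML-qual so that other modelling software can read them.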


ccNetViz: a WebGL-based JavaScript library for visualization of large networks

Bioinformatics ◽ 10.1093/bioinformatics/btaa559 ◽ 2020 ◽ Vol 36 (16) ◽ pp. 4527-4529

Author(s): Ales Saska ◽ David Tichy ◽ Robert Moore ◽ Achilles Rasquinha ◽ Caner Akdas ◽ ...

Keyword(s): Systems Biology ◽ Complex Networks ◽ Open Source ◽ High Speed ◽ A Priori ◽ Supplementary Information ◽ Network Visualization ◽ Supplementary Data ◽ Web Based ◽ Flow Of Information

Abstract Summary Visualizing a network provides a concise and practical understanding of the information it represents. Open-source web-based libraries help accelerate the creation of biologically based networks and their use. ccNetViz is an open-source, high-speed and lightweight JavaScript library for visualization of large and complex networks. It implements customization and analytical features for easy network interpretation. These features include edge and node animations, which illustrate the flow of information through a network, as well as node statistics. Properties can be defined a priori or dynamically imported from models and simulations. ccNetViz is thus a network visualization library particularly suited for systems biology. Availability and implementation The ccNetViz library, demos and documentation are freely available at http://helikarlab.github.io/ccNetViz/. Supplementary information Supplementary data are available at Bioinformatics online.


MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs

Bioinformatics ◽ 10.1093/bioinformatics/btaa045 ◽ 2020 ◽ Vol 36 (9) ◽ pp. 2690-2696

Author(s): Jarkko Toivonen ◽ Pratyush K Das ◽ Jussi Taipale ◽ Esko Ukkonen

Keyword(s): Markov Models ◽ Expectation Maximization Algorithm ◽ Software Tool ◽ Specific Weight ◽ Training Data ◽ Supplementary Information ◽ Markov Modeling ◽ Binding Motifs ◽ The Difference ◽ Probability Matrices

Abstract Motivation Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). ADMs can model dependencies between adjacent nucleotides, unlike PPMs. A modeling technique and software tool that would estimate such models simultaneously both for monomers and their dimers have been missing. Results We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm MODER2 for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparing to some earlier PPM and ADM techniques. The ADM models explain data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), the ADM mixture models by MODER2 being the best on average. Availability and implementation Software implementation is available from https://github.com/jttoivon/moder2. Supplementary information Supplementary data are available at Bioinformatics online.
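To make the difference between a PPM and an ADM concrete, the following sketch fits both models to a handful of aligned sites by simple counting with pseudocounts and compares their log-likelihoods. It is only an illustration of the two model classes under those assumptions; it is not the MODER2 expectation maximization algorithm or its mixture model, and all function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def fit_ppm(sites, pseudo=1.0):
    """Per-position nucleotide probabilities, positions treated as independent."""
    total = len(sites) + pseudo * len(ALPHABET)
    ppm = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        ppm.append({a: (counts[a] + pseudo) / total for a in ALPHABET})
    return ppm


def fit_adm(sites, pseudo=1.0):
    """Per-position conditional probabilities P(x_i | x_{i-1}) for i >= 1."""
    adm = []
    for i in range(1, len(sites[0])):
        pairs = Counter((s[i - 1], s[i]) for s in sites)
        table = {}
        for prev in ALPHABET:
            total = sum(pairs[(prev, a)] for a in ALPHABET) + pseudo * len(ALPHABET)
            for a in ALPHABET:
                table[(prev, a)] = (pairs[(prev, a)] + pseudo) / total
        adm.append(table)
    return adm


def loglik_ppm(seq, ppm):
    """Log-likelihood of seq under the independent-position model."""
    return sum(math.log(ppm[i][c]) for i, c in enumerate(seq))


def loglik_adm(seq, ppm, adm):
    """First base uses the PPM column; later bases condition on the previous one."""
    ll = math.log(ppm[0][seq[0]])
    for i in range(1, len(seq)):
        ll += math.log(adm[i - 1][(seq[i - 1], seq[i])])
    return ll


# Toy sites in which the second base depends on the first ("AC" or "GT" only).
sites = ["AC", "AC", "GT", "GT"]
ppm = fit_ppm(sites)
adm = fit_adm(sites)
ll_ppm_seen = loglik_ppm("AC", ppm)
ll_adm_seen = loglik_adm("AC", ppm, adm)
```

On this toy input the ADM assigns a higher likelihood to the observed dinucleotide "AC" than the PPM does, because the PPM cannot express that C follows A more often than T does.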


TriPOINT: a software tool to prioritize important genes in pathways and their non-coding regulators

Bioinformatics ◽ 10.1093/bioinformatics/bty998 ◽ 2018 ◽ Vol 35 (15) ◽ pp. 2686-2689

Author(s): Asa Thibodeau ◽ Dong-Guk Shin

Keyword(s): Gene Expression ◽ Software Tool ◽ Supplementary Information ◽ Analysis Tool ◽ Graph Representations ◽ Expression Levels ◽ Conducting Pathway ◽ Pathway Analysis Tool ◽ Pathway Analyses ◽ Gene Expression Levels

Abstract Summary Current approaches for pathway analyses focus on representing gene expression levels on graph representations of pathways and conducting pathway enrichment among differentially expressed genes. However, gene expression levels by themselves do not reflect the overall picture, as non-coding factors play an important role in regulating gene expression. To incorporate these non-coding factors into pathway analyses and to systematically prioritize genes in a pathway, we introduce a new software tool: Triangulation of Perturbation Origins and Identification of Non-Coding Targets (TriPOINT). TriPOINT is a pathway analysis tool, implemented in Java, that identifies the significance of a gene under a condition (e.g. a disease phenotype) by studying graph representations of pathways, analyzing upstream and downstream gene interactions and integrating non-coding regions that may be regulating gene expression levels. Availability and implementation The TriPOINT open source software is freely available at https://github.uconn.edu/ajt06004/TriPOINT under the GPL v3.0 license. Supplementary information Supplementary data are available at Bioinformatics online.


VisFeature: a stand-alone program for visualizing and analyzing statistical features of biological sequences

Bioinformatics ◽ 10.1093/bioinformatics/btz689 ◽ 2019 ◽ Cited By ~ 3

Author(s): Jun Wang ◽ Pu-Feng Du ◽ Xin-Yu Xue ◽ Guang-Ping Li ◽ Yuan-Ke Zhou ◽ ...

Keyword(s): Sequence Data ◽ Software Tool ◽ Data Retrieval ◽ Supplementary Information ◽ Statistical Features ◽ Biological Sequence ◽ Sequence Alignments ◽ Multiple Sequence ◽ Source Codes ◽ Multiple Sequence Alignments

Abstract Summary Many efforts have been made in developing bioinformatics algorithms to predict functional attributes of genes and proteins from their primary sequences. One challenge in this process is to intuitively analyze and understand the statistical features that have been selected by heuristic or iterative methods. In this paper, we developed VisFeature, which aims to be a helpful software tool that allows users to intuitively visualize and analyze statistical features of all types of biological sequences, including DNA, RNA and proteins. VisFeature also integrates sequence data retrieval, multiple sequence alignments and statistical feature generation functions. Availability and implementation VisFeature is a desktop application that is implemented using JavaScript/Electron and R. The source code of VisFeature is freely accessible from the GitHub repository (https://github.com/wangjun1996/VisFeature). The binary release, which includes an example dataset, can be freely downloaded from the same GitHub repository (https://github.com/wangjun1996/VisFeature/releases). Supplementary information Supplementary data are available at Bioinformatics online.


SeqEditor: an application for primer design and sequence analysis with or without GTF/GFF files

Bioinformatics ◽ 10.1093/bioinformatics/btaa903 ◽ 2020

Author(s): Ahmed Hafez ◽ Ricardo Futami ◽ Amir Arastehfar ◽ Farnaz Daneshnia ◽ Ana Miguel ◽ ...

Keyword(s): Protein Sequences ◽ Software Tool ◽ Primer Design ◽ Reference Sequence ◽ Supplementary Information ◽ Interactive Software ◽ RNA Sequences ◽ Content Mining ◽ Species Specific ◽ Flexible Application

Abstract Motivation Sequence analyses oriented to investigate specific features, patterns and functions of protein and DNA/RNA sequences usually require tools based on graphic interfaces whose main characteristic is their intuitiveness and interactivity with the user’s expertise, especially when curation or primer design tasks are required. However, interface-based tools usually pose certain computational limitations when managing large sequences or complex datasets, such as genome and transcriptome assemblies. With these requirements in mind, we have developed SeqEditor, an interactive software tool for the analysis of nucleotide and protein sequences. Results SeqEditor is a cross-platform desktop application for the analysis of nucleotide and protein sequences. It is managed through a Graphical User Interface and can work either as a graphical sequence browser or as a fasta task manager for multi-fasta files. SeqEditor has been optimized for the management of large sequences, such as contigs, scaffolds or even chromosomes, and includes a GTF/GFF viewer to visualize and manage annotation files. In turn, this allows for content mining from reference genomes and transcriptomes with similar efficiency to that of command line tools. SeqEditor also incorporates a set of tools for singleplex and multiplex PCR primer design and pooling that uses a newly optimized and validated search strategy for target and species-specific primers. All these features make SeqEditor a flexible application that can be used to analyze complex sequences, design primers in PCR assays oriented for diagnosis, and/or manage, edit and personalize reference sequence datasets. Availability and implementation SeqEditor was developed in Java using Eclipse Rich Client Platform and is publicly available at https://gpro.biotechvana.com/download/SeqEditor as binaries for Windows, Linux and Mac OS. The user manual and tutorials are available online at https://gpro.biotechvana.com/tool/seqeditor/manual. Supplementary information Supplementary data are available at Bioinformatics online.


cd2sbgnml: bidirectional conversion between CellDesigner and SBGN formats

Bioinformatics ◽ 10.1093/bioinformatics/btz969 ◽ 2020 ◽ Vol 36 (8) ◽ pp. 2620-2622 ◽ Cited By ~ 3

Author(s): Irina Balaur ◽ Ludovic Roy ◽ Alexander Mazein ◽ S Gökberk Karaca ◽ Ugur Dogrusoz ◽ ...

Keyword(s): Systems Biology ◽ Web Service ◽ Large Scale ◽ Supplementary Information ◽ Markup Language ◽ File Format ◽ Signalling Network ◽ Lesser General Public License ◽ Systems Biology Markup Language ◽ General Public License

Abstract Motivation CellDesigner is a well-established biological map editor used in many large-scale scientific efforts. However, interoperability between the Systems Biology Graphical Notation (SBGN) Markup Language (SBGN-ML) and CellDesigner’s Systems Biology Markup Language (SBML) extension format remains a challenge due to the proprietary extensions used in CellDesigner files. Results We introduce a library named cd2sbgnml and an associated web service for bidirectional conversion between CellDesigner’s proprietary SBML extension and SBGN-ML formats. We discuss the functionality of the cd2sbgnml converter, which was successfully used for the translation of comprehensive large-scale diagrams such as the RECON Human Metabolic network and the complete Atlas of Cancer Signalling Network, from the CellDesigner file format into SBGN-ML. Availability and implementation The cd2sbgnml conversion library and the web service were developed in Java, and distributed under the GNU Lesser General Public License v3.0. The sources along with a set of examples are available on GitHub (https://github.com/sbgn/cd2sbgnml and https://github.com/sbgn/cd2sbgnml-webservice, respectively). Supplementary information Supplementary data are available at Bioinformatics online.


Jasmine: a Java pipeline for isomiR characterization in miRNA-Seq data

Bioinformatics ◽ 10.1093/bioinformatics/btz806 ◽ 2019 ◽ Cited By ~ 2

Author(s): Xiangfu Zhong ◽ Albert Pla ◽ Simon Rayner

Keyword(s): Population Structure ◽ Software Tool ◽ Supplementary Information ◽ Supplementary Data ◽ Analysis Pipeline ◽ Detailed Characterization ◽ Fasta Format ◽ Java Application

Abstract Motivation The existence of complex subpopulations of miRNA isoforms, or isomiRs, is well established. While many tools exist for investigating isomiR populations, they differ in how they characterize an isomiR, making it difficult to compare results across different tools. Thus, there is a need for a more comprehensive and systematic standard for defining isomiRs. Such a standard would allow investigation of isomiR population structure in progressively more refined sub-populations, permitting the identification of more subtle changes between conditions and leading to an improved understanding of the processes that generate these differences. Results We developed Jasmine, a software tool that incorporates a hierarchical framework for characterizing isomiR populations. Jasmine is a Java application that can process raw read data in fastq/fasta format, or mapped reads in SAM format, to produce a detailed characterization of isomiR populations. Thus, Jasmine can reveal structure not apparent in a standard miRNA-Seq analysis pipeline. Availability and implementation Jasmine is implemented in Java and R and is freely available on Bitbucket at https://bitbucket.org/bipous/jasmine/src/master/. Supplementary information Supplementary data are available at Bioinformatics online.
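One way to picture a systematic isomiR description is to record how a read differs from the reference miRNA at each end and internally. The sketch below does this for a read and a reference placed on the same precursor coordinates. The scheme (5' offset, 3' offset, mismatch count) and the names `describe_isomir` and `label_isomir` are assumptions chosen for illustration; they are not Jasmine's actual framework or output format.

```python
def describe_isomir(read_start, read_seq, ref_start, ref_seq):
    """Describe a read relative to a reference miRNA on the same precursor.

    Coordinates are 0-based positions on the precursor. A negative 5' offset
    means the read starts upstream of the reference; a positive 3' offset
    means the read ends downstream of it. Mismatches are counted over the
    region where read and reference overlap.
    """
    read_end = read_start + len(read_seq)
    ref_end = ref_start + len(ref_seq)
    lo, hi = max(read_start, ref_start), min(read_end, ref_end)
    mismatches = sum(
        read_seq[p - read_start] != ref_seq[p - ref_start] for p in range(lo, hi)
    )
    return {
        "5p": read_start - ref_start,
        "3p": read_end - ref_end,
        "mismatches": mismatches,
    }


def label_isomir(desc):
    """Turn a description into coarse labels; an unmodified read is canonical."""
    labels = []
    if desc["5p"] != 0:
        labels.append("5p-variant")
    if desc["3p"] != 0:
        labels.append("3p-variant")
    if desc["mismatches"] > 0:
        labels.append("sequence-variant")
    return labels or ["canonical"]


# Example reference sequence placed at position 10 of a hypothetical precursor.
ref = "TGAGGTAGTAGGTTGTATAGTT"
canonical = describe_isomir(10, ref, 10, ref)
trimmed = describe_isomir(10, ref[:-1], 10, ref)       # one base shorter at 3'
edited = describe_isomir(10, ref[:-1] + "A", 10, ref)  # last base substituted
```

Grouping reads first by coarse label and then by the exact offsets gives progressively finer sub-populations, which is the kind of refinement the abstract argues for.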


geneCo: a visualized comparative genomic method to analyze multiple genome structures

Bioinformatics ◽ 10.1093/bioinformatics/btz596 ◽ 2019 ◽ Vol 35 (24) ◽ pp. 5303-5305 ◽ Cited By ~ 4

Author(s): Jaehee Jung ◽ Jong Im Kim ◽ Gangman Yi

Keyword(s): Genome Structure ◽ Software Tool ◽ Detailed Comparison ◽ Supplementary Information ◽ Comparative Genomic ◽ Web Based ◽ Computational Environment ◽ Gene Comparison ◽ User Data ◽ Gain Loss

Abstract Summary In comparative and evolutionary genomics, a detailed comparison of common features between organisms is essential to evaluate genetic distance. However, identifying differences in matched and mismatched genes among multiple genomes is difficult using current comparative genomic approaches due to complicated methodologies or the generation of meager information from obtained results. This study describes a visualized software tool, geneCo (gene Comparison), for comparing genome structure and gene arrangements between various organisms. User data are aligned, gene information is recognized, and genome structures are compared based on user-defined GenBank files. Information regarding inversion, gain, loss, duplication and gene rearrangement among multiple organisms being compared is provided by geneCo, which uses a web-based interface that users can easily access without any need to consider the computational environment. Availability and implementation Users can freely use the software, which is accessible at https://bigdata.dongguk.edu/geneCo. The main module of geneCo is implemented in Python and the web-based user interface is built with PHP, HTML and CSS to support all browsers. Supplementary information Supplementary data are available at Bioinformatics online.
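The kinds of differences listed above (gain, loss, duplication, inversion) can be illustrated with a small comparison of two gene orders. The sketch below is a simplified stand-in that takes each genome as a list of (gene name, strand) pairs rather than GenBank files; the function name `compare_genomes`, the input format and the toy gene lists are assumptions for this example and do not reflect geneCo's implementation.

```python
from collections import Counter


def compare_genomes(ref, query):
    """Compare two genomes given as lists of (gene_name, strand) in genome order.

    Reports genes gained or lost in the query, genes with more copies in the
    query than in the reference, and single-copy genes whose strand differs.
    """
    ref_counts = Counter(gene for gene, _ in ref)
    qry_counts = Counter(gene for gene, _ in query)
    ref_strand = dict(ref)    # last occurrence wins; only used for single-copy genes
    qry_strand = dict(query)
    shared = ref_counts.keys() & qry_counts.keys()
    return {
        "gain": sorted(qry_counts.keys() - ref_counts.keys()),
        "loss": sorted(ref_counts.keys() - qry_counts.keys()),
        "duplication": sorted(g for g in shared if qry_counts[g] > ref_counts[g]),
        "inversion": sorted(
            g for g in shared
            if ref_counts[g] == qry_counts[g] == 1 and ref_strand[g] != qry_strand[g]
        ),
    }


# Toy gene orders; names and strands are made up for the example.
reference = [("psbA", "+"), ("rbcL", "+"), ("atpB", "-"), ("ndhF", "+")]
query = [("psbA", "+"), ("atpB", "+"), ("rbcL", "+"), ("rbcL", "+"), ("matK", "-")]
diff = compare_genomes(reference, query)
```

This sketch ignores gene order, so rearrangements that keep strand and copy number unchanged are not detected; handling them would require comparing positions or adjacency as well.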


The Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST)

Bioinformatics ◽ 10.1093/bioinformatics/btaa622 ◽ 2020 ◽ Cited By ~ 3

Author(s): Vasundra Touré ◽ Steven Vercruysse ◽ Marcio Luis Acencio ◽ Ruth C Lovering ◽ Sandra Orchard ◽ ...

Keyword(s): Molecular Interaction ◽ Regulatory Networks ◽ Building Blocks ◽ Supplementary Information ◽ Biological Processes ◽ Causal Interaction ◽ End User ◽ Minimum Information ◽ Causal Statement ◽ In Cells

Abstract Motivation A large variety of molecular interactions occurs between biomolecular components in cells. When a molecular interaction results in a regulatory effect, exerted by one component onto a downstream component, a so-called ‘causal interaction’ takes place. Causal interactions constitute the building blocks in our understanding of larger regulatory networks in cells. These causal interactions and the biological processes they enable (e.g. gene regulation) need to be described with a careful appreciation of the underlying molecular reactions. A proper description of this information enables archiving, sharing and reuse by humans and for automated computational processing. Various representations of causal relationships between biological components are currently used in a variety of resources. Results Here, we propose a checklist that accommodates current representations, called the Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). This checklist defines both the required core information and a comprehensive set of other contextual details valuable to the end user and relevant for reusing and reproducing causal molecular interaction information. The MI2CAST checklist can be used as reporting guidelines when annotating and curating causal statements, while fostering uniformity and interoperability of the data across resources. Availability and implementation The checklist together with examples is accessible at https://github.com/MI2CAST/MI2CAST. Supplementary information Supplementary data are available at Bioinformatics online.


MolTrans: Molecular Interaction Transformer for drug–target interaction prediction

Bioinformatics ◽ 10.1093/bioinformatics/btaa880 ◽ 2020

Author(s): Kexin Huang ◽ Cao Xiao ◽ Lucas M Glass ◽ Jimeng Sun

Keyword(s): Molecular Interaction ◽ Drug Target ◽ Pattern Mining ◽ Representation Learning ◽ Molecular Data ◽ Supplementary Information ◽ Biomedical Data ◽ Learning Approaches ◽ Real World Data ◽ Target Interaction

Abstract Motivation Drug–target interaction (DTI) prediction is a foundational task for in-silico drug discovery, which is costly and time-consuming due to the need for experimental search over a large drug compound space. Recent years have witnessed promising progress for deep learning in DTI predictions. However, the following challenges are still open: (i) existing molecular representation learning approaches ignore the sub-structural nature of DTI, and thus produce results that are less accurate and difficult to explain and (ii) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data. Results We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via: (i) a knowledge-inspired sub-structural pattern mining algorithm and interaction modeling module for more accurate and interpretable DTI prediction and (ii) an augmented transformer encoder to better extract and capture the semantic relations among sub-structures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real-world data and show that it improves DTI prediction performance compared to state-of-the-art baselines. Availability and implementation The model scripts are available at https://github.com/kexinhuang12345/moltrans. Supplementary information Supplementary data are available at Bioinformatics online.
