Reconstructing signaling pathways using regular language constrained paths

Mitchell J Wagner; Aditya Pratapa; T M Murali

doi:10.1093/bioinformatics/btz360

Reconstructing signaling pathways using regular language constrained paths

Bioinformatics ◽

10.1093/bioinformatics/btz360 ◽

2019 ◽

Vol 35 (14) ◽

pp. i624-i633 ◽

Cited By ~ 3

Author(s):

Mitchell J Wagner ◽

Aditya Pratapa ◽

T M Murali

Keyword(s):

Signaling Pathways ◽

Regular Language ◽

Interaction Network ◽

Automated Analysis ◽

Supplementary Information ◽

Natural Question ◽

Supplementary Data ◽

High Quality ◽

Alternative Approaches ◽

New Interactions

Abstract:

Motivation: High-quality curation of the proteins and interactions in signaling pathways is slow and painstaking. As a result, many experimentally detected interactions are not annotated to any pathways. A natural question that arises is whether or not it is possible to automatically leverage existing pathway annotations to identify new interactions for inclusion in a given pathway.

Results: We present RegLinker, an algorithm that achieves this purpose by computing multiple short paths from pathway receptors to transcription factors within a background interaction network. The key idea underlying RegLinker is the use of regular language constraints to control the number of non-pathway interactions that are present in the computed paths. We systematically evaluate RegLinker and five alternative approaches against a comprehensive set of 15 signaling pathways and demonstrate that RegLinker recovers withheld pathway proteins and interactions with the best precision and recall. We used RegLinker to propose new extensions to the pathways. We discuss the literature that supports the inclusion of these proteins in the pathways. These results show the broad potential of automated analysis to attenuate difficulties of traditional manual inquiry.

Availability and implementation: https://github.com/Murali-group/RegLinker.

Supplementary information: Supplementary data are available at Bioinformatics online.
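
The idea of bounding the number of non-pathway interactions on a path can be illustrated with a small sketch. This is not the RegLinker implementation: it is a minimal Python example that assumes each edge carries a label, 'p' (already in the pathway) or 'n' (not in the pathway), and runs Dijkstra's algorithm over (node, count) states, which amounts to searching the product of the network with a simple automaton that accepts at most `max_new` edges labelled 'n'. The function name, the edge format and the toy network are all hypothetical.

```python
import heapq


def constrained_shortest_path(edges, source, target, max_new):
    """Cheapest source-to-target path using at most `max_new` 'n' edges.

    edges: iterable of (u, v, weight, label) with label 'p' or 'n'.
    Returns (cost, path) or None if no admissible path exists.
    """
    adj = {}
    for u, v, w, label in edges:
        adj.setdefault(u, []).append((v, w, label))

    # A search state is (node, number of 'n' edges used so far).
    dist = {(source, 0): 0.0}
    prev = {}
    heap = [(0.0, source, 0)]
    while heap:
        d, u, k = heapq.heappop(heap)
        if d > dist.get((u, k), float("inf")):
            continue  # stale heap entry
        if u == target:
            # Walk the predecessor links back to the source.
            path, state = [u], (u, k)
            while state in prev:
                state = prev[state]
                path.append(state[0])
            return d, path[::-1]
        for v, w, label in adj.get(u, []):
            k2 = k + (1 if label == "n" else 0)
            if k2 > max_new:
                continue  # would exceed the allowed number of 'n' edges
            nd = d + w
            if nd < dist.get((v, k2), float("inf")):
                dist[(v, k2)] = nd
                prev[(v, k2)] = (u, k)
                heapq.heappush(heap, (nd, v, k2))
    return None


# Toy network (hypothetical): receptor "R", transcription factor "TF".
EDGES = [
    ("R", "A", 1, "p"),
    ("A", "TF", 5, "p"),
    ("A", "TF", 3, "n"),
    ("R", "B", 1, "n"),
    ("B", "TF", 1, "n"),
]

for budget in (0, 1, 2):
    print(budget, constrained_shortest_path(EDGES, "R", "TF", budget))
```

In this toy network, raising the budget from 0 to 2 lets the search trade known pathway edges for cheaper non-pathway ones, which is the trade-off the regular language constraint is meant to control.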

OpenBioLink: a benchmarking framework for large-scale biomedical link prediction

Bioinformatics ◽

10.1093/bioinformatics/btaa274 ◽

2020 ◽

Vol 36 (13) ◽

pp. 4097-4098 ◽

Cited By ~ 3

Author(s):

Anna Breit ◽

Simon Ott ◽

Asan Agibetov ◽

Matthias Samwald

Keyword(s):

Link Prediction ◽

Large Scale ◽

Source Code ◽

Machine Learning Algorithms ◽

Knowledge Networks ◽

Supplementary Information ◽

Supplementary Data ◽

Biomedical Knowledge ◽

High Quality ◽

Baseline Evaluation

Abstract:

Summary: Recently, novel machine-learning algorithms have shown potential for predicting undiscovered links in biomedical knowledge networks. However, dedicated benchmarks for measuring algorithmic progress have not yet emerged. With OpenBioLink, we introduce a large-scale, high-quality and highly challenging biomedical link prediction benchmark to transparently and reproducibly evaluate such algorithms. Furthermore, we present preliminary baseline evaluation results.

Availability and implementation: Source code and data are openly available at https://github.com/OpenBioLink/OpenBioLink.

Supplementary information: Supplementary data are available at Bioinformatics online.

Top-Down Garbage Collector: a tool for selecting high-quality top-down proteomics mass spectra

Bioinformatics ◽

10.1093/bioinformatics/btz085 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3489-3490 ◽

Cited By ~ 1

Author(s):

Diogo B Lima ◽

André R F Silva ◽

Mathieu Dupré ◽

Marlon D M Santos ◽

Milan A Clasen ◽

...

Keyword(s):

Quality Control ◽

Mass Spectra ◽

Rate Increase ◽

Supplementary Information ◽

Supplementary Data ◽

Top Down ◽

High Quality ◽

Garbage Collector ◽

E Coli ◽

Spectral Libraries

Abstract:

Motivation: We present the first tool for unbiased quality control of top-down proteomics datasets. Our tool can select high-quality top-down proteomics spectra, serve as a gateway for building top-down spectral libraries and, ultimately, improve identification rates.

Results: We demonstrate that a twofold rate increase for two E. coli top-down proteomics datasets may be achievable.

Availability and implementation: http://patternlabforproteomics.org/tdgc, freely available for academic use.

Supplementary information: Supplementary data are available at Bioinformatics online.

Serpentine: a flexible 2D binning method for differential Hi-C analysis

Bioinformatics ◽

10.1093/bioinformatics/btaa249 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3645-3651

Author(s):

Lyam Baudry ◽

Gaël A Millot ◽

Agnes Thierry ◽

Romain Koszul ◽

Vittore F Scolari

Keyword(s):

Deep Sequencing ◽

Low Noise ◽

Supplementary Information ◽

Supplementary Data ◽

Fractal Nature ◽

Contact Map ◽

Signal To Noise ◽

High Quality ◽

Contact Maps ◽

Contact Frequency

Abstract:

Motivation: Hi-C contact maps reflect the relative contact frequencies between pairs of genomic loci, quantified through deep sequencing. Differential analyses of these maps enable downstream biological interpretations. However, the multi-fractal nature of the chromatin polymer inside the cellular envelope results in contact frequency values spanning several orders of magnitude: contacts between loci pairs separated by large genomic distances are much sparser than closer pairs. The same is true for poorly covered regions, such as repeated sequences. Both distant and poorly covered regions translate into low signal-to-noise ratios. There is no clear consensus to address this limitation.

Results: We present Serpentine, a fast, flexible procedure operating on raw data, which considers the contacts in each region of a contact map. Binning is performed only when necessary on noisy regions, preserving informative ones. This results in high-quality, low-noise contact maps that can be conveniently visualized for rigorous comparative analyses.

Availability and implementation: Serpentine is available on the PyPI repository and https://github.com/koszullab/serpentine; documentation and tutorials are provided at https://serpentine.readthedocs.io/en/latest/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Bringing data from curated pathway resources to Cytoscape with OmniPath

Bioinformatics ◽

10.1093/bioinformatics/btz968 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2632-2633 ◽

Cited By ~ 6

Author(s):

Francesco Ceccarelli ◽

Denes Turei ◽

Attila Gabor ◽

Julio Saez-Rodriguez

Keyword(s):

Source Code ◽

Large Body ◽

Supplementary Information ◽

Supplementary Data ◽

Network Resources ◽

High Quality ◽

Comprehensive Collection ◽

Intuitive Interface ◽

Growing Network

Abstract:

Summary: Multiple databases provide valuable information about curated pathways and other resources that can be used to build and analyze networks. OmniPath combines 61 (and continuously growing) network resources into a comprehensive collection, with over 120 000 interactions. We present here the OmniPath App, a Cytoscape plugin to flexibly import data from OmniPath via a simple and intuitive interface. It thus makes it possible to directly access the large body of high-quality knowledge provided by OmniPath within Cytoscape for inspection and further use with other tools.

Availability and implementation: The OmniPath App has been developed for Cytoscape 3 in the Java programming language. The latest source code and the plugin can be found at https://github.com/saezlab/Omnipath_Cytoscape and http://apps.cytoscape.org/apps/omnipath, respectively.

Supplementary information: Supplementary data are available at Bioinformatics online.

LRez: C++ API and toolkit for analyzing and managing Linked-Reads data

Bioinformatics Advances ◽

10.1093/bioadv/vbab022 ◽

2021 ◽

Author(s):

Pierre Morisse ◽

Claire Lemaitre ◽

Fabrice Legeai

Keyword(s):

Genome Assembly ◽

Low Cost ◽

Variant Calling ◽

Supplementary Information ◽

Supplementary Data ◽

High Quality ◽

DNA Molecule ◽

Sequencing Technologies ◽

Wide Range ◽

Genomic Regions

Abstract:

Motivation: Linked-Reads technologies combine both the high quality and low cost of short-read sequencing and long-range information, through the use of barcodes tagging reads which originate from a common long DNA molecule. This technology has been employed in a broad range of applications including genome assembly, phasing and scaffolding, as well as structural variant calling. However, to date, no tool or API dedicated to the manipulation of Linked-Reads data exists.

Results: We introduce LRez, a C++ API and toolkit which allows easy management of Linked-Reads data. LRez includes various functionalities, for computing numbers of common barcodes between genomic regions, extracting barcodes from BAM files, as well as indexing and querying BAM, FASTQ and gzipped FASTQ files to quickly fetch all reads or alignments containing a given barcode. LRez is compatible with a wide range of Linked-Reads sequencing technologies, and can thus be used in any tool or pipeline requiring barcode processing or indexing, in order to improve their performance.

Availability and implementation: LRez is implemented in C++, supported on Unix-based platforms, and available under AGPL-3.0 License at https://github.com/morispi/LRez, and as a bioconda module.

Supplementary information: Supplementary data are available at Bioinformatics Advances.
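
To make "computing numbers of common barcodes between genomic regions" concrete, here is a minimal Python sketch. It does not use or reproduce the LRez API: it assumes alignments have already been reduced to hypothetical (chromosome, position, barcode) tuples, and every name and value in it is illustrative.

```python
def barcodes_in_region(alignments, chrom, start, end):
    """Set of barcodes seen on `chrom` at positions in [start, end)."""
    return {bc for c, pos, bc in alignments if c == chrom and start <= pos < end}


def common_barcodes(alignments, region_a, region_b):
    """Number of distinct barcodes shared by two (chrom, start, end) regions."""
    shared = barcodes_in_region(alignments, *region_a) & barcodes_in_region(
        alignments, *region_b
    )
    return len(shared)


# Hypothetical alignments: (chromosome, position, barcode).
ALIGNMENTS = [
    ("chr1", 100, "AAAC"),
    ("chr1", 150, "GGTT"),
    ("chr1", 5100, "AAAC"),
    ("chr1", 5200, "CCGA"),
    ("chr1", 5300, "GGTT"),
    ("chr2", 100, "AAAC"),
]

print(common_barcodes(ALIGNMENTS, ("chr1", 0, 1000), ("chr1", 5000, 6000)))
```

Two distant regions that share many barcodes are likely to derive from the same long DNA molecules, which is why this count is useful for scaffolding and structural variant calling.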

MicrobiomeExplorer: an R package for the analysis and visualization of microbial communities

Bioinformatics ◽

10.1093/bioinformatics/btaa838 ◽

2020 ◽

Author(s):

Janina Reeder ◽

Mo Huang ◽

Joshua S Kaminker ◽

Joseph N Paulson

Keyword(s):

Microbial Communities ◽

Automated Analysis ◽

R Package ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Report Generation ◽

Shiny Application

Abstract:

Summary: We developed the MicrobiomeExplorer R package to facilitate the analysis and visualization of microbial communities. The MicrobiomeExplorer R package allows a user to perform typical microbiome analytic workflows and visualize their results, either through the command line or an interactive Shiny application included with the package. In addition to applying common analytical workflows, the application enables automated analysis report generation.

Availability and implementation: Available at https://github.com/zoecastillo/microbiomeExplorer.

Supplementary information: Supplementary data are available at Bioinformatics online.

MsPAC: a tool for haplotype-phased structural variant detection

Bioinformatics ◽

10.1093/bioinformatics/btz618 ◽

2019 ◽

Vol 36 (3) ◽

pp. 922-924 ◽

Cited By ~ 3

Author(s):

Oscar L Rodriguez ◽

Anna Ritz ◽

Andrew J Sharp ◽

Ali Bashir

Keyword(s):

Genomic Data ◽

Supplementary Information ◽

Supplementary Data ◽

High Quality ◽

Structural Variant ◽

Long Read ◽

One Step ◽

Variant Detection ◽

Next Generation Sequencing (NGS) ◽

Generation Sequencing

Abstract:

Summary: While next-generation sequencing (NGS) has dramatically increased the availability of genomic data, phased genome assembly and structural variant (SV) analyses are limited by NGS read lengths. Long-read sequencing from Pacific Biosciences and NGS barcoding from 10x Genomics hold the potential for far more comprehensive views of individual genomes. Here, we present MsPAC, a tool that combines both technologies to partition reads, assemble haplotypes (via existing software) and convert assemblies into high-quality, phased SV predictions. MsPAC represents a framework for haplotype-resolved SV calls that moves one step closer to fully resolved, diploid genomes.

Availability and implementation: https://github.com/oscarlr/MsPAC.

Supplementary information: Supplementary data are available at Bioinformatics online.

GAPIN: Grouped and Aligned Protein Interface Networks

10.1101/520833 ◽

2019 ◽

Author(s):

Biharck M. Araújo ◽

Aline L. Coelho ◽

Sabrina A. Silveira ◽

João P. R. Romanelli ◽

Raquel C. de Melo-Minardi ◽

...

Keyword(s):

Interaction Network ◽

Supplementary Information ◽

Protein Interface ◽

Supplementary Data ◽

Web Interface ◽

Web Based ◽

Structural Interaction ◽

Peptidase Inhibitor ◽

Supplementary Material ◽

Hydrophobic Patterns

Abstract:

Summary: GAPIN is a web-based application for structural interaction network analysis among any type of PDB molecules, regardless of whether their interfaces are chain-chain or chain-ligand. Special emphasis is given to graph clustering, allowing users to scrutinize target contexts for ligand candidates. We show how GAPIN can be used to unveil underlying hydrophobic patterns on a set of peptidase-inhibitor complexes. In another experiment, we show there is a positive correlation between cluster sizes and the presence of druggable spots, indicating that the clustering may discriminate the higher complexity of these hot subnetworks.

Availability and implementation: GAPIN is freely available as an easy-to-use web interface at https://[email protected], [email protected]

Supplementary information: Supplementary data are available online.

mmgenome: a toolbox for reproducible genome extraction from metagenomes

10.1101/059121 ◽

2016 ◽

Cited By ~ 42

Author(s):

Søren M. Karst ◽

Rasmus H. Kirkegaard ◽

Mads Albertsen

Keyword(s):

Optimal Strategy ◽

R Package ◽

Supplementary Information ◽

Data Generation ◽

Supplementary Data ◽

High Quality ◽

Standard Analysis ◽

Specific Population ◽

The Core ◽

Supplementary Material

Abstract:

Summary: Recovery of population genomes is becoming a standard analysis in metagenomics, and a multitude of different approaches exist. However, the workflows are complex, requiring data generation, binning, validation and finishing to generate high-quality population genome bins. In addition, several different approaches are often used on the same dataset, as the optimal strategy to extract a specific population genome varies. Here we introduce mmgenome: a toolbox for reproducible genome extraction from metagenomes. At the core of mmgenome is an R package that facilitates effortless integration of different binning strategies by collecting information on scaffolds. Genome binning is facilitated through integrated tools that support effortless visualization, validation and calculation of key statistics. Full reproducibility and transparency are obtained through Rmarkdown, whereby every step can be recreated.

Availability and implementation: The binning framework of mmgenome is implemented in R. Wrapper scripts for data generation and finishing are written in Perl. The mmgenome toolbox and associated step-by-step guides are available at http://madsalbertsen.github.io/mmgenome/.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

PgRC: pseudogenome-based read compressor

Bioinformatics ◽

10.1093/bioinformatics/btz919 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2082-2089 ◽

Cited By ~ 2

Author(s):

Tomasz M Kowalski ◽

Szymon Grabowski

Keyword(s):

Compression Ratio ◽

High Throughput Sequencing ◽

Supplementary Information ◽

Supplementary Data ◽

Sequencing Data ◽

High Quality ◽

Sequencing Technologies ◽

Significant Interest ◽

The One ◽

Shortest Common Superstring

Abstract:

Motivation: The amount of sequencing data from high-throughput sequencing technologies grows at a pace exceeding the one predicted by Moore’s law. One of the basic requirements is to efficiently store and transmit such huge collections of data. Despite significant interest in designing FASTQ compressors, they are still imperfect in terms of compression ratio or decompression resources.

Results: We present Pseudogenome-based Read Compressor (PgRC), an in-memory algorithm for compressing the DNA stream, based on the idea of building an approximation of the shortest common superstring over high-quality reads. Experiments show that PgRC wins in compression ratio over its main competitors, SPRING and Minicom, by up to 15 and 20% on average, respectively, while being comparably fast in decompression.

Availability and implementation: PgRC can be downloaded from https://github.com/kowallus/PgRC.

Supplementary information: Supplementary data are available at Bioinformatics online.
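
The notion of "an approximation of the shortest common superstring" over reads can be illustrated with the classical greedy overlap-merge heuristic. The sketch below is not PgRC's pseudogenome construction, which the abstract does not detail; it is a small, quadratic-time Python example on hypothetical reads, intended only to show how overlapping reads collapse into one shorter string.

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads):
    """Greedy approximation of the shortest common superstring of `reads`."""
    # Drop exact duplicates, then reads fully contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        # Find the ordered pair (i, j) with the largest suffix-prefix overlap.
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the pair, writing the overlapping part only once.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads[0] if reads else ""


# Hypothetical reads: 18 bases in total, 12 after merging.
READS = ["ACGTAC", "TACGGA", "GGATTC"]
print(greedy_superstring(READS))  # ACGTACGGATTC
```

Storing the merged string plus each read's offset into it, rather than the reads themselves, is the general intuition behind superstring-based read compression.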
