Reconstructing signaling pathways using regular language constrained paths

2019 ◽  
Vol 35 (14) ◽  
pp. i624-i633 ◽  
Author(s):  
Mitchell J Wagner ◽  
Aditya Pratapa ◽  
T M Murali

Abstract Motivation High-quality curation of the proteins and interactions in signaling pathways is slow and painstaking. As a result, many experimentally detected interactions are not annotated to any pathways. A natural question that arises is whether or not it is possible to automatically leverage existing pathway annotations to identify new interactions for inclusion in a given pathway. Results We present RegLinker, an algorithm that achieves this purpose by computing multiple short paths from pathway receptors to transcription factors within a background interaction network. The key idea underlying RegLinker is the use of regular language constraints to control the number of non-pathway interactions that are present in the computed paths. We systematically evaluate RegLinker and five alternative approaches against a comprehensive set of 15 signaling pathways and demonstrate that RegLinker recovers withheld pathway proteins and interactions with the best precision and recall. We used RegLinker to propose new extensions to the pathways. We discuss the literature that supports the inclusion of these proteins in the pathways. These results show the broad potential of automated analysis to attenuate difficulties of traditional manual inquiry. Availability and implementation https://github.com/Murali-group/RegLinker. Supplementary information Supplementary data are available at Bioinformatics online.
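The constraint idea in the abstract above can be illustrated with a small sketch. It is not the RegLinker implementation: it assumes a directed, weighted graph whose edges carry a hypothetical label `"p"` (pathway interaction) or `"n"` (non-pathway interaction), and it finds one shortest receptor-to-transcription-factor path that uses at most `max_non_pathway` edges labeled `"n"`. "At most k occurrences of n" is a regular language, so the search runs Dijkstra's algorithm over pairs (node, number of `"n"` edges used so far), which is the product of the graph with a small automaton. All function and parameter names are illustrative.

```python
import heapq


def constrained_shortest_path(edges, source, target, max_non_pathway):
    """Shortest path using at most `max_non_pathway` edges labeled 'n'.

    edges: iterable of (u, v, weight, label) with label 'p' or 'n'
           (hypothetical input format, weights must be non-negative).
    Returns (cost, [nodes]) or None if no admissible path exists.
    """
    adj = {}
    for u, v, w, label in edges:
        adj.setdefault(u, []).append((v, w, label))

    # A search state is (node, count of non-pathway edges used so far).
    dist = {(source, 0): 0.0}
    prev = {}
    heap = [(0.0, source, 0)]
    while heap:
        d, u, used = heapq.heappop(heap)
        if d > dist.get((u, used), float("inf")):
            continue  # stale heap entry
        if u == target:
            # Walk predecessor states back to the source.
            path, state = [u], (u, used)
            while state in prev:
                state = prev[state]
                path.append(state[0])
            return d, path[::-1]
        for v, w, label in adj.get(u, []):
            next_used = used + (1 if label == "n" else 0)
            if next_used > max_non_pathway:
                continue  # the automaton would reject this path
            nd = d + w
            if nd < dist.get((v, next_used), float("inf")):
                dist[(v, next_used)] = nd
                prev[(v, next_used)] = (u, used)
                heapq.heappush(heap, (nd, v, next_used))
    return None


# Toy network: a cheap non-pathway shortcut competes with a pathway-only route.
toy_edges = [
    ("R", "A", 1.0, "p"),
    ("A", "TF", 1.0, "p"),
    ("R", "TF", 0.5, "n"),
]
strict = constrained_shortest_path(toy_edges, "R", "TF", 0)
relaxed = constrained_shortest_path(toy_edges, "R", "TF", 1)
```

With a budget of zero the search is forced onto the pathway-only route; raising the budget to one lets it take the non-pathway shortcut, which is how the budget controls the number of new interactions that enter a path.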

2020 ◽  
Vol 36 (13) ◽  
pp. 4097-4098 ◽  
Author(s):  
Anna Breit ◽  
Simon Ott ◽  
Asan Agibetov ◽  
Matthias Samwald

Abstract Summary Recently, novel machine-learning algorithms have shown potential for predicting undiscovered links in biomedical knowledge networks. However, dedicated benchmarks for measuring algorithmic progress have not yet emerged. With OpenBioLink, we introduce a large-scale, high-quality and highly challenging biomedical link prediction benchmark to transparently and reproducibly evaluate such algorithms. Furthermore, we present preliminary baseline evaluation results. Availability and implementation Source code and data are openly available at https://github.com/OpenBioLink/OpenBioLink. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (18) ◽  
pp. 3489-3490 ◽  
Author(s):  
Diogo B Lima ◽  
André R F Silva ◽  
Mathieu Dupré ◽  
Marlon D M Santos ◽  
Milan A Clasen ◽  
...  

Abstract Motivation We present the first tool for unbiased quality control of top-down proteomics datasets. Our tool can select high-quality top-down proteomics spectra, serve as a gateway for building top-down spectral libraries and, ultimately, improve identification rates. Results We demonstrate that a twofold rate increase for two E. coli top-down proteomics datasets may be achievable. Availability and implementation http://patternlabforproteomics.org/tdgc, freely available for academic use. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (12) ◽  
pp. 3645-3651
Author(s):  
Lyam Baudry ◽  
Gaël A Millot ◽  
Agnes Thierry ◽  
Romain Koszul ◽  
Vittore F Scolari

Abstract Motivation Hi-C contact maps reflect the relative contact frequencies between pairs of genomic loci, quantified through deep sequencing. Differential analyses of these maps enable downstream biological interpretations. However, the multi-fractal nature of the chromatin polymer inside the cellular envelope results in contact frequency values spanning several orders of magnitude: contacts between loci pairs separated by large genomic distances are much sparser than closer pairs. The same is true for poorly covered regions, such as repeated sequences. Both distant and poorly covered regions translate into low signal-to-noise ratios. There is no clear consensus to address this limitation. Results We present Serpentine, a fast, flexible procedure operating on raw data, which considers the contacts in each region of a contact map. Binning is performed only when necessary on noisy regions, preserving informative ones. This results in high-quality, low-noise contact maps that can be conveniently visualized for rigorous comparative analyses. Availability and implementation Serpentine is available on the PyPI repository and https://github.com/koszullab/serpentine; documentation and tutorials are provided at https://serpentine.readthedocs.io/en/latest/. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (8) ◽  
pp. 2632-2633 ◽  
Author(s):  
Francesco Ceccarelli ◽  
Denes Turei ◽  
Attila Gabor ◽  
Julio Saez-Rodriguez

Abstract Summary Multiple databases provide valuable information about curated pathways and other resources that can be used to build and analyze networks. OmniPath combines 61 (and continuously growing) network resources into a comprehensive collection, with over 120 000 interactions. We present here the OmniPath App, a Cytoscape plugin to flexibly import data from OmniPath via a simple and intuitive interface. Thus, it makes it possible to directly access the large body of high-quality knowledge provided by OmniPath within Cytoscape for inspection and further use with other tools. Availability and implementation The OmniPath App has been developed for Cytoscape 3 in the Java programming language. The latest source code and the plugin can be found at: https://github.com/saezlab/Omnipath_Cytoscape and http://apps.cytoscape.org/apps/omnipath, respectively. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Pierre Morisse ◽  
Claire Lemaitre ◽  
Fabrice Legeai

Abstract Motivation Linked-Reads technologies combine the high quality and low cost of short-read sequencing with long-range information, through the use of barcodes that tag reads originating from a common long DNA molecule. This technology has been employed in a broad range of applications including genome assembly, phasing and scaffolding, as well as structural variant calling. However, to date, no tool or API dedicated to the manipulation of Linked-Reads data exists. Results We introduce LRez, a C++ API and toolkit which allows easy management of Linked-Reads data. LRez includes various functionalities for computing numbers of common barcodes between genomic regions, extracting barcodes from BAM files, as well as indexing and querying BAM, FASTQ and gzipped FASTQ files to quickly fetch all reads or alignments containing a given barcode. LRez is compatible with a wide range of Linked-Reads sequencing technologies, and can thus be used in any tool or pipeline requiring barcode processing or indexing, in order to improve their performance. Availability and implementation LRez is implemented in C++, supported on Unix-based platforms, and available under AGPL-3.0 License at https://github.com/morispi/LRez, and as a bioconda module. Supplementary information Supplementary data are available at Bioinformatics Advances.
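One of the operations named above, counting the barcodes shared by two genomic regions, can be sketched in a few lines. This is not the LRez API (LRez is a C++ library working on BAM and FASTQ files); the sketch assumes alignments have already been reduced to hypothetical `(chromosome, position, barcode)` tuples and that regions are half-open intervals, and every name in it is illustrative.

```python
from collections import defaultdict


def barcodes_by_region(alignments, regions):
    """Collect the set of barcodes observed in each named region.

    alignments: iterable of (chrom, pos, barcode) tuples (assumed format).
    regions: dict mapping name -> (chrom, start, end), half-open [start, end).
    """
    found = defaultdict(set)
    for chrom, pos, barcode in alignments:
        for name, (r_chrom, start, end) in regions.items():
            if chrom == r_chrom and start <= pos < end:
                found[name].add(barcode)
    return found


def count_common_barcodes(found, region_a, region_b):
    """Number of distinct barcodes present in both regions."""
    return len(found[region_a] & found[region_b])


# Toy data: two barcodes span both regions, one is private to "right".
toy_alignments = [
    ("chr1", 100, "AAAC"),
    ("chr1", 150, "AAAG"),
    ("chr1", 5100, "AAAC"),
    ("chr1", 5200, "TTTT"),
    ("chr1", 5300, "AAAG"),
    ("chr2", 120, "AAAC"),
]
toy_regions = {"left": ("chr1", 0, 1000), "right": ("chr1", 5000, 6000)}
found = barcodes_by_region(toy_alignments, toy_regions)
shared = count_common_barcodes(found, "left", "right")
```

A high count of shared barcodes suggests that reads in the two regions came from the same long molecules, which is the long-range signal that scaffolding and structural variant calling rely on.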


Author(s):  
Janina Reeder ◽  
Mo Huang ◽  
Joshua S Kaminker ◽  
Joseph N Paulson

Abstract Summary We developed the MicrobiomeExplorer R package to facilitate the analysis and visualization of microbial communities. The MicrobiomeExplorer R package allows a user to perform typical microbiome analytic workflows and visualize their results, either through the command line or an interactive Shiny application included with the package. In addition to applying common analytical workflows, the application enables automated analysis report generation. Availability and implementation Available at https://github.com/zoecastillo/microbiomeExplorer. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (3) ◽  
pp. 922-924 ◽  
Author(s):  
Oscar L Rodriguez ◽  
Anna Ritz ◽  
Andrew J Sharp ◽  
Ali Bashir

Abstract Summary While next-generation sequencing (NGS) has dramatically increased the availability of genomic data, phased genome assembly and structural variant (SV) analyses are limited by NGS read lengths. Long-read sequencing from Pacific Biosciences and NGS barcoding from 10x Genomics hold the potential for far more comprehensive views of individual genomes. Here, we present MsPAC, a tool that combines both technologies to partition reads, assemble haplotypes (via existing software) and convert assemblies into high-quality, phased SV predictions. MsPAC represents a framework for haplotype-resolved SV calls that moves one step closer to fully resolved, diploid genomes. Availability and implementation https://github.com/oscarlr/MsPAC. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Biharck M. Araújo ◽  
Aline L. Coelho ◽  
Sabrina A. Silveira ◽  
João P. R. Romanelli ◽  
Raquel C. de Melo-Minardi ◽  
...  

Abstract Summary GAPIN is a web-based application for structural interaction network analysis among any type of PDB molecules, regardless of whether their interfaces are between chain-chain or chain-ligand. A special emphasis is given to graph clustering, allowing users to scrutinize target contexts for ligand candidates. We show how GAPIN can be used to unveil underlying hydrophobic patterns on a set of peptidase-inhibitor complexes. In another experiment, we show there is a positive correlation between cluster sizes and the presence of druggable spots, indicating that the clustering may discriminate the higher complexity of these hot subnetworks. Availability and implementation GAPIN is freely available as an easy-to-use web interface at https://[email protected], [email protected] Supplementary information Supplementary data are available online.


2016 ◽  
Author(s):  
Søeren M. Karst ◽  
Rasmus H. Kirkegaard ◽  
Mads Albertsen

Abstract Summary Recovery of population genomes is becoming a standard analysis in metagenomics and a multitude of different approaches exists. However, the workflows are complex, requiring data generation, binning, validation and finishing to generate high-quality population genome bins. In addition, several different approaches are often used on the same dataset, as the optimal strategy to extract a specific population genome varies. Here we introduce mmgenome: a toolbox for reproducible genome extraction from metagenomes. At the core of mmgenome is an R package that facilitates effortless integration of different binning strategies by collecting information on scaffolds. Genome binning is facilitated through integrated tools that support effortless visualizations, validation and calculation of key statistics. Full reproducibility and transparency are obtained through Rmarkdown, whereby every step can be recreated. Availability and implementation The binning framework of mmgenome is implemented in R. Wrapper scripts for data generation and finishing are written in Perl. The mmgenome toolbox and associated step-by-step guides are available at http://madsalbertsen.github.io/mmgenome/[email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (7) ◽  
pp. 2082-2089 ◽  
Author(s):  
Tomasz M Kowalski ◽  
Szymon Grabowski

Abstract Motivation The amount of sequencing data from high-throughput sequencing technologies grows at a pace exceeding the one predicted by Moore’s law. One of the basic requirements is to efficiently store and transmit such huge collections of data. Despite significant interest in designing FASTQ compressors, they are still imperfect in terms of compression ratio or decompression resources. Results We present Pseudogenome-based Read Compressor (PgRC), an in-memory algorithm for compressing the DNA stream, based on the idea of building an approximation of the shortest common superstring over high-quality reads. Experiments show that PgRC wins in compression ratio over its main competitors, SPRING and Minicom, by up to 15 and 20% on average, respectively, while being comparably fast in decompression. Availability and implementation PgRC can be downloaded from https://github.com/kowallus/PgRC. Supplementary information Supplementary data are available at Bioinformatics online.
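To make the "approximation of the shortest common superstring" idea concrete, here is a minimal sketch of the classic greedy heuristic: repeatedly merge the two reads with the longest suffix-prefix overlap. This is a textbook technique swapped in for illustration, not PgRC's actual pseudogenome construction, and the quadratic pair search is only suitable for toy inputs. Function names are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads):
    """Greedy approximation of a shortest common superstring of `reads`."""
    # Drop exact duplicates and reads fully contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]

    while len(reads) > 1:
        # Find the ordered pair (i, j) with the largest overlap.
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the pair, keeping the overlapping part only once.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads[0] if reads else ""


toy_reads = ["ACGTA", "GTACC", "ACCTT"]
pseudogenome = greedy_superstring(toy_reads)
```

Once every read is a substring of one long sequence, each read can be stored as an offset into that sequence rather than as its own string, which is where the compression gain comes from in superstring-based schemes.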

