pyABC: distributed, likelihood-free inference

Summary: Likelihood-free methods are often required for inference in systems biology. While Approximate Bayesian Computation (ABC) provides a theoretical solution, its practical application has often been challenging due to its high computational demands. To scale likelihood-free inference to computationally demanding stochastic models, we developed pyABC: a distributed and scalable ABC-Sequential Monte Carlo (ABC-SMC) framework. It implements computation-minimizing and scalable, runtime-minimizing parallelization strategies for multi-core and distributed environments, scaling to thousands of cores. The framework is accessible to non-expert users and also enables advanced users to experiment with and implement custom variants of many components of ABC-SMC schemes, such as acceptance threshold schedules, transition kernels and distance functions, without altering pyABC’s source code. pyABC includes a web interface to visualize ongoing and finished ABC-SMC runs and exposes an API for data querying and post-processing. Availability and Implementation: pyABC is written in Python 3 and is released under the GPLv3 license. The source code is hosted on https://github.com/neuralyzer/pyabc and the documentation on http://pyabc.readthedocs.io. It can be installed from the Python Package Index (PyPI).
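For readers unfamiliar with ABC, the core accept/reject idea can be illustrated with a minimal rejection sampler. This is a sketch only: it uses plain standard-library Python rather than pyABC's API, and it shows simple ABC rejection, not the ABC-SMC scheme the abstract describes. The toy Gaussian model, the uniform prior, the mean-based distance and the threshold `epsilon` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate(mu, n, rng):
    """Toy stochastic model (assumed for illustration): n Gaussian draws with mean mu."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def distance(sim, obs):
    """Distance between summary statistics: absolute difference of sample means."""
    return abs(statistics.fmean(sim) - statistics.fmean(obs))


def abc_rejection(observed, n_accept, epsilon, rng):
    """Sample parameters from the prior; keep those whose simulated data
    lie within epsilon of the observed data."""
    accepted = []
    while len(accepted) < n_accept:
        mu = rng.uniform(-5.0, 5.0)  # assumed prior: Uniform(-5, 5)
        sim = simulate(mu, len(observed), rng)
        if distance(sim, observed) <= epsilon:
            accepted.append(mu)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # synthetic "data" generated with mu = 2
posterior = abc_rejection(observed, n_accept=200, epsilon=0.2, rng=rng)
posterior_mean = statistics.fmean(posterior)
```

ABC-SMC refines this idea by running several such rounds with a decreasing threshold and by proposing new parameters from the previously accepted population instead of the prior, which is where choices such as threshold schedules and transition kernels come in.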


A RESTful API to serve BAM file with OAuth2 compatible authorization

10.1101/151787 ◽ 2017

Author(s): Julien Delafontaine ◽ Sylvain Pradervand

Keyword(s): Open Source ◽ Source Code ◽ Variant Calling ◽ Use Case ◽ Web Interface ◽ Sensitive Data ◽ Link Type ◽ Restful Service

Summary: Bam-server is an open-source RESTful service to securely query slices of BAM files and manage user access to them. A typical use case is the visualization of local read alignments in a web interface for variant-calling diagnostics, without exposing sensitive data to unauthorized users through the network, and without moving the original (heavy) file. Bam-server follows the standard implementation of a protected resource server in the context of a typical token-based authorization protocol, supporting HMAC- and RSA-hashed signatures from an authorization server of choice. Availability: The source code is available at https://github.com/chuv-ssrc/bam-server-scala, and complete documentation can be found at http://bam-server-scala.readthedocs.io/en/latest/[email protected]
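The token check a protected resource server performs can be sketched as follows. This is an illustration in standard-library Python, not Bam-server's own code (which is written in Scala): the JWT-like `header.payload.signature` layout, the HS256 choice, the claim names and the shared secret are all assumptions. A production verifier would also validate the header's algorithm field and the token's expiry.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """URL-safe base64 without padding, as used in JWT-style tokens."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Inverse of b64url: restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """What an authorization server would do: sign header.payload with HMAC-SHA256."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    payload = b64url(json.dumps(claims).encode("utf-8"))
    signing_input = f"{header}.{payload}".encode("utf-8")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(signature)}"


def verify_token(token: str, secret: bytes):
    """What the resource server would do: recompute the signature and
    return the claims only if it matches; otherwise return None."""
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        return None  # not three dot-separated parts
    signing_input = f"{header}.{payload}".encode("utf-8")
    expected = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return None
    return json.loads(b64url_decode(payload))


secret = b"shared-secret"                          # hypothetical shared key
claims = {"sub": "alice", "sample": "sample1"}     # hypothetical claim names
token = sign_token(claims, secret)
verified = verify_token(token, secret)
```

With RSA signatures the resource server would instead verify against the authorization server's public key, so it never needs to hold a signing secret.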


clinker & clustermap.js: Automatic generation of gene cluster comparison figures

10.1101/2020.11.08.370650 ◽ 2020

Author(s): Cameron L.M. Gilchrist ◽ Yit-Heng Chooi

Keyword(s): Gene Cluster ◽ Evolutionary History ◽ Source Code ◽ Gene Clusters ◽ Automatic Generation ◽ Biological Pathways ◽ Link Type ◽ E Mail ◽ Python Package ◽ Publication Quality

Summary: Genes involved in biological pathways are often colocalised in gene clusters, the comparison of which can give valuable insights into their function and evolutionary history. However, comparison and visualisation of gene cluster homology is a tedious process, particularly when many clusters are being compared. Here, we present clinker, a Python-based tool, and clustermap.js, a companion JavaScript visualisation library, which used together can automatically generate accurate, interactive, publication-quality gene cluster comparison figures directly from sequence files. Availability and Implementation: Source code and documentation for clinker and clustermap.js are available on GitHub (github.com/gamcil/clinker and github.com/gamcil/clustermap.js, respectively) under the MIT license. clinker can be installed directly from the Python Package Index via pip. Contact: [email protected], [email protected]


A decoupled, modular and scriptable architecture for tools to curate data platforms

10.1101/2020.09.28.282699 ◽ 2020

Author(s): Moritz Langenstein ◽ Henning Hermjakob ◽ Manuel Bernal Llinares

Keyword(s): Web Application ◽ Production Systems ◽ Source Code ◽ Black Box ◽ Command Line ◽ Web Interface ◽ Link Type ◽ Data Platform ◽ The Web

Motivation: Curation is essential for any data platform to maintain the quality of the data it provides. Existing databases, which require maintenance, and the amount of newly published information that needs to be surveyed, are growing rapidly. More efficient curation is often vital to keep up with this growth, requiring modern curation tools. However, curation interfaces are often complex and difficult to further develop. Furthermore, opportunities for experimentation with curation workflows may be lost due to a lack of development resources, or a reluctance to change sensitive production systems. Results: We propose a decoupled, modular and scriptable architecture to build curation tools on top of existing platforms. Instead of modifying the existing infrastructure, our architecture treats the existing platform as a black box and relies only on its public APIs and web application. As a decoupled program, the tool’s architecture gives more freedom to developers and curators. This added flexibility allows for quickly prototyping new curation workflows as well as adding all kinds of analysis around the data platform. The tool can also streamline and enhance the curator’s interaction with the web interface of the platform. We have implemented this design in cmd-iaso, a command-line curation tool for the identifiers.org registry. Availability: The cmd-iaso curation tool is implemented in Python 3.7+ and supports Linux, macOS and Windows. Its source code and documentation are freely available from https://github.com/identifiers-org/cmd-iaso. It is also published as a Docker container at https://hub.docker.com/r/identifiersorg/[email protected]


pyrpipe: a python package for RNA-Seq workflows

10.1101/2020.03.04.925818 ◽ 2020

Author(s): Urminder Singh ◽ Jing Li ◽ Arun Seetharam ◽ Eve Syrkin Wurtele

Keyword(s): Detailed Analysis ◽ Source Code ◽ Workflow Management ◽ Object Oriented ◽ Third Party ◽ Rna Seq ◽ Link Type ◽ Computing Environments ◽ High Level ◽ Python Package

Implementing RNA-Seq analysis pipelines is challenging as data gets bigger and more complex. With the availability of terabytes of RNA-Seq data and continuous development of analysis tools, there is a pressing requirement for frameworks that allow for fast and efficient development, modification, sharing and reuse of workflows. Scripting is often used, but it has many challenges and drawbacks. We have developed a Python package, python RNA-Seq Pipeliner (pyrpipe), that enables straightforward development of flexible, reproducible and easy-to-debug computational pipelines purely in Python, in an object-oriented manner. pyrpipe provides high-level APIs to popular RNA-Seq tools. Pipelines can be customized by integrating new Python code, third-party programs, or Python libraries. Researchers can create checkpoints in the pipeline or integrate pyrpipe into a workflow management system, thus allowing execution on multiple computing environments. pyrpipe produces detailed analysis and benchmark reports, which can be shared or included in publications. pyrpipe is implemented in Python and is compatible with Python versions 3.6 and higher. All source code is available at https://github.com/urmi-21/pyrpipe; the package can be installed from source or from PyPI (https://pypi.org/project/pyrpipe). Documentation is available on Read the Docs (http://pyrpipe.rtfd.io).


BOFdat: generating biomass objective function stoichiometric coefficients from experimental data

10.1101/243881 ◽ 2018 ◽ Cited By ~ 4

Author(s): Jean-Christophe Lachance ◽ Jonathan M. Monk ◽ Colton J. Lloyd ◽ Yara Seif ◽ Bernhard O. Palsson ◽ ...

Keyword(s): Experimental Data ◽ Objective Function ◽ Source Code ◽ Cell Composition ◽ Link Type ◽ Scale Models ◽ Relative Abundances ◽ Genome Scale ◽ Python Package

Abstract: Genome-scale models (GEMs) rely on a biomass objective function (BOF) to predict phenotype from genotype. Here we present BOFdat, a Python package that offers functions to generate biomass objective function stoichiometric coefficients (BOFsc) from macromolecular cell composition and relative abundances of macromolecules obtained from omic datasets. Growth-associated and non-growth-associated maintenance (GAM and NGAM) costs can also be calculated by BOFdat. BOFdat is freely available on the Python Package Index (pip install BOFdat). The source code and an example usage (Jupyter Notebook and example files) are available on GitHub (https://github.com/jclachance/BOFdat). The documentation and API are available through ReadTheDocs (https://bofdat.readthedocs.io). Contact: [email protected], [email protected], [email protected]


PyIOmica: Longitudinal Omics Analysis and Classification

10.1101/708941 ◽ 2019

Author(s): Sergii Domanskyi ◽ Carlo Piermarocchi ◽ George I. Mias

Keyword(s): Time Series ◽ Gene Ontology ◽ Source Code ◽ Temporal Trends ◽ Enrichment Analysis ◽ Data Normalization ◽ Link Type ◽ Visibility Graphs ◽ Microsoft Windows ◽ Python Package

Summary: PyIOmica is an open-source Python package focused on integrating longitudinal multiple-omics datasets and on characterizing and classifying temporal trends. The package includes multiple bioinformatics tools, including data normalization, annotation, classification, visualization, and enrichment analysis for gene ontology terms and pathways. Additionally, the package includes an implementation of visibility graphs to visualize time series as networks. Availability and implementation: PyIOmica is implemented as a Python package (pyiomica), available for download and installation through the Python Package Index (PyPI) (https://pypi.python.org/pypi/pyiomica), and can be deployed using the Python import function following installation. PyIOmica has been tested on Mac OS X, Unix/Linux and Microsoft Windows. The application is distributed under an MIT license. Source code for each release is also available for download on Zenodo (https://doi.org/10.5281/zenodo.3342612). Contact: [email protected]


Comparing two sequential Monte Carlo samplers for exact and approximate Bayesian inference on biological models

Journal of The Royal Society Interface ◽ 10.1098/rsif.2017.0340 ◽ 2017 ◽ Vol 14 (134) ◽ pp. 20170340 ◽ Cited By ~ 6

Author(s): Aidan C. Daly ◽ Jonathan Cooper ◽ David J. Gavaghan ◽ Chris Holmes

Keyword(s): Bayesian Inference ◽ Bayesian Methods ◽ Sequential Monte Carlo ◽ Model Parameters ◽ Biological Models ◽ Exact Inference ◽ Modelling Studies ◽ Approximate Bayesian ◽ Approximate Bayesian Inference ◽ Abc Methods

Bayesian methods are advantageous for biological modelling studies due to their ability to quantify and characterize posterior variability in model parameters. When Bayesian methods cannot be applied, due either to non-determinism in the model or limitations on system observability, approximate Bayesian computation (ABC) methods can be used to similar effect, despite producing inflated estimates of the true posterior variance. Owing to generally differing application domains, there are few studies comparing Bayesian and ABC methods, and thus there is little understanding of the properties and magnitude of this uncertainty inflation. To address this problem, we present two popular strategies for ABC sampling that we have adapted to perform exact Bayesian inference, and compare them on several model problems. We find that one sampler was impractical for exact inference due to its sensitivity to a key normalizing constant, and additionally highlight sensitivities of both samplers to various algorithmic parameters and model conditions. We conclude with a study of the O'Hara–Rudy cardiac action potential model to quantify the uncertainty amplification resulting from employing ABC using a set of clinically relevant biomarkers. We hope that this work serves to guide the implementation and comparative assessment of Bayesian and ABC sampling techniques in biological models.


Revisiting the Out of Africa event with a novel Deep Learning approach

10.1101/2020.12.10.419069 ◽ 2020

Author(s): Francesco Montinaro ◽ Vasili Pankratov ◽ Burak Yelmen ◽ Luca Pagani ◽ Mayukh Mondal

Keyword(s): Deep Learning ◽ Sequential Monte Carlo ◽ Whole Genome Sequence ◽ Learning Approach ◽ Whole Genome ◽ Modern Humans ◽ Anatomically Modern Humans ◽ Out Of Africa ◽ Approximate Bayesian ◽ Back To Africa

Abstract: Anatomically modern humans evolved around 300 thousand years ago in Africa [1]. Modern humans started to appear in the fossil record outside of Africa about 100 thousand years ago, though other hominins existed throughout Eurasia much earlier [2–4]. Recently, several researchers argued in favour of a single out of Africa event for modern humans based on whole-genome sequence analyses [5–7]. However, the single out of Africa model is in contrast with some of the findings from fossil records, which support two out of Africa events [8,9], and from uniparental data, which propose a back to Africa movement [10,11]. Here, we used a novel deep learning approach coupled with Approximate Bayesian Computation and Sequential Monte Carlo to revisit these hypotheses from the whole-genome sequence perspective. Our results support the back to Africa model over other alternatives. We estimated that there were two successive splits between African and out of Africa populations, happening around 60-80 thousand years ago and separated by 12-13 thousand years. One of the populations resulting from the more recent split has to a large extent replaced the older West African population, while the other one founded the out of Africa populations.


idCOV: a pipeline for quick clade identification of SARS-CoV-2 isolates

10.1101/2020.10.08.330456 ◽ 2020

Author(s): Xun Zhu ◽ Ti-Cheng Chang ◽ Richard Webby ◽ Gang Wu

Keyword(s): Personal Computer ◽ Source Code ◽ Command Line ◽ Sequencing Data ◽ Link Type ◽ Public Dataset ◽ Virus Isolates

Abstract: idCOV is a phylogenetic pipeline for quickly identifying the clades of SARS-CoV-2 virus isolates from raw sequencing data based on a selected clade-defining marker list. Using a public dataset, we show that idCOV makes calls equivalent to those annotated by Nextstrain.org on all three common clade systems, working directly from user-uploaded FastQ files. Web and equivalent command-line interfaces are available. It can be deployed on any Linux environment, including personal computers, HPC systems and the cloud. The source code is available at https://github.com/xz-stjude/idcov. Installation documentation can be found at https://github.com/xz-stjude/idcov/blob/master/README.md.


GalaxyCloudRunner: enhancing scalable computing for Galaxy

10.1101/2020.05.28.121772 ◽ 2020

Author(s): N Goonasekera ◽ A Mahmoud ◽ J Chilton ◽ E Afgan

Keyword(s): Source Code ◽ Supplementary Information ◽ Scalable Computing ◽ Link Type ◽ Cloud Providers ◽ Galaxy Server ◽ Cloud Resources

Summary: The existence of more than 100 public Galaxy servers with service quotas is indicative of the need for an increased availability of compute resources for Galaxy to use. The GalaxyCloudRunner enables a Galaxy server to easily expand its available compute capacity by sending user jobs to cloud resources. User jobs are routed to the acquired resources based on a set of configurable rules, and the resources can be dynamically acquired from any of 4 popular cloud providers (AWS, Azure, GCP, or OpenStack) in an automated fashion. Availability and implementation: GalaxyCloudRunner is implemented in Python and leverages Docker containers. The source code is MIT licensed and available at https://github.com/cloudve/galaxycloudrunner. The documentation is available at http://gcr.cloudve.org/. Contact: Enis Afgan ([email protected]). Supplementary information: None.
