pyrpipe: a python package for RNA-Seq workflows

Implementing RNA-Seq analysis pipelines is challenging as data gets bigger and more complex. With the availability of terabytes of RNA-Seq data and the continuous development of analysis tools, there is a pressing need for frameworks that allow fast and efficient development, modification, sharing, and reuse of workflows. Scripting is often used, but it has many challenges and drawbacks. We have developed a Python package, Python RNA-Seq Pipeliner (pyrpipe), that enables straightforward development of flexible, reproducible, and easy-to-debug computational pipelines purely in Python, in an object-oriented manner. pyrpipe provides high-level APIs to popular RNA-Seq tools. Pipelines can be customized by integrating new Python code, third-party programs, or Python libraries. Researchers can create checkpoints in the pipeline or integrate pyrpipe into a workflow management system, thus allowing execution on multiple computing environments. pyrpipe produces detailed analysis and benchmark reports, which can be shared or included in publications. pyrpipe is implemented in Python and is compatible with Python versions 3.6 and higher. All source code is available at https://github.com/urmi-21/pyrpipe; the package can be installed from the source or from PyPI (https://pypi.org/project/pyrpipe). Documentation is available on Read the Docs (http://pyrpipe.rtfd.io).

pyrpipe: a Python package for RNA-Seq workflows

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab049 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Urminder Singh ◽

Jing Li ◽

Arun Seetharam ◽

Eve Syrkin Wurtele

Keyword(s):

Workflow Management ◽

Third Party ◽

RNA-Seq ◽

Rich Functionality ◽

Efficient Resource ◽

Biological Insight ◽

Computing Environments ◽

High Level ◽

Python Package

The availability of terabytes of RNA-Seq data and the continuous emergence of new analysis tools enable unprecedented biological insight. There is a pressing need for a framework that allows fast, efficient, manageable, and reproducible RNA-Seq analysis. We have developed a Python package, pyrpipe, that enables straightforward development of flexible, reproducible, and easy-to-debug computational pipelines purely in Python, in an object-oriented manner. pyrpipe provides access to popular RNA-Seq tools within Python via high-level APIs. Pipelines can be customized by integrating new Python code, third-party programs, or Python libraries. Users can create checkpoints in the pipeline or integrate pyrpipe into a workflow management system, thus allowing execution on multiple computing environments and enabling efficient resource management. pyrpipe produces detailed analysis and benchmark reports, which can be shared or included in publications. pyrpipe is implemented in Python and is compatible with Python versions 3.6 and higher. To illustrate the rich functionality of pyrpipe, we provide case studies using RNA-Seq data from GTEx, SARS-CoV-2-infected human cells, and Zea mays. All source code is freely available at https://github.com/urmi-21/pyrpipe; the package can be installed from the source, from PyPI (https://pypi.org/project/pyrpipe), or from bioconda (https://anaconda.org/bioconda/pyrpipe). Documentation is available at http://pyrpipe.rtfd.io.
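The checkpoint idea in the abstract can be illustrated with a minimal Python sketch. This is not pyrpipe's API: the `Step` and `Pipeline` classes, the rule "an existing output file means the step is already done", and the `demo` function are hypothetical, and the "tool" is a stand-in Python one-liner rather than a real RNA-Seq program.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


class Step:
    """A hypothetical pipeline step: a command and the file it should produce."""

    def __init__(self, name: str, command: List[str], output: Path) -> None:
        self.name = name
        self.command = command
        self.output = Path(output)


class Pipeline:
    """Runs steps in order; an existing output file is treated as a checkpoint."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps

    def run(self) -> List[str]:
        log = []
        for step in self.steps:
            if step.output.exists():
                # Checkpoint hit: reuse the result of an earlier run.
                log.append(f"skipped {step.name}")
                continue
            # check=True raises CalledProcessError, halting the pipeline on failure.
            subprocess.run(step.command, check=True)
            log.append(f"ran {step.name}")
        return log


def demo():
    """Run a one-step pipeline twice; the second run should hit the checkpoint."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "counts.txt"
        # Stand-in for a real tool: a Python one-liner that writes the output file.
        script = f"open({str(out)!r}, 'w').write('ok')"
        pipeline = Pipeline([Step("quantify", [sys.executable, "-c", script], out)])
        return pipeline.run(), pipeline.run()
```

Using output existence as the checkpoint keeps the sketch short; it does not check whether an existing file is complete or valid.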

LOICA: Logical Operators for Integrated Cell Algorithms

10.1101/2021.09.21.460548 ◽

2021 ◽

Author(s):

Gonzalo Vidal ◽

Carlos Vidal-Céspedes ◽

Timothy James Rudge

Keyword(s):

Design Automation ◽

Object Oriented ◽

Analysis Tool ◽

Genetic Circuits ◽

Logical Operators ◽

Object Oriented Design ◽

Design Build ◽

High Level ◽

Python Package ◽

Mathematical And Computational Modeling

Mathematical and computational modeling is essential to genetic design automation and to the synthetic biology design-build-test-learn cycle. The construction and analysis of models is enabled by abstraction based on a hierarchy of components, devices, and systems that can be used to compose genetic circuits. These abstract elements must be parameterized with data from relevant experiments, and those experiments must be related to the part composition of the abstract circuit components being measured. Here we present LOICA (Logical Operators for Integrated Cell Algorithms), a Python package for modeling and characterizing genetic circuits based on a simple object-oriented design abstraction. LOICA uses classes to represent different biological and experimental components, which generate models through their interactions. High-level designs are linked to their part composition via SynBioHub. Furthermore, LOICA communicates with Flapjack, a data management and analysis tool, to link to experimental data, enabling abstracted elements to characterize themselves.
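To make the object-oriented abstraction concrete, here is a small Python sketch in which classes for species and regulatory operators combine into a circuit model integrated with the forward Euler method. The class names, the repressive Hill function, and all parameter values are hypothetical and are not LOICA's API.

```python
from typing import Dict, List, Optional


class Species:
    """A molecular species with first-order degradation (hypothetical)."""

    def __init__(self, name: str, degradation_rate: float) -> None:
        self.name = name
        self.degradation_rate = degradation_rate
        self.level = 0.0


class Operator:
    """Produces `output`; if `repressor` is set, production follows a repressive Hill function."""

    def __init__(self, output: Species, max_rate: float,
                 repressor: Optional[Species] = None, K: float = 1.0, n: float = 1.0) -> None:
        self.output = output
        self.max_rate = max_rate
        self.repressor = repressor
        self.K = K
        self.n = n

    def production_rate(self) -> float:
        if self.repressor is None:
            return self.max_rate  # constitutive expression
        return self.max_rate / (1.0 + (self.repressor.level / self.K) ** self.n)


class Circuit:
    """Couples species and operators into ODEs integrated with forward Euler."""

    def __init__(self, species: List[Species], operators: List[Operator]) -> None:
        self.species = species
        self.operators = operators

    def simulate(self, t_end: float, dt: float) -> Dict[str, float]:
        for _ in range(int(round(t_end / dt))):
            # Evaluate every production rate from the current state before updating any level.
            production = {s.name: 0.0 for s in self.species}
            for op in self.operators:
                production[op.output.name] += op.production_rate()
            for s in self.species:
                s.level += dt * (production[s.name] - s.degradation_rate * s.level)
        return {s.name: s.level for s in self.species}


repressor = Species("repressor", degradation_rate=0.1)
reporter = Species("reporter", degradation_rate=0.1)
circuit = Circuit(
    [repressor, reporter],
    [Operator(repressor, max_rate=1.0),
     Operator(reporter, max_rate=1.0, repressor=repressor, K=10.0, n=2.0)],
)
levels = circuit.simulate(t_end=200.0, dt=0.01)
```

At steady state the repressor level is 1.0 / 0.1 = 10 = K, so with n = 2 the reporter settles at (1.0 / 2) / 0.1 = 5, which the simulated levels should approach.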

Development of PuniEdit Software (Editor Software for Multi-Language Programming)

10.31227/osf.io/f4v25 ◽

2019 ◽

Author(s):

Budiman

Keyword(s):

Programming Languages ◽

Programming Language ◽

Source Code ◽

Computer Software ◽

Object Oriented ◽

Object Oriented Programming ◽

Text Editor ◽

Development Environment ◽

Integrated Development ◽

High Level

Computer software, including programming languages, has continued to develop. Programming began with low-level languages, and high-level languages were developed later; this period was marked by the emergence of a programming method offered by programming languages: object-oriented programming (OOP). An IDE (Integrated Development Environment) is a computer program that provides the facilities required for software development; its purpose is to provide all the utilities needed to build software. Text editors that can be used to manipulate program source code include Ultraedit, JediEdit, ClearEdit, cEdit, the Golden Pen, and others. PuniEdit is a text-based editor that makes it easier for users to correct, insert, and modify source code. PuniEdit is built with Borland Delphi 7.0 and the SynEdit component, and it supports the Pascal, C++, and HTML languages. In addition, PuniEdit can manage tokens: the user can clearly see every occurrence of each token type, such as keywords (reserved words), identifiers, and operators. Keywords: source code, programming language, source code is scanned.
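The token management described above is lexical analysis: scanning source text and labelling each token as a keyword, identifier, number, or operator. The sketch below does this in Python with a regular expression for a small Pascal-like subset. PuniEdit itself is written in Delphi with SynEdit, so the keyword list, token categories, and function names here are illustrative assumptions.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumed, deliberately small set of Pascal reserved words (matched case-insensitively).
KEYWORDS = {"program", "var", "begin", "end", "if", "then", "else", "while", "do"}

# Alternatives are tried left to right; multi-character operators come before single ones.
TOKEN_RE = re.compile(
    r"(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<word>[A-Za-z_]\w*)"
    r"|(?P<operator>:=|<=|>=|<>|[-+*/=<>;:(),.])"
    r"|(?P<space>\s+)"
    r"|(?P<other>.)"
)


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Return (token_type, text) pairs, skipping whitespace."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "space":
            continue
        if kind == "word":
            kind = "keyword" if text.lower() in KEYWORDS else "identifier"
        tokens.append((kind, text))
    return tokens


def count_by_type(source: str) -> Counter:
    """How many tokens of each type occur, e.g. for highlighting or statistics."""
    return Counter(kind for kind, _ in tokenize(source))
```

For example, `tokenize("if x >= 10 then y := x + 1;")` labels `if` and `then` as keywords, `x` and `y` as identifiers, and `>=`, `:=`, `+`, `;` as operators.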

Scientific workflows applied to the coupling of a continuum (Elmer v8.3) and a discrete element (HiDEM v1.0) ice dynamic model

Geoscientific Model Development ◽

10.5194/gmd-12-3001-2019 ◽

2019 ◽

Vol 12 (7) ◽

pp. 3001-3015 ◽

Cited By ~ 2

Author(s):

Shahbaz Memon ◽

Dorothée Vallot ◽

Thomas Zwinger ◽

Jan Åström ◽

Helmut Neukirchen ◽

...

Keyword(s):

Management System ◽

High Performance ◽

Heterogeneous Computing ◽

Workflow Management ◽

Scientific Workflow ◽

Workflow Management System ◽

Data Intensive ◽

CPU Utilization ◽

Computing Environments ◽

High Level

Abstract. Scientific computing applications involving complex simulations and data-intensive processing are often composed of multiple tasks forming a workflow of computing jobs. Scientific communities running such applications on computing resources often find it cumbersome to manage and monitor the execution of these tasks and their associated data. These workflow implementations usually add overhead by introducing unnecessary input/output (I/O) for coupling the models and can lead to sub-optimal CPU utilization. Furthermore, running these workflow implementations in different environments requires significant adaptation efforts, which can hinder the reproducibility of the underlying science. High-level scientific workflow management systems (WMS) can be used to automate and simplify complex task structures by providing tooling for the composition and execution of workflows – even across distributed and heterogeneous computing environments. The WMS approach allows users to focus on the underlying high-level workflow and avoid low-level pitfalls that would lead to non-optimal resource usage while still allowing the workflow to remain portable between different computing environments. As a case study, we apply the UNICORE workflow management system to enable the coupling of a glacier flow model and calving model which contain many tasks and dependencies, ranging from pre-processing and data management to repetitive executions in heterogeneous high-performance computing (HPC) resource environments. Using the UNICORE workflow management system, the composition, management, and execution of the glacier modelling workflow becomes easier with respect to usage, monitoring, maintenance, reusability, portability, and reproducibility in different environments and by different user groups. 
Last but not least, the workflow helps to speed the runs up by reducing model coupling I/O overhead and it optimizes CPU utilization by avoiding idle CPU cores and running the models in a distributed way on the HPC cluster that best fits the characteristics of each model.
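A workflow management system must run tasks in an order that respects their dependencies while starting independent tasks together. The Python sketch below derives such "waves" from a dependency graph with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The task names are hypothetical stand-ins for the pre-processing, model, and post-processing stages mentioned above, and this is not how UNICORE represents workflows. Because the graph must be acyclic, repeated model runs would have to be unrolled or wrapped in an outer loop.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical task graph: each task maps to the set of tasks it depends on.
TASKS: Dict[str, Set[str]] = {
    "preprocess": set(),
    "mesh": {"preprocess"},
    "stage_data": {"preprocess"},
    "elmer_flow": {"mesh", "stage_data"},  # continuum glacier-flow model
    "hidem_calving": {"elmer_flow"},       # discrete-element calving model
    "postprocess": {"hidem_calving"},
}


def execution_waves(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; tasks in the same wave do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # A real scheduler would call done() only after each task actually finishes.
        sorter.done(*ready)
    return waves
```

Here `mesh` and `stage_data` land in the same wave, so a scheduler could dispatch them to different resources at the same time.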

clinker & clustermap.js: Automatic generation of gene cluster comparison figures

10.1101/2020.11.08.370650 ◽

2020 ◽

Author(s):

Cameron L.M. Gilchrist ◽

Yit-Heng Chooi

Keyword(s):

Gene Cluster ◽

Evolutionary History ◽

Source Code ◽

Gene Clusters ◽

Automatic Generation ◽

Biological Pathways ◽

Python Package ◽

Publication Quality

Summary: Genes involved in biological pathways are often colocalised in gene clusters, the comparison of which can give valuable insights into their function and evolutionary history. However, comparing and visualising gene cluster homology is a tedious process, particularly when many clusters are being compared. Here, we present clinker, a Python-based tool, and clustermap.js, a companion JavaScript visualisation library, which together can automatically generate accurate, interactive, publication-quality gene cluster comparison figures directly from sequence files. Availability and Implementation: Source code and documentation for clinker and clustermap.js are available on GitHub (github.com/gamcil/clinker and github.com/gamcil/clustermap.js, respectively) under the MIT license. clinker can be installed directly from the Python Package Index via pip.

Hypercluster: a python package and SnakeMake pipeline for flexible, parallelized unsupervised clustering optimization

10.1101/2020.01.13.905323 ◽

2020 ◽

Cited By ~ 1

Author(s):

Lili Blumenberg ◽

Kelly V. Ruggles

Keyword(s):

Big Data ◽

Single Cell ◽

High Throughput ◽

Unsupervised Clustering ◽

Multiple Models ◽

RNA-Seq ◽

Hyperparameter Selection ◽

Clustering Optimization ◽

Python Package

Unsupervised clustering is a common and exceptionally useful tool for large biological datasets. However, clustering requires upfront algorithm and hyperparameter selection, which can introduce bias into the final clustering labels. It is therefore advisable to obtain a range of clustering results from multiple models and hyperparameters, which can be cumbersome and slow. To streamline this process, we present hypercluster, a Python package and SnakeMake pipeline for flexible and parallelized clustering evaluation and selection. Hypercluster is available on bioconda; installation instructions, documentation, and example workflows can be found at https://github.com/ruggleslab/hypercluster.

Author summary: Unsupervised clustering is a technique for grouping similar samples within a dataset. It is extremely common when analyzing big data from patient samples or from high-throughput techniques such as single-cell RNA-seq. When researchers use unsupervised clustering, they have to select parameters that affect the final result, for instance how many groups they expect to find or how small the smallest group is allowed to be. Some methods require setting even less intuitive parameters. For most applications, it is extremely challenging to guess what the values of these parameters should be; therefore, to avoid introducing bias into the final results, researchers should test many different parameters and methods to find the best groups. This process is cumbersome, slow, and challenging to perform in a reproducible way. We developed hypercluster, a tool that automates this process, makes it much faster, and presents the results in a reproducible and helpful manner.
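The core idea, running several clustering configurations and choosing among them with an evaluation metric, can be sketched in dependency-free Python. This is a hypothetical illustration, not hypercluster's interface: it sweeps a small parameter grid for a one-dimensional k-means and scores each result with the mean silhouette coefficient, whereas hypercluster covers multiple algorithms and parallelizes the sweep with SnakeMake.

```python
import itertools
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def kmeans_1d(points: Sequence[float], k: int, iterations: int = 50) -> List[int]:
    """Lloyd's k-means on 1-D data with deterministic, evenly spaced initial centers."""
    ordered = sorted(points)
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]

    def nearest(p: float) -> int:
        return min(range(k), key=lambda c: abs(p - centers[c]))

    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [nearest(p) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels


def silhouette(points: Sequence[float], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient; singletons score 0, a single cluster scores -1."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        same = [abs(p - q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = mean(same)  # mean distance within the point's own cluster
        b = min(mean(abs(p - q) for q, lab in zip(points, labels) if lab == c)
                for c in clusters if c != labels[i])  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return mean(scores)


def sweep(points: Sequence[float], grid: Dict[str, list]) -> Tuple[dict, dict]:
    """Score every parameter combination in `grid` and return the best one."""
    names = sorted(grid)
    results = {}
    for values in itertools.product(*(grid[n] for n in names)):
        labels = kmeans_1d(points, **dict(zip(names, values)))
        results[values] = silhouette(points, labels)
    best = max(results, key=results.get)
    return dict(zip(names, best)), results


data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
best, scores = sweep(data, {"k": [2, 3, 4], "iterations": [10, 50]})
```

On these two well-separated groups, the sweep should pick k = 2; the full `scores` dictionary is kept so that near-ties between configurations remain visible.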

GTFtools: a Python package for analyzing various modes of gene models

10.1101/263517 ◽

2018 ◽

Cited By ~ 6

Author(s):

Hong-Dong Li

Keyword(s):

Intron Retention ◽

GC Content ◽

Third Party ◽

RNA-Seq ◽

Transcription Start ◽

Transcription Start Sites ◽

Different Types ◽

Gene Models ◽

Flanking Regions ◽

Python Package

Summary: Gene-centric bioinformatics studies frequently involve calculating or extracting various gene features, such as gene ID mapping, GC content, and different types of gene lengths, by manipulating gene models that are often annotated in GTF format and available from the ENSEMBL or GENCODE databases. Such computation is essential for subsequent analyses, such as intron retention detection, where independent introns may need to be identified; converting RNA-seq read counts to FPKM, where gene length is required; and obtaining flanking regions around transcription start sites. However, to our knowledge, no publicly available software package is dedicated to analyzing various modes of gene models directly from a GTF file. In this work, we provide GTFtools (implemented in Python and not dependent on any non-Python third-party software), a stand-alone command-line software package that offers a set of functions for analyzing various modes of gene models, to facilitate routine bioinformatics studies in which information about gene models needs to be calculated. Availability: GTFtools is freely available at www.genemine.org/.
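GTFtools itself is a command-line program, and the Python sketch below is not its code or interface. It only illustrates three computations the summary mentions, under stated assumptions: GC content of a sequence; gene length taken as the total length of the merged (union of) exons, using GTF-style 1-based inclusive coordinates; and FPKM, computed as count × 10^9 / (gene length in bp × total mapped reads). The function names are hypothetical.

```python
from typing import Iterable, Tuple


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def merged_exon_length(exons: Iterable[Tuple[int, int]]) -> int:
    """Length of the union of exons given as 1-based inclusive (start, end) pairs, as in GTF."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # overlapping exon: extend the current block
        else:
            if cur_end is not None:
                total += cur_end - cur_start + 1  # close the previous block
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def fpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)
```

For example, exons (1, 100), (50, 150) and (200, 250) merge to 150 + 51 = 201 bp, and 100 reads on a 2,000 bp gene in a library of 1,000,000 mapped reads give 100 × 10^9 / (2,000 × 10^6) = 50 FPKM.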

pyABC: distributed, likelihood-free inference

10.1101/162552 ◽

2017 ◽

Cited By ~ 1

Author(s):

Emmanuel Klinger ◽

Dennis Rickert ◽

Jan Hasenauer

Keyword(s):

Sequential Monte Carlo ◽

Source Code ◽

Distance Functions ◽

Web Interface ◽

Practical Application ◽

Acceptance Threshold ◽

Data Querying ◽

Approximate Bayesian ◽

Python Package

Summary: Likelihood-free methods are often required for inference in systems biology. While Approximate Bayesian Computation (ABC) provides a theoretical solution, its practical application has often been challenging due to its high computational demands. To scale likelihood-free inference to computationally demanding stochastic models, we developed pyABC: a distributed and scalable ABC-Sequential Monte Carlo (ABC-SMC) framework. It implements computation-minimizing and scalable, runtime-minimizing parallelization strategies for multi-core and distributed environments, scaling to thousands of cores. The framework is accessible to non-expert users and also enables advanced users to experiment with, and implement custom versions of, many ABC-SMC components, such as acceptance threshold schedules, transition kernels, and distance functions, without altering pyABC's source code. pyABC includes a web interface to visualize ongoing and finished ABC-SMC runs and exposes an API for data querying and post-processing. Availability and Implementation: pyABC is written in Python 3 and is released under the GPLv3 license. The source code is hosted on https://github.com/neuralyzer/pyabc and the documentation on http://pyabc.readthedocs.io. It can be installed from the Python Package Index (PyPI).
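As a rough illustration of likelihood-free inference, here is plain rejection ABC in Python. This is the basic scheme that ABC-SMC refines by running a sequence of populations with decreasing acceptance thresholds and importance weights. It is not pyABC's API: the function names, the toy Gaussian model, the uniform prior, the sample-mean summary statistic, and the threshold value are all illustrative assumptions.

```python
import random
from statistics import mean
from typing import Callable, List


def abc_rejection(observed: float,
                  simulate: Callable[[float, random.Random], float],
                  prior_sample: Callable[[random.Random], float],
                  distance: Callable[[float, float], float],
                  epsilon: float,
                  n_proposals: int,
                  rng: random.Random) -> List[float]:
    """Keep prior draws whose simulated summary lies within `epsilon` of the observed one."""
    accepted = []
    for _ in range(n_proposals):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) <= epsilon:
            accepted.append(theta)
    return accepted


# Toy problem (assumed): infer the mean of a Gaussian whose standard deviation is known to be 1.
N_OBS = 20
rng = random.Random(1)
observed_data = [rng.gauss(2.0, 1.0) for _ in range(N_OBS)]
observed_summary = mean(observed_data)  # summary statistic: the sample mean


def simulate(theta: float, rng: random.Random) -> float:
    """Simulate a dataset under parameter theta and return its summary statistic."""
    return mean(rng.gauss(theta, 1.0) for _ in range(N_OBS))


def prior_sample(rng: random.Random) -> float:
    return rng.uniform(-5.0, 5.0)


def distance(a: float, b: float) -> float:
    return abs(a - b)


posterior = abc_rejection(observed_summary, simulate, prior_sample, distance,
                          epsilon=0.1, n_proposals=10_000, rng=rng)
```

Only about 2% of proposals are accepted here, which illustrates the high computational demand mentioned in the summary and why adaptive thresholds and parallel sampling matter.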

BOFdat: generating biomass objective function stoichiometric coefficients from experimental data

10.1101/243881 ◽

2018 ◽

Cited By ~ 4

Author(s):

Jean-Christophe Lachance ◽

Jonathan M. Monk ◽

Colton J. Lloyd ◽

Yara Seif ◽

Bernhard O. Palsson ◽

...

Keyword(s):

Experimental Data ◽

Objective Function ◽

Source Code ◽

Cell Composition ◽

Scale Models ◽

Relative Abundances ◽

Genome Scale ◽

Python Package

Genome-scale models (GEMs) rely on a biomass objective function (BOF) to predict phenotype from genotype. Here we present BOFdat, a Python package that offers functions to generate biomass objective function stoichiometric coefficients (BOFsc) from macromolecular cell composition and relative abundances of macromolecules obtained from omic datasets. Growth-associated and non-growth-associated maintenance (GAM and NGAM) costs can also be calculated by BOFdat. BOFdat is freely available on the Python Package Index (pip install BOFdat). The source code and an example usage (Jupyter Notebook and example files) are available on GitHub (https://github.com/jclachance/BOFdat). The documentation and API are available through ReadTheDocs (https://bofdat.readthedocs.io).
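A common way to turn macromolecular composition into stoichiometric coefficients is to convert the weight fraction of a macromolecule class (grams per gram dry weight, gDW) into millimoles of each monomer, using the monomers' relative molar abundances f_i and molecular weights MW_i: mmol_i/gDW = 1000 · w · f_i / Σ_j f_j · MW_j. The Python sketch below shows that arithmetic under simplifying assumptions (no correction for water released during polymerization, coefficients reported as positive amounts). It is not BOFdat's implementation, and the function name and monomer labels are hypothetical.

```python
from typing import Dict


def monomer_coefficients(weight_fraction: float,
                         abundances: Dict[str, float],
                         molecular_weights: Dict[str, float]) -> Dict[str, float]:
    """mmol of each monomer per gram of dry weight.

    weight_fraction: grams of this macromolecule class per gDW (e.g. total protein)
    abundances: relative molar abundances of the monomers, on any positive scale
    molecular_weights: monomer molecular weights in g/mol
    """
    total = sum(abundances.values())
    fractions = {m: a / total for m, a in abundances.items()}
    # Average molecular weight of one monomer unit, weighted by molar fraction.
    mean_mw = sum(f * molecular_weights[m] for m, f in fractions.items())
    # Moles of monomer units contained in `weight_fraction` grams, expressed in mmol.
    total_mmol = 1000.0 * weight_fraction / mean_mw
    return {m: f * total_mmol for m, f in fractions.items()}


molecular_weights = {"A": 100.0, "B": 200.0}  # placeholder monomers
coefficients = monomer_coefficients(0.5, {"A": 3.0, "B": 1.0}, molecular_weights)
```

A useful check is mass balance: converting the coefficients back to grams (Σ mmol_i · MW_i / 1000) should recover the weight fraction, here 3 × 100 / 1000 + 1 × 200 / 1000 = 0.5.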

Python Interfaces for the Smoldyn Simulator

10.1101/2020.12.15.422958 ◽

2020 ◽

Author(s):

Dilawar Singh ◽

Steven S. Andrews

Keyword(s):

Systems Biology ◽

Open Source ◽

Object Oriented ◽

Software Tools ◽

Supplementary Information ◽

Low Level ◽

Programming Interface ◽

Python Package

Motivation: Smoldyn is a particle-based biochemical simulator that is frequently used for systems biology and biophysics research. Previously, users could only define models using text-based input or a C/C++ application programming interface (API), which were convenient but limited extensibility. Results: We added a Python API to Smoldyn to improve integration with other software tools, such as Jupyter notebooks, other Python code libraries, and other simulators. It includes low-level functions that closely mimic the existing C/C++ API and higher-level functions that are more convenient to use. These latter functions follow modern object-oriented Python conventions. Availability: Smoldyn is open source and free, available at http://www.smoldyn.org, and can be installed with the Python package manager pip. It runs on Mac, Windows, and Linux. Supplementary information: Documentation is available at http://www.smoldyn.org and https://smoldyn.readthedocs.io.
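Particle-based simulators of this kind move molecules by Brownian dynamics: in each time step, every coordinate of every particle receives a Gaussian displacement with standard deviation sqrt(2 · D · Δt). The Python sketch below pairs a low-level step function with a small object-oriented wrapper, echoing the low-level/high-level split described above. The names are hypothetical, and this is not Smoldyn's Python API.

```python
import math
import random
from typing import List, Tuple

Position = Tuple[float, float, float]


def brownian_step(positions: List[Position], diffusion_coefficient: float,
                  dt: float, rng: random.Random) -> List[Position]:
    """Low-level step: add a Gaussian displacement with sd sqrt(2*D*dt) to every coordinate."""
    sigma = math.sqrt(2.0 * diffusion_coefficient * dt)
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma), z + rng.gauss(0.0, sigma))
            for x, y, z in positions]


class Species:
    """High-level wrapper: a population of freely diffusing particles (hypothetical)."""

    def __init__(self, name: str, diffusion_coefficient: float, count: int, seed: int = 0) -> None:
        self.name = name
        self.diffusion_coefficient = diffusion_coefficient
        self.positions: List[Position] = [(0.0, 0.0, 0.0)] * count  # all start at the origin
        self.rng = random.Random(seed)

    def run(self, steps: int, dt: float) -> None:
        for _ in range(steps):
            self.positions = brownian_step(self.positions, self.diffusion_coefficient,
                                           dt, self.rng)

    def mean_squared_displacement(self) -> float:
        return sum(x * x + y * y + z * z for x, y, z in self.positions) / len(self.positions)


molecules = Species("A", diffusion_coefficient=1.0, count=2000, seed=0)
molecules.run(steps=10, dt=0.01)
```

For free diffusion in three dimensions the mean squared displacement should approach 6 · D · t; with D = 1 and t = 10 × 0.01 = 0.1, that is about 0.6, which gives a quick sanity check of the step function.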
