ipyrad: Interactive assembly and analysis of RADseq datasets

2020 ◽  
Vol 36 (8) ◽  
pp. 2592-2594 ◽  
Author(s):  
Deren A R Eaton ◽  
Isaac Overcast

Abstract
Summary: ipyrad is a free and open source tool for assembling and analyzing restriction site-associated DNA sequence datasets using de novo and/or reference-based approaches. It is designed to be massively scalable to hundreds of taxa and thousands of samples, and can be efficiently parallelized on high performance computing clusters. It is available both as a command line interface and as a Python package with an application programming interface, the latter of which can be used interactively to write complex, reproducible scripts and implement a suite of downstream analysis tools.
Availability and implementation: ipyrad is a free and open source program written in Python. Source code is available from the GitHub repository (https://github.com/dereneaton/ipyrad/), and Linux and MacOS installs are distributed through the conda package manager. Complete documentation, including numerous tutorials, and Jupyter notebooks demonstrating example assemblies and applications of downstream analysis tools are available online: https://ipyrad.readthedocs.io/.
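As a loose sketch of the installation and command line interface described above (the channel names, the assembly name "demo", and the exact flags are illustrative and should be checked against the ipyrad documentation):

```shell
# Install through the conda package manager, as the abstract describes
# (channel names here are an assumption).
conda install -c conda-forge -c bioconda ipyrad

# Command line interface: create a params file, then run the assembly steps.
ipyrad -n demo                        # writes a params-demo.txt with default settings
ipyrad -p params-demo.txt -s 1234567  # run assembly steps 1-7 for this assembly
```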

2018 ◽  
Author(s):  
John M Macdonald ◽  
Christopher M Lalansingh ◽  
Christopher I Cooper ◽  
Anqi Yang ◽  
Felix Lam ◽  
...  

Abstract
Background: Most biocomputing pipelines are run on clusters of computers. Each type of cluster has its own API (application programming interface), which defines how a program must request the submission, content and monitoring of jobs to be run on the cluster. Sometimes it is desirable to run the same pipeline on different types of cluster. This can happen in situations including when:
- different labs are collaborating, but they do not use the same type of cluster
- a pipeline is released to other labs as open source or commercial software
- a lab has access to multiple types of cluster, and wants to choose between them for scaling, cost or other purposes
- a lab is migrating their infrastructure from one cluster type to another
- during testing or travelling, it is often desired to run on a single computer
However, since each type of cluster has its own API, code that runs jobs on one type of cluster needs to be re-written if that application is to run on a different type of cluster. To resolve this problem, we created a software module to generalize the submission of pipelines across computing environments, including local compute, clouds and clusters.
Results: HPCI (High Performance Computing Interface) is a Perl module that provides the interface to a standardized generic cluster. When the HPCI module is used, it accepts a parameter specifying the cluster type, which it uses to load a driver, HPCD::<cluster>, that translates the abstract HPCI interface to the specific software interface. Simply by changing the cluster parameter, the same pipeline can be run on a different type of cluster with no other changes.
Conclusion: The HPCI module assists in writing Perl programs that can be run in different lab environments, with different site configuration requirements and different types of hardware clusters. Rather than having to re-write portions of the program, it is only necessary to change a configuration file. Using HPCI, an application can manage collections of jobs to be run, specify ordering dependencies, detect success or failure of jobs, and allow automatic retry of failed jobs (allowing for the possibility of a changed configuration, such as when the original attempt specified an inadequate memory allotment).
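The driver-dispatch design described above is language-agnostic. As a loose illustration of the pattern (in Python rather than Perl, with invented class and method names, not the actual HPCI API):

```python
# Illustration of the driver-dispatch pattern the abstract describes: a generic
# interface selects a cluster-specific driver from a single "cluster" parameter,
# analogous to HPCI loading HPCD::<cluster>. All names here are invented.

class SlurmDriver:
    def submit(self, command):
        # A real driver would invoke the cluster's own submission tool.
        return f"sbatch --wrap '{command}'"

class LocalDriver:
    def submit(self, command):
        # "Cluster" of one computer: run the command directly.
        return f"sh -c '{command}'"

DRIVERS = {"SLURM": SlurmDriver, "local": LocalDriver}

class GenericCluster:
    def __init__(self, cluster):
        # Translate the abstract interface to a specific one by loading a driver.
        self.driver = DRIVERS[cluster]()

    def submit(self, command):
        return self.driver.submit(command)

# The same pipeline code runs on either back end; only the parameter changes.
print(GenericCluster("SLURM").submit("align.sh"))
print(GenericCluster("local").submit("align.sh"))
```

Changing the single constructor argument is the analogue of editing the configuration file mentioned in the conclusion.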


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Bianca A Buchner ◽  
Jürgen Zanghellini

Abstract
Background: Elementary flux mode (EFM) analysis is a well-established, yet computationally challenging, approach to characterizing metabolic networks. Standard algorithms require huge amounts of memory and lack scalability, which limits their application to single servers and consequently limits comprehensive analyses to medium-scale networks. Recently, Avis et al. developed mplrs, a parallel version of the lexicographic reverse search (lrs) algorithm, which, in principle, enables EFM analysis on high-performance computing environments (Avis and Jordan. mplrs: a scalable parallel vertex/facet enumeration code. arXiv:1511.06487, 2017). Here we test its applicability for EFM enumeration.
Results: We developed a Python package that gives users access to the enumeration capabilities of mplrs. The package uses COBRApy to process metabolic models from sbml files, performs loss-free compressions of the stoichiometric matrix, and generates suitable inputs for mplrs as well as for already established tools, providing support not only for our proposed new method for EFM enumeration but also for existing approaches. By leveraging COBRApy, the package also allows the application of additional reaction boundaries and seamlessly integrates into existing workflows.
Conclusion: We show that, due to its properties, the mplrs algorithm is perfectly suited for high-performance computing (HPC) and thus offers new possibilities for the unbiased analysis of substantially larger metabolic models via EFM analyses. The package is an open-source program that comes together with a designated workflow and can be easily installed via pip.
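For intuition about what is being enumerated: every EFM is a flux vector v satisfying the steady-state condition S·v = 0 for the stoichiometric matrix S. A toy check on a three-reaction linear pathway (the network and numbers are invented for illustration, not taken from the paper; real enumeration is done by tools such as mplrs):

```python
# Toy illustration of the steady-state condition S @ v = 0 that every
# elementary flux mode must satisfy (network invented for illustration).

# Stoichiometric matrix for the linear pathway  ->A->B->  :
# rows = metabolites (A, B); columns = reactions (uptake, A->B, secretion).
S = [
    [1, -1, 0],   # A: produced by uptake, consumed by A->B
    [0, 1, -1],   # B: produced by A->B, consumed by secretion
]

def is_steady_state(S, v):
    """Check S.v == 0, i.e. no metabolite accumulates or is depleted."""
    return all(sum(row[j] * v[j] for j in range(len(v))) == 0 for row in S)

print(is_steady_state(S, [1, 1, 1]))   # the single EFM of this pathway -> True
print(is_steady_state(S, [1, 0, 1]))   # A accumulates, B is drained   -> False
```

Enumerating all such vectors with minimal support is what makes EFM analysis combinatorially expensive on genome-scale networks, and is where a parallel enumerator pays off.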


Author(s):  
Prashobh Balasundaram

This chapter presents a study of leading open source performance analysis tools for high performance computing (HPC). The first section motivates the necessity of open source tools for performance analysis. Background information on performance analysis of computational software is presented, discussing the various performance-critical components of computers. Metrics useful for analyzing common performance bottleneck patterns observed in computational codes are enumerated, followed by an evaluation of open source tools for extracting these metrics. Each tool's features are analyzed from the perspective of an end user. Important factors are discussed, such as the portability of tuning applied after identification of performance bottlenecks, the hardware/software requirements of the tools, the need for additional metrics for novel hardware features, and techniques for identifying and measuring these new metrics. This chapter focuses on open source tools since they are freely available to anyone at no cost.


2011 ◽  
Vol 61 (1) ◽  
pp. 170-173 ◽  
Author(s):  
Daniel L. Ayres ◽  
Aaron Darling ◽  
Derrick J. Zwickl ◽  
Peter Beerli ◽  
Mark T. Holder ◽  
...  
