Process-oriented ecological modeling approach and scientific workflow system

Motivation: Recent advances in genome sequencing and in the data analysis technologies used in bioinformatics have led to a fast and continuous increase in biological data. The difficulty of managing the huge amounts of data now available to researchers, together with the need to obtain results within a reasonable time, has led to the use of distributed and parallel computing infrastructures for their analysis. More recently, bioinformatics has begun exploring new approaches based on hardware accelerators such as GPUs. From an architectural perspective, GPUs are very different from traditional CPUs. CPUs are composed of a few cores with large amounts of cache memory and can handle a few software threads at a time. GPUs, by contrast, are equipped with hundreds of cores able to handle thousands of threads simultaneously, so that a very high level of parallelism can be reached. The use of GPUs in recent years has resulted in significant performance increases for certain applications. Although GPUs are increasingly used in bioinformatics, most laboratories do not have access to a GPU cluster or server. In this context, it is very important to provide services that make these tools accessible.

Methods: A web-based platform has been implemented to enable researchers to perform their analyses on dedicated GPU-based computing resources. To this end, a GPU cluster equipped with 16 NVIDIA Tesla K20c cards has been configured. The infrastructure has been built upon the Galaxy technology [1]. Galaxy is an open, web-based scientific workflow system for data-intensive biomedical research, accessible to researchers who do not have programming experience. Galaxy provides a public server, but that server does not support GPU computing. By default, Galaxy is designed to run jobs on local systems; however, it can also be configured to run jobs on a cluster. In that configuration, the front-end Galaxy application runs on a single server, while tools are run on cluster nodes. Galaxy supports several distributed resource managers so that it can work with different clusters. For this specific case, in our opinion, SLURM [2] is the most suitable workload manager to manage and control jobs. SLURM is a highly configurable workload and resource manager, and it is currently used on six of the ten most powerful computers in the world, including Piz Daint, which uses over 5000 NVIDIA Tesla K20 GPUs.

Results: GPU-based tools [3] devised by our group for quality control of NGS data have been used to test the infrastructure. Initially, this activity required changes to the tools in order to optimize their parallelization on the cluster according to the adopted workload manager. Subsequently, the tools were converted into web-based services accessible through the Galaxy portal.

Abstract truncated at 3,000 characters - the full version is available in the pdf file.
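
As a rough illustration of the cluster dispatch described in the Methods, the following Python sketch builds the kind of `sbatch` command that could request GPUs for a tool run under SLURM. It is not taken from the platform itself: the function name, the script name `run_qc.sh`, the partition name `gpu`, and the default job name are hypothetical. Only the `sbatch` options `--job-name`, `--gres`, and `--partition` are standard SLURM flags.

```python
import shlex


def build_sbatch_command(script_path, gpus=1, partition=None,
                         job_name="galaxy_gpu_tool"):
    """Return the argv list for an sbatch submission that requests GPUs.

    All names here are illustrative assumptions, not the platform's own.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    # --gres=gpu:N asks SLURM for N GPUs as generic resources on the node.
    argv = ["sbatch", f"--job-name={job_name}", f"--gres=gpu:{gpus}"]
    if partition is not None:
        argv.append(f"--partition={partition}")
    argv.append(script_path)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so no cluster is required.
    print(shlex.join(build_sbatch_command("run_qc.sh", gpus=2, partition="gpu")))
```

Returning an argument list rather than a shell string keeps quoting problems out of the picture; a caller with access to a SLURM cluster could pass the list to `subprocess.run`.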

Interaction between carbon dioxide emissions and eutrophication in a drinking water reservoir: A three-dimensional ecological modeling approach

The Science of The Total Environment ◽ 10.1016/j.scitotenv.2019.01.336 ◽ 2019 ◽ Vol 663 ◽ pp. 369-379 ◽ Cited By ~ 1

Author(s): Zhonghan Chen ◽ Ping Huang ◽ Zhou Zhang

Keyword(s): Carbon Dioxide ◽ Drinking Water ◽ Carbon Dioxide Emissions ◽ Three Dimensional ◽ Water Reservoir ◽ Ecological Modeling ◽ Modeling Approach ◽ Drinking Water Reservoir

Enriching Agronomic Experiments with Data Provenance

International Journal of Agricultural and Environmental Information Systems ◽ 10.4018/ijaeis.2017070102 ◽ 2017 ◽ Vol 8 (3) ◽ pp. 21-38

Author(s): Sergio Manuel Serra da Cruz ◽ Jose Antonio Pires do Nascimento

Keyword(s): Systematic Error ◽ Statistical Data ◽ Scientific Workflow ◽ Computational Experiments ◽ Data Provenance ◽ Computational Approaches ◽ Integration Platform ◽ Workflow System ◽ Scientific Experiments ◽ Different Types

Reproducibility is a major feature of science. Even agronomic research of exemplary quality may yield irreproducible empirical findings because of random or systematic error. Reproducing agronomic experiments that rely on statistical data and legacy scripts is not easily achieved. We propose RFlow, a tool that helps researchers manage, share, and enact scientific experiments that encapsulate legacy R scripts. RFlow transparently captures the provenance of scripts and makes experiments reproducible. Unlike existing computational approaches, RFlow is non-intrusive: it does not require users to change the way they work, and it wraps agronomic experiments in a scientific workflow system. Our computational experiments show that the tool can collect different types of provenance metadata from real experiments and enrich agronomic data with that metadata. This study shows the potential of RFlow to serve as the primary integration platform for legacy R scripts, with implications for other data- and compute-intensive agronomic projects.
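
To make the idea of non-intrusive provenance capture concrete, here is a minimal Python sketch of a wrapper that runs an unmodified script and records metadata about the run. It is not RFlow's implementation, which the abstract does not describe: the function names, the record fields, and the demo script are all assumptions chosen for illustration.

```python
import hashlib
import json
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path):
    """Hash the script so a later run can check it used identical code."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_with_provenance(interpreter, script_path, args=()):
    """Run a script unchanged and return a provenance record for the run.

    The record layout is a hypothetical example, not RFlow's schema.
    """
    command = [*interpreter, str(script_path), *args]
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(command, capture_output=True, text=True)
    finished = datetime.now(timezone.utc).isoformat()
    return {
        "command": command,
        "script_sha256": file_sha256(script_path),
        "started_utc": started,
        "finished_utc": finished,
        "return_code": result.returncode,
        "stdout": result.stdout,
    }


if __name__ == "__main__":
    # Demo with a throwaway Python script. For a legacy R script one would
    # pass interpreter=["Rscript"] instead (assumes Rscript is on PATH).
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "legacy_analysis.py"
        script.write_bytes(b"print('analysis finished')\n")
        record = run_with_provenance([sys.executable], script)
        print(json.dumps(record, indent=2))
```

The script itself is never edited; everything is collected from the outside, which is the sense in which such a wrapper is non-intrusive.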
