FiCoS: a fine- and coarse-grained GPU-powered deterministic simulator for biochemical networks

2021 ◽  
Vol 17 (9) ◽  
pp. e1009410
Author(s):  
Andrea Tangherloni ◽  
Marco S. Nobile ◽  
Paolo Cazzaniga ◽  
Giulia Capitoli ◽  
Simone Spolaor ◽  
...  

Abstract: Mathematical models of biochemical networks can greatly facilitate the comprehension of the mechanisms underlying cellular processes, as well as the formulation of hypotheses that can then be tested with targeted laboratory experiments. However, two issues might hamper the achievement of fruitful outcomes. On the one hand, detailed mechanistic models can involve hundreds or thousands of molecular species and their intermediate complexes, as well as hundreds or thousands of chemical reactions, a situation that generally occurs when rule-based models are analysed. On the other hand, the computational analysis of a model typically requires the execution of a large number of simulations for its calibration or to test the effect of perturbations. As a consequence, the computational capabilities of modern Central Processing Units can easily be exceeded, possibly making the modeling of biochemical networks an ineffective effort. With the aim of overcoming the limitations of current state-of-the-art simulation approaches, we present in this paper FiCoS, a novel “black-box” deterministic simulator that effectively realizes both a fine-grained and a coarse-grained parallelization on Graphics Processing Units. In particular, FiCoS exploits two different integration methods, namely the Dormand–Prince and the Radau IIA, to efficiently solve both non-stiff and stiff systems of coupled Ordinary Differential Equations. We tested the performance of FiCoS against different deterministic simulators, considering models of increasing size and running analyses with increasing computational demands. FiCoS was able to dramatically speed up the computations, by up to 855×, proving to be a promising solution for the simulation and analysis of large-scale models of complex biological processes.

Author summary: Systems Biology is an interdisciplinary research area focusing on the integration of biology and in-silico simulation of mathematical models to unravel and predict the emergent behavior of complex biological systems. The ultimate goal is the understanding of the complex mechanisms underlying biological processes, together with the formulation of novel hypotheses that can then be tested with laboratory experiments. In such a context, detailed mechanistic models can be used to describe biological networks. Unfortunately, these models can be characterized by hundreds or thousands of molecular species and chemical reactions, making their simulation infeasible with classic simulators running on modern Central Processing Units (CPUs). In addition, a large number of simulations might be required to calibrate the models or to test the effect of perturbations. In order to overcome the limitations imposed by CPUs, Graphics Processing Units (GPUs) can be effectively used to accelerate the simulations of these detailed models. We thus designed and developed a novel GPU-based tool, called FiCoS, to speed up the computational analyses typically required in Systems Biology.
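FiCoS itself is a GPU (CUDA) simulator; purely as an illustrative sketch of the two integration families named in the abstract, the snippet below solves a toy two-species mass-action system with SciPy's solve_ivp, using "RK45" (a Dormand–Prince pair) for a non-stiff parameterization and "Radau" (Radau IIA) for a stiff one. The reaction network and rate constants are invented for illustration and are not taken from FiCoS.

```python
# Illustrative sketch only: a toy two-species system A <-> B solved with the
# two integration families mentioned in the abstract (Dormand-Prince for
# non-stiff problems, Radau IIA for stiff ones). The network and rate
# constants below are hypothetical, not taken from FiCoS.
import numpy as np
from scipy.integrate import solve_ivp

def mass_action_rhs(t, y, k1, k2):
    """dA/dt and dB/dt for A -> B (rate k1) and B -> A (rate k2)."""
    a, b = y
    return [k2 * b - k1 * a,
            k1 * a - k2 * b]

y0 = [1.0, 0.0]                      # initial concentrations
t_span = (0.0, 10.0)

# Non-stiff parameterization: Dormand-Prince (SciPy's "RK45")
sol_dp = solve_ivp(mass_action_rhs, t_span, y0, method="RK45",
                   args=(1.0, 0.5), rtol=1e-6, atol=1e-9)

# Stiff parameterization (widely separated rates): Radau IIA
sol_radau = solve_ivp(mass_action_rhs, t_span, y0, method="Radau",
                      args=(1e4, 1e-2), rtol=1e-6, atol=1e-9)

print("Dormand-Prince steps:", sol_dp.t.size)
print("Radau steps:", sol_radau.t.size)
```

On stiff parameterizations the implicit Radau method typically needs far fewer steps than an explicit Dormand–Prince pair, which is why offering both schemes, as FiCoS does, covers both classes of systems.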



Author(s):  
H. Jelger Risselada ◽  
Helmut Grubmüller

Abstract: Fusion proteins can play a versatile and involved role during all stages of the fusion reaction. Their roles go far beyond forcing the opposing membranes into close proximity to drive stalk formation and fusion. Molecular simulations have played a central role in providing a molecular understanding of how fusion proteins actively overcome the free energy barriers of the fusion reaction, up to the expansion of the fusion pore. Unexpectedly, molecular simulations have revealed a preference of the biological fusion reaction to proceed through asymmetric pathways, resulting in the formation of, e.g., a stalk-hole complex, rim-pore, or vertex pore. Force-field-based molecular simulations are now able to directly resolve the minimum free-energy path in protein-mediated fusion, as well as to quantify the free energies of the reaction intermediates formed. Ongoing developments in Graphics Processing Units (GPUs), free energy calculations, and coarse-grained force fields will soon yield additional insights into the diverse roles of fusion proteins.


Soft Matter ◽  
2012 ◽  
Vol 8 (8) ◽  
pp. 2385-2397 ◽  
Author(s):  
David N. LeBard ◽  
Benjamin G. Levine ◽  
Philipp Mertmann ◽  
Stephen A. Barr ◽  
Arben Jusufi ◽  
...  

2016 ◽  
Vol 850 ◽  
pp. 129-135
Author(s):  
Buğra Şimşek ◽  
Nursel Akçam

This study presents a parallelization of the Hamming Distance algorithm, which is used for iris comparison in iris recognition systems, for heterogeneous systems that can include Central Processing Units (CPUs), Graphics Processing Units (GPUs), Digital Signal Processing (DSP) boards, Field Programmable Gate Arrays (FPGAs), and other mobile platforms, using OpenCL. OpenCL allows the same code to run on CPUs, GPUs, FPGAs, and DSP boards. Heterogeneous computing refers to systems that include different kinds of devices (CPUs, GPUs, FPGAs, and other accelerators); for suitable algorithms, it gains performance or reduces power consumption on these OpenCL-supported devices. In this study, the Hamming Distance algorithm has been coded in C++ as a sequential implementation and parallelized with OpenCL using a method we designed. Our OpenCL code has been executed on an Nvidia GT430 GPU and an Intel Xeon 5650 processor, and it demonstrates a speedup of up to 87× through parallelization. Our study also differs from other studies that accelerate iris matching in that it ensures heterogeneous computing by using OpenCL.
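The study's implementation is in C++ with OpenCL; as a language-neutral sketch of the core matching operation it parallelizes, the snippet below computes the fractional Hamming distance between two binary iris codes with NumPy. The 2048-bit code length and the optional occlusion masks are common conventions in iris recognition, assumed here for illustration rather than taken from the paper.

```python
# Minimal sketch of the core iris-matching operation: fractional Hamming
# distance between two boolean iris codes, optionally restricted by
# occlusion masks. Code length and mask handling are illustrative
# assumptions, not details from the study.
import numpy as np

def hamming_distance(code_a, code_b, mask_a=None, mask_b=None):
    """Fraction of valid bits that differ between two boolean iris codes."""
    valid = np.ones_like(code_a, dtype=bool)
    if mask_a is not None:
        valid &= mask_a
    if mask_b is not None:
        valid &= mask_b
    disagree = (code_a ^ code_b) & valid
    return np.count_nonzero(disagree) / np.count_nonzero(valid)

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 2048, dtype=np.uint8).astype(bool)
b = a.copy()
b[:100] ^= True                       # flip 100 bits
print(hamming_distance(a, b))         # ~100/2048 ≈ 0.049
```

The per-bit XOR-and-count structure is embarrassingly parallel, which is what makes the comparison a natural fit for GPU or other OpenCL devices.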


2018 ◽  
Vol 11 (11) ◽  
pp. 4621-4635 ◽  
Author(s):  
Istvan Z. Reguly ◽  
Daniel Giles ◽  
Devaraj Gopinathan ◽  
Laure Quivy ◽  
Joakim H. Beck ◽  
...  

Abstract. In this paper, we present the VOLNA-OP2 tsunami model and implementation; a finite-volume non-linear shallow-water equation (NSWE) solver built on the OP2 domain-specific language (DSL) for unstructured mesh computations. VOLNA-OP2 is unique among tsunami solvers in its support for several high-performance computing platforms: central processing units (CPUs), the Intel Xeon Phi, and graphics processing units (GPUs). This is achieved in a way that the scientific code is kept separate from various parallel implementations, enabling easy maintainability. It has already been used in production for several years; here we discuss how it can be integrated into various workflows, such as a statistical emulator. The scalability of the code is demonstrated on three supercomputers, built with classical Xeon CPUs, the Intel Xeon Phi, and NVIDIA P100 GPUs. VOLNA-OP2 shows an ability to deliver productivity as well as performance and portability to its users across a number of platforms.


2016 ◽  
pp. bbw058 ◽  
Author(s):  
Marco S. Nobile ◽  
Paolo Cazzaniga ◽  
Andrea Tangherloni ◽  
Daniela Besozzi

Author(s):  
Catherine Rucki ◽  
Abhilash J. Chandy

The accurate simulation of turbulence and the implementation of corresponding turbulence models are both critical to understanding the complex physics behind turbulent flows in a variety of science and engineering applications. Despite the tremendous increase in the computing power of central processing units (CPUs), direct numerical simulation (DNS) of highly turbulent flows is still not feasible due to the need to resolve the smallest length scales, and today's CPUs cannot keep pace with demand. The recent development of graphics processing units (GPUs) has led to a general improvement in the performance of various algorithms. This study investigates the applicability of GPU technology in the context of fast Fourier transform (FFT)-based pseudo-spectral methods for DNS of turbulent flows, using the Taylor–Green vortex problem. The methods are implemented on a single GPU, and a speedup of up to 31× is obtained in comparison to a single CPU.
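The study's solver is a GPU implementation; as a small CPU-side sketch of the FFT-based differentiation at the core of pseudo-spectral methods, the snippet below evaluates the spectral x-derivative of a 2-D Taylor–Green velocity component with NumPy and checks it against the analytic result. The grid size and domain are arbitrary choices made here for illustration.

```python
# Sketch of FFT-based (pseudo-spectral) differentiation, the core kernel of
# the DNS approach described above. Grid size and domain are illustrative;
# the study itself uses a GPU implementation.
import numpy as np

n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")

# 2-D Taylor-Green velocity component u = sin(x) cos(y)
u = np.sin(X) * np.cos(Y)

# Spectral derivative du/dx: multiply by i*kx in Fourier space
kx = np.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers 0..n/2-1, -n/2..-1
u_hat = np.fft.fft2(u)
dudx = np.real(np.fft.ifft2(1j * kx[:, None] * u_hat))

# Compare with the analytic derivative cos(x) cos(y)
print(np.max(np.abs(dudx - np.cos(X) * np.cos(Y))))   # ~1e-14 (spectral accuracy)
```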


SIMULATION ◽  
2016 ◽  
Vol 93 (1) ◽  
pp. 69-84 ◽  
Author(s):  
Shailesh Tamrakar ◽  
Paul Richmond ◽  
Roshan M D’Souza

Agent-based models (ABMs) are increasingly being used to study population dynamics in complex systems, such as the human immune system. Previously, Folcik et al. (The basic immune simulator: an agent-based model to study the interactions between innate and adaptive immunity. Theor Biol Med Model 2007; 4: 39) developed a Basic Immune Simulator (BIS) and implemented it using the Recursive Porous Agent Simulation Toolkit (RePast) ABM simulation framework. However, frameworks such as RePast are designed to execute serially on central processing units and therefore cannot efficiently handle large model sizes. In this paper, we report on our implementation of the BIS using FLAME GPU, a parallel computing ABM simulator designed to execute on graphics processing units. To benchmark our implementation, we simulate the response of the immune system to a viral infection of generic tissue cells. We compared our results with those obtained from the original RePast implementation for statistical accuracy. We observe that our implementation has a 13× performance advantage over the original RePast implementation.
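Neither FLAME GPU nor the BIS model is reproduced here; the toy sketch below only illustrates the general ABM pattern the paragraph describes: many simple agents (tissue cells) updated synchronously at each step, which is the structure that maps naturally onto GPU execution. All states, probabilities, and the grid size are invented for illustration.

```python
# Toy agent-based sketch (not the BIS model): tissue cells on a grid that a
# virus infects and that clearance removes, updated synchronously each step.
# The synchronous per-cell update rule is the pattern that suits GPU
# execution in frameworks such as FLAME GPU. All parameters are invented.
import numpy as np

HEALTHY, INFECTED, DEAD = 0, 1, 2
rng = np.random.default_rng(1)

grid = np.full((64, 64), HEALTHY, dtype=np.int8)
grid[32, 32] = INFECTED                       # seed the infection

def step(grid, p_infect=0.3, p_clear=0.1):
    infected = grid == INFECTED
    # count infected neighbours (von Neumann neighbourhood, toroidal)
    nbrs = sum(np.roll(infected, shift, axis)
               for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)])
    catch = (grid == HEALTHY) & (rng.random(grid.shape) < 1 - (1 - p_infect) ** nbrs)
    clear = infected & (rng.random(grid.shape) < p_clear)
    new = grid.copy()
    new[catch] = INFECTED
    new[clear] = DEAD                         # cleared cells die in this toy model
    return new

for _ in range(100):
    grid = step(grid)
print("infected cells after 100 steps:", np.count_nonzero(grid == INFECTED))
```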


2010 ◽  
Vol 133 (2) ◽  
Author(s):  
Tobias Brandvik ◽  
Graham Pullan

A new three-dimensional Navier–Stokes solver for flows in turbomachines has been developed. The new solver is based on the latest version of the Denton codes but has been implemented to run on graphics processing units (GPUs) instead of the traditional central processing unit. The change in processor enables an order-of-magnitude reduction in run-time due to the higher performance of the GPU. Scaling results for a 16-node GPU cluster are also presented, showing almost linear scaling for typical turbomachinery cases. For validation purposes, a test case consisting of a three-stage turbine with complete hub and casing leakage paths is described. Good agreement is obtained with previously published experimental results. The simulation runs in less than 10 min on a cluster with four GPUs.


Author(s):  
Ana Moreton–Fernandez ◽  
Hector Ortega–Arranz ◽  
Arturo Gonzalez–Escribano

Nowadays, the use of hardware accelerators, such as graphics processing units or Xeon Phi coprocessors, is key to solving computationally costly problems that require high-performance computing. However, programming efficient deployments for these kinds of devices is a very complex task that relies on the manual management of memory transfers and configuration parameters. The programmer has to carry out a deep study of the particular data that need to be computed at each moment, across different computing platforms, while also considering architectural details. We introduce the controller concept as an abstract entity that allows the programmer to easily manage the communication and kernel-launch details on hardware accelerators in a transparent way. This model also provides the possibility of defining and launching central processing unit kernels on multi-core processors with the same abstraction and methodology used for the accelerators. It internally combines different native programming models and technologies to exploit the potential of each kind of device. Additionally, the model allows the programmer to simplify the proper selection of values for the several configuration parameters that can be set when a kernel is launched. This is done through a qualitative characterization of the kernel code to be executed. Finally, we present the implementation of the controller model in a prototype library, together with its application in several case studies. Its use has led to reductions in development and porting costs, with significantly low overheads in execution times when compared to manually programmed and optimized solutions that directly use CUDA and OpenMP.
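The abstract does not show the prototype library's API, so the sketch below is purely hypothetical: it only illustrates the controller idea of hiding backend selection, data transfers, and kernel launches behind one object, with a transparent CPU fallback. All class, method, and kernel names here are invented and are not the library's interface.

```python
# Purely hypothetical sketch of the "controller" idea described above: one
# object hides backend selection, data movement, and kernel-launch details,
# so the same user code can target a CPU or an accelerator. Names are
# invented for illustration; this is not the prototype library's API.
import numpy as np

class Controller:
    def __init__(self, device="cpu"):
        self.device = device

    def launch(self, kernel, *arrays, **params):
        """Run `kernel` on the chosen device, managing transfers as needed."""
        if self.device == "gpu":
            try:
                import cupy as cp               # optional GPU backend
                dev_arrays = [cp.asarray(a) for a in arrays]
                return cp.asnumpy(kernel(cp, *dev_arrays, **params))
            except Exception:
                pass                            # no usable GPU stack: fall back to CPU
        return kernel(np, *arrays, **params)

# A backend-agnostic "kernel": it receives the array module (numpy or cupy)
def saxpy(xp, x, y, alpha=2.0):
    return alpha * x + y

ctrl = Controller(device="gpu")                 # silently falls back to CPU if needed
x = np.arange(5, dtype=np.float32)
y = np.ones(5, dtype=np.float32)
print(ctrl.launch(saxpy, x, y, alpha=3.0))      # [ 1.  4.  7. 10. 13.]
```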

