Employing graphics processing unit technology, alternating direction implicit method and domain decomposition to speed up the numerical diffusion solver for the biomedical engineering research

A finite element code is developed in which all of the computationally expensive steps are performed on a graphics processing unit via the THRUST and the PARALUTION libraries. The code focuses on the simulation of transient problems where the repeated computations per time-step create the computational cost. It is used to solve partial and ordinary differential equations as they arise in thermal-runaway simulations of automotive batteries. The speed-up obtained by utilizing the graphics processing unit for every critical step is compared against the single core and the multi-threading solutions which are also supported by the chosen libraries. This way a high total speed-up on the graphics processing unit is achieved without the need for programming a single classical Compute Unified Device Architecture kernel.

Implementation of a Semi-Implicit Pressure-Based Multigrid Fluid Flow Algorithm on a Graphics Processing Unit

Volume 13: New Developments in Simulation Methods and Software for Engineering Applications; Safety Engineering, Risk Analysis and Reliability Methods; Transportation Systems ◽

10.1115/imece2009-11587 ◽

2009 ◽

Cited By ~ 5

Author(s):

Aaron F. Shinn ◽

S. P. Vanka

Keyword(s):

Stokes Equations ◽

Graphics Processing Unit ◽

Navier Stokes ◽

Processing Unit ◽

Navier Stokes Equations ◽

Driven Cavity ◽

Multigrid Algorithm ◽

Computational Speed ◽

Speed Up ◽

Graphics Processing

A semi-implicit pressure-based multigrid algorithm for solving the incompressible Navier-Stokes equations was implemented on a Graphics Processing Unit (GPU) using CUDA (Compute Unified Device Architecture). The multigrid method employed was the Full Approximation Scheme (FAS), which is used for solving nonlinear equations. The algorithm is applied to the 2D driven cavity problem and compared to the CPU version of the code (written in Fortran) to assess computational speed-up.

Parallel computations of the step response of a floor heater with the use of a graphics processing unit. Part 2: results and their evaluation

Bulletin of the Polish Academy of Sciences Technical Sciences ◽

10.2478/bpasts-2013-0102 ◽

2013 ◽

Vol 61 (4) ◽

pp. 949-954 ◽

Cited By ~ 1

Author(s):

J. Gołębiowski ◽

J. Forenc

Keyword(s):

Graphics Processing Unit ◽

Sparse Matrix ◽

Temporal Distribution ◽

Step Response ◽

Processing Unit ◽

Commercial Program ◽

Speed Up ◽

Spatio Temporal ◽

Graphics Processing ◽

Linear Systems Of Equations

Using the models and algorithms presented in the first part of the article, the spatio-temporal distribution of the step response of a floor heater was determined. The results are presented in the form of heating curves and temperature profiles of the heater at selected time instants. The computation results were verified by comparing them with the solution obtained with a commercial program, NISA. Additionally, the distribution of the average time constant of the thermal processes occurring in the heater was determined. The use of a graphics processing unit in numerical computations based on the conjugate gradient method was analysed. It was shown that the use of a graphics processing unit is profitable when solving linear systems of equations with dense coefficient matrices. In the case of a sparse matrix, the speed-up depends on the number of its non-zero elements.
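As an illustration of the conjugate gradient method mentioned in this abstract, the following is a minimal CPU-only sketch in pure Python. It is not the authors' GPU implementation: the function names, the CSR storage layout and the small tridiagonal test matrix are assumptions chosen for illustration. In this sketch the work per iteration is dominated by the matrix-vector product, whose cost is proportional to the number of stored non-zero elements.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A, given x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the start x = 0
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # converged: residual norm is small
            break
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def csr_matvec(values, col_idx, row_ptr):
    """Return a function computing A x for a matrix stored in CSR format."""
    def matvec(x):
        return [
            sum(values[k] * x[col_idx[k]]
                for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)
        ]
    return matvec


# Hypothetical test matrix: [[2, -1, 0], [-1, 2, -1], [0, -1, 2]] in CSR form.
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]

A = csr_matvec(values, col_idx, row_ptr)
x = conjugate_gradient(A, [1.0, 0.0, 1.0])
print(x)
```

On a GPU, the matrix-vector product and the dot products are the steps that would typically be parallelized; the sketch keeps them as separate expressions so that they are easy to identify.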

Ultrasonic pulse propagation simulation using OpenCL for environment mapping and discovery

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019846290 ◽

2019 ◽

Vol 33 (5) ◽

pp. 1019-1029

Author(s):

Mohammad Y Al-Shorman ◽

Majd M Al-Kofahi

Keyword(s):

Experimental Data ◽

Pulse Propagation ◽

Graphics Processing Unit ◽

Ultrasonic Pulse ◽

Processing Unit ◽

Time Profiles ◽

Simulation Process ◽

Front End ◽

Speed Up ◽

Graphics Processing

A fast, highly parallelized simulation of a unidirectional ultrasonic pulse propagating in a two-dimensional environment is presented. The pulse intensity versus time is recorded using an array of unidirectional ultrasonic receivers placed at known locations and arranged in a small circle around the transmitter. To speed up the simulation process, the OpenCL 2.0 heterogeneous compute language is used on a graphics processing unit. The simulation result is then compared with experimental data to validate its accuracy. By comparing simulated and experimental data, the collected intensity–time profiles can be used to map an environment. Environments can be mapped using not only direct reflections but also higher-order reflections from objects that are not directly seen by the transmitter. With the help of this simulation, subtle characteristics of an environment, such as a slight tilt or curvature, can be measured. The front end of the simulation is written in C#, while the back end is written in C/C++ and OpenCL.

Speed up big integer multiplication in the Block Wiedemann on graphics processing unit

Design, Manufacturing and Mechatronics ◽

10.1142/9789813208322_0005 ◽

2017 ◽

Author(s):

Peng-Bo Wu ◽

Jing-Fei Jiang ◽

Yang Zhao

Keyword(s):

Graphics Processing Unit ◽

Processing Unit ◽

Speed Up ◽

Integer Multiplication ◽

Graphics Processing

Construction of an optoacoustic image of biological tissues based on an algorithm for a graphics processor

Applied Physics ◽

10.51368/1996-0948-2021-5-106-109 ◽

2021 ◽

pp. 106-109

Author(s):

Denis Kravchuk

Keyword(s):

Gpu Computing ◽

Graphics Processing Unit ◽

Biological Tissues ◽

Ultrasonic Field ◽

Processing Unit ◽

Optoacoustic Imaging ◽

Optoacoustic Interaction ◽

Speed Up ◽

Migration Method ◽

Graphics Processing

The optical contrast between different blood particles allows optoacoustic imaging to be used to visualize the distribution of blood particles (erythrocytes, taking into account oxygen saturation) and the delivery of drugs to organs through blood vessels. An algorithm for calculating the ultrasonic field produced by optoacoustic interaction has been developed to speed up calculations on the GPU. An architecture for fast restoration of an optoacoustic signal based on graphics processing unit (GPU) programming is proposed. Used in combination with the pre-migration method, the algorithm improves the resolution and sharpness of the optoacoustic image of the simulated biological tissues. Thanks to the advanced GPU computing architecture, time-consuming central processing unit (CPU) computations are accelerated with high computational efficiency.

Parallel Reservoir Simulation with OpenACC and Domain Decomposition

Algorithms ◽

10.3390/a11120213 ◽

2018 ◽

Vol 11 (12) ◽

pp. 213 ◽

Cited By ~ 1

Author(s):

Zhijiang Kang ◽

Ze Deng ◽

Wei Han ◽

Dongmei Zhang

Keyword(s):

Domain Decomposition ◽

Reservoir Simulation ◽

Graphics Processing Unit ◽

Domain Decomposition Method ◽

Simulation Method ◽

Processing Unit ◽

Reservoir Simulations ◽

Device Architecture ◽

Important Approach ◽

Graphics Processing

Parallel reservoir simulation is an important approach to solving real-time reservoir management problems. Recently, there has been a trend toward using a graphics processing unit (GPU) to parallelize reservoir simulations. Current GPU-aided reservoir simulations focus on the compute unified device architecture (CUDA). However, CUDA is not functionally portable across devices and requires a large amount of code. Moreover, domain decomposition is not well used in GPU-based reservoir simulations. To address these problems, we propose a parallel method based on OpenACC to accelerate serial code and reduce the time and effort of porting an application to the GPU. Furthermore, GPU-aided domain decomposition is developed to improve the efficiency of reservoir simulation. The experimental results indicate that (1) the proposed GPU-aided approach can outperform the CPU-based one by up to about two times, while with the help of OpenACC the workload of the transplant code was reduced significantly by about 22 percent of the source code, and (2) the domain decomposition method can further improve the execution efficiency by up to 1.7×. The proposed parallel reservoir simulation method is an efficient tool for accelerating reservoir simulation.

Analysis of GPU Computation of Parabolic, Bessel, Wright and Riemann Zeta Functions

ITM Web of Conferences ◽

10.1051/itmconf/20214002005 ◽

2021 ◽

Vol 40 ◽

pp. 02005

Author(s):

Ashish A. Jadhav ◽

Abhijeet D. Kalamkar ◽

Pritish A. Gaikwad ◽

Vishwesh Vyawahare ◽

Navin Singhaniya

Keyword(s):

Fractional Calculus ◽

Gpu Computing ◽

Graphics Processing Unit ◽

Zeta Functions ◽

Computation Time ◽

Processing Unit ◽

Mathematical Functions ◽

Speed Up ◽

Sequential Code ◽

Graphics Processing

This paper deals with GPU computing of special mathematical functions that are used in fractional calculus. The graphics processing unit (GPU) has grown to be an integral part of today's mainstream computing systems. Special mathematical functions are an integral part of fractional calculus, and this paper presents a novel parallel approach for computing them. NVIDIA's GPU hardware is used to speed up the parallel algorithm. A comparison of the sequential code, the vectorized code and the GPU code is performed. We have successfully reduced the computation time of special mathematical functions using the parallel computing capabilities of the GPU.

AMIDE v2: High-Throughput Screening Based on AutoDock-GPU and Improved Workflow Leading to Better Performance and Reliability

International Journal of Molecular Sciences ◽

10.3390/ijms22147489 ◽

2021 ◽

Vol 22 (14) ◽

pp. 7489

Author(s):

Pierre Darme ◽

Manuel Dauchez ◽

Arnaud Renard ◽

Laurence Voutquenne-Nazabadioko ◽

Dominique Aubert ◽

...

Keyword(s):

High Throughput ◽

High Throughput Screening ◽

High Performance ◽

Graphics Processing Unit ◽

Target Identification ◽

Computation Time ◽

Processing Unit ◽

Biological Target ◽

Speed Up ◽

Graphics Processing

Molecular docking is widely used in computational drug discovery and biological target identification, but obtaining results quickly can be tedious and often requires supercomputing solutions. AMIDE stands for AutoMated Inverse Docking Engine. It was initially developed in 2014 to perform inverse docking on high-performance computing (HPC) systems. AMIDE version 2 brings a substantial speed-up by using AutoDock-GPU and through a complete revision of the programming workflow, leading to better performance, easier use, bug fixes, parallelization improvements and PC/HPC compatibility. In addition to inverse docking, AMIDE is now an optimized tool capable of high-throughput inverse screening. For instance, AMIDE version 2 accelerates docking by up to 12.4 times for 100 runs of AutoDock compared to version 1, without significant changes in docking poses. The reverse docking of a ligand on 87 proteins takes only 23 min on 1 GPU (Graphics Processing Unit), while version 1 required 300 cores to reach the same execution time. Moreover, we have shown an exponential acceleration of the computation time as a function of the number of GPUs used, allowing a significant reduction of the duration of the inverse docking process on large datasets.

Accelerating adaptive inverse distance weighting interpolation algorithm on a graphics processing unit

Royal Society Open Science ◽

10.1098/rsos.170436 ◽

2017 ◽

Vol 4 (9) ◽

pp. 170436 ◽

Cited By ~ 9

Author(s):

Gang Mei ◽

Liangliang Xu ◽

Nengxiong Xu

Keyword(s):

Shared Memory ◽

Graphics Processing Unit ◽

Inverse Distance Weighting ◽

Processing Unit ◽

Double Precision ◽

Distance Weighting ◽

Speed Up ◽

Graphics Processing ◽

Data Layouts ◽

Inverse Distance

This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms by using the graphics processing unit (GPU). The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the data points’ spatial distribution pattern and achieve more accurate predictions than those predicted by IDW. In this paper, we first present two versions of the GPU-accelerated AIDW, i.e. the naive version, which does not use the shared memory, and the tiled version, which takes advantage of the shared memory. We also implement the naive version and the tiled version using two data layouts, structure of arrays and array of aligned structures, in both single and double precision. We then evaluate the performance of parallel AIDW by comparing it with its corresponding serial algorithm on three different machines equipped with the GPUs GT730M, M5000 and K40c. The experimental results indicate that: (i) there is no significant difference in the computational efficiency when different data layouts are employed; (ii) the tiled version is always slightly faster than the naive version; and (iii) in single precision the achieved speed-up can be up to 763 (on the GPU M5000), while in double precision the highest speed-up obtained is 197 (on the GPU K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available.
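For readers unfamiliar with the baseline method, the sketch below shows standard inverse distance weighting with a fixed power parameter, in serial pure Python. It is not the adaptive (AIDW) algorithm or the GPU code described in the abstract: AIDW chooses the power per query from the spatial pattern of the data points, which is not reproduced here. The function name, the default power of 2 and the sample points are assumptions for illustration.

```python
import math


def idw_interpolate(points, values, query, power=2.0):
    """Standard IDW estimate at `query` from scattered 2D data points.

    `power` is fixed here; the adaptive variant would pick it per query.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (px, py), value in zip(points, values):
        distance = math.hypot(query[0] - px, query[1] - py)
        if distance == 0.0:
            # The query coincides with a data point: return its value exactly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical sample data: two points on the x-axis.
points = [(0.0, 0.0), (2.0, 0.0)]
values = [10.0, 20.0]

midpoint = idw_interpolate(points, values, (1.0, 0.0))
near_first = idw_interpolate(points, values, (0.5, 0.0))
print(midpoint, near_first)
```

Each query point is evaluated independently of the others, so one natural parallel mapping is to assign one query to one thread; whether the paper's naive and tiled versions follow exactly this mapping is not stated in the abstract.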