Finite element method completely implemented for graphic processor units using parallel algorithm libraries

A finite element code is developed in which all of the computationally expensive steps are performed on a graphics processing unit via the THRUST and PARALUTION libraries. The code targets transient problems, where the computations repeated at every time step dominate the computational cost. It is used to solve the partial and ordinary differential equations that arise in thermal-runaway simulations of automotive batteries. The speed-up obtained by using the graphics processing unit for every critical step is compared against the single-core and multi-threaded solutions, which are also supported by the chosen libraries. In this way, a high total speed-up on the graphics processing unit is achieved without programming a single classical Compute Unified Device Architecture (CUDA) kernel.
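
The abstract does not say how the library calls are combined. As a rough illustration only, finite element assembly can be phrased entirely in terms of generic parallel primitives: a per-element map, a sort by key, and a reduction by key, which correspond to `thrust::transform`, `thrust::sort_by_key`, and `thrust::reduce_by_key` in Thrust. The Python sketch below runs on the CPU and assembles the stiffness matrix of a 1D mesh of linear elements. The mesh, the unit material coefficient, and the function names are assumptions made for this example and are not taken from the paper.

```python
from itertools import groupby


def element_contributions(n_elements, length=1.0):
    """Map step: every element independently emits ((row, col), value) pairs."""
    h = length / n_elements
    k = 1.0 / h  # stiffness of a linear 1D element, unit coefficient assumed
    triplets = []
    for e in range(n_elements):
        i, j = e, e + 1  # global node numbers of element e
        triplets += [((i, i), k), ((i, j), -k), ((j, i), -k), ((j, j), k)]
    return triplets


def assemble(triplets):
    """Sort by (row, col) key, then sum each run of equal keys (reduce by key)."""
    ordered = sorted(triplets, key=lambda t: t[0])
    return {
        key: sum(value for _, value in run)
        for key, run in groupby(ordered, key=lambda t: t[0])
    }


# Hypothetical 4-element mesh on the unit interval.
K = assemble(element_contributions(4))
```

Because each step is a data-parallel primitive, the same structure can in principle be handed to a library back end without writing a custom kernel, which is the kind of workflow the abstract describes.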

Implementation of a Semi-Implicit Pressure-Based Multigrid Fluid Flow Algorithm on a Graphics Processing Unit

Volume 13: New Developments in Simulation Methods and Software for Engineering Applications; Safety Engineering, Risk Analysis and Reliability Methods; Transportation Systems

DOI: 10.1115/imece2009-11587

2009

Cited By ~ 5

Author(s): Aaron F. Shinn, S. P. Vanka

Keyword(s): Stokes Equations, Graphics Processing Unit, Navier Stokes, Processing Unit, Navier Stokes Equations, Driven Cavity, Multigrid Algorithm, Computational Speed, Speed Up, Graphics Processing

A semi-implicit pressure-based multigrid algorithm for solving the incompressible Navier-Stokes equations was implemented on a Graphics Processing Unit (GPU) using CUDA (Compute Unified Device Architecture). The multigrid method employed was the Full Approximation Scheme (FAS), which is used for solving nonlinear equations. The algorithm is applied to the 2D driven cavity problem and compared with the CPU version of the code (written in Fortran) to assess the computational speed-up.

Prediction of Residual Stresses in a Multipass Pipe Weld by a Novel 3D Finite Element Approach

Volume 6B: Materials and Fabrication

DOI: 10.1115/pvp2018-85044

2018

Cited By ~ 1

Author(s): Hui Huang, Jian Chen, Blair Carlson, Hui-Ping Wang, Paul Crooker, ...

Keyword(s): Finite Element, Residual Stresses, High Performance, Large Scale, Graphics Processing Unit, Computational Cost, Three Dimensional, Processing Unit, Girth Welds, Welding Processes

Because of the enormous computational cost, current residual stress simulations of multipass girth welds are mostly performed using two-dimensional (2D) axisymmetric models. A 2D model can provide only a limited estimate of the residual stresses because it assumes an axisymmetric distribution. In this study, a highly efficient thermal-mechanical finite element code for three-dimensional (3D) models has been developed for high-performance Graphics Processing Unit (GPU) computers. The code is further accelerated by exploiting the unique physics of welding processes, which are characterized by steep temperature gradients and a moving arc heat source. It is capable of modeling large-scale welding problems that cannot be easily handled by existing commercial simulation tools. To demonstrate its accuracy and efficiency, the code was compared with a commercial software package by simulating a 3D multipass girth weld model with over 1 million elements. The code achieved solution accuracy comparable to that of the commercial package while reducing the computational cost by a factor of more than 100. Moreover, the 3D analysis showed a more realistic stress distribution that is not axisymmetric in the hoop direction.

A three‐stage graphics processing unit‐based finite element analyses matrix generation strategy for unstructured meshes

International Journal for Numerical Methods in Engineering

DOI: 10.1002/nme.6383

2020

Vol 121 (17)

pp. 3824-3848

Cited By ~ 1

Author(s): Subhajit Sanfui, Deepak Sharma

Keyword(s): Finite Element, Graphics Processing Unit, Unstructured Meshes, Processing Unit, Finite Element Analyses, Graphics Processing

Software Polarization Spectrometer "PolariS"

Journal of Astronomical Instrumentation

DOI: 10.1142/s225117171450010x

2014

Vol 03 (03n04)

pp. 1450010

Cited By ~ 8

Author(s): Izumi Mizuno, Seiji Kameno, Amane Kano, Makoto Kuroo, Fumitaka Nakamura, ...

Keyword(s): Dynamic Range, Graphics Processing Unit, Zeeman Splitting, High Spectral Resolution, Processing Unit, Analog To Digital, Star Forming, Device Architecture, Graphics Processing, High Degree

We have developed a software-based polarization spectrometer, PolariS, to acquire full-Stokes spectra with a very high spectral resolution of 61 Hz. The primary aim of PolariS is to measure the magnetic fields in dense star-forming cores by detecting the Zeeman splitting of molecular emission lines. The spectrometer consists of a commercially available digital sampler and a Linux computer. The computer is equipped with a graphics processing unit (GPU) that performs the FFT and cross-correlation using the Compute Unified Device Architecture (CUDA) library developed by NVIDIA. Thanks to the high precision of the quantization in the analog-to-digital converter and of the arithmetic in the GPU, PolariS offers excellent performance in linearity, dynamic range, sensitivity, bandpass flatness, and stability. The software has been released under the MIT License and is available to the public. In this paper, we report the design of PolariS and its performance as verified through engineering tests and commissioning observations.

ACCELERATION OF FINITE ELEMENT COMPUTATION FOR SEISMIC WAVE PROPAGATION USING GRAPHICS PROCESSING UNIT

AIJ Journal of Technology and Design

DOI: 10.3130/aijt.19.1219

2013

Vol 19 (43)

pp. 1219-1224

Author(s): Kensuke WADA, Shoichi NAKAI, Toru SEKIGUCHI

Keyword(s): Finite Element, Wave Propagation, Seismic Wave, Graphics Processing Unit, Seismic Wave Propagation, Processing Unit, Element Computation, Graphics Processing, Finite Element Computation

Parallel data mining techniques on Graphics Processing Unit with Compute Unified Device Architecture (CUDA)

The Journal of Supercomputing

DOI: 10.1007/s11227-011-0672-7

2011

Vol 64 (3)

pp. 942-967

Cited By ~ 44

Author(s): Liheng Jian, Cheng Wang, Ying Liu, Shenshen Liang, Weidong Yi, ...

Keyword(s): Data Mining, Graphics Processing Unit, Processing Unit, Compute Unified Device Architecture, Data Mining Techniques, Device Architecture, Parallel Data, Parallel Data Mining, Graphics Processing

Graphics processing unit implementation of the F-statistic for continuous gravitational wave searches

Classical and Quantum Gravity

DOI: 10.1088/1361-6382/ac4616

2021

Author(s): Liam Dunn, Patrick Clearwater, Andrew Melatos, Karl Wette

Keyword(s): Gravitational Wave, Graphics Processing Units, Graphics Processing Unit, Computational Cost, Processing Unit, Central Processing, Long Baseline, Using Data, Graphics Processing, Gpu Implementation

The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.

Parallel computations of the step response of a floor heater with the use of a graphics processing unit. Part 2: results and their evaluation

Bulletin of the Polish Academy of Sciences Technical Sciences

DOI: 10.2478/bpasts-2013-0102

2013

Vol 61 (4)

pp. 949-954

Cited By ~ 1

Author(s): J. Gołębiowski, J. Forenc

Keyword(s): Graphics Processing Unit, Sparse Matrix, Temporal Distribution, Step Response, Processing Unit, Commercial Program, Speed Up, Spatio Temporal, Graphics Processing, Linear Systems Of Equations

Using the models and algorithms presented in the first part of the article, the spatio-temporal distribution of the step response of a floor heater was determined. The results are presented as heating curves and temperature profiles of the heater at selected time instants. The computed results were verified by comparison with the solution obtained with a commercial program, NISA. Additionally, the distribution of the average time constant of the thermal processes occurring in the heater was determined. The use of a graphics processing unit in numerical computations based on the conjugate gradient method was analyzed. Using a graphics processing unit was shown to be profitable when solving linear systems of equations with dense coefficient matrices. For a sparse matrix, the speed-up depends on the number of its non-zero elements.
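
For readers unfamiliar with the conjugate gradient method named in this abstract, the following is a minimal CPU sketch in plain Python, assuming a small symmetric positive definite system chosen purely for illustration. It is not the authors' GPU code. The matrix-vector product inside the loop is the dominant cost per iteration; with a sparse matrix its work is proportional to the number of non-zero entries, which is consistent with the abstract's remark about sparse systems.

```python
def matvec(A, x):
    """Dense matrix-vector product; the dominant cost of each CG iteration."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-12, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction, conjugate to the previous ones with respect to A.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 2x2 test system; its exact solution is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```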

Fast τ-p transforms by chirp modulation

Geophysics

DOI: 10.1190/geo2018-0380.1

2019

Vol 84 (1)

pp. A13-A17

Cited By ~ 1

Author(s): Fredrik Andersson, Johan Robertsson

Keyword(s): Fourier Transform, Computational Complexity, Fourier Transforms, Graphics Processing Unit, Computational Cost, Processing Unit, Chirp Modulation, Fourier Sums, Graphics Processing, Direct Implementation

We have developed simple, fast, and accurate algorithms for the linear Radon (τ-p) transform and its inverse. The algorithms have an [Formula: see text] computational complexity in contrast to the [Formula: see text] cost of a direct implementation in 2D and an [Formula: see text] computational complexity compared to the [Formula: see text] cost of a direct implementation in 3D. The methods use Bluestein’s algorithm to evaluate discrete nonstandard Fourier sums, and they need, apart from the fast Fourier transform (FFT), only multiplication of chirp functions and their Fourier transforms. The computational cost and accuracy are thus reduced to those inherited from the FFT. Fully working algorithms can be implemented in a couple of lines of code. Moreover, we find that efficient graphics processing unit (GPU) implementations could achieve processing speeds of approximately [Formula: see text], implying that the algorithms are I/O bound rather than compute bound.
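
To make the role of the chirps concrete, here is a hedged sketch of the generic Bluestein trick in plain Python. It is not the authors' τ-p algorithm. It evaluates the nonstandard Fourier sum X[k] = Σ_n x[n]·exp(−2πi·α·n·k) for an arbitrary real step α by using the identity 2nk = n² + k² − (k − n)², which turns the sum into a convolution with a chirp that an FFT can carry out. The function names, the small radix-2 FFT, and the value of α are assumptions made for this example.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def chirp(alpha, j):
    """Chirp exp(i*pi*alpha*j^2); it is even in j."""
    return cmath.exp(1j * cmath.pi * alpha * j * j)


def chirp_sum(x, alpha, m):
    """Bluestein evaluation of X[k] = sum_n x[n] exp(-2i*pi*alpha*n*k), k < m."""
    n = len(x)
    size = 1
    while size < n + m - 1:  # pad so the circular convolution does not wrap
        size *= 2
    a = [x[j] * chirp(alpha, j).conjugate() for j in range(n)]
    a += [0j] * (size - n)
    b = [0j] * size  # chirp kernel for lags -(n-1)..m-1, stored circularly
    for j in range(m):
        b[j] = chirp(alpha, j)
    for j in range(1, n):
        b[size - j] = chirp(alpha, j)
    spectrum = [u * v for u, v in zip(fft(a), fft(b))]
    conv = fft(spectrum, invert=True)
    return [chirp(alpha, k).conjugate() * conv[k] / size for k in range(m)]


def direct_sum(x, alpha, m):
    """Reference O(n*m) evaluation of the same sum."""
    return [
        sum(xn * cmath.exp(-2j * cmath.pi * alpha * n * k)
            for n, xn in enumerate(x))
        for k in range(m)
    ]


# Hypothetical input; alpha is deliberately not of the form 1/N.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
fast = chirp_sum(signal, 0.137, 7)
slow = direct_sum(signal, 0.137, 7)
```

Apart from the three FFT calls, the only work is pointwise multiplication by chirps, which mirrors the abstract's statement that cost and accuracy reduce to those of the FFT.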

Ultrasonic pulse propagation simulation using OpenCL for environment mapping and discovery

The International Journal of High Performance Computing Applications

DOI: 10.1177/1094342019846290

2019

Vol 33 (5)

pp. 1019-1029

Author(s): Mohammad Y Al-Shorman, Majd M Al-Kofahi

Keyword(s): Experimental Data, Pulse Propagation, Graphics Processing Unit, Ultrasonic Pulse, Processing Unit, Time Profiles, Simulation Process, Front End, Speed Up, Graphics Processing

A fast, highly parallelized simulation of a unidirectional ultrasonic pulse propagating in a two-dimensional environment is presented. The pulse intensity versus time is recorded using an array of unidirectional ultrasonic receivers placed at known locations in a small circle around the transmitter. To speed up the simulation, the OpenCL 2.0 heterogeneous compute language is used on a graphics processing unit. The simulation result is then compared with experimental data to validate its accuracy. By comparing simulated and experimental data, the collected intensity–time profiles can be used to map an environment. Environments can be mapped using not only direct reflections but also higher-order reflections from objects that are not directly seen by the transmitter. With the help of this simulation, subtle characteristics of an environment, such as a slight tilt or curvature, can be measured. The front end of the simulation is written in C#, while the back end is written in C/C++ and OpenCL.
