Graphics processing unit implementation of the F-statistic for continuous gravitational wave searches

The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.
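The two costs quoted at the end of the abstract can be related with a line of arithmetic. The sketch below reads the GPU cost as 17.2 GPU-hours and divides the two figures; the names `cost_ratio`, `cpu_core_hours`, and `gpu_hours` are illustrative and not from the paper. The result compares one GPU against one CPU core, so it is a resource ratio rather than a measured wall-clock speed-up.

```python
# Costs quoted in the abstract for the pilot search.
cpu_core_hours = 1092.0  # existing CPU implementation
gpu_hours = 17.2         # new GPU implementation


def cost_ratio(cpu_hours: float, gpu_hours: float) -> float:
    """Return how many CPU core-hours correspond to one GPU-hour."""
    return cpu_hours / gpu_hours


ratio = cost_ratio(cpu_core_hours, gpu_hours)
print(f"{ratio:.1f} core-hours per GPU-hour")  # prints "63.5 core-hours per GPU-hour"
```

A ratio of roughly 63 sits inside the 10 to 100 range that the abstract states for the speed-up.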


Implementation of Membrane Algorithms on GPU

Journal of Applied Mathematics ◽

10.1155/2014/307617 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 3

Author(s):

Xingyi Zhang ◽

Bangju Wang ◽

Zhuanlian Ding ◽

Jin Tang ◽

Juanjuan He

Keyword(s):

Graphics Processing Unit ◽

Processing Unit ◽

Matching Problem ◽

Computing Device ◽

Central Processing ◽

New Class ◽

Intractable Problems ◽

Point Set ◽

Graphics Processing ◽

Gpu Implementation

Membrane algorithms are a new class of parallel algorithms that incorporate components of membrane computing models, such as the structure of the models and the way cells communicate, to design efficient optimization algorithms. Although the importance of parallelism in such algorithms is well recognized, membrane algorithms have usually been implemented on a serial computing device, the central processing unit (CPU), which prevents them from running efficiently. In this work, we consider the implementation of membrane algorithms on a parallel computing device, the graphics processing unit (GPU). In such an implementation, all cells of a membrane algorithm can work simultaneously. Experimental results on two classical intractable problems, the point set matching problem and the travelling salesman problem (TSP), show that the GPU implementation of membrane algorithms is much more efficient than the CPU implementation in terms of runtime, especially for problems of high complexity.


Accelerating the RTTOV-7 IASI and AMSU-A radiative transfer models on graphics processing units: evaluating central processing unit/graphics processing unit-hybrid and pure-graphics processing unit approaches

Journal of Applied Remote Sensing ◽

10.1117/1.3658028 ◽

2011 ◽

Vol 5 (1) ◽

pp. 051503 ◽

Cited By ~ 4

Author(s):

Jarno Mielikainen

Keyword(s):

Radiative Transfer ◽

Graphics Processing Units ◽

Graphics Processing Unit ◽

Central Processing Unit ◽

Processing Unit ◽

Central Processing ◽

Radiative Transfer Models ◽

Graphics Processing ◽

Transfer Models


Ray-based modeling and imaging in viscoelastic media using graphics processing units

Geophysics ◽

10.1190/geo2018-0510.1 ◽

2019 ◽

Vol 84 (5) ◽

pp. S425-S436

Author(s):

Martin Sarajaervi ◽

Henk Keers

Keyword(s):

Seismic Data ◽

Graphics Processing Units ◽

Graphics Processing Unit ◽

Parallel Implementation ◽

Processing Unit ◽

Central Processing ◽

Imaging Results ◽

Viscoelastic Modeling ◽

Graphics Processing ◽

Complex Valued

In seismic data processing, the amplitude loss caused by attenuation should be taken into account. The basis for this is provided by a 3D attenuation model described by the quality factor Q, which is used in viscoelastic modeling and imaging. We have accomplished viscoelastic modeling and imaging using ray theory and the ray-Born approximation. This makes it possible to take Q into account using complex-valued and frequency-dependent traveltimes. We have developed a unified parallel implementation for modeling and imaging in the frequency domain and carried out the numerical integration on a graphics processing unit. A central part of the implementation is an efficient technique for computing large integrals. We applied the integration method to the 3D SEG/EAGE overthrust model to generate synthetic seismograms and imaging results. The attenuation effects are accurately modeled in the seismograms and compensated for in the imaging algorithm. The results indicate a significant improvement in computational efficiency compared to a parallel central processing unit baseline.


GPU accelerated computation of fast spectral transforms

Facta universitatis - series Electronics and Energetics ◽

10.2298/fuee1103483g ◽

2011 ◽

Vol 24 (3) ◽

pp. 483-499

Author(s):

Dusan Gajic ◽

Radomir Stankovic

Keyword(s):

Graphics Processing Units ◽

Fast Algorithms ◽

Central Processing Unit ◽

Processing Unit ◽

Memory Transfer ◽

Simple Arithmetic ◽

Central Processing ◽

Graphics Processing ◽

Spectral Transforms ◽

Gpu Implementation

This paper discusses techniques for accelerated computation of several fast spectral transforms on graphics processing units (GPUs) using the Open Computing Language (OpenCL). We present a reformulation of fast algorithms which takes into account particular properties of the transforms to make them suitable for GPU implementation. Special attention is paid to the organization of computations, the reduction of memory transfers, the impact of integer and Boolean arithmetic, and the different structures of the algorithms. The performance of the GPU implementations is compared with classical C/C++ implementations for the central processing unit (CPU). Experiments confirm that, even though the spectral transforms considered involve only simple arithmetic, significant speedups are achieved by implementing the algorithms in OpenCL and running them on the GPU.
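The abstract does not name the transforms it covers, so as an illustration only, the sketch below assumes the Walsh-Hadamard transform as a representative fast spectral transform built from additions and subtractions. It is plain serial Python, not the paper's OpenCL code, and the function name `fwht` is hypothetical. Within each stage every butterfly touches a distinct pair of elements, which is the kind of structure that can be mapped to one GPU work-item per pair.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform (unnormalized, Hadamard ordering).

    Runs log2(n) stages; each stage applies n/2 independent butterflies
    (a, b) -> (a + b, a - b) using integer arithmetic only.
    """
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:
        # Butterflies inside one stage do not depend on each other.
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = data[i], data[i + half]
                data[i], data[i + half] = a + b, a - b
        half *= 2
    return data


print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))  # prints [4, 2, 0, -2, 0, 2, 0, 2]
```

Applying `fwht` twice returns the input scaled by its length, which gives a quick self-check for any implementation of this transform.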


Efficient Prefix Scan for the GPU-Based Implementation of Random Forest

Advances in Social Networking and Online Communities - Handbook of Research on Interactive Information Quality in Expanding Social Network Communications ◽

10.4018/978-1-4666-7377-9.ch009 ◽

2015 ◽

pp. 140-151

Author(s):

Bojan Novak

Keyword(s):

Random Forest ◽

Graphics Processing Unit ◽

Processing Unit ◽

Random Forest Algorithm ◽

Central Processing ◽

Split Point ◽

Parallel Scan ◽

Graphics Processing ◽

Gpu Architecture ◽

Gpu Implementation

Random forest ensemble learning with a graphics processing unit (GPU) version of the prefix scan method is presented. The efficiency of the random forest implementation depends critically on the scan (prefix sum) algorithm. The prefix scan is used in the depth-first implementation of the optimal split point computation. Different implementations of the prefix scan algorithm are described. The speed of each implementation depends on three factors: the algorithm itself, which could be improved, the skill of the programmer, and the compiler. In parallel environments, things are even more complicated and depend on the programmer's knowledge of the central processing unit (CPU) or GPU architecture. An efficient parallel scan algorithm that avoids bank conflicts is crucial for the prefix scan implementation. In our tests, multicore CPU and GPU implementations based on NVIDIA's CUDA are compared.
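As a rough illustration of the scan this abstract refers to, the sketch below implements a work-efficient exclusive prefix sum in the up-sweep and down-sweep style commonly used on GPUs (often attributed to Blelloch). It is serial Python that mirrors the parallel access pattern, not the chapter's CUDA code, and it does not model shared-memory bank conflicts. The function name `exclusive_scan` and the sample data are assumptions made for the example.

```python
def exclusive_scan(values):
    """Work-efficient exclusive prefix sum for power-of-two lengths.

    Each inner loop corresponds to one parallel step: its iterations
    touch disjoint elements and could run as separate GPU threads.
    """
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    # Up-sweep (reduce): build partial sums in place, bottom to top.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2
    # Down-sweep: clear the root, then push prefixes back down the tree.
    data[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2
    return data


print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # prints [0, 3, 4, 11, 11, 15, 16, 22]
```

For split point evaluation, one plausible use is to scan the 0/1 class labels of samples sorted by a feature: entry i of the result is then the number of positive samples strictly to the left of candidate split i, so all candidate splits can be scored from a single scan.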


Numerical simulation of flattened heat pipe with double heat sources for CPU and GPU cooling application in laptop computers

Journal of Computational Design and Engineering ◽

10.1093/jcde/qwaa091 ◽

2020 ◽

Author(s):

Wisoot Sanhan ◽

Kambiz Vafai ◽

Niti Kammuang-Lue ◽

Pradit Terdtoon ◽

Phrut Sakulchangsatjatai

Keyword(s):

Experimental Data ◽

Heat Pipe ◽

Graphics Processing Unit ◽

Processing Unit ◽

Heat Sources ◽

Final Thickness ◽

Laptop Computers ◽

Central Processing ◽

Graphics Processing ◽

Good Agreement

This work investigates how flattening a heat pipe affects its thermal performance when it serves two heat sources, acting as the central processing unit and the graphics processing unit in laptop computers. A finite element method is used for predicting the flattening effect of the heat pipe. A cylindrical heat pipe with a diameter of 6 mm and a total length of 200 mm is flattened to three final thicknesses of 2, 3, and 4 mm. The heat pipe is placed in a horizontal configuration and heated by heater 1 and heater 2 with a combined power of 40 W. The numerical model shows good agreement with the experimental data, with a standard deviation of 1.85%. The results also show that flattening the cylindrical heat pipe to 66.7% and 41.7% of its original diameter could reduce its normalized thermal resistance by 5.2%. The optimized final thickness, that is, the best design thickness for the heat pipe, is found to be 2.5 mm.
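The abstract gives flattening both as a final thickness in millimetres and as a percentage of the original diameter. A short conversion, sketched below with the hypothetical helper `thickness_mm`, relates the two: using the stated 6 mm diameter, 66.7% and 41.7% correspond to about 4 mm and 2.5 mm, the latter being the reported optimum.

```python
ORIGINAL_DIAMETER_MM = 6.0  # cylindrical heat pipe diameter from the abstract


def thickness_mm(diameter_mm: float, percent_of_diameter: float) -> float:
    """Convert a flattening percentage into a final thickness in millimetres."""
    return diameter_mm * percent_of_diameter / 100.0


for percent in (66.7, 41.7):
    thickness = thickness_mm(ORIGINAL_DIAMETER_MM, percent)
    print(f"{percent}% of {ORIGINAL_DIAMETER_MM} mm is {thickness:.1f} mm")
```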


A lightweight approach to performance portability with targetDP

The International Journal of High Performance Computing Applications ◽

10.1177/1094342016682071 ◽

2016 ◽

Vol 32 (2) ◽

pp. 288-301

Author(s):

Alan Gray ◽

Kevin Stratford

Keyword(s):

Particle Physics ◽

Message Passing ◽

Graphics Processing Units ◽

High Performance ◽

Large Scale ◽

Message Passing Interface ◽

Graphics Processing Unit ◽

Processing Unit ◽

Performance Portability ◽

Graphics Processing

Leading high performance computing systems achieve their status through use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data parallel hardware in a platform agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through provision of scaling results on traditional and graphics processing unit-accelerated large scale supercomputers.
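The Roofline model mentioned in this abstract bounds attainable performance by the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity. The sketch below shows that bound; the function name `roofline_gflops` and the hardware figures are made up for illustration and do not come from the paper.

```python
def roofline_gflops(peak_gflops: float, bandwidth_gb_s: float,
                    flops_per_byte: float) -> float:
    """Attainable performance under the Roofline model, in GFLOP/s."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Hypothetical device: 1000 GFLOP/s peak compute, 200 GB/s memory bandwidth.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GB_S = 200.0

# A kernel with low arithmetic intensity is limited by memory bandwidth.
print(roofline_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, 0.5))   # prints 100.0
# A kernel with high arithmetic intensity is limited by peak compute.
print(roofline_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, 10.0))  # prints 1000.0
```

Comparing a measured rate against this bound on each platform is one way to judge whether a single code base performs comparably well relative to what each device allows.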


Finite element method completely implemented for graphic processor units using parallel algorithm libraries

The International Journal of High Performance Computing Applications ◽

10.1177/1094342017694703 ◽

2017 ◽

Vol 33 (1) ◽

pp. 53-66 ◽

Cited By ~ 1

Author(s):

Franz Pichler ◽

Gundolf Haase

Keyword(s):

Finite Element ◽

Graphics Processing Unit ◽

Computational Cost ◽

Processing Unit ◽

Time Step ◽

Device Architecture ◽

Transient Problems ◽

Speed Up ◽

Automotive Batteries ◽

Graphics Processing

A finite element code is developed in which all of the computationally expensive steps are performed on a graphics processing unit via the THRUST and PARALUTION libraries. The code focuses on the simulation of transient problems, where the computations repeated at every time step dominate the computational cost. It is used to solve partial and ordinary differential equations as they arise in thermal-runaway simulations of automotive batteries. The speed-up obtained by using the graphics processing unit for every critical step is compared against the single-core and multi-threaded solutions, which are also supported by the chosen libraries. In this way, a high total speed-up on the graphics processing unit is achieved without the need to program a single classical Compute Unified Device Architecture (CUDA) kernel.


Graphics processing unit (GPU) implementation of image processing algorithms to improve system performance of the control acquisition, processing, and image display system (CAPIDS) of the micro-angiographic fluoroscope (MAF)

10.1117/12.911272 ◽

2012 ◽

Cited By ~ 1

Author(s):

S. N. Swetadri Vasan ◽

Ciprian N. Ionita ◽

A. H. Titus ◽

A. N. Cartwright ◽

D. R. Bednarek ◽

...

Keyword(s):

Image Processing ◽

System Performance ◽

Graphics Processing Unit ◽

Image Display ◽

Processing Unit ◽

Display System ◽

Image Processing Algorithms ◽

Processing Algorithms ◽

Graphics Processing ◽

Gpu Implementation


PI-FLAME: A parallel immune system simulator using the FLAME graphic processing unit environment

SIMULATION ◽

10.1177/0037549716673724 ◽

2016 ◽

Vol 93 (1) ◽

pp. 69-84 ◽

Cited By ~ 6

Author(s):

Shailesh Tamrakar ◽

Paul Richmond ◽

Roshan M D’Souza

Keyword(s):

Immune System ◽

Graphics Processing Units ◽

Processing Unit ◽

Human Immune System ◽

Innate And Adaptive Immunity ◽

Agent Based ◽

Central Processing ◽

Agent Simulation ◽

Study Population ◽

Graphics Processing

Agent-based models (ABMs) are increasingly being used to study population dynamics in complex systems, such as the human immune system. Previously, Folcik et al. (The basic immune simulator: an agent-based model to study the interactions between innate and adaptive immunity. Theor Biol Med Model 2007; 4: 39) developed a Basic Immune Simulator (BIS) and implemented it using the Recursive Porous Agent Simulation Toolkit (RePast) ABM simulation framework. However, frameworks such as RePast are designed to execute serially on central processing units and therefore cannot efficiently handle large model sizes. In this paper, we report on our implementation of the BIS using FLAME GPU, a parallel computing ABM simulator designed to execute on graphics processing units. To benchmark our implementation, we simulate the response of the immune system to a viral infection of generic tissue cells. We compared our results with those obtained from the original RePast implementation for statistical accuracy. We observe that our implementation has a 13× performance advantage over the original RePast implementation.
