Application of the Characteristic Basis Function Method Using CUDA

2014 ◽  
Vol 2014 ◽  
pp. 1-13
Author(s):  
Juan Ignacio Pérez ◽  
Eliseo García ◽  
José A. de Frutos ◽  
Felipe Cátedra

The characteristic basis function method (CBFM) is a popular technique for efficiently solving the method of moments (MoM) matrix equations. In this work, we address the adaptation of this method to a relatively new computing infrastructure provided by NVIDIA, the Compute Unified Device Architecture (CUDA), and take into account some of the limitations which appear when the geometry under analysis becomes too big to fit into the Graphics Processing Unit’s (GPU’s) memory.
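The abstract does not spell out how the out-of-memory case is handled, but the general shape of such a scheme can be sketched. The following is a hypothetical illustration (not the paper's actual algorithm): when the MoM matrix is too large for GPU memory, it can be processed block by block, with each row block standing in for a chunk that would be uploaded to the device, multiplied there, and accumulated on the host.

```python
# Hypothetical sketch (not the paper's actual scheme): process a matrix
# too large for device memory one row block at a time.  In a real CUDA
# implementation each block would be copied to the GPU, the partial
# matrix-vector product computed by a kernel, and the result copied back.

def blocked_matvec(matrix, vector, block_rows):
    """Compute matrix @ vector one row block at a time.

    `block_rows` stands in for the largest slice that fits in GPU memory.
    """
    result = []
    for start in range(0, len(matrix), block_rows):
        block = matrix[start:start + block_rows]   # "upload" one block
        for row in block:                          # kernel-launch stand-in
            result.append(sum(a * x for a, x in zip(row, vector)))
    return result

A = [[1, 2], [3, 4], [5, 6]]
x = [1, 1]
print(blocked_matvec(A, x, block_rows=2))  # -> [3, 7, 11]
```

The key point is that the result is identical to the full in-memory product; only the transfer pattern changes.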

2014 ◽  
Vol 667 ◽  
pp. 345-348
Author(s):  
Jie Liu ◽  
Wei Lai Li ◽  
Jian Jun Pan ◽  
Zhong Kuan Chen

To obtain the wideband radar cross-section (RCS) frequency response of multiple perfectly electric conducting (PEC) objects, frequency sweeping that reuses the ultra-wide band characteristic basis functions (UCBFs) is applied. This method, based on the Characteristic Basis Function Method (CBFM), retains all the benefits of CBFM, in particular accelerating the solution of the matrix equations generated by the method of moments (MoM) applied to scattering problems in electromagnetics. Compared with the conventional CBFM procedure, reusing the UCBFs without recomputing them at each frequency point leads to a significant reduction in computational time. The method rests on three key steps: generating the UCBFs at the highest frequency, reusing them at the lower frequencies, and constructing a reduced matrix for each frequency. Numerical results demonstrate the efficiency of this method.
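The three steps above can be sketched as follows. This is an illustrative toy only (names and matrices are invented, and real CBFM operates on complex impedance matrices): the basis is generated once, and each lower frequency recomputes only its own matrix and projection.

```python
# Illustrative sketch of the UCBF-reuse pattern (toy real-valued data,
# invented names): generate the basis once at the highest frequency,
# then for each lower frequency rebuild only Z(f) and its reduced matrix.

def reduced_matrix(Z, basis):
    """Project matrix Z onto the basis vectors: R = B Z B^T (real case)."""
    n = len(Z)
    k = len(basis)
    ZB = [[sum(Z[i][j] * basis[b][j] for j in range(n)) for b in range(k)]
          for i in range(n)]
    return [[sum(basis[a][i] * ZB[i][b] for i in range(n)) for b in range(k)]
            for a in range(k)]

# Step 1: UCBFs generated once (here a fixed toy basis at f_max).
ucbfs = [[1, 0], [0, 1]]

# Steps 2-3: per lower frequency, reuse the basis unchanged and build
# only the frequency-dependent reduced matrix.
reduced = {}
for f in (3.0, 2.0, 1.0):
    Z_f = [[f, 0.5], [0.5, f]]
    reduced[f] = reduced_matrix(Z_f, ucbfs)
```

The saving comes entirely from step 2: the expensive basis generation happens once instead of once per frequency point.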


2015 ◽  
Vol 8 (9) ◽  
pp. 2815-2827 ◽  
Author(s):  
S. Xu ◽  
X. Huang ◽  
L.-Y. Oey ◽  
F. Xu ◽  
H. Fu ◽  
...  

Abstract. Graphics processing units (GPUs) are an attractive solution in many scientific applications due to their high performance. However, most existing GPU conversions of climate models use GPUs for only a few computationally intensive regions. In the present study, we redesign the mpiPOM (a parallel version of the Princeton Ocean Model) with GPUs. Specifically, we first convert the model from its original Fortran form to a new Compute Unified Device Architecture C (CUDA-C) code, then we optimize the code on each of the GPUs, the communications between the GPUs, and the I/O between the GPUs and the central processing units (CPUs). We show that the performance of the new model on a workstation containing four GPUs is comparable to that on a powerful cluster with 408 standard CPU cores, and it reduces the energy consumption by a factor of 6.8.


2009 ◽  
Vol 19 (04) ◽  
pp. 513-533 ◽  
Author(s):  
FUMIHIKO INO ◽  
YUKI KOTANI ◽  
YUMA MUNEKAWA ◽  
KENICHI HAGIHARA

This paper presents a parallel system capable of accelerating biological sequence alignment on the graphics processing unit (GPU) grid. The GPU grid in this paper is a desktop grid system that utilizes idle GPUs and CPUs in the office and home. Our parallel implementation employs a master-worker paradigm to accelerate an OpenGL-based algorithm that runs on a single GPU. We integrate this implementation into a screensaver-based grid system that detects idle resources on which the alignment code can run. We also show some experimental results comparing our implementation with three different implementations running on a single GPU, a single CPU, or multiple CPUs. As a result, we find that a single non-dedicated GPU can provide almost the same throughput as two dedicated CPUs in our laboratory environment, where GPU-equipped machines are ordinarily used to develop GPU applications. In a dedicated environment, the GPU-accelerated code achieves five times higher throughput than the CPU-based code. Furthermore, a linear speedup of 30.7X is observed on a 32-node cluster of dedicated GPUs. We also implement a compute unified device architecture (CUDA) based algorithm to demonstrate further acceleration.
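For context, the core computation being accelerated in work like this is typically local alignment scoring in the Smith-Waterman style, where GPU implementations exploit the fact that all cells on the same anti-diagonal of the scoring matrix are independent. The serial reference below is a generic sketch with illustrative scoring parameters, not the paper's specific algorithm.

```python
# Generic serial Smith-Waterman local-alignment score.  GPU versions
# parallelize the inner work anti-diagonal by anti-diagonal, since cells
# on one anti-diagonal depend only on earlier diagonals.  The scoring
# parameters here are illustrative defaults, not taken from the paper.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scoring matrix, zero borders
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            # Local alignment: scores are clamped at zero.
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACGT", "ACGT"))  # -> 8 (four matches at +2 each)
```

A master-worker grid, as described above, would dispatch independent query/database pairs of such alignments to idle machines.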


2017 ◽  
Vol 14 (1) ◽  
pp. 789-795
Author(s):  
V Saveetha ◽  
S Sophia

Parallel data clustering aims to use algorithms and methods that extract knowledge from very large databases in reasonable time on high-performance architectures. The computational challenge that cluster analysis faces from ever-growing data volumes can be overcome by exploiting the power of these architectures. Recent advances in the parallel power of the Graphics Processing Unit enable low-cost, high-performance solutions for general-purpose applications. The Compute Unified Device Architecture programming model provides application programming interface methods to handle data efficiently on the Graphics Processing Unit for iterative clustering algorithms such as K-Means. Existing Graphics Processing Unit based K-Means algorithms focus largely on improving speedup and fail to address the high cost of transferring data between the Central Processing Unit and the Graphics Processing Unit. A competent K-Means algorithm is proposed in this paper to reduce the transfer time by introducing a novel approach to checking the convergence of the algorithm and by using pinned memory for direct access. This algorithm outperforms the others by maximizing parallelism and exploiting the memory features. The relative speedups and the validity measure for the proposed algorithm are higher than those of K-Means on the Graphics Processing Unit and K-Means using a flag on the Graphics Processing Unit. The proposed approach thus shows that communication overhead can be reduced in K-Means clustering.
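The convergence-check idea can be illustrated in miniature. The sketch below is hypothetical (function and variable names are invented, and it is plain serial Python, not CUDA): instead of copying every point's cluster assignment back to the host each iteration to test convergence, the "device" side sets a single changed flag, and the host reads only that one value per iteration.

```python
# Hypothetical 1-D K-Means sketch of the convergence-flag idea (invented
# names, serial stand-in for a GPU kernel): a single boolean `changed`
# replaces transferring the full assignment array each iteration.

def kmeans_1d(points, centroids, max_iter=100):
    assign = [None] * len(points)
    for _ in range(max_iter):
        changed = False            # one flag instead of a full transfer
        for i, p in enumerate(points):
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(p - centroids[c]))
            if assign[i] != nearest:
                assign[i] = nearest
                changed = True     # would be set by the GPU kernel
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
        if not changed:            # host checks the single flag and stops
            break
    return centroids, assign
```

In an actual CUDA implementation the flag would live in pinned (page-locked) host memory so the device can write it directly, which is the "direct access" mentioned in the abstract.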

