Application of the Characteristic Basis Function Method Using CUDA

2014 ◽  
Vol 2014 ◽  
pp. 1-13
Author(s):  
Juan Ignacio Pérez ◽  
Eliseo García ◽  
José A. de Frutos ◽  
Felipe Cátedra

The characteristic basis function method (CBFM) is a popular technique for efficiently solving the method of moments (MoM) matrix equations. In this work, we address the adaptation of this method to a relatively new computing infrastructure provided by NVIDIA, the Compute Unified Device Architecture (CUDA), and take into account some of the limitations which appear when the geometry under analysis becomes too big to fit into the Graphics Processing Unit’s (GPU’s) memory.
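The abstract does not spell out how the out-of-memory case is handled, but the general shape of such a scheme can be sketched. The following is a hypothetical illustration (not the paper's actual algorithm): when the MoM matrix is too large for GPU memory, it can be processed block by block, with each row block standing in for a chunk that would be uploaded to the device, multiplied there, and accumulated on the host.

```python
# Hypothetical sketch (not the paper's actual scheme): process a matrix
# too large for device memory one row block at a time.  In a real CUDA
# implementation each block would be copied to the GPU, the partial
# matrix-vector product computed by a kernel, and the result copied back.

def blocked_matvec(matrix, vector, block_rows):
    """Compute matrix @ vector one row block at a time.

    `block_rows` stands in for the largest slice that fits in GPU memory.
    """
    result = []
    for start in range(0, len(matrix), block_rows):
        block = matrix[start:start + block_rows]   # "upload" one block
        for row in block:                          # kernel-launch stand-in
            result.append(sum(a * x for a, x in zip(row, vector)))
    return result

A = [[1, 2], [3, 4], [5, 6]]
x = [1, 1]
print(blocked_matvec(A, x, block_rows=2))  # -> [3, 7, 11]
```

The key point is that the result is identical to the full in-memory product; only the transfer pattern changes.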

2014 ◽  
Vol 667 ◽  
pp. 345-348
Author(s):  
Jie Liu ◽  
Wei Lai Li ◽  
Jian Jun Pan ◽  
Zhong Kuan Chen

To obtain the wideband radar cross-section (RCS) frequency response of multiple perfectly electric conducting (PEC) objects, frequency sweeping that reuses the ultra-wide band characteristic basis functions (UCBFs) is applied. This method, based on the Characteristic Basis Function Method (CBFM), retains all the benefits of CBFM, in particular accelerating the solution of the matrix equations generated by the method of moments (MoM) applied to scattering problems in electromagnetics. Compared with the conventional CBFM procedure, reusing the UCBFs without recomputing them at each frequency point leads to a significant reduction in computational time. The method rests on three key steps: generating the UCBFs at the highest frequency, reusing them at the lower frequencies, and constructing a reduced matrix for each frequency. Numerical results demonstrate the efficiency of this method.
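The three steps above can be sketched as follows. This is an illustrative toy only (names and matrices are invented, and real CBFM operates on complex impedance matrices): the basis is generated once, and each lower frequency recomputes only its own matrix and projection.

```python
# Illustrative sketch of the UCBF-reuse pattern (toy real-valued data,
# invented names): generate the basis once at the highest frequency,
# then for each lower frequency rebuild only Z(f) and its reduced matrix.

def reduced_matrix(Z, basis):
    """Project matrix Z onto the basis vectors: R = B Z B^T (real case)."""
    n = len(Z)
    k = len(basis)
    ZB = [[sum(Z[i][j] * basis[b][j] for j in range(n)) for b in range(k)]
          for i in range(n)]
    return [[sum(basis[a][i] * ZB[i][b] for i in range(n)) for b in range(k)]
            for a in range(k)]

# Step 1: UCBFs generated once (here a fixed toy basis at f_max).
ucbfs = [[1, 0], [0, 1]]

# Steps 2-3: per lower frequency, reuse the basis unchanged and build
# only the frequency-dependent reduced matrix.
reduced = {}
for f in (3.0, 2.0, 1.0):
    Z_f = [[f, 0.5], [0.5, f]]
    reduced[f] = reduced_matrix(Z_f, ucbfs)
```

The saving comes entirely from step 2: the expensive basis generation happens once instead of once per frequency point.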


2015 ◽  
Vol 8 (9) ◽  
pp. 2815-2827 ◽  
Author(s):  
S. Xu ◽  
X. Huang ◽  
L.-Y. Oey ◽  
F. Xu ◽  
H. Fu ◽  
...  

Abstract. Graphics processing units (GPUs) are an attractive solution in many scientific applications due to their high performance. However, most existing GPU conversions of climate models use GPUs for only a few computationally intensive regions. In the present study, we redesign the mpiPOM (a parallel version of the Princeton Ocean Model) with GPUs. Specifically, we first convert the model from its original Fortran form to a new Compute Unified Device Architecture C (CUDA-C) code, then we optimize the code on each of the GPUs, the communications between the GPUs, and the I/O between the GPUs and the central processing units (CPUs). We show that the performance of the new model on a workstation containing four GPUs is comparable to that on a powerful cluster with 408 standard CPU cores, and it reduces the energy consumption by a factor of 6.8.


2009 ◽  
Vol 19 (04) ◽  
pp. 513-533 ◽  
Author(s):  
FUMIHIKO INO ◽  
YUKI KOTANI ◽  
YUMA MUNEKAWA ◽  
KENICHI HAGIHARA

This paper presents a parallel system capable of accelerating biological sequence alignment on the graphics processing unit (GPU) grid. The GPU grid in this paper is a desktop grid system that utilizes idle GPUs and CPUs in the office and home. Our parallel implementation employs a master-worker paradigm to accelerate an OpenGL-based algorithm that runs on a single GPU. We integrate this implementation into a screensaver-based grid system that detects idle resources on which the alignment code can run. We also show some experimental results comparing our implementation with three different implementations running on a single GPU, a single CPU, or multiple CPUs. As a result, we find that a single non-dedicated GPU can provide almost the same throughput as two dedicated CPUs in our laboratory environment, where GPU-equipped machines are ordinarily used to develop GPU applications. In a dedicated environment, the GPU-accelerated code achieves five times higher throughput than the CPU-based code. Furthermore, a linear speedup of 30.7X is observed on a 32-node cluster of dedicated GPUs. We also implement a compute unified device architecture (CUDA) based algorithm to demonstrate further acceleration.
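For context, the core computation being accelerated in work like this is typically local alignment scoring in the Smith-Waterman style, where GPU implementations exploit the fact that all cells on the same anti-diagonal of the scoring matrix are independent. The serial reference below is a generic sketch with illustrative scoring parameters, not the paper's specific algorithm.

```python
# Generic serial Smith-Waterman local-alignment score.  GPU versions
# parallelize the inner work anti-diagonal by anti-diagonal, since cells
# on one anti-diagonal depend only on earlier diagonals.  The scoring
# parameters here are illustrative defaults, not taken from the paper.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scoring matrix, zero borders
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            # Local alignment: scores are clamped at zero.
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACGT", "ACGT"))  # -> 8 (four matches at +2 each)
```

A master-worker grid, as described above, would dispatch independent query/database pairs of such alignments to idle machines.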


2017 ◽  
Vol 14 (1) ◽  
pp. 789-795
Author(s):  
V Saveetha ◽  
S Sophia

Parallel data clustering aims to use algorithms and methods that extract knowledge from very large databases in reasonable time on high-performance architectures. The computational challenge that cluster analysis faces from ever-growing data volumes can be overcome by exploiting the power of these architectures. Recent advances in the parallel power of the Graphics Processing Unit enable low-cost, high-performance solutions for general-purpose applications. The Compute Unified Device Architecture programming model provides application programming interface methods to handle data efficiently on the Graphics Processing Unit for iterative clustering algorithms such as K-Means. Existing Graphics Processing Unit based K-Means algorithms focus largely on improving speedup and fail to address the high cost of transferring data between the Central Processing Unit and the Graphics Processing Unit. A competent K-Means algorithm is proposed in this paper to reduce the transfer time by introducing a novel approach to checking the convergence of the algorithm and by using pinned memory for direct access. This algorithm outperforms the others by maximizing parallelism and exploiting the memory features. The relative speedups and the validity measure for the proposed algorithm are higher than those of K-Means on the Graphics Processing Unit and K-Means using a flag on the Graphics Processing Unit. The proposed approach thus shows that communication overhead can be reduced in K-Means clustering.
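The convergence-check idea can be illustrated in miniature. The sketch below is hypothetical (function and variable names are invented, and it is plain serial Python, not CUDA): instead of copying every point's cluster assignment back to the host each iteration to test convergence, the "device" side sets a single changed flag, and the host reads only that one value per iteration.

```python
# Hypothetical 1-D K-Means sketch of the convergence-flag idea (invented
# names, serial stand-in for a GPU kernel): a single boolean `changed`
# replaces transferring the full assignment array each iteration.

def kmeans_1d(points, centroids, max_iter=100):
    assign = [None] * len(points)
    for _ in range(max_iter):
        changed = False            # one flag instead of a full transfer
        for i, p in enumerate(points):
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(p - centroids[c]))
            if assign[i] != nearest:
                assign[i] = nearest
                changed = True     # would be set by the GPU kernel
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
        if not changed:            # host checks the single flag and stops
            break
    return centroids, assign
```

In an actual CUDA implementation the flag would live in pinned (page-locked) host memory so the device can write it directly, which is the "direct access" mentioned in the abstract.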

