GPU-Powered Coherent Beamforming

Graphics processing unit (GPU)-based beamforming is a relatively unexplored area in radio astronomy, possibly due to the assumption that any such system will be severely limited by the PCIe bandwidth required to transfer data to the GPU. We have developed a CUDA-based GPU implementation of a coherent beamformer, specifically designed and optimized for deployment at the BEST-2 array, which can generate an arbitrary number of synthesized beams for a wide range of parameters. It achieves [Formula: see text] TFLOPs on an NVIDIA Tesla K20, approximately 10x faster than an optimized, multithreaded CPU implementation. This kernel has been integrated into two real-time, GPU-based time-domain software pipelines deployed at the BEST-2 array in Medicina: a standalone beamforming pipeline and a transient detection pipeline. We present performance benchmarks for the beamforming kernel and for the transient detection pipeline with beamforming capabilities, together with results of test observations.
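
As a rough illustration of the coherent beamforming operation this abstract describes, the Python sketch below forms one synthesized beam by applying a phase weight to each antenna and summing the result. It is a minimal sketch under stated assumptions, not the authors' CUDA kernel: the 1-D array geometry, the sign convention, and the names `steering_weights` and `beamform` are hypothetical.

```python
import cmath
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def steering_weights(positions_m, angle_rad, freq_hz):
    """Phase weights for a hypothetical 1-D array.

    Assumes a plane wave from angle_rad reaches the antenna at position x
    with phase +k * x * sin(angle); the weight is the complex conjugate
    of that phase, so the weighted samples add in phase.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_LIGHT
    return [cmath.exp(-1j * k * x * math.sin(angle_rad)) for x in positions_m]


def beamform(voltages, weights):
    """Coherent sum over antennas.

    voltages[t][a] is the complex sample of antenna a at time step t.
    Returns one complex beam sample per time step.
    """
    return [sum(w * v for w, v in zip(weights, row)) for row in voltages]
```

Each output sample depends only on the antenna samples of one time step and the weights of one beam, so beams and time steps can in principle be computed independently of each other, which is the kind of workload that maps naturally onto GPU threads.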

A New GPU Implementation of Support Vector Machines for Fast Hyperspectral Image Classification

Remote Sensing ◽

10.3390/rs12081257 ◽

2020 ◽

Vol 12 (8) ◽

pp. 1257 ◽

Cited By ~ 7

Author(s):

Mercedes E. Paoletti ◽

Juan M. Haut ◽

Xuanwen Tao ◽

Javier Plaza Miguel ◽

Antonio Plaza

Keyword(s):

Graphics Processing Units ◽

Memory Management ◽

Hyperspectral Image ◽

Support Vector ◽

Writing Instructions ◽

Wide Range ◽

Computational Resources ◽

Graphics Processing ◽

Efficient Memory ◽

GPU Implementation

The storage and processing of remotely sensed hyperspectral images (HSIs) is facing important challenges due to the computational requirements involved in the analysis of these images, characterized by continuous and narrow spectral channels. Although HSIs offer many opportunities for accurately modeling and mapping the surface of the Earth in a wide range of applications, they comprise massive data cubes. These huge amounts of data impose important requirements from the storage and processing points of view. The support vector machine (SVM) has been one of the most powerful machine learning classifiers, able to process HSI data without applying previous feature extraction steps, exhibiting a robust behaviour with high dimensional data and obtaining high classification accuracies. Nevertheless, the training and prediction stages of this supervised classifier are very time-consuming, especially for large and complex problems that require an intensive use of memory and computational resources. This paper develops a new, highly efficient implementation of SVMs that exploits the high computational power of graphics processing units (GPUs) to reduce the execution time by massively parallelizing the operations of the algorithm while performing efficient memory management during data-reading and writing instructions. Our experiments, conducted over different HSI benchmarks, demonstrate the efficiency of our GPU implementation.
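
To make the parallelizable part of SVM prediction concrete, the Python sketch below evaluates the standard kernel SVM decision function for a single pixel spectrum. It is only an illustration under stated assumptions: the RBF kernel is an assumed choice (the abstract does not name one), and `rbf_kernel` and `svm_decision` are hypothetical names, not the paper's API.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2) between two spectra."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Decision value f(x) = bias + sum_i coef_i * K(sv_i, x).

    dual_coefs[i] is alpha_i * y_i for support vector i.
    The sign of the returned value gives the predicted class.
    """
    return bias + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
```

Every kernel evaluation in the sum is independent of the others, and so is every pixel, which is why this computation lends itself to massive parallelization.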

XVA PRINCIPLES, NESTED MONTE CARLO STRATEGIES, AND GPU OPTIMIZATIONS

International Journal of Theoretical and Applied Finance ◽

10.1142/s0219024918500309 ◽

2018 ◽

Vol 21 (06) ◽

pp. 1850030 ◽

Cited By ~ 3

Author(s):

LOKMAN A. ABBAS-TURKI ◽

STÉPHANE CRÉPEY ◽

BABACAR DIALLO

Keyword(s):

Monte Carlo ◽

Interest Rate ◽

Outer Layer ◽

Graphics Processing Units ◽

Lower Layer ◽

Credit Derivatives ◽

Square Root ◽

Root Number ◽

Graphics Processing ◽

GPU Implementation

We present a nested Monte Carlo (NMC) approach implemented on graphics processing units (GPUs) to X-valuation adjustments (XVAs), where X ranges over C for credit, F for funding, M for margin, and K for capital. The overall XVA suite involves five compound layers of dependence. Higher layers are launched first, and trigger nested simulations on-the-fly whenever required in order to compute an item from a lower layer. If the user is only interested in some of the XVA components, then only the sub-tree corresponding to the outermost XVA needs to be processed computationally. Inner layers only need a square root number of simulations with respect to the outermost layer. Some of the layers exhibit a smaller variance. As a result, with GPUs at least, error-controlled NMC XVA computations are doable. But, although NMC is naively suited to parallelization, a GPU implementation of NMC XVA computations requires various optimizations. This is illustrated on XVA computations involving equities, interest rate, and credit derivatives, for both bilateral and central clearing XVA metrics.
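
The square-root rule for inner simulations can be illustrated with a two-layer toy problem. The Python sketch below is not an XVA computation: the Gaussian model, the payoff max(., 0), and the names `inner_sample_count` and `nested_mc` are assumptions chosen only to show the nesting pattern.

```python
import math
import random


def inner_sample_count(n_outer):
    """Number of inner simulations: the integer square root of n_outer."""
    return max(1, math.isqrt(n_outer))


def nested_mc(n_outer, rng):
    """Toy nested Monte Carlo estimate of E[max(E[Y | X], 0)].

    Assumed model: X ~ N(0, 1) on the outer layer and Y = X + Z with
    Z ~ N(0, 1) on the inner layer, so E[Y | X] = X.
    """
    n_inner = inner_sample_count(n_outer)
    total = 0.0
    for _ in range(n_outer):
        x = rng.gauss(0.0, 1.0)  # outer scenario
        # Inner simulations, launched only for this outer scenario.
        inner_mean = sum(x + rng.gauss(0.0, 1.0) for _ in range(n_inner)) / n_inner
        total += max(inner_mean, 0.0)
    return total / n_outer
```

For this toy model the exact value is 1/sqrt(2*pi), roughly 0.399. The nested estimator carries a small bias from the noisy inner mean, which shrinks as the number of inner simulations grows.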

Accelerating radio astronomy cross-correlation with graphics processing units

The International Journal of High Performance Computing Applications ◽

10.1177/1094342012444794 ◽

2012 ◽

Vol 27 (2) ◽

pp. 178-192 ◽

Cited By ~ 24

Author(s):

M.A. Clark ◽

PC La Plante ◽

L.J. Greenhill

Keyword(s):

Radio Astronomy ◽

Graphics Processing Units ◽

Cross Correlation ◽

Graphics Processing

Parallel Heat Transfer Model of a Panel with Phase Change Material for Thermal Storage Applications Computed on Graphics Processing Units

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.1077.118 ◽

2014 ◽

Vol 1077 ◽

pp. 118-123 ◽

Cited By ~ 1

Author(s):

Lubomír Klimeš ◽

Pavel Charvát ◽

Milan Ostrý ◽

Josef Stetina

Keyword(s):

Heat Transfer ◽

Phase Change ◽

Phase Change Material ◽

Graphics Processing Units ◽

Parallel Implementation ◽

Heat Transfer Model ◽

Transfer Model ◽

Wide Range ◽

Graphics Processing ◽

Change Material

Phase change materials have a wide range of applications, including thermal energy storage in building structures, solar air collectors, heat storage units and exchangers. Such applications often utilize a commercially produced phase change material enclosed in a thin panel (container) made of aluminum. A parallel 1D heat transfer model of a container with phase change material was developed by means of the control volume and effective heat capacity methods. The parallel implementation in the CUDA computing architecture allows the model to run on graphics processing units, which makes it very fast in comparison to traditional models computed on a single CPU. The paper presents the model implementation and the results of computational benchmarking carried out on high-end and low-end NVIDIA GPUs.
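
The combination of control volumes and an effective heat capacity can be sketched in a few lines of Python. This is an illustrative sketch only, not the paper's model: the explicit time stepping, the Gaussian shape of the latent-heat peak, the boundary conditions, and every material parameter below are assumptions.

```python
import math


def effective_heat_capacity(temp_c, c_base=2000.0, latent=200e3,
                            t_melt=25.0, width=1.0):
    """Effective heat capacity in J/(kg K).

    The latent heat is smeared around t_melt as a Gaussian peak whose
    integral over temperature equals `latent` (assumed shape).
    """
    peak = math.exp(-((temp_c - t_melt) / width) ** 2) / (width * math.sqrt(math.pi))
    return c_base + latent * peak


def step(temps, dt, dx, k=0.2, rho=800.0, t_left=40.0):
    """One explicit control-volume update of a 1-D temperature profile.

    Left face: fixed temperature t_left. Right face: insulated.
    Each cell uses only old values, so all cells can be updated independently.
    """
    n = len(temps)
    new = list(temps)
    for i in range(n):
        if i == 0:
            q_in = k * (t_left - temps[0]) / (dx / 2.0)  # wall to cell centre
        else:
            q_in = k * (temps[i - 1] - temps[i]) / dx
        q_out = 0.0 if i == n - 1 else k * (temps[i] - temps[i + 1]) / dx
        c_eff = effective_heat_capacity(temps[i])
        new[i] = temps[i] + dt * (q_in - q_out) / (rho * c_eff * dx)
    return new
```

With the assumed parameters, a time step of 1 s on 1 mm control volumes stays within the stability limit of the explicit scheme. Because each control volume is updated from old values only, one GPU thread per control volume is a natural mapping.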

Graphics processing unit implementation of the F-statistic for continuous gravitational wave searches

Classical and Quantum Gravity ◽

10.1088/1361-6382/ac4616 ◽

2021 ◽

Author(s):

Liam Dunn ◽

Patrick Clearwater ◽

Andrew Melatos ◽

Karl Wette

Keyword(s):

Gravitational Wave ◽

Graphics Processing Units ◽

Graphics Processing Unit ◽

Computational Cost ◽

Processing Unit ◽

Central Processing ◽

Long Baseline ◽

Using Data ◽

Graphics Processing ◽

GPU Implementation

The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.

Improved MPS method and its variations for simulating incompressible fluids on GPU

Journal of Interactive Systems ◽

10.5753/jis.2018.701 ◽

2018 ◽

Vol 9 (2) ◽

pp. 1

Author(s):

André Luiz Buarque Vieira-e-Silva ◽

Caio Brito ◽

Mozart William Almeida ◽

Veronica Teichrieb

Keyword(s):

Graphics Processing Units ◽

Fluid Flows ◽

Three Dimensions ◽

MPS Method ◽

Mathematical Explanations ◽

Accuracy And Precision ◽

Wide Range ◽

Moving Particle ◽

Incompressible Fluid Flows ◽

Graphics Processing

Meshless methods for simulating fluid flows have evolved steadily over the years, since they are a good alternative for dealing with large deformations, where mesh-based methods fail to perform efficiently. A well-known meshless method is the Moving Particle Semi-implicit (MPS) method, which was designed to simulate free-surface, truly incompressible fluid flows. Many variations and refinements of the method’s accuracy and precision have been proposed over the years and, in this paper, a reasonably wide literature review of them is presented together with their theoretical and mathematical explanations. Thanks to these works, the method has proved to be very useful in a wide range of naval and mechanical engineering problems. However, its drawbacks are a high computational load and some quite time-consuming functions, which prevent it from being used more widely in Computer Graphics and Virtual Reality applications. Graphics Processing Units (GPUs) provide unprecedented capabilities for scientific computations. To promote GPU acceleration, the solution of the Poisson Pressure equation was brought into focus. This work benefits from some of the techniques presented in the related work and also from the CUDA language in order to obtain a stable, accurate and GPU-accelerated MPS-based method, which is this work’s main contribution. It is shown that the GPU version of the method can run approximately 6 to 10 times faster than the CPU version with the same reliability, both extended to three dimensions. Lastly, a simulation containing a total of 62,600 particles is fully rendered in 3D.

Time-Domain Power Quality State Estimation Based on Kalman Filter Using Parallel Computing on Graphics Processing Units

IEEE Access ◽

10.1109/access.2018.2823721 ◽

2018 ◽

Vol 6 ◽

pp. 21152-21163 ◽

Cited By ~ 7

Author(s):

Rafael Cisneros-Magana ◽

Aurelio Medina ◽

Venkata Dinavahi ◽

Antonio Ramos-Paz

Keyword(s):

Parallel Computing ◽

Kalman Filter ◽

State Estimation ◽

Power Quality ◽

Time Domain ◽

Graphics Processing Units ◽

Graphics Processing

IMPLEMENTATION OF THE LORENTZ–DRUDE MODEL INCORPORATED FDTD METHOD ON MULTIPLE GPUs FOR PLASMONICS APPLICATIONS

International Journal of Computational Methods ◽

10.1142/s0219876213500631 ◽

2014 ◽

Vol 11 (04) ◽

pp. 1350063 ◽

Cited By ~ 2

Author(s):

IFTIKHAR AHMED ◽

RICK SIOW MONG GOH ◽

ENG HUAT KHOO ◽

KIM HUAT LEE ◽

SIAW KIAN ZHONG ◽

...

Keyword(s):

Time Domain ◽

Graphics Processing Units ◽

Maxwell Equations ◽

FDTD Method ◽

Three Dimensional ◽

Drude Model ◽

Multiple GPUs ◽

Device Architecture ◽

Graphics Processing ◽

Difference Time

Maxwell equations incorporating the Lorentz–Drude model are simulated using the three-dimensional finite difference time domain (FDTD) method, and the method is parallelized on multiple graphics processing units (GPUs) for plasmonics applications. The compute unified device architecture (CUDA) is used for GPU parallelization. The Lorentz–Drude (LD) model is used to simulate the dispersive nature of materials in the plasmonics domain, and the auxiliary differential equation (ADE) approach is used to make it consistent with the time domain Maxwell equations. Different aspects of using multiple GPUs for the FDTD method are presented, such as a comparison of different numbers of GPUs, the transfer time between them, and synchronous and asynchronous passing. It is shown that by using multiple GPUs in parallel, a significant reduction in the simulation time can be achieved compared to a single GPU.

A SURVEY OF TECHNIQUES FOR MANAGING AND LEVERAGING CACHES IN GPUs

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126614300025 ◽

2014 ◽

Vol 23 (08) ◽

pp. 1430002 ◽

Cited By ~ 11

Author(s):

SPARSH MITTAL

Keyword(s):

Graphics Processing Units ◽

High Performance ◽

Heterogeneous Computing ◽

General Purpose ◽

System Level ◽

Cache Management ◽

Full Potential ◽

Wide Range ◽

Computing Platforms ◽

Graphics Processing

Initially introduced as special-purpose accelerators for graphics applications, graphics processing units (GPUs) have now emerged as general purpose computing platforms for a wide range of applications. To address the requirements of these applications, modern GPUs include sizable hardware-managed caches. However, several factors, such as the unique architecture of GPUs and the rise of CPU–GPU heterogeneous computing, demand effective management of caches to achieve high performance and energy efficiency. Recently, several techniques have been proposed for this purpose. In this paper, we survey several architectural and system-level techniques proposed for managing and leveraging GPU caches. We also discuss the importance and challenges of cache management in GPUs. The aim of this paper is to provide readers with insights into cache management techniques for GPUs and to motivate them to propose even better techniques for leveraging the full potential of caches in the GPUs of tomorrow.

GPU-accelerated mesh-based Monte Carlo photon transport simulations

10.1101/815977 ◽

2019 ◽

Author(s):

Qianqian Fang ◽

Shijie Yan

Keyword(s):

Monte Carlo ◽

Graphics Processing Units ◽

Heterogeneous Computing ◽

Wide Field ◽

Source Codes ◽

Wide Range ◽

Complex Source ◽

Speedup Ratio ◽

Almost All ◽

Graphics Processing

The mesh-based Monte Carlo (MMC) algorithm is increasingly used as the gold standard for developing new biophotonics modeling techniques in 3-D complex tissues, including both diffusion-based and various Monte Carlo (MC) based methods. Compared to multi-layered and voxel-based MCs, MMC can utilize tetrahedral meshes to gain improved anatomical accuracy, but also results in higher computational and memory demands. Previous attempts at accelerating MMC using graphics processing units (GPUs) have yielded limited performance improvement and are not publicly available. Here we report a highly efficient MMC – MMCL – using the OpenCL heterogeneous computing framework, and demonstrate a speedup ratio up to 420× compared to state-of-the-art single-threaded CPU simulations. The MMCL simulator supports almost all advanced features found in our widely disseminated MMC software, such as support for a dozen complex source forms, wide-field detectors, boundary reflection, photon replay and storing a rich set of detected photon information. Furthermore, this tool supports a wide range of GPUs/CPUs across vendors and is freely available with full source codes and benchmark suites at http://mcx.space/#mmc.
