GPU-Powered Coherent Beamforming

2015 ◽  
Vol 04 (01n02) ◽  
pp. 1550002
Author(s):  
A. Magro ◽  
K. Zarb Adami ◽  
J. Hickish

Graphics processing unit (GPU)-based beamforming is a relatively unexplored area in radio astronomy, possibly due to the assumption that any such system will be severely limited by the PCIe bandwidth required to transfer data to the GPU. We have developed a CUDA-based GPU implementation of a coherent beamformer, specifically designed and optimized for deployment at the BEST-2 array, which can generate an arbitrary number of synthesized beams for a wide range of parameters. It achieves [Formula: see text] TFLOPs on an NVIDIA Tesla K20, approximately 10x faster than an optimized, multithreaded CPU implementation. This kernel has been integrated into two real-time, GPU-based time-domain software pipelines deployed at the BEST-2 array in Medicina: a standalone beamforming pipeline and a transient detection pipeline. We present performance benchmarks for the beamforming kernel and for the transient detection pipeline with beamforming capabilities, together with results of a test observation.
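
As an illustration of what a coherent beamformer computes, here is a minimal pure-Python sketch, not the authors' CUDA kernel. It assumes a narrowband signal and a linear array; the antenna positions, the 408 MHz frequency, and the names `steering_weights`, `beamform`, and `plane_wave` are hypothetical choices made for the example. Each synthesized beam is a phase-weighted sum of the antenna signals, and every (beam, time sample) pair can be computed independently, which is what makes the operation a natural fit for a GPU.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def steering_weights(positions_m, freq_hz, theta_rad):
    """Phase weights that align a plane wave arriving from theta_rad."""
    k = 2.0 * math.pi * freq_hz / C  # wavenumber
    return [cmath.exp(-1j * k * x * math.sin(theta_rad)) for x in positions_m]


def beamform(samples, weights):
    """Coherent sum over antennas; samples[a][t] is antenna a at time t."""
    n_time = len(samples[0])
    return [
        sum(w * antenna[t] for w, antenna in zip(weights, samples))
        for t in range(n_time)
    ]


def plane_wave(positions_m, freq_hz, theta_rad, n_time):
    """Hypothetical test input: unit-amplitude plane wave from theta_rad."""
    k = 2.0 * math.pi * freq_hz / C
    return [
        [cmath.exp(1j * k * x * math.sin(theta_rad))] * n_time
        for x in positions_m
    ]


positions = [0.0, 0.5, 1.0, 1.5]  # assumed antenna positions in metres
freq = 408e6                      # assumed observing frequency in Hz
signal = plane_wave(positions, freq, 0.2, n_time=3)

# Steering at the source direction adds the four antennas in phase;
# steering elsewhere makes them partially cancel.
on_source = beamform(signal, steering_weights(positions, freq, 0.2))
off_source = beamform(signal, steering_weights(positions, freq, -0.4))
```

Generating several beams amounts to calling `beamform` with one weight vector per pointing direction on the same input samples.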

2020 ◽  
Vol 12 (8) ◽  
pp. 1257 ◽  
Author(s):  
Mercedes E. Paoletti ◽  
Juan M. Haut ◽  
Xuanwen Tao ◽  
Javier Plaza Miguel ◽  
Antonio Plaza

The storage and processing of remotely sensed hyperspectral images (HSIs) is facing important challenges due to the computational requirements involved in the analysis of these images, characterized by continuous and narrow spectral channels. Although HSIs offer many opportunities for accurately modeling and mapping the surface of the Earth in a wide range of applications, they comprise massive data cubes. These huge amounts of data impose important requirements from the storage and processing points of view. The support vector machine (SVM) has been one of the most powerful machine learning classifiers, able to process HSI data without applying previous feature extraction steps, exhibiting a robust behaviour with high dimensional data and obtaining high classification accuracies. Nevertheless, the training and prediction stages of this supervised classifier are very time-consuming, especially for large and complex problems that require an intensive use of memory and computational resources. This paper develops a new, highly efficient implementation of SVMs that exploits the high computational power of graphics processing units (GPUs) to reduce the execution time by massively parallelizing the operations of the algorithm while performing efficient memory management during data-reading and writing instructions. Our experiments, conducted over different HSI benchmarks, demonstrate the efficiency of our GPU implementation.


2018 ◽  
Vol 21 (06) ◽  
pp. 1850030 ◽  
Author(s):  
LOKMAN A. ABBAS-TURKI ◽  
STÉPHANE CRÉPEY ◽  
BABACAR DIALLO

We present a nested Monte Carlo (NMC) approach implemented on graphics processing units (GPUs) to X-valuation adjustments (XVAs), where X ranges over C for credit, F for funding, M for margin, and K for capital. The overall XVA suite involves five compound layers of dependence. Higher layers are launched first, and trigger nested simulations on-the-fly whenever required in order to compute an item from a lower layer. If the user is only interested in some of the XVA components, then only the sub-tree corresponding to the outermost XVA needs to be processed computationally. Inner layers only need a square root number of simulations with respect to the outermost layer. Some of the layers exhibit a smaller variance. As a result, with GPUs at least, error-controlled NMC XVA computations are doable. But, although NMC is naively suited to parallelization, a GPU implementation of NMC XVA computations requires various optimizations. This is illustrated on XVA computations involving equities, interest rate, and credit derivatives, for both bilateral and central clearing XVA metrics.
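
The square-root rule for inner layers can be shown on a toy two-layer problem. The sketch below is an assumption-laden illustration, not the XVA model of the paper: the Gaussian scenarios, the payoff, and the function name `nested_mc` are all hypothetical. Each outer scenario triggers an inner simulation with roughly the square root of the outer sample count, and a nonlinear function is applied to the inner estimate.

```python
import math
import random


def nested_mc(n_outer, seed=0):
    """Toy nested Monte Carlo estimate of E[max(E[(X + Y)^2 | X] - 2, 0)]."""
    rng = random.Random(seed)
    n_inner = max(1, math.isqrt(n_outer))  # square-root rule for the inner layer
    total = 0.0
    for _ in range(n_outer):
        x = rng.gauss(0.0, 1.0)  # outer scenario
        # Inner simulation: estimate E[(x + Y)^2 | x] with Y ~ N(0, 1).
        inner = sum((x + rng.gauss(0.0, 1.0)) ** 2 for _ in range(n_inner)) / n_inner
        # Nonlinear outer payoff applied to the inner estimate.
        total += max(inner - 2.0, 0.0)
    return total / n_outer, n_inner


estimate, n_inner = nested_mc(4000)
```

In this toy case the inner expectation is x^2 + 1, so the exact value is 2φ(1) ≈ 0.484, where φ is the standard normal density. The outer iterations are independent of one another, which is the property a GPU implementation would exploit.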


2014 ◽  
Vol 1077 ◽  
pp. 118-123 ◽  
Author(s):  
Lubomír Klimeš ◽  
Pavel Charvát ◽  
Milan Ostrý ◽  
Josef Stetina

Phase change materials have a wide range of applications including thermal energy storage in building structures, solar air collectors, heat storage units and exchangers. Such applications often utilize a commercially produced phase change material enclosed in a thin panel (container) made of aluminum. A parallel 1D heat transfer model of a container with phase change material was developed by means of the control volume and effective heat capacity methods. The parallel implementation in the CUDA computing architecture allows the model to run on graphics processing units, which makes it very fast in comparison to traditional models computed on a single CPU. The paper presents the model implementation and the results of benchmarking the computational model on high-end and low-end NVIDIA GPUs.
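
The effective heat capacity method mentioned above can be sketched in a few lines. This is a minimal serial illustration under assumed material properties, not the model from the paper: the Gaussian shape of the latent-heat peak, all numerical values, the boundary conditions, and the names `effective_heat_capacity` and `step` are hypothetical. Latent heat is folded into a temperature-dependent heat capacity, and each control volume is updated explicitly from the previous time level only.

```python
import math


def effective_heat_capacity(temp_c, c_base=2000.0, latent=200e3,
                            melt_temp=25.0, width=1.0):
    """c_eff(T) in J/(kg K): base heat capacity plus a Gaussian latent-heat peak."""
    peak = math.exp(-0.5 * ((temp_c - melt_temp) / width) ** 2)
    return c_base + latent * peak / (width * math.sqrt(2.0 * math.pi))


def step(temps, dt, dx, k=0.2, rho=800.0, t_left=40.0):
    """One explicit control-volume update; each cell depends only on old values."""
    n = len(temps)
    new = [0.0] * n
    for i in range(n):
        left = temps[i - 1] if i > 0 else t_left         # fixed-temperature side
        right = temps[i + 1] if i < n - 1 else temps[i]  # insulated side
        r = k * dt / (rho * effective_heat_capacity(temps[i]) * dx * dx)
        new[i] = temps[i] + r * (left - 2.0 * temps[i] + right)
    return new


temps = [20.0] * 20  # initial temperature in degrees Celsius
for _ in range(500):
    temps = step(temps, dt=1.0, dx=1e-3)
```

Because every cell in `step` reads only the old temperature array, the loop body could be assigned to one GPU thread per control volume. With the assumed values the explicit scheme satisfies the stability condition r <= 0.5.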


Author(s):  
Liam Dunn ◽  
Patrick Clearwater ◽  
Andrew Melatos ◽  
Karl Wette

The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.


2018 ◽  
Vol 9 (2) ◽  
pp. 1
Author(s):  
André Luiz Buarque Vieira-e-Silva ◽  
Caio Brito ◽  
Mozart William Almeida ◽  
Veronica Teichrieb

Meshless methods for simulating fluid flows have evolved steadily over the years, since they are a good alternative for handling large deformations, where mesh-based methods fail to perform efficiently. A well-known meshless method is the Moving Particle Semi-implicit (MPS) method, which was designed to simulate free-surface, truly incompressible fluid flows. Many variations and refinements of the method's accuracy and precision have been proposed over the years, and this paper presents a reasonably wide literature review of them together with their theoretical and mathematical explanations. Thanks to these works, the method has proved very useful in a wide range of naval and mechanical engineering problems. However, one of its drawbacks is a high computational load with some quite time-consuming functions, which prevents it from being used more widely in Computer Graphics and Virtual Reality applications. Graphics Processing Units (GPUs) provide unprecedented capabilities for scientific computations. To enable GPU acceleration, the solution of the Poisson Pressure equation was brought into focus. This work benefits from some of the techniques presented in the related work and also from the CUDA language in order to obtain a stable, accurate and GPU-accelerated MPS-based method, which is this work's main contribution. It is shown that the GPU version of the method runs approximately 6 to 10 times faster than the CPU version with the same reliability, both versions being extended to three dimensions. Lastly, a simulation containing a total of 62,600 particles is fully rendered in 3D.


IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 21152-21163 ◽  
Author(s):  
Rafael Cisneros-Magana ◽  
Aurelio Medina ◽  
Venkata Dinavahi ◽  
Antonio Ramos-Paz

2014 ◽  
Vol 11 (04) ◽  
pp. 1350063 ◽  
Author(s):  
IFTIKHAR AHMED ◽  
RICK SIOW MONG GOH ◽  
ENG HUAT KHOO ◽  
KIM HUAT LEE ◽  
SIAW KIAN ZHONG ◽  
...  

Maxwell's equations incorporating the Lorentz–Drude model are simulated using the three-dimensional finite difference time domain (FDTD) method, and the method is parallelized on multiple graphics processing units (GPUs) for plasmonics applications. The compute unified device architecture (CUDA) is used for GPU parallelization. The Lorentz–Drude (LD) model is used to simulate the dispersive nature of materials in the plasmonics domain, and the auxiliary differential equation (ADE) approach is used to make it consistent with the time-domain Maxwell equations. Different aspects of using multiple GPUs for the FDTD method are presented, such as a comparison of different numbers of GPUs, the transfer time between them, and synchronous and asynchronous passing. It is shown that using multiple GPUs in parallel significantly reduces the simulation time compared with a single GPU.


2014 ◽  
Vol 23 (08) ◽  
pp. 1430002 ◽  
Author(s):  
SPARSH MITTAL

Initially introduced as special-purpose accelerators for graphics applications, graphics processing units (GPUs) have now emerged as general-purpose computing platforms for a wide range of applications. To address the requirements of these applications, modern GPUs include sizable hardware-managed caches. However, several factors, such as the unique architecture of GPUs and the rise of CPU–GPU heterogeneous computing, demand effective management of caches to achieve high performance and energy efficiency. Recently, several techniques have been proposed for this purpose. In this paper, we survey several architectural and system-level techniques proposed for managing and leveraging GPU caches. We also discuss the importance and challenges of cache management in GPUs. The aim of this paper is to give readers insight into cache management techniques for GPUs and to motivate them to propose even better techniques for leveraging the full potential of caches in the GPUs of tomorrow.


2019 ◽  
Author(s):  
Qianqian Fang ◽  
Shijie Yan

The mesh-based Monte Carlo (MMC) algorithm is increasingly used as the gold standard for developing new biophotonics modeling techniques in 3-D complex tissues, including both diffusion-based and various Monte Carlo (MC) based methods. Compared to multi-layered and voxel-based MCs, MMC can utilize tetrahedral meshes to gain improved anatomical accuracy, but this also results in higher computational and memory demands. Previous attempts at accelerating MMC using graphics processing units (GPUs) have yielded limited performance improvement and are not publicly available. Here we report a highly efficient MMC, named MMCL, using the OpenCL heterogeneous computing framework, and demonstrate a speedup ratio of up to 420× compared to state-of-the-art single-threaded CPU simulations. The MMCL simulator supports almost all advanced features found in our widely disseminated MMC software, such as support for a dozen complex source forms, wide-field detectors, boundary reflection, photon replay and storing a rich set of detected photon information. Furthermore, this tool supports a wide range of GPUs/CPUs across vendors and is freely available with full source code and benchmark suites at http://mcx.space/#mmc.

