Passive Radar Parallel Processing Using General-Purpose Computing on Graphics Processing Units

2015
Vol 61 (4)
pp. 357-363
Author(s):
Karolina Szczepankiewicz
Mateusz Malanowski
Michał Szczepankiewicz

Abstract In this paper, an implementation of the signal processing chain for a passive radar is presented. The passive radar, which was developed at the Warsaw University of Technology, uses FM radio and DVB-T television transmitters as “illuminators of opportunity”. As the computational load associated with passive radar processing is very high, NVIDIA CUDA technology has been employed for an efficient parallel implementation. The paper describes the implementation of the algorithms and analyzes the performance results.
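The abstract does not spell out the individual stages, but the dominant computational cost in FM and DVB-T passive radar is typically the cross-ambiguity function evaluated between the reference and surveillance channels. A minimal CUDA sketch of that correlation stage, assuming complex baseband samples already reside on the device (the kernel name, array layout, and parameters are illustrative, not taken from the paper):

```cuda
#include <cuComplex.h>

// One thread per (delay, Doppler) cell of the cross-ambiguity surface.
// ref, surv: complex baseband samples of the reference and surveillance
// channels (length n); caf: output array of size numDopplers * numDelays.
__global__ void crossAmbiguity(const cuFloatComplex* ref,
                               const cuFloatComplex* surv,
                               cuFloatComplex* caf,
                               int n, int numDelays, int numDopplers)
{
    int delay   = blockIdx.x * blockDim.x + threadIdx.x;
    int doppler = blockIdx.y * blockDim.y + threadIdx.y;
    if (delay >= numDelays || doppler >= numDopplers) return;

    const float pi = 3.14159265f;
    // Normalized Doppler frequency, with bins centred on zero.
    float fd = (doppler - numDopplers / 2) / (float)n;

    cuFloatComplex acc = make_cuFloatComplex(0.0f, 0.0f);
    for (int k = delay; k < n; ++k) {
        // surv[k] * conj(ref[k - delay]) * exp(-j * 2*pi * fd * k)
        float phase = -2.0f * pi * fd * (float)k;
        cuFloatComplex rot = make_cuFloatComplex(cosf(phase), sinf(phase));
        cuFloatComplex x   = cuCmulf(surv[k], cuConjf(ref[k - delay]));
        acc = cuCaddf(acc, cuCmulf(x, rot));
    }
    caf[doppler * numDelays + delay] = acc;
}
```

In practice the delay-Doppler surface is usually computed with batched FFTs (e.g. via cuFFT) rather than this brute-force loop, but mapping one output cell to one thread shows why the workload parallelizes so well on a GPU.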

2013
Vol 2013
pp. 1-15
Author(s):
Carlos Couder-Castañeda
Carlos Ortiz-Alemán
Mauricio Gabriel Orozco-del-Castillo
Mauricio Nava-Flores

An implementation with CUDA technology on a single and on several graphics processing units (GPUs) is presented for the forward modeling of gravitational fields from a three-dimensional volumetric ensemble composed of unit prisms of constant density. We compared the performance obtained with the GPUs against a previous version coded in OpenMP with MPI, and we analyzed the results on both platforms. Today, the use of GPUs represents a breakthrough in parallel computing, which has led to the development of applications in many fields. Nevertheless, in some applications the decomposition of the tasks is not trivial, as can be appreciated in this paper. Instead of a trivial decomposition of the domain, we proposed to decompose the problem by sets of prisms and to use a separate memory space per CUDA processing core, avoiding the performance decay that would result from the repeated kernel calls needed in a parallelization by observation points. The design and the implementation are the main contributions of this work, because the parallelization scheme is not trivial. The performance obtained is comparable to that of a small processing cluster.
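The key design point is the decomposition by sets of prisms rather than by observation points. A minimal CUDA sketch of that idea, with one thread per prism accumulating its contribution into every station (the Prism struct, the point-mass approximation of the prism response, and all names are assumptions made for illustration; the paper uses the full analytic prism formula and a more refined memory layout):

```cuda
#include <cuda_runtime.h>

// Hypothetical layout: each prism is described by its centre coordinates,
// volume, and density; the vertical gravity gz is accumulated per station.
struct Prism { float x, y, z, volume, density; };

__global__ void forwardGravity(const Prism* prisms, int numPrisms,
                               const float3* stations, int numStations,
                               float* gz)   // output, one value per station
{
    const float G = 6.674e-11f;             // gravitational constant (SI)
    int p = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per prism
    if (p >= numPrisms) return;

    Prism pr = prisms[p];
    float mass = pr.density * pr.volume;

    // Each thread adds its prism's contribution to every observation point.
    for (int s = 0; s < numStations; ++s) {
        float dx = stations[s].x - pr.x;
        float dy = stations[s].y - pr.y;
        float dz = stations[s].z - pr.z;
        float r2 = dx * dx + dy * dy + dz * dz;
        float r  = sqrtf(r2);
        // Point-mass approximation of the prism (placeholder for the full
        // analytic prism formula used in real forward-modeling codes).
        atomicAdd(&gz[s], G * mass * dz / (r2 * r));
    }
}
```

With this layout the kernel is launched once over all prisms, rather than once per observation point, which is the repeated-kernel-call pattern the authors set out to avoid; the atomicAdd stands in for whatever per-thread accumulation scheme the actual implementation uses.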


Author(s):  
Dmytro Konobrytskyi
Thomas Kurfess
Joshua Tarbutton
Tommy Tucker

GPUs (Graphics Processing Units), traditionally used for 3D graphics calculations, have recently gained the ability to perform general-purpose calculations through GPGPU (General Purpose GPU) technology. Moreover, GPUs can be much faster than CPUs (Central Processing Units) by executing hundreds or even thousands of commands concurrently. This parallel processing allows a GPU to achieve extremely high performance, but it also requires highly parallel algorithms that can supply enough work on each clock cycle. This work formulates a methodology for selecting a geometry representation and a data structure suitable for parallel processing on a GPU. The methodology is then used to design a 3-axis CNC milling simulation algorithm accelerated with GPGPU technology. The developed algorithm is validated by performing an experimental machining simulation and evaluating the performance results. The experimental simulation shows the importance of the optimization process and of using algorithms that provide enough work to the GPU. The test configuration also demonstrates almost an order of magnitude difference between CPU and GPU performance.
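The abstract does not state which geometry representation was ultimately selected, but a common GPU-friendly choice for 3-axis milling is a height field, where each thread owns one (x, y) cell and clips it against the tool at each step of the toolpath. A hedged CUDA sketch of that approach (grid spacing, ball-end tool model, and all names are assumptions for illustration):

```cuda
#include <cuda_runtime.h>

// Material is stored as a height field: one z value per (x, y) grid cell.
// Each thread owns one cell and clips it against a ball-end tool whose tip
// sits at (toolX, toolY, toolZ). The representation is an assumption; the
// paper's chosen geometry representation may differ.
__global__ void removeMaterial(float* height, int nx, int ny, float cellSize,
                               float toolX, float toolY, float toolZ,
                               float toolRadius)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= nx || iy >= ny) return;

    float x  = ix * cellSize;
    float y  = iy * cellSize;
    float dx = x - toolX;
    float dy = y - toolY;
    float d2 = dx * dx + dy * dy;
    float r2 = toolRadius * toolRadius;
    if (d2 > r2) return;                    // cell outside the tool footprint

    // Height of the spherical tool tip surface above this cell.
    float cut = toolZ + toolRadius - sqrtf(r2 - d2);
    int idx = iy * nx + ix;
    height[idx] = fminf(height[idx], cut);  // remove material above the tool
}
```

Launching one such kernel per toolpath step, or batching many steps per launch, illustrates the "enough work to the GPU" consideration the abstract highlights.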


2011
Vol 28 (1)
pp. 1-14
Author(s):
W. van Straten
M. Bailes

Abstract dspsr is a high-performance, open-source, object-oriented, digital signal processing software library and application suite for use in radio pulsar astronomy. Written primarily in C++, the library implements an extensive range of modular algorithms that can optionally exploit both multiple-core processors and general-purpose graphics processing units. After over a decade of research and development, dspsr is now stable and in widespread use in the community. This paper presents a detailed description of its functionality, justification of major design decisions, analysis of phase-coherent dispersion removal algorithms, and demonstration of performance on some contemporary microprocessor architectures.
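The paper itself analyses dspsr's phase-coherent dispersion removal; as a rough illustration only (not dspsr's actual code or API), the textbook operation multiplies the Fourier transform of the baseband voltages by the inverse dispersion chirp:

```cuda
#include <cuComplex.h>

// Multiply a baseband voltage spectrum by the inverse dispersion chirp,
// exp(+i * 2*pi * D * DM * f^2 / (f0^2 * (f0 + f))), where f is the offset
// from the centre frequency f0 (Hz), DM is the dispersion measure, and
// D ~= 4.148808e15 Hz^2 pc^-1 cm^3 s. Sketch of the textbook operation only;
// sign and frequency conventions depend on the backend.
__global__ void dedisperseChirp(cuFloatComplex* spectrum, int nchan,
                                double f0, double bandwidth, double dm)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nchan) return;

    const double pi = 3.141592653589793;
    const double D  = 4.148808e15;                  // dispersion constant
    // Frequency offset of this bin from the band centre, in Hz.
    double f = (i - nchan / 2) * (bandwidth / nchan);
    double phase = 2.0 * pi * D * dm * f * f / (f0 * f0 * (f0 + f));

    cuFloatComplex chirp = make_cuFloatComplex((float)cos(phase),
                                               (float)sin(phase));
    spectrum[i] = cuCmulf(spectrum[i], chirp);
}
```

The surrounding forward and inverse FFTs, channelization, and overlap handling are omitted here; dspsr's library wraps all of those stages behind its own interfaces.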


Processes
2020
Vol 8 (9)
pp. 1199
Author(s):
Ravie Chandren Muniyandi
Ali Maroosi

Long-timescale simulations of biological processes such as photosynthesis, or attempts to solve NP-hard problems such as the traveling salesman, knapsack, Hamiltonian path, and satisfiability problems using membrane systems, can take hours or days without appropriate parallelization. Graphics processing units (GPUs) provide a massively parallel mechanism for general-purpose computation. Previous studies mapped one membrane to one thread block on the GPU. This is disadvantageous because when the number of objects per membrane is small, the number of active threads is also small, which decreases performance. Moreover, when each membrane is assigned to one thread block, communication between membranes must be carried out as communication between thread blocks, which is time-consuming. Previous approaches have also not addressed the issue of GPU occupancy. This study presents a classification algorithm that groups dependent objects and membranes based on the communication rate in the defined weighted network and assigns them to sub-matrices. Dependent objects and membranes are thus allocated to the same threads and thread blocks, decreasing communication between threads and thread blocks and allowing the GPU to maintain the highest possible occupancy. The experimental results indicate that for 48 objects per membrane, the algorithm achieves a 93-fold increase in processing speed, compared to a 1.6-fold increase with previous algorithms.
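A minimal CUDA sketch of the grouping idea, in which several small, strongly communicating membranes share one thread block so that object exchange goes through shared memory instead of inter-block (global-memory) communication. The flat object array, the fixed group size, and the toy evolution rule are assumptions for illustration, not the paper's membrane-system model:

```cuda
#include <cuda_runtime.h>

#define OBJS_PER_MEMBRANE   48
#define MEMBRANES_PER_BLOCK 4

// Launch with blockDim.x = MEMBRANES_PER_BLOCK * OBJS_PER_MEMBRANE (192)
// and gridDim.x = (numMembranes + MEMBRANES_PER_BLOCK - 1) / MEMBRANES_PER_BLOCK.
__global__ void evolveGroupedMembranes(int* objects, int numMembranes)
{
    __shared__ int local[MEMBRANES_PER_BLOCK * OBJS_PER_MEMBRANE];

    int membraneInBlock = threadIdx.x / OBJS_PER_MEMBRANE;
    int obj             = threadIdx.x % OBJS_PER_MEMBRANE;
    int membrane        = blockIdx.x * MEMBRANES_PER_BLOCK + membraneInBlock;
    bool active         = (membrane < numMembranes);

    int gIdx = membrane * OBJS_PER_MEMBRANE + obj;   // global object index
    int lIdx = threadIdx.x;                          // index in shared memory

    local[lIdx] = active ? objects[gIdx] : 0;        // stage objects on-chip
    __syncthreads();

    // Toy evolution rule: each object reads its counterpart in the next
    // membrane of the same group; no inter-block communication is required.
    int partner = ((membraneInBlock + 1) % MEMBRANES_PER_BLOCK)
                  * OBJS_PER_MEMBRANE + obj;
    int updated = local[lIdx] + local[partner];

    if (active) objects[gIdx] = updated;
}
```

With 48 objects per membrane and four membranes per block, each block runs 192 threads; packing membranes this way is what keeps occupancy high when individual membranes are small.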


2019
Vol 97
pp. 836-848
Author(s):
Lan Gao
Yunlong Xu
Rui Wang
Hailong Yang
Zhongzhi Luan
...  

Author(s):  
Masaki Iwasawa
Daisuke Namekata
Keigo Nitadori
Kentaro Nomura
Long Wang
...  

Abstract We describe algorithms implemented in FDPS (Framework for Developing Particle Simulators) to make efficient use of accelerator hardware such as GPGPUs (general-purpose graphics processing units). We have developed FDPS to make it possible for researchers to develop their own high-performance parallel particle-based simulation programs without spending large amounts of time on parallelization and performance tuning. FDPS provides a high-performance implementation of parallel algorithms for particle-based simulations in a “generic” form, so that researchers can define their own particle data structure and interparticle interaction functions. FDPS compiled with user-supplied data types and interaction functions provides all the necessary functions for parallelization, and researchers can thus write their programs as though they were writing simple non-parallel code. It has previously been possible to use accelerators with FDPS by writing an interaction function that uses the accelerator. However, the efficiency was limited by the latency and bandwidth of communication between the CPU and the accelerator, and also by the mismatch between the available degree of parallelism of the interaction function and that of the hardware. We have modified the interface of the user-provided interaction functions so that accelerators are used more efficiently. We have also implemented new techniques that reduce the amount of work on the CPU side and the amount of communication between the CPU and the accelerators. We have measured the performance of N-body simulations on a system with an NVIDIA Volta GPGPU using FDPS, and the achieved performance is around 27% of the theoretical peak. We have constructed a detailed performance model and found that the current implementation can achieve good performance on systems with much smaller memory and communication bandwidth. Thus, our implementation will be applicable to future generations of accelerator systems.
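FDPS's actual accelerator interface is precisely what the paper modifies, so the following is only a generic sketch of the kind of user-supplied interaction function that gets offloaded: a direct-summation gravity kernel acting on an interaction list. The particle layout, the softening parameter, and all names are assumptions for illustration, not FDPS's API:

```cuda
#include <cuda_runtime.h>

// Direct-summation gravity: each "i" particle accumulates the acceleration
// contributed by all "j" particles in its interaction list. eps2 is the
// squared gravitational softening length.
struct Particle { float x, y, z, mass; };

__global__ void gravityKernel(const Particle* pi, int ni,   // target particles
                              const Particle* pj, int nj,   // source particles
                              float3* accel, float eps2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= ni) return;

    float ax = 0.0f, ay = 0.0f, az = 0.0f;
    for (int j = 0; j < nj; ++j) {
        float dx = pj[j].x - pi[i].x;
        float dy = pj[j].y - pi[i].y;
        float dz = pj[j].z - pi[i].z;
        float r2 = dx * dx + dy * dy + dz * dz + eps2;  // softened distance^2
        float rinv = rsqrtf(r2);
        float f = pj[j].mass * rinv * rinv * rinv;      // m / r^3
        ax += f * dx;  ay += f * dy;  az += f * dz;
    }
    accel[i] = make_float3(ax, ay, az);
}
```

Keeping particle data resident on the device and batching many interaction lists per kernel launch are the kinds of measures that address the CPU-accelerator latency, bandwidth, and parallelism-mismatch limits described in the abstract.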

