A New GPU Implementation of Support Vector Machines for Fast Hyperspectral Image Classification

The storage and processing of remotely sensed hyperspectral images (HSIs) is facing important challenges due to the computational requirements involved in the analysis of these images, characterized by continuous and narrow spectral channels. Although HSIs offer many opportunities for accurately modeling and mapping the surface of the Earth in a wide range of applications, they comprise massive data cubes. These huge amounts of data impose important requirements from the storage and processing points of view. The support vector machine (SVM) has been one of the most powerful machine learning classifiers, able to process HSI data without applying previous feature extraction steps, exhibiting a robust behaviour with high dimensional data and obtaining high classification accuracies. Nevertheless, the training and prediction stages of this supervised classifier are very time-consuming, especially for large and complex problems that require an intensive use of memory and computational resources. This paper develops a new, highly efficient implementation of SVMs that exploits the high computational power of graphics processing units (GPUs) to reduce the execution time by massively parallelizing the operations of the algorithm while performing efficient memory management during data-reading and writing instructions. Our experiments, conducted over different HSI benchmarks, demonstrate the efficiency of our GPU implementation.

Download Full-text

GPU-Powered Coherent Beamforming

Journal of Astronomical Instrumentation ◽

10.1142/s2251171715500026 ◽

2015 ◽

Vol 04 (01n02) ◽

pp. 1550002

Author(s):

A. Magro ◽

K. Zarb Adami ◽

J. Hickish

Keyword(s):

Radio Astronomy ◽

Time Domain ◽

Graphics Processing Units ◽

Transient Detection ◽

Transfer Data ◽

Wide Range ◽

Performance Benchmarks ◽

Unexplored Area ◽

Graphics Processing ◽

Gpu Implementation

Graphics processing units (GPU)-based beamforming is a relatively unexplored area in radio astronomy, possibly due to the assumption that any such system will be severely limited by the PCIe bandwidth required to transfer data to the GPU. We have developed a CUDA-based GPU implementation of a coherent beamformer, specifically designed and optimized for deployment at the BEST-2 array which can generate an arbitrary number of synthesized beams for a wide range of parameters. It achieves [Formula: see text] TFLOPs on an NVIDIA Tesla K20, approximately 10x faster than an optimized, multithreaded CPU implementation. This kernel has been integrated into two real-time, GPU-based time-domain software pipelines deployed at the BEST-2 array in Medicina: a standalone beamforming pipeline and a transient detection pipeline. We present performance benchmarks for the beamforming kernel as well as the transient detection pipeline with beamforming capabilities as well as results of test observation.

Download Full-text

XVA PRINCIPLES, NESTED MONTE CARLO STRATEGIES, AND GPU OPTIMIZATIONS

International Journal of Theoretical and Applied Finance ◽

10.1142/s0219024918500309 ◽

2018 ◽

Vol 21 (06) ◽

pp. 1850030 ◽

Cited By ~ 3

Author(s):

LOKMAN A. ABBAS-TURKI ◽

STÉPHANE CRÉPEY ◽

BABACAR DIALLO

Keyword(s):

Monte Carlo ◽

Interest Rate ◽

Outer Layer ◽

Graphics Processing Units ◽

Lower Layer ◽

Credit Derivatives ◽

Square Root ◽

Root Number ◽

Graphics Processing ◽

Gpu Implementation

We present a nested Monte Carlo (NMC) approach implemented on graphics processing units (GPUs) to X-valuation adjustments (XVAs), where X ranges over C for credit, F for funding, M for margin, and K for capital. The overall XVA suite involves five compound layers of dependence. Higher layers are launched first, and trigger nested simulations on-the-fly whenever required in order to compute an item from a lower layer. If the user is only interested in some of the XVA components, then only the sub-tree corresponding to the most outer XVA needs be processed computationally. Inner layers only need a square root number of simulation with respect to the most outer layer. Some of the layers exhibit a smaller variance. As a result, with GPUs at least, error-controlled NMC XVA computations are doable. But, although NMC is naively suited to parallelization, a GPU implementation of NMC XVA computations requires various optimizations. This is illustrated on XVA computations involving equities, interest rate, and credit derivatives, for both bilateral and central clearing XVA metrics.

Download Full-text

Parallel Heat Transfer Model of a Panel with Phase Change Material for Thermal Storage Applications Computed on Graphics Processing Units

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.1077.118 ◽

2014 ◽

Vol 1077 ◽

pp. 118-123 ◽

Cited By ~ 1

Author(s):

Lubomír Klimeš ◽

Pavel Charvát ◽

Milan Ostrý ◽

Josef Stetina

Keyword(s):

Heat Transfer ◽

Phase Change ◽

Phase Change Material ◽

Graphics Processing Units ◽

Parallel Implementation ◽

Heat Transfer Model ◽

Transfer Model ◽

Wide Range ◽

Graphics Processing ◽

Change Material

Phase change materials have a wide range of application including thermal energy storage in building structures, solar air collectors, heat storage units and exchangers. Such applications often utilize a commercially produced phase change material enclosed in a thin panel (container) made of aluminum. A parallel 1D heat transfer model of a container with phase change material was developed by means of the control volume and effective heat capacity methods. The parallel implementation in the CUDA computing architecture allows the model for running on graphics processing units which makes the model very fast in comparison to traditional models computed on a single CPU. The paper presents the model implementation and results of computational model benchmarking carried out with the use of high-level and low-level GPUs NVIDIA.

Download Full-text

A GPU based multidimensional amplitude analysis to search for tetraquark candidates

10.21203/rs.3.rs-51185/v3 ◽

2020 ◽

Author(s):

Nairit Sur ◽

Leonardo Cristella ◽

Adriano Di Florio ◽

Vincenzo Mastrapasqua

Keyword(s):

Graphics Processing Units ◽

High Energy Physics ◽

High Energy ◽

Amplitude Analysis ◽

Hadron Spectroscopy ◽

Multiple Cores ◽

Analysis Strategies ◽

Computationally Intensive ◽

Computational Resources ◽

Graphics Processing

Abstract The demand for computational resources is steadily increasing in experimental high energy physics as the current collider experiments continue to accumulate huge amounts of data and physicists indulge in more complex and ambitious analysis strategies. This is especially true in the fields of hadron spectroscopy and flavour physics where the analyses often depend on complex multidimensional unbinned maximum-likelihood fits, with several dozens of free parameters, with an aim to study the internal structure of hadrons. Graphics processing units (GPUs) represent one of the most sophisticated and versatile parallel computing architectures that are becoming popular toolkits for high energy physicists to meet their computational demands. GooFit is an upcoming open-source tool interfacing ROOT/RooFit to the CUDA platform on NVIDIA GPUs that acts as a bridge between the MINUIT minimization algorithm and a parallel processor, allowing probability density functions to be estimated on multiple cores simultaneously. In this article, a full-fledged amplitude analysis framework developed using GooFit is tested for its speed and reliability. The four-dimensional fitter framework, one of the firsts of its kind to be built on GooFit, is geared towards the search for exotic tetraquark states in the [[EQUATION]] decays and can also be seamlessly adapted for other similar analyses. The GooFit fitter, running on GPUs, shows a remarkable improvement in the computing speed compared to a ROOT/RooFit implementation of the same analysis running on multi-core CPU clusters. Furthermore, it shows sensitivity to components with small contributions to the overall fit. It has the potential to be a powerful tool for sensitive and computationally intensive physics analyses.

Download Full-text

Graphics processing unit implementation of the F-statistic for continuous gravitational wave searches

Classical and Quantum Gravity ◽

10.1088/1361-6382/ac4616 ◽

2021 ◽

Author(s):

Liam Dunn ◽

Patrick Clearwater ◽

Andrew Melatos ◽

Karl Wette

Keyword(s):

Gravitational Wave ◽

Graphics Processing Units ◽

Graphics Processing Unit ◽

Computational Cost ◽

Processing Unit ◽

Central Processing ◽

Long Baseline ◽

Using Data ◽

Graphics Processing ◽

Gpu Implementation

Abstract The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17:2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.

Download Full-text

A GPU based multidimensional amplitude analysis to search for tetraquark candidates

10.21203/rs.3.rs-51185/v2 ◽

2020 ◽

Author(s):

Nairit Sur ◽

Leonardo Cristella ◽

Adriano Di Florio ◽

Vincenzo Mastrapasqua

Keyword(s):

Graphics Processing Units ◽

High Energy Physics ◽

High Energy ◽

Amplitude Analysis ◽

Hadron Spectroscopy ◽

Multiple Cores ◽

Analysis Strategies ◽

Computationally Intensive ◽

Computational Resources ◽

Graphics Processing

Abstract The demand for computational resources is steadily increasing in experimental high energy physics as the current collider experiments continue to accumulate huge amounts of data and physicists indulge in more complex and ambitious analysis strategies. This is especially true in the fields of hadron spectroscopy and flavour physics where the analyses often depend on complex multidimensional unbinned maximum-likelihood fits, with several dozens of free parameters, with the aim to study the internal structure of hadrons. Graphics processing units (GPUs) represent one of the most sophisticated and versatile parallel computing architectures that are becoming popular toolkits for high energy physicists to meet their computational demands. GooFit is an upcoming open-source tool interfacing ROOT/RooFit to the CUDA platform on NVIDIA GPUs that acts as a bridge between the MINUIT minimization algorithm and a parallel processor, allowing probability density functions to be estimated on multiple cores simultaneously. In this article, a full-fledged amplitude analysis framework developed using GooFit is tested for its speed and reliability. The four-dimensional fitter framework, one of the firsts of its kind to be built on GooFit, is geared towards the search for exotic tetraquark states in the [[EQUATION]] decays and can also be seamlessly adapted for other similar analyses. The GooFit fitter, running on GPUs, shows a remarkable improvement in the computing speed compared to a ROOT/RooFit implementation of the same analysis running on multi-core CPU clusters. Furthermore, it shows sensitivity to components with small contributions to the overall fit. It has the potential to be a powerful tool for sensitive and computationally intensive physics analyses.

Download Full-text

Improved MPS method and its variations for simulating incompressible fluids on GPU

Journal of Interactive Systems ◽

10.5753/jis.2018.701 ◽

2018 ◽

Vol 9 (2) ◽

pp. 1

Author(s):

André Luiz Buarque Vieira-e-Silva ◽

Caio Brito ◽

Mozart William Almeida ◽

Veronica Teichrieb

Keyword(s):

Graphics Processing Units ◽

Fluid Flows ◽

Three Dimensions ◽

Mps Method ◽

Mathematical Explanations ◽

Accuracy And Precision ◽

Wide Range ◽

Moving Particle ◽

Incompressible Fluid Flows ◽

Graphics Processing

Meshless methods to simulate fluid flows have been increasingly evolving through the years since they are a great alternative to deal with large deformations, which is where meshbased methods fail to perform efficiently. A well known meshless method is the Moving Particle Semi-implicit (MPS) method, which was designed to simulate free-surface truly incompressible fluid flows. Many variations and refinements of the method’s accuracy and precision have been proposed through the years and, in this paper, a reasonably wide literature review was performed together with their theoretical and mathematical explanations. Due to these works, it has proved to be very useful in a wide range of naval and mechanical engineering problems. However, one of its drawbacks is a high computational load and some quite time-consuming functions, which prevents it to be more used in Computer Graphics and Virtual Reality applications. Graphics Processing Units (GPU) provide unprecedented capabilities for scientific computations. To promote the GPU-acceleration, the solution of the Poisson Pressure equation was brought into focus. This work benefits from some of the techniques presented in the related work and also from the CUDA language in order to get a stable, accurate and GPU-accelerated MPS-based method, which is this work’s main contribution. It is shown that the GPU version of the method developed can perform from, approximately, 6 to 10 times faster with the same reliability as the CPU version, both extended to three dimensions. Lastly, a simulation containing a total of 62,600 particles is fully rendered in 3D.

Download Full-text

Development of a Network Infrastructure for Heterogeneous Robot and Control Systems Interactions

Volume 4B: Dynamics, Vibration, and Control ◽

10.1115/imece2015-52464 ◽

2015 ◽

Cited By ~ 1

Author(s):

Christopher J. Reid ◽

Biswanath Samanta ◽

Christopher Kadlec

Keyword(s):

Graphics Processing Units ◽

Virtual Machines ◽

Control Software ◽

Network Infrastructure ◽

Complex Tasks ◽

Heterogeneous Robots ◽

Robust Network ◽

Computational Resources ◽

Graphics Processing ◽

And Control

The use of robots in complex tasks such as search and rescue operations is becoming more and more common. These robots often work independently with no cooperation with other robots or control software, and are very limited in their ability to perform dynamic tasks and interact with both humans and other robots. To this end, a system must be developed to facilitate the cooperation of heterogeneous robots to complete complex tasks. To model and study human-robot and robot-robot interactions in a multi-system environment, a robust network infrastructure must be implemented to support the broad nature of these studies. The work presented here details the creation of a cloud-based infrastructure designed to support the introduction and implementation of multiple heterogeneous robots to the environment utilizing the Robot Operating System (ROS). Implemented robots include both ground-based (e.g. Turtlebot) and air-based (e.g Parrot ARDrone2.0) systems. Additional hardware is also implemented, such as embedded vision systems, host computers to support virtual machines for software implementation, and machines with graphics processing units (GPUs) for additional computational resources. Control software for the robots is implemented in the system with complexities ranging from simple teleoperation to skeletal tracking and neural network simulators. A robust integration of multiple heterogeneous components, including both hardware and software, is achieved.

Download Full-text

Massively Threaded Digital Forensics Tools

Digital Rights Management ◽

10.4018/978-1-4666-2136-7.ch023 ◽

2013 ◽

pp. 488-509

Author(s):

Lodovico Marziale ◽

Santhi Movva ◽

Golden G. Richard ◽

Vassil Roussev ◽

Loren Schwiebert

Keyword(s):

Intellectual Property ◽

Network Security ◽

Graphics Processing Units ◽

Digital Forensics ◽

Turnaround Time ◽

Digital Evidence ◽

Counter Terrorism ◽

On Line ◽

Computational Resources ◽

Graphics Processing

Digital forensics comprises the set of techniques to recover, preserve, and examine digital evidence, and has applications in a number of important areas, including investigation of child exploitation, identity theft, counter-terrorism, and intellectual property disputes. Digital forensics tools must exhaustively examine and interpret data at a low level, because data of evidentiary value may have been deleted, partially overwritten, obfuscated, or corrupted. While forensics investigation is typically seen as an off-line activity, improving case turnaround time is crucial, because in many cases lives or livelihoods may hang in the balance. Furthermore, if more computational resources can be brought to bear, we believe that preventative network security (which must be performed on-line) and digital forensics can be merged into a common research focus. In this chapter we consider recent hardware trends and argue that multicore CPUs and Graphics Processing Units (GPUs) offer one solution to the problem of maximizing available compute resources.

Download Full-text

A SURVEY OF TECHNIQUES FOR MANAGING AND LEVERAGING CACHES IN GPUs

Journal of Circuits System and Computers ◽

10.1142/s0218126614300025 ◽

2014 ◽

Vol 23 (08) ◽

pp. 1430002 ◽

Cited By ~ 11

Author(s):

SPARSH MITTAL

Keyword(s):

Graphics Processing Units ◽

High Performance ◽

Heterogeneous Computing ◽

General Purpose ◽

System Level ◽

Cache Management ◽

Full Potential ◽

Wide Range ◽

Computing Platforms ◽

Graphics Processing

Initially introduced as special-purpose accelerators for graphics applications, graphics processing units (GPUs) have now emerged as general purpose computing platforms for a wide range of applications. To address the requirements of these applications, modern GPUs include sizable hardware-managed caches. However, several factors, such as unique architecture of GPU, rise of CPU–GPU heterogeneous computing, etc., demand effective management of caches to achieve high performance and energy efficiency. Recently, several techniques have been proposed for this purpose. In this paper, we survey several architectural and system-level techniques proposed for managing and leveraging GPU caches. We also discuss the importance and challenges of cache management in GPUs. The aim of this paper is to provide the readers insights into cache management techniques for GPUs and motivate them to propose even better techniques for leveraging the full potential of caches in the GPUs of tomorrow.

Download Full-text