High Performance Computing Environment using General Purpose Computations on Graphics Processing Unit

This paper reports on a development phase of a high performance computing (HPC) environment that uses general purpose computations on the graphics processing unit (GPGPU). The HPC environment accommodates computational tasks that demand massive parallelism or multi-threaded computation. GPGPU is used for this purpose because such tasks require many computing cores running in parallel. The development phase consists of several stages, followed by testing of the environment's capabilities and performance. Initially, the HPC environment will serve the computational projects of students and members of the Faculty of Information Technology, Universitas Kristen Maranatha. The goal of this paper is to show a design of an HPC environment that is capable of running complex, multi-threaded computations. The test results show that GPGPU numerical computations outperform the CPU at the same level of precision.

Exploring Graphics Processing Unit (GPU) Resource Sharing Efficiency for High Performance Computing

Computers ◽

10.3390/computers2040176 ◽

2013 ◽

Vol 2 (4) ◽

pp. 176-214 ◽

Cited By ~ 5

Author(s):

Teng Li ◽

Vikram Narayana ◽

Tarek El-Ghazawi

Keyword(s):

High Performance Computing ◽

Resource Sharing ◽

High Performance ◽

Graphics Processing Unit ◽

Processing Unit ◽

Graphics Processing ◽

Performance Computing

Graphics Processing Unit-Based High Performance Computing in Radiation Therapy

10.1201/b18968 ◽

2018 ◽

Cited By ~ 1

Keyword(s):

Radiation Therapy ◽

High Performance Computing ◽

High Performance ◽

Graphics Processing Unit ◽

Processing Unit ◽

Graphics Processing ◽

Performance Computing

Splotch

The International Journal of High Performance Computing Applications ◽

10.1177/1094342016652713 ◽

2016 ◽

Vol 31 (6) ◽

pp. 550-563

Author(s):

Timothy Dykes ◽

Claudio Gheller ◽

Marzia Rivi ◽

Mel Krokos

Keyword(s):

High Performance ◽

Large Scale ◽

Graphics Processing Unit ◽

Processing Unit ◽

Xeon Phi ◽

The Many ◽

Many Core ◽

Performance Results ◽

Graphics Processing ◽

Performance Computing

With the increasing size and complexity of data produced by large-scale numerical simulations, it is of primary importance for scientists to be able to exploit all available hardware in heterogeneous high-performance computing environments for increased throughput and efficiency. We focus on the porting and optimization of Splotch, a scalable visualization algorithm, to utilize the Xeon Phi, Intel’s coprocessor based upon the new many integrated core architecture. We discuss the steps taken to offload data to the coprocessor, as well as algorithmic modifications that aid faster processing on the many-core architecture and make use of the uniquely wide vector capabilities of the device, with accompanying performance results using multiple Xeon Phi devices. Finally, we compare performance against results achieved with the Graphics Processing Unit (GPU) based implementation of Splotch.

High-performance computing in water resources hydrodynamics

Journal of Hydroinformatics ◽

10.2166/hydro.2020.163 ◽

2020 ◽

Vol 22 (5) ◽

pp. 1217-1235 ◽

Cited By ~ 3

Author(s):

M. Morales-Hernández ◽

M. B. Sharif ◽

S. Gangrade ◽

T. T. Dullo ◽

S.-C. Kao ◽

...

Keyword(s):

Water Resources ◽

High Performance Computing ◽

Graphics Processing Units ◽

High Performance ◽

Large Scale ◽

Test Case ◽

Processing Unit ◽

Central Processing ◽

Graphics Processing ◽

Performance Computing

This work presents a vision of future water resources hydrodynamics codes that can fully utilize the strengths of modern high-performance computing (HPC). Advances in computing power, formerly driven by the improvement of central processing units, now focus on parallel computing and, in particular, the use of graphics processing units (GPUs). However, this shift to a parallel framework requires refactoring the code to make efficient use of the data and may even require changing the nature of the algorithm that solves the system of equations. These concepts, along with other features such as the precision of the computations, dry regions management, and input/output data, are analyzed in this paper. A 2D multi-GPU flood code applied to a large-scale test case is used to corroborate our statements and ascertain the new challenges for the next-generation parallel water resources codes.

CUDA-ACCELERATED FEATURE SELECTION

Proceedings of the International Conference on Emerging Trends in Engineering & Technology (IConETech-2020) ◽

10.47412/juqg5057 ◽

2020 ◽

Author(s):

Sterling Ramroach ◽

Jonathan Herbert ◽

Ajay Joshi

Keyword(s):

High Performance ◽

Graphics Processing Unit ◽

Pearson Correlation ◽

High Dimensional ◽

Processing Unit ◽

Device Architecture ◽

Importance Ranking ◽

Using Data ◽

Graphics Processing ◽

Performance Computing

Identifying important features from high dimensional data is usually done using one-dimensional filtering techniques. These techniques discard noisy attributes and those that are constant throughout the data. This is a time-consuming task that has scope for acceleration via high performance computing techniques involving the graphics processing unit (GPU). The proposed algorithm involves acceleration via the Compute Unified Device Architecture (CUDA) framework developed by Nvidia. This framework facilitates the seamless scaling of computation on any CUDA-enabled GPU. Thus, the Pearson Correlation Coefficient can be applied in parallel on each feature with respect to the response variable. The ranks obtained for each feature can be used to determine the most relevant features to select. Using data from the UCI Machine Learning Repository, our results show an increase in efficiency for multi-dimensional analysis with a more reliable feature importance ranking. When tested on a high-dimensional dataset of 1000 samples and 10,000 features, we achieved a 1,230-fold speedup using CUDA. This acceleration grows exponentially, as with any embarrassingly parallel task.
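
As a rough illustration of the filtering step this abstract describes, the sketch below scores each feature by the absolute Pearson correlation with the response and ranks the features by that score. It is plain sequential Python, not the authors' CUDA implementation; the function names `pearson` and `rank_features` and the toy data are assumptions made for this example. The point it tries to show is that each feature's score depends only on that feature's column and the response, which is what would let a GPU assign one feature per thread or block.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no linear signal; score it as 0
        # (an assumption of this sketch, so such features rank last).
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, response):
    """Return feature indices ordered from most to least correlated."""
    # Each score is independent of the others, so this loop is the part
    # that could run in parallel, one feature per worker.
    scores = [abs(pearson(col, response)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)


# Hypothetical toy data: 4 samples, 3 features stored column-wise.
response = [1.0, 2.0, 3.0, 4.0]
columns = [
    [5.0, 5.0, 5.0, 5.0],  # constant feature
    [2.0, 4.0, 6.0, 8.0],  # exact linear function of the response
    [1.0, 3.0, 2.0, 4.0],  # partially correlated feature
]
ranking = rank_features(columns, response)
print(ranking)
```

With this toy data the constant feature lands at the end of the ranking, matching the abstract's remark that constant attributes are discarded.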

Implementation and performance of a general purpose graphics processing unit in hyperspectral image analysis

International Journal of Applied Earth Observation and Geoinformation ◽

10.1016/j.jag.2013.08.009 ◽

2014 ◽

Vol 26 ◽

pp. 312-321 ◽

Cited By ~ 1

Author(s):

H.M.A. van der Werff ◽

W.H. Bakker

Keyword(s):

Image Analysis ◽

Hyperspectral Image ◽

Graphics Processing Unit ◽

General Purpose ◽

Processing Unit ◽

Hyperspectral Image Analysis ◽

And Performance ◽

Graphics Processing

GPU Computing with Python: Performance, Energy Efficiency and Usability

Computation ◽

10.3390/computation8010004 ◽

2020 ◽

Vol 8 (1) ◽

pp. 4 ◽

Cited By ~ 1

Author(s):

Håvard H. Holm ◽

André R. Brodtkorb ◽

Martin L. Sætra

Keyword(s):

Energy Efficiency ◽

High Performance ◽

Gpu Computing ◽

Graphics Processing Unit ◽

Processing Unit ◽

Device Architecture ◽

Computational Performance ◽

Graphics Processing ◽

The Impact ◽

Performance Computing

In this work, we examine the performance, energy efficiency, and usability when using Python for developing high-performance computing codes running on the graphics processing unit (GPU). We investigate the portability of performance and energy efficiency between Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL); between GPU generations; and between low-end, mid-range, and high-end GPUs. Our findings show that the impact of using Python is negligible for our applications, and furthermore, CUDA and OpenCL applications tuned to an equivalent level can in many cases obtain the same computational performance. Our experiments show that performance in general varies more between different GPUs than between using CUDA and OpenCL. We also show that tuning for performance is a good way of tuning for energy efficiency, but that specific tuning is needed to obtain optimal energy efficiency.

HYPERDOCK: Improving virtual screening through parallel hyperheuristics

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019847732 ◽

2019 ◽

Vol 34 (1) ◽

pp. 30-41

Author(s):

Baldomero Imbernón ◽

Antonio Llanes ◽

José-Matías Cutillas-Lozano ◽

Domingo Giménez

Keyword(s):

Virtual Screening ◽

High Performance ◽

Graphics Processing Unit ◽

Processing Unit ◽

Central Processing ◽

Pharmacological Targets ◽

Graphics Processing ◽

Computational Systems ◽

Performance Computing ◽

Different Levels

Virtual screening (VS) methods aid clinical research by predicting the interaction of ligands with pharmacological targets. The computational requirements of VS, along with the size of the databases, favor the use of high-performance computing. METADOCK is a tool for the application of metaheuristics to VS in heterogeneous clusters of computers based on central processing units (CPUs) and graphics processing units (GPUs). HYPERDOCK represents a step forward; the search for satisfactory metaheuristics is systematically approached by means of hyperheuristics working on top of the metaheuristic schema of METADOCK. Multiple metaheuristics are explored, so the process is computationally demanding. HYPERDOCK exploits the parallelism of METADOCK and includes parallelism at its own level. The different levels of parallelism can be used to exploit the parallelism offered by computational systems composed of multicore CPUs plus multiple GPUs. The efficient exploitation of these systems enables HYPERDOCK to improve ligand–receptor binding.
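
The two-level idea in this abstract (a parameterized metaheuristic schema, with a hyperheuristic searching over its configurations) can be sketched in a few lines. Everything below is a toy under stated assumptions and is not METADOCK or HYPERDOCK code: `toy_score` stands in for a docking score, the parameter names `population`, `iterations`, and `step` are hypothetical, and the upper level is reduced to exhaustively evaluating a short list of configurations rather than running a real hyperheuristic search.

```python
import random


def toy_score(x):
    """Hypothetical stand-in for a docking score; lower is better."""
    return (x - 3.0) ** 2


def metaheuristic(params, seed=0):
    """Parameterized schema: a population refined by random perturbation."""
    rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
    pop = [rng.uniform(-10.0, 10.0) for _ in range(params["population"])]
    for _ in range(params["iterations"]):
        candidates = [x + rng.gauss(0.0, params["step"]) for x in pop]
        # Keep a perturbed candidate only when it improves on its parent.
        pop = [c if toy_score(c) < toy_score(x) else x
               for x, c in zip(pop, candidates)]
    return min(toy_score(x) for x in pop)


def hyperheuristic(configs):
    """Upper level: evaluate every configuration and keep the best one."""
    # Each evaluation is independent of the others, which is where a second
    # level of parallelism (one configuration per worker) could be applied.
    results = [(metaheuristic(cfg), i) for i, cfg in enumerate(configs)]
    best_score, best_index = min(results)
    return configs[best_index], best_score


# Hypothetical configurations of the metaheuristic schema.
configs = [
    {"population": 4, "iterations": 5, "step": 0.1},
    {"population": 8, "iterations": 20, "step": 1.0},
    {"population": 16, "iterations": 50, "step": 0.5},
]
best_cfg, best_score = hyperheuristic(configs)
print(best_cfg, best_score)
```

The inner loop over the population and the outer loop over configurations correspond loosely to the "different levels of parallelism" mentioned above; in this sketch both run sequentially.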

GPU Computation and Platforms

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Emerging Research Surrounding Power Consumption and Performance Issues in Utility Computing ◽

10.4018/978-1-4666-8853-7.ch007 ◽

2016 ◽

pp. 136-174

Author(s):

K. Bhargavi ◽

Sathish Babu B.

Keyword(s):

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Gpu Computing ◽

Graphics Processing Unit ◽

General Purpose ◽

Processing Unit ◽

Computing Platforms ◽

Computationally Intensive ◽

Graphics Processing

GPUs (graphics processing units) have mainly been used to speed up computation-intensive high performance computing applications. Several tools and technologies are available for running general purpose, computationally intensive applications. This chapter primarily discusses GPU parallelism, applications, and probable challenges, and also highlights some GPU computing platforms, including CUDA, OpenCL (Open Computing Language), OpenMPC (OpenMP extended for CUDA), MPI (Message Passing Interface), OpenACC (Open Accelerators), DirectCompute, and C++ AMP (C++ Accelerated Massive Parallelism). Each of these platforms is discussed briefly along with its advantages and disadvantages.

Real-time Visualisation and Analysis of Tera-scale Datasets

Proceedings of the International Astronomical Union ◽

10.1017/s1743921314012873 ◽

2012 ◽

Vol 10 (H16) ◽

pp. 679-680

Author(s):

Christopher J. Fluke

Keyword(s):

Real Time ◽

High Performance ◽

Graphics Processing Unit ◽

Low Cost ◽

Processing Unit ◽

Computing Environments ◽

Graphics Processing ◽

Interactive Visualisation ◽

Performance Computing ◽

Scale Data

As we move ever closer to the Square Kilometre Array era, support for real-time, interactive visualisation and analysis of tera-scale (and beyond) data cubes will be crucial for on-going knowledge discovery. However, the data-on-the-desktop approach to analysis and visualisation that most astronomers are comfortable with will no longer be feasible: tera-scale data volumes exceed the memory and processing capabilities of standard desktop computing environments. Instead, there will be an increasing need for astronomers to utilise remote high performance computing (HPC) resources. In recent years, the graphics processing unit (GPU) has emerged as a credible, low cost option for HPC. A growing number of supercomputing centres are now investing heavily in GPU technologies to provide O(100) Teraflop/s processing. I describe how a GPU-powered computing cluster allows us to overcome the analysis and visualisation challenges of tera-scale data. With a GPU-based architecture, we have moved the bottleneck from processing-limited to bandwidth-limited, achieving exceptional real-time performance for common visualisation and data analysis tasks.
