GPU Computing with Python: Performance, Energy Efficiency and Usability

Computation, 2020, Vol 8 (1), pp. 4
Author(s): Håvard H. Holm, André R. Brodtkorb, Martin L. Sætra

In this work, we examine the performance, energy efficiency, and usability of Python for developing high-performance computing codes that run on the graphics processing unit (GPU). We investigate the portability of performance and energy efficiency between the Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL); between GPU generations; and between low-end, mid-range, and high-end GPUs. Our findings show that the impact of using Python is negligible for our applications and, furthermore, that CUDA and OpenCL applications tuned to an equivalent level can in many cases attain the same computational performance. Our experiments show that performance generally varies more between different GPUs than between CUDA and OpenCL. We also show that tuning for performance is a good way of tuning for energy efficiency, but that dedicated tuning is needed to obtain optimal energy efficiency.
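Python GPU code of the kind the abstract evaluates is typically written with PyCUDA (or PyOpenCL), where the kernel itself stays in native CUDA C and only the host logic is Python, which is why the Python overhead can be negligible. A minimal sketch, assuming PyCUDA and an illustrative vector-scaling kernel that is not taken from the paper:

```python
import numpy as np
import pycuda.autoinit            # creates a CUDA context on import
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

# The kernel is ordinary CUDA C; only the orchestration below is Python.
mod = SourceModule("""
__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
""")
scale = mod.get_function("scale")

n = 1 << 20
x = np.random.rand(n).astype(np.float32)
x_gpu = cuda.mem_alloc(x.nbytes)
cuda.memcpy_htod(x_gpu, x)

# 1D launch; a block size of 256 is a common starting point for tuning.
scale(x_gpu, np.float32(2.0), np.int32(n),
      block=(256, 1, 1), grid=((n + 255) // 256, 1))

out = np.empty_like(x)
cuda.memcpy_dtoh(out, x_gpu)
assert np.allclose(out, 2.0 * x)
```

The PyOpenCL version differs mainly in host-side boilerplate while the kernel body is nearly identical, which is consistent with the finding that equally tuned CUDA and OpenCL codes perform alike.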

Author(s): Sterling Ramroach, Jonathan Herbert, Ajay Joshi

Identifying important features in high-dimensional data is usually done with one-dimensional filtering techniques that discard noisy attributes and those that are constant throughout the data. This is a time-consuming task with scope for acceleration via high-performance computing techniques involving the graphics processing unit (GPU). The proposed algorithm uses the Compute Unified Device Architecture (CUDA) framework developed by Nvidia, which facilitates the seamless scaling of computation across CUDA-enabled GPUs. The Pearson correlation coefficient can thus be applied in parallel to each feature with respect to the response variable, and the resulting ranks used to determine the most relevant features to select. Using data from the UCI Machine Learning Repository, our results show an increase in efficiency for multi-dimensional analysis with a more reliable feature-importance ranking. When tested on a high-dimensional dataset of 1,000 samples and 10,000 features, we achieved a 1,230× speedup using CUDA. As with any embarrassingly parallel task, the acceleration scales with the available parallelism.
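The per-feature correlation is embarrassingly parallel because each feature's statistic is independent of the rest. A hedged sketch of the technique (not the authors' kernels), using CuPy and a helper name, `pearson_per_feature`, introduced here for illustration:

```python
import cupy as cp

def pearson_per_feature(X, y):
    """X: (n_samples, n_features) on the device; y: (n_samples,). Returns r per feature."""
    Xc = X - X.mean(axis=0)               # centre each feature column
    yc = y - y.mean()
    num = Xc.T @ yc                       # all covariance numerators in one GEMV
    den = cp.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant (zero-variance) features give den == 0 and should be
    # discarded beforehand, as the abstract notes.
    return num / den

n_samples, n_features = 1_000, 10_000     # the sizes used in the abstract's benchmark
X = cp.random.rand(n_samples, n_features, dtype=cp.float32)
y = cp.random.rand(n_samples, dtype=cp.float32)

r = pearson_per_feature(X, y)
top50 = cp.argsort(cp.abs(r))[::-1][:50]  # indices of the 50 strongest features
```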


Author(s): Timothy Dykes, Claudio Gheller, Marzia Rivi, Mel Krokos

With the increasing size and complexity of data produced by large-scale numerical simulations, it is of primary importance for scientists to be able to exploit all available hardware in heterogeneous high-performance computing environments for increased throughput and efficiency. We focus on porting and optimizing Splotch, a scalable visualization algorithm, to the Xeon Phi, Intel's coprocessor based on the Many Integrated Core (MIC) architecture. We discuss the steps taken to offload data to the coprocessor, along with algorithmic modifications that aid faster processing on the many-core architecture and exploit the device's uniquely wide vector capabilities, with accompanying performance results using multiple Xeon Phi coprocessors. Finally, we compare performance against results achieved with the graphics processing unit (GPU)-based implementation of Splotch.


Author(s): Stefan Boodoo, Ajay Joshi

Oil and gas companies keep exploring every possible new method to increase the likelihood of finding a commercial hydrocarbon-bearing prospect. Well logging generates gigabytes of data from various probes and sensors. After processing, a prospective reservoir will indicate areas of oil, gas, water, and reservoir rock. Incorporating high-performance computing (HPC) methodologies allows thousands of potential wells to be assessed for their hydrocarbon-bearing potential. This study presents graphics processing unit (GPU) computing as another method of analyzing probable reserves. Raw well-log data from the Kansas Geological Society (1999-2018) forms the basis of the analysis. Parallel algorithms are developed using Nvidia's Compute Unified Device Architecture (CUDA). The results highlight a 5× speedup using an Nvidia GeForce GT 330M GPU compared to an Intel Core i7 740QM central processing unit (CPU). The processed results display depth-wise areas of shale and rock formations as well as water, oil, and/or gas reserves.
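Per-depth-sample log transforms map one thread to one measurement, which is what makes well-log processing GPU-friendly. The sketch below, a minimal example using Numba's CUDA backend, computes the textbook linear gamma-ray shale index IGR = (GR − GRmin) / (GRmax − GRmin); the formula and all names are illustrative, not the study's actual pipeline:

```python
import numpy as np
from numba import cuda

@cuda.jit
def gamma_ray_index(gr, gr_min, gr_max, out):
    i = cuda.grid(1)                    # one thread per depth sample
    if i < gr.size:
        out[i] = (gr[i] - gr_min) / (gr_max - gr_min)

# Synthetic gamma-ray readings in API units, standing in for real log data.
gr = np.random.uniform(20.0, 150.0, 1_000_000).astype(np.float32)
out = np.empty_like(gr)

threads = 256
blocks = (gr.size + threads - 1) // threads
gamma_ray_index[blocks, threads](gr, np.float32(gr.min()), np.float32(gr.max()), out)
```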


2017, Vol 3, pp. e127
Author(s): Rory Mitchell, Eibe Frank

We present a CUDA-based implementation of a decision tree construction algorithm within the gradient boosting library XGBoost. The tree construction algorithm is executed entirely on the graphics processing unit (GPU) and shows high performance with a variety of datasets and settings, including sparse input matrices. Individual boosting iterations are parallelised, combining two approaches: an interleaved approach is used for shallow trees, switching to a more conventional radix-sort-based approach at larger depths. We show speedups of between 3× and 6× using a Titan X compared to a 4-core i7 CPU, and 1.2× using a Titan X compared to 2× Xeon CPUs (24 cores). We show that it is possible to process the Higgs dataset (10 million instances, 28 features) entirely within GPU memory. The algorithm is made available as a plug-in within the XGBoost library and fully supports all XGBoost features, including classification, regression and ranking tasks.
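The plug-in later shipped with mainline XGBoost, so GPU tree construction can be selected from the ordinary Python API. A minimal sketch with synthetic Higgs-shaped data; the `tree_method="gpu_hist"` spelling follows later XGBoost releases (current versions prefer `device="cuda"` with `tree_method="hist"`) and may differ from the plug-in name used at the time of the paper:

```python
import numpy as np
import xgboost as xgb

# Synthetic stand-in with the Higgs shape (28 features); labels are random here.
X = np.random.rand(100_000, 28).astype(np.float32)
y = (np.random.rand(100_000) > 0.5).astype(np.int32)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "binary:logistic",
    "tree_method": "gpu_hist",   # build the trees entirely on the GPU
    "max_depth": 6,
}
model = xgb.train(params, dtrain, num_boost_round=100)
```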


2017, Vol 14 (1), pp. 789-795
Author(s): V Saveetha, S Sophia

Parallel data clustering aims to use algorithms and methods to extract knowledge from large databases in reasonable time on high-performance architectures. The computational challenge that growing data volumes pose to cluster analysis can be overcome by exploiting the power of these architectures. Recent developments in the parallel power of the graphics processing unit (GPU) enable low-cost, high-performance solutions for general-purpose applications. The Compute Unified Device Architecture (CUDA) programming model provides application programming interface methods to handle data proficiently on the GPU for iterative clustering algorithms like K-Means. Existing GPU-based K-Means algorithms focus heavily on improving speedup and fall short of addressing the high time spent transferring data between the central processing unit (CPU) and the GPU. This paper proposes an efficient K-Means algorithm that lessens the transfer time by introducing a novel approach to checking the convergence of the algorithm and by utilizing pinned memory for direct access. The algorithm outperforms the others by maximizing parallelism and exploiting the memory features. The relative speedups and the validity measure for the proposed algorithm are higher when compared with K-Means on the GPU and K-Means using a flag on the GPU. The proposed approach thus demonstrates that communication overhead in K-Means clustering can be reduced.
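The transfer-avoidance idea generalises: keep both the data and the convergence test on the device so that only a single scalar crosses the PCIe bus per iteration. A generic CuPy sketch of that pattern, not the authors' pinned-memory implementation:

```python
import cupy as cp

def kmeans(X, k, iters=100, tol=1e-4):
    n = X.shape[0]
    centroids = X[cp.random.permutation(n)[:k]].copy()
    for _ in range(iters):
        # Squared distance of every point to every centroid, entirely on the device.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids on the device (empty clusters ignored for brevity).
        new = cp.vstack([X[labels == j].mean(axis=0) for j in range(k)])
        shift = cp.linalg.norm(new - centroids)  # convergence measure stays on the GPU
        centroids = new
        if float(shift) < tol:  # only this one scalar crosses the PCIe bus
            break
    return centroids, labels

X = cp.random.rand(100_000, 8, dtype=cp.float32)
centroids, labels = kmeans(X, k=10)
```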


Author(s): Baldomero Imbernón, Antonio Llanes, José-Matías Cutillas-Lozano, Domingo Giménez

Virtual screening (VS) methods aid clinical research by predicting the interaction of ligands with pharmacological targets. The computational requirements of VS, along with the size of the databases, motivate the use of high-performance computing. METADOCK is a tool for applying metaheuristics to VS on heterogeneous clusters of computers based on the central processing unit (CPU) and the graphics processing unit (GPU). HYPERDOCK represents a step forward: the search for satisfactory metaheuristics is approached systematically by means of hyperheuristics working on top of the metaheuristic schema of METADOCK. Multiple metaheuristics are explored, so the process is computationally demanding. HYPERDOCK exploits the parallelism of METADOCK and adds parallelism at its own level. The different levels of parallelism can be combined to exploit computational systems composed of multicore CPUs and multiple GPUs. Efficient exploitation of these systems enables HYPERDOCK to improve ligand–receptor binding.
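In outline, a hyperheuristic is a top-level search over metaheuristic configurations, each of which can be evaluated independently. A heavily hedged sketch in which every name, parameter, and score is a stand-in, with a process pool standing in for METADOCK's CPU+GPU parallelism:

```python
from concurrent.futures import ProcessPoolExecutor
import itertools
import random

def run_metaheuristic(config):
    """Stand-in for one METADOCK run; returns a pseudo binding energy (lower is better)."""
    random.seed(str(config))                 # deterministic toy score per configuration
    return random.uniform(-12.0, -4.0)

# The configuration space the hyperheuristic explores (all values illustrative).
grid = list(itertools.product(
    ["genetic", "annealing", "scatter"],     # metaheuristic family
    [50, 100, 200],                          # population size
    [0.05, 0.1, 0.2],                        # perturbation rate
))

if __name__ == "__main__":
    # The hyperheuristic level: independent runs evaluated in parallel.
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(run_metaheuristic, grid))
    best_score, best_config = min(zip(scores, grid))
    print("best configuration:", best_config, "score:", best_score)
```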


Author(s): K. Bhargavi, Sathish Babu B.

GPUs (graphics processing units) are mainly used to speed up computation-intensive high-performance computing applications, and several tools and technologies are available for developing such general-purpose, computationally intensive applications. This chapter primarily discusses GPU parallelism, applications, and probable challenges, and also highlights some of the GPU computing platforms, including CUDA, OpenCL (Open Computing Language), OpenMPC (OpenMP extended for CUDA), MPI (Message Passing Interface), OpenACC (Open Accelerator), DirectCompute, and C++ AMP (C++ Accelerated Massive Parallelism). Each of these platforms is discussed briefly along with its advantages and disadvantages.


2012, Vol 10 (H16), pp. 679-680
Author(s): Christopher J. Fluke

As we move ever closer to the Square Kilometre Array era, support for real-time, interactive visualisation and analysis of tera-scale (and beyond) data cubes will be crucial for on-going knowledge discovery. However, the data-on-the-desktop approach to analysis and visualisation that most astronomers are comfortable with will no longer be feasible: tera-scale data volumes exceed the memory and processing capabilities of standard desktop computing environments. Instead, there will be an increasing need for astronomers to utilise remote high-performance computing (HPC) resources. In recent years, the graphics processing unit (GPU) has emerged as a credible, low-cost option for HPC, and a growing number of supercomputing centres are now investing heavily in GPU technologies to provide O(100) Teraflop/s processing. I describe how a GPU-powered computing cluster allows us to overcome the analysis and visualisation challenges of tera-scale data. With a GPU-based architecture, we have moved the bottleneck from processing-limited to bandwidth-limited, achieving exceptional real-time performance for common visualisation and data analysis tasks.


Water, 2020, Vol 12 (5), pp. 1288
Author(s): Yueling Wang, Xiaoliu Yang

To protect ecologies and the environment by preventing floods, analyzing the impact of climate change on water requires a tool capable of considering rainfall-runoff processes at a small scale, for example, 10 m. As has been shown previously, hydrologic models are good at simulating rainfall-runoff processes at a large scale, e.g., over several hundred km², while hydraulic models are more advantageous in applications at smaller scales. To take advantage of both types of models, this paper couples a hydrologic model, the Xinanjiang model (XAJ), with a hydraulic model, the graphics processing unit (GPU)-accelerated high-performance integrated hydraulic modelling system (HiPIMS). The study was carried out in the Misai basin (797 km²), located in Zhejiang Province, China. The coupled XAJ–HiPIMS model was validated against observed flood events, and the simulated results agree well with the data observed at the basin outlet. The study proves that a coupled hydrologic and hydraulic model is capable of providing flood information at a small scale for a large basin and shows the potential of the approach.
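The coupling pattern can be read as one-way: at each synchronisation step the hydrologic model's runoff enters the hydraulic grid as a source term. A hypothetical sketch, with `XAJModel` and `HiPIMSGrid` invented stand-ins since the abstract does not describe the actual interface:

```python
import numpy as np

class XAJModel:
    """Hypothetical stand-in for the lumped Xinanjiang rainfall-runoff model."""
    def step(self, rainfall_mm):
        return 0.4 * rainfall_mm  # toy runoff coefficient; real XAJ is saturation-excess

class HiPIMSGrid:
    """Hypothetical stand-in for the GPU shallow-water grid (10 m cells)."""
    def __init__(self, shape):
        self.depth = np.zeros(shape)       # water depth per cell, metres
    def add_source(self, runoff_mm):
        self.depth += runoff_mm / 1000.0   # mm of runoff added as metres of depth
    def advance(self, dt):
        pass  # the real solver integrates the 2D shallow-water equations on the GPU

xaj, grid = XAJModel(), HiPIMSGrid((100, 100))
for rain in [5.0, 12.0, 3.0]:              # hourly rainfall totals, mm
    grid.add_source(xaj.step(rain))        # hydrologic output drives the hydraulic model
    grid.advance(dt=3600.0)
```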

