On the Effect of Exploiting GPUs for a More Eco-Sustainable Lease of Life

Author(s):  
Giuseppe Scanniello ◽  
Ugo Erra ◽  
Giuseppe Caggianese ◽  
Carmine Gravino

It has been estimated that about 2% of global carbon dioxide emissions can be attributed to IT systems. Green (or sustainable) computing refers to supporting business-critical computing needs with the least possible amount of power. This phenomenon changes the priorities in the design of new software systems and in the way companies handle existing ones. In this paper, we present the results of a research project aimed at developing a migration strategy that gives an existing software system a new and more eco-sustainable lease of life. We applied the strategy to migrate a subject system that performs intensive and massive computation to a target architecture based on a Graphics Processing Unit (GPU). We validated our solution on a system for path-finding robot simulations. An analysis of execution time and energy consumption indicated that: (i) the execution time of the migrated system is less than that of the original system; and (ii) the migrated system reduces energy waste, thus suggesting that it is more eco-sustainable than its original version. Our findings extend the body of knowledge on the effect of using the GPU in green computing.
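
As a hedged illustration only (names and workload below are hypothetical, not the project's actual migration code), the following CUDA sketch shows the offloading pattern such a migration targets: a compute-intensive per-element loop moved into a GPU kernel, with execution time measured via CUDA events so that the original and the migrated versions can be compared.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Hypothetical compute-intensive step: one simulation cell per thread.
    __global__ void simulateStep(float *cells, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float v = cells[i];
            for (int k = 0; k < 1000; ++k)   // stand-in for heavy per-cell work
                v = v * 0.999f + 0.001f;
            cells[i] = v;
        }
    }

    int main() {
        const int n = 1 << 20;
        float *d_cells;
        cudaMalloc(&d_cells, n * sizeof(float));
        cudaMemset(d_cells, 0, n * sizeof(float));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        simulateStep<<<(n + 255) / 256, 256>>>(d_cells, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);   // wall time of the GPU version
        printf("GPU step: %.3f ms\n", ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(d_cells);
        return 0;
    }

Timing alone does not capture energy; the paper pairs execution-time measurements with separate energy-consumption measurements.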

Author(s):  
Yohsuke Tanaka ◽  
Hiroki Matsushi ◽  
Shigeru Murata

Abstract We introduce graphics processing unit (GPU) acceleration of hologram reconstruction for phase retrieval holography, drastically reducing the execution time. GPU acceleration was implemented with the FFT library CUFFT on a GeForce GTX 1050 chip (GDDR5 2 GB, NVIDIA). To compare GPU and CPU, we also used an Intel Xeon CPU (E5-2690, 2.90 GHz, Intel) with 24 GB of memory, running Ubuntu 16.04. Reconstructed volumes ranged from 256² × 128 voxels to 2048² × 1024 voxels to compare execution times. The speed-up of the GPU over the CPU is consistently greater than 100×, except for the smallest volumes. We also demonstrated the reduction on an observation of falling particles from a particle feeder, recorded in 40 frames: GPU acceleration cut the execution time from 13 hours to 30 minutes.
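
As a sketch of the CUFFT usage pattern at the core of such a pipeline (array sizes and variable names below are illustrative assumptions, not taken from the paper), a 2-D complex-to-complex transform of one hologram plane looks like this; compile with nvcc and link against -lcufft.

    #include <cufft.h>
    #include <cuda_runtime.h>

    int main() {
        const int NX = 256, NY = 256;           // one 256 x 256 hologram plane
        cufftComplex *d_field;
        cudaMalloc(&d_field, sizeof(cufftComplex) * NX * NY);
        cudaMemset(d_field, 0, sizeof(cufftComplex) * NX * NY);

        cufftHandle plan;
        cufftPlan2d(&plan, NX, NY, CUFFT_C2C);  // 2-D FFT plan via CUFFT

        // Forward transform in place; the inverse direction (CUFFT_INVERSE)
        // would propagate the field back during reconstruction.
        cufftExecC2C(plan, d_field, d_field, CUFFT_FORWARD);
        cudaDeviceSynchronize();

        cufftDestroy(plan);
        cudaFree(d_field);
        return 0;
    }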


2021 ◽  
Author(s):  
Randa Khemiri ◽  
Soulef Bouaafia ◽  
Asma Bahba ◽  
Maha Nasr ◽  
Fatma Ezahra Sayadi

In motion estimation (ME), block-matching algorithms have great potential for parallelism. The search for the best match is performed by computing, for each block position inside the search area, a similarity metric such as the Sum of Absolute Differences (SAD), which is used in the various steps of motion estimation algorithms. Moreover, this computation can be parallelized on a Graphics Processing Unit (GPU), since the computation over each block's pixels is identical, thus offering better results. In this work, a fixed OpenCL code was first run on several architectures (CPU and GPU); then a parallel GPU implementation of the SAD process was proposed in both CUDA and OpenCL, for block sizes from 4×4 to 64×64. A comparative study of the GPU execution times was carried out on the same video sequence. The experimental results indicated that the OpenCL execution time on the GPU was better than the CUDA time, with a performance ratio reaching 2×.
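
A minimal CUDA sketch of the parallel SAD idea (memory layout, sizes, and names below are assumptions, not the paper's code): each thread evaluates the SAD between the current block and one candidate position in the search window, so all candidates are scored concurrently.

    #include <cuda_runtime.h>
    #include <cstdlib>

    #define B 4   // block size (the paper sweeps 4x4 up to 64x64)

    // One thread per candidate displacement (dx, dy) in the search window.
    __global__ void sadKernel(const unsigned char *cur, const unsigned char *ref,
                              int width, int bx, int by, int search,
                              unsigned int *sads) {
        int dx = blockIdx.x * blockDim.x + threadIdx.x;
        int dy = blockIdx.y * blockDim.y + threadIdx.y;
        if (dx >= search || dy >= search) return;

        unsigned int sad = 0;
        for (int y = 0; y < B; ++y)
            for (int x = 0; x < B; ++x) {
                int c = cur[(by + y) * width + (bx + x)];
                int r = ref[(by + dy + y) * width + (bx + dx + x)];
                sad += abs(c - r);
            }
        sads[dy * search + dx] = sad;   // a reduction then picks the minimum
    }

    int main() {
        const int width = 64, search = 16;
        unsigned char *d_cur, *d_ref;
        unsigned int *d_sads;
        cudaMalloc(&d_cur, width * width);
        cudaMalloc(&d_ref, width * width);
        cudaMalloc(&d_sads, search * search * sizeof(unsigned int));
        cudaMemset(d_cur, 0, width * width);
        cudaMemset(d_ref, 0, width * width);

        dim3 threads(16, 16), blocks((search + 15) / 16, (search + 15) / 16);
        sadKernel<<<blocks, threads>>>(d_cur, d_ref, width, 0, 0, search, d_sads);
        cudaDeviceSynchronize();

        cudaFree(d_cur); cudaFree(d_ref); cudaFree(d_sads);
        return 0;
    }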


Author(s):  
Luca Mussi ◽  
Spela Ivekovic ◽  
Youssef S.G. Nashed ◽  
Stefano Cagnoni

The authors formulate body pose estimation as a multi-dimensional nonlinear optimization problem, suitable to be approximately solved by a meta-heuristic, specifically particle swarm optimization (PSO). Starting from multi-view video sequences acquired in a studio environment, a full skeletal configuration of the human body is retrieved. They use a generic subdivision-surface body model in 3-D to generate solutions for the optimization problem. PSO then looks for the best match between the silhouettes generated by projecting the model in a candidate pose and the silhouettes extracted from the original video sequence. The optimization method, in this case PSO, is run in parallel on the Graphics Processing Unit (GPU) and is implemented in CUDA-C™ on the NVIDIA CUDA™ architecture. The authors compare the results obtained with different configurations of the camera setup, fitness function, and PSO neighborhood topologies.
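
A minimal CUDA sketch of the parallel-evaluation idea (the fitness below is a placeholder, not the authors' silhouette-matching measure; all names are hypothetical): each thread scores one PSO particle, i.e. one candidate pose, so the whole swarm is evaluated in a single kernel launch.

    #include <cuda_runtime.h>

    #define DIM 32   // pose parameters per particle (illustrative)

    // Placeholder fitness: sum of squares. The real fitness would compare
    // model-projected silhouettes against silhouettes extracted from video.
    __global__ void evaluateSwarm(const float *poses, float *fitness,
                                  int nParticles) {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p >= nParticles) return;
        float f = 0.0f;
        for (int d = 0; d < DIM; ++d) {
            float x = poses[p * DIM + d];
            f += x * x;
        }
        fitness[p] = f;   // the PSO update step then consumes these scores
    }

    int main() {
        const int n = 1024;   // swarm size
        float *d_poses, *d_fit;
        cudaMalloc(&d_poses, n * DIM * sizeof(float));
        cudaMalloc(&d_fit, n * sizeof(float));
        cudaMemset(d_poses, 0, n * DIM * sizeof(float));

        evaluateSwarm<<<(n + 255) / 256, 256>>>(d_poses, d_fit, n);
        cudaDeviceSynchronize();

        cudaFree(d_poses);
        cudaFree(d_fit);
        return 0;
    }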


2018 ◽  
Vol 19 (12) ◽  
pp. 802-807
Author(s):  
Łukasz Nozdrzykowski ◽  
Magdalena Nozdrzykowska

The authors present models for estimating the execution time of program loops compliant with the FAN model, with no data dependencies or with data dependencies only within the loop body, which can be executed either by CPUs or by the stream multiprocessors referred to as GPU cores. The presented models make it possible to determine whether it would be more efficient to execute a computation in the existing environment using the CPU (Central Processing Unit) or a state-of-the-art graphics card with a high-performance GPU (Graphics Processing Unit) and the super-fast memory often implemented in modern graphics cards. Validity checks confirming the developed time-estimation model for the GPU are presented. The purpose of these models is to provide methods for accelerating applications performing various tasks, including transport tasks such as accelerated solution searching, path searching in graphs, or accelerating image processing algorithms in the vision systems of autonomous and semi-autonomous vehicles; the models allow building an automatic task-distribution system between the CPU and the GPU under varying computing resources.
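
A minimal host-side sketch of the kind of automatic dispatch such models enable (the cost formulas below are illustrative placeholders, not the authors' estimators): estimate both execution times from loop parameters, then run the loop on whichever processor the model favors.

    #include <cstdio>
    #include <initializer_list>

    // Hypothetical per-iteration costs and transfer overhead, in microseconds.
    // The authors' models would derive these from the FAN loop description.
    struct LoopModel {
        double cpuPerIter;     // time per iteration on the CPU
        double gpuPerIter;     // time per batch of iterations on the GPU
        double gpuTransfer;    // fixed host<->device copy overhead
        int    gpuParallelism; // iterations executed concurrently on the GPU
    };

    // Estimated times for n independent (FAN-compliant) iterations.
    double estimateCpu(const LoopModel &m, long n) { return m.cpuPerIter * n; }
    double estimateGpu(const LoopModel &m, long n) {
        long batches = (n + m.gpuParallelism - 1) / m.gpuParallelism;
        return m.gpuTransfer + m.gpuPerIter * batches;
    }

    int main() {
        LoopModel m = {0.5, 0.4, 2000.0, 1024};   // illustrative numbers only
        for (long n : {1000L, 100000L, 10000000L}) {
            double tc = estimateCpu(m, n), tg = estimateGpu(m, n);
            printf("n=%8ld  CPU=%12.1f us  GPU=%12.1f us  -> run on %s\n",
                   n, tc, tg, tc <= tg ? "CPU" : "GPU");
        }
        return 0;
    }

The fixed transfer overhead is what makes the CPU win for small loops and the GPU win once the iteration count amortizes it, which is exactly the trade-off the estimation models are meant to capture.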


2007 ◽  
Author(s):  
Fredrick H. Rothganger ◽  
Kurt W. Larson ◽  
Antonio Ignacio Gonzales ◽  
Daniel S. Myers

2021 ◽  
Vol 22 (10) ◽  
pp. 5212
Author(s):  
Andrzej Bak

A key question confronting computational chemists concerns the preferable ligand geometry that fits complementarily into the receptor pocket. Typically, the postulated 'bioactive' 3D ligand conformation is constructed as a 'sophisticated guess' (not necessarily geometry-optimized) mirroring the pharmacophore hypothesis, sometimes based on an erroneous prerequisite. Hence, the 4D-QSAR scheme and its 'dialects' have been practically implemented as a higher level of model abstraction that allows the examination of multiple molecular conformations, orientations and protonation states, respectively. Nearly a quarter of a century has passed since the eminent work of Hopfinger appeared on the stage; therefore, the natural question arises of whether the 4D-QSAR approach is still appealing to the scientific community. With no intention to be comprehensive, a review of the current state of the art in the field of receptor-independent (RI) and receptor-dependent (RD) 4D-QSAR methodology is provided, with a brief examination of the 'mainstream' algorithms. In fact, a myriad of 4D-QSAR methods have been implemented and applied practically to a diverse range of molecules. It seems that the 4D-QSAR approach is experiencing a promising renaissance of interest, which might be fuelled by the rising power of graphics processing unit (GPU) clusters applied to full-atom, MD-based simulations of protein–ligand complexes.

