On the issue of fuzzy timing estimations of the algorithms running at GPU and CPU architectures

2019 ◽  
Vol 135 ◽  
pp. 01082 ◽  
Author(s):  
Oleg Agibalov ◽  
Nikolay Ventsov

We consider the task of comparing fuzzy estimates of the execution parameters of genetic algorithms implemented on GPU (graphics processing unit) and CPU (central processing unit) architectures. Fuzzy estimates are calculated from the averaged dependences of the genetic algorithms' running time on the GPU and CPU architectures on the number of individuals in the populations processed by the algorithm. The analysis of these averaged dependences showed that a genetic algorithm can process 10,000 chromosomes on the GPU architecture, or 5,000 chromosomes on the CPU architecture, in approximately 2,500 ms. The following holds for the cases under consideration: "genetic algorithms (GA) are performed in approximately 2,500 ms (on average)," and the α-sections of the fuzzy sets, with α = 0.5, correspond to the intervals [2,000–2,399] for the GA performed on the GPU architecture and [1,400–1,799] for the GA performed on the CPU architecture. Thereby it can be said that, in this case, the actual execution time of the algorithm on the GPU architecture deviates to a lesser extent from the average value than on the CPU.
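To make the notion of an α-section concrete, here is a minimal sketch of computing the α-cut of a triangular fuzzy number modelling a running-time estimate of "approximately 2,500 ms". The membership-function parameters (support 1,500–3,500 ms, peak 2,500 ms) are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch: alpha-cut of a triangular fuzzy number modelling
# a running-time estimate "approximately 2,500 ms". The support and peak
# below are hypothetical, not taken from the paper.

def triangular_alpha_cut(a, b, c, alpha):
    """Return the interval [lo, hi] where membership >= alpha for a
    triangular fuzzy number with support [a, c] and peak at b."""
    lo = a + alpha * (b - a)   # left slope rises from 0 at a to 1 at b
    hi = c - alpha * (c - b)   # right slope falls from 1 at b to 0 at c
    return lo, hi

# GA running time "about 2,500 ms": support 1,500..3,500 ms, peak 2,500 ms.
lo, hi = triangular_alpha_cut(1500, 2500, 3500, alpha=0.5)
print(lo, hi)  # 2000.0 3000.0
```

A narrower α = 0.5 interval, as reported for the GPU case, corresponds to a tighter spread of actual running times around the average.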

Author(s):  
Bojan Novak

Random forest ensemble learning with a Graphics Processing Unit (GPU) version of the prefix scan method is presented. The efficiency of the implementation of the random forest algorithm depends critically on the scan (prefix sum) algorithm. The prefix scan is used in the depth-first implementation of the optimal split point computation. Different implementations of the prefix scan algorithm are described. The speed of an algorithm depends on three factors: the algorithm itself, which could be improved, the programming skills, and the compiler. In parallel environments things are even more complicated and depend on the programmer's knowledge of the Central Processing Unit (CPU) or GPU architecture. An efficient parallel scan algorithm that avoids bank conflicts is crucial for the prefix scan implementation. In our tests, a multicore CPU implementation and a GPU implementation based on NVIDIA's CUDA are compared.
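For readers unfamiliar with the scan primitive, the following sketch shows the up-sweep/down-sweep structure of the work-efficient (Blelloch) exclusive prefix sum, written sequentially in Python. A real CUDA kernel would execute each level's loop in parallel across threads and pad shared-memory indices to avoid bank conflicts; that hardware-specific detail is not captured here.

```python
def exclusive_scan(data):
    """Work-efficient (Blelloch) exclusive prefix sum, written
    sequentially to show the up-sweep/down-sweep structure that a
    CUDA block would execute in parallel. Length must be a power of 2."""
    x = list(data)
    n = len(x)
    # Up-sweep (reduce): build partial sums in a balanced tree.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            x[i] += x[i - d]
        d *= 2
    # Down-sweep: clear the root, then push partial sums back down.
    x[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            t = x[i - d]
            x[i - d] = x[i]
            x[i] += t
        d //= 2
    return x

print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [0, 3, 4, 11, 11, 15, 16, 22]
```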


Author(s):  
Prashanta Kumar Das ◽  
Ganesh Chandra Deka

The Graphics Processing Unit (GPU) is a specialized and highly parallel microprocessor designed to offload 2D/3D image rendering from the Central Processing Unit (CPU) and expedite image processing. The modern GPU is not only a powerful graphics engine but also a parallel programmable processor with high precision and powerful features. It is forecast that a 48-core GPU will be available by 2020, while a GPU with 3,000 cores is likely to be available by 2030. This chapter describes the chronology of the evolution of GPU hardware architecture and the future ahead.


Author(s):  
Wisoot Sanhan ◽  
Kambiz Vafai ◽  
Niti Kammuang-Lue ◽  
Pradit Terdtoon ◽  
Phrut Sakulchangsatjatai

Abstract An investigation of the effect of flattening on the thermal performance of a heat pipe with double heat sources, acting as the central processing unit and graphics processing unit in a laptop computer, is presented in this work. A finite element method is used for predicting the flattening effect on the heat pipe. A cylindrical heat pipe with a diameter of 6 mm and a total length of 200 mm is flattened into three final thicknesses of 2, 3, and 4 mm. The heat pipe is placed in a horizontal configuration and heated with heaters 1 and 2, 40 W in combination. The numerical model shows good agreement with the experimental data, with a standard deviation of 1.85%. The results also show that flattening the cylindrical heat pipe to 66.7% and 41.7% of its original diameter could reduce its normalized thermal resistance by 5.2%. The optimal final thickness, or best-design final thickness, for the heat pipe is found to be 2.5 mm.
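The thermal resistance underlying such comparisons is conventionally defined as R = (T_evaporator − T_condenser) / Q. A minimal sketch, with illustrative temperatures and power rather than measured values from the paper:

```python
# Sketch of the standard heat-pipe thermal resistance definition,
# R = (T_evap - T_cond) / Q. The temperatures and power below are
# illustrative assumptions, not measurements from the study.

def thermal_resistance(t_evap_c, t_cond_c, q_watts):
    """Thermal resistance in K/W from end temperatures and input power."""
    return (t_evap_c - t_cond_c) / q_watts

r_round = thermal_resistance(68.0, 48.0, 40.0)  # cylindrical pipe, 40 W total
r_flat = r_round * (1 - 0.052)                  # 5.2% reduction reported after flattening
print(round(r_round, 3), round(r_flat, 3))      # 0.5 0.474
```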


Author(s):  
D. A. Kalina ◽  
R. V. Golovanov ◽  
D. V. Vorotnev

We present a single-camera approach to static hand gesture recognition based on skeletonization. The problem of creating a skeleton of the human hand, as well as of the body, became solvable a few years ago with the invention of so-called convolutional pose machines, a novel artificial neural network architecture. Our solution uses such a pretrained convolutional network to extract hand joint keypoints, with subsequent skeleton reconstruction. In this work we also propose a special skeleton descriptor and prove its stability and distinguishability in terms of classification. We considered several widespread machine learning algorithms to build and verify different classifiers. The recognition quality of each classifier is estimated using the well-known accuracy metric, which identified that a classical SVM (Support Vector Machine) with a radial basis kernel gives the best results. The testing of the whole system was conducted using public databases containing about 3,000 test images for more than 10 types of gestures. The results of a comparative analysis of the proposed system with existing approaches are demonstrated. It is shown that our gesture recognition system provides better quality in comparison with existing solutions. The performance of the proposed system was estimated for two configurations of a standard personal computer: with a CPU (Central Processing Unit) only, and with a GPU (Graphics Processing Unit) in addition, where the latter provides real-time processing at up to 60 frames per second. Thus we demonstrate that the proposed approach can find application in practice.
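The radial basis kernel that gave the best classification results measures similarity between descriptors as k(x, y) = exp(−γ‖x − y‖²). A minimal sketch, using hypothetical four-dimensional skeleton descriptors (the paper's actual descriptor is not reproduced here):

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Radial-basis-function kernel k(x, y) = exp(-gamma * ||x - y||^2),
    the similarity measure underlying the RBF-kernel SVM."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical 4-dimensional skeleton descriptors for two gestures.
d1 = [0.1, 0.4, 0.3, 0.2]
d2 = [0.1, 0.5, 0.3, 0.2]
print(rbf_kernel(d1, d1))            # 1.0 (identical descriptors)
print(round(rbf_kernel(d1, d2), 3))  # 0.99 (near-identical descriptors)
```

Descriptors close in Euclidean distance get kernel values near 1, which is what makes a stable, distinguishable descriptor important for classification.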


Author(s):  
Liam Dunn ◽  
Patrick Clearwater ◽  
Andrew Melatos ◽  
Karl Wette

Abstract The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.


2012 ◽  
Vol 53 ◽  
Author(s):  
Beatričė Andziulienė ◽  
Evaldas Žulkas ◽  
Audrius Kuprinavičius

In this work a Fast Fourier transform algorithm for general-purpose graphics processing unit (GPGPU) processing is discussed. The algorithm structure and the performance of its individual stages were analysed. With the performance analysis method, algorithm distribution and data allocation possibilities were determined, depending on the execution speed of the algorithm stages and the algorithm structure. The ratio between CPU and GPU execution during Fast Fourier transform signal processing was determined using computer-generated data of a given frequency. When CPU code is adapted for CUDA execution it does not become much more complex, even when the stream-processor parallelization and data-transfer stages of the algorithm are taken into account, compared with serial execution on the central processing unit.
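For reference, the stage structure being parallelized is that of the radix-2 Cooley-Tukey FFT. A minimal sequential sketch (not the authors' CUDA code): each recursion level corresponds to one butterfly stage, and it is these independent butterflies within a stage that map onto GPU stream processors.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; input length must be a power
    of two. The butterflies within each level are independent, which is
    what a GPU implementation executes in parallel."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

spectrum = fft([1, 1, 1, 1, 0, 0, 0, 0])
print(round(abs(spectrum[0]), 3))  # 4.0 (DC component of the pulse)
```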


2013 ◽  
Author(s):  
Roussian R. A. Gaioso ◽  
Walid A. R. Jradi ◽  
Lauro C. M. de Paula ◽  
Wanderley De S. Alencar ◽  
Wellington S. Martins ◽  
...  

This article presents a Graphics Processing Unit (GPU)-based parallel implementation for the problem of identifying the shortest paths between all pairs of vertices in a graph. The implementation is based on the Floyd-Warshall algorithm and takes full advantage of the multithreaded architecture of current GPUs. Our solution reduces communication between the Central Processing Unit (CPU) and the GPU, improves the utilization of the Streaming Multiprocessors (SMs), and makes intensive use of coalesced memory access to optimize access to the graph data. The advantage of the proposed implementation is demonstrated on several graphs randomly generated with the GTgraph tool. Graphs containing thousands of vertices were generated and used in the experiments. The results showed excellent performance on several graphs, achieving gains of up to 149x when compared with a sequential implementation, and surpassing traditional implementations by a factor of almost four. Our results confirm that GPU-based implementations can be viable even for graph algorithms whose memory accesses and work distribution are irregular and cause data dependences.
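A minimal sequential sketch of the Floyd-Warshall algorithm underlying the GPU version (the example graph is illustrative). For a fixed pivot k, every (i, j) relaxation is independent, which is the parallelism a GPU exploits across threads:

```python
INF = float("inf")

def floyd_warshall(dist):
    """All-pairs shortest paths on an adjacency matrix. The inner two
    loops over (i, j) are independent for a fixed k; a GPU version runs
    them across threads, while here they run sequentially."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Illustrative 4-vertex weighted digraph (INF = no direct edge).
graph = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
print(floyd_warshall(graph)[0][2])  # 5 (path 0 -> 1 -> 2)
```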


The graphics processing unit (GPU) is a computer-programmable chip that can perform rapid mathematical operations accelerated with massive parallelism. In the early days, the central processing unit (CPU) was responsible for all computations, irrespective of whether they were feasible for parallel computation. In recent years, however, GPUs are increasingly used for massively parallel computing applications, such as training deep neural networks. GPU performance monitoring plays a key role in this new era, since GPUs serve an inevitable role in increasing the speed of analysis of the developed system. GPU administration comes into the picture to utilize the GPU efficiently when multiple workloads run on the same hardware. In this study, various GPU parameters are monitored, which helps to keep them at safe levels and to maintain the improved performance of the system.
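A common way to collect such parameters is NVIDIA's `nvidia-smi` query interface (e.g. `nvidia-smi --query-gpu=utilization.gpu,temperature.gpu,memory.used --format=csv,noheader,nounits`). A minimal sketch of checking one sample against safe levels; the sample line and the thresholds are illustrative assumptions, not values from the study:

```python
# Sketch of a monitoring check on nvidia-smi CSV-style output. The
# sample line and the safe-level thresholds below are hypothetical.

def parse_gpu_sample(line):
    """Parse one CSV row into (utilization %, temperature C, memory MiB)."""
    util, temp, mem = (int(field.strip()) for field in line.split(","))
    return util, temp, mem

def is_safe(util, temp, mem, temp_limit=85, mem_limit=15000):
    """Flag whether a sample stays within hypothetical safe levels."""
    return temp < temp_limit and mem < mem_limit

sample = "87, 71, 10240"   # busy but healthy GPU (illustrative)
util, temp, mem = parse_gpu_sample(sample)
print(is_safe(util, temp, mem))  # True
```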


Electronics ◽  
2019 ◽  
Vol 8 (3) ◽  
pp. 257 ◽  
Author(s):  
Talgat Manglayev ◽  
Refik Kizilirmak ◽  
Nor Hamid

Non-orthogonal multiple access (NOMA) is a candidate multiple access scheme for the fifth-generation (5G) cellular networks. In NOMA systems, all users operate at the same frequency and time, which poses a challenge in the decoding process at the receiver side. In this work, the two most popular receiver structures, successive interference cancellation (SIC) and parallel interference cancellation (PIC) receivers, for the NOMA reverse channel are implemented on a graphics processing unit (GPU) and compared. Orthogonal frequency division multiplexing (OFDM) is considered. The high computational complexity of interference cancellation receivers undermines the potential deployment of NOMA systems. GPU acceleration, however, challenges this weakness, and our numerical results show speedups of about 75–220 times as compared to a multi-thread implementation on a central processing unit (CPU). SIC and PIC multi-thread execution time on different platforms reveals the potential of GPU in wireless communications. Furthermore, the successful decoding rates of the SIC and PIC are evaluated and compared in terms of bit error rate.
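To illustrate the SIC principle at its simplest, here is a sketch of a two-user uplink with real-valued BPSK symbols and no noise (gains and symbols are illustrative; the paper's OFDM system is far richer): the strong user is decoded first, its contribution subtracted, then the weak user is decoded from the residual.

```python
# Minimal sketch of successive interference cancellation (SIC) for a
# two-user NOMA uplink with real-valued BPSK symbols and no noise.
# Gains and symbols are illustrative assumptions.

def sic_decode(y, strong_gain, weak_gain):
    """Decode the strong user first, subtract its contribution, then
    decode the weak user from the residual."""
    s_strong = 1.0 if y >= 0 else -1.0       # hard decision, strong user
    residual = y - strong_gain * s_strong    # cancel its interference
    s_weak = 1.0 if residual >= 0 else -1.0  # decode weak user
    return s_strong, s_weak

# Superposed received sample: strong user sends +1 (gain 2.0),
# weak user sends -1 (gain 0.5)  ->  y = 2.0 - 0.5 = 1.5
print(sic_decode(1.5, 2.0, 0.5))  # (1.0, -1.0)
```

A PIC receiver instead forms tentative decisions for all users at once and cancels them in parallel, which trades decision reliability for latency, hence the interest in comparing the two on a GPU.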

