Modular Microservice based GPU Utilization Manager with Gunicorn

Issue 4 - Journal of Science and Technology ◽

10.46243/jst.2020.v5.i4.pp230-237 ◽

2020 ◽

pp. 230-237

Keyword(s):

Performance Monitoring ◽

Graphics Processing Unit ◽

Processing Unit ◽

New Era ◽

Central Processing ◽

Massively Parallel Computing ◽

Improved Performance ◽

Graphics Processing ◽

Mathematical Operations ◽

Speed Of Analysis

A graphics processing unit (GPU) is a programmable chip that performs mathematical operations rapidly by exploiting massive parallelism. In the early days, the central processing unit (CPU) was responsible for all computations, irrespective of whether they were suited to parallel execution. In recent years, however, GPUs have been increasingly used for massively parallel computing applications, such as training deep neural networks. GPU performance monitoring plays a key role in this new era, since GPUs are essential to increasing the speed of analysis of the developed system. GPU administration comes into the picture when multiple workloads must run on the same hardware and the GPU has to be utilized efficiently. In this study, various GPU parameters are monitored to help keep them at safe levels and to maintain the improved performance of the system. This study,
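
As an illustration of the kind of parameter monitoring the abstract describes, here is a minimal sketch, not the authors' implementation. It assumes an NVIDIA GPU queried through `nvidia-smi`; the threshold values in `LIMITS` and the helper names `parse_gpu_csv`, `over_limit`, and `query_gpus` are hypothetical, since the abstract does not say which parameters or limits are used.

```python
import subprocess

# Hypothetical safe limits; the abstract does not state the thresholds it uses.
LIMITS = {"utilization_pct": 95.0, "temperature_c": 85.0, "memory_used_mib": 15000.0}
FIELDS = ("utilization_pct", "temperature_c", "memory_used_mib")


def parse_gpu_csv(text):
    """Parse 'utilization, temperature, memory' CSV rows (one row per GPU)."""
    readings = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split(",")]
        readings.append(dict(zip(FIELDS, values)))
    return readings


def over_limit(reading, limits=LIMITS):
    """Return the sorted names of the parameters that exceed their limit."""
    return sorted(name for name, value in reading.items() if value > limits[name])


def query_gpus():
    """Query nvidia-smi; needs an NVIDIA driver on the host, so it is not called here."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,temperature.gpu,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Sample text in the shape nvidia-smi prints for two GPUs (values are made up).
sample = "37, 61, 4021\n99, 88, 15800\n"
readings = parse_gpu_csv(sample)
for gpu_index, reading in enumerate(readings):
    print(gpu_index, over_limit(reading))
```

In a microservice setting, a function such as `query_gpus` could sit behind an HTTP endpoint served by Gunicorn, but that wiring is outside what the abstract specifies.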


Numerical simulation of flattened heat pipe with double heat sources for CPU and GPU cooling application in laptop computers

Journal of Computational Design and Engineering ◽

10.1093/jcde/qwaa091 ◽

2020 ◽

Author(s):

Wisoot Sanhan ◽

Kambiz Vafai ◽

Niti Kammuang-Lue ◽

Pradit Terdtoon ◽

Phrut Sakulchangsatjatai

Keyword(s):

Experimental Data ◽

Heat Pipe ◽

Graphics Processing Unit ◽

Processing Unit ◽

Heat Sources ◽

Final Thickness ◽

Laptop Computers ◽

Central Processing ◽

Graphics Processing ◽

Good Agreement

This work investigates the effect of the thermal performance of a flattened heat pipe on its two heat sources, which act as the central processing unit and graphics processing unit in laptop computers. A finite element method is used to predict the effect of flattening the heat pipe. A cylindrical heat pipe with a diameter of 6 mm and a total length of 200 mm is flattened to three final thicknesses of 2, 3, and 4 mm. The heat pipe is placed in a horizontal configuration and heated by heater 1 and heater 2, with a combined power of 40 W. The numerical model shows good agreement with the experimental data, with a standard deviation of 1.85%. The results also show that flattening the cylindrical heat pipe to 66.7% and 41.7% of its original diameter could reduce its normalized thermal resistance by 5.2%. The optimized final thickness, that is, the best design thickness for the heat pipe, is found to be 2.5 mm.
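
The percentages quoted in the abstract can be related to the thicknesses with a short calculation. The sketch below only assumes that the percentage is the final thickness divided by the original 6 mm diameter; the function name `flattening_ratio_pct` is hypothetical.

```python
ORIGINAL_DIAMETER_MM = 6.0  # cylindrical heat pipe diameter given in the abstract


def flattening_ratio_pct(thickness_mm, diameter_mm=ORIGINAL_DIAMETER_MM):
    """Final thickness expressed as a percentage of the original diameter."""
    return 100.0 * thickness_mm / diameter_mm


for thickness in (2.0, 2.5, 3.0, 4.0):
    ratio = flattening_ratio_pct(thickness)
    print(f"{thickness} mm -> {ratio:.1f}% of original diameter")
```

Under that assumption, 66.7% corresponds to a 4 mm thickness and 41.7% to the 2.5 mm optimized thickness.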


ALGORITHM OF SKELETON-BASED STATIC HAND GESTURE RECOGNITION

Vestnik komp iuternykh i informatsionnykh tekhnologii ◽

10.14489/vkit.2020.05.pp.013-022 ◽

2020 ◽

pp. 13-22

Author(s):

D. A. Kalina ◽

R. V. Golovanov ◽

D. V. Vorotnev

Keyword(s):

Gesture Recognition ◽

Graphics Processing Unit ◽

Recognition System ◽

Machine Learning Algorithms ◽

Support Vector ◽

Processing Unit ◽

The Novel ◽

Central Processing ◽

Graphics Processing ◽

Artificial Network

We present a single-camera approach to static hand gesture recognition based on skeletonization. The problem of building a skeleton of the human hand, as well as of the body, became solvable a few years ago with the invention of so-called convolutional pose machines, a novel artificial neural network architecture. Our solution uses a pretrained convolutional network of this kind to extract hand joint keypoints, followed by skeleton reconstruction. In this work we also propose a special skeleton descriptor and demonstrate its stability and distinguishability for classification. We considered several widespread machine learning algorithms to build and verify different classifiers. Classifier quality is estimated with the well-known Accuracy metric, which showed that a classical SVM (Support Vector Machine) with a radial basis kernel gives the best results. The whole system was tested on public databases containing about 3000 test images for more than 10 types of gestures. The results of a comparative analysis of the proposed system against existing approaches are presented. It is shown that our gesture recognition system provides better quality than existing solutions. The performance of the proposed system was estimated for two configurations of a standard personal computer: with a CPU (Central Processing Unit) only, and with an additional GPU (Graphics Processing Unit), where the latter provides real-time processing at up to 60 frames per second. We thus demonstrate that the proposed approach can be applied in practice.
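
For readers unfamiliar with the radial basis kernel mentioned above, the following is a minimal sketch of the standard RBF kernel, K(x, y) = exp(-gamma * ||x - y||^2). The descriptor vectors and the `gamma` value are made-up placeholders; the abstract does not give the descriptor layout or the kernel parameters.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Standard radial basis function kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two hypothetical skeleton descriptors (placeholder numbers).
descriptor_a = [0.10, 0.40, 0.90]
descriptor_b = [0.20, 0.35, 0.80]
similarity = rbf_kernel(descriptor_a, descriptor_b, gamma=0.5)
print(similarity)
```

The kernel equals 1 for identical vectors and decays toward 0 as they move apart, which is what lets an SVM separate gesture classes that are not linearly separable in descriptor space.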


Graphics processing unit implementation of the F-statistic for continuous gravitational wave searches

Classical and Quantum Gravity ◽

10.1088/1361-6382/ac4616 ◽

2021 ◽

Author(s):

Liam Dunn ◽

Patrick Clearwater ◽

Andrew Melatos ◽

Karl Wette

Keyword(s):

Gravitational Wave ◽

Graphics Processing Units ◽

Graphics Processing Unit ◽

Computational Cost ◽

Processing Unit ◽

Central Processing ◽

Long Baseline ◽

Using Data ◽

Graphics Processing ◽

Gpu Implementation

The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.
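
The two cost figures can be compared with a one-line calculation. This sketch takes the reported GPU cost to be 17.2 GPU-hours and treats one GPU-hour and one core-hour as comparable units of device time, which is an assumption made here for illustration and not a claim from the abstract.

```python
CPU_CORE_HOURS = 1092.0  # existing implementation, from the abstract
GPU_HOURS = 17.2         # new implementation, as read from the abstract

# Ratio of device time; GPU-hours and core-hours are different units,
# so this is only a rough indication of the saving.
device_time_ratio = CPU_CORE_HOURS / GPU_HOURS
print(f"about {device_time_ratio:.1f}x less device time")
```

The resulting ratio falls inside the 10 to 100 times range quoted for the speedup.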


Analysis of Fast Fourier Transformations algorithm for CUDA Architecture

Lietuvos matematikos rinkinys ◽

10.15388/lmr.b.2012.46 ◽

2012 ◽

Vol 53 ◽

Author(s):

Beatričė Andziulienė ◽

Evaldas Žulkas ◽

Audrius Kuprinavičius

Keyword(s):

Graphics Processing Unit ◽

General Purpose ◽

Fast Fourier Transformation ◽

Processing Unit ◽

Data Allocation ◽

Analysis Method ◽

Central Processing ◽

Execution Speed ◽

Cuda Architecture ◽

Graphics Processing

In this work, a Fast Fourier transformation algorithm for general-purpose computing on graphics processing units (GPGPU) is discussed. The algorithm structure and the performance of its individual stages were analysed. Using a performance analysis method, the possibilities for distributing the algorithm and allocating data were determined, depending on the execution speed of the algorithm stages and on the algorithm structure. The ratio between CPU and GPU execution during Fast Fourier transform signal processing was determined using computer-generated data with frequency. When adapting CPU code for CUDA execution, it does not become more complex, even if the stream processor parallelization and data transfer stages of the algorithm are considered. But central processing unit serial execution).
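
To make the notion of algorithm stages concrete, here is a minimal CPU-side sketch of a radix-2 Cooley-Tukey FFT. The abstract does not say which FFT variant the authors analysed, so the choice of radix-2 is an assumption, and the names `fft` and `dft` are illustrative. The butterfly loop is the part whose iterations are independent of one another, which is what makes each stage a candidate for GPU threads.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(x)
    if n == 0:
        raise ValueError("input must not be empty")
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    # Butterfly stage: each k is independent, so the loop is data-parallel.
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dft(x):
    """Direct O(n^2) transform, used only as a reference for checking."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


# Computer-generated test signal: one sine cycle over 8 samples.
signal = [math.sin(2 * math.pi * t / 8) for t in range(8)]
spectrum = fft(signal)
print([round(abs(v), 6) for v in spectrum])
```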


Paralelização do Algoritmo Floyd-Warshall usando GPU (Parallelization of the Floyd-Warshall Algorithm Using GPU)

10.5753/wscad.2013.16769 ◽

2013 ◽

Author(s):

Roussian R. A. Gaioso ◽

Walid A. R. Jradi ◽

Lauro C. M. de Paula ◽

Wanderley De S. Alencar ◽

Wellington S. Martins ◽

...

Keyword(s):

Graphics Processing Unit ◽

Central Processing Unit ◽

Processing Unit ◽

Central Processing ◽

Graphics Processing

This article presents a parallel implementation based on the Graphics Processing Unit (GPU) for the problem of finding the shortest paths between all pairs of vertices in a graph. The implementation is based on the Floyd-Warshall algorithm and takes full advantage of the multithreaded architecture of current GPUs. Our solution reduces communication between the Central Processing Unit (CPU) and the GPU, improves the utilization of the Streaming Multiprocessors (SMs), and makes intensive use of coalesced memory access to optimize access to the graph data. The advantage of the proposed implementation is demonstrated on several randomly generated graphs produced with the GTgraph tool. Graphs containing thousands of vertices were generated and used in the experiments. The results showed excellent performance on several graphs, achieving speedups of up to 149x compared with a sequential implementation and outperforming traditional implementations by a factor of almost four. Our results confirm that GPU-based implementations can be viable even for graph algorithms whose memory accesses and work distribution are irregular and cause data dependencies.
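
For reference, the sequential Floyd-Warshall algorithm that such work parallelizes can be sketched as follows. This is the textbook CPU version, not the authors' GPU implementation, and the example graph is made up. For a fixed `k`, every `(i, j)` update is independent of the others, which is the parallelism a GPU version can exploit, while the outer `k` loop must stay sequential.

```python
INF = float("inf")


def floyd_warshall(weights):
    """All-pairs shortest paths for a dense weight matrix (INF means no edge)."""
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is left untouched
    for k in range(n):                  # sequential dependency across k
        row_k = dist[k]
        for i in range(n):              # for fixed k, all (i, j) are independent
            d_ik = dist[i][k]
            for j in range(n):
                candidate = d_ik + row_k[j]
                if candidate < dist[i][j]:
                    dist[i][j] = candidate
    return dist


# Small hypothetical directed graph with four vertices.
graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
shortest = floyd_warshall(graph)
for row in shortest:
    print(row)
```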


GPU Accelerated PIC and SIC for OFDM-NOMA

Electronics ◽

10.3390/electronics8030257 ◽

2019 ◽

Vol 8 (3) ◽

pp. 257 ◽

Cited By ~ 2

Author(s):

Talgat Manglayev ◽

Refik Kizilirmak ◽

Nor Hamid

Keyword(s):

Interference Cancellation ◽

Orthogonal Frequency Division Multiplexing ◽

Multiple Access ◽

Graphics Processing Unit ◽

Processing Unit ◽

Parallel Interference Cancellation ◽

Central Processing ◽

Fifth Generation ◽

Access Scheme ◽

Graphics Processing

Non-orthogonal multiple access (NOMA) is a candidate multiple access scheme for fifth-generation (5G) cellular networks. In NOMA systems, all users operate at the same frequency and time, which poses a challenge in the decoding process at the receiver side. In this work, the two most popular receiver structures for the NOMA reverse channel, successive interference cancellation (SIC) and parallel interference cancellation (PIC) receivers, are implemented on a graphics processing unit (GPU) and compared. Orthogonal frequency division multiplexing (OFDM) is considered. The high computational complexity of interference cancellation receivers undermines the potential deployment of NOMA systems. GPU acceleration, however, addresses this weakness, and our numerical results show speedups of about 75 to 220 times compared with a multi-thread implementation on a central processing unit (CPU). The SIC and PIC multi-thread execution times on different platforms reveal the potential of GPUs in wireless communications. Furthermore, the successful decoding rates of SIC and PIC are evaluated and compared in terms of bit error rate.
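
The idea behind successive interference cancellation can be sketched in a few lines. This is a simplified real-valued baseband illustration with BPSK symbols and known per-user amplitudes, not the OFDM receiver evaluated in the paper; the amplitudes, symbols, and noise samples are made up, and the function names are hypothetical.

```python
def bpsk_decide(value):
    """Hard decision for a BPSK symbol (+1 or -1)."""
    return 1 if value >= 0 else -1


def sic_decode(received, amplitudes):
    """Decode users from strongest to weakest, subtracting each decoded signal."""
    order = sorted(range(len(amplitudes)), key=lambda u: amplitudes[u], reverse=True)
    residual = list(received)
    decoded = {}
    for user in order:
        symbols = [bpsk_decide(r) for r in residual]
        decoded[user] = symbols
        # Cancel this user's contribution before decoding the next one.
        residual = [r - amplitudes[user] * s for r, s in zip(residual, symbols)]
    return decoded


# Two users superimposed on the same resource, plus small made-up noise samples.
amplitudes = [2.0, 0.8]
user0 = [1, -1, 1, -1]
user1 = [1, 1, -1, -1]
noise = [0.1, -0.2, 0.05, -0.1]
received = [amplitudes[0] * a + amplitudes[1] * b + n
            for a, b, n in zip(user0, user1, noise)]
decoded = sic_decode(received, amplitudes)
print(decoded)
```

SIC processes users one after another, whereas a PIC receiver estimates all users at once and cancels their mutual interference in parallel, typically over several iterations. In either case the per-sample operations are independent across samples, which is where a GPU can help.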


Evaluating the computing efficiencies (specificity and sensitivity) of graphics processing unit (GPU)-accelerated DNA sequence alignment tools against central processing unit (CPU) alignment tool

Journal of Bioinformatics and Sequence Analysis ◽

10.5897/jbsa2018.0109 ◽

2018 ◽

Vol 9 (2) ◽

pp. 10-14 ◽

Cited By ~ 1

Author(s):

Pawar Shrikant ◽

Stanam Aditya ◽

Zhu Ying

Keyword(s):

Dna Sequence ◽

Sequence Alignment ◽

Graphics Processing Unit ◽

Central Processing Unit ◽

Processing Unit ◽

Central Processing ◽

Dna Sequence Alignment ◽

Specificity And Sensitivity ◽

Alignment Tool ◽

Graphics Processing


Parallel Ellipsoid Collision Detection With Applications in Contact Dynamics

Volume 3: 30th Computers and Information in Engineering Conference, Parts A and B ◽

10.1115/detc2010-29073 ◽

2010 ◽

Cited By ~ 2

Author(s):

Arman Pazouki ◽

Hammad Mazhar ◽

Dan Negrut

Keyword(s):

Collision Detection ◽

Graphics Processing Unit ◽

Contact Dynamics ◽

Contact Detection ◽

Processing Unit ◽

Depth Of Penetration ◽

Central Processing ◽

Dynamics Of Multibody Systems ◽

Graphics Processing ◽

Contact Parameters

This work concentrates on the contact detection of ellipsoids, an enhancement to collision detection which can be used to study the dynamics of multibody systems with frictional contact. A first method for contact detection is posed as an unconstrained optimization problem. This method, while computationally demanding, can find the contact parameters as well as determine the contact state of two ellipsoids. Next, a method is presented that is approximately two orders of magnitude more efficient in finding the contact state of two ellipsoids. However, it cannot find contact parameters such as the contact normal or the depth of penetration. Finally, a parallel algorithm for the ellipsoid contact detection problem is presented. The algorithm is implemented on a ubiquitous Graphics Processing Unit (GPU) card and shown to achieve a speedup of up to 70× over a Central Processing Unit (CPU) based method. The proposed methodology is expected to have an impact in granular flow dynamics applications.


Implementation of Membrane Algorithms on GPU

Journal of Applied Mathematics ◽

10.1155/2014/307617 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 3

Author(s):

Xingyi Zhang ◽

Bangju Wang ◽

Zhuanlian Ding ◽

Jin Tang ◽

Juanjuan He

Keyword(s):

Graphics Processing Unit ◽

Processing Unit ◽

Matching Problem ◽

Computing Device ◽

Central Processing ◽

New Class ◽

Intractable Problems ◽

Point Set ◽

Graphics Processing ◽

Gpu Implementation

Membrane algorithms are a new class of parallel algorithms that attempt to incorporate some components of membrane computing models, such as the structure of the models and the way cells communicate, into the design of efficient optimization algorithms. Although the importance of parallelism in such algorithms has been well recognized, membrane algorithms have usually been implemented on a serial computing device, the central processing unit (CPU), which prevents the algorithms from working efficiently. In this work, we consider the implementation of membrane algorithms on a parallel computing device, the graphics processing unit (GPU). In such an implementation, all cells of a membrane algorithm can work simultaneously. Experimental results on two classical intractable problems, the point set matching problem and TSP, show that the GPU implementation of membrane algorithms is much more efficient than the CPU implementation in terms of runtime, especially for problems of high complexity.


GPU Collision Detection Using Spatial Subdivision With Applications in Contact Dynamics

Volume 4: 7th International Conference on Multibody Systems, Nonlinear Dynamics, and Control, Parts A, B and C ◽

10.1115/detc2009-86366 ◽

2009 ◽

Author(s):

Hammad Mazhar

Keyword(s):

Collision Detection ◽

Graphics Processing Unit ◽

Detection Algorithm ◽

Contact Dynamics ◽

Processing Unit ◽

Central Processing ◽

Dynamics Of Multibody Systems ◽

Wide Range ◽

Particle Hydrodynamics ◽

Massively Parallel Computing

This work concentrates on the issue of rigid body collision detection, a critical component of any software package employed to approximate the dynamics of multibody systems with frictional contact. This paper presents a scalable collision detection algorithm designed for massively parallel computing architectures. The proposed approach is implemented on a ubiquitous Graphics Processing Unit (GPU) card and shown to achieve a 40x speedup over state-of-the-art Central Processing Unit (CPU) implementations when handling multi-million object collision detection. GPUs are composed of many (on the order of hundreds) scalar processors that can simultaneously execute an operation; this strength is leveraged in the proposed algorithm. The approach can detect collisions between five million objects in less than two seconds; with newer GPUs, the capability of detecting collisions between eighty million objects in less than thirty seconds is expected. The proposed methodology is expected to have an impact on a wide range of granular flow dynamics and smoothed particle hydrodynamics applications, e.g. sand, gravel and fluid simulations, where the number of contacts can reach into the hundreds of millions.
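
The general idea of spatial subdivision can be illustrated with a small CPU-side sketch. It is a simplified variant and not the paper's GPU algorithm: it handles spheres only, bins each sphere by its center into a uniform grid whose cell edge equals the largest diameter, and then tests only pairs in the same or adjacent cells. The function names and the sample spheres are hypothetical, and all radii are assumed to be positive.

```python
from collections import defaultdict
from itertools import product


def overlaps(a, b):
    """True if two spheres (x, y, z, radius) interpenetrate."""
    dx, dy, dz = a[0] - b[0], a[1] - b[1], a[2] - b[2]
    reach = a[3] + b[3]
    return dx * dx + dy * dy + dz * dz < reach * reach


def find_collisions(spheres):
    """Return the sorted list of colliding index pairs (i, j) with i < j."""
    if not spheres:
        return []
    # Cell edge >= largest diameter, so colliding centers lie in neighboring cells.
    cell = 2.0 * max(s[3] for s in spheres)
    grid = defaultdict(list)
    for idx, (x, y, z, _) in enumerate(spheres):
        grid[(int(x // cell), int(y // cell), int(z // cell))].append(idx)
    pairs = set()
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and overlaps(spheres[i], spheres[j]):
                        pairs.add((i, j))
    return sorted(pairs)


spheres = [
    (0.0, 0.0, 0.0, 1.0),
    (1.5, 0.0, 0.0, 1.0),
    (5.0, 5.0, 5.0, 1.0),
    (5.0, 6.5, 5.0, 0.6),
    (-3.0, 0.0, 0.0, 0.5),
]
collisions = find_collisions(spheres)
print(collisions)
```

Binning reduces the number of pair tests from all pairs to pairs in neighboring cells, and both the binning step and the per-cell tests are independent across objects, which is the property a massively parallel implementation relies on.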
