scholarly journals Modular Microservice based GPU Utilization Manager with Gunicorn

:Graphics processing unit (GPU) is a computer programmable chip that could perform rapid mathematical operations that can be accelerated with massive parallelism. In the early days, central processing unit (CPU) was responsible for all computations irrespective of whether it is feasible for parallel computation. However, in recent years GPUs are increasingly used for massively parallel computing applications, such as training Deep Neural Networks. GPU’s performance monitoring plays a key role in this new era since GPUs serve an inevitable role in increasing the speed of analysis of the developed system. GPU administration comes in picture to efficiently utilize the GPU when we deal with multiple workloads to run on the same hardware. In this study, various GPUparameters are monitored and help to keep them in safe levels and also to keep the improved performance of the system. This study,

Author(s):  
Wisoot Sanhan ◽  
Kambiz Vafai ◽  
Niti Kammuang-Lue ◽  
Pradit Terdtoon ◽  
Phrut Sakulchangsatjatai

Abstract An investigation of the effect of the thermal performance of the flattened heat pipe on its double heat sources acting as central processing unit and graphics processing unit in laptop computers is presented in this work. A finite element method is used for predicting the flattening effect of the heat pipe. The cylindrical heat pipe with a diameter of 6 mm and the total length of 200 mm is flattened into three final thicknesses of 2, 3, and 4 mm. The heat pipe is placed under a horizontal configuration and heated with heater 1 and heater 2, 40 W in combination. The numerical model shows good agreement compared with the experimental data with the standard deviation of 1.85%. The results also show that flattening the cylindrical heat pipe to 66.7 and 41.7% of its original diameter could reduce its normalized thermal resistance by 5.2%. The optimized final thickness or the best design final thickness for the heat pipe is found to be 2.5 mm.


Author(s):  
D. A. Kalina ◽  
R. V. Golovanov ◽  
D. V. Vorotnev

We present the monocamera approach of static hand gestures recognition based on skeletonization. The problem of creating skeleton of the human’s hand, as well as body, became solvable a few years ago after inventing so called convolutional pose machines – the novel architecture of artificial neural network. Our solution uses such kind of pretrained convolutional artificial network for extracting hand joints keypoints with further skeleton reconstruction. In this work we also propose special skeleton descriptor with proving its stability and distinguishability in terms of classification. We considered a few widespread machine learning algorithms to build and verify different classifiers. The quality of the classifier’s recognition is estimated using the wellknown Accuracy metric, which identified that classical SVM (Support Vector Machines) with radial basis kernel gives the best results. The testing of the whole system was conducted using public databases containing about 3000 of test images for more than 10 types of gestures. The results of a comparative analysis of the proposed system with existing approaches are demonstrated. It is shown that our gesture recognition system provides better quality in comparison with existing solutions. The performance of the proposed system was estimated for two configurations of standard personal computer: with CPU (Central Processing Unit) only and with GPU (Graphics Processing Unit) in addition where the latest one provides realtime processing with up to 60 frames per second. Thus we demonstrate that the proposed approach can find an application in the practice.


Author(s):  
Liam Dunn ◽  
Patrick Clearwater ◽  
Andrew Melatos ◽  
Karl Wette

Abstract The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17:2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.


2012 ◽  
Vol 53 ◽  
Author(s):  
Beatričė Andziulienė ◽  
Evaldas Žulkas ◽  
Audrius Kuprinavičius

In this work Fast Fourier transformation algorithm for general purpose graphics processing unit processing (GPGPU) is discussed. Algorithm structure and individual stages performance were analysed. With performance analysis method algorithm distribution and data allocation possibilities were determined, depending on algorithm stages execution speed and algorithm structure. Ratio between CPU and GPU execution during Fast Fourier transform signal processing was determined using computer-generated data with frequency. When adopting CPU code for CUDA execution, it not becomes more complex, even if stream procesor parallelization and data transfering algorith stages are considered. But central processing unit serial execution).


2013 ◽  
Author(s):  
Roussian R. A. Gaioso ◽  
Walid A. R. Jradi ◽  
Lauro C. M. de Paula ◽  
Wanderley De S. Alencar ◽  
Wellington S. Martins ◽  
...  

Este artigo apresenta uma implementação paralela baseada em Graphics Processing Unit (GPU) para o problema da identificação dos caminhos mínimos entre todos os pares de vértices em um grafo. A implementação é baseada no algoritmo Floyd-Warshall e tira o máximo proveito da arquitetura multithreaded das GPUs atuais. Nossa solução reduz a comunicação entre a Central Processing Unit (CPU) e a GPU, melhora a utilização dos Streaming Multiprocessors (SMs) e faz um uso intensivo de acesso aglutinado em memória para otimizar o acesso de dados do grafo. A vantagem da implementação proposta é demonstrada por vários grafos gerados aleatoriamente utilizando a ferramenta GTgraph. Grafos contendo milhares de vértices foram gerados e utilizados nos experimentos. Os resultados mostraram um excelente desempenho em diversos grafos, alcançando ganhos de até 149x, quando comparado com uma implementação sequencial, e superando implementações tradicionais por um fator de quase quatro vezes. Nossos resultados confirmam que implementações baseadas em GPU podem ser viáveis mesmo para algoritmos de grafos cujo acessos à memória e distribuição de trabalho são irregulares e causam dependência de dados.


Electronics ◽  
2019 ◽  
Vol 8 (3) ◽  
pp. 257 ◽  
Author(s):  
Talgat Manglayev ◽  
Refik Kizilirmak ◽  
Nor Hamid

Non-orthogonal multiple access (NOMA) is a candidate multiple access scheme for the fifth-generation (5G) cellular networks. In NOMA systems, all users operate at the same frequency and time, which poses a challenge in the decoding process at the receiver side. In this work, the two most popular receiver structures, successive interference cancellation (SIC) and parallel interference cancellation (PIC) receivers, for NOMA reverse channel are implemented on a graphics processing unit (GPU) and compared. Orthogonal frequency division multiplexing (OFDM) is considered. The high computational complexity of interference cancellation receivers undermines the potential deployment of NOMA systems. GPU acceleration, however, challenges this weakness, and our numerical results show speedups of about from 75–220-times as compared to a multi-thread implementation on a central processing unit (CPU). SIC and PIC multi-thread execution time on different platforms reveals the potential of GPU in wireless communications. Furthermore, the successful decoding rates of the SIC and PIC are evaluated and compared in terms of bit error rate.


Author(s):  
Arman Pazouki ◽  
Hammad Mazhar ◽  
Dan Negrut

This work concentrates on the contact detection of ellipsoids, an enhancement to collision detection which can be used to study the dynamics of multibody systems with frictional contact. A first method for contact detection is posed as an unconstrained optimization problem. This method, while computationally demanding, can find the contact parameters as well as determine the state of the contact of two ellipsoids. Next, a method is presented that is approximately two orders of magnitudes more efficient in finding the contact state of two ellipsoids. However, it cannot find the contact parameters such as contact normal, depth of penetration, etc. Finally, a parallel algorithm for the ellipsoid contact detection problem is presented. The algorithm is implemented on a ubiquitous Graphics Processing Unit (GPU) card and shown to achieve a speedup of up to 70× over a Central Processing Unit (CPU) based method. The proposed methodology is expected to have an impact in granular flow dynamics applications.


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Xingyi Zhang ◽  
Bangju Wang ◽  
Zhuanlian Ding ◽  
Jin Tang ◽  
Juanjuan He

Membrane algorithms are a new class of parallel algorithms, which attempt to incorporate some components of membrane computing models for designing efficient optimization algorithms, such as the structure of the models and the way of communication between cells. Although the importance of the parallelism of such algorithms has been well recognized, membrane algorithms were usually implemented on the serial computing device central processing unit (CPU), which makes the algorithms unable to work in an efficient way. In this work, we consider the implementation of membrane algorithms on the parallel computing device graphics processing unit (GPU). In such implementation, all cells of membrane algorithms can work simultaneously. Experimental results on two classical intractable problems, the point set matching problem and TSP, show that the GPU implementation of membrane algorithms is much more efficient than CPU implementation in terms of runtime, especially for solving problems with a high complexity.


Author(s):  
Hammad Mazhar

This work concentrates on the issue of rigid body collision detection, a critical component of any software package employed to approximate the dynamics of multibody systems with frictional contact. This paper presents a scalable collision detection algorithm designed for massively parallel computing architectures. The approach proposed is implemented on a ubiquitous Graphics Processing Unit (GPU) card and shown to achieve a 40x speedup over state-of-the art Central Processing Unit (CPU) implementations when handling multi-million object collision detection. GPUs are composed of many (on the order of hundreds) scalar processors that can simultaneously execute an operation; this strength is leveraged in the proposed algorithm. The approach can detect collisions between five million objects in less than two seconds; with newer GPUs, the capability of detecting collisions between eighty million objects in less than thirty seconds is expected. The proposed methodology is expected to have an impact on a wide range of granular flow dynamics and smoothed particle hydrodynamics applications, e.g. sand, gravel and fluid simulations, where the number of contacts can reach into the hundreds of millions.


Sign in / Sign up

Export Citation Format

Share Document