GPU Computing
Recently Published Documents


TOTAL DOCUMENTS: 438 (five years: 87)
H-INDEX: 27 (five years: 5)

Energies, 2022, Vol. 15 (2), pp. 474
Author(s): Dong-Ki Kang, Ki-Beom Lee, Young-Chon Kim

Expanding the scale of GPU-based deep learning (DL) clusters accelerates AI services but also incurs significant energy costs. In this paper, we propose a cost-efficient deep learning job allocation (CE-DLA) approach that minimizes the energy cost of DL cluster operation while guaranteeing the performance requirements of user requests. We first categorize DL jobs into two classes, training jobs and inference jobs. Through architecture-agnostic modeling, the CE-DLA approach can precisely map heterogeneous DL jobs to GPU computing nodes. Second, we design an electricity-price-aware DL job allocation scheme that minimizes the energy cost of the cluster. Using a mixed-integer nonlinear programming (MINLP) formulation, our approach efficiently avoids scheduling work on GPU computing nodes during peak-rate time slots. We additionally integrate the dynamic right-sizing (DRS) method with CE-DLA to minimize the energy consumption of idle nodes that have no running jobs. To investigate the realistic behavior of our approach, we take measurements from NVIDIA GPU devices running well-known deep neural network (DNN) models. Given real electricity-price trace data, we show that CE-DLA outperforms competing approaches in terms of both energy cost and DL job processing performance.
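
The abstract describes, but does not reproduce, the MINLP formulation. As a rough illustration of electricity-price-aware allocation only (this is not the authors' CE-DLA algorithm), the following Python sketch greedily places each job on the cheapest feasible node under a hypothetical time-of-use tariff; all node, job, and tariff parameters are invented for the example.

```python
# Toy electricity-price-aware job allocation. NOT the CE-DLA MINLP from the
# paper: a greedy heuristic with hypothetical parameters, for illustration.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    power_kw: float        # average power draw while busy (kW)
    throughput: float      # samples (training) or queries (inference) per second
    busy_until: float = 0  # time at which the node becomes free (s)

@dataclass
class Job:
    name: str
    work: float            # total samples or queries to process
    deadline: float        # latest allowed completion time (s)

def price(t_sec: float) -> float:
    """Hypothetical time-of-use tariff ($/kWh) with an evening peak window."""
    hour = (t_sec / 3600.0) % 24
    return 0.30 if 17 <= hour < 21 else 0.12

def allocate(jobs, nodes):
    """Greedily place each job on the node with the lowest energy cost
    among those that can still meet the job's deadline."""
    plan = {}
    for job in sorted(jobs, key=lambda j: j.deadline):
        best = None
        for node in nodes:
            start = node.busy_until
            runtime = job.work / node.throughput
            if start + runtime > job.deadline:
                continue  # performance requirement would be violated
            energy_kwh = node.power_kw * runtime / 3600.0
            cost = energy_kwh * price(start)
            if best is None or cost < best[0]:
                best = (cost, node, start + runtime)
        if best is None:
            raise RuntimeError(f"no feasible node for {job.name}")
        cost, node, finish = best
        node.busy_until = finish
        plan[job.name] = (node.name, round(cost, 4))
    return plan

if __name__ == "__main__":
    nodes = [Node("v100", 0.30, 900.0), Node("t4", 0.07, 250.0)]
    jobs = [Job("train-a", 1.8e6, 7200.0), Job("infer-b", 2.0e5, 1800.0)]
    print(allocate(jobs, nodes))
```

A real formulation would optimize all placements jointly over time slots (hence the MINLP), whereas this greedy pass only conveys the cost trade-off between fast, power-hungry nodes and slow, efficient ones.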


2021, pp. 106-109
Author(s): Denis Kravchuk

The optical contrast between different blood particles allows optoacoustic imaging to visualize their distribution (e.g., erythrocytes, with sensitivity to oxygen saturation) and to monitor the delivery of drugs to organs through blood vessels. An algorithm for computing the ultrasonic field produced by the optoacoustic interaction has been developed and accelerated on a GPU board. An architecture for fast reconstruction of the optoacoustic signal based on graphics processing unit (GPU) programming is proposed. Combined with the pre-migration method, the algorithm improves the resolution and sharpness of optoacoustic images of simulated biological tissues. Thanks to the GPU computing architecture, the time-consuming computations that would otherwise run on the central processing unit (CPU) are accelerated with high computational efficiency.
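
The abstract does not detail the reconstruction algorithm, so as a generic point of reference the sketch below implements plain delay-and-sum backprojection, a standard optoacoustic reconstruction scheme. Written in array form, the identical code runs on an NVIDIA GPU by swapping NumPy for CuPy; the grid sizes, sampling rate, and sound speed are illustrative assumptions.

```python
import numpy as xp  # swap for "import cupy as xp" to run on an NVIDIA GPU

def delay_and_sum(signals, sensor_x, pixels_x, pixels_z, c=1540.0, fs=40e6):
    """Backproject sensor traces onto an image grid.
    signals:  (n_sensors, n_samples) recorded pressure traces
    sensor_x: (n_sensors,) lateral positions of a linear array at z = 0 (m)
    Returns an image of shape (nz, nx)."""
    n_sensors, n_samples = signals.shape
    X = pixels_x[None, :]                # (1, nx)
    Z = pixels_z[:, None]                # (nz, 1)
    image = xp.zeros((pixels_z.size, pixels_x.size))
    for s in range(n_sensors):
        dist = xp.sqrt((X - sensor_x[s]) ** 2 + Z ** 2)       # time of flight
        idx = xp.clip((dist / c * fs).astype(xp.int64), 0, n_samples - 1)
        image += signals[s, idx]         # gather each pixel's sample
    return image

# Tiny synthetic test: one point absorber mid-field.
fs, c = 40e6, 1540.0
sensor_x = xp.linspace(-0.01, 0.01, 64)
t = xp.arange(2048) / fs
src = (0.0, 0.015)                       # absorber at x = 0, z = 15 mm
tof = xp.sqrt((sensor_x - src[0]) ** 2 + src[1] ** 2) / c
signals = xp.exp(-((t[None, :] - tof[:, None]) * fs / 2) ** 2)  # Gaussian pulses
img = delay_and_sum(signals, sensor_x,
                    xp.linspace(-0.01, 0.01, 128), xp.linspace(0.005, 0.025, 128))
print(img.argmax() // 128, img.argmax() % 128)  # brightest pixel near the source
```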


2021
Author(s): Bin Yang, William Miller

Tissue perfusion properties reveal crucial information for clinical diagnosis and treatment. Multispectral spatial frequency domain imaging (SFDI) is an emerging imaging technique widely used to quantify tissue perfusion, but slow processing speed limits its usefulness in real-time imaging applications. In this study, we present a two-stage look-up table (LUT) approach that rapidly and accurately quantifies optical properties (absorption and reduced-scattering maps) with a stage-1 LUT and perfusion properties (total hemoglobin and oxygen saturation maps) with a stage-2 LUT, based on reflectance images at 660 nm and 850 nm. The two-stage LUT can be implemented on both CPU and GPU computing platforms. Quantifying tissue perfusion properties from simulated diffuse reflectance images, we achieved 266, 174, and 74 frames per second for image sizes of 512×512, 1024×1024, and 2048×2048 pixels, respectively. Quantification was highly accurate, with only 3.5% error for total hemoglobin and 2.5% for oxygen saturation. The two-stage LUT has the potential to be adopted in existing SFDI applications to enable real-time imaging of tissue hemodynamics.
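
A LUT-based quantifier is fast because each pixel reduces to an independent table gather, which vectorizes on a CPU and maps directly onto a GPU. The sketch below shows only the two-stage indexing pattern; the table contents, grids, and value ranges are placeholders, not the authors' published LUTs.

```python
# Hypothetical two-stage LUT indexing. Table contents are random filler:
# in practice stage 1 is precomputed from a light-transport model and
# stage 2 from hemoglobin extinction spectra.
import numpy as np

N = 256
rng = np.random.default_rng(0)
stage1 = rng.random((N, N, 2), dtype=np.float32)  # [..., 0]=mu_a, [..., 1]=mu_s'
stage2 = rng.random((N, N, 2), dtype=np.float32)  # [..., 0]=THb,  [..., 1]=StO2

def lut_index(values, lo, hi, n):
    """Map physical values onto nearest LUT bin indices."""
    idx = np.rint((values - lo) / (hi - lo) * (n - 1)).astype(np.int64)
    return np.clip(idx, 0, n - 1)

def stage1_lookup(r_dc, r_ac):
    """Diffuse reflectance pair -> (mu_a, mu_s'), one gather per pixel."""
    i = lut_index(r_dc, 0.0, 1.0, N)
    j = lut_index(r_ac, 0.0, 1.0, N)
    props = stage1[i, j]
    return props[..., 0], props[..., 1]

def stage2_lookup(mua_660, mua_850, mua_max=1.0):
    """Absorption at the two wavelengths -> (total hemoglobin, StO2)."""
    i = lut_index(mua_660, 0.0, mua_max, N)
    j = lut_index(mua_850, 0.0, mua_max, N)
    out = stage2[i, j]
    return out[..., 0], out[..., 1]

# Per-pixel lookups are independent, so the same gathers run on a GPU
# (e.g., with CuPy arrays) at real-time frame rates.
r_dc_660, r_ac_660 = rng.random((2, 512, 512), dtype=np.float32)
r_dc_850, r_ac_850 = rng.random((2, 512, 512), dtype=np.float32)
mua_660, _ = stage1_lookup(r_dc_660, r_ac_660)
mua_850, _ = stage1_lookup(r_dc_850, r_ac_850)
thb, sto2 = stage2_lookup(mua_660, mua_850)
print(thb.shape, sto2.shape)
```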


Author(s): A.V. Goncharsky, S.Y. Romanov, S.Y. Seryozhnikov

This paper is concerned with the implementation of wave tomography algorithms on modern SIMD CPU and GPU computing platforms. Wave tomography, a field currently under active development, requires powerful computing resources. Its main applications are medical imaging, nondestructive testing, and seismic studies, and practical deployment depends on the available computing hardware. Tomographic image reconstruction via wave tomography involves solving coefficient inverse problems for the wave equation. Such problems can be solved using iterative gradient-based methods, which rely on repeated numerical simulation of the wave propagation process. In this study, the finite-difference time-domain (FDTD) method is employed for wave simulation. The paper discusses the software implementation of the algorithms and compares the performance of various computing devices: multi-core Intel and ARM-based CPUs and NVIDIA graphics processors.
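
The core computational kernel here, repeated FDTD wave simulation, is easy to state concretely. The following is a minimal 2-D scalar-wave FDTD sketch (not the authors' tomography code), with illustrative grid and medium parameters; the same vectorized update runs on a GPU by importing CuPy in place of NumPy.

```python
# Minimal 2-D scalar-wave FDTD with a second-order leapfrog update.
# Swap for "import cupy as xp" to run the identical code on a GPU.
import numpy as xp

nx = nz = 400
c = 1500.0                      # wave speed (m/s), homogeneous for simplicity
dx = 1e-3                       # grid step (m)
dt = 0.4 * dx / c               # time step satisfying the CFL condition
r2 = (c * dt / dx) ** 2

u_prev = xp.zeros((nz, nx))
u_curr = xp.zeros((nz, nx))
u_curr[nz // 2, nx // 2] = 1.0  # initial point excitation

for step in range(500):
    # Five-point Laplacian over interior points.
    lap = (u_curr[:-2, 1:-1] + u_curr[2:, 1:-1] +
           u_curr[1:-1, :-2] + u_curr[1:-1, 2:] -
           4.0 * u_curr[1:-1, 1:-1])
    u_next = xp.zeros_like(u_curr)  # zero (Dirichlet) boundaries
    u_next[1:-1, 1:-1] = (2.0 * u_curr[1:-1, 1:-1]
                          - u_prev[1:-1, 1:-1] + r2 * lap)
    u_prev, u_curr = u_curr, u_next

print(float(xp.abs(u_curr).max()))
```

A gradient-based tomography solver would run this forward model (and its adjoint) many times per iteration, which is why the device-level performance comparison in the paper matters.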


2021
Author(s): Zhenyun Tang, Xiaohui Dong, Zhenbao Li, Xiuli Du

By combining a physical experiment with numerical simulation, real-time hybrid simulation (RTHS) can enlarge the dimensions of testing specimens and improve testing accuracy. However, owing to limited computing capacity, the numerical substructure in reported RTHS tests has been restricted to fewer than 7,000 degrees of freedom, which cannot meet the requirements for evaluating the dynamic performance of large, complex engineering structures. Taking advantage of the Parallel Computing Toolbox (PCT) in MATLAB and the high-performance computing capability of graphics processing units (GPUs), an RTHS framework based on MATLAB and GPU computing was established in this work. Using this framework, a soil-structure interaction (SSI) system was tested by shaking-table-based RTHS, and the dynamic response of the SSI system was also simulated by finite element analysis. The agreement between simulation and testing results demonstrates that the proposed framework can successfully implement RTHS. With this method, the numerical substructure can reach 27,000 degrees of freedom, which significantly enhances the capacity of RTHS testing for large and complex engineering structures.
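
The authors' framework is built on MATLAB's Parallel Computing Toolbox. As a language-neutral illustration of why an explicitly time-stepped numerical substructure parallelizes well, the toy Python sketch below integrates a shear-building chain sized to the paper's reported 27,000 degrees of freedom; the model, stiffness, and excitation are invented for the example, and with CuPy the same vectorized update runs on a GPU.

```python
# Toy numerical-substructure step loop (a fixed-base shear chain), NOT the
# authors' MATLAB/PCT framework. With "import cupy as xp" the identical
# vectorized update runs on a GPU, which is what lets the DOF count grow.
import numpy as xp

n = 27_000                      # degrees of freedom, the scale the paper reports
m = 1.0                         # lumped mass per DOF (kg)
k = 4.0e6                       # inter-story stiffness (N/m)
dt = 1.0e-4                     # explicit time step (s), inside stability limit

u = xp.zeros(n)                 # displacements relative to the ground
v = xp.zeros(n)                 # velocities

def restoring_force(u):
    """K @ u for the chain, written with slices instead of a sparse matrix."""
    f = xp.empty_like(u)
    f[0] = k * (2 * u[0] - u[1])
    f[1:-1] = k * (-u[:-2] + 2 * u[1:-1] - u[2:])
    f[-1] = k * (u[-1] - u[-2])
    return f

for step in range(1000):
    ground_acc = 0.5 * xp.sin(2 * xp.pi * 2.0 * step * dt)  # base excitation
    a = (-restoring_force(u) - m * ground_acc) / m
    v += dt * a                 # semi-implicit (symplectic) Euler update
    u += dt * v

print(float(u[-1]))             # top-of-chain displacement after 0.1 s
```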


2021, Vol. 150 (4), pp. A94-A94
Author(s): Connor N. Kaplan, Jack D. Gabriel, Adrien David-Sivelle, Whitney L. Coyle

Sensors, 2021, Vol. 21 (17), pp. 5916
Author(s): Diego Romano, Marco Lapegna

Image coregistration for InSAR processing is a time-consuming procedure that is usually run in batch mode. With the availability of low-energy GPU accelerators, processing at the edge is now a promising prospect. Starting from the identification of the most computationally intensive kernels in existing algorithms, we decomposed the cross-correlation problem from a multilevel point of view, aiming to design and implement an efficient GPU-parallel algorithm for multiple settings, including edge computing. We analyzed the accuracy and performance of the proposed algorithm, also considering power efficiency, and its applicability to the identified settings. Results show that a significant speedup of InSAR processing is possible by exploiting GPU computing in different scenarios with no loss of accuracy, also enabling onboard processing on SoC hardware.
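
The paper's multilevel GPU-parallel decomposition is not reproduced here, but the underlying kernel, patch cross-correlation, can be illustrated. The sketch below estimates a coarse integer offset via FFT-based phase correlation; swapping NumPy for CuPy moves the FFTs to the GPU. The patch size and test shift are arbitrary choices for the example.

```python
# FFT-based phase correlation for coarse image coregistration (illustration,
# not the paper's algorithm). "import cupy as xp" runs the FFTs on a GPU.
import numpy as xp

def coarse_offset(master, slave):
    """Return the integer (row, col) shift s such that master ~ roll(slave, s)."""
    F1 = xp.fft.fft2(master)
    F2 = xp.fft.fft2(slave)
    cross = F1 * xp.conj(F2)
    cross /= xp.abs(cross) + 1e-12          # normalize -> phase correlation
    corr = xp.fft.ifft2(cross).real
    peak = xp.unravel_index(xp.argmax(corr), corr.shape)
    # Wrap peaks beyond half the patch into negative shifts.
    rows, cols = master.shape
    dr = int(peak[0]) if peak[0] <= rows // 2 else int(peak[0]) - rows
    dc = int(peak[1]) if peak[1] <= cols // 2 else int(peak[1]) - cols
    return dr, dc

# Synthetic check: shift an image by (5, -3) and recover the offset.
rng = xp.random.default_rng(0)
img = rng.standard_normal((256, 256))
shifted = xp.roll(img, (5, -3), axis=(0, 1))
print(coarse_offset(shifted, img))          # expect (5, -3)
```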


Author(s): Igor Sfiligoi, Shava Smallen, Frank Wurthwein, Nicole Wolter, David Schultz, ...
