GPU Computing
Recently Published Documents


TOTAL DOCUMENTS: 438 (five years: 87)
H-INDEX: 27 (five years: 5)

Energies, 2022, Vol. 15 (2), pp. 474
Author(s): Dong-Ki Kang, Ki-Beom Lee, Young-Chon Kim

Expanding the scale of GPU-based deep learning (DL) clusters accelerates AI services but also incurs significant energy costs. In this paper, we propose a cost-efficient deep learning job allocation (CE-DLA) approach that minimizes the energy cost of DL cluster operation while guaranteeing the performance requirements of user requests. We first categorize DL jobs into two classes, training jobs and inference jobs. Through architecture-agnostic modeling, the CE-DLA approach can precisely map heterogeneous DL jobs to GPU computing nodes. Second, we design an electricity-price-aware DL job allocation scheme that minimizes the energy cost of the cluster. Using a mixed-integer nonlinear programming (MINLP) formulation, our approach efficiently avoids scheduling work on GPU computing nodes during peak-rate time slots. We additionally integrate the dynamic right-sizing (DRS) method with CE-DLA to minimize the energy consumption of idle nodes that have no running jobs. To investigate the realistic behavior of our approach, we take measurements from NVIDIA GPU devices running well-known deep neural network (DNN) models. Given real electricity-price trace data, we show that CE-DLA outperforms competing approaches in terms of both energy cost and DL job processing performance.
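
The abstract describes, but does not reproduce, the MINLP formulation. As a rough illustration of electricity-price-aware allocation only (this is not the authors' CE-DLA algorithm), the following Python sketch greedily places each job on the cheapest feasible node under a hypothetical time-of-use tariff; all node, job, and tariff parameters are invented for the example.

```python
# Toy electricity-price-aware job allocation. NOT the CE-DLA MINLP from the
# paper: a greedy heuristic with hypothetical parameters, for illustration.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    power_kw: float        # average power draw while busy (kW)
    throughput: float      # samples (training) or queries (inference) per second
    busy_until: float = 0  # time at which the node becomes free (s)

@dataclass
class Job:
    name: str
    work: float            # total samples or queries to process
    deadline: float        # latest allowed completion time (s)

def price(t_sec: float) -> float:
    """Hypothetical time-of-use tariff ($/kWh) with an evening peak window."""
    hour = (t_sec / 3600.0) % 24
    return 0.30 if 17 <= hour < 21 else 0.12

def allocate(jobs, nodes):
    """Greedily place each job on the node with the lowest energy cost
    among those that can still meet the job's deadline."""
    plan = {}
    for job in sorted(jobs, key=lambda j: j.deadline):
        best = None
        for node in nodes:
            start = node.busy_until
            runtime = job.work / node.throughput
            if start + runtime > job.deadline:
                continue  # performance requirement would be violated
            energy_kwh = node.power_kw * runtime / 3600.0
            cost = energy_kwh * price(start)
            if best is None or cost < best[0]:
                best = (cost, node, start + runtime)
        if best is None:
            raise RuntimeError(f"no feasible node for {job.name}")
        cost, node, finish = best
        node.busy_until = finish
        plan[job.name] = (node.name, round(cost, 4))
    return plan

if __name__ == "__main__":
    nodes = [Node("v100", 0.30, 900.0), Node("t4", 0.07, 250.0)]
    jobs = [Job("train-a", 1.8e6, 7200.0), Job("infer-b", 2.0e5, 1800.0)]
    print(allocate(jobs, nodes))
```

A real formulation would optimize all placements jointly over time slots (hence the MINLP), whereas this greedy pass only conveys the cost trade-off between fast, power-hungry nodes and slow, efficient ones.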


2021, pp. 106-109
Author(s): Denis Kravchuk

The optical contrast between different blood particles allows optoacoustic imaging to visualize their distribution (e.g., erythrocytes, with sensitivity to oxygen saturation) and to monitor the delivery of drugs to organs through blood vessels. An algorithm for computing the ultrasonic field produced by the optoacoustic interaction has been developed and accelerated on a GPU board. An architecture for fast reconstruction of the optoacoustic signal based on graphics processing unit (GPU) programming is proposed. Combined with the pre-migration method, the algorithm improves the resolution and sharpness of optoacoustic images of simulated biological tissues. Thanks to the GPU computing architecture, the time-consuming computations that would otherwise run on the central processing unit (CPU) are accelerated with high computational efficiency.
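
The abstract does not detail the reconstruction algorithm, so as a generic point of reference the sketch below implements plain delay-and-sum backprojection, a standard optoacoustic reconstruction scheme. Written in array form, the identical code runs on an NVIDIA GPU by swapping NumPy for CuPy; the grid sizes, sampling rate, and sound speed are illustrative assumptions.

```python
import numpy as xp  # swap for "import cupy as xp" to run on an NVIDIA GPU

def delay_and_sum(signals, sensor_x, pixels_x, pixels_z, c=1540.0, fs=40e6):
    """Backproject sensor traces onto an image grid.
    signals:  (n_sensors, n_samples) recorded pressure traces
    sensor_x: (n_sensors,) lateral positions of a linear array at z = 0 (m)
    Returns an image of shape (nz, nx)."""
    n_sensors, n_samples = signals.shape
    X = pixels_x[None, :]                # (1, nx)
    Z = pixels_z[:, None]                # (nz, 1)
    image = xp.zeros((pixels_z.size, pixels_x.size))
    for s in range(n_sensors):
        dist = xp.sqrt((X - sensor_x[s]) ** 2 + Z ** 2)       # time of flight
        idx = xp.clip((dist / c * fs).astype(xp.int64), 0, n_samples - 1)
        image += signals[s, idx]         # gather each pixel's sample
    return image

# Tiny synthetic test: one point absorber mid-field.
fs, c = 40e6, 1540.0
sensor_x = xp.linspace(-0.01, 0.01, 64)
t = xp.arange(2048) / fs
src = (0.0, 0.015)                       # absorber at x = 0, z = 15 mm
tof = xp.sqrt((sensor_x - src[0]) ** 2 + src[1] ** 2) / c
signals = xp.exp(-((t[None, :] - tof[:, None]) * fs / 2) ** 2)  # Gaussian pulses
img = delay_and_sum(signals, sensor_x,
                    xp.linspace(-0.01, 0.01, 128), xp.linspace(0.005, 0.025, 128))
print(img.argmax() // 128, img.argmax() % 128)  # brightest pixel near the source
```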


2021
Author(s): Bin Yang, William Miller

Tissue perfusion properties reveal crucial information for clinical diagnosis and treatment. Multispectral spatial frequency domain imaging (SFDI) is an emerging imaging technique widely used to quantify tissue perfusion, but slow processing speed limits its usefulness in real-time imaging applications. In this study, we present a two-stage look-up table (LUT) approach that rapidly and accurately quantifies optical properties (absorption and reduced-scattering maps) with a stage-1 LUT and perfusion properties (total hemoglobin and oxygen saturation maps) with a stage-2 LUT, based on reflectance images at 660 nm and 850 nm. The two-stage LUT can be implemented on both CPU and GPU computing platforms. Quantifying tissue perfusion properties from simulated diffuse reflectance images, we achieved 266, 174, and 74 frames per second for image sizes of 512×512, 1024×1024, and 2048×2048 pixels, respectively. Quantification was highly accurate, with only 3.5% error for total hemoglobin and 2.5% for oxygen saturation. The two-stage LUT has the potential to be adopted in existing SFDI applications to enable real-time imaging of tissue hemodynamics.
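
A LUT-based quantifier is fast because each pixel reduces to an independent table gather, which vectorizes on a CPU and maps directly onto a GPU. The sketch below shows only the two-stage indexing pattern; the table contents, grids, and value ranges are placeholders, not the authors' published LUTs.

```python
# Hypothetical two-stage LUT indexing. Table contents are random filler:
# in practice stage 1 is precomputed from a light-transport model and
# stage 2 from hemoglobin extinction spectra.
import numpy as np

N = 256
rng = np.random.default_rng(0)
stage1 = rng.random((N, N, 2), dtype=np.float32)  # [..., 0]=mu_a, [..., 1]=mu_s'
stage2 = rng.random((N, N, 2), dtype=np.float32)  # [..., 0]=THb,  [..., 1]=StO2

def lut_index(values, lo, hi, n):
    """Map physical values onto nearest LUT bin indices."""
    idx = np.rint((values - lo) / (hi - lo) * (n - 1)).astype(np.int64)
    return np.clip(idx, 0, n - 1)

def stage1_lookup(r_dc, r_ac):
    """Diffuse reflectance pair -> (mu_a, mu_s'), one gather per pixel."""
    i = lut_index(r_dc, 0.0, 1.0, N)
    j = lut_index(r_ac, 0.0, 1.0, N)
    props = stage1[i, j]
    return props[..., 0], props[..., 1]

def stage2_lookup(mua_660, mua_850, mua_max=1.0):
    """Absorption at the two wavelengths -> (total hemoglobin, StO2)."""
    i = lut_index(mua_660, 0.0, mua_max, N)
    j = lut_index(mua_850, 0.0, mua_max, N)
    out = stage2[i, j]
    return out[..., 0], out[..., 1]

# Per-pixel lookups are independent, so the same gathers run on a GPU
# (e.g., with CuPy arrays) at real-time frame rates.
r_dc_660, r_ac_660 = rng.random((2, 512, 512), dtype=np.float32)
r_dc_850, r_ac_850 = rng.random((2, 512, 512), dtype=np.float32)
mua_660, _ = stage1_lookup(r_dc_660, r_ac_660)
mua_850, _ = stage1_lookup(r_dc_850, r_ac_850)
thb, sto2 = stage2_lookup(mua_660, mua_850)
print(thb.shape, sto2.shape)
```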


Author(s): A.V. Goncharsky, S.Y. Romanov, S.Y. Seryozhnikov

This paper is concerned with the implementation of wave tomography algorithms on modern SIMD CPU and GPU computing platforms. Wave tomography, a field currently under active development, requires powerful computing resources. Its main applications are medical imaging, nondestructive testing, and seismic studies, and practical deployment depends on the available computing hardware. Tomographic image reconstruction via wave tomography involves solving coefficient inverse problems for the wave equation. Such problems can be solved using iterative gradient-based methods, which rely on repeated numerical simulation of the wave propagation process. In this study, the finite-difference time-domain (FDTD) method is employed for wave simulation. The paper discusses the software implementation of the algorithms and compares the performance of various computing devices: multi-core Intel and ARM-based CPUs and NVIDIA graphics processors.
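
The core computational kernel here, repeated FDTD wave simulation, is easy to state concretely. The following is a minimal 2-D scalar-wave FDTD sketch (not the authors' tomography code), with illustrative grid and medium parameters; the same vectorized update runs on a GPU by importing CuPy in place of NumPy.

```python
# Minimal 2-D scalar-wave FDTD with a second-order leapfrog update.
# Swap for "import cupy as xp" to run the identical code on a GPU.
import numpy as xp

nx = nz = 400
c = 1500.0                      # wave speed (m/s), homogeneous for simplicity
dx = 1e-3                       # grid step (m)
dt = 0.4 * dx / c               # time step satisfying the CFL condition
r2 = (c * dt / dx) ** 2

u_prev = xp.zeros((nz, nx))
u_curr = xp.zeros((nz, nx))
u_curr[nz // 2, nx // 2] = 1.0  # initial point excitation

for step in range(500):
    # Five-point Laplacian over interior points.
    lap = (u_curr[:-2, 1:-1] + u_curr[2:, 1:-1] +
           u_curr[1:-1, :-2] + u_curr[1:-1, 2:] -
           4.0 * u_curr[1:-1, 1:-1])
    u_next = xp.zeros_like(u_curr)  # zero (Dirichlet) boundaries
    u_next[1:-1, 1:-1] = (2.0 * u_curr[1:-1, 1:-1]
                          - u_prev[1:-1, 1:-1] + r2 * lap)
    u_prev, u_curr = u_curr, u_next

print(float(xp.abs(u_curr).max()))
```

A gradient-based tomography solver would run this forward model (and its adjoint) many times per iteration, which is why the device-level performance comparison in the paper matters.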


2021
Author(s): Zhenyun Tang, Xiaohui Dong, Zhenbao Li, Xiuli Du

By combining a physical experiment with numerical simulation, real-time hybrid simulation (RTHS) can enlarge the dimensions of testing specimens and improve testing accuracy. However, owing to limited computing capacity, the numerical substructure in reported RTHS tests has been restricted to fewer than 7,000 degrees of freedom, which cannot meet the requirements for evaluating the dynamic performance of large, complex engineering structures. Taking advantage of the Parallel Computing Toolbox (PCT) in MATLAB and the high-performance computing capability of graphics processing units (GPUs), an RTHS framework based on MATLAB and GPU computing was established in this work. Using this framework, a soil-structure interaction (SSI) system was tested by shaking-table-based RTHS, and the dynamic response of the SSI system was also simulated by finite element analysis. The agreement between simulation and testing results demonstrates that the proposed framework can successfully implement RTHS. With this method, the numerical substructure can reach 27,000 degrees of freedom, which significantly enhances the capacity of RTHS testing for large and complex engineering structures.
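
The authors' framework is built on MATLAB's Parallel Computing Toolbox. As a language-neutral illustration of why an explicitly time-stepped numerical substructure parallelizes well, the toy Python sketch below integrates a shear-building chain sized to the paper's reported 27,000 degrees of freedom; the model, stiffness, and excitation are invented for the example, and with CuPy the same vectorized update runs on a GPU.

```python
# Toy numerical-substructure step loop (a fixed-base shear chain), NOT the
# authors' MATLAB/PCT framework. With "import cupy as xp" the identical
# vectorized update runs on a GPU, which is what lets the DOF count grow.
import numpy as xp

n = 27_000                      # degrees of freedom, the scale the paper reports
m = 1.0                         # lumped mass per DOF (kg)
k = 4.0e6                       # inter-story stiffness (N/m)
dt = 1.0e-4                     # explicit time step (s), inside stability limit

u = xp.zeros(n)                 # displacements relative to the ground
v = xp.zeros(n)                 # velocities

def restoring_force(u):
    """K @ u for the chain, written with slices instead of a sparse matrix."""
    f = xp.empty_like(u)
    f[0] = k * (2 * u[0] - u[1])
    f[1:-1] = k * (-u[:-2] + 2 * u[1:-1] - u[2:])
    f[-1] = k * (u[-1] - u[-2])
    return f

for step in range(1000):
    ground_acc = 0.5 * xp.sin(2 * xp.pi * 2.0 * step * dt)  # base excitation
    a = (-restoring_force(u) - m * ground_acc) / m
    v += dt * a                 # semi-implicit (symplectic) Euler update
    u += dt * v

print(float(u[-1]))             # top-of-chain displacement after 0.1 s
```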


2021, Vol. 150 (4), pp. A94-A94
Author(s): Connor N. Kaplan, Jack D. Gabriel, Adrien David-Sivelle, Whitney L. Coyle

Sensors, 2021, Vol. 21 (17), pp. 5916
Author(s): Diego Romano, Marco Lapegna

Image coregistration for InSAR processing is a time-consuming procedure that is usually run in batch mode. With the availability of low-energy GPU accelerators, processing at the edge is now a promising prospect. Starting from the identification of the most computationally intensive kernels in existing algorithms, we decomposed the cross-correlation problem from a multilevel point of view, aiming to design and implement an efficient GPU-parallel algorithm for multiple settings, including edge computing. We analyzed the accuracy and performance of the proposed algorithm, also considering power efficiency, and its applicability to the identified settings. Results show that a significant speedup of InSAR processing is possible by exploiting GPU computing in different scenarios with no loss of accuracy, also enabling onboard processing on SoC hardware.
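
The paper's multilevel GPU-parallel decomposition is not reproduced here, but the underlying kernel, patch cross-correlation, can be illustrated. The sketch below estimates a coarse integer offset via FFT-based phase correlation; swapping NumPy for CuPy moves the FFTs to the GPU. The patch size and test shift are arbitrary choices for the example.

```python
# FFT-based phase correlation for coarse image coregistration (illustration,
# not the paper's algorithm). "import cupy as xp" runs the FFTs on a GPU.
import numpy as xp

def coarse_offset(master, slave):
    """Return the integer (row, col) shift s such that master ~ roll(slave, s)."""
    F1 = xp.fft.fft2(master)
    F2 = xp.fft.fft2(slave)
    cross = F1 * xp.conj(F2)
    cross /= xp.abs(cross) + 1e-12          # normalize -> phase correlation
    corr = xp.fft.ifft2(cross).real
    peak = xp.unravel_index(xp.argmax(corr), corr.shape)
    # Wrap peaks beyond half the patch into negative shifts.
    rows, cols = master.shape
    dr = int(peak[0]) if peak[0] <= rows // 2 else int(peak[0]) - rows
    dc = int(peak[1]) if peak[1] <= cols // 2 else int(peak[1]) - cols
    return dr, dc

# Synthetic check: shift an image by (5, -3) and recover the offset.
rng = xp.random.default_rng(0)
img = rng.standard_normal((256, 256))
shifted = xp.roll(img, (5, -3), axis=(0, 1))
print(coarse_offset(shifted, img))          # expect (5, -3)
```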


Author(s): Igor Sfiligoi, Shava Smallen, Frank Wurthwein, Nicole Wolter, David Schultz, ...
