graphics processors Latest Research Papers

NEW OPPORTUNITIES FOR HIGH-PERFORMANCE SIMULATIONS OF NANOSYSTEM USING METROPOLIS SOFTWARE

Physical and Chemical Aspects of the Study of Clusters Nanostructures and Nanomaterials ◽

10.26456/pcascnn/2021.13.624 ◽

2021 ◽

pp. 624-638

Author(s):

Денис Николаевич Соколов ◽

Николай Юрьевич Сдобняков ◽

Ксения Геннадьевна Савина ◽

Андрей Юрьевич Колосов ◽

Владимир Сергеевич Мясниченко

Keyword(s):

Monte Carlo ◽

Monte Carlo Method ◽

Software Package ◽

High Performance ◽

Tight Binding ◽

Binding Potential ◽

Graphics Processors ◽

Many Body ◽

The Monte Carlo Method ◽

Software Implementations

The architecture of the Metropolis software for computer simulation by the Monte Carlo method, together with its modifications, is described. The tight-binding potential is used as the interaction potential, although this does not exclude the use of other proven many-body potentials. Compared with previous software implementations of the Monte Carlo method, this modification increased the calculation speed by a factor of 700 for the selected nanoparticle size. Data on the convergence of the Monte Carlo simulation results are presented, using the melting temperature as an example. The software package is continually tested on calculations of various mono- and multicomponent nanoparticles and nanosystems. The results show fairly good agreement with other numerical methods, primarily molecular dynamics, and with physical experiment. Further development of the package and improvement of its performance are planned through parallelization of the computations and the use of CUDA computing on graphics processors.
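As a rough illustration of the Metropolis Monte Carlo scheme the abstract refers to, the sketch below performs single-atom trial displacements and accepts them with probability min(1, exp(-ΔE/T)). It is not the Metropolis package's code: the Lennard-Jones pair energy in reduced units is a stand-in assumed here in place of the tight-binding potential, and the names `pair_energy` and `metropolis_step` are hypothetical.

```python
import math
import random


def pair_energy(positions):
    """Total Lennard-Jones energy in reduced units.

    Assumption: a simple pair potential stands in for the many-body
    tight-binding potential named in the abstract.
    """
    total = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            inv6 = 1.0 / r2 ** 3
            total += 4.0 * (inv6 * inv6 - inv6)
    return total


def metropolis_step(positions, energy, temperature, max_shift, rng):
    """Try one random single-atom displacement; return (energy, accepted)."""
    i = rng.randrange(len(positions))
    old = positions[i]
    positions[i] = tuple(c + rng.uniform(-max_shift, max_shift) for c in old)
    new_energy = pair_energy(positions)  # full recomputation, kept simple on purpose
    delta = new_energy - energy
    # Metropolis criterion: always accept downhill moves, uphill with exp(-dE/T).
    if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
        return new_energy, True
    positions[i] = old  # reject: restore the previous configuration
    return energy, False


# Usage: an 8-atom cube near the pair-potential minimum, 2000 trial moves.
rng = random.Random(0)
atoms = [(1.12 * x, 1.12 * y, 1.12 * z)
         for x in range(2) for y in range(2) for z in range(2)]
energy = pair_energy(atoms)
accepted = 0
for _ in range(2000):
    energy, ok = metropolis_step(atoms, energy, 0.1, 0.05, rng)
    accepted += ok
```

Recomputing the full energy on every trial move costs O(N²) per step; a production code would normally update only the terms involving the displaced atom, which is one place where parallel or GPU evaluation can help.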

Seeing the Trees from the Forest: Using Modern Methods to Identify Individual Objects in a Cluttered Environment for Robots

10.26686/wgtn.17141933.v1 ◽

2021 ◽

Author(s):

Josh Prow

Keyword(s):

Computer Vision ◽

Object Detection ◽

Low Cost ◽

Region Of Interest ◽

High Growth ◽

Working Environment ◽

Single Shot ◽

Graphics Processors ◽

High Definition ◽

Depth Cameras

Robotics and computer vision are areas of high growth in both industrial and personal use. In industry, robots work in environments that are hazardous for humans or perform basic tasks that require finer detail than human operators can reliably achieve. These robotic solutions require a variety of sensors and cameras to navigate and identify objects within their working environment, as well as software and intelligent detection systems. They generally rely on high-definition depth cameras, laser range finders and computer vision algorithms, which are expensive in themselves and require expensive graphics processors to run practically. This thesis explores the option of a low-cost, computer-vision-enabled robotic solution that can operate within a forestry environment. It starts with the accuracy of camera technologies, testing two of the main cameras available for robotic vision and demonstrating the benefits of the Intel RealSense D435 over the Kinect for Xbox One. It then tests common object detection and recognition algorithms on different devices, considering the advantages and weaknesses of the selected models for the intended forestry application. These tests support other research in finding that the MobileNet Single Shot Detector has the fastest recognition speeds with good precision; however, it struggles where multiple objects are present or the background is complex. In comparison, Mask R-CNN had high accuracy and identified objects consistently even with large numbers overlaid within a single frame. Based on these findings, a combined method based on the Faster R-CNN architecture with a MobileNet backbone and masking layers is proposed, developed and tested. This method uses the feature extraction and object detection abilities of the faster MobileNet in place of the traditional ResNet-based feature proposal networks, while still capitalizing on the benefits of region of interest (ROI) align and masking from the Mask R-CNN architecture. The results from this model did not meet the criteria required to recommend it as an operational solution for the forestry environment. However, they do show that the model has higher performance and average precision than other models with similar frame rates on the non-CUDA-enabled testing device. This demonstrates that the technology and methodology have the potential to be the basis for a future solution to the problem of balancing accuracy and performance on a low-performance or non-GPU-enabled robotic unit.

Implementation and Performance of Wave Tomography Algorithms on SIMD CPU and GPU Computing Platforms

Numerical Methods and Programming (Vychislitel'nye Metody i Programmirovanie) ◽

10.26089/nummet.v22r421 ◽

2021 ◽

pp. 322-332

Author(s):

A.V. Goncharsky ◽

S.Y. Romanov ◽

S.Y. Seryozhnikov

Keyword(s):

Gpu Computing ◽

Fdtd Method ◽

Graphics Processors ◽

Practical Applications ◽

Tomographic Image Reconstruction ◽

Coefficient Inverse Problems ◽

Gradient Based ◽

Wave Tomography ◽

Computing Platforms ◽

Difference Time

This paper is concerned with the implementation of wave tomography algorithms on modern SIMD CPU and GPU computing platforms. Wave tomography, a field still under development, requires powerful computing resources. Its main applications are medical imaging, nondestructive testing and seismic studies, and practical applications depend on the available computing hardware. Tomographic image reconstruction by wave tomography involves solving coefficient inverse problems for the wave equation. Such problems can be solved using iterative gradient-based methods, which rely on repeated numerical simulation of the wave propagation process. In this study, the finite-difference time-domain (FDTD) method is employed for wave simulation. The paper discusses the software implementation of the algorithms and compares the performance of various computing devices: multi-core Intel and ARM-based CPUs and NVIDIA graphics processors.
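To make the forward-simulation step concrete, here is a minimal sketch of an explicit finite-difference time-domain update for the one-dimensional wave equation with fixed ends. It assumes a uniform grid and constant wave speed folded into the Courant number `courant = c·Δt/Δx`; the function name `fdtd_wave_1d` is hypothetical, and the paper's solver (with variable coefficients and SIMD or GPU kernels) is certainly more elaborate.

```python
def fdtd_wave_1d(u_prev, u_curr, courant, steps):
    """Advance u_tt = c^2 u_xx by `steps` time steps on a uniform 1D grid.

    Uses second-order central differences in time and space; the two end
    points are held at zero. Stable for courant <= 1.
    """
    c2 = courant * courant
    n = len(u_curr)
    for _ in range(steps):
        u_next = [0.0] * n
        for i in range(1, n - 1):
            laplacian = u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]
            u_next[i] = 2.0 * u_curr[i] - u_prev[i] + c2 * laplacian
        u_prev, u_curr = u_curr, u_next
    return u_prev, u_curr


# Usage: a unit pulse that moved from cell 10 to cell 11 in the last step
# keeps travelling right by one cell per step when courant == 1.
n = 50
before = [0.0] * n
now = [0.0] * n
before[10] = 1.0
now[11] = 1.0
_, field = fdtd_wave_1d(before, now, 1.0, 5)
```

The inner loop touches each grid point independently, which is why stencil updates of this kind map well onto SIMD lanes and GPU threads.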

IMAGE CONVERTER BASED ON BLOCK COMPRESSION ALGORITHMS OF DXT1, DXT3 AND DXT5 TEXTURES

Cybersecurity Education Science Technique ◽

10.28925/2663-4023.2021.12.6984 ◽

2021 ◽

Vol 12 (4) ◽

pp. 69-84

Author(s):

Konstantin Nesterenko ◽

Bohdan Zhurakovskyi

Keyword(s):

Three Dimensional ◽

Computer Game ◽

Graphics Processors ◽

Special Effects ◽

Compression Algorithms ◽

Advantages And Disadvantages ◽

Develop Software ◽

Manual Testing ◽

Active Implementation ◽

Graphics Software

This article analyzes existing applications that implement block texture compression algorithms and, on that basis, introduces the most suitable technical implementation. A set of technologies for the prototype is selected and justified, and its architecture is developed on principles that ensure maximum extensibility and clean code. With the development of technology and the integration of computerized systems into all areas of human activity, more and more software uses three-dimensional graphics. Such programs have long since ceased to be used only in entertainment, for tasks such as computer game development or special effects for cinema. With their help, doctors can now plan the most complex operations, architects can check construction plans and engineers can model prototypes without using any materials. On the one hand, this rapid growth can be explained by the increasing power of personal computer components. For example, modern graphics processors, which play a key role in the operation of graphics software, have become much faster in recent decades and have increased their memory hundreds of times. On the other hand, no matter how many resources a system has, the question of using them efficiently remains. Block texture compression algorithms were created to solve this problem. They made it possible to create effective software when computer resources were still quite limited and, as resources grew, to develop software with an incredible level of model detail, which led to its active adoption in demanding areas such as medicine and construction. The end result of this work is an application that takes the modern needs of the user into account. The most modern technologies were used during development to achieve the highest speed and relevance of the application, and the main advantages and disadvantages of existing solutions were also taken into account. The capabilities of the system were verified by manual testing on a local machine.
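For readers unfamiliar with the block layout, the following sketch decodes a single DXT1 block: 8 bytes holding two RGB565 endpoint colours and sixteen 2-bit palette indices for a 4×4 tile. It illustrates the format only and is not the converter described in the article; the function names are hypothetical, and the integer rounding of interpolated colours is an assumption, since decoders differ slightly on that point.

```python
import struct


def rgb565_to_rgb888(value):
    """Expand a 16-bit RGB565 colour to 8 bits per channel."""
    r = (value >> 11) & 0x1F
    g = (value >> 5) & 0x3F
    b = value & 0x1F
    # Replicate the high bits into the low bits so that 31 maps to 255.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_dxt1_block(block):
    """Decode one 8-byte DXT1 block into 16 RGBA tuples, row by row."""
    c0, c1, indices = struct.unpack("<HHI", block)
    p0 = rgb565_to_rgb888(c0)
    p1 = rgb565_to_rgb888(c1)
    if c0 > c1:
        # Four-colour mode: two interpolated colours between the endpoints.
        p2 = tuple((2 * a + b) // 3 for a, b in zip(p0, p1))
        p3 = tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), p3 + (255,)]
    else:
        # Three-colour mode: one midpoint colour plus transparent black.
        p2 = tuple((a + b) // 2 for a, b in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), (0, 0, 0, 0)]
    # Texel i uses bits 2i and 2i+1 of the index word, lowest bits first.
    return [palette[(indices >> (2 * i)) & 0x3] for i in range(16)]


# Usage: a block whose endpoints are pure red and pure blue, all indices 0.
red_block = struct.pack("<HHI", 0xF800, 0x001F, 0)
pixels = decode_dxt1_block(red_block)
```

Each 4×4 tile of 24-bit RGB pixels (48 bytes) is stored in 8 bytes, a fixed 6:1 ratio, and every block can be decoded independently of its neighbours.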

Application of neural networks in industrial production

10.33920/pro-2-2106-01 ◽

2021 ◽

pp. 10-17

Author(s):

S. S. Yudachev ◽

N. A. Gordienko ◽

F. M. Bosy

Keyword(s):

Neural Network ◽

Neural Networks ◽

Network Architecture ◽

Effective Means ◽

Back Propagation ◽

Back Propagation Algorithm ◽

Graphics Processors ◽

Training Scheme ◽

Network Training ◽

The Neural Network

The article describes an algorithm for the synthesis of neural networks for controlling a gyrostabilizer. The neural network acts as an observer of the state vector. The role of such an observer is to provide feedback to the gyrostabilizer, which is illustrated in the article. A gyrostabilizer is a gyroscopic device designed to stabilize individual objects or devices, as well as to determine the angular deviations of objects. Gyrostabilizer systems are expected to be used more widely, as they provide an effective means of motion control with a number of significant advantages for various designs. The article examines in detail the specific stages of the classical approach: selecting the network architecture, training the neural network, and verifying the results of feedback control. In recent years, neural networks have become an increasingly powerful tool in scientific computing. The universal approximation theorem states that a neural network can be constructed to approximate any given continuous function with the required accuracy. The back propagation algorithm also allows the parameters to be optimized effectively when training a neural network. The use of graphics processors makes efficient calculations possible for scientific and engineering tasks. The article presents the optimal configuration of the neural network, including the depth of memory, the number of layers and of neurons in these layers, and the activation functions. In addition, it provides data on dynamic systems to improve neural network training. An optimal training scheme is also provided.

A Heterogeneous Hardware Accelerator for Image Classification in Embedded Systems

Sensors ◽

10.3390/s21082637 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2637

Author(s):

Ignacio Pérez ◽

Miguel Figueroa

Keyword(s):

Image Classification ◽

High Speed ◽

Hardware Acceleration ◽

Graphics Processors ◽

Embedded Processor ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Computationally Intensive ◽

On Chip

Convolutional neural networks (CNNs) have been extensively employed for image classification due to their high accuracy. However, inference is a computationally intensive process that often requires hardware acceleration to operate in real time. For mobile devices, the power consumption of graphics processors (GPUs) is frequently prohibitive, and field-programmable gate arrays (FPGAs) become a solution for performing inference at high speed. Although previous works have implemented CNN inference on FPGAs, their high utilization of on-chip memory and arithmetic resources complicates their application on resource-constrained edge devices. In this paper, we present a scalable, low-power, low-resource-utilization accelerator architecture for inference on the MobileNet V2 CNN. The architecture uses a heterogeneous system with an embedded processor as the main controller, external memory to store network data, and dedicated hardware implemented on reconfigurable logic with a scalable number of processing elements (PEs). Implemented on an XCZU7EV FPGA running at 200 MHz and using four PEs, the accelerator infers with 87% top-5 accuracy and processes an image of 224×224 pixels in 220 ms. It consumes 7.35 W of power and uses less than 30% of the logic and arithmetic resources used by other MobileNet FPGA accelerators.
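The reported latency and power figures can be turned into throughput and energy per image with a little arithmetic. The sketch below assumes that images are processed one at a time and that the 7.35 W figure holds for the whole 220 ms of an inference; neither assumption is stated in the abstract, and the function names are hypothetical.

```python
LATENCY_S = 0.220  # reported: one 224x224 image in 220 ms
POWER_W = 7.35     # reported power consumption


def throughput_fps(latency_s):
    """Images per second when inferences run back to back, one at a time."""
    return 1.0 / latency_s


def energy_per_image_j(power_w, latency_s):
    """Energy in joules, assuming constant power during the inference."""
    return power_w * latency_s


fps = throughput_fps(LATENCY_S)                   # about 4.5 images per second
joules = energy_per_image_j(POWER_W, LATENCY_S)   # about 1.6 J per image
```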

Elastic Downsampling: An Adaptive Downsampling Technique to Preserve Image Quality

Electronics ◽

10.3390/electronics10040400 ◽

2021 ◽

Vol 10 (4) ◽

pp. 400

Author(s):

Jose J. García Aranda ◽

Manuel Alarcón Granero ◽

Francisco Jose Juan Quintanilla ◽

Gabriel Caffarena ◽

Rodrigo García-Carmona

Keyword(s):

Image Quality ◽

Low Cost ◽

Sampling Rate ◽

Structural Similarity ◽

Raspberry Pi ◽

Gradual Transition ◽

The Novel ◽

Graphics Processors ◽

Image Region ◽

Blocking Effects

This paper presents a new adaptive downsampling technique called elastic downsampling, which enables high compression rates while preserving the image quality. Adaptive downsampling techniques are based on the idea that image tiles can use different sampling rates depending on the amount of information conveyed by each block. However, current approaches suffer from blocking effects and artifacts that hinder the user experience. To bridge this gap, elastic downsampling relies on a Perceptual Relevance analysis that assigns sampling rates to the corners of blocks. The novel metric used for this analysis is based on the luminance fluctuations of an image region. This allows a gradual transition of the sampling rate within tiles, both horizontally and vertically. As a result, the block artifacts are removed and fine details are preserved. Experimental results (using the Kodak and USC Miscelanea image datasets) show a PSNR improvement of up to 15 dB and a superior SSIM (Structural Similarity) when compared with other techniques. More importantly, the algorithms involved are computationally cheap, so it is feasible to implement them in low-cost devices. The proposed technique has been successfully implemented using graphics processors (GPU) and low-power embedded systems (Raspberry Pi) as target platforms.
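Since the results are quoted in PSNR, a short reminder of how that metric is computed may help. The sketch below uses the standard definition, PSNR = 10·log10(MAX² / MSE), on flat lists of 8-bit samples; the function name `psnr` and the list-based image representation are choices made here for illustration, not taken from the paper.

```python
import math


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if not reference or len(reference) != len(test):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Usage: every sample off by one grey level gives MSE = 1.
original = [10, 20, 30, 40]
degraded = [11, 21, 31, 41]
value_db = psnr(original, degraded)
```

Because the scale is logarithmic, a gain of 15 dB corresponds to a mean squared error lower by a factor of 10^1.5, roughly 32.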

Simulation of Gas Dynamics of Hypersonic Aircrafts with the Use of Model of High-Temperature Air and Graphics Processor Units

Numerical Methods and Programming (Vychislitel'nye Metody i Programmirovanie) ◽

10.26089/nummet.v22r103 ◽

2021 ◽

pp. 29-46

Author(s):

К.Н. Волков ◽

Ю.В. Добров ◽

А.Г. Карпенко ◽

С.И. Мальковский ◽

А.А. Сорокин

Keyword(s):

High Temperature ◽

Graphics Processing Units ◽

High Performance ◽

Computational Time ◽

Gas Flows ◽

Hybrid Architecture ◽

Graphics Processors ◽

Hypersonic Aircraft ◽

Volume Method ◽

Graphics Processing

Numerical simulation of the flow around a hypersonic aircraft is carried out using a high-temperature air model and a hybrid architecture based on high-performance graphics processing units. The calculations are based on the Euler equations, discretized by the finite volume method on unstructured meshes. The scalability of the developed implementations of the model is studied, and the efficiency of calculating hypersonic gas flows on graphics processors is analyzed. The computational time required with the perfect and real gas models is discussed.

Distributions of Two Atoms Collisions over the Surface of the Condensed Phase

EPJ Web of Conferences ◽

10.1051/epjconf/202124801022 ◽

2021 ◽

Vol 248 ◽

pp. 01022

Author(s):

Sergey Zheltov ◽

Leonid Pletnev

Keyword(s):

Potential Barrier ◽

Condensed Phase ◽

Evaporation Rate ◽

Computer Experiments ◽

Knudsen Layer ◽

Graphics Processors ◽

Rigid Spheres ◽

Density Distributions ◽

The Monte Carlo Method ◽

Cuda Technology

The processes of heat and mass transfer are closely related to the evaporation of a substance from the surface of the condensed phase. The interaction of outgoing molecules from the surface of the condensed phase with condensed phase molecules plays a fundamental role. A simpler case of evaporation is the departure of atoms from the surface of the condensed phase, i.e. the atoms overcome the potential barrier on the surface of the condensed phase. Depending on the evaporation rate, a Knudsen layer appears above the surface of the condensed phase. In this paper, based on the model of rigid spheres, the density distributions of the collision distances and the average values of the collision distances of two atoms emitted simultaneously from the surface of the condensed phase above the surface are analyzed. Distributions of the collision distance depending on the surface temperature, the size of the potential barrier, and the size of the evaporation area are obtained. Computer experiments were performed using the Monte Carlo method. To obtain the results of numerical simulation, a parallel algorithm adapted to calculations on graphics processors with CUDA technology was developed.
