CUDA technology
Recently Published Documents



Author(s):  
M. B. Kurmanseiit ◽  
M. S. Tungatarova ◽  
K. A. Alibayeva ◽  
...  

In-situ leaching is a method of extracting minerals by selectively dissolving them with a leaching solution directly at the place where the mineral occurs. In practice, when deposits are developed by in-situ leaching, situations arise in which the solution tends to sink below the active thickness of the stratum. This may be due to geological heterogeneity of the rock or to gravitational settling of the solution in the rock caused by the difference between the densities of the solution and the groundwater. As the solution settles over the height of the stratum, the recovery of metal located in the upper part of the geological layers decreases. This article examines the effect of gravity on the flow regime during filtration of the solution in the rock. The influence of gravity on the flow is studied for different ratios of solution density to groundwater density, without taking into account the interaction of the solution with the rock. CUDA technology is used to improve computational performance. The results show that CUDA speeds up the calculations by a factor of 40-80 compared with calculations on a central processing unit (CPU) across different computational grids.


Water ◽  
2021 ◽  
Vol 13 (23) ◽  
pp. 3435
Author(s):  
Boram Kim ◽  
Kwang Seok Yoon ◽  
Hyung-Jun Kim

In this study, a CUDA Fortran-based GPU-accelerated Laplace equation model was developed and applied to several cases. The Laplace equation is one of the equations that can describe groundwater flow physically, and it admits analytical solutions. Numerical models of this kind require a large amount of data to reproduce the flow physically with high accuracy, and they require considerable computational time. To shorten the computation time, CUDA technology was applied: large-scale parallel computations were performed on the GPU, and the program was written to reduce the number of data transfers between the CPU and GPU. A GPU consists of many ALUs specialized in graphics processing and can perform more concurrent computations than a CPU. The results of the GPU-accelerated model were compared with the analytical solution of the Laplace equation to verify accuracy, and the two were in good agreement. As the number of grid cells increased, the computational time of the GPU-accelerated model fell progressively relative to that of the CPU-based Laplace equation model. As a result, the computational time of the GPU-accelerated model was reduced by a factor of up to about 50.
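
The check against an analytical solution described above can be illustrated with a minimal CPU sketch in Python. It assumes a Jacobi iteration on the unit square with boundary values taken from the harmonic function u(x, y) = x² − y²; the function names, the grid size, and the choice of Jacobi are assumptions made for illustration and are not the authors' CUDA Fortran code.

```python
def exact(x, y):
    # Harmonic test function (an assumption): u_xx + u_yy = 2 - 2 = 0.
    return x * x - y * y


def solve_laplace_jacobi(n=17, tol=1e-10, max_iter=20000):
    """Jacobi iteration for the Laplace equation on an n x n grid of the unit square."""
    h = 1.0 / (n - 1)
    # Boundary nodes hold the exact values; interior nodes start at zero.
    u = [[exact(i * h, j * h) if i in (0, n - 1) or j in (0, n - 1) else 0.0
          for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        new = [row[:] for row in u]
        change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # Each update reads only the previous iterate, so all
                # interior points could be processed in parallel.
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                    + u[i][j - 1] + u[i][j + 1])
                change = max(change, abs(new[i][j] - u[i][j]))
        u = new
        if change < tol:
            break
    return u, h


def max_error(u, h):
    """Largest absolute difference between the numerical and exact solutions."""
    n = len(u)
    return max(abs(u[i][j] - exact(i * h, j * h))
               for i in range(n) for j in range(n))
```

Because every interior update depends only on the previous iterate, a sweep of this kind could be assigned one grid point per GPU thread, and keeping both arrays on the device between sweeps is one way to limit CPU-GPU transfers.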


Doklady BGUIR ◽  
2021 ◽  
Vol 19 (6) ◽  
pp. 92-96
Author(s):  
S. V. Kozlov

The features of an implementation of the algorithm for synthesizing detailed radar images for a synthetic aperture radar using the built-in functions of the CUDA library are presented. The computational complexity is estimated from the standpoint of organizing parallel computing on Nvidia GPUs. The actual performance of radar image synthesis is estimated, taking into account the volume and placement of the primary radar data.


2021 ◽  
Vol 5 (2(61)) ◽  
pp. 21-25
Author(s):  
Yaroslav Sokolovskyy ◽  
Denys Manokhin ◽  
Yaroslav Kaplunsky ◽  
Olha Mokrytska

The object of research is the parallelization of the training of artificial neural networks to automate medical image analysis using the Python programming language, the PyTorch framework, and Compute Unified Device Architecture (CUDA) technology. This framework is based on the Define-by-Run model. The available cloud technologies for the task and the training algorithms for artificial neural networks are analyzed. A modified U-Net architecture from the MedicalTorch library was used. It was chosen because the task needs a network that can learn effectively from small datasets: in medicine, large datasets are hard to obtain because of confidentiality requirements for such data. The resulting information system performs the tasks set for it and provides a user-friendly interface and the tools needed to simplify and automate data visualization and analysis. The efficiency of neural network training on the central processor (CPU) and on the graphics processor (GPU) with CUDA is compared. Cloud technology was used in the study; Google Colab and Microsoft Azure were considered among cloud services. Colab was used first to build a prototype, and the Azure service was then used to train the finished neural network architecture. Measurements were performed in both services. The Adam optimizer was used to train the model. CPU training times were also measured in order to estimate the acceleration obtained through GPU computing with CUDA and cloud technologies.
The model developed during the research showed satisfactory results according to the Jaccard and Dice metrics. A key factor in the success of this study was cloud computing services.
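
For reference, the two segmentation metrics named above can be sketched in a few lines of Python. This is a minimal illustration on flat binary masks; the function names and the convention of returning 1.0 for two empty masks are assumptions, not details taken from the paper.

```python
def jaccard(pred, target):
    """Jaccard index |A ∩ B| / |A ∪ B| for two flat binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter / union if union else 1.0


def dice(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 2.0 * inter / total if total else 1.0
```

The two scores are related by D = 2J / (1 + J), so they rank segmentations identically and differ only in scale.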


Doklady BGUIR ◽  
2021 ◽  
Vol 19 (3) ◽  
pp. 14-21
Author(s):  
S. S. Sherbakov ◽  
M. M. Polestchuk

The evolution of computer technologies, in both hardware and software, makes it possible to obtain fast and accurate solutions to many applied problems in science. Acceleration of calculations is a widely used technique that is typically implemented through multithreading and multicore processors. NVidia CUDA technology, or simply CUDA, opens a way to efficiently accelerate the boundary element method (BEM), which includes many independent stages. The main goal of the paper is the implementation and acceleration of the indirect boundary element method using three form functions. Calculation of the potential distribution inside a closed boundary under a defined boundary condition is considered. To accelerate the corresponding calculations, they were parallelized on the graphics accelerator using NVidia CUDA technology. The dependence of the acceleration of parallel computations relative to sequential ones was explored for different numbers of boundary elements and computational nodes. A significant acceleration (up to 52 times) of the calculation of the potential distribution without loss of accuracy is shown. Acceleration of up to 22 times was achieved in the calculation of the mutual influence matrix for boundary elements. CUDA technology thus provides significant acceleration without loss of accuracy or convergence, making it a good way to parallelize BEM. The developed approach makes it possible to efficiently solve problems in different areas of physics, such as acoustics, hydromechanics, electrodynamics, solid mechanics, and many others.


2021 ◽  
Vol 45 (3) ◽  
pp. 427-437
Author(s):  
A.S. Shirokanev ◽  
N.A. Andriyanov ◽  
N.Y. Ilyasova

For diabetic retinopathy treatment, laser coagulation is used in modern practice. During the laser surgery process, the parameters of laser exposure are selected manually by a doctor, which requires the doctor to have sufficient experience and knowledge to achieve a therapeutic effect. On the basis of mathematical modeling of the laser coagulation process, it is possible to estimate the crucial parameters without performing an operation. However, the retina has a rather complex structure, and when even low-cost numerical methods are used for modeling, it takes a long time to obtain a result. In this regard, the development of time-efficient algorithms for three-dimensional modeling is an urgent task, since the use of such algorithms will provide a comprehensive study within a limited time. In this paper, we study the execution time of algorithms that implement various variations in the application of the splitting method and the finite difference method, adapted to the set problem of heat conduction. The study reveals the most efficient algorithm, which is then vectorized and implemented using the CUDA technology. The study was carried out using Intel Core i7-10875H and Nvidia RTX 2080 MAX Q and showed that an analog of the vector algorithm, focused on solving a multidimensional heat conduction problem, provides an acceleration of no more than 1.5 times compared to the sequential version. The developed vector-based algorithm, focused on the application of the sweep method in all directions of the three-dimensional problem, significantly reduces the time spent on copying into the memory of the video card and provides a 40-fold acceleration in comparison with the sequential three-dimensional modeling algorithm. On the basis of the same approach, a parallel algorithm of mathematical modeling was developed, which provided a 20-fold acceleration at full processor load.
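
The sweep method mentioned above is commonly known as the tridiagonal matrix (Thomas) algorithm. The Python sketch below shows it together with one implicit heat conduction step along a single grid line; the backward-Euler discretization, the fixed end values, and all names are assumptions chosen for illustration rather than the authors' scheme.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Backward sweep: back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_heat_step(u, r):
    """One backward-Euler step of u_t = alpha * u_xx along one grid line.

    r = alpha * dt / dx**2; the two end values are held fixed (Dirichlet).
    """
    n = len(u)
    a = [0.0] + [-r] * (n - 2) + [0.0]
    b = [1.0] + [1.0 + 2.0 * r] * (n - 2) + [1.0]
    c = [0.0] + [-r] * (n - 2) + [0.0]
    return thomas(a, b, c, list(u))
```

In a splitting scheme for a three-dimensional problem, every grid line along the current direction yields its own independent tridiagonal system, which is what makes it natural to process the lines in parallel.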


Author(s):  
I. Basharov ◽  
D. Yudin

Abstract. The paper is devoted to the task of multiple object tracking and segmentation in monocular video obtained by the camera of an unmanned ground vehicle. The authors investigate various deep neural network architectures for solving this task. Special attention is paid to deep models that provide inference in real time. The authors propose an approach that combines the modern SOLOv2 instance segmentation model, a neural network model that generates an embedding for each found object, and a modified Hungarian tracking algorithm. The Hungarian algorithm was modified to take into account geometric constraints on the positions of the found objects across the image sequence. The investigated solution develops and improves the state-of-the-art PointTrack method. The effectiveness of the proposed approach is demonstrated quantitatively and qualitatively on the popular KITTI MOTS dataset, collected using the cameras of a driverless car. The approach was implemented in software. The formation of a two-dimensional point cloud in the found image segment was accelerated using NVidia CUDA technology. On an NVidia Tesla V100 GPU, the proposed instance segmentation module has a mean processing time of 68 ms per image, and the embedding and tracking module 24 ms. This indicates that the proposed solution is promising for on-board computer vision systems for both unmanned vehicles and various robotic platforms.
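
The idea of matching tracks to detections under a geometric constraint can be sketched in Python as follows. This is not the authors' modified Hungarian algorithm: the brute-force search over permutations is a stand-in that finds the same minimum-cost matching for small inputs, and the gating rule, the data layout, and every name here are hypothetical.

```python
from itertools import permutations

INF = float("inf")


def gated_cost(track, det, max_shift):
    """Embedding distance, or INF if the centroid moved implausibly far."""
    (tx, ty), t_emb = track
    (dx, dy), d_emb = det
    if abs(tx - dx) > max_shift or abs(ty - dy) > max_shift:
        return INF  # geometric constraint: forbid this pairing
    return sum((a - b) ** 2 for a, b in zip(t_emb, d_emb)) ** 0.5


def assign(tracks, dets, max_shift):
    """Minimum-cost matching by brute force (assumes len(tracks) <= len(dets)).

    Returns a list where entry i is the detection index matched to track i,
    or None if no matching satisfies the geometric constraint.
    """
    best_total, best = INF, None
    for perm in permutations(range(len(dets)), len(tracks)):
        total = sum(gated_cost(tracks[i], dets[j], max_shift)
                    for i, j in enumerate(perm))
        if total < best_total:
            best_total, best = total, list(perm)
    return best
```

The brute-force search grows factorially with the number of objects, which is why a polynomial-time method such as the Hungarian algorithm is used in practice.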


Author(s):  
M. Sarsembayev ◽  
B. Urmashev ◽  
O. Mamyrbayev ◽  
M. Turdalyuly ◽  
T. Sarsembayeva

The main idea of the implementation is to reduce calculation time and thereby provide a multi-user mode by placing the software on a server accessed through a web browser. To model the kinetics of chemically reacting systems, fourth- and fifth-order Runge-Kutta methods were used. To measure the benefit of this development, programs were written in C# for sequential computation on a central processor, and the CUDA platform was used for parallel computation on graphics processors. On the GPU, the work was parallelized by distributing the reactions across individual threads, with the change in concentration of a given substance calculated over a given time interval. Parallelization is performed over all elementary reactions, so as the number of reactions in the mechanism increases, computation on the GPU shows a noticeable gain in time.
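
As an illustration of the integration scheme named above, here is a minimal Python sketch of the classical fourth-order Runge-Kutta method applied to a single first-order reaction A -> B. The reaction, the rate constant, and all function names are hypothetical; the paper's mechanisms and its C# and CUDA code are not reproduced here.

```python
import math


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def first_order_reaction(k):
    """Right-hand side for the hypothetical reaction A -> B with rate constant k."""
    def rhs(t, y):
        rate = k * y[0]        # reaction rate is proportional to [A]
        return [-rate, rate]   # A is consumed, B is produced
    return rhs


def integrate(f, y0, t_end, steps):
    """Advance y' = f(t, y) from t = 0 to t_end in equal RK4 steps."""
    t, y = 0.0, list(y0)
    h = t_end / steps
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


def exact_a(k, t, a0=1.0):
    """Analytical concentration of A for comparison."""
    return a0 * math.exp(-k * t)
```

In a larger mechanism, the rate of each elementary reaction can be evaluated independently inside the right-hand side, which is the part that lends itself to one-reaction-per-thread parallelization.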


Author(s):  
A. V. Nikitina ◽  
A. E. Chistyakov ◽  
A. M. Atayan

The purpose of this work is to create a software package for the distributed solution of the problem of pollutant transport in a reservoir with complex bathymetry and technological structures. An algorithm has been developed for the parallel solution of the pollutant transport problem in a reservoir on a graphics accelerator controlled by the CUDA (Compute Unified Device Architecture) system; a comparative analysis of the algorithms on a CPU (Central Processing Unit) and on a GPU (Graphics Processing Unit) made it possible to evaluate their performance. The software implementation of the modules included in the package is described, and the main classes and implemented methods are documented. Numerical experiments showed that solving the pollutant transport problem with CUDA technology is ineffective for small grids (up to 100 × 100 computational nodes). For large grids (1000 × 1000 computational nodes), CUDA reduces the computation time by an order of magnitude. An analysis of experiments with the developed software components showed that the maximum ratio of the CPU running time to the GPU running time for the algorithm that solves the matter transport problem in shallow water was 24.92, achieved on a grid of 1000 × 1000 computational nodes. Methods for decomposing grid regions are proposed for solving computationally laborious diffusion-convection problems, including pollutant transport in a reservoir with complex bathymetry and technological objects; the methods take into account the architecture and parameters of the MSC (Multiprocessor Computing System) hosted at the infrastructure facility of the STU (Scientific and Technological University) “Sirius” (Sochi, Russia).
The time the computing system takes to transmit and receive floating-point data was taken into account. An algorithm for the parallel solution of the problem under MPI (Message Passing Interface) has been developed, and its efficiency has been assessed. Acceleration values of the proposed algorithm were obtained as a function of the number of computers (processors) involved and the size of the computational grid. The maximum number of computers used was 24, and the maximum grid size was 10 000 × 10 000 computational nodes. The developed algorithm showed low efficiency for small computational grids (up to 100 × 100 computational nodes). For large computational grids (from 1000 × 1000 computational nodes), the use of MPI reduces the computation time several-fold.
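
The acceleration and efficiency figures discussed above follow the usual definitions, sketched here in Python. The function names are assumptions, and any timings passed in are hypothetical inputs rather than measurements from the paper.

```python
def speedup(t_serial, t_parallel):
    """Acceleration: how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_workers):
    """Parallel efficiency: speedup per worker, 1.0 being ideal scaling."""
    return speedup(t_serial, t_parallel) / n_workers
```

For example, a hypothetical run that takes 120 s serially and 10 s on 24 processors has a speedup of 12 and an efficiency of 0.5. On small grids, communication time dominates the computation, which is consistent with the low efficiency reported for grids of up to 100 × 100 nodes.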


2021 ◽  
Author(s):  
Anatoly Vershinin ◽  
Vladimir Levin ◽  
Yury Podladchikov

The presentation describes an approach to modeling the development of zones of localized plastic deformation within a poroelastoplastic model that generalizes Biot's model. A distinctive feature of this model is the two-way coupling between the mechanical processes in a porous elastoplastic matrix and a saturating viscous fluid.
For the numerical solution, a variational formulation based on the Galerkin method and the isoparametric spectral element method (SEM) is used to discretize the geometric model and the PDEs on curvilinear unstructured SEM meshes. SEM orders up to 15 were used in the calculations.
The SEM-based algorithm is implemented in software using CUDA. A spectral element mesh maps naturally onto a CUDA grid running on streaming multiprocessors (SMs): each spectral element is mapped to a block, within which individual nodes are processed by the corresponding threads.
The research for this article was performed partially at the Schmidt Institute of Physics of the Earth of the Russian Academy of Sciences and supported by the Russian Science Foundation under grant № 19-77-10062.
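
The element-to-block and node-to-thread mapping can be emulated on the CPU with two nested loops, as in the Python sketch below. It assumes a one-dimensional chain of elements that share end nodes, which is far simpler than the curvilinear unstructured meshes of the presentation; the indexing rule and all names are hypothetical.

```python
def global_node(block_idx, thread_idx, order):
    """Global node id in a 1D chain of elements that share end nodes."""
    return block_idx * order + thread_idx


def emulate_launch(n_elements, order):
    """Emulate one block per element and one thread per local node.

    Returns, for each global node, how many elements contribute to it.
    """
    n_global = n_elements * order + 1
    touched = [0] * n_global
    for block_idx in range(n_elements):        # plays the role of blockIdx.x
        for thread_idx in range(order + 1):    # plays the role of threadIdx.x
            # On a GPU, nodes shared between elements would need an
            # atomic add here, since two blocks write to the same entry.
            touched[global_node(block_idx, thread_idx, order)] += 1
    return touched
```

An element of order p has p + 1 nodes per direction, so at order 15 a three-dimensional element would hold 16³ = 4096 nodes, more than the 1024 threads CUDA allows per block. A one-thread-per-node mapping would therefore need each thread to handle several nodes at the highest orders.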

