Fast Cutter Location Surface Computation Using Ray Tracing Cores

2021 ◽  
Author(s):  
Daiki Ishii ◽  
Masatomo Inui ◽  
Nobuyuki Umezu

Abstract By using the cutter location (CL) surface, fast and stable computation of cutter paths for machining complicated molds and dies can be realized. State-of-the-art graphics processing units (GPUs) are equipped with special hardware, named ray tracing (RT) cores, dedicated to the ray tracing rendering method used in 3D computer graphics. RT cores can rapidly compute the intersection points between a set of straight lines and polygons. In this paper, we propose a novel CL surface computation method using RT cores. Because RT cores were originally designed to accelerate 3D computer graphics processing, software that uses them must be developed with the OptiX application programming interface (API) library for computer graphics. We demonstrate how to use the OptiX API in the development of software for CL surface computation. Computational experiments confirmed that a CL surface based on a very high-resolution Z-map can be obtained several times faster than with the depth buffer-based method, which has been considered the fastest to date.
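The operation the RT cores accelerate here, stripped to its essentials, is casting one vertical ray per Z-map grid point against the offset polygon model and keeping the highest hit. A minimal CUDA sketch of that ray-casting core follows; it is a plain brute-force kernel, not the authors' OptiX implementation, and all names and parameters are illustrative:

```cuda
#include <cuda_runtime.h>
#include <cfloat>

struct Tri { float3 v0, v1, v2; };   // one offset (CL) triangle

// Moller-Trumbore intersection of a downward vertical ray with a triangle.
// Returns the hit parameter t along -z, or -1 on miss.
__device__ float hitDown(float3 o, const Tri& T) {
    const float3 d = make_float3(0.f, 0.f, -1.f);
    float3 e1 = make_float3(T.v1.x - T.v0.x, T.v1.y - T.v0.y, T.v1.z - T.v0.z);
    float3 e2 = make_float3(T.v2.x - T.v0.x, T.v2.y - T.v0.y, T.v2.z - T.v0.z);
    float3 p  = make_float3(d.y*e2.z - d.z*e2.y,        // p = d x e2
                            d.z*e2.x - d.x*e2.z,
                            d.x*e2.y - d.y*e2.x);
    float det = e1.x*p.x + e1.y*p.y + e1.z*p.z;
    if (fabsf(det) < 1e-8f) return -1.f;                // ray parallel to triangle
    float inv = 1.f / det;
    float3 s = make_float3(o.x - T.v0.x, o.y - T.v0.y, o.z - T.v0.z);
    float u = (s.x*p.x + s.y*p.y + s.z*p.z) * inv;
    if (u < 0.f || u > 1.f) return -1.f;
    float3 q = make_float3(s.y*e1.z - s.z*e1.y,         // q = s x e1
                           s.z*e1.x - s.x*e1.z,
                           s.x*e1.y - s.y*e1.x);
    float v = (d.x*q.x + d.y*q.y + d.z*q.z) * inv;
    if (v < 0.f || u + v > 1.f) return -1.f;
    return (e2.x*q.x + e2.y*q.y + e2.z*q.z) * inv;
}

// One thread per Z-map grid point: brute-force over all triangles and keep
// the highest surface height. An RT core replaces this inner loop with
// hardware BVH traversal.
__global__ void clSurface(const Tri* tris, int nTris,
                          float* zmap, int nx, int ny,
                          float x0, float y0, float pitch, float zTop) {
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= nx || iy >= ny) return;
    float3 o = make_float3(x0 + ix * pitch, y0 + iy * pitch, zTop);
    float zBest = -FLT_MAX;
    for (int i = 0; i < nTris; ++i) {
        float t = hitDown(o, tris[i]);
        if (t >= 0.f) zBest = fmaxf(zBest, zTop - t);
    }
    zmap[iy * nx + ix] = zBest;
}
```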

Author(s):  
Jianhua Li ◽  
Jingyuan Chen ◽  
Yan Wang ◽  
Jianhua Huang

The parallelization of silicon anisotropic etching simulation with the cellular automata (CA) model on graphics processing units (GPUs) is challenging, because the number of computational tasks changes dynamically during the simulation and existing parallel CA mechanisms do not fit GPU computation well. In this paper, an improved CA model, called the clustered cell model, is proposed for GPU-based etching simulation. The model consists of clustered cells, each of which manages a scalable number of atoms. In this model, only the etching and state updates for the atoms on the etching surface and their unexposed neighbors are performed at each CA time step, whereas the clustered cells are reclassified at a longer time step. With this model, a crystal-cell parallelization method is given, in which clustered cells are allocated to GPU threads during the simulation. With optimizations in both the spatial and temporal aspects, as well as a proper granularity, this method provides faster process simulation. The proposed simulation method is implemented with the Compute Unified Device Architecture (CUDA) application programming interface, and several computational experiments are carried out to analyze its efficiency.
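As a rough illustration of the two-timescale design described above, the sketch below assigns one CUDA thread per clustered cell and updates only surface atoms each step; the data layout, state names, and rate model are assumptions, not the paper's actual structures:

```cuda
#include <cuda_runtime.h>

// Illustrative layout: each clustered cell owns a contiguous slice of atoms.
enum AtomState : int { BULK = 0, SURFACE = 1, REMOVED = 2 };

struct ClusteredCell {
    int first, count;   // slice [first, first+count) in the atom arrays
};

// One thread per clustered cell: etch surface atoms whose remaining material
// is consumed by the orientation-dependent rate over this time step.
__global__ void caStep(const ClusteredCell* cells, int nCells,
                       int* state, float* material, const float* rate,
                       float dt) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= nCells) return;
    const ClusteredCell cell = cells[c];
    for (int a = cell.first; a < cell.first + cell.count; ++a) {
        if (state[a] != SURFACE) continue;   // skip bulk and removed atoms
        material[a] -= rate[a] * dt;
        if (material[a] <= 0.f)
            state[a] = REMOVED;   // neighbours are promoted to SURFACE in the
                                  // (less frequent) reclassification pass
    }
}
```

The costlier reclassification pass, which promotes newly exposed neighbours to surface atoms and rebalances atoms among cells, would run at the longer interval, matching the two-timescale scheme of the abstract.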


Author(s):  
Pascal R Bähr ◽  
Bruno Lang ◽  
Peer Ueberholz ◽  
Marton Ady ◽  
Roberto Kersevan

Molflow+ is a Monte Carlo (MC) simulation software for ultra-high vacuum, mainly used to simulate pressure in particle accelerators. In this article, we present and discuss the design choices arising in a new implementation of its ray-tracing-based simulation unit for Nvidia RTX Graphics Processing Units (GPUs). The GPU simulation kernel was designed with Nvidia's OptiX 7 API to make use of the modern hardware-accelerated ray-tracing units found in recent RTX-series GPUs based on the Turing and Ampere architectures. Even with the challenges posed by switching to 32-bit computations, our kernel runs much faster on these GPUs than the simulation does on comparable CPUs, at the expense of a marginal drop in calculation precision.
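One standard way to soften the precision loss from 32-bit accumulation, offered here purely as a hedged aside and not taken from the Molflow+ code base, is compensated (Kahan) summation:

```cuda
#include <cuda_runtime.h>

// Compensated (Kahan) accumulation: claws back accuracy when many small
// samples must be accumulated in 32-bit floats. Illustrative only.
struct KahanAcc {
    float sum = 0.f;   // running total
    float c   = 0.f;   // running compensation for lost low-order bits
    __host__ __device__ void add(float x) {
        float y = x - c;     // re-inject previously lost bits
        float t = sum + y;   // big + small: low-order bits of y are lost...
        c = (t - sum) - y;   // ...and recovered here for the next add
        sum = t;
    }
};
```

Note that aggressive fast-math compiler flags can optimize the correction term away, so such code must be compiled without them.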


2014 ◽  
Vol 4 (3) ◽  
Author(s):  
Branislav Sobota ◽  
Štefan Korečko ◽  
Csaba Szabó ◽  
František Hrozek

Abstract Ray tracing is one of the computer graphics methods that achieve the most realistic outputs. Its main disadvantage is its high computational demand. This disadvantage can be removed through parallelization, because the ray tracing method is inherently parallel. The solution presented in this article uses GPGPU (general-purpose computing on graphics processing units) technology and predictive evaluation to accelerate the ray tracing method. CUDA C was selected as the GPGPU language and was used to convert a raytracer core; the main reason for choosing this language was the use of a Tesla C1060 graphics card. The predictive evaluation of a scene is based on the fact that the total computation time increases proportionally with resolution. This evaluation allows selection of the optimal scene division for parallel ray tracing. In tests, the proposed GPGPU solution reached accelerations of up to 28.3× compared with the CPU.
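A host-side sketch of the predictive division step, under the stated assumption that computation time scales linearly with pixel count: per-row timings from a low-resolution probe pass are used to cut the full-resolution image into batches of roughly equal predicted work. The function and parameter names are hypothetical:

```cuda
#include <vector>
#include <numeric>

// Partition image rows into batches of roughly equal predicted cost, where
// probeRowMs holds per-row render times measured at a lower resolution and
// assumed to scale linearly to the target resolution.
std::vector<int> splitRows(const std::vector<float>& probeRowMs, int batches) {
    float total  = std::accumulate(probeRowMs.begin(), probeRowMs.end(), 0.f);
    float target = total / batches;            // predicted work per batch
    std::vector<int> bounds{0};                // row boundaries between batches
    float acc = 0.f;
    for (int r = 0; r < (int)probeRowMs.size(); ++r) {
        acc += probeRowMs[r];
        if (acc >= target && (int)bounds.size() < batches) {
            bounds.push_back(r + 1);           // close the batch after row r
            acc = 0.f;
        }
    }
    bounds.push_back((int)probeRowMs.size());
    return bounds;
}
```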


Open Physics ◽  
2019 ◽  
Vol 17 (1) ◽  
pp. 527-544 ◽  
Author(s):  
Patryk Walewski ◽  
Tomasz Gałaj ◽  
Dominik Szajerman

Abstract Nowadays, rasterization is the most common method for achieving real-time semi-photorealistic effects in games and interactive applications. Some of these effects are not easily achievable and require more complicated methods. The appearance of the rendered worlds depends to a large extent on how closely the physical behaviour of light in them is approximated. The best results in this regard come from global illumination algorithms: each of them, including ray tracing, gives the most plausible effects, but at the cost of higher computational complexity. Today's hardware allows ray tracing methods to run in real time on Graphics Processing Units (GPUs) thanks to their parallel nature. However, using ray tracing as the single rendering method may still result in poor performance, especially when it is used to create many image effects in complex environments. In this paper we present a hybrid approach to real-time rendering that combines rasterization and ray tracing, using a heuristic that determines whether to render secondary effects such as shadows, reflections, and refractions for individual objects, considering both their relevancy and the cost of rendering those effects for these objects in each particular case.
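The paper's heuristic is not reproduced here, but the following hypothetical sketch conveys the idea: per object, weigh a relevancy estimate against a predicted ray-tracing cost and select which secondary effects to ray trace, letting the rest fall back to rasterized approximations. All weights and inputs are invented for illustration:

```cuda
// Hypothetical per-object relevancy/cost trade-off; not the paper's heuristic.
struct ObjectStats {
    float screenCoverage;   // fraction of the frame the object covers
    float reflectivity;     // material-driven importance of reflections
    float transparency;     // material-driven importance of refractions
    float triangleCount;    // proxy for per-ray traversal cost
};

enum EffectMask { RT_SHADOWS = 1, RT_REFLECTIONS = 2, RT_REFRACTIONS = 4 };

int chooseEffects(const ObjectStats& s, float frameBudget) {
    float cost = s.triangleCount * s.screenCoverage;   // predicted ray cost
    int mask = 0;
    if (s.screenCoverage * frameBudget > 0.25f * cost)              mask |= RT_SHADOWS;
    if (s.reflectivity * s.screenCoverage * frameBudget > cost)     mask |= RT_REFLECTIONS;
    if (s.transparency * s.screenCoverage * frameBudget > cost)     mask |= RT_REFRACTIONS;
    return mask;   // unselected effects use rasterized approximations instead
}
```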


Author(s):  
Jianhua Li ◽  
Yan Wang ◽  
Jingyuan Chen ◽  
Li Yan

Silicon anisotropic etching simulation, whether based on a geometric model or a cellular automata (CA) model, is highly time-consuming. In this paper, we propose two parallelization methods for simulating the silicon anisotropic etching process with CA models on graphics processing units (GPUs). One is a direct parallelization of the serial CA algorithm; the other uses a spatial parallelization strategy in which each crystal unit cell is allocated to a GPU thread. The proposed simulation methods are implemented with the Compute Unified Device Architecture (CUDA) application programming interface, and several computational experiments are carried out to analyze their efficiency.
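The second (spatial) strategy maps naturally onto a CUDA grid. The sketch below assumes a regular 3D array of unit cells with double-buffered state and a precomputed, orientation-dependent etch rate per cell; it illustrates the thread-per-cell mapping, not the authors' exact algorithm:

```cuda
#include <cuda_runtime.h>

// One GPU thread per crystal unit cell on a regular 3D grid, double-buffered
// so every cell updates from the same old state.
__global__ void etchStep(const int* solid, int* solidNext,
                         float* material, const float* rate, float dt,
                         int nx, int ny, int nz) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x >= nx || y >= ny || z >= nz) return;
    int i = (z * ny + y) * nx + x;
    int s = solid[i];
    if (s) {
        // a cell is exposed to etchant if any face neighbour is etched away
        bool exposed =
            (x > 0      && !solid[i - 1])       || (x < nx - 1 && !solid[i + 1]) ||
            (y > 0      && !solid[i - nx])      || (y < ny - 1 && !solid[i + nx]) ||
            (z > 0      && !solid[i - nx * ny]) || (z < nz - 1 && !solid[i + nx * ny]);
        if (exposed) {
            material[i] -= rate[i] * dt;    // consume material at the local rate
            if (material[i] <= 0.f) s = 0;  // cell fully etched this step
        }
    }
    solidNext[i] = s;
}
```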


2021 ◽  
Vol 4 ◽  
pp. 16-22
Author(s):  
Mykola Semylitko ◽  
Gennadii Malaschonok

The SVD (Singular Value Decomposition) algorithm is used in recommendation systems, machine learning, image processing, and various other algorithms that work with matrices. These matrices can be very large (Big Data), so, given the peculiarities of the algorithm, it lends itself to execution on the large number of computing threads that only video cards provide.

CUDA is a parallel computing platform and application programming interface model created by Nvidia. It allows software developers and engineers to use a CUDA-enabled graphics processing unit for general-purpose processing, an approach termed GPGPU (general-purpose computing on graphics processing units). A GPU provides much higher instruction throughput and memory bandwidth than a CPU within a similar price and power envelope, and many applications leverage these capabilities to run faster on the GPU than on the CPU. Other computing devices, such as FPGAs, are also very energy efficient, but they offer much less programming flexibility than GPUs.

The developed modification uses the CUDA architecture, which is intended for large numbers of simultaneous calculations and therefore allows matrices of very large sizes to be processed quickly. The parallel SVD algorithm for a tridiagonal matrix, based on Givens rotations, provides high accuracy of calculation. The algorithm also includes a number of memory and multiplication optimizations that can significantly reduce the computation time by discarding empty iterations.

This article proposes an approach that reduces the computation time and, consequently, resources and costs. The developed algorithm can be used through a simple and convenient API in C++ and Java, and can be further improved by using dynamic parallelism or by parallelizing the multiplication operations. The results obtained can also be used by other developers for comparison, as all conditions of the research are described in detail and the code is freely available.
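For concreteness, here is a minimal CUDA sketch of the building block such an algorithm rests on: computing a numerically safe Givens rotation and applying it to two matrix rows, where the per-column independence is what parallelizes across GPU threads. This is a generic textbook formulation, not the article's implementation:

```cuda
#include <cuda_runtime.h>
#include <cmath>

// Compute coefficients (c, s) of a rotation that zeroes b in the pair [a; b],
// in the standard overflow-safe form (Golub & Van Loan).
__host__ __device__ void givens(float a, float b, float* c, float* s) {
    if (b == 0.f) { *c = 1.f; *s = 0.f; return; }
    if (fabsf(b) > fabsf(a)) {
        float t = -a / b, u = sqrtf(1.f + t * t);
        *s = 1.f / u; *c = t * *s;
    } else {
        float t = -b / a, u = sqrtf(1.f + t * t);
        *c = 1.f / u; *s = t * *c;
    }
}

// Apply the rotation to rows p and q of a row-major n x n matrix. Each thread
// handles one column independently, which is the parallelizable part.
__global__ void applyGivensRows(float* A, int n, int p, int q, float c, float s) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= n) return;
    float ap = A[p * n + j], aq = A[q * n + j];
    A[p * n + j] = c * ap - s * aq;
    A[q * n + j] = s * ap + c * aq;
}
```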


2020 ◽  
pp. short56-1-short56-8
Author(s):  
Vadim Bulavintsev ◽  
Dmitry Zhdanov

With every new generation of graphics processing units (GPUs), offloading ray-tracing algorithms to GPUs becomes more feasible. Software-hardware solutions for ray tracing focus on implementing its basic components, such as building and traversing bounding volume hierarchies (BVHs). However, global illumination algorithms, such as the photon mapping method, depend on another kind of acceleration structure, namely k-d trees. In this work, we adapt the state-of-the-art GPU-based BVH-building algorithm of treelet restructuring to k-d trees. By evaluating the performance of the resulting k-d trees, we show that the treelet optimisation heuristic suitable for BVHs of triangles is inadequate for k-d trees of points.
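For readers unfamiliar with the second structure, the sketch below shows the kind of k-d tree query photon mapping depends on: a fixed-radius count over photon positions. The node layout and traversal are generic illustrations, unrelated to the treelet-restructured trees evaluated in the paper:

```cuda
#include <cuda_runtime.h>

// Generic k-d tree over photon positions; each node stores one photon and
// splits space on one axis.
struct KdNode {
    float3 pos;         // photon position (also the split point)
    int    axis;        // 0/1/2 split axis, -1 for a leaf
    int    left, right; // child indices, -1 if absent
};

__device__ float axisCoord(float3 p, int a) {
    return a == 0 ? p.x : (a == 1 ? p.y : p.z);
}

// Count photons within radius r of query q, using an explicit stack.
__device__ int photonsInRange(const KdNode* nodes, int root, float3 q, float r) {
    int stack[64]; int top = 0; stack[top++] = root;
    int count = 0; float r2 = r * r;
    while (top > 0) {
        int ni = stack[--top];
        if (ni < 0) continue;
        const KdNode n = nodes[ni];
        float dx = q.x - n.pos.x, dy = q.y - n.pos.y, dz = q.z - n.pos.z;
        if (dx*dx + dy*dy + dz*dz <= r2) ++count;
        if (n.axis < 0) continue;                 // leaf: no children to visit
        float d = axisCoord(q, n.axis) - axisCoord(n.pos, n.axis);
        int near = d <= 0.f ? n.left  : n.right;
        int far  = d <= 0.f ? n.right : n.left;
        stack[top++] = near;                      // always descend the near side
        if (d * d <= r2) stack[top++] = far;      // far side only if the query
    }                                             // sphere crosses the plane
    return count;
}
```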


2014 ◽  
Vol 39 (4) ◽  
pp. 233-248 ◽  
Author(s):  
Milosz Ciznicki ◽  
Krzysztof Kurowski ◽  
Jan Węglarz

Abstract Heterogeneous many-core computing resources are increasingly popular among users due to their improved performance over homogeneous systems. Many developers have realized that heterogeneous systems, e.g. a combination of a shared-memory multi-core CPU machine with massively parallel Graphics Processing Units (GPUs), can provide significant performance opportunities to a wide range of applications. However, the best overall performance can only be achieved if application tasks are efficiently assigned to the different types of processing units over time, taking into account their specific resource requirements. Additionally, one should note that available heterogeneous resources have been designed as general-purpose units, yet with many built-in features that accelerate specific application operations. In other words, the same algorithm or application functionality can be implemented as a different task for the CPU or the GPU; nevertheless, from the perspective of various evaluation criteria, e.g. total execution time or energy consumption, we may observe completely different results. Therefore, as tasks can be scheduled and managed in many alternative ways on many-core CPUs and GPUs, with a huge impact on the overall performance of the computing resources, new and improved resource management techniques are needed. In this paper we discuss results from experimental performance studies of selected task scheduling methods in heterogeneous computing systems. Additionally, we present a new architecture for a resource allocation and task scheduling library, which provides a generic application programming interface at the operating-system level for improving scheduling policies while taking into account the diversity of tasks and the characteristics of heterogeneous computing resources.
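As a toy model of the scheduling decisions discussed above, the following host-side sketch places each task on whichever device finishes it earliest, given hypothetical per-device cost estimates; real policies would also weigh energy consumption and data-transfer overheads:

```cuda
#include <vector>
#include <cstdio>

// Each task carries an estimated runtime on each device type.
struct Task { float cpuMs, gpuMs; };

// Earliest-finish-time rule: track when each device becomes idle and assign
// every task to the device with the sooner predicted completion.
void schedule(const std::vector<Task>& tasks) {
    float cpuFree = 0.f, gpuFree = 0.f;   // time each device becomes idle
    for (size_t i = 0; i < tasks.size(); ++i) {
        float cpuFinish = cpuFree + tasks[i].cpuMs;
        float gpuFinish = gpuFree + tasks[i].gpuMs;
        if (cpuFinish <= gpuFinish) {
            std::printf("task %zu -> CPU (finish %.1f ms)\n", i, cpuFinish);
            cpuFree = cpuFinish;
        } else {
            std::printf("task %zu -> GPU (finish %.1f ms)\n", i, gpuFinish);
            gpuFree = gpuFinish;
        }
    }
}

int main() {
    // hypothetical per-device estimates for three tasks
    schedule({{10.f, 2.f}, {5.f, 8.f}, {20.f, 3.f}});
    return 0;
}
```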

