graphics processing
Recently Published Documents


TOTAL DOCUMENTS

1978
(FIVE YEARS 416)

H-INDEX

52
(FIVE YEARS 8)

2022 ◽  
Vol 8 (1) ◽  
pp. 9
Author(s):  
Bruno Sauvalle ◽  
Arnaud de La Fortelle

The goal of background reconstruction is to recover the background image of a scene from a sequence of frames showing the scene cluttered by various moving objects. This task is fundamental in image analysis and is generally the first step before more advanced processing, but it is difficult because there is no formal definition of what should be considered background or foreground, and the results may be severely affected by challenges such as illumination changes, intermittent object motion, and highly cluttered scenes. In this paper we propose a new iterative algorithm for background reconstruction, in which the current background estimate is used to guess which image pixels are background pixels, and a new background estimate is then computed using those pixels only. We show that the proposed algorithm, which uses stochastic gradient descent for improved regularization, is more accurate than the state of the art on the challenging SBMnet dataset, especially for short videos with low frame rates. It is also fast, reaching an average of 52 fps on this dataset when parameterized for maximal accuracy, using a Python implementation accelerated with a graphics processing unit (GPU).
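As a rough sketch of the iterative idea described above, the following NumPy toy alternates between (i) classifying pixels as background when they lie close to the current background estimate and (ii) re-estimating the background from those pixels only. It is a deliberately simplified stand-in (median initialization, a fixed intensity threshold, plain averaging instead of the paper's stochastic gradient descent); all names and values are illustrative, not the authors' implementation.

```python
import numpy as np

def reconstruct_background(frames, n_iter=5, tol=20.0):
    """Toy iterative background reconstruction: pixels close to the
    current background estimate are treated as background and used
    to refine that estimate."""
    frames = np.asarray(frames, dtype=float)      # shape (T, H, W)
    bg = np.median(frames, axis=0)                # crude initial guess
    for _ in range(n_iter):
        mask = np.abs(frames - bg) < tol          # guess background pixels
        counts = np.maximum(mask.sum(axis=0), 1)  # avoid division by zero
        bg = (frames * mask).sum(axis=0) / counts # re-estimate from them
    return bg

# Synthetic scene: static background at intensity 100, one bright
# object sweeping across the columns.
T, H, W = 30, 8, 8
frames = np.full((T, H, W), 100.0)
for t in range(T):
    frames[t, :, t % W] = 255.0
bg = reconstruct_background(frames)               # recovers the flat background
```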


Urban Climate ◽  
2022 ◽  
Vol 41 ◽  
pp. 101063
Author(s):  
Mohammad Mortezazadeh ◽  
Liangzhu Leon Wang ◽  
Maher Albettar ◽  
Senwen Yang

Author(s):  
Liam Dunn ◽  
Patrick Clearwater ◽  
Andrew Melatos ◽  
Karl Wette

Abstract The F-statistic is a detection statistic widely used in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing central-processing-unit implementation without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated in a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri, using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours with the new implementation, compared to 1092 core-hours with the existing one.
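The two cost figures quoted above already pin down the effective saving for this particular search (note that the units differ: GPU-hours versus CPU core-hours, so this is a resource comparison rather than a pure wall-clock ratio):

```python
# Cost figures quoted in the abstract for the pilot narrowband search.
cpu_core_hours = 1092.0   # existing CPU implementation
gpu_hours = 17.2          # new GPU implementation

# One GPU replaces roughly this many CPU cores for the same search,
# consistent with the quoted 10-100x speedup range.
speedup = cpu_core_hours / gpu_hours
print(f"effective speedup: {speedup:.1f}x")
```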


2021 ◽  
Vol 28 (4) ◽  
pp. 338-355
Author(s):  
Natalia Olegovna Garanina ◽  
Sergei Petrovich Gorlatch

The paper presents a new approach to the autotuning of data-parallel programs. Autotuning is a search for the optimal program settings that maximize performance. The novelty of the approach lies in using the model checking method to find the optimal tuning parameters via counterexamples. In our work, we abstract from specific programs and specific processors by defining representative abstract patterns for them. Our method of counterexamples consists of the following four steps. First, an execution model of an abstract program on an abstract processor is described in the language of a model checking tool. Second, in the same language, we formulate an optimality property that depends on the constructed model. Third, we find the optimal values of the tuning parameters by using a counterexample constructed during the verification of the optimality property. Fourth, we extract the information about the optimal tuning parameters from that counterexample. We apply this approach to autotuning parallel programs written in OpenCL, a popular modern language that extends C for programming both standard multi-core processors (CPUs) and massively parallel graphics processing units (GPUs). As the verification tool, we use the SPIN verifier and its model representation language Promela, whose formal semantics is well suited to modelling the execution of parallel programs on processors with different architectures.
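The counterexample trick can be caricatured in plain Python (this is not SPIN/Promela; the cost model and candidate work-group sizes below are hypothetical): a toy "verifier" exhaustively checks the property "no configuration runs faster than the bound", and the counterexample it returns is exactly the optimal configuration being sought.

```python
def cost(work_group_size, total_work=4096, overhead_per_group=8.0):
    """Hypothetical execution-time model of an abstract data-parallel program."""
    n_groups = -(-total_work // work_group_size)        # ceiling division
    return n_groups * (work_group_size + overhead_per_group)

def check_property(bound, candidates):
    """Verify 'no candidate is cheaper than bound'.
    On failure, return the violating configuration as the counterexample."""
    for wg in candidates:
        if cost(wg) < bound:
            return wg        # counterexample: the property is violated here
    return None              # property holds

candidates = [16, 32, 64, 128, 256]

# Ask the 'verifier' whether anything beats a bound just above the optimum;
# the counterexample it produces is the optimal work-group size.
bound = min(cost(wg) for wg in candidates) + 1
best = check_property(bound, candidates)
```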


Author(s):  
Pascal R Bähr ◽  
Bruno Lang ◽  
Peer Ueberholz ◽  
Marton Ady ◽  
Roberto Kersevan

Molflow+ is a Monte Carlo (MC) simulation software package for ultra-high vacuum, mainly used to simulate pressure in particle accelerators. In this article, we present and discuss the design choices arising in a new implementation of its ray-tracing-based simulation unit for Nvidia RTX graphics processing units (GPUs). The GPU simulation kernel was designed with Nvidia's OptiX 7 API to make use of the modern hardware-accelerated ray-tracing units found in recent RTX-series GPUs based on the Turing and Ampere architectures. Despite the challenges posed by switching to 32-bit computations, our kernel runs much faster than comparable CPU implementations, at the expense of a marginal drop in calculation precision.
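The single-precision trade-off mentioned above is easy to see in isolation: an offset that double precision resolves disappears entirely in 32-bit arithmetic (a generic sketch, not Molflow+ code; the values are illustrative):

```python
import numpy as np

# float32 carries ~7 significant decimal digits vs ~16 for float64,
# so a tiny perturbation of a ray-hit distance survives in double
# precision but is rounded away in single precision.
t_hit64 = np.float64(1.0) + np.float64(1e-9)
t_hit32 = np.float32(1.0) + np.float32(1e-9)

print(t_hit64 != 1.0)               # float64 resolves the offset
print(t_hit32 == np.float32(1.0))   # float32 loses it
```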


2021 ◽  
Author(s):  
Airidas Korolkovas ◽  
Alexander Katsevich ◽  
Michael Frenkel ◽  
William Thompson ◽  
Edward Morton

X-ray computed tomography (CT) can provide 3D images of density, and possibly of atomic number, for large objects such as passenger luggage. This information, while generally very useful, is often insufficient to identify threats such as explosives and narcotics, which can have an average composition similar to that of benign everyday materials such as plastics, glass, and light metals. A much more specific material signature can be measured with X-ray diffraction (XRD). Unfortunately, the XRD signal is very faint compared to the transmitted one, and it is challenging to reconstruct for objects larger than a small laboratory sample. In this article we analyze a novel low-cost scanner design that captures CT and XRD signals simultaneously and uses the least possible collimation to maximize flux. To simulate a realistic instrument, we derive a formula for the resolution of any diffraction pathway, taking into account the polychromatic spectrum and the finite sizes of the source, the detector, and each voxel. We then show how to reconstruct XRD patterns from a large phantom with multiple diffracting objects. Our approach includes a reasonable amount of photon-counting noise (Poisson statistics), as well as measurement bias, in particular incoherent Compton scattering. The resolution of our reconstruction is sufficient to provide significantly more information than standard CT, thus increasing the accuracy of threat detection. Our theoretical model is implemented in GPU (graphics processing unit)-accelerated software that can be used to assess and further optimize scanner designs for specific applications in security, healthcare, and manufacturing quality control.
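The Poisson statistics mentioned above are what make the faint diffracted signal so much noisier than the transmitted one: relative counting noise scales as 1/sqrt(N). A quick numerical check (the count levels are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials = 20000
ct_counts  = rng.poisson(10000, n_trials)  # strong transmitted (CT) beam
xrd_counts = rng.poisson(100, n_trials)    # faint diffracted (XRD) beam

# Relative noise: std/mean ~ 1/sqrt(N) for Poisson counts.
ct_rel_noise  = ct_counts.std()  / ct_counts.mean()   # ~1%  (1/sqrt(10000))
xrd_rel_noise = xrd_counts.std() / xrd_counts.mean()  # ~10% (1/sqrt(100))
```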


2021 ◽  
Vol 4 ◽  
pp. 10-15
Author(s):  
Gennadii Malaschonok ◽  
Serhii Sukharskyi

With the development of Big Data, as well as fields of study related to artificial intelligence, the need for fast and efficient computing has become one of the most important problems today. That is why, over the recent decade, graphics processing unit (GPU) computing has been actively developed, giving scientists and developers the ability to use the thousands of cores a GPU provides for intensive computations. The goal of this research is to implement the orthogonal decomposition of a matrix by applying a series of Householder transformations in Java, using the JCuda library, and to investigate its benefits. Several related papers were examined. Malaschonok and Savchenko introduced an improved version of the QR algorithm for this purpose [4] and achieved better results; however, the Householder algorithm is more promising for GPUs according to another team of researchers, Lahabar and Narayanan [6]. They used single-precision (float) numbers, while we use double precision, and we are also working on a new BigDecimal type for CUDA. Moreover, there is still no solution for handling huge matrices, where errors in calculations may occur. The algorithm of orthogonal matrix decomposition, which is the first part of the SVD algorithm, is researched and implemented in this work. We present an implementation of matrix bidiagonalization and the calculation of orthogonal factors by the Householder method in the JCuda environment on a graphics processor, together with a CPU implementation for comparison. We experimentally measured the speedup of the GPU implementation over the CPU implementation and observed a speedup of up to 53 times on a large matrix (size 2048), with even better results on more advanced GPUs. At the same time, we still observe larger calculation errors when using graphics processing units, due to synchronization problems. We compared execution on different platforms (Windows 10 and Arch Linux) and found their computation speeds to be almost identical. The results show that better performance can be achieved on the GPU, although this approach involves more implementation difficulties.
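For reference, the Householder reflections at the core of the decomposition can be sketched on the CPU in a few lines of NumPy. This is a generic QR sketch, not the authors' JCuda code; the matrix and its dimensions are illustrative.

```python
import numpy as np

def householder_qr(A):
    """QR decomposition via a series of Householder reflections
    (the same reflections that drive the bidiagonalization step)."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.eye(m)
    for k in range(min(m, n)):
        x = A[k:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])  # reflector direction
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            continue                                  # column already zeroed
        v /= norm_v
        A[k:, :] -= 2.0 * np.outer(v, v @ A[k:, :])   # apply H_k to A
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)   # accumulate Q @ H_k
    return Q, A                                       # A is now R (upper triangular)

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 4))
Q, R = householder_qr(M)                              # Q @ R reproduces M
```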


2021 ◽  
Vol 4 ◽  
pp. 16-22
Author(s):  
Mykola Semylitko ◽  
Gennadii Malaschonok

The SVD (Singular Value Decomposition) algorithm is used in recommendation systems, machine learning, image processing, and various other algorithms for working with matrices, which can be very large (Big Data); given the peculiarities of this algorithm, it can be executed on the large number of computing threads that only video cards provide. CUDA is a parallel computing platform and application programming interface model created by Nvidia. It allows software developers and engineers to use a CUDA-enabled graphics processing unit for general-purpose processing, an approach termed GPGPU (general-purpose computing on graphics processing units). The GPU provides much higher instruction throughput and memory bandwidth than the CPU within a similar price and power envelope, and many applications leverage these capabilities to run faster on the GPU than on the CPU. Other computing devices, such as FPGAs, are also very energy efficient, but they offer much less programming flexibility than GPUs. The developed modification uses the CUDA architecture, which is intended for a large number of simultaneous calculations and thus allows matrices of very large sizes to be processed quickly. The parallel SVD algorithm for a tridiagonal matrix, based on Givens rotations, provides high calculation accuracy. The algorithm also includes a number of optimizations of memory handling and multiplication that can significantly reduce computation time by discarding empty iterations. This article proposes an approach that reduces computation time and, consequently, resources and costs. The developed algorithm can be used through a simple and convenient API in C++ and Java, and will be further improved by using dynamic parallelism or by parallelizing multiplication operations. The obtained results can also be used by other developers for comparison, as all conditions of the research are described in detail and the code is freely available.
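A single Givens rotation, the building block of the rotation-based method described above, can be sketched in NumPy (a generic illustration rather than the CUDA kernels; the tridiagonal matrix is made up):

```python
import numpy as np

def givens(a, b):
    """Return (c, s) so that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)
    return a / r, b / r

# Sweep the subdiagonal of a tridiagonal matrix with rotations,
# zeroing one entry per step, as the parallel kernels do.
T = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
R = T.copy()
for k in range(2):
    c, s = givens(R[k, k], R[k + 1, k])
    G = np.eye(3)
    G[k, k] = G[k + 1, k + 1] = c
    G[k, k + 1], G[k + 1, k] = s, -s
    R = G @ R                      # zeroes the (k+1, k) entry
```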



Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8237
Author(s):  
Jan Matuszewski ◽  
Dymitr Pietrow

With the increasing complexity of the electromagnetic environment and the continuous development of radar technology, a large number of modern radars using agile waveforms can be expected to appear on the battlefield in the near future. Effectively identifying these radar signals in electronic warfare systems by relying only on traditional recognition models poses a serious challenge. In response to this problem, this paper proposes a method for recognizing emitted radar signals with agile waveforms based on a convolutional neural network (CNN). The signals are measured in electronic recognition receivers and processed into digital data, after which they undergo recognition. The implementation of this system is presented in a simulation environment with the help of a signal generator that can introduce changes in signal signatures previously recognized and stored in the emitter database. The article describes the software's components, the learning subsystem, and the signal generator. The problems of training the neural networks on graphics processing units and of choosing the learning coefficients are also outlined. The correctness of the CNN's operation was tested using a simulation environment that verified its effectiveness in a noisy environment and in conditions where many mutually interfering radar signals are present. The effectiveness of the applied solutions and the possibilities for further developing the learning and processing algorithms are presented in tables and figures. The experimental results demonstrate that the proposed method can effectively solve the problem of recognizing raw radar signals with agile time waveforms, achieving a correct recognition probability of 92–99%.
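The feature extraction at the heart of such a CNN can be caricatured with a single 1-D convolution followed by a ReLU (a minimal NumPy sketch; the kernel weights and pulse positions are illustrative, not the paper's trained network):

```python
import numpy as np

def conv1d(signal, kernel):
    """Valid-mode 1-D convolution: slide the kernel over the signal."""
    n, k = len(signal), len(kernel)
    return np.array([signal[i:i + k] @ kernel for i in range(n - k + 1)])

def relu(x):
    return np.maximum(x, 0.0)

# A sampled pulse train: a matched kernel responds most strongly
# where its peak aligns with a pulse.
signal = np.zeros(64)
signal[[10, 30, 50]] = 1.0                       # pulse positions
kernel = np.array([0.25, 0.5, 1.0, 0.5, 0.25])   # hypothetical learned filter
feature_map = relu(conv1d(signal, kernel))       # peaks mark the pulses
```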

