graphics processing unit Latest Research Papers

Graphics processing unit-accelerated Monte Carlo simulation of polarized light in complex three-dimensional media

10.1101/2022.01.13.476270 ◽ 2022

Author(s): Shijie Yan ◽ Steven L Jacques ◽ Jessica C. Ramella-Roman ◽ Qianqian Fang

Keyword(s): Monte Carlo ◽ Graphics Processing Unit ◽ Three Dimensional ◽ Polarized Light ◽ Biological Tissues ◽ Dramatic Improvement ◽ Processing Unit ◽ Photon Propagation ◽ Spatially Resolved ◽ Massively Parallel Computing

Significance: Monte Carlo (MC) methods have been applied to study interactions between polarized light and biological tissues, but most existing MC codes that support polarization modeling can only simulate homogeneous or multi-layered domains, resulting in approximations when handling realistic tissue structures. Aim: Over the past decade, the speed of MC simulations has improved dramatically with massively parallel computing techniques. Developing hardware-accelerated MC simulation algorithms that can accurately model polarized light inside 3-D heterogeneous tissues can greatly expand the utility of polarization in biophotonics applications. Approach: Here we report a highly efficient polarized MC algorithm capable of modeling arbitrarily complex media defined over a voxelated domain. Each voxel of the domain can be associated with spherical scatterers of various radii and densities. The Stokes vector of each simulated photon packet is updated through photon propagation, creating spatially resolved polarization measurements over the detectors or domain surface. Results: We have implemented this algorithm in our widely disseminated MC simulator, Monte Carlo eXtreme (MCX). It is validated by comparing with a reference CPU-based simulator in both homogeneous and layered domains, showing excellent agreement and a 931-fold speedup. Conclusion: The polarization-enabled MCX (pMCX) offers the biophotonics community an efficient tool to explore polarized light in bio-tissues, and is freely available at http://mcx.space/.
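The abstract states that the Stokes vector of each photon packet is updated during propagation but does not spell out the update. As a rough illustration only, the sketch below shows the textbook rotation of a Stokes vector [I, Q, U, V] into a new reference plane, a step commonly applied before a scattering matrix is used. The function names and the sign convention are assumptions for this example and are not taken from MCX or pMCX.

```python
import math


def rotate_stokes(stokes, phi):
    """Rotate a Stokes vector [I, Q, U, V] by angle phi (radians).

    Assumed convention: Q' = Q cos(2 phi) + U sin(2 phi),
                        U' = -Q sin(2 phi) + U cos(2 phi).
    Other texts use the opposite sign for phi.
    """
    i, q, u, v = stokes
    c = math.cos(2.0 * phi)
    s = math.sin(2.0 * phi)
    # I and V are unchanged by a rotation of the reference frame.
    return [i, q * c + u * s, -q * s + u * c, v]


def degree_of_polarization(stokes):
    """Return sqrt(Q^2 + U^2 + V^2) / I, or 0.0 for a packet with no intensity."""
    i, q, u, v = stokes
    if i <= 0.0:
        return 0.0
    return math.sqrt(q * q + u * u + v * v) / i


# Horizontally polarized light rotated by 45 degrees: Q moves entirely into U.
rotated = rotate_stokes([1.0, 1.0, 0.0, 0.0], math.pi / 4.0)
```

The rotation only redistributes intensity between Q and U, so the degree of polarization is the same before and after.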

Fast and Accurate Background Reconstruction Using Background Bootstrapping

Journal of Imaging ◽ 10.3390/jimaging8010009 ◽ 2022 ◽ Vol 8 (1) ◽ pp. 9

Author(s): Bruno Sauvalle ◽ Arnaud de La Fortelle

Keyword(s): Moving Objects ◽ Graphics Processing Unit ◽ Stochastic Gradient Descent ◽ Processing Unit ◽ Current Estimate ◽ Background Estimation ◽ Background Reconstruction ◽ Image Pixels ◽ Definition Of ◽ Graphics Processing

The goal of background reconstruction is to recover the background image of a scene from a sequence of frames showing this scene cluttered by various moving objects. This task is fundamental in image analysis and is generally the first step before more advanced processing, but it is difficult because there is no formal definition of what should be considered background or foreground, and the results may be severely impacted by challenges such as illumination changes, intermittent object motions, and highly cluttered scenes. We propose in this paper a new iterative algorithm for background reconstruction, in which the current estimate of the background is used to guess which image pixels are background pixels, and a new background estimate is computed using those pixels only. We then show that the proposed algorithm, which uses stochastic gradient descent for improved regularization, is more accurate than the state of the art on the challenging SBMnet dataset, especially for short videos with low frame rates. It is also fast, reaching an average of 52 fps on this dataset when parameterized for maximal accuracy, using graphics processing unit (GPU) acceleration and a Python implementation.
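The iterative idea described above (use the current estimate to select likely background pixels, then re-estimate from those pixels only) can be sketched in a few lines. This is a simplified illustration, not the authors' method: it swaps the paper's stochastic gradient descent for a plain masked mean, starts from a per-pixel temporal median, and uses a hypothetical fixed `threshold`. All names are assumptions made for the example.

```python
from statistics import median


def bootstrap_background(frames, threshold=10.0, iterations=5):
    """Estimate a background from frames given as flat lists of grayscale values.

    Assumptions for this sketch: the initial estimate is the temporal median,
    and a pixel counts as background when it lies within `threshold` of the
    current estimate.
    """
    n_pixels = len(frames[0])
    # Initial estimate: temporal median of every pixel position.
    background = [median(frame[p] for frame in frames) for p in range(n_pixels)]

    for _ in range(iterations):
        updated = []
        for p in range(n_pixels):
            # Keep only the samples that agree with the current estimate.
            inliers = [frame[p] for frame in frames
                       if abs(frame[p] - background[p]) <= threshold]
            # Re-estimate from the selected samples; keep the old value if none agree.
            updated.append(sum(inliers) / len(inliers) if inliers else background[p])
        background = updated
    return background


# Two-pixel toy video: a dark object crosses pixel 0, a bright one crosses pixel 1.
frames = [[100, 50], [100, 50], [100, 200], [30, 50], [101, 50]]
background = bootstrap_background(frames)
```

In this toy input the outlier samples (30 and 200) are excluded from the second pass onward, so the estimate settles on the values seen when no object is present.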

Genus-Physiognomy-Ecosystem Map with 88 Legends Produced at 10m-resolution First-time in a Country Scale through Machine Learning of Multi-temporal Satellite Images

10.20944/preprints202201.0123.v1 ◽ 2022

Author(s): Ram C. Sharma ◽ Keitarou Hara

Keyword(s): Machine Learning ◽ Satellite Images ◽ Graphics Processing Unit ◽ Ground Truth ◽ Training Data ◽ Gradient Boosting ◽ Processing Unit ◽ Spectral Bands ◽ Multi Temporal ◽ First Time

This research introduces Genus-Physiognomy-Ecosystem (GPE) mapping at the prefecture level through machine learning of multi-spectral and multi-temporal satellite images at 10 m spatial resolution, followed by integration of the prefecture-wise maps to the country scale, so that 88 GPE types can be classified effectively from the large volume of training data involved. This research was made possible by harnessing the entire archive of the Level-2A product, Bottom of Atmosphere reflectance images collected by the MultiSpectral Instruments onboard a constellation of two polar-orbiting Sentinel-2 mission satellites. The satellite images were pre-processed for cloud masking, and monthly median composite images consisting of 10 multi-spectral bands and 7 spectral indexes were generated. The ground truth labels were extracted from extant vegetation survey maps by implementing a systematic stratified sampling approach, and noisy labels were dropped to prepare a reliable ground truth database. A Graphics Processing Unit (GPU) implementation of the Gradient Boosting Decision Trees (GBDT) classifier was employed for classification of 88 GPE types from 204 satellite features. The classification accuracy computed with 25% test data varied from 65% to 81% in terms of F1-score across 48 prefectural regions. This research produced seamless maps of 88 GPE types for the first time at a country scale, with an average F1-score of 72%.

Graphics processing unit implementation of the F-statistic for continuous gravitational wave searches

Classical and Quantum Gravity ◽ 10.1088/1361-6382/ac4616 ◽ 2021

Author(s): Liam Dunn ◽ Patrick Clearwater ◽ Andrew Melatos ◽ Karl Wette

Keyword(s): Gravitational Wave ◽ Graphics Processing Units ◽ Graphics Processing Unit ◽ Computational Cost ◽ Processing Unit ◽ Central Processing ◽ Long Baseline ◽ Using Data ◽ Graphics Processing ◽ GPU Implementation

The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.

Fast X-Ray Diffraction (XRD) Tomography for Enhanced Identification of Materials

10.36227/techrxiv.17125448.v1 ◽ 2021

Author(s): Airidas Korolkovas ◽ Alexander Katsevich ◽ Michael Frenkel ◽ William Thompson ◽ Edward Morton

Keyword(s): Graphics Processing Unit ◽ Low Cost ◽ Photon Counting ◽ Finite Size ◽ Processing Unit ◽ X Ray Diffraction ◽ X Ray ◽ Specific Material ◽ XRD Patterns ◽ Graphics Processing

X-ray computed tomography (CT) can provide 3D images of density, and possibly the atomic number, for large objects like passenger luggage. This information, while generally very useful, is often insufficient to identify threats like explosives and narcotics, which can have an average composition similar to that of benign everyday materials such as plastics, glass, and light metals. A much more specific material signature can be measured with X-ray diffraction (XRD). Unfortunately, the XRD signal is very faint compared to the transmitted one, and is also challenging to reconstruct for objects larger than a small laboratory sample. In this article we analyze a novel low-cost scanner design which captures CT and XRD signals simultaneously and uses the least possible collimation to maximize the flux. To simulate a realistic instrument, we derive a formula for the resolution of any diffraction pathway, taking into account the polychromatic spectrum and the finite size of the source, detector, and each voxel. We then show how to reconstruct XRD patterns from a large phantom with multiple diffracting objects. Our approach includes a reasonable amount of photon counting noise (Poisson statistics), as well as measurement bias, in particular incoherent Compton scattering. The resolution of our reconstruction is sufficient to provide significantly more information than standard CT, thus increasing the accuracy of threat detection. Our theoretical model is implemented in GPU (Graphics Processing Unit) accelerated software which can be used to assess and further optimize scanner designs for specific applications in security, healthcare, and manufacturing quality control.

A GPU-based Orthogonal Matrix Factorization Algorithm that Produces a Two-Diagonal Shape

NaUKMA Research Papers Computer Science ◽ 10.18523/2617-3808.2021.4.10-15 ◽ 2021 ◽ Vol 4 ◽ pp. 10-15

Author(s): Gennadii Malaschonok ◽ Serhii Sukharskyi

Keyword(s): Graphics Processing Unit ◽ Matrix Decomposition ◽ Recent Decade ◽ Orthogonal Matrix ◽ Processing Unit ◽ Central Processor ◽ Graphic Processing Units ◽ QR Algorithm ◽ Java Language ◽ Graphics Processing

With the development of the Big Data sphere, as well as of fields of study related to artificial intelligence, the need for fast and efficient computing has become one of the most important tasks today. That is why, over the past decade, graphics processing unit computing has been actively developed to let scientists and developers use the thousands of cores that GPUs have for intensive computations. The goal of this research is to implement the orthogonal decomposition of a matrix by applying a series of Householder transformations in the Java language using the JCuda library, and to study its benefits. Several related papers were examined. Malaschonok and Savchenko introduced an improved version of the QR algorithm for this purpose [4] and achieved better results; however, the Householder algorithm is more promising for GPUs according to another team of researchers, Lahabar and Narayanan [6]. They used Float numbers, while we use Double, and we are also working on a new BigDecimal type for CUDA. In addition, there is still no solution for handling huge matrices, where errors in calculations might occur. The algorithm of orthogonal matrix decomposition, which is the first part of the SVD algorithm, is researched and implemented in this work. The implementation of matrix bidiagonalization and calculation of orthogonal factors by the Householder method in the JCuda environment on a graphics processor is presented, and the algorithm is also implemented for the central processor for comparison. We experimentally measured the acceleration of calculations on the graphics processor in comparison with the implementation on the central processor. We show a speedup of up to 53 times compared to the CPU implementation on a large matrix size, specifically 2048, and even better results when using more advanced GPUs. At the same time, we still experience larger errors in calculations when using graphics processing units due to synchronization problems. We compared execution on different platforms (Windows 10 and Arch Linux) and found that they are almost the same in terms of computation speed. The results show that better performance can be achieved on the GPU; however, this approach involves more implementation difficulties.
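The building block of the approach described above is a single Householder reflection, which zeroes every entry of a vector below the first. The sketch below shows that one step in plain Python on the CPU; it is an illustration under stated assumptions, not the authors' JCuda code. Bidiagonalization would apply such reflections alternately to columns (from the left) and rows (from the right), which is not shown here. The function names are hypothetical.

```python
import math


def householder_vector(x):
    """Return a unit vector v such that (I - 2 v v^T) x = alpha * e1."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm == 0.0:
        return [0.0] * len(x)  # nothing to reflect
    # Pick the sign of alpha opposite to x[0] to avoid cancellation.
    alpha = -math.copysign(norm, x[0])
    v = list(x)
    v[0] -= alpha
    v_norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / v_norm for vi in v]


def reflect(v, x):
    """Apply the reflection H = I - 2 v v^T to the vector x."""
    dot = sum(vi * xi for vi, xi in zip(v, x))
    return [xi - 2.0 * dot * vi for vi, xi in zip(v, x)]


# The vector [3, 4] is mapped onto the first axis with the same length (5).
v = householder_vector([3.0, 4.0])
reflected = reflect(v, [3.0, 4.0])
```

Because each reflection is orthogonal, the length of the vector is preserved; only its direction changes.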

Parallel SVD Algorithm for a Three-Diagonal Matrix on a Video Card Using the Nvidia CUDA Architecture

NaUKMA Research Papers Computer Science ◽ 10.18523/2617-3808.2021.4.16-22 ◽ 2021 ◽ Vol 4 ◽ pp. 16-22

Author(s): Mykola Semylitko ◽ Gennadii Malaschonok

Keyword(s): Graphics Processing Units ◽ Graphics Processing Unit ◽ Computation Time ◽ Application Programming Interface ◽ General Purpose ◽ Diagonal Matrix ◽ Free Access ◽ Processing Unit ◽ CUDA Architecture ◽ Graphics Processing

The SVD (Singular Value Decomposition) algorithm is used in recommendation systems, machine learning, image processing, and various algorithms for working with matrices, which can be very large, and with Big Data. Given the peculiarities of this algorithm, it can be performed on a large number of computing threads, which only video cards have. CUDA is a parallel computing platform and application programming interface model created by Nvidia. It allows software developers and software engineers to use a CUDA-enabled graphics processing unit for general-purpose processing, an approach termed GPGPU (general-purpose computing on graphics processing units). The GPU provides much higher instruction throughput and memory bandwidth than the CPU within a similar price and power envelope. Many applications leverage these higher capabilities to run faster on the GPU than on the CPU. Other computing devices, like FPGAs, are also very energy efficient, but they offer much less programming flexibility than GPUs. The developed modification uses the CUDA architecture, which is intended for a large number of simultaneous calculations and makes it possible to process very large matrices quickly. The parallel SVD algorithm for a three-diagonal matrix, based on the Givens rotation, provides high accuracy of calculations. The algorithm also has a number of optimizations for memory access and multiplication that can significantly reduce the computation time by discarding empty iterations. This article proposes an approach that will reduce the computation time and, consequently, resources and costs. The developed algorithm can be used through a simple and convenient API in C++ and Java, and it can be improved further by using dynamic parallelism or parallelization of multiplication operations. The obtained results can also be used by other developers for comparison, as all conditions of the research are described in detail and the code is freely accessible.
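A Givens rotation, the primitive named in the abstract above, zeroes one chosen entry of a two-element vector while preserving its length. The sketch below shows only that primitive in plain Python; it is a minimal illustration and not the authors' CUDA implementation. How rotations are scheduled across the three-diagonal matrix is not described in the abstract and is not attempted here. The function names are hypothetical.

```python
import math


def givens(a, b):
    """Return (c, s, r) such that [[c, s], [-s, c]] applied to (a, b) gives (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0, 0.0  # identity rotation for the zero vector
    return a / r, b / r, r


def apply_givens(c, s, x, y):
    """Rotate the pair (x, y) by the rotation defined by c and s."""
    return c * x + s * y, -s * x + c * y


# Rotating (3, 4) moves all of its length onto the first component.
c, s, r = givens(3.0, 4.0)
top, bottom = apply_givens(c, s, 3.0, 4.0)
```

Each rotation touches only two rows or two columns at a time, so rotations acting on disjoint pairs are independent of one another, which is one reason the technique is attractive for parallel hardware.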

Scalable Graphics Processing Unit–Based Multiscale Linear Solvers for Reservoir Simulation

SPE Journal ◽ 10.2118/203939-pa ◽ 2021 ◽ pp. 1-20

Author(s): A. M. Manea ◽ T. Almani

Keyword(s): Shared Memory ◽ Reservoir Simulation ◽ Graphics Processing Unit ◽ Parallel Architecture ◽ Multiscale Methods ◽ Massively Parallel ◽ Processing Unit ◽ Multicore Architecture ◽ Graphics Processing ◽ GPU Architecture

In this work, the scalability of two key multiscale solvers for the pressure equation arising from incompressible flow in heterogeneous porous media, namely the multiscale finite volume (MSFV) solver and the restriction-smoothed basis multiscale (MsRSB) solver, is investigated on the massively parallel graphics processing unit (GPU) architecture. The robustness and scalability of both solvers are compared against their corresponding carefully optimized implementations on the shared-memory multicore architecture in a structured problem setting. Although several components in the MSFV and MsRSB algorithms are directly parallelizable, their scalability on the GPU architecture depends heavily on the underlying algorithmic details and data-structure design of every step, where one needs to ensure favorable control and data flow on the GPU while extracting enough parallel work for a massively parallel environment. In addition, the type of algorithm chosen for each step greatly influences the overall robustness of the solver. Thus, we extend the work on the parallel multiscale methods of Manea et al. (2016) to map the MSFV and MsRSB special kernels to the massively parallel GPU architecture. The scalability of our optimized parallel MSFV and MsRSB GPU implementations is demonstrated using highly heterogeneous structured 3D problems derived from the SPE10 Benchmark (Christie and Blunt 2001). Those problems range in size from millions to tens of millions of cells. For both solvers, the multicore implementations are benchmarked on a shared-memory multicore architecture consisting of two packages of Intel® Cascade Lake Xeon Gold 6246 central processing unit (CPU), whereas the GPU implementations are benchmarked on a massively parallel architecture consisting of NVIDIA Volta V100 GPUs. We compare the multicore implementations to the GPU implementations for both the setup and solution stages. Finally, we compare the parallel MsRSB scalability to the scalability of MSFV on the multicore (Manea et al. 2016) and GPU architectures. To the best of our knowledge, this is the first parallel implementation and demonstration of these versatile multiscale solvers on the GPU architecture. NOTE: This paper is published as part of the 2021 SPE Reservoir Simulation Conference Special Issue.

Ridon Vehicle: Drive-by-Wire System for Scaled Vehicle Platform and Its Application on Behavior Cloning

Energies ◽ 10.3390/en14238039 ◽ 2021 ◽ Vol 14 (23) ◽ pp. 8039

Author(s): Aws Khalil ◽ Ahmed Abdelhamed ◽ Girma Tewolde ◽ Jaerock Kwon

Keyword(s): Machine Learning ◽ Graphics Processing Unit ◽ Cost Effective ◽ Autonomous Driving ◽ Full Scale ◽ Processing Unit ◽ Laptop Computer ◽ Sensor Package ◽ Wire System ◽ Vehicle Platform

For autonomous driving research, using a scaled vehicle platform is a viable alternative to a full-scale vehicle. However, embedded solutions such as small robotic platforms with differential driving or radio-controlled (RC) car-based platforms can be limiting because of, for example, sensor package restrictions or computing challenges. Furthermore, for a given controller, specialized expertise and abilities are necessary. To address such problems, this paper proposes a feasible solution, the Ridon vehicle, which is a spacious ride-on automobile with high-driving electric power and a custom-designed drive-by-wire system powered by a full-scale machine-learning-ready computer. The major objective of this paper is to provide a thorough and appropriate method for constructing a cost-effective platform with a drive-by-wire system and sensor packages so that machine-learning-based algorithms can be tested and deployed on a scaled vehicle. The proposed platform employs a modular and hierarchical software architecture, with microcontroller programs handling the low-level motor controls and a graphics processing unit (GPU)-powered laptop computer processing the higher-level and more sophisticated algorithms. The Ridon vehicle platform is validated by employing it in a deep-learning-based behavioral cloning study. The platform's affordability and adaptability would benefit the broader research and education community.
