graphics processing unit
Recently Published Documents


TOTAL DOCUMENTS

952
(FIVE YEARS 257)

H-INDEX

36
(FIVE YEARS 7)

2022 ◽  
Author(s):  
Shijie Yan ◽  
Steven L Jacques ◽  
Jessica C. Ramella-Roman ◽  
Qianqian Fang

Significance: Monte Carlo (MC) methods have been applied for studying interactions between polarized light and biological tissues, but most existing MC codes supporting polarization modeling can only simulate homogeneous or multi-layered domains, resulting in approximations when handling realistic tissue structures. Aim: Over the past decade, the speed of MC simulations has seen dramatic improvement with massively-parallel computing techniques. Developing hardware-accelerated MC simulation algorithms that can accurately model polarized light inside 3-D heterogeneous tissues can greatly expand the utility of polarization in biophotonics applications. Approach: Here we report a highly efficient polarized MC algorithm capable of modeling arbitrarily complex media defined over a voxelated domain. Each voxel of the domain can be associated with spherical scatters of various radii and densities. The Stokes vector of each simulated photon packet is updated through photon propagation, creating spatially resolved polarization measurements over the detectors or domain surface. Results: We have implemented this algorithm in our widely disseminated MC simulator, Monte Carlo eXtreme (MCX). It is validated by comparing with a reference CPU-based simulator in both homogeneous and layered domains, showing excellent agreement and a 931-fold speedup. Conclusion: The polarization-enabled MCX (pMCX) offers biophotonics community an efficient tool to explore polarized light in bio-tissues, and is freely available at http://mcx.space/.


2022 ◽  
Vol 8 (1) ◽  
pp. 9
Author(s):  
Bruno Sauvalle ◽  
Arnaud de La Fortelle

The goal of background reconstruction is to recover the background image of a scene from a sequence of frames showing this scene cluttered by various moving objects. This task is fundamental in image analysis, and is generally the first step before more advanced processing, but difficult because there is no formal definition of what should be considered as background or foreground and the results may be severely impacted by various challenges such as illumination changes, intermittent object motions, highly cluttered scenes, etc. We propose in this paper a new iterative algorithm for background reconstruction, where the current estimate of the background is used to guess which image pixels are background pixels and a new background estimation is performed using those pixels only. We then show that the proposed algorithm, which uses stochastic gradient descent for improved regularization, is more accurate than the state of the art on the challenging SBMnet dataset, especially for short videos with low frame rates, and is also fast, reaching an average of 52 fps on this dataset when parameterized for maximal accuracy using acceleration with a graphics processing unit (GPU) and a Python implementation.


Author(s):  
Ram C. Sharma ◽  
Keitarou Hara

This research introduces Genus-Physiognomy-Ecosystem (GPE) mapping at a prefecture level through machine learning of multi-spectral and multi-temporal satellite images at 10m spatial resolution, and later integration of prefecture wise maps into country scale for dealing with 88 GPE types to be classified from a large size of training data involved in the research effectively. This research was made possible by harnessing entire archives of Level-2A product, Bottom of Atmosphere reflectance images collected by MultiSpectral Instruments onboard a constellation of two polar-orbiting Sentinel-2 mission satellites. The satellite images were pre-processed for cloud masking and monthly median composite images consisting of 10 multi-spectral bands and 7 spectral indexes were generated. The ground truth labels were extracted from extant vegetation survey maps by implementing systematic stratified sampling approach and noisy labels were dropped out for preparing a reliable ground truth database. Graphics Processing Unit (GPU) implementation of Gradient Boosting Decision Trees (GBDT) classifier was employed for classification of 88 GPE types from 204 satellite features. The classification accuracy computed with 25% test data varied from 65-81% in terms of F1-score across 48 prefectural regions. This research produced seamless maps of 88 GPE types first time at a country scale with an average 72% F1-score.


Author(s):  
Liam Dunn ◽  
Patrick Clearwater ◽  
Andrew Melatos ◽  
Karl Wette

Abstract The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17:2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.


2021 ◽  
Author(s):  
Airidas Korolkovas ◽  
Alexander Katsevich ◽  
Michael Frenkel ◽  
William Thompson ◽  
Edward Morton

X-ray computed tomography (CT) can provide 3D images of density, and possibly the atomic number, for large objects like passenger luggage. This information, while generally very useful, is often insufficient to identify threats like explosives and narcotics, which can have a similar average composition as benign everyday materials such as plastics, glass, light metals, etc. A much more specific material signature can be measured with X-ray diffraction (XRD). Unfortunately, XRD signal is very faint compared to the transmitted one, and also challenging to reconstruct for objects larger than a small laboratory sample. In this article we analyze a novel low-cost scanner design which captures CT and XRD signals simultaneously, and uses the least possible collimation to maximize the flux. To simulate a realistic instrument, we derive a formula for the resolution of any diffraction pathway, taking into account the polychromatic spectrum, and the finite size of the source, detector, and each voxel. We then show how to reconstruct XRD patterns from a large phantom with multiple diffracting objects. Our approach includes a reasonable amount of photon counting noise (Poisson statistics), as well as measurement bias, in particular incoherent Compton scattering. The resolution of our reconstruction is sufficient to provide significantly more information than standard CT, thus increasing the accuracy of threat detection. Our theoretical model is implemented in GPU (Graphics Processing Unit) accelerated software which can be used to assess and further optimize scanner designs for specific applications in security, healthcare, and manufacturing quality control.


2021 ◽  
Vol 4 ◽  
pp. 10-15
Author(s):  
Gennadii Malaschonok ◽  
Serhii Sukharskyi

With the development of the Big Data sphere, as well as those fields of study that we can relate to artificial intelligence, the need for fast and efficient computing has become one of the most important tasks nowadays. That is why in the recent decade, graphics processing unit computations have been actively developing to provide an ability for scientists and developers to use thousands of cores GPUs have in order to perform intensive computations. The goal of this research is to implement orthogonal decomposition of a matrix by applying a series of Householder transformations in Java language using JCuda library to conduct a research on its benefits. Several related papers were examined. Malaschonok and Savchenko in their work have introduced an improved version of QR algorithm for this purpose [4] and achieved better results, however Householder algorithm is more promising for GPUs according to another team of researchers – Lahabar and Narayanan [6]. However, they were using Float numbers, while we are using Double, and apart from that we are working on a new BigDecimal type for CUDA. Apart from that, there is still no solution for handling huge matrices where errors in calculations might occur. The algorithm of orthogonal matrix decomposition, which is the first part of SVD algorithm, is researched and implemented in this work. The implementation of matrix bidiagonalization and calculation of orthogonal factors by the Hausholder method in the jCUDA environment on a graphics processor is presented, and the algorithm for the central processor for comparisons is also implemented. Research of the received results where we experimentally measured acceleration of calculations with the use of the graphic processor in comparison with the implementation on the central processor are carried out. We show a speedup up to 53 times compared to CPU implementation on a big matrix size, specifically 2048, and even better results when using more advanced GPUs. At the same time, we still experience bigger errors in calculations while using graphic processing units due to synchronization problems. We compared execution on different platforms (Windows 10 and Arch Linux) and discovered that they are almost the same, taking the computation speed into account. The results have shown that on GPU we can achieve better performance, however there are more implementation difficulties with this approach.


2021 ◽  
Vol 4 ◽  
pp. 16-22
Author(s):  
Mykola Semylitko ◽  
Gennadii Malaschonok

SVD (Singular Value Decomposition) algorithm is used in recommendation systems, machine learning, image processing, and in various algorithms for working with matrices which can be very large and Big Data, so, given the peculiarities of this algorithm, it can be performed on a large number of computing threads that have only video cards.CUDA is a parallel computing platform and application programming interface model created by Nvidia. It allows software developers and software engineers to use a CUDA-enabled graphics processing unit for general purpose processing – an approach termed GPGPU (general-purpose computing on graphics processing units). The GPU provides much higher instruction throughput and memory bandwidth than the CPU within a similar price and power envelope. Many applications leverage these higher capabilities to run faster on the GPU than on the CPU. Other computing devices, like FPGAs, are also very energy efficient, but they offer much less programming flexibility than GPUs.The developed modification uses the CUDA architecture, which is intended for a large number of simultaneous calculations, which allows to quickly process matrices of very large sizes. The algorithm of parallel SVD for a three-diagonal matrix based on the Givents rotation provides a high accuracy of calculations. Also the algorithm has a number of optimizations to work with memory and multiplication algorithms that can significantly reduce the computation time discarding empty iterations.This article proposes an approach that will reduce the computation time and, consequently, resources and costs. The developed algorithm can be used with the help of a simple and convenient API in C ++ and Java, as well as will be improved by using dynamic parallelism or parallelization of multiplication operations. Also the obtained results can be used by other developers for comparison, as all conditions of the research are described in detail, and the code is in free access.


2021 ◽  
Author(s):  
Airidas Korolkovas ◽  
Alexander Katsevich ◽  
Michael Frenkel ◽  
William Thompson ◽  
Edward Morton

X-ray computed tomography (CT) can provide 3D images of density, and possibly the atomic number, for large objects like passenger luggage. This information, while generally very useful, is often insufficient to identify threats like explosives and narcotics, which can have a similar average composition as benign everyday materials such as plastics, glass, light metals, etc. A much more specific material signature can be measured with X-ray diffraction (XRD). Unfortunately, XRD signal is very faint compared to the transmitted one, and also challenging to reconstruct for objects larger than a small laboratory sample. In this article we analyze a novel low-cost scanner design which captures CT and XRD signals simultaneously, and uses the least possible collimation to maximize the flux. To simulate a realistic instrument, we derive a formula for the resolution of any diffraction pathway, taking into account the polychromatic spectrum, and the finite size of the source, detector, and each voxel. We then show how to reconstruct XRD patterns from a large phantom with multiple diffracting objects. Our approach includes a reasonable amount of photon counting noise (Poisson statistics), as well as measurement bias, in particular incoherent Compton scattering. The resolution of our reconstruction is sufficient to provide significantly more information than standard CT, thus increasing the accuracy of threat detection. Our theoretical model is implemented in GPU (Graphics Processing Unit) accelerated software which can be used to assess and further optimize scanner designs for specific applications in security, healthcare, and manufacturing quality control.


SPE Journal ◽  
2021 ◽  
pp. 1-20
Author(s):  
A. M. Manea ◽  
T. Almani

Summary In this work, the scalability of two key multiscale solvers for the pressure equation arising from incompressible flow in heterogeneous porous media, namely, the multiscale finite volume (MSFV) solver, and the restriction-smoothed basis multiscale (MsRSB) solver, are investigated on the graphics processing unit (GPU) massively parallel architecture. The robustness and scalability of both solvers are compared against their corresponding carefully optimized implementation on the shared-memory multicore architecture in a structured problem setting. Although several components in MSFV and MsRSB algorithms are directly parallelizable, their scalability on the GPU architecture depends heavily on the underlying algorithmic details and data-structure design of every step, where one needs to ensure favorable control and data flow on the GPU, while extracting enough parallel work for a massively parallel environment. In addition, the type of algorithm chosen for each step greatly influences the overall robustness of the solver. Thus, we extend the work on the parallel multiscale methods of Manea et al. (2016) to map the MSFV and MsRSB special kernels to the massively parallel GPU architecture. The scalability of our optimized parallel MSFV and MsRSB GPU implementations are demonstrated using highly heterogeneous structured 3D problems derived from the SPE10 Benchmark (Christie and Blunt 2001). Those problems range in size from millions to tens of millions of cells. For both solvers, the multicore implementations are benchmarked on a shared-memory multicore architecture consisting of two packages of Intel® Cascade Lake Xeon Gold 6246 central processing unit (CPU), whereas the GPU implementations are benchmarked on a massively parallel architecture consisting of NVIDIA Volta V100 GPUs. We compare the multicore implementations to the GPU implementations for both the setup and solution stages. Finally, we compare the parallel MsRSB scalability to the scalability of MSFV on the multicore (Manea et al. 2016) and GPU architectures. To the best of our knowledge, this is the first parallel implementation and demonstration of these versatile multiscale solvers on the GPU architecture. NOTE: This paper is published as part of the 2021 SPE Reservoir Simulation Conference Special Issue.


Energies ◽  
2021 ◽  
Vol 14 (23) ◽  
pp. 8039
Author(s):  
Aws Khalil ◽  
Ahmed Abdelhamed ◽  
Girma Tewolde ◽  
Jaerock Kwon

For autonomous driving research, using a scaled vehicle platform is a viable alternative compared to a full-scale vehicle. However, using embedded solutions such as small robotic platforms with differential driving or radio-controlled (RC) car-based platforms can be limiting on, for example, sensor package restrictions or computing challenges. Furthermore, for a given controller, specialized expertise and abilities are necessary. To address such problems, this paper proposes a feasible solution, the Ridon vehicle, which is a spacious ride-on automobile with high-driving electric power and a custom-designed drive-by-wire system powered by a full-scale machine-learning-ready computer. The major objective of this paper is to provide a thorough and appropriate method for constructing a cost-effective platform with a drive-by-wire system and sensor packages so that machine-learning-based algorithms can be tested and deployed on a scaled vehicle. The proposed platform employs a modular and hierarchical software architecture, with microcontroller programs handling the low-level motor controls and a graphics processing unit (GPU)-powered laptop computer processing the higher and more sophisticated algorithms. The Ridon vehicle platform is validated by employing it in a deep-learning-based behavioral cloning study. The suggested platform’s affordability and adaptability would benefit broader research and the education community.


Sign in / Sign up

Export Citation Format

Share Document