GPU Processing
Recently Published Documents

TOTAL DOCUMENTS: 42 (five years: 7)
H-INDEX: 9 (five years: 1)

Author(s): S. Saupi Teri, I. A. Musliman, A. Abdul Rahman

Abstract. The expansion of data collection from remote sensing and other geographic data sources, together with technologies such as cloud computing, sensors, mobile devices, and social media, has made mapping and analysis more complex. Some geospatial applications still rely on conventional geospatial processing, whose limited computational capability is often insufficient for meaningful data interpretation. In recent years, GPU processing has improved GIS applications far beyond what CPUs alone can achieve. As a result, numerous researchers have begun using GPUs not only for graphics but also for scientific, geometric, and database computations. This paper summarizes parallel processing concepts and architectures and the development of GPU geoprocessing for big geodata, ranging from remote sensing and 3D modelling to smart-city studies. It also discusses future opportunities for combining GPUs with other technologies: machine learning, deep learning, and cloud-based computing.


Energies, 2021, Vol 15 (1), pp. 72
Author(s): Stanisława Porzycka-Strzelczyk, Jacek Strzelczyk, Kamil Szostek, Maciej Dwornik, Andrzej Leśniak, ...

The main goal of this research was to propose a new method of polarimetric SAR data decomposition that extracts additional polarimetric information from Synthetic Aperture Radar (SAR) images compared with existing decomposition methods. Most current decomposition methods are based on scattering, covariance or coherence matrices describing the radar wave-scattering phenomenon represented in a single pixel of an SAR image. Many different decomposition methods have been proposed to date, but the problem remains open since it has no unique solution. In this research, a new polarimetric decomposition method is proposed that is based on polarimetric signature matrices. Such matrices may be used to reveal hidden information about the image target. Since polarimetric signatures (size 18 × 9) are much larger than scattering (size 2 × 2), covariance (size 3 × 3 or 4 × 4) or coherence (size 3 × 3 or 4 × 4) matrices, it was essential to use appropriate computational tools to calculate the results of the proposed decomposition method within an acceptable time frame. To estimate the effectiveness of the presented method, the obtained results were compared with the outcomes of another decomposition method (Arii decomposition). The conducted research showed that the proposed solution, compared with Arii decomposition, does not overestimate the volume-scattering component in built-up areas and clearly separates objects within mixed areas, where buildings, vegetation and surfaces all occur.
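
The decomposition itself is not published as code, but the polarimetric signature underlying it can be illustrated: for a pixel's 2 × 2 scattering matrix S, the co-polarized signature is the received power P(ψ, χ) = |EᵀSE|² swept over the orientation angle ψ and ellipticity angle χ of the transmit polarization ellipse; an 18 × 9 grid is consistent with 10° steps in both angles, though the paper's exact convention may differ. A minimal CUDA sketch, one thread per (ψ, χ) grid point (names, grid choices, and conventions are assumptions, not the authors' code):

    #include <cuComplex.h>

    #define NPSI 18  // orientation angle steps (assumed: 10 deg over [-90, 90))
    #define NCHI  9  // ellipticity angle steps (assumed: 10 deg over [-45, 45))

    // Co-polarized signature of one pixel's 2x2 scattering matrix S.
    // One thread per (psi, chi) grid point: P = |E^T S E|^2, where E is the
    // Jones vector of the transmit polarization ellipse (one common
    // convention; normalizations differ between references).
    __global__ void coPolSignature(cuFloatComplex sHH, cuFloatComplex sHV,
                                   cuFloatComplex sVH, cuFloatComplex sVV,
                                   float *sig /* NPSI * NCHI */) {
        const float PI = 3.14159265358979f;
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // psi index
        int j = blockIdx.y * blockDim.y + threadIdx.y;  // chi index
        if (i >= NPSI || j >= NCHI) return;

        float psi = (-90.0f + 180.0f * i / NPSI) * PI / 180.0f;
        float chi = (-45.0f +  90.0f * j / NCHI) * PI / 180.0f;

        // Jones vector of the polarization ellipse (psi, chi).
        cuFloatComplex eH = make_cuFloatComplex( cosf(psi) * cosf(chi),
                                                -sinf(psi) * sinf(chi));
        cuFloatComplex eV = make_cuFloatComplex( sinf(psi) * cosf(chi),
                                                 cosf(psi) * sinf(chi));

        // v = S * E
        cuFloatComplex vH = cuCaddf(cuCmulf(sHH, eH), cuCmulf(sHV, eV));
        cuFloatComplex vV = cuCaddf(cuCmulf(sVH, eH), cuCmulf(sVV, eV));

        // P = |E^T v|^2 (transpose, not conjugate transpose, for backscatter)
        cuFloatComplex p = cuCaddf(cuCmulf(eH, vH), cuCmulf(eV, vV));
        float a = cuCabsf(p);
        sig[i * NCHI + j] = a * a;
    }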


2021, Vol 251, pp. 04026
Author(s): David Rohr

ALICE will significantly increase its Pb–Pb data-taking rate from the 1 kHz of triggered readout in Run 2 to 50 kHz of continuous readout for LHC Run 3. Upgraded tracking detectors are installed for Run 3 and a new two-phase computing strategy is employed. In the first, synchronous phase during data taking, the raw data is compressed for storage to an on-site disk buffer and the data required for detector calibration is collected. In the second, asynchronous phase, the compressed raw data is reprocessed using the final calibration to produce the final reconstruction output. Traditional CPUs are unable to cope with the huge data rate and processing demands of the synchronous phase, so ALICE employs GPUs to speed up the processing. Since the online computing farm performs part of the asynchronous processing when there is no beam in the LHC, ALICE plans to use the GPUs for this second phase as well. This paper gives an overview of the GPU processing in the synchronous phase, the full system test to validate the reference GPU architecture, and the prospects for GPU usage in the asynchronous phase.


Author(s): Ricardo Nobre, Sergio Santander-Jiménez, Leonel Sousa, Aleksandar Ilic

2020, Vol 11 (1), pp. 83-94
Author(s): Kirankumar V Kataraki, Satyadhyan Chickerur

The aim of the moving particle semi-implicit (MPS) method is to simulate incompressible free-surface fluid flow. MPS implementations are very time-consuming and thus require a powerful computing system. Rather than relying on a CPU-based parallel computing system, the performance of the MPS model can be improved by using graphics processing units (GPUs). The goal is a computing system capable of high performance, speeding up the numerical computations required by MPS. The primary aim of the study is to build a GPU-accelerated MPS model using CUDA that reduces the time taken to search for neighboring particles. To increase GPU processing speed, particular attention is given to optimizing the neighboring-particle search. The numerical MPS model is based on the governing equations, notably the Navier-Stokes equations. The simulation results indicate that the GPU-based MPS delivers better performance than the traditional CPU arrangement.
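
The paper does not include its source, but this kind of neighbor search is commonly implemented on the GPU with a uniform grid (cell list). In the hypothetical sketch below, particles are assumed pre-sorted by cell index (e.g. with thrust::sort_by_key), so cellStart/cellEnd delimit each cell's slice of the particle array and each thread scans only the 27 surrounding cells instead of all particles; every name and parameter is illustrative, not the authors' implementation:

    #include <cuda_runtime.h>

    // Uniform-grid neighbor count: one thread per particle. Particles are
    // assumed sorted by cell index beforehand, so cellStart[c]..cellEnd[c]
    // is the contiguous index range of the particles lying in cell c.
    __global__ void countNeighbors(const float4 *pos,     // positions (w unused)
                                   const int *cellStart,  // first particle per cell
                                   const int *cellEnd,    // one past last per cell
                                   int *neighborCount,
                                   int numParticles,
                                   float h,               // interaction radius == cell size
                                   int3 cells) {          // grid dimensions in cells
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= numParticles) return;

        float4 pi = pos[i];
        int cx = (int)(pi.x / h), cy = (int)(pi.y / h), cz = (int)(pi.z / h);
        float h2 = h * h;
        int count = 0;

        // Scan the 3x3x3 block of cells around particle i.
        for (int dz = -1; dz <= 1; ++dz)
          for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int x = cx + dx, y = cy + dy, z = cz + dz;
                if (x < 0 || y < 0 || z < 0 ||
                    x >= cells.x || y >= cells.y || z >= cells.z) continue;
                int cell = (z * cells.y + y) * cells.x + x;
                for (int j = cellStart[cell]; j < cellEnd[cell]; ++j) {
                    if (j == i) continue;
                    float ddx = pos[j].x - pi.x;
                    float ddy = pos[j].y - pi.y;
                    float ddz = pos[j].z - pi.z;
                    if (ddx * ddx + ddy * ddy + ddz * ddz < h2) ++count;
                }
            }
        neighborCount[i] = count;
    }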


2019, Vol 6 (1), pp. 9-17
Author(s): Johannes Götze

Matrix calculations are ubiquitous in today's sound-localization algorithms. This work therefore analyses matrix calculations and their possible realization on embedded systems. For this purpose, common acceleration technologies are compared: processors, GPU processing, and FPGA-based acceleration. The results show that a graphics chip can accelerate such a matrix-vector multiplication compared with an implementation on a processor. However, a GPU cannot match the runtime of an FPGA implementation.
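
As a minimal sketch of the operation compared here (not the author's benchmark code), a straightforward CUDA matrix-vector multiplication assigns one thread per output row; a tuned version would stage x in shared memory or simply call cuBLAS (cublasSgemv):

    #include <cuda_runtime.h>

    // y = A * x for a dense, row-major M x N matrix: one thread per row.
    __global__ void matVec(const float *A, const float *x, float *y,
                           int M, int N) {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= M) return;
        float acc = 0.0f;
        for (int j = 0; j < N; ++j)
            acc += A[row * N + j] * x[j];  // x reads are served by cache
        y[row] = acc;
    }

    // Launch (illustrative): matVec<<<(M + 255) / 256, 256>>>(dA, dx, dy, M, N);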


2019
Author(s): Min Guo, Yue Li, Yijun Su, Talley Lambert, Damian Dalle Nogare, ...

Abstract. We describe theoretical and practical advances in algorithm and software design, resulting in ten to several thousand-fold faster deconvolution and multiview fusion than previous methods. First, we adapt methods from medical imaging, showing that an unmatched back projector accelerates Richardson-Lucy deconvolution by at least 10-fold, in most cases requiring only a single iteration. Second, we show that improvements in 3D image-based registration with GPU processing result in speedups of 10-100-fold over CPU processing. Third, we show that deep learning can provide further accelerations, particularly for deconvolution with a spatially varying point spread function. We illustrate the power of our methods from the subcellular to millimeter spatial scale, on diverse samples including single cells, nematode and zebrafish embryos, and cleared mouse tissue. Finally, we show that our methods facilitate the use of new microscopes that improve spatial resolution, including dual-view cleared tissue light-sheet microscopy and reflective lattice light-sheet microscopy.
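
For context, Richardson-Lucy deconvolution iterates a multiplicative update in which the measured image is divided by the current estimate's forward projection and the ratio is smeared back by a back projector; in the classical, matched form the back projector is the flipped point spread function. One standard way to write it (notation ours, not copied from the paper):

\[
  \hat{x}^{(k+1)} \;=\; \hat{x}^{(k)} \cdot \left[\left(\frac{y}{p \ast \hat{x}^{(k)}}\right) \ast b\right],
  \qquad b(\mathbf{r}) = p(-\mathbf{r}) \ \text{(matched case)},
\]

where y is the measured image, p the point spread function, \ast denotes convolution, and the division is element-wise. The paper's unmatched back projector replaces b with a filtered variant so that the iteration converges at least 10-fold faster, in most cases in a single step.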


2018, Vol 19 (4), pp. 401-422
Author(s): Tomasz Gajger, Pawel Czarnul

In this work, we evaluate an analytical GPU performance model based on Little's law, which expresses the kernel execution time in terms of a latency bound, a throughput bound, and the achieved occupancy. We then combine it with the results of several research papers, introduce equations for estimating data transfer time, and finally incorporate it into the MERPSYS framework, a general-purpose simulator for parallel and distributed systems. The resulting solution enables the user to express a CUDA application in the MERPSYS editor using an extended Java language and then conveniently evaluate its performance for various launch configurations on different hardware units. We also provide a systematic methodology for extracting the kernel characteristics that serve as input parameters of the model. The model was evaluated using kernels representing different traits and a large variety of launch configurations. We found it to be very accurate for computation-bound kernels and realistic workloads, while for memory-throughput-bound kernels and uncommon scenarios the results were still within acceptable limits. We have also demonstrated its portability between two devices of the same hardware architecture but different processing power. Consequently, MERPSYS with the embedded theoretical models can be used to evaluate application performance on various GPUs, for performance prediction and, e.g., purchase decision making.
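
As a sketch of the model's core idea (the paper's exact equations differ in detail), Little's law L = λW states that the work kept in flight equals throughput times latency; applied to a GPU kernel it yields the latency and throughput bounds on execution time:

\[
  T_{\text{kernel}} \;\approx\; \max\!\left(
    \underbrace{\frac{N \cdot W}{L_{\text{achieved}}}}_{\text{latency bound}},\;
    \underbrace{\frac{N}{\lambda_{\text{peak}}}}_{\text{throughput bound}}
  \right),
\]

where N is the total number of operations, W the per-operation latency, \lambda_{\text{peak}} the peak throughput, and L_{\text{achieved}} the number of in-flight operations sustained by the achieved occupancy: when L_{\text{achieved}} < \lambda_{\text{peak}} \cdot W the kernel is latency bound, otherwise it runs at the throughput limit.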

