GPU Algorithms
Recently Published Documents

TOTAL DOCUMENTS: 32 (five years: 8)
H-INDEX: 8 (five years: 1)

2021, pp. 102841
Author(s): Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Ryan Bleile, Jed Brown, ...
Keyword(s):

2021, pp. 101339
Author(s): P. De Luca, A. Galletti, G. Giunta, L. Marcellino

2021, Vol. 251, pp. 04017
Author(s): Bruno Alves, Andrea Bocci, Matti Kortelainen, Felice Pantaleo, Marco Rovere

We present the porting to heterogeneous architectures of the algorithm used for applying linear transformations of raw energy deposits in the CMS High Granularity Calorimeter (HGCAL). This is the first heterogeneous algorithm to be fully integrated with HGCAL’s reconstruction chain. After introducing the latter and giving a brief description of the structural components of HGCAL relevant for this work, the role of the linear transformations in the calibration is reviewed. The many ways in which parallelization is achieved are described, and the successful validation of the heterogeneous algorithm is covered. Detailed performance measurements are presented, including throughput and execution time for both CPU and GPU algorithms, therefore establishing the corresponding speedup. We finally discuss the interplay between this work and the porting of other algorithms in the existing reconstruction chain, as well as integrating algorithms previously ported but not yet integrated.
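The abstract above does not give the algorithm's details, but the operation it describes, applying a linear transformation to each raw energy deposit, is naturally data-parallel: one thread per detector cell, each computing an independent scaled-and-shifted value. A minimal sketch of that calibration step, with invented placeholder constants (the real HGCAL calibration constants and data layout are not shown in the source):

```python
# Illustrative sketch (not the CMS implementation): a per-cell linear
# transformation raw -> gain * raw + pedestal. Each output depends only
# on one input, which is why the step maps cleanly onto one GPU thread
# per detector cell. All values below are invented placeholders.

def calibrate(raw, gains, pedestals):
    """Apply a linear transformation to each raw energy deposit."""
    return [g * x + p for x, g, p in zip(raw, gains, pedestals)]

raw = [100.0, 250.0, 40.0]        # raw energy deposits (arbitrary units)
gains = [0.5, 0.5, 0.6]           # per-cell gain factors
pedestals = [1.0, 2.0, 0.0]       # per-cell pedestal offsets

print(calibrate(raw, gains, pedestals))  # [51.0, 127.0, 24.0]
```

On a GPU the loop body becomes the kernel; the speedup reported in the paper comes from executing many such independent cells concurrently.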


2019
Author(s): Danny H.C. Kim, Lynne J. Williams, Moises Hernandez-Fernandez, Bruce H. Bjornson

Abstract

Background: The correct estimation of fibre orientations is a crucial step in reconstructing human brain tracts. A popular and extensively used tool for this estimation is Bayesian Estimation of Diffusion Parameters Obtained using Sampling Techniques (bedpostx), which can estimate several fibre orientations per voxel (i.e. crossing fibres) using Markov Chain Monte Carlo (MCMC). However, fitting the model to a whole diffusion MRI dataset with MCMC can take up to a day on a standard CPU. Recently, the algorithm has been ported to GPUs, which accelerates the process, completing the analysis in minutes or hours. However, few studies have examined whether the results of the CPU and GPU algorithms differ. In this study, we compared CPU and GPU bedpostx outputs by running multiple trials of both algorithms on the same whole-brain diffusion data and comparing each distribution of outputs using Kolmogorov-Smirnov tests.

Results: We show that the distributions of fibre fraction parameters and principal diffusion direction angles from bedpostx and bedpostx_gpu display few statistically significant differences in shape, and that these differences are sparsely distributed throughout the brain. Average output differences are small in magnitude compared with the underlying uncertainty.

Conclusions: Despite the small number of differences between the samples produced by the CPU and GPU bedpostx algorithms, the results are comparable given the differences in operation order and library usage between the two implementations.
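The comparison method named in the abstract, the two-sample Kolmogorov-Smirnov test, measures the maximum distance between the empirical CDFs of two sample sets (here, CPU versus GPU posterior samples for a given voxel). Libraries such as scipy provide it with a p-value (`scipy.stats.ks_2samp`); as a sketch, the statistic itself in plain Python, with invented sample values:

```python
# Hedged sketch of the two-sample Kolmogorov-Smirnov statistic:
# D = max over x of |F_a(x) - F_b(x)|, where F_a and F_b are the
# empirical CDFs of the two samples. The input values below are
# invented placeholders, not data from the study.
import bisect

def ks_statistic(sample_a, sample_b):
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_s, x):
        # Fraction of observations <= x.
        return bisect.bisect_right(sorted_s, x) / len(sorted_s)

    # The maximum CDF gap can only occur at an observed value.
    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in set(a) | set(b))

cpu_samples = [0.10, 0.12, 0.11, 0.13, 0.12]  # e.g. fibre fraction draws
gpu_samples = [0.11, 0.12, 0.12, 0.13, 0.11]
print(ks_statistic(cpu_samples, gpu_samples))  # 0.2
```

A small D (relative to the critical value for the sample sizes) is consistent with the paper's conclusion that the two implementations draw from comparable distributions.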


Author(s): Roberto Di Pietro, Leonardo Jero, Flavio Lombardi, Agusti Solanas
Keyword(s):

Author(s): Vikram S. Mailthody, Ketan Date, Zaid Qureshi, Carl Pearson, Rakesh Nagi, ...

2018
Author(s): Pablo José Pavan, Matheus da Silva Serpa, Víctor Martínez, Edson Luiz Padoin, Jairo Panetta, ...

Energy consumption and performance of parallel systems are an increasing concern for new large-scale systems, and research has been developed in response to this challenge, aiming at more energy-efficient systems. In this context, we improved performance and achieved energy efficiency by developing three different strategies that use the GPU memory subsystem (global, shared, and read-only memory). We also developed two optimizations that exploit data locality and the registers of the GPU architecture. Applied to GPU algorithms for stencil applications, our optimizations achieve a performance improvement of up to 201.5% on the K80 and 264.6% on the P100, using shared memory and the read-only cache respectively, over the naive version. The computational results show that combining read-only memory, Z-axis internalization of the stencil application, and reuse of architecture-specific registers increases energy efficiency by up to 255.6% on the K80 and 314.8% on the P100.
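For readers unfamiliar with the workload being optimized: a stencil computes each output point as a weighted combination of a point and its neighbours, so adjacent threads re-read overlapping inputs. That overlap is exactly what the paper's shared-memory and read-only-cache strategies exploit on the GPU. A minimal sketch of the computation itself (a 3-point 1D stencil with invented weights, not the paper's application):

```python
# Illustrative 3-point 1D stencil: out[i] is a weighted sum of grid[i-1],
# grid[i], and grid[i+1]. Because neighbouring outputs share two of their
# three inputs, a GPU block can stage a tile of `grid` in shared memory
# (or the read-only cache) once and let all its threads reuse it, rather
# than each thread re-reading global memory. Weights are placeholders.

def stencil_1d(grid, w_left=0.25, w_center=0.5, w_right=0.25):
    out = grid[:]  # boundary points are left unchanged
    for i in range(1, len(grid) - 1):
        out[i] = w_left * grid[i - 1] + w_center * grid[i] + w_right * grid[i + 1]
    return out

print(stencil_1d([0.0, 0.0, 4.0, 0.0, 0.0]))  # [0.0, 1.0, 2.0, 1.0, 0.0]
```

In 3D stencils, the paper's "Z-axis internalization" is a common register-reuse pattern: a thread marches along one axis, keeping the previous and next plane's values in registers instead of reloading them.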

