GPU Algorithms
Recently Published Documents

TOTAL DOCUMENTS: 32 (five years: 8)
H-INDEX: 8 (five years: 1)

2021, pp. 102841
Author(s): Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Ryan Bleile, Jed Brown, ...
Keyword(s):

2021, pp. 101339
Author(s): P. De Luca, A. Galletti, G. Giunta, L. Marcellino

2021, Vol. 251, pp. 04017
Author(s): Bruno Alves, Andrea Bocci, Matti Kortelainen, Felice Pantaleo, Marco Rovere

We present the porting to heterogeneous architectures of the algorithm used for applying linear transformations of raw energy deposits in the CMS High Granularity Calorimeter (HGCAL). This is the first heterogeneous algorithm to be fully integrated with HGCAL’s reconstruction chain. After introducing the latter and giving a brief description of the structural components of HGCAL relevant for this work, the role of the linear transformations in the calibration is reviewed. The many ways in which parallelization is achieved are described, and the successful validation of the heterogeneous algorithm is covered. Detailed performance measurements are presented, including throughput and execution time for both CPU and GPU algorithms, therefore establishing the corresponding speedup. We finally discuss the interplay between this work and the porting of other algorithms in the existing reconstruction chain, as well as integrating algorithms previously ported but not yet integrated.
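The abstract above does not give the algorithm's details, but the operation it describes, applying a linear transformation to each raw energy deposit, is naturally data-parallel: one thread per detector cell, each computing an independent scaled-and-shifted value. A minimal sketch of that calibration step, with invented placeholder constants (the real HGCAL calibration constants and data layout are not shown in the source):

```python
# Illustrative sketch (not the CMS implementation): a per-cell linear
# transformation raw -> gain * raw + pedestal. Each output depends only
# on one input, which is why the step maps cleanly onto one GPU thread
# per detector cell. All values below are invented placeholders.

def calibrate(raw, gains, pedestals):
    """Apply a linear transformation to each raw energy deposit."""
    return [g * x + p for x, g, p in zip(raw, gains, pedestals)]

raw = [100.0, 250.0, 40.0]        # raw energy deposits (arbitrary units)
gains = [0.5, 0.5, 0.6]           # per-cell gain factors
pedestals = [1.0, 2.0, 0.0]       # per-cell pedestal offsets

print(calibrate(raw, gains, pedestals))  # [51.0, 127.0, 24.0]
```

On a GPU the loop body becomes the kernel; the speedup reported in the paper comes from executing many such independent cells concurrently.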


2019
Author(s): Danny H.C. Kim, Lynne J. Williams, Moises Hernandez-Fernandez, Bruce H. Bjornson

Abstract

Background: The correct estimation of fibre orientations is a crucial step in reconstructing human brain tracts. A popular and extensively used tool for this estimation is Bayesian Estimation of Diffusion Parameters Obtained using Sampling Techniques (bedpostx), which can estimate several fibre orientations per voxel (i.e. crossing fibres) using Markov Chain Monte Carlo (MCMC). However, fitting the model to a whole diffusion MRI dataset with MCMC can take up to a day on a standard CPU. Recently, the algorithm has been ported to GPUs, which accelerates the process, completing the analysis in minutes or hours. However, few studies have examined whether the results of the CPU and GPU algorithms differ. In this study, we compared CPU and GPU bedpostx outputs by running multiple trials of both algorithms on the same whole-brain diffusion data and comparing each distribution of outputs using Kolmogorov-Smirnov tests.

Results: We show that the distributions of fibre fraction parameters and principal diffusion direction angles from bedpostx and bedpostx_gpu display few statistically significant differences in shape, and that these differences are sparsely distributed throughout the brain. Average output differences are small in magnitude compared with the underlying uncertainty.

Conclusions: Despite the small number of differences between the samples produced by the CPU and GPU bedpostx algorithms, the results are comparable given the differences in operation order and library usage between the two implementations.
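The comparison method named in the abstract, the two-sample Kolmogorov-Smirnov test, measures the maximum distance between the empirical CDFs of two sample sets (here, CPU versus GPU posterior samples for a given voxel). Libraries such as scipy provide it with a p-value (`scipy.stats.ks_2samp`); as a sketch, the statistic itself in plain Python, with invented sample values:

```python
# Hedged sketch of the two-sample Kolmogorov-Smirnov statistic:
# D = max over x of |F_a(x) - F_b(x)|, where F_a and F_b are the
# empirical CDFs of the two samples. The input values below are
# invented placeholders, not data from the study.
import bisect

def ks_statistic(sample_a, sample_b):
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_s, x):
        # Fraction of observations <= x.
        return bisect.bisect_right(sorted_s, x) / len(sorted_s)

    # The maximum CDF gap can only occur at an observed value.
    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in set(a) | set(b))

cpu_samples = [0.10, 0.12, 0.11, 0.13, 0.12]  # e.g. fibre fraction draws
gpu_samples = [0.11, 0.12, 0.12, 0.13, 0.11]
print(ks_statistic(cpu_samples, gpu_samples))  # 0.2
```

A small D (relative to the critical value for the sample sizes) is consistent with the paper's conclusion that the two implementations draw from comparable distributions.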


Author(s): Roberto Di Pietro, Leonardo Jero, Flavio Lombardi, Agusti Solanas
Keyword(s):

Author(s): Vikram S. Mailthody, Ketan Date, Zaid Qureshi, Carl Pearson, Rakesh Nagi, ...

2018
Author(s): Pablo José Pavan, Matheus da Silva Serpa, Víctor Martínez, Edson Luiz Padoin, Jairo Panetta, ...

Energy consumption and performance of parallel systems are an increasing concern for new large-scale systems, and research has been developed in response to this challenge, aiming at more energy-efficient systems. In this context, we improved performance and achieved energy efficiency by developing three different strategies that use the GPU memory subsystem (global, shared, and read-only memory). We also developed two optimizations that exploit data locality and the registers of the GPU architecture. Applied to GPU algorithms for stencil applications, our optimizations achieve a performance improvement of up to 201.5% on the K80 and 264.6% on the P100, using shared memory and the read-only cache respectively, over the naive version. The computational results show that combining read-only memory, Z-axis internalization of the stencil application, and reuse of architecture-specific registers increases energy efficiency by up to 255.6% on the K80 and 314.8% on the P100.
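For readers unfamiliar with the workload being optimized: a stencil computes each output point as a weighted combination of a point and its neighbours, so adjacent threads re-read overlapping inputs. That overlap is exactly what the paper's shared-memory and read-only-cache strategies exploit on the GPU. A minimal sketch of the computation itself (a 3-point 1D stencil with invented weights, not the paper's application):

```python
# Illustrative 3-point 1D stencil: out[i] is a weighted sum of grid[i-1],
# grid[i], and grid[i+1]. Because neighbouring outputs share two of their
# three inputs, a GPU block can stage a tile of `grid` in shared memory
# (or the read-only cache) once and let all its threads reuse it, rather
# than each thread re-reading global memory. Weights are placeholders.

def stencil_1d(grid, w_left=0.25, w_center=0.5, w_right=0.25):
    out = grid[:]  # boundary points are left unchanged
    for i in range(1, len(grid) - 1):
        out[i] = w_left * grid[i - 1] + w_center * grid[i] + w_right * grid[i + 1]
    return out

print(stencil_1d([0.0, 0.0, 4.0, 0.0, 0.0]))  # [0.0, 1.0, 2.0, 1.0, 0.0]
```

In 3D stencils, the paper's "Z-axis internalization" is a common register-reuse pattern: a thread marches along one axis, keeping the previous and next plane's values in registers instead of reloading them.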

