GPU Architectures
Recently Published Documents


TOTAL DOCUMENTS: 242 (FIVE YEARS: 76)
H-INDEX: 23 (FIVE YEARS: 3)

Author(s): Marcio M. Goncalves, Josie E. Rodriguez Condia, Matteo Sonza Reorda, Luca Sterpone, Jose Rodrigo Azambuja
Keyword(s):

2021, Vol. 2131 (3), pp. 032025
Author(s): Oleg Agibalov, Nikolay Ventsov

Abstract The problem under consideration consists in choosing a number k of individuals such that the time needed to process k individuals with a genetic algorithm (GA) on the CPU architecture is close to the time needed to process l individuals with the GA on the GPU architecture. The initial information consists of data arrays containing the processing times of a given number of individuals by the genetic algorithm on the available hardware architectures. Based on these arrays, fuzzy numbers Ã and B̃ are determined, describing the processing time of a given number of individuals on the CPU and GPU architectures, respectively. The peculiarities of the subject area do not allow the well-known comparison methods based on the equality of membership functions and on nearest crisp sets to be considered adequate. Based on the known formula "close to Y (around Y)", a method for comparing the fuzzy numbers Ã and B̃ was developed in order to determine the degree of closeness of the processing times of k and l individuals on the CPU and GPU architectures, respectively.
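As an illustration of this kind of comparison, the following is a minimal Python sketch that represents the two processing times as triangular fuzzy numbers and measures their closeness with the possibility measure sup_x min(mu_A(x), mu_B(x)). The triangular membership functions, the discretization, and the numeric values are assumptions made for illustration; the paper's actual "close to Y (around Y)" construction may differ.

    # Hypothetical sketch: the fuzzy CPU time for k individuals and the fuzzy
    # GPU time for l individuals are modelled as triangular fuzzy numbers and
    # compared with the possibility measure sup_x min(mu_A(x), mu_B(x)).
    # The membership shapes and the numbers are illustrative, not from the paper.
    import numpy as np

    def triangular(x, a, b, c):
        """Triangular membership function with support [a, c] and peak at b."""
        return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

    def closeness(params_a, params_b, points=10001):
        """Degree to which two triangular fuzzy numbers overlap."""
        lo = min(params_a[0], params_b[0])
        hi = max(params_a[2], params_b[2])
        grid = np.linspace(lo, hi, points)
        return float(np.max(np.minimum(triangular(grid, *params_a),
                                       triangular(grid, *params_b))))

    cpu_time_k = (1.8, 2.0, 2.3)   # fuzzy time for k individuals on the CPU (s)
    gpu_time_l = (1.9, 2.1, 2.2)   # fuzzy time for l individuals on the GPU (s)
    print(f"degree of closeness: {closeness(cpu_time_k, gpu_time_l):.2f}")

A closeness value of 1 indicates fully overlapping peaks, while 0 indicates disjoint supports; intermediate values quantify how compatible the two processing times are.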


2021
Author(s): Mohammad Alaul Haque Monil, Seyong Lee, Jeffrey S. Vetter, Allen D. Malony
Keyword(s):

2021
Author(s): Abdulrahman Manea

Abstract Due to its simplicity, adaptability, and applicability to various grid formats, the restriction-smoothed basis multiscale method (MsRSB) (Møyner and Lie 2016) has received wide attention and has been extended to various flow problems in porous media. Unlike standard multiscale methods, MsRSB relies on iterative smoothing to find the multiscale basis functions in an adaptive manner, giving it the ability to naturally adjust to the complex grid orientations often encountered in real-life industrial applications. In this work, we investigate the scalability of MsRSB on various state-of-the-art parallel architectures, including multi-core systems and GPUs. While MsRSB is, like most other multiscale methods, directly amenable to parallelization, its dependence on a smoother to find the basis functions creates unique control- and data-flow patterns. These patterns require careful design and implementation in parallel environments to achieve good scalability. We extend the work on parallel multiscale methods in Manea et al. (2016) and Manea and Almani (2019) to map the special MsRSB kernels to shared-memory multi-core and GPU architectures. The scalability of our optimized parallel MsRSB implementation is demonstrated using highly heterogeneous 3D problems derived from the SPE10 benchmark (Christie and Blunt 2001). These problems range in size from millions to tens of millions of cells. The multi-core implementation is benchmarked on a shared-memory multi-core architecture consisting of two packages of Intel's Cascade Lake Xeon® Gold 6246 CPU, while the GPU implementation is benchmarked on a massively parallel architecture consisting of Nvidia Volta V100 GPUs. We compare the multi-core implementation to the GPU implementation for both the setup and solution stages. To the best of our knowledge, this is the first parallel implementation and demonstration of the versatile MsRSB method on the GPU architecture.
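For readers unfamiliar with MsRSB, the following is a small serial Python/SciPy sketch of the basis construction the abstract refers to: the indicator functions of the coarse blocks are repeatedly relaxed with a weighted Jacobi smoother and rescaled to preserve the partition of unity. The support-region handling, the fixed iteration count, and the tiny 1D example are simplifications chosen for illustration; this is not the parallel multi-core or GPU implementation described in the paper.

    # Serial sketch of MsRSB basis construction: smooth the coarse-block
    # indicator functions with weighted Jacobi and renormalise so the basis
    # functions always sum to one in every fine cell (partition of unity).
    import numpy as np
    import scipy.sparse as sp

    def msrsb_basis(A, partition, n_coarse, omega=2.0 / 3.0, iters=50):
        """Return a dense prolongation operator P of shape (n_fine, n_coarse)."""
        n_fine = A.shape[0]
        # Initial basis: 1 inside the coarse block a fine cell belongs to, 0 elsewhere.
        P = sp.csr_matrix((np.ones(n_fine), (np.arange(n_fine), partition)),
                          shape=(n_fine, n_coarse)).toarray()
        d_inv = 1.0 / A.diagonal()
        for _ in range(iters):
            P = P - omega * (d_inv[:, None] * (A @ P))  # weighted Jacobi smoothing step
            P = np.maximum(P, 0.0)                      # crude support control
            P /= P.sum(axis=1, keepdims=True)           # restore partition of unity
        return P

    # Tiny 1D Laplacian example: 8 fine cells split into 2 coarse blocks.
    n = 8
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    partition = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    print(np.round(msrsb_basis(A, partition, n_coarse=2), 3))

The sparse matrix-vector products and the row-wise renormalisation inside the loop are the kernels that the paper maps onto multi-core and GPU hardware.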


Author(s): Seher Acer, Ariful Azad, Erik G. Boman, Aydın Buluç, Karen D. Devine, ...

Combinatorial algorithms in general, and graph algorithms in particular, play a critical enabling role in numerous scientific applications. However, the irregular memory access patterns of these algorithms make them one of the hardest algorithmic kernels to implement on parallel systems. With tens of billions of hardware threads and deep memory hierarchies, exascale computing systems pose extreme challenges for scaling graph algorithms. The codesign center on combinatorial algorithms, ExaGraph, was established to design and develop methods and techniques for efficient implementation of key combinatorial (graph) algorithms chosen from a diverse set of exascale applications. Algebraic and combinatorial methods have complementary roles in the advancement of computational science and engineering, each playing an enabling role for the other. In this paper, we survey the algorithmic and software development activities performed under the auspices of ExaGraph from both a combinatorial and an algebraic perspective. In particular, we detail our recent efforts in porting the algorithms to manycore accelerator (GPU) architectures. We also provide a brief survey of the applications that have benefited from the scalable implementations of different combinatorial algorithms to enable scientific discovery at scale. We believe that several applications will benefit from the algorithmic and software tools developed by the ExaGraph team.
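To make the irregular-access pattern concrete, the following is a serial Python sketch of level-synchronous BFS over a graph in CSR form, the formulation typically used when mapping graph traversal to manycore and GPU architectures, where each frontier expansion becomes a data-parallel kernel. It is illustrative only and is not code from the ExaGraph project.

    # Level-synchronous BFS over a CSR graph.  The neighbour gathers are the
    # irregular, data-dependent memory accesses that make graph kernels hard
    # to scale on GPUs and deep memory hierarchies.
    import numpy as np

    def bfs_levels(indptr, indices, source):
        """Return the BFS level of every vertex (-1 if unreachable)."""
        n = len(indptr) - 1
        level = np.full(n, -1, dtype=np.int64)
        level[source] = 0
        frontier = np.array([source])
        depth = 0
        while frontier.size:
            # Gather all neighbours of the current frontier (irregular accesses).
            neigh = np.concatenate(
                [indices[indptr[v]:indptr[v + 1]] for v in frontier]
            )
            # Vertices not yet visited form the next frontier.
            nxt = np.unique(neigh[level[neigh] == -1])
            depth += 1
            level[nxt] = depth
            frontier = nxt
        return level

    # Small example graph in CSR form: edges 0-1, 0-2, 1-3, 2-3 (undirected).
    indptr = np.array([0, 2, 4, 6, 8])
    indices = np.array([1, 2, 0, 3, 0, 3, 1, 2])
    print(bfs_levels(indptr, indices, source=0))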


2021
Author(s): Alexssandro Ferreira Cordeiro, Pedro Luiz de Paula Filho, Hamilton Pereira Silva, Arnaldo Candido Junior, Edresson Casanova, ...

Abstract Purpose: to analyze the processing time and the similarity of the images generated on CPU and GPU architectures with sequential and parallel programming methodologies. Material and methods: a computer with an AMD FX-8350 processor and an Nvidia GTX 960 (Maxwell) GPU was used for image processing, together with the CUDAFY library and the C# programming language in the Visual Studio IDE. Results: the comparisons indicate that sequential programming on the CPU generates reliable images at a high cost in time compared to parallel programming on the CPU and GPU, while parallel programming generates faster results but with increased noise in the reconstructed image. For the float data type, the GPU obtained the best result, with an average time equivalent to 1/3 of that of the processor; however, when the data is of type double, the parallel CPU approach obtained the best performance. Conclusion: for the float data type, the GPU had the best average time performance, while for the double data type the best average time was obtained by the parallel CPU approach. Regarding image quality, the sequential approach produced similar outputs, while the parallel approaches generated noise in their outputs.
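The following is a small Python timing sketch of the kind of float-versus-double comparison reported above. The original work was implemented in C# with the CUDAFY library on the hardware listed in the abstract; the stand-in pixel-wise kernel, array size, and run count below are placeholders chosen for illustration.

    # Small timing harness: the same pixel-wise workload is timed for float32
    # and float64 arrays.  The kernel, array size, and run count are stand-ins;
    # the original study used C#/CUDAFY on an FX-8350 CPU and a GTX 960 GPU.
    import time
    import numpy as np

    def pixelwise_kernel(image):
        """Stand-in per-pixel workload (not the authors' reconstruction code)."""
        return np.sqrt(image * image + 1.0)

    def mean_runtime(dtype, runs=5, shape=(2048, 2048)):
        image = np.random.rand(*shape).astype(dtype)
        times = []
        for _ in range(runs):
            start = time.perf_counter()
            pixelwise_kernel(image)
            times.append(time.perf_counter() - start)
        return sum(times) / runs

    for dtype in (np.float32, np.float64):
        print(f"{np.dtype(dtype).name}: {mean_runtime(dtype) * 1e3:.1f} ms per run")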

