GPU Architectures
Recently Published Documents


TOTAL DOCUMENTS: 242 (FIVE YEARS: 76)
H-INDEX: 23 (FIVE YEARS: 3)

Author(s): Marcio M. Goncalves, Josie E. Rodriguez Condia, Matteo Sonza Reorda, Luca Sterpone, Jose Rodrigo Azambuja
Keyword(s):

2021, Vol. 2131 (3), pp. 032025
Author(s): Oleg Agibalov, Nikolay Ventsov

Abstract The problem under consideration consists in choosing a number k of individuals such that the time needed to process k individuals with a genetic algorithm (GA) on the CPU architecture is close to the time needed to process l individuals with the GA on the GPU architecture. The initial information consists of data arrays containing the processing times of a given number of individuals by the genetic algorithm on the available hardware architectures. Based on these arrays, fuzzy numbers Ã and B̃ are determined, describing the processing time of a given number of individuals on the CPU and GPU architectures, respectively. The peculiarities of the subject area do not allow the well-known comparison methods based on the equality of membership functions and on nearest crisp sets to be considered adequate. Based on the known formula "close to Y (around Y)", a method for comparing the fuzzy numbers Ã and B̃ was developed in order to determine the degree of closeness of the processing times of k and l individuals on the CPU and GPU architectures, respectively.
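As an illustration of this kind of comparison, the following is a minimal Python sketch that represents the two processing times as triangular fuzzy numbers and measures their closeness with the possibility measure sup_x min(mu_A(x), mu_B(x)). The triangular membership functions, the discretization, and the numeric values are assumptions made for illustration; the paper's actual "close to Y (around Y)" construction may differ.

    # Hypothetical sketch: the fuzzy CPU time for k individuals and the fuzzy
    # GPU time for l individuals are modelled as triangular fuzzy numbers and
    # compared with the possibility measure sup_x min(mu_A(x), mu_B(x)).
    # The membership shapes and the numbers are illustrative, not from the paper.
    import numpy as np

    def triangular(x, a, b, c):
        """Triangular membership function with support [a, c] and peak at b."""
        return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

    def closeness(params_a, params_b, points=10001):
        """Degree to which two triangular fuzzy numbers overlap."""
        lo = min(params_a[0], params_b[0])
        hi = max(params_a[2], params_b[2])
        grid = np.linspace(lo, hi, points)
        return float(np.max(np.minimum(triangular(grid, *params_a),
                                       triangular(grid, *params_b))))

    cpu_time_k = (1.8, 2.0, 2.3)   # fuzzy time for k individuals on the CPU (s)
    gpu_time_l = (1.9, 2.1, 2.2)   # fuzzy time for l individuals on the GPU (s)
    print(f"degree of closeness: {closeness(cpu_time_k, gpu_time_l):.2f}")

A closeness value of 1 indicates fully overlapping peaks, while 0 indicates disjoint supports; intermediate values quantify how compatible the two processing times are.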


2021
Author(s): Mohammad Alaul Haque Monil, Seyong Lee, Jeffrey S. Vetter, Allen D. Malony
Keyword(s):

2021
Author(s): Abdulrahman Manea

Abstract Due to its simplicity, adaptability, and applicability to various grid formats, the restriction-smoothed basis multiscale method (MsRSB) (Møyner and Lie 2016) has received wide attention and has been extended to various flow problems in porous media. Unlike standard multiscale methods, MsRSB relies on iterative smoothing to find the multiscale basis functions in an adaptive manner, giving it the ability to naturally adjust to the complex grid orientations often encountered in real-life industrial applications. In this work, we investigate the scalability of MsRSB on various state-of-the-art parallel architectures, including multi-core systems and GPUs. While MsRSB is, like most other multiscale methods, directly amenable to parallelization, its dependence on a smoother to find the basis functions creates unique control- and data-flow patterns. These patterns require careful design and implementation in parallel environments to achieve good scalability. We extend the work on parallel multiscale methods in Manea et al. (2016) and Manea and Almani (2019) to map the special MsRSB kernels to shared-memory multi-core and GPU architectures. The scalability of our optimized parallel MsRSB implementation is demonstrated using highly heterogeneous 3D problems derived from the SPE10 benchmark (Christie and Blunt 2001). These problems range in size from millions to tens of millions of cells. The multi-core implementation is benchmarked on a shared-memory multi-core architecture consisting of two packages of Intel's Cascade Lake Xeon® Gold 6246 CPU, while the GPU implementation is benchmarked on a massively parallel architecture consisting of Nvidia Volta V100 GPUs. We compare the multi-core implementation to the GPU implementation for both the setup and solution stages. To the best of our knowledge, this is the first parallel implementation and demonstration of the versatile MsRSB method on the GPU architecture.
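For readers unfamiliar with MsRSB, the following is a small serial Python/SciPy sketch of the basis construction the abstract refers to: the indicator functions of the coarse blocks are repeatedly relaxed with a weighted Jacobi smoother and rescaled to preserve the partition of unity. The support-region handling, the fixed iteration count, and the tiny 1D example are simplifications chosen for illustration; this is not the parallel multi-core or GPU implementation described in the paper.

    # Serial sketch of MsRSB basis construction: smooth the coarse-block
    # indicator functions with weighted Jacobi and renormalise so the basis
    # functions always sum to one in every fine cell (partition of unity).
    import numpy as np
    import scipy.sparse as sp

    def msrsb_basis(A, partition, n_coarse, omega=2.0 / 3.0, iters=50):
        """Return a dense prolongation operator P of shape (n_fine, n_coarse)."""
        n_fine = A.shape[0]
        # Initial basis: 1 inside the coarse block a fine cell belongs to, 0 elsewhere.
        P = sp.csr_matrix((np.ones(n_fine), (np.arange(n_fine), partition)),
                          shape=(n_fine, n_coarse)).toarray()
        d_inv = 1.0 / A.diagonal()
        for _ in range(iters):
            P = P - omega * (d_inv[:, None] * (A @ P))  # weighted Jacobi smoothing step
            P = np.maximum(P, 0.0)                      # crude support control
            P /= P.sum(axis=1, keepdims=True)           # restore partition of unity
        return P

    # Tiny 1D Laplacian example: 8 fine cells split into 2 coarse blocks.
    n = 8
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    partition = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    print(np.round(msrsb_basis(A, partition, n_coarse=2), 3))

The sparse matrix-vector products and the row-wise renormalisation inside the loop are the kernels that the paper maps onto multi-core and GPU hardware.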


Author(s): Seher Acer, Ariful Azad, Erik G. Boman, Aydın Buluç, Karen D. Devine, ...

Combinatorial algorithms in general, and graph algorithms in particular, play a critical enabling role in numerous scientific applications. However, the irregular memory access patterns of these algorithms make them one of the hardest algorithmic kernels to implement on parallel systems. With tens of billions of hardware threads and deep memory hierarchies, exascale computing systems pose extreme challenges for scaling graph algorithms. The codesign center on combinatorial algorithms, ExaGraph, was established to design and develop methods and techniques for efficient implementation of key combinatorial (graph) algorithms chosen from a diverse set of exascale applications. Algebraic and combinatorial methods have complementary roles in the advancement of computational science and engineering, each playing an enabling role for the other. In this paper, we survey the algorithmic and software development activities performed under the auspices of ExaGraph from both a combinatorial and an algebraic perspective. In particular, we detail our recent efforts in porting the algorithms to manycore accelerator (GPU) architectures. We also provide a brief survey of the applications that have benefited from the scalable implementations of different combinatorial algorithms to enable scientific discovery at scale. We believe that several applications will benefit from the algorithmic and software tools developed by the ExaGraph team.
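To make the irregular-access pattern concrete, the following is a serial Python sketch of level-synchronous BFS over a graph in CSR form, the formulation typically used when mapping graph traversal to manycore and GPU architectures, where each frontier expansion becomes a data-parallel kernel. It is illustrative only and is not code from the ExaGraph project.

    # Level-synchronous BFS over a CSR graph.  The neighbour gathers are the
    # irregular, data-dependent memory accesses that make graph kernels hard
    # to scale on GPUs and deep memory hierarchies.
    import numpy as np

    def bfs_levels(indptr, indices, source):
        """Return the BFS level of every vertex (-1 if unreachable)."""
        n = len(indptr) - 1
        level = np.full(n, -1, dtype=np.int64)
        level[source] = 0
        frontier = np.array([source])
        depth = 0
        while frontier.size:
            # Gather all neighbours of the current frontier (irregular accesses).
            neigh = np.concatenate(
                [indices[indptr[v]:indptr[v + 1]] for v in frontier]
            )
            # Vertices not yet visited form the next frontier.
            nxt = np.unique(neigh[level[neigh] == -1])
            depth += 1
            level[nxt] = depth
            frontier = nxt
        return level

    # Small example graph in CSR form: edges 0-1, 0-2, 1-3, 2-3 (undirected).
    indptr = np.array([0, 2, 4, 6, 8])
    indices = np.array([1, 2, 0, 3, 0, 3, 1, 2])
    print(bfs_levels(indptr, indices, source=0))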


2021
Author(s): Alexssandro Ferreira Cordeiro, Pedro Luiz de Paula Filho, Hamilton Pereira Silva, Arnaldo Candido Junior, Edresson Casanova, ...

Abstract Purpose: to analyze the processing time and the similarity of the images generated on CPU and GPU architectures with sequential and parallel programming methodologies. Material and methods: a computer with an AMD FX-8350 processor and an Nvidia GTX 960 (Maxwell) GPU was used for image processing, together with the CUDAFY library and the C# programming language in the Visual Studio IDE. Results: the comparisons indicate that sequential programming on the CPU generates reliable images at a high cost in time compared to parallel programming on the CPU and GPU, while parallel programming generates faster results but with increased noise in the reconstructed image. For the float data type, the GPU obtained the best result, with an average time equivalent to 1/3 of that of the processor; however, when the data is of type double, the parallel CPU approach obtained the best performance. Conclusion: for the float data type, the GPU had the best average time performance, while for the double data type the best average time was obtained by the parallel CPU approach. Regarding image quality, the sequential approach produced similar outputs, while the parallel approaches generated noise in their outputs.
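The following is a small Python timing sketch of the kind of float-versus-double comparison reported above. The original work was implemented in C# with the CUDAFY library on the hardware listed in the abstract; the stand-in pixel-wise kernel, array size, and run count below are placeholders chosen for illustration.

    # Small timing harness: the same pixel-wise workload is timed for float32
    # and float64 arrays.  The kernel, array size, and run count are stand-ins;
    # the original study used C#/CUDAFY on an FX-8350 CPU and a GTX 960 GPU.
    import time
    import numpy as np

    def pixelwise_kernel(image):
        """Stand-in per-pixel workload (not the authors' reconstruction code)."""
        return np.sqrt(image * image + 1.0)

    def mean_runtime(dtype, runs=5, shape=(2048, 2048)):
        image = np.random.rand(*shape).astype(dtype)
        times = []
        for _ in range(runs):
            start = time.perf_counter()
            pixelwise_kernel(image)
            times.append(time.perf_counter() - start)
        return sum(times) / runs

    for dtype in (np.float32, np.float64):
        print(f"{np.dtype(dtype).name}: {mean_runtime(dtype) * 1e3:.1f} ms per run")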

