A GPGPU Programming Framework based on a Shared-Memory Model

Author(s):  
Kazuhiko Ohno ◽  
Dai Michiura ◽  
Masaki Matsumoto ◽  
Takahiro Sasaki ◽  
Toshio Kondo

Author(s):  
Atanu Barai ◽  
Gopinath Chennupati ◽  
Nandakishore Santhi ◽  
Abdel-Hameed Badawy ◽  
Yehia Arafa ◽  
...  

2020 ◽  
Vol 60 (1) ◽  
pp. 25-37
Author(s):  
Michal Bošanský ◽  
Bořek Patzák

Efficient codes can take advantage of multiple threads and/or processing nodes by partitioning work that can be processed concurrently. This can reduce the overall run time or make the solution of a large problem feasible. This paper evaluates different parallelization strategies for the assembly of global vectors and matrices, which is one of the critical operations in any finite element software. Several assembly strategies for shared-memory systems are proposed and evaluated using Open Multi-Processing (OpenMP), Portable Operating System Interface (POSIX) threads, and C++11 threads. The strategies considered fall into three groups: simple synchronization directives, various block-locking algorithms, and a smart lock-free scheme based on a colouring algorithm. All strategies were implemented in OOFEM [1], a free finite element code with an object-oriented architecture.
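The block-locking and colouring strategies described above can be contrasted in a short sketch. It is written in Python for brevity and is not OOFEM code. The following are all hypothetical:

- the 1D mesh and its unit element load vector;
- the helpers `element_contribution`, `assemble_block_locking`, `colour_elements` and `assemble_colouring`.

The greedy colouring is also an assumed choice, because the abstract does not say which colouring algorithm is used. CPython threads do not execute Python bytecode in parallel, so the sketch shows the synchronization structure rather than any speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def element_contribution(e):
    """Hypothetical 1D mesh: element e connects nodes e and e + 1
    and adds 1.0 to each of them (an illustrative element load vector)."""
    return [(e, 1.0), (e + 1, 1.0)]


def assemble_block_locking(num_elements, num_threads=4, block_size=2):
    """Assemble the global vector with one lock per block of consecutive entries."""
    num_nodes = num_elements + 1
    f = [0.0] * num_nodes
    num_blocks = (num_nodes + block_size - 1) // block_size
    locks = [threading.Lock() for _ in range(num_blocks)]

    def work(e):
        for node, value in element_contribution(e):
            # Only writers to the same block of entries are serialized.
            with locks[node // block_size]:
                f[node] += value

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(work, range(num_elements)))
    return f


def colour_elements(num_elements):
    """Greedy colouring: elements that share a node never get the same colour."""
    groups = {}        # colour -> list of elements
    node_colours = {}  # node -> set of colours already touching that node
    for e in range(num_elements):
        used = set()
        for node, _ in element_contribution(e):
            used |= node_colours.get(node, set())
        colour = 0
        while colour in used:
            colour += 1
        groups.setdefault(colour, []).append(e)
        for node, _ in element_contribution(e):
            node_colours.setdefault(node, set()).add(colour)
    return [groups[c] for c in sorted(groups)]


def assemble_colouring(num_elements, num_threads=4):
    """Lock-free assembly: the elements of one colour are processed concurrently."""
    num_nodes = num_elements + 1
    f = [0.0] * num_nodes

    def work(e):
        for node, value in element_contribution(e):
            # No lock needed: no other element of this colour touches `node`.
            f[node] += value

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for group in colour_elements(num_elements):
            # list() waits for the whole colour group, acting as a barrier.
            list(pool.map(work, group))
    return f
```

For a mesh of five elements, both functions assemble `[1.0, 2.0, 2.0, 2.0, 2.0, 1.0]`, and the colouring splits the elements into the even and odd groups `[[0, 2, 4], [1, 3]]`. The two approaches pay for safety in different places:

- Block locking acquires a lock on every write.
- Colouring pays once up front to compute the groups, then needs only a barrier between colour groups.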


2001 ◽  
Vol 11 (01) ◽  
pp. 65-76
Author(s):  
Luciana Arantes ◽  
Denis Poitrenaud ◽  
Pierre Sens ◽  
Bertil Folliot

In this article, we introduce a new logical clock, the barrier-lock clock, whose design is based on the lazy release consistency (LRC) memory model supported by several distributed shared memory (DSM) systems. In LRC, the propagation of shared memory updates performed by the processes of a parallel application is induced by lock and barrier operations, so our logical clock is modeled on those operations. Each barrier-lock timestamp encodes the synchronization operation with which it is associated. Unlike traditional logical vector clocks, its size does not depend on the number of processes in the system; instead, it is proportional to the number of locks. The barrier-lock time characterizes the causality of shared memory updates performed by the processes of a parallel application running on an LRC-based DSM system. A formal proof and experimental tests have confirmed this property.
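The abstract does not give the actual encoding of a barrier-lock timestamp. The sketch below is therefore a hypothetical lock-indexed clock, not the authors' clock. It illustrates two points the abstract makes:

- Timestamps advance only on lock and barrier operations.
- Their size grows with the number of locks rather than the number of processes.

The names `LockClock` and `LockClockSystem`, and the centralized `barrier` simulation, are assumptions. The sketch only guarantees one direction: a release that causally precedes a point in a process's execution has a timestamp component-wise less than or equal to that process's clock. It makes no claim about the converse, which the paper proves for its own clock.

```python
from dataclasses import dataclass


@dataclass
class LockClock:
    """Hypothetical timestamp: a barrier epoch plus one release counter per lock."""
    barrier_epoch: int
    releases: list  # releases[k] = releases of lock k this process has observed

    @classmethod
    def zero(cls, num_locks):
        return cls(0, [0] * num_locks)

    def copy(self):
        return LockClock(self.barrier_epoch, list(self.releases))

    def merge(self, other):
        """Component-wise maximum: absorb everything `other` has observed."""
        self.barrier_epoch = max(self.barrier_epoch, other.barrier_epoch)
        self.releases = [max(a, b) for a, b in zip(self.releases, other.releases)]

    def leq(self, other):
        """True if every component is <= the matching component of `other`."""
        return self.barrier_epoch <= other.barrier_epoch and all(
            a <= b for a, b in zip(self.releases, other.releases))


class LockClockSystem:
    """One clock per process; lock k stores the clock of its most recent release."""

    def __init__(self, num_processes, num_locks):
        self.procs = [LockClock.zero(num_locks) for _ in range(num_processes)]
        self.lock_state = [LockClock.zero(num_locks) for _ in range(num_locks)]

    def acquire(self, p, k):
        # The acquirer inherits everything the previous releaser of lock k had seen.
        self.procs[p].merge(self.lock_state[k])

    def release(self, p, k):
        # Assumes p holds lock k; count this release and publish p's clock via the lock.
        self.procs[p].releases[k] += 1
        self.lock_state[k] = self.procs[p].copy()
        return self.lock_state[k].copy()  # timestamp of this release

    def barrier(self):
        # Every process leaves the barrier having seen what any process had seen.
        joined = LockClock.zero(len(self.lock_state))
        for clock in self.procs:
            joined.merge(clock)
        joined.barrier_epoch += 1
        self.procs = [joined.copy() for _ in self.procs]
```

Consider three processes and two locks, where P0 releases lock 0 and P1 then acquires it:

- P1's clock dominates P0's release timestamp.
- P2's clock does not dominate it until after the next barrier.
- Every timestamp still holds exactly two counters, whatever the number of processes.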

