A GPGPU Programming Framework based on a Shared-Memory Model

Parallel and Distributed Computing and Networks ◽

10.2316/journal.211.2013.3.211-1053 ◽

2013 ◽

Vol 3 (3) ◽

Cited By ~ 5

Author(s):

Kazuhiko Ohno ◽

Dai Michiura ◽

Masaki Matsumoto ◽

Takahiro Sasaki ◽

Toshio Kondo

Keyword(s):

Shared Memory ◽

Memory Model ◽

Programming Framework

A GPGPU Programming Framework based on a Shared-Memory Model

Parallel and Distributed Computing and Systems ◽

10.2316/p.2012.757-097 ◽

2012 ◽

Author(s):

Kazuhiko Ohno ◽

Dai Michiura ◽

Masaki Matsumoto ◽

Takahiro Sasaki ◽

Toshio Kondo

Keyword(s):

Shared Memory ◽

Memory Model ◽

Programming Framework

PPT-SASMM: Scalable Analytical Shared Memory Model

The International Symposium on Memory Systems ◽

10.1145/3422575.3422806 ◽

2020 ◽

Author(s):

Atanu Barai ◽

Gopinath Chennupati ◽

Nandakishore Santhi ◽

Abdel-Hameed Badawy ◽

Yehia Arafa ◽

...

Keyword(s):

Shared Memory ◽

Memory Model

Streams: Emerging from a Shared Memory Model

OpenMP in a New Era of Parallelism - Lecture Notes in Computer Science ◽

10.1007/978-3-540-79561-2_12 ◽

2008 ◽

pp. 134-145 ◽

Cited By ~ 1

Author(s):

Benedict R. Gaster

Keyword(s):

Shared Memory ◽

Memory Model

An integer programming framework for optimizing shared memory use on GPUs

2010 International Conference on High Performance Computing ◽

10.1109/hipc.2010.5713187 ◽

2010 ◽

Cited By ~ 4

Author(s):

Wenjing Ma ◽

Gagan Agrawal

Keyword(s):

Integer Programming ◽

Shared Memory ◽

Programming Framework

PARALLELIZATION OF ASSEMBLY OPERATION IN FINITE ELEMENT METHOD

Acta Polytechnica ◽

10.14311/ap.2020.60.0025 ◽

2020 ◽

Vol 60 (1) ◽

pp. 25-37

Author(s):

Michal Bošanský ◽

Bořek Patzák

Keyword(s):

Finite Element ◽

Shared Memory ◽

Object Oriented ◽

Memory Model ◽

Finite Element Code ◽

Large Problem ◽

Assembly Operation ◽

Finite Element Software ◽

Multiple Threads ◽

Assembly Operations

Efficient codes can take advantage of multiple threads and/or processing nodes to partition work that can be processed concurrently. This can reduce the overall run-time or make the solution of a large problem feasible. This paper evaluates different parallelization strategies for the assembly of global vectors and matrices, which is one of the critical operations in any finite element software. Different assembly strategies for systems with a shared memory model are proposed and evaluated, using Open Multi-Processing (OpenMP), Portable Operating System Interface (POSIX) threads, and C++11 threads. The considered strategies are based on simple synchronization directives, various block locking algorithms and, finally, on lock-free processing based on a colouring algorithm. The strategies were implemented in OOFEM [1], a free finite element code with an object-oriented architecture.
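
The colouring idea mentioned in this abstract can be illustrated with a minimal C++11 sketch. It is not taken from the paper or from OOFEM: the 1D mesh, the made-up element contribution, and the function names `add_element`, `assemble_serial`, and `assemble_coloured` are all assumptions chosen for illustration. Elements are split into colours so that no two elements of the same colour touch the same node, which lets threads add into the global vector without locks; joining the threads acts as a barrier between colours.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical 1D mesh: element e couples nodes e and e+1 and adds a
// made-up contribution of (e + 1) to each of its two nodes.
void add_element(std::vector<double>& global, std::size_t e) {
    const double contribution = static_cast<double>(e + 1);
    global[e] += contribution;
    global[e + 1] += contribution;
}

// Reference result: plain serial assembly over all elements.
std::vector<double> assemble_serial(std::size_t num_elements) {
    std::vector<double> global(num_elements + 1, 0.0);
    for (std::size_t e = 0; e < num_elements; ++e) {
        add_element(global, e);
    }
    return global;
}

// Colouring-based assembly. Even elements get colour 0, odd elements get
// colour 1, so two elements of the same colour never share a node and
// threads can update the global vector without any locking.
std::vector<double> assemble_coloured(std::size_t num_elements,
                                      unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<double> global(num_elements + 1, 0.0);
    for (std::size_t colour = 0; colour < 2; ++colour) {
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < num_threads; ++t) {
            workers.emplace_back(
                [&global, num_elements, num_threads, colour, t] {
                    // Elements of this colour are dealt round-robin to threads.
                    for (std::size_t e = colour + 2 * t; e < num_elements;
                         e += 2 * num_threads) {
                        add_element(global, e);
                    }
                });
        }
        // Joining all workers acts as a barrier before the next colour.
        for (auto& w : workers) {
            w.join();
        }
    }
    return global;
}
```

Calling `assemble_coloured(9, 4)` should produce the same vector as `assemble_serial(9)`. A real finite element mesh would need a graph-colouring step to find the colour classes; the even/odd split here only works because of the assumed 1D connectivity.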

Experimental evaluation of QSM, a simple shared-memory model

Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IPPS/SPDP 1999 ◽

10.1109/ipps.1999.760447 ◽

2003 ◽

Cited By ~ 4

Author(s):

B. Grayson ◽

M. Dahlin ◽

V. Ramachandran

Keyword(s):

Shared Memory ◽

Experimental Evaluation ◽

Memory Model

Transformations of Mutual Exclusion Algorithms from the Cache-Coherent Model to the Distributed Shared Memory Model

25th IEEE International Conference on Distributed Computing Systems (ICDCS'05) ◽

10.1109/icdcs.2005.83 ◽

2005 ◽

Author(s):

Hyonho Lee

Keyword(s):

Shared Memory ◽

Mutual Exclusion ◽

Distributed Shared Memory ◽

Memory Model

THE BARRIER-LOCK CLOCK: A SCALABLE SYNCHRONIZATION-ORIENTED LOGICAL CLOCK

Parallel Processing Letters ◽

10.1142/s0129626401000439 ◽

2001 ◽

Vol 11 (01) ◽

pp. 65-76

Author(s):

LUCIANA ARANTES ◽

DENIS POITRENAUD ◽

PIERRE SENS ◽

BERTIL FOLLIOT

Keyword(s):

Shared Memory ◽

Distributed Shared Memory ◽

Experimental Tests ◽

Formal Proof ◽

Memory Model ◽

Parallel Application ◽

System A ◽

Release Consistency ◽

Vector Clocks

In this article, we introduce a new logical clock, the barrier-lock clock, whose design is based on the lazy release consistency (LRC) memory model supported by several distributed shared memory (DSM) systems. Since, in LRC, the propagation of shared memory updates performed by the processes of a parallel application is induced by lock and barrier operations, our logical clock has been modeled on those operations. Each barrier-lock timestamp encodes the synchronization operation with which it is associated. Unlike that of traditional logical vector clocks, its size does not depend on the number of processes in the system; instead, it is proportional to the number of locks. The barrier-lock time characterizes the causality of shared memory updates performed by processes of a parallel application running on an LRC-based DSM system. A formal proof and experimental tests have confirmed this property.
