QSM: A general purpose shared-memory model for parallel computation

It is shown that any program written for the idealized shared-memory model of parallel computation can be simulated on a hypercube architecture with only constant factor inefficiency, provided that the original program has a certain amount of parallel slackness.

Download Full-text

Can shared-memory model serve as a bridging model for parallel computation?

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures - SPAA '97 ◽

10.1145/258492.258500 ◽

1997 ◽

Cited By ~ 28

Author(s):

Phillip B. Gibbons ◽

Yossi Matias ◽

Vijaya Ramachandran

Keyword(s):

Parallel Computation ◽

Shared Memory ◽

Memory Model ◽

Bridging Model

Download Full-text

Can a Shared-Memory Model Serve as a Bridging Model for Parallel Computation?

Theory of Computing Systems ◽

10.1007/s002240000121 ◽

1999 ◽

Vol 32 (3) ◽

pp. 327-359 ◽

Cited By ~ 11

Author(s):

P. B. Gibbons ◽

Y. Matias

Keyword(s):

Parallel Computation ◽

Shared Memory ◽

Memory Model ◽

Bridging Model

Download Full-text

Experiences of the GPU Thread Configuration and Shared Memory

European Journal of Engineering Research and Science ◽

10.24018/ejers.2018.3.7.788 ◽

2018 ◽

Vol 3 (7) ◽

pp. 12

Author(s):

DaeHwan Kim

Keyword(s):

Parallel Computation ◽

Shared Memory ◽

Memory Performance ◽

General Purpose ◽

Global Memory ◽

Experimental Result ◽

Gpu Programming ◽

Memory Accesses

Nowadays, GPU processors are widely used for general-purpose parallel computation applications. In the GPU programming, thread and block configuration is one of the most important decisions to be made, which increases parallelism and hides instruction latency. However, in many cases, it is often difficult to have sufficient parallelism to hide all the latencies, where the high latencies are often caused by the global memory accesses. In order to reduce the number of those accesses, the shared memory is instead used which is much faster than the global memory being located on a chip. The performance of the proposed thread configuration is evaluated on the GPU 960 processor. The experimental result shows that the best configuration improves the performance by 7.3 times compared to the worst configuration in the experiment. The experiences are also discussed for the shared memory performance when compared to that of the global memory.

Download Full-text

PPT-SASMM: Scalable Analytical Shared Memory Model

The International Symposium on Memory Systems ◽

10.1145/3422575.3422806 ◽

2020 ◽

Author(s):

Atanu Barai ◽

Gopinath Chennupati ◽

Nandakishore Santhi ◽

Abdel-Hameed Badawy ◽

Yehia Arafa ◽

...

Keyword(s):

Shared Memory ◽

Memory Model

Download Full-text

A GPGPU PROGRAMMING FRAMEWORK BASED ON A SHARED-MEMORY MODEL

Parallel and Distributed Computing and Networks ◽

10.2316/journal.211.2013.3.211-1053 ◽

2013 ◽

Vol 3 (3) ◽

Cited By ~ 5

Author(s):

Kazuhiko Ohno ◽

Dai Michiura ◽

Masaki Matsumoto ◽

Takahiro Sasaki ◽

Toshio Kondo

Keyword(s):

Shared Memory ◽

Memory Model ◽

Programming Framework

Download Full-text

Streams: Emerging from a Shared Memory Model

OpenMP in a New Era of Parallelism - Lecture Notes in Computer Science ◽

10.1007/978-3-540-79561-2_12 ◽

2008 ◽

pp. 134-145 ◽

Cited By ~ 1

Author(s):

Benedict R. Gaster

Keyword(s):

Shared Memory ◽

Memory Model

Download Full-text

A GPGPU Programming Framework based on a Shared-Memory Model

Parallel and Distributed Computing and Systems ◽

10.2316/p.2011.757-097 ◽

2011 ◽

Author(s):

Kazuhiko Ohno ◽

Dai Michiura ◽

Masaki Matsumoto ◽

Takahiro Sasaki ◽

Toshio Kondo

Keyword(s):

Shared Memory ◽

Memory Model ◽

Programming Framework

Download Full-text

PARALLELIZATION OF ASSEMBLY OPERATION IN FINITE ELEMENT METHOD

Acta Polytechnica ◽

10.14311/ap.2020.60.0025 ◽

2020 ◽

Vol 60 (1) ◽

pp. 25-37

Author(s):

Michal Bošanský ◽

Bořek Patzák

Keyword(s):

Finite Element ◽

Shared Memory ◽

Object Oriented ◽

Memory Model ◽

Finite Element Code ◽

Large Problem ◽

Assembly Operation ◽

Finite Element Software ◽

Multiple Threads ◽

Assembly Operations

The efficient codes can take an advantage of multiple threads and/or processing nodes to partition a work that can be processed concurrently. This can reduce the overall run-time or make the solution of a large problem feasible. This paper deals with evaluation of different parallelization strategies of assembly operations for global vectors and matrices, which are one of the critical operations in any finite element software. Different assembly strategies for systems with a shared memory model are proposed and evaluated, using Open Multi-Processing (OpenMP), Portable Operating System Interface (POSIX), and C++11 Threads. The considered strategies are based on simple synchronization directives, various block locking algorithms and, finally, on smart locking free processing based on a colouring algorithm. The different strategies were implemented in a free finite element code with object-oriented architecture OOFEM [1].

Download Full-text