Global Arrays: a portable "shared-memory" programming model for distributed memory computers

Author(s):  
J. Nieplocha ◽  
R.J. Harrison ◽  
R.J. Littlefield


1993 ◽
Vol 2 (4) ◽  
pp. 203-216
Author(s):  
Steve W. Otto

We discuss a set of parallel array classes, MetaMP, for distributed-memory architectures. The classes are implemented in C++ and interface to the PVM or Intel NX message-passing systems. An array class implements a partitioned array as a set of objects distributed across the nodes – a "collective" object. Object methods hide the low-level message-passing and implement meaningful array operations. These include transparent guard strips (or sharing regions) that support finite-difference stencils, reductions and multibroadcasts for support of pivoting and row operations, and interpolation/contraction operations for support of multigrid algorithms. The concept of guard strips is generalized to an object implementation of lightweight sharing mechanisms for finite element method (FEM) and particle-in-cell (PIC) algorithms. The sharing is accomplished through the mechanism of weak memory coherence and can be efficiently implemented. The price of the efficient implementation is memory usage and the need to explicitly specify the coherence operations. An intriguing feature of this programming model is that it maps well to both distributed-memory and shared-memory architectures.
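
A guard strip is what is now commonly called a halo or ghost region. The following minimal C sketch uses plain MPI rather than the MetaMP classes (whose interface is not given in the abstract) to show the mechanism such a class would hide: each rank owns a block of rows plus one guard row on each side that mirrors the neighbouring rank's boundary row, which is exactly the data a finite-difference stencil needs.

```c
/* Minimal halo ("guard strip") exchange sketch in C with MPI.
 * Not the MetaMP interface; array sizes and names are illustrative. */
#include <mpi.h>
#include <stdlib.h>

#define NCOLS 128
#define NROWS 64                 /* rows owned by each rank */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* local block: NROWS owned rows plus 2 guard rows (index 0 and NROWS+1) */
    double *a = calloc((NROWS + 2) * NCOLS, sizeof *a);
    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* refresh guard strips: my first owned row goes to the upper neighbour,
       the lower neighbour's first row arrives in my bottom guard row, etc. */
    MPI_Sendrecv(&a[1 * NCOLS],           NCOLS, MPI_DOUBLE, up,   0,
                 &a[(NROWS + 1) * NCOLS], NCOLS, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&a[NROWS * NCOLS],       NCOLS, MPI_DOUBLE, down, 1,
                 &a[0 * NCOLS],           NCOLS, MPI_DOUBLE, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* ...apply a finite-difference stencil to rows 1..NROWS here... */

    free(a);
    MPI_Finalize();
    return 0;
}
```

The point of a collective array class is that the two MPI_Sendrecv calls above disappear behind a single method on the distributed array object.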


1997 ◽  
Vol 6 (2) ◽  
pp. 201-214 ◽  
Author(s):  
Luis M. Silva ◽  
João Gabriel Silva ◽
Simon Chapple

Distributed shared memory (DSM) has been recognized as an alternative programming model to exploit the parallelism in distributed memory systems because it provides a higher level of abstraction than simple message passing. DSM combines the simple programming model of shared memory with the scalability of distributed memory machines. This article presents DSMPI, a parallel library that runs atop MPI and provides a DSM abstraction. It provides an easy-to-use programming interface, is fully portable, and supports heterogeneity. For the sake of flexibility, it supports different coherence protocols and consistency models. We present performance results taken on a network of workstations and on a Cray T3D which show that DSMPI can be competitive with MPI for some applications.
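
To illustrate the kind of abstraction a DSM layer offers, the sketch below emulates a single shared variable in C on top of MPI one-sided communication (MPI-2 windows). The names are ours and the mechanism is deliberately simplistic; DSMPI's own interface, coherence protocols, and consistency models are not reproduced here.

```c
/* A toy "shared variable" on top of MPI windows. Run with at least
 * two ranks: rank 1 writes, all ranks read the value hosted by rank 0.
 * This illustrates a DSM-style abstraction; it is not DSMPI code. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double shared = 0.0;                 /* rank 0 hosts the shared datum */
    MPI_Win win;
    MPI_Win_create(&shared, sizeof shared, sizeof shared,
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == 1) {                     /* remote write into rank 0 */
        double val = 3.14;
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
        MPI_Put(&val, 1, MPI_DOUBLE, 0, 0, 1, MPI_DOUBLE, win);
        MPI_Win_unlock(0, win);
    }
    MPI_Barrier(MPI_COMM_WORLD);         /* crude consistency point */

    double copy;                         /* every rank reads it back */
    MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
    MPI_Get(&copy, 1, MPI_DOUBLE, 0, 0, 1, MPI_DOUBLE, win);
    MPI_Win_unlock(0, win);
    printf("rank %d sees %g\n", rank, copy);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Different consistency models essentially change where the barrier-like synchronization point above has to appear.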


Author(s):  
Wesley Petersen ◽  
Peter Arbenz

The multiple instruction, multiple data (MIMD) programming model usually refers to computing on distributed memory machines with multiple independent processors. Although processors may run independent instruction streams, we are interested in streams that are always portions of a single program. Between processors which share a coherent memory view (within a node), data access is immediate, whereas between nodes data access is effected by message passing. In this book, we use MPI for such message passing. MPI has emerged as a more or less standard message passing system used on both shared memory and distributed memory machines. It is often the case that although the system consists of multiple independent instruction streams, the programming model is not too different from SIMD. Namely, the totality of a program is logically split into many independent tasks, each processed by a group (see Appendix D) of processes—but the overall program is effectively single threaded at the beginning, and likewise at the end. The MIMD model, however, is extremely flexible in that no one process is always master and the other processes slaves. A communicator group of processes performs certain tasks, usually with an arbitrary master/slave relationship. One process may be assigned to be master (or root) and coordinates the tasks of the others in the group. We emphasize that the assignment of which process is root is arbitrary—any processor may be chosen. Frequently, however, this choice is one of convenience—a file server node, for example. Processors and memory are connected by a network; see, for example, Figure 5.1. In this form, each processor has its own local memory. This is not always the case: the Cray X1, and NEC SX-6 through SX-8 series machines, have common memory within nodes. Within a node, memory coherency is maintained within local caches. Between nodes, it remains the programmer's responsibility to assure a proper read–update relationship in the shared data. Data updated by one set of processes should not be clobbered by another set until the data are properly used.
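
A minimal C sketch of this pattern, assuming MPI (this is not code from the book): the program is effectively single threaded at the start and end, rank 0 is chosen as root purely by convention, and in between every process runs its own instruction stream.

```c
/* Communicator/root pattern: root distributes parameters, all ranks
 * compute independently, results are reduced back at the root. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int root = 0;                  /* arbitrary choice of coordinator */
    int work = 0;
    if (rank == root) work = 100;        /* root decides the task size */

    /* single-threaded start: root broadcasts the parameters */
    MPI_Bcast(&work, 1, MPI_INT, root, MPI_COMM_WORLD);

    /* every process computes its own share independently (MIMD) */
    int local = 0;
    for (int i = rank; i < work; i += size) local += i;

    /* single-threaded end: results are gathered back at the root */
    int total = 0;
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, root, MPI_COMM_WORLD);
    if (rank == root) printf("sum 0..%d = %d\n", work - 1, total);

    MPI_Finalize();
    return 0;
}
```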


2021 ◽  
Vol 26 ◽  
pp. 1-67
Author(s):  
Patrick Dinklage ◽  
Jonas Ellert ◽  
Johannes Fischer ◽  
Florian Kurpicz ◽  
Marvin Löbel

We present new sequential and parallel algorithms for wavelet tree construction based on a new bottom-up technique. This technique makes use of the structure of the wavelet trees—refining the characters represented in a node of the tree with increasing depth—in an opposite way, by first computing the leaves (most refined), and then propagating this information upwards to the root of the tree. We first describe new sequential algorithms, both in RAM and in external memory. Based on these results, we adapt these algorithms to parallel computers, where we address both shared memory and distributed memory settings. In practice, all our algorithms outperform previous ones in both time and memory efficiency, because we can compute all auxiliary information solely based on the information we obtained from computing the leaves. Most of our algorithms are also adapted to the wavelet matrix, a variant that is particularly suited for large alphabets.
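
To make the levelwise layout concrete, here is a small C sketch in the spirit of the bottom-up idea: the character histogram (the leaves) is computed first, and the node borders of every level are then derived from it, so each level's bit vector can be filled in a single scan of the text. Names and data layout are our own simplifications, not the authors' implementation; bits are stored one per byte for readability, and the alphabet size is assumed to be a power of two.

```c
/* Levelwise wavelet tree construction from leaf counts (sketch). */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* bits[l][i] is the i-th bit of level l, one bit per byte for clarity. */
static uint8_t **build_wt(const uint8_t *text, size_t n, int levels) {
    size_t sigma = (size_t)1 << levels;
    uint8_t **bits = malloc(levels * sizeof *bits);
    for (int l = 0; l < levels; l++) bits[l] = malloc(n);

    /* 1. leaves: histogram of all characters (the most refined level) */
    size_t *count = calloc(sigma, sizeof *count);
    for (size_t i = 0; i < n; i++) count[text[i]]++;

    /* the top level is just the most significant bit, in text order */
    for (size_t i = 0; i < n; i++)
        bits[0][i] = (text[i] >> (levels - 1)) & 1;

    /* 2. propagate counts upwards: level l has 2^l nodes, each covering a
     *    contiguous range of leaves; prefix sums of node sizes give borders. */
    size_t *border = malloc(sigma * sizeof *border);
    for (int l = 1; l < levels; l++) {
        size_t nodes = (size_t)1 << l;       /* nodes on level l */
        size_t span  = sigma >> l;           /* leaves per node  */
        size_t pos = 0;
        for (size_t v = 0; v < nodes; v++) {
            border[v] = pos;                 /* start of node v in level l */
            for (size_t c = 0; c < span; c++) pos += count[v * span + c];
        }
        /* 3. one scan fills level l: a character's node is its l-bit prefix,
         *    the stored bit is the next bit below that prefix. */
        for (size_t i = 0; i < n; i++) {
            size_t v = text[i] >> (levels - l);
            bits[l][border[v]++] = (text[i] >> (levels - l - 1)) & 1;
        }
    }
    free(count); free(border);
    return bits;
}

int main(void) {
    const uint8_t text[] = "abracadabra";
    uint8_t **wt = build_wt(text, strlen((const char *)text), 8);
    (void)wt;   /* rank/select query structures are out of scope here */
    return 0;
}
```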


2020 ◽  
Vol 30 (3) ◽  
pp. 28-33 ◽  
Author(s):  
S. A. Pryadko ◽  
A. Yu. Troshin ◽  
V. D. Kozlov ◽  
A. E. Ivanov

The article describes various options for speeding up calculations on computer systems; these options are closely tied to the architecture of such systems. The objective of this paper is to provide the information needed to choose a suitable approach for speeding up the solution of a computational problem. The main features of the following models are described: programming for systems with shared memory, programming for systems with distributed memory, and programming for graphics accelerators (video cards). The basic concept, principles, advantages, and disadvantages of each of the considered programming models are described. All of the standards for writing programs described in the article can be used on both Linux and Windows operating systems. The required libraries are available and compatible with the C/C++ programming language. The article concludes with recommendations on the use of a particular technology, depending on the type of task to be solved.
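
As a concrete instance of the first of these models (the abstract does not name particular standards, so using OpenMP here is our assumption), a minimal shared-memory example in C, buildable with any OpenMP-capable compiler on Linux or Windows:

```c
/* Shared-memory parallelism with OpenMP: loop iterations are divided
 * among threads that share one address space; the reduction clause
 * avoids a data race on the accumulator. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1.0);

    printf("harmonic(%d) = %f, max threads = %d\n",
           n, sum, omp_get_max_threads());
    return 0;
}
```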

