Workload Decomposition Strategies for Shared Memory Parallel Systems with OpenMP

A crucial issue in parallel programming (for both distributed and shared memory architectures) is work decomposition. Work decomposition can be accomplished without large programming effort by using high-level parallel programming languages such as OpenMP. Nevertheless, particular care must still be paid to achieving performance goals. In this paper we introduce and compare two decomposition strategies, in the framework of shared memory systems, as applied to a particle-in-cell case study application. Several implementations of these strategies, based on the OpenMP language, are discussed with regard to time efficiency, memory occupancy, and program restructuring effort.
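
The abstract does not spell out the two strategies. As a hedged illustration only, the C sketch below shows one common way to decompose the charge-deposition loop of a PIC code across OpenMP threads: particles are split among threads, and each thread accumulates into a private copy of the grid that is merged at the end. This avoids write conflicts on shared grid points, but memory occupancy grows with the number of threads, which is the kind of trade-off the abstract weighs. The function deposit_charge, the one-dimensional periodic grid and the linear weighting are assumptions for illustration, not the paper's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 1D PIC charge deposition with linear (cloud-in-cell)
   weighting on a periodic grid of `ncells` nodes. Every particle position
   must satisfy 0 <= x[i] < ncells. Work is decomposed by particles: each
   thread processes a subset of the particle loop. */
void deposit_charge(const double *x, size_t np, double *rho, int ncells)
{
    assert(ncells > 0);
    memset(rho, 0, (size_t)ncells * sizeof *rho);

    #pragma omp parallel
    {
        /* Private grid copy: threads never write the same memory during
           deposition, at the cost of one extra grid per thread. */
        double *local = calloc((size_t)ncells, sizeof *local);
        if (local == NULL)
            abort();

        #pragma omp for
        for (long i = 0; i < (long)np; i++) {
            int j = (int)x[i];          /* grid node to the left */
            double w = x[i] - j;        /* fractional distance from it */
            local[j] += 1.0 - w;
            local[(j + 1) % ncells] += w;
        }

        /* Merge the private grids into the shared one, one thread at a time. */
        #pragma omp critical
        for (int j = 0; j < ncells; j++)
            rho[j] += local[j];

        free(local);
    }
}
```

For example, particles at 0.5, 1.25 and 3.75 on a four-node grid give rho = {1.25, 1.25, 0.25, 0.25}. Compiled without OpenMP support, the pragmas are ignored and the same function runs serially.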

Research Directions in High-Level Parallel Programming Languages

10.1007/3-540-55160-3 ◽

1992 ◽

Cited By ~ 5

Keyword(s):

Programming Languages ◽

Parallel Programming ◽

Parallel Programming Languages ◽

High Level

CSim 2

ACM Transactions on Programming Languages and Systems ◽

10.1145/3436808 ◽

2021 ◽

Vol 43 (1) ◽

pp. 1-46

Author(s):

David Sanan ◽

Yongwang Zhao ◽

Shang-Wei Lin ◽

Liu Yang

Keyword(s):

Programming Languages ◽

Concurrent Systems ◽

Theorem Prover ◽

Compositional Techniques ◽

Machine Code ◽

Top Down ◽

Hol Theorem Prover ◽

High Level ◽

High Degree

To make the verification of large and complex concurrent systems feasible and scalable, compositional techniques are necessary even at the highest abstraction layers. When focusing on the lowest software abstraction layers, such as the implementation or the machine code, the high level of detail of those layers makes the direct verification of properties very difficult and expensive. It is therefore essential to use techniques that simplify verification at these layers. One technique for tackling this challenge is top-down verification, in which properties verified on the top layers (representing abstract specifications of a system) are propagated, by means of simulation, down to the lowest layers (which are an implementation of the top layers). Simulation of concurrent systems involves a greater level of complexity, so compositional techniques for checking simulation between layers are also desirable when seeking both feasibility and scalability of refinement verification. In this article, we present CSim 2, a (compositional) rely-guarantee-based framework for the top-down verification of complex concurrent systems in the Isabelle/HOL theorem prover. CSim 2 uses CSimpl, a language with a high degree of expressiveness designed for the specification of concurrent programs. Thanks to its expressiveness, CSimpl is able to model many of the features found in real-world programming languages, such as exceptions, assertions, and procedures. CSim 2 provides a framework for the verification of rely-guarantee properties to reason compositionally about CSimpl specifications. Focusing on top-down verification, CSim 2 provides a simulation-based framework for the preservation of CSimpl rely-guarantee properties from specifications to implementations.
By using the simulation framework, properties proven on the top layers (abstract specifications) are compositionally propagated down to the lowest layers (source or machine code) in each concurrent component of the system. Finally, we show the usability of CSim 2 by running a case study over two CSimpl specifications of an Arinc-653 communication service. In this case study, we prove a complex property on a specification, and we use CSim 2 to preserve the property on lower abstraction layers.

SAC: From High-Level Programming with Arrays to Efficient Parallel Execution

Parallel Processing Letters ◽

10.1142/s0129626403001379 ◽

2003 ◽

Vol 13 (03) ◽

pp. 401-412 ◽

Cited By ~ 15

Author(s):

Clemens Grelck ◽

Sven-Bodo Scholz

Keyword(s):

Shared Memory ◽

Parallel Execution ◽

3 Dimensional ◽

Shape Invariant ◽

Fixed Set ◽

High Level ◽

Successive Over Relaxation ◽

Compilation Techniques ◽

Processing Language

SAC is a purely functional array processing language designed with numerical applications in mind. It supports generic, high-level program specifications in the style of APL. However, rather than providing a fixed set of built-in array operations, SAC provides means to specify such operations in the language itself in a way that still allows their application to arrays of any rank and size. This paper illustrates the major steps in compiling generic, rank- and shape-invariant SAC specifications into efficiently executable multithreaded code for parallel execution on shared memory multiprocessors. The effectiveness of the compilation techniques is demonstrated by means of a small case study on the PDE1 benchmark, which implements 3-dimensional red/black successive over-relaxation. Comparisons with HPF and ZPL show that, despite the genericity of the code, SAC achieves highly competitive runtime performance.
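
Red/black SOR parallelizes well because each grid point of one colour depends only on points of the other colour. The sketch below is plain C with OpenMP, not SAC, and is not the PDE1 benchmark itself; it assumes the 3D Laplace equation on an n x n x n grid whose outer layer holds fixed boundary values. The names rb_sor_sweep and idx are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Index into an n*n*n array stored in row-major order. */
static size_t idx(int n, int i, int j, int k)
{
    return ((size_t)i * n + j) * n + k;
}

/* One red/black SOR iteration for the 3D Laplace equation (7-point stencil).
   The outer layer of u holds fixed boundary values. Points with (i+j+k) even
   form colour 0 ("red"), odd form colour 1 ("black"). Each point reads only
   neighbors of the other colour, so all points of one colour can be updated
   in parallel. */
void rb_sor_sweep(double *u, int n, double omega)
{
    assert(n >= 3);
    for (int colour = 0; colour < 2; colour++) {
        #pragma omp parallel for collapse(2)
        for (int i = 1; i < n - 1; i++)
            for (int j = 1; j < n - 1; j++) {
                /* first interior k with (i + j + k) % 2 == colour */
                int k0 = 1 + ((i + j + 1 + colour) % 2);
                for (int k = k0; k < n - 1; k += 2) {
                    double nb = u[idx(n, i - 1, j, k)] + u[idx(n, i + 1, j, k)]
                              + u[idx(n, i, j - 1, k)] + u[idx(n, i, j + 1, k)]
                              + u[idx(n, i, j, k - 1)] + u[idx(n, i, j, k + 1)];
                    size_t c = idx(n, i, j, k);
                    u[c] = (1.0 - omega) * u[c] + omega * nb / 6.0;
                }
            }
    }
}
```

Repeated calls with 0 < omega < 2 converge for this symmetric positive definite system; for example, if every boundary value is 1.0, the interior values approach 1.0.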

Parallel Array Classes and Lightweight Sharing Mechanisms

Scientific Programming ◽

10.1155/1993/393409 ◽

1993 ◽

Vol 2 (4) ◽

pp. 203-216

Author(s):

Steve W. Otto

Keyword(s):

Finite Element Method ◽

Shared Memory ◽

Message Passing ◽

Distributed Memory ◽

Programming Model ◽

Memory Usage ◽

Particle In Cell ◽

Parallel Array ◽

Memory Architectures ◽

Shared Memory Architectures

We discuss a set of parallel array classes, MetaMP, for distributed-memory architectures. The classes are implemented in C++ and interface to the PVM or Intel NX message-passing systems. An array class implements a partitioned array as a set of objects distributed across the nodes – a "collective" object. Object methods hide the low-level message-passing and implement meaningful array operations. These include transparent guard strips (or sharing regions) that support finite-difference stencils, reductions and multibroadcasts for support of pivoting and row operations, and interpolation/contraction operations for support of multigrid algorithms. The concept of guard strips is generalized to an object implementation of lightweight sharing mechanisms for finite element method (FEM) and particle-in-cell (PIC) algorithms. The sharing is accomplished through the mechanism of weak memory coherence and can be efficiently implemented. The price of the efficient implementation is memory usage and the need to explicitly specify the coherence operations. An intriguing feature of this programming model is that it maps well to both distributed-memory and shared-memory architectures.
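
Guard strips can be pictured with a small shared-memory sketch in C with OpenMP (not MetaMP's C++ classes or its message-passing back end). A one-dimensional array is split into blocks, each stored with one guard cell on either side; an explicit coherence call copies neighboring edge values into the guards, after which each block applies a three-point stencil using only local data. The extra guard cells are the memory price the abstract mentions. The type parray and the functions parray_new, block, update_guards and stencil are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical partitioned 1D array: `nblocks` blocks of `blen` interior
   cells, each stored with one guard cell on either side. */
typedef struct { int nblocks, blen; double *data; } parray;

parray parray_new(int nblocks, int blen)
{
    assert(nblocks > 0 && blen > 0);
    parray a = { nblocks, blen,
                 calloc((size_t)nblocks * (size_t)(blen + 2), sizeof(double)) };
    if (a.data == NULL)
        abort();
    return a;
}

/* Pointer to block b; elements 0 and blen+1 are its guard cells. */
double *block(const parray *a, int b)
{
    return a->data + (size_t)b * (size_t)(a->blen + 2);
}

/* Explicit coherence operation: copy each neighbor's edge value into the
   guard cells. The outermost guards hold a fixed zero boundary value. */
void update_guards(parray *a)
{
    #pragma omp parallel for
    for (int b = 0; b < a->nblocks; b++) {
        double *d = block(a, b);
        d[0] = b > 0 ? block(a, b - 1)[a->blen] : 0.0;
        d[a->blen + 1] = b < a->nblocks - 1 ? block(a, b + 1)[1] : 0.0;
    }
}

/* Three-point averaging stencil. Each block reads only its own cells and
   guards, so blocks are independent once the guards are up to date. */
void stencil(const parray *in, parray *out)
{
    #pragma omp parallel for
    for (int b = 0; b < in->nblocks; b++) {
        const double *s = block(in, b);
        double *d = block(out, b);
        for (int i = 1; i <= in->blen; i++)
            d[i] = (s[i - 1] + s[i] + s[i + 1]) / 3.0;
    }
}
```

With two blocks holding the values 1 to 6 and zero outer boundaries, the stencil yields 1, 2, 3, 4, 5 and 11/3, the same result as on the unpartitioned array.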

A Performance-Prediction Model for PIC Applications on Clusters of Symmetric MultiProcessors: Validation with Hierarchical HPF+OpenMP Implementation

Scientific Programming ◽

10.1155/2003/691573 ◽

2003 ◽

Vol 11 (2) ◽

pp. 159-176

Author(s):

Sergio Briguglio ◽

Beniamino Di Martino ◽

Gregorio Vlad

Keyword(s):

Prediction Model ◽

Performance Prediction ◽

High Performance ◽

Symmetric Multiprocessors ◽

Particle In Cell ◽

Decomposition Strategies ◽

Experimental Values ◽

High Level ◽

Parallelization Efficiency ◽

A Performance

A performance-prediction model is presented, which describes different hierarchical workload decomposition strategies for particle in cell (PIC) codes on Clusters of Symmetric MultiProcessors. The devised workload decomposition is hierarchically structured: a higher-level decomposition among the computational nodes, and a lower-level one among the processors of each computational node. Several decomposition strategies are evaluated by means of the prediction model, with respect to the memory occupancy, the parallelization efficiency and the required programming effort. Such strategies have been implemented by integrating the high-level languages High Performance Fortran (at the inter-node stage) and OpenMP (at the intra-node one). The details of these implementations are presented, and the experimental values of parallelization efficiency are compared with the predicted results.
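
The index arithmetic of a two-level decomposition can be sketched in C: the particle loop is first split among computational nodes, and each node's share is then split among its threads. This is only an illustration under assumptions; in the paper the inter-node stage is expressed in HPF and the intra-node stage in OpenMP, whereas here both levels are plain C. The names range, block_split and particle_range are hypothetical; in an OpenMP code the thread index could come from omp_get_thread_num().

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [lo, hi). */
typedef struct { size_t lo, hi; } range;

/* Split r into `parts` contiguous chunks whose sizes differ by at most one
   and return chunk p (the first n % parts chunks get one extra element). */
range block_split(range r, int parts, int p)
{
    assert(parts > 0 && p >= 0 && p < parts);
    size_t n = r.hi - r.lo;
    size_t q = n / (size_t)parts, rem = n % (size_t)parts;
    size_t up = (size_t)p;
    range c;
    c.lo = r.lo + up * q + (up < rem ? up : rem);
    c.hi = c.lo + q + (up < rem ? 1 : 0);
    return c;
}

/* Two-level decomposition of the particle loop: first among computational
   nodes, then among the threads of one node. */
range particle_range(size_t np, int nnodes, int node, int nthreads, int thread)
{
    range all = { 0, np };
    return block_split(block_split(all, nnodes, node), nthreads, thread);
}
```

For 10 particles on 3 nodes the node shares are [0,4), [4,7) and [7,10); with 2 threads per node, thread 1 on node 1 gets [6,7).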

Efficient Parallel Programming with Linda

Scientific Programming ◽

10.1155/1992/829092 ◽

1992 ◽

Vol 1 (2) ◽

pp. 177-183 ◽

Cited By ~ 1

Author(s):

Ashish Deshpande ◽

Martin Schultz

Keyword(s):

Parallel Programming ◽

Programming Language ◽

Associative Memory ◽

Shallow Water Equations ◽

Distributed Memory ◽

Parallel Machines ◽

Yale University ◽

Coordination Language ◽

High Level ◽

Memory Architectures

Linda is a coordination language invented by David Gelernter at Yale University which, when combined with a computation language (like C), yields a high-level parallel programming language for MIMD machines. Linda is based on a virtual shared associative memory containing objects called tuples. Skeptics have long claimed that Linda programs could not be efficient on distributed memory architectures. In this paper, we address this claim by discussing C-Linda's performance in solving a particular scientific computing problem, the shallow water equations, and make comparisons with alternatives available on various shared and distributed memory parallel machines.

Execution Model of Three Parallel Languages: OpenMP, UPC and CAF

Scientific Programming ◽

10.1155/2005/914081 ◽

2005 ◽

Vol 13 (2) ◽

pp. 127-135 ◽

Cited By ~ 5

Author(s):

Ami Marowka

Keyword(s):

Programming Languages ◽

Shared Memory ◽

Qualitative Evaluation ◽

Unified Parallel C ◽

Parallel Languages ◽

Execution Model ◽

Base Language ◽

The One ◽

Memory Architectures ◽

Shared Memory Architectures

The aim of this paper is to present a qualitative evaluation of three state-of-the-art parallel languages: OpenMP, Unified Parallel C (UPC) and Co-Array Fortran (CAF). OpenMP and UPC are explicit parallel programming languages based on the ANSI standard. CAF is an implicit programming language. On the one hand, OpenMP is designed for shared-memory architectures and extends the base language with compiler directives that annotate the original source code. On the other hand, UPC and CAF are designed for distributed-shared memory architectures and extend the base language with new parallel constructs. We deconstruct each language into its basic components, show examples, make a detailed analysis, compare them, and finally draw some conclusions.
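
The directive-based style of OpenMP can be seen in a small C function: the loop is ordinary serial C, and a single pragma asks for its iterations to be shared among threads, with the per-thread partial sums combined by the reduction clause. The function name dot is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product written as a plain serial loop, annotated with one OpenMP
   directive. With OpenMP enabled, iterations are divided among threads and
   each thread's partial sum is added into `sum` by the reduction clause. */
double dot(const double *a, const double *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

A compiler that does not recognize the pragma ignores it (GCC, for instance, honors it only with -fopenmp), so the same source also builds as a serial program. UPC and CAF, by contrast, add new constructs to the base language itself.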

A Skeleton Based Programming Paradigm for Mobile Multi-Agents on Distributed Systems and Its Realization within the MAGDA Mobile Agents Platform

Mobile Information Systems ◽

10.1155/2008/745406 ◽

2008 ◽

Vol 4 (2) ◽

pp. 131-146 ◽

Cited By ~ 1

Author(s):

R. Aversa ◽

B. Di Martino ◽

N. Mazzocca ◽

S. Venticinque

Keyword(s):

Parallel Programming ◽

Mobile Agent ◽

Heterogeneous Systems ◽

Distributed Applications ◽

Programming Environment ◽

Agent Based ◽

Workload Balancing ◽

Programming Effort ◽

High Level ◽

Multi Agents

Parallel programming effort can be reduced by using high-level constructs such as algorithmic skeletons. Within the MAGDA toolset, which supports programming and execution of mobile-agent-based distributed applications, we provide a skeleton-based parallel programming environment based on specialization of Algorithmic Skeleton Java interfaces and classes. Their implementation includes mobile agent features for execution on heterogeneous systems, such as clusters of WSs and PCs, and supports reliability and dynamic workload balancing. The user can thus develop a parallel, mobile-agent-based application by simply specializing a given set of classes and methods and using a set of added functionalities.
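
The skeleton idea, where the user supplies only the sequential piece and the skeleton hides task distribution, can be sketched in C with OpenMP. This is not MAGDA's Java API and has none of its mobile-agent or reliability features; farm, worker_fn and square are hypothetical names. The schedule(dynamic) clause hands tasks to threads on demand, a simple shared-memory analogue of dynamic workload balancing.

```c
#include <assert.h>
#include <stddef.h>

/* Worker signature: computes one output item from one input item. */
typedef double (*worker_fn)(double);

/* Hypothetical "farm" skeleton: applies `work` independently to each task.
   The caller supplies only the worker; distributing tasks to threads is
   hidden inside the skeleton. schedule(dynamic) gives each idle thread the
   next task, which helps when task costs vary. */
void farm(worker_fn work, const double *in, double *out, size_t ntasks)
{
    assert(work != NULL);
    #pragma omp parallel for schedule(dynamic)
    for (size_t i = 0; i < ntasks; i++)
        out[i] = work(in[i]);
}

/* Example worker used to specialize the skeleton. */
static double square(double v)
{
    return v * v;
}
```

For example, farm(square, in, out, 3) with in = {1, 2, 3} fills out with {1, 4, 9}.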
