scholarly journals Workload Decomposition Strategies for Shared Memory Parallel Systems with OpenMP

2001 ◽  
Vol 9 (2-3) ◽  
pp. 109-122 ◽  
Author(s):  
Beniamino Di Martino ◽  
Sergio Briguglio ◽  
Gregorio Vlad ◽  
Giuliana Fogaccia

A crucial issue in parallel programming (both for distributed and shared memory architectures) is work decomposition. Work decomposition task can be accomplished without large programming effort with use of high-level parallel programming languages, such as OpenMP. Anyway particular care must still be payed on achieving performance goals. In this paper we introduce and compare two decomposition strategies, in the framework of shared memory systems, as applied to a case study particle in cell application. A number of different implementations of them, based on the OpenMP language, are discussed with regard to time efficiency, memory occupancy, and program restructuring effort.

2021 ◽  
Vol 43 (1) ◽  
pp. 1-46
Author(s):  
David Sanan ◽  
Yongwang Zhao ◽  
Shang-Wei Lin ◽  
Liu Yang

To make feasible and scalable the verification of large and complex concurrent systems, it is necessary the use of compositional techniques even at the highest abstraction layers. When focusing on the lowest software abstraction layers, such as the implementation or the machine code, the high level of detail of those layers makes the direct verification of properties very difficult and expensive. It is therefore essential to use techniques allowing to simplify the verification on these layers. One technique to tackle this challenge is top-down verification where by means of simulation properties verified on top layers (representing abstract specifications of a system) are propagated down to the lowest layers (that are an implementation of the top layers). There is no need to say that simulation of concurrent systems implies a greater level of complexity, and having compositional techniques to check simulation between layers is also desirable when seeking for both feasibility and scalability of the refinement verification. In this article, we present CSim 2 a (compositional) rely-guarantee-based framework for the top-down verification of complex concurrent systems in the Isabelle/HOL theorem prover. CSim 2 uses CSimpl, a language with a high degree of expressiveness designed for the specification of concurrent programs. Thanks to its expressibility, CSimpl is able to model many of the features found in real world programming languages like exceptions, assertions, and procedures. CSim 2 provides a framework for the verification of rely-guarantee properties to compositionally reason on CSimpl specifications. Focusing on top-down verification, CSim 2 provides a simulation-based framework for the preservation of CSimpl rely-guarantee properties from specifications to implementations. By using the simulation framework, properties proven on the top layers (abstract specifications) are compositionally propagated down to the lowest layers (source or machine code) in each concurrent component of the system. Finally, we show the usability of CSim 2 by running a case study over two CSimpl specifications of an Arinc-653 communication service. In this case study, we prove a complex property on a specification, and we use CSim 2 to preserve the property on lower abstraction layers.


2003 ◽  
Vol 13 (03) ◽  
pp. 401-412 ◽  
Author(s):  
CLEMENS GRELCK ◽  
SVEN-BODO SCHOLZ

SAC is a purely functional array processing language designed with numerical applications in mind. It supports generic, high-level program specifications in the style of APL. However, rather than providing a fixed set of built-in array operations, SAC provides means to specify such operations in the language itself in a way that still allows their application to arrays of any rank and size. This paper illustrates the major steps in compiling generic, rank- and shape-invariant SAC specifications into efficiently executable multithreaded code for parallel execution on shared memory multiprocessors. The effectiveness of the compilation techniques is demonstrated by means of a small case study on the PDE1 benchmark, which implements 3-dimensional red/black successive over-relaxation. Comparisons with HPF and ZPL show that despite the genericity of code, SAC achieves highly competitive runtime performance characteristics.


1993 ◽  
Vol 2 (4) ◽  
pp. 203-216
Author(s):  
Steve W. Otto

We discuss a set of parallel array classes, MetaMP, for distributed-memory architectures. The classes are implemented in C++ and interface to the PVM or Intel NX message-passing systems. An array class implements a partitioned array as a set of objects distributed across the nodes – a "collective" object. Object methods hide the low-level message-passing and implement meaningful array operations. These include transparent guard strips (or sharing regions) that support finite-difference stencils, reductions and multibroadcasts for support of pivoting and row operations, and interpolation/contraction operations for support of multigrid algorithms. The concept of guard strips is generalized to an object implementation of lightweight sharing mechanisms for finite element method (FEM) and particle-in-cell (PIC) algorithms. The sharing is accomplished through the mechanism of weak memory coherence and can be efficiently implemented. The price of the efficient implementation is memory usage and the need to explicitly specify the coherence operations. An intriguing feature of this programming model is that it maps well to both distributed-memory and shared-memory architectures.


2003 ◽  
Vol 11 (2) ◽  
pp. 159-176
Author(s):  
Sergio Briguglio ◽  
Beniamino Di Martino ◽  
Gregorio Vlad

A performance-prediction model is presented, which describes different hierarchical workload decomposition strategies for particle in cell (PIC) codes on Clusters of Symmetric MultiProcessors. The devised workload decomposition is hierarchically structured: a higher-level decomposition among the computational nodes, and a lower-level one among the processors of each computational node. Several decomposition strategies are evaluated by means of the prediction model, with respect to the memory occupancy, the parallelization efficiency and the required programming effort. Such strategies have been implemented by integrating the high-level languages High Performance Fortran (at the inter-node stage) and OpenMP (at the intra-node one). The details of these implementations are presented, and the experimental values of parallelization efficiency are compared with the predicted results.


1992 ◽  
Vol 1 (2) ◽  
pp. 177-183 ◽  
Author(s):  
Ashish Deshpande ◽  
Martin Schultz

Linda is a coordination language inverted by David Gelernter at Yale University, which when combined with a computation language (like C) yields a high-level parallel programming language for MIMD machines. Linda is based on a virtual shared associative memory containing objects called tuples. Skeptics have long claimed that Linda programs could not be efficient on distributed memory architectures. In this paper, we address this claim by discussing C-Linda's performance in solving a particular scientific computing problem, the shallow water equations, and make comparisons with alternatives available on various shared and distributed memory parallel machines.


2005 ◽  
Vol 13 (2) ◽  
pp. 127-135 ◽  
Author(s):  
Ami Marowka

The aim of this paper is to present a qualitative evaluation of three state-of-the-art parallel languages: OpenMP, Unified Parallel C (UPC) and Co-Array Fortran (CAF). OpenMP and UPC are explicit parallel programming languages based on the ANSI standard. CAF is an implicit programming language. On the one hand, OpenMP designs for shared-memory architectures and extends the base-language by using compiler directives that annotate the original source-code. On the other hand, UPC and CAF designs for distribute-shared memory architectures and extends the base-language by new parallel constructs. We deconstruct each language into its basic components, show examples, make a detailed analysis, compare them, and finally draw some conclusions.


2008 ◽  
Vol 4 (2) ◽  
pp. 131-146 ◽  
Author(s):  
R. Aversa ◽  
B. Di Martino ◽  
N. Mazzocca ◽  
S. Venticinque

Parallel programming effort can be reduced by using high level constructs such as algorithmic skeletons. Within the MAGDA toolset, supporting programming and execution of mobile agent based distributed applications, we provide a skeleton-based parallel programming environment, based on specialization of Algorithmic Skeleton Java interfaces and classes. Their implementation include mobile agent features for execution on heterogeneous systems, such as clusters of WSs and PCs, and support reliability and dynamic workload balancing. The user can thus develop a parallel, mobile agent based application by simply specialising a given set of classes and methods and using a set of added functionalities.


Sign in / Sign up

Export Citation Format

Share Document