Transactional Access to Shared Memory in StarSs, a Task Based Programming Model

OpenMP is attracting wide-spread interest because of its easy-to-use parallel programming model for shared memory multiprocessors. We have implemented a "cluster-enabled" OpenMP compiler for a page-based software distributed shared memory system, SCASH, which works on a cluster of PCs. It allows OpenMP programs to run transparently in a distributed memory environment. The compiler transforms OpenMP programs into parallel programs using SCASH so that shared global variables are allocated at run time in the shared address space of SCASH. A set of directives is added to specify data mapping and loop scheduling method which schedules iterations onto threads associated with the data mapping. Our experimental results show that the data mapping may greatly impact on the performance of OpenMP programs in the software distributed shared memory system. The performance of some NAS parallel benchmark programs in OpenMP is improved by using our extended directives.

Download Full-text

Exploring Deterministic Shared Memory Programming Model

2012 13th International Conference on Parallel and Distributed Computing, Applications and Technologies ◽

10.1109/pdcat.2012.74 ◽

2012 ◽

Cited By ~ 1

Author(s):

Yu Zhang ◽

Wei Hu

Keyword(s):

Shared Memory ◽

Programming Model

Download Full-text

Parallelization of Pairwise Alignment and Neighbor-Joining Algorithm in Progressive Multiple Sequence Alignment

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v9.i1.pp234-242 ◽

2018 ◽

Vol 9 (1) ◽

pp. 234

Author(s):

Agung Widyo Utomo

Keyword(s):

Shared Memory ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Programming Model ◽

Heuristic Method ◽

Pairwise Alignment ◽

Neighbor Joining ◽

Progressive Alignment ◽

Multiple Sequence ◽

Progressive Multiple Sequence Alignment

Progressive multiple sequence alignment ClustalW is a widely used heuristic method for computing multiple sequence alignment (MSA). It has three stages: distance matrix computation using pairwise alignment, guide tree reconstruction using neighbor-joining and progressive alignment. To accelerate computing for large data, the progressive MSA algorithm needs to be parallelized. This research aims to identify, decompose and implement the pairwise alignment and neighbor-joining in progressive MSA using message passing, shared memory and hybrid programming model in the computer cluster. The experimental results obtained shared memory programming model as the best scenario implementation with speed up up to 12 times.

Download Full-text

Programming Paradigms in High Performance Computing

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Research and Applications in Global Supercomputing ◽

10.4018/978-1-4666-7461-5.ch013 ◽

2015 ◽

pp. 303-330

Author(s):

Venkat N Gudivada ◽

Jagadeesh Nandigam ◽

Jordan Paris

Keyword(s):

Shared Memory ◽

High Performance ◽

Programming Model ◽

Commodity Prices ◽

Massive Datasets ◽

Programming Techniques ◽

Performance Programming ◽

High Performance Computers ◽

Programming Paradigms ◽

Performance Computing

Availability of multiprocessor and multi-core chips and GPU accelerators at commodity prices is making personal supercomputers a reality. High performance programming models help apply this computational power to analyze and visualize massive datasets. Problems which required multi-million dollar supercomputers until recently can now be solved using personal supercomputers. However, specialized programming techniques are needed to harness the power of supercomputers. This chapter provides an overview of approaches to programming High Performance Computers (HPC). The programming paradigms illustrated include OpenMP, OpenACC, CUDA, OpenCL, shared-memory based concurrent programming model of Haskell, MPI, MapReduce, and message-based distributed computing model of Erlang. The goal is to provide enough detail on various paradigms to help the reader understand the fundamental differences and similarities among the paradigms. Example programs are chosen to illustrate the salient concepts that define these paradigms. The chapter concludes by providing research directions and future trends in programming high performance computers.

Download Full-text

Parallel Array Classes and Lightweight Sharing Mechanisms

Scientific Programming ◽

10.1155/1993/393409 ◽

1993 ◽

Vol 2 (4) ◽

pp. 203-216

Author(s):

Steve W. Otto

Keyword(s):

Finite Element Method ◽

Shared Memory ◽

Message Passing ◽

Distributed Memory ◽

Programming Model ◽

Memory Usage ◽

Particle In Cell ◽

Parallel Array ◽

Memory Architectures ◽

Shared Memory Architectures

We discuss a set of parallel array classes, MetaMP, for distributed-memory architectures. The classes are implemented in C++ and interface to the PVM or Intel NX message-passing systems. An array class implements a partitioned array as a set of objects distributed across the nodes – a "collective" object. Object methods hide the low-level message-passing and implement meaningful array operations. These include transparent guard strips (or sharing regions) that support finite-difference stencils, reductions and multibroadcasts for support of pivoting and row operations, and interpolation/contraction operations for support of multigrid algorithms. The concept of guard strips is generalized to an object implementation of lightweight sharing mechanisms for finite element method (FEM) and particle-in-cell (PIC) algorithms. The sharing is accomplished through the mechanism of weak memory coherence and can be efficiently implemented. The price of the efficient implementation is memory usage and the need to explicitly specify the coherence operations. An intriguing feature of this programming model is that it maps well to both distributed-memory and shared-memory architectures.

Download Full-text

Global Arrays: a portable "shared-memory" programming model for distributed memory computers

Proceedings of Supercomputing '94 ◽

10.1109/superc.1994.344297 ◽

2002 ◽

Cited By ~ 29

Author(s):

J. Nieplocha ◽

R.J. Harrison ◽

R.J. Littlefield

Keyword(s):

Shared Memory ◽

Distributed Memory ◽

Programming Model ◽

Global Arrays

Download Full-text

The Semi-Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

Scientific Programming ◽

10.1155/2001/327048 ◽

2001 ◽

Vol 9 (2-3) ◽

pp. 163-173 ◽

Cited By ~ 4

Author(s):

C.S. Ierotheou ◽

S.P. Johnson ◽

P.F. Leggett ◽

M. Cross ◽

E.W. Evans ◽

...

Keyword(s):

Shared Memory ◽

Message Passing ◽

Programming Model ◽

Parallel Computers ◽

Parallel Programs ◽

Scientific Application ◽

Real World Application ◽

Interprocedural Analysis ◽

Computer Aided ◽

Application Codes

The shared-memory programming model can be an effective way to achieve parallelism on shared memory parallel computers. Historically however, the lack of a programming standard using directives and the limited scalability have affected its take-up. Recent advances in hardware and software technologies have resulted in improvements to both the performance of parallel programs with compiler directives and the issue of portability with the introduction of OpenMP. In this study, the Computer Aided Parallelisation Toolkit has been extended to automatically generate OpenMP-based parallel programs with nominal user assistance. We categorize the different loop types and show how efficient directives can be placed using the toolkit's in-depth interprocedural analysis. Examples are taken from the NAS parallel benchmarks and a number of real-world application codes. This demonstrates the great potential of using the toolkit to quickly parallelise serial programs as well as the good performance achievable on up to 300 processors for hybrid message passing-directive parallelisations.

Download Full-text

Implementation and Performance of DSMPI

Scientific Programming ◽

10.1155/1997/452521 ◽

1997 ◽

Vol 6 (2) ◽

pp. 201-214 ◽

Cited By ~ 2

Author(s):

Luis M. Silva ◽

JoÃo Gabriel Silva ◽

Simon Chapple

Keyword(s):

Shared Memory ◽

Message Passing ◽

Distributed Memory ◽

Programming Model ◽

Distributed Shared Memory ◽

Memory Systems ◽

Distributed Memory Machines ◽

Coherence Protocols ◽

And Performance ◽

Performance Results

Distributed shared memory has been recognized as an alternative programming model to exploit the parallelism in distributed memory systems because it provides a higher level of abstraction than simple message passing. DSM combines the simple programming model of shared memory with the scalability of distributed memory machines. This article presents DSMPI, a parallel library that runs atop of MPI and provides a DSM abstraction. It provides an easy-to-use programming interface, is fully, portable, and supports heterogeneity. For the sake of flexibility, it supports different coherence protocols and models of consistency. We present some performance results taken in a network of workstations and in a Cray T3D which show that DSMPI can be competitive with MPI for some applications.

Download Full-text

Effective On-Chip Communication for Message Passing Programs on Multi-Core Processors

Electronics ◽

10.3390/electronics10212681 ◽

2021 ◽

Vol 10 (21) ◽

pp. 2681

Author(s):

Joonmoo Huh ◽

Deokwoo Lee

Keyword(s):

Parallel Programming ◽

Shared Memory ◽

Message Passing ◽

Programming Model ◽

Multicore Architectures ◽

Worst Case ◽

High Performing ◽

Parallel Programming Model ◽

On Chip ◽

Sharing Patterns

Shared memory is the most popular parallel programming model for multi-core processors, while message passing is generally used for large distributed machines. However, as the number of cores on a chip increases, the relative merits of shared memory versus message passing change, and we argue that message passing becomes a viable, high performing, and parallel programming model. To demonstrate this hypothesis, we compare a shared memory architecture with a new message passing architecture on a suite of applications tuned for each system independently. Perhaps surprisingly, the fundamental behaviors of the applications studied in this work, when optimized for both models, are very similar to each other, and both could execute efficiently on multicore architectures despite many implementations being different from each other. Furthermore, if hardware is tuned to support message passing by supporting bulk message transfer and the elimination of unnecessary coherence overheads, and if effective support is available for global operations, then some applications would perform much better on a message passing architecture. Leveraging our insights, we design a message passing architecture that supports both memory-to-memory and cache-to-cache messaging in hardware. With the new architecture, message passing is able to outperform its shared memory counterparts on many of the applications due to the unique advantages of the message passing hardware as compared to cache coherence. In the best case, message passing achieves up to a 34% increase in speed over its shared memory counterpart, and it achieves an average 10% increase in speed. In the worst case, message passing is slowed down in two applications—CG (conjugate gradient) and FT (Fourier transform)—because it could not perform well on the unique data sharing patterns as its counterpart of shared memory. Overall, our analysis demonstrates the importance of considering message passing as a high performing and hardware-supported programming model on future multicore architectures.

Download Full-text