MIMD, Multiple Instruction, Multiple Data

Author(s):  
Wesley Petersen ◽  
Peter Arbenz

The Multiple instruction, multiple data (MIMD) programming model usually refers to computing on distributed memory machines with multiple independent processors. Although processors may run independent instruction streams, we are interested in streams that are always portions of a single program. Between processors which share a coherent memory view (within a node), data access is immediate, whereas between nodes data access is effected by message passing. In this book, we use MPI for such message passing. MPI has emerged as a more/less standard message passing system used on both shared memory and distributed memory machines. It is often the case that although the system consists of multiple independent instruction streams, the programming model is not too different from SIMD. Namely, the totality of a program is logically split into many independent tasks each processed by a group (see Appendix D) of processes—but the overall program is effectively single threaded at the beginning, and likewise at the end. The MIMD model, however, is extremely flexible in that no one process is always master and the other processes slaves. A communicator group of processes performs certain tasks, usually with an arbitrary master/slave relationship. One process may be assigned to be master (or root) and coordinates the tasks of others in the group. We emphasize that the assignments of which is root is arbitrary—any processor may be chosen. Frequently, however, this choice is one of convenience—a file server node, for example. Processors and memory are connected by a network, for example, Figure 5.1. In this form, each processor has its own local memory. This is not always the case: The Cray X1, and NEC SX-6 through SX-8 series machines, have common memory within nodes. Within a node, memory coherency is maintained within local caches. Between nodes, it remains the programmer’s responsibility to assure a proper read–update relationship in the shared data. Data updated by one set of processes should not be clobbered by another set until the data are properly used.

1997 ◽  
Vol 6 (2) ◽  
pp. 201-214 ◽  
Author(s):  
Luis M. Silva ◽  
JoÃo Gabriel Silva ◽  
Simon Chapple

Distributed shared memory has been recognized as an alternative programming model to exploit the parallelism in distributed memory systems because it provides a higher level of abstraction than simple message passing. DSM combines the simple programming model of shared memory with the scalability of distributed memory machines. This article presents DSMPI, a parallel library that runs atop of MPI and provides a DSM abstraction. It provides an easy-to-use programming interface, is fully, portable, and supports heterogeneity. For the sake of flexibility, it supports different coherence protocols and models of consistency. We present some performance results taken in a network of workstations and in a Cray T3D which show that DSMPI can be competitive with MPI for some applications.


1993 ◽  
Vol 2 (4) ◽  
pp. 203-216
Author(s):  
Steve W. Otto

We discuss a set of parallel array classes, MetaMP, for distributed-memory architectures. The classes are implemented in C++ and interface to the PVM or Intel NX message-passing systems. An array class implements a partitioned array as a set of objects distributed across the nodes – a "collective" object. Object methods hide the low-level message-passing and implement meaningful array operations. These include transparent guard strips (or sharing regions) that support finite-difference stencils, reductions and multibroadcasts for support of pivoting and row operations, and interpolation/contraction operations for support of multigrid algorithms. The concept of guard strips is generalized to an object implementation of lightweight sharing mechanisms for finite element method (FEM) and particle-in-cell (PIC) algorithms. The sharing is accomplished through the mechanism of weak memory coherence and can be efficiently implemented. The price of the efficient implementation is memory usage and the need to explicitly specify the coherence operations. An intriguing feature of this programming model is that it maps well to both distributed-memory and shared-memory architectures.


2005 ◽  
Vol 18 (2) ◽  
pp. 219-224
Author(s):  
Emina Milovanovic ◽  
Natalija Stojanovic

Because many universities lack the funds to purchase expensive parallel computers, cost effective alternatives are needed to teach students about parallel processing. Free software is available to support the three major paradigms of parallel computing. Parallaxis is a sophisticated SIMD simulator which runs on a variety of platforms.jBACI shared memory simulator supports the MIMD model of computing with a common shared memory. PVM and MPI allow students to treat a network of workstations as a message passing MIMD multicomputer with distributed memory. Each of this software tools can be used in a variety of courses to give students experience with parallel algorithms.


2006 ◽  
Vol 17 (02) ◽  
pp. 251-270 ◽  
Author(s):  
THOMAS RAUBER ◽  
GUDULA RÜNGER

Multiprocessor task (M-task) programming is a suitable parallel programming model for coding application problems with an inherent modular structure. An M-task can be executed on a group of processors of arbitrary size, concurrently to other M-tasks of the same application program. The data of a multiprocessor task program usually include composed data structures, like vectors or arrays. For distributed memory machines or cluster platforms, those composed data structures are distributed within one or more processor groups. Thus, a concise parallel programming model for M-tasks requires a standardized distributed data format for composed data structures. Additionally, functions for data re-distribution with respect to different data distributions and different processor group layouts are needed to glue program parts together. In this paper, we present a data re-distribution library which extends the M-task programming with Tlib, a library providing operations to split processor groups and to map M-tasks to processor groups.


2001 ◽  
Vol 9 (2-3) ◽  
pp. 163-173 ◽  
Author(s):  
C.S. Ierotheou ◽  
S.P. Johnson ◽  
P.F. Leggett ◽  
M. Cross ◽  
E.W. Evans ◽  
...  

The shared-memory programming model can be an effective way to achieve parallelism on shared memory parallel computers. Historically however, the lack of a programming standard using directives and the limited scalability have affected its take-up. Recent advances in hardware and software technologies have resulted in improvements to both the performance of parallel programs with compiler directives and the issue of portability with the introduction of OpenMP. In this study, the Computer Aided Parallelisation Toolkit has been extended to automatically generate OpenMP-based parallel programs with nominal user assistance. We categorize the different loop types and show how efficient directives can be placed using the toolkit's in-depth interprocedural analysis. Examples are taken from the NAS parallel benchmarks and a number of real-world application codes. This demonstrates the great potential of using the toolkit to quickly parallelise serial programs as well as the good performance achievable on up to 300 processors for hybrid message passing-directive parallelisations.


1994 ◽  
Vol 3 (1) ◽  
pp. 13-32
Author(s):  
A. Averbuch ◽  
E. Gabber ◽  
S. Itzikowitz ◽  
B. Shoham

Parallel elliptic single/multigrid solutions around an aligned and nonaligned body are presented and implemented on two multi-user and single-user shared memory multiprocessors (Sequent Symmetry and MOS) and on a distributed memory multiprocessor (a Transputer network). Our parallel implementation uses the Virtual Machine for Muli-Processors (VMMP), a software package that provides a coherent set of services for explicitly parallel application programs running on diverse multiple instruction multiple data (MIMD) multiprocessors, both shared memory and message passing. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. Furthermore, it ensures high portability of application programs by implementing the same services on all target multiprocessors. The performance of our algorithm is investigated in detail. It is seen to fit well the above architectures when the number of processors is less than the maximal number of grid points along the axes. In general, the efficiency in the nonaligned case is higher than in the aligned case. Alignment overhead is observed to be up to 200% in the shared-memory case and up to 65% in the message-passing case. We have demonstrated that when using VMMP, the portability of the algorithms is straightforward and efficient.


Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2681
Author(s):  
Joonmoo Huh ◽  
Deokwoo Lee

Shared memory is the most popular parallel programming model for multi-core processors, while message passing is generally used for large distributed machines. However, as the number of cores on a chip increases, the relative merits of shared memory versus message passing change, and we argue that message passing becomes a viable, high performing, and parallel programming model. To demonstrate this hypothesis, we compare a shared memory architecture with a new message passing architecture on a suite of applications tuned for each system independently. Perhaps surprisingly, the fundamental behaviors of the applications studied in this work, when optimized for both models, are very similar to each other, and both could execute efficiently on multicore architectures despite many implementations being different from each other. Furthermore, if hardware is tuned to support message passing by supporting bulk message transfer and the elimination of unnecessary coherence overheads, and if effective support is available for global operations, then some applications would perform much better on a message passing architecture. Leveraging our insights, we design a message passing architecture that supports both memory-to-memory and cache-to-cache messaging in hardware. With the new architecture, message passing is able to outperform its shared memory counterparts on many of the applications due to the unique advantages of the message passing hardware as compared to cache coherence. In the best case, message passing achieves up to a 34% increase in speed over its shared memory counterpart, and it achieves an average 10% increase in speed. In the worst case, message passing is slowed down in two applications—CG (conjugate gradient) and FT (Fourier transform)—because it could not perform well on the unique data sharing patterns as its counterpart of shared memory. Overall, our analysis demonstrates the importance of considering message passing as a high performing and hardware-supported programming model on future multicore architectures.


Author(s):  
Wesley Petersen ◽  
Peter Arbenz

In the last few years, courses on parallel computation have been developed and offered in many institutions in the UK, Europe and US as a recognition of the growing significance of this topic in mathematics and computer science. There is a clear need for texts that meet the needs of students and lecturers and this book, based on the author's lecture at ETH Zurich is an ideal practical student guide to scientific computing on parallel computers working up from a hardware instruction level, to shared memory machines and finally to distributed memory machines. Aimed at advanced undergraduate and graduate students in applied mathematics, computer science and engineering, subjects covered include linear algebra, fast Fourier transform, and Monte-Carlo simulations, including examples in C and in some cases Fortran. This book is also ideal for practitioners and programmers.


Sign in / Sign up

Export Citation Format

Share Document