On the Parallel Elliptic Single/Multigrid Solutions about Aligned and Nonaligned Bodies Using the Virtual Machine for Multiprocessors

Parallel elliptic single/multigrid solutions around an aligned and nonaligned body are presented and implemented on two multi-user and single-user shared memory multiprocessors (Sequent Symmetry and MOS) and on a distributed memory multiprocessor (a Transputer network). Our parallel implementation uses the Virtual Machine for Muli-Processors (VMMP), a software package that provides a coherent set of services for explicitly parallel application programs running on diverse multiple instruction multiple data (MIMD) multiprocessors, both shared memory and message passing. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. Furthermore, it ensures high portability of application programs by implementing the same services on all target multiprocessors. The performance of our algorithm is investigated in detail. It is seen to fit well the above architectures when the number of processors is less than the maximal number of grid points along the axes. In general, the efficiency in the nonaligned case is higher than in the aligned case. Alignment overhead is observed to be up to 200% in the shared-memory case and up to 65% in the message-passing case. We have demonstrated that when using VMMP, the portability of the algorithms is straightforward and efficient.

Download Full-text

Hybrid Message-Passing and Shared-Memory Programming in a Molecular Dynamics Application On Multicore Clusters

The International Journal of High Performance Computing Applications ◽

10.1177/1094342009106188 ◽

2009 ◽

Vol 23 (3) ◽

pp. 196-211 ◽

Cited By ~ 4

Author(s):

Martin J. Chorley ◽

David W. Walker ◽

Martyn F. Guest

Keyword(s):

Molecular Dynamics ◽

Shared Memory ◽

Hybrid Model ◽

Message Passing ◽

Multicore Processors ◽

Parallel Application ◽

Gigabit Ethernet ◽

Code Performance ◽

Programming Techniques ◽

Multicore Clusters

Hybrid programming, whereby shared-memory and message-passing programming techniques are combined within a single parallel application, has often been discussed as a method for increasing code performance on clusters of symmetric multiprocessors (SMPs). This paper examines whether the hybrid model brings any performance benefits for clusters based on multicore processors. A molecular dynamics application has been parallelized using both MPI and hybrid MPI/OpenMP programming models. The performance of this application has been examined on two high-end multicore clusters using both Infiniband and Gigabit Ethernet interconnects. The hybrid model has been found to perform well on the higher-latency Gigabit Ethernet connection, but offers no performance benefit on low-latency Infiniband interconnects. The changes in performance are attributed to the differing communication profiles of the hybrid and MPI codes.

Download Full-text

MIMD, Multiple Instruction, Multiple Data

Introduction to Parallel Computing ◽

10.1093/oso/9780198515760.003.0010 ◽

2004 ◽

Author(s):

Wesley Petersen ◽

Peter Arbenz

Keyword(s):

Shared Memory ◽

Message Passing ◽

Distributed Memory ◽

Programming Model ◽

Data Access ◽

File Server ◽

Distributed Memory Machines ◽

Shared Data ◽

Multiple Data ◽

Common Memory

The Multiple instruction, multiple data (MIMD) programming model usually refers to computing on distributed memory machines with multiple independent processors. Although processors may run independent instruction streams, we are interested in streams that are always portions of a single program. Between processors which share a coherent memory view (within a node), data access is immediate, whereas between nodes data access is effected by message passing. In this book, we use MPI for such message passing. MPI has emerged as a more/less standard message passing system used on both shared memory and distributed memory machines. It is often the case that although the system consists of multiple independent instruction streams, the programming model is not too different from SIMD. Namely, the totality of a program is logically split into many independent tasks each processed by a group (see Appendix D) of processes—but the overall program is effectively single threaded at the beginning, and likewise at the end. The MIMD model, however, is extremely flexible in that no one process is always master and the other processes slaves. A communicator group of processes performs certain tasks, usually with an arbitrary master/slave relationship. One process may be assigned to be master (or root) and coordinates the tasks of others in the group. We emphasize that the assignments of which is root is arbitrary—any processor may be chosen. Frequently, however, this choice is one of convenience—a file server node, for example. Processors and memory are connected by a network, for example, Figure 5.1. In this form, each processor has its own local memory. This is not always the case: The Cray X1, and NEC SX-6 through SX-8 series machines, have common memory within nodes. Within a node, memory coherency is maintained within local caches. Between nodes, it remains the programmer’s responsibility to assure a proper read–update relationship in the shared data. Data updated by one set of processes should not be clobbered by another set until the data are properly used.

Download Full-text

On the construction of orthogonal curvilinear grids for regional oceanic modeling: algorithm and user guide

Journal of Oceanological Research ◽

10.29006/1564-2291.jor-2020.48(4).3 ◽

2020 ◽

Vol 48 (4) ◽

pp. 45-111

Author(s):

A. F. Shepetkin

Keyword(s):

Inverse Problem ◽

Numerical Solution ◽

Elliptic Problem ◽

Software Package ◽

Conformal Transformation ◽

Iterative Procedure ◽

Geometric Shape ◽

Curvilinear Grids ◽

Grid Points ◽

Grid Nodes

A new algorithm for constructing orthogonal curvilinear grids on a sphere for a fairly general geometric shape of the modeling region is implemented as a “compile-once - use forever” software package. It is based on the numerical solution of the inverse problem by an iterative procedure -- finding such distribution of grid points along its perimeter, so that the conformal transformation of the perimeter into a rectangle turns this distribution into uniform one. The iterative procedure itself turns out to be multilevel - i.e. an iterative loop built around another, internal iterative procedure. Thereafter, knowing this distribution, the grid nodes inside the region are obtained solving an elliptic problem. It is shown that it was possible to obtain the exact orthogonality of the perimeter at the corners of the grid, to achieve very small, previously unattainable level of orthogonality errors, as well as make it isotropic -- local distances between grid nodes about both directions are equal to each other.

Download Full-text

Some Aspects of Parallel Implementation of the Finite Element Method on Message Passing Architectures

10.21236/ada198731 ◽

1988 ◽

Cited By ~ 2

Author(s):

I. Babuska ◽

H. C. Elman

Keyword(s):

Finite Element Method ◽

Finite Element ◽

Message Passing ◽

Parallel Implementation ◽

The Finite Element Method ◽

Element Method

Download Full-text

Parallel implementation for HSLO(3)-FDTD with message passing interface on Distributed Memory Architecture

2006 International Conference on Computing & Informatics ◽

10.1109/icoci.2006.5276531 ◽

2006 ◽

Author(s):

Mohammad Khatim Hasan ◽

Mohamed Othman ◽

Jalil Md Desa ◽

Zulkifly Abbas ◽

Jumat Sulaiman

Keyword(s):

Message Passing ◽

Message Passing Interface ◽

Distributed Memory ◽

Parallel Implementation ◽

Memory Architecture ◽

Distributed Memory Architecture

Download Full-text

Parallel Nonnegative Matrix Factorization via Newton Iteration

Parallel Processing Letters ◽

10.1142/s0129626416500146 ◽

2016 ◽

Vol 26 (03) ◽

pp. 1650014 ◽

Cited By ~ 3

Author(s):

Markus Flatz ◽

Marián Vajteršic

Keyword(s):

Shared Memory ◽

Matrix Factorization ◽

Message Passing ◽

Nonnegative Matrix Factorization ◽

Nonnegative Matrix ◽

Newton Iteration ◽

Parallel Execution ◽

Kkt Conditions ◽

Nonnegative Matrices ◽

First Order

The goal of Nonnegative Matrix Factorization (NMF) is to represent a large nonnegative matrix in an approximate way as a product of two significantly smaller nonnegative matrices. This paper shows in detail how an NMF algorithm based on Newton iteration can be derived using the general Karush-Kuhn-Tucker (KKT) conditions for first-order optimality. This algorithm is suited for parallel execution on systems with shared memory and also with message passing. Both versions were implemented and tested, delivering satisfactory speedup results.

Download Full-text

An optimized message passing framework for parallel implementation of signal processing applications

2008 Design, Automation and Test in Europe ◽

10.1145/1403375.1403671 ◽

2008 ◽

Cited By ~ 1

Author(s):

Sankalita Saha ◽

Jason Schlessman ◽

Sebastian Puthenpurayil ◽

Shuvra S. Bhattacharyya ◽

Wayne Wolf

Keyword(s):

Signal Processing ◽

Message Passing ◽

Parallel Implementation

Download Full-text

An Efficient Parallel Algorithm for Extreme Eigenvalues of Sparse Nonsymmetric Matrices

The International Journal of Supercomputing Applications ◽

10.1177/109434209200600106 ◽

1992 ◽

Vol 6 (1) ◽

pp. 98-111 ◽

Cited By ~ 2

Author(s):

S. K. Kim ◽

A. T. Chrortopoulos

Keyword(s):

Shared Memory ◽

Message Passing ◽

Sparse Matrices ◽

Data Locality ◽

Main Memory ◽

Global Memory ◽

Global Communication ◽

Step Method ◽

Arnoldi Algorithm ◽

Large Sparse Matrices

Main memory accesses for shared-memory systems or global communications (synchronizations) in message passing systems decrease the computation speed. In this paper, the standard Arnoldi algorithm for approximating a small number of eigenvalues, with largest (or smallest) real parts for nonsymmetric large sparse matrices, is restructured so that only one synchronization point is required; that is, one global communication in a message passing distributed-memory machine or one global memory sweep in a shared-memory machine per each iteration is required. We also introduce an s-step Arnoldi method for finding a few eigenvalues of nonsymmetric large sparse matrices. This method generates reduction matrices that are similar to those generated by the standard method. One iteration of the s-step Arnoldi algorithm corresponds to s iterations of the standard Arnoldi algorithm. The s-step method has improved data locality, minimized global communication, and superior parallel properties. These algorithms are implemented on a 64-node NCUBE/7 Hypercube and a CRAY-2, and performance results are presented.

Download Full-text

Programming shared memory multiprocessors with deterministic message-passing concurrency

2008 Design, Automation and Test in Europe ◽

10.1145/1403375.1403735 ◽

2008 ◽

Cited By ~ 12

Author(s):

Stephen A. Edwards ◽

Nalini Vasudevan ◽

Olivier Tardieu

Keyword(s):

Shared Memory ◽

Message Passing ◽

Shared Memory Multiprocessors

Download Full-text

Teaching tools for parallel processing

Facta universitatis - series Electronics and Energetics ◽

10.2298/fuee0502219m ◽

2005 ◽

Vol 18 (2) ◽

pp. 219-224

Author(s):

Emina Milovanovic ◽

Natalija Stojanovic

Keyword(s):

Parallel Computing ◽

Parallel Processing ◽

Shared Memory ◽

Message Passing ◽

Distributed Memory ◽

Cost Effective ◽

Parallel Computers ◽

Free Software ◽

Teaching Tools ◽

Network Of Workstations

Because many universities lack the funds to purchase expensive parallel computers, cost effective alternatives are needed to teach students about parallel processing. Free software is available to support the three major paradigms of parallel computing. Parallaxis is a sophisticated SIMD simulator which runs on a variety of platforms.jBACI shared memory simulator supports the MIMD model of computing with a common shared memory. PVM and MPI allow students to treat a network of workstations as a message passing MIMD multicomputer with distributed memory. Each of this software tools can be used in a variety of courses to give students experience with parallel algorithms.

Download Full-text