Teaching tools for parallel processing

2005, Vol 18 (2), pp. 219-224
Author(s): Emina Milovanovic, Natalija Stojanovic

Because many universities lack the funds to purchase expensive parallel computers, cost-effective alternatives are needed to teach students about parallel processing. Free software is available to support the three major paradigms of parallel computing. Parallaxis is a sophisticated SIMD simulator which runs on a variety of platforms. The jBACI shared-memory simulator supports the MIMD model of computing with a common shared memory. PVM and MPI allow students to treat a network of workstations as a message-passing MIMD multicomputer with distributed memory. Each of these software tools can be used in a variety of courses to give students experience with parallel algorithms.
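As a small illustration of the message-passing MIMD model that PVM and MPI expose, the following minimal MPI program (written for this summary, not taken from the article) has every worker process send a result to rank 0, which receives and prints them:

```cpp
// Minimal MPI illustration of the message-passing MIMD model: each worker
// process computes a value and sends it to rank 0, which prints all of them.
// Run with, e.g., mpirun -np 4 ./a.out
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        for (int src = 1; src < size; ++src) {
            int value;
            MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            std::printf("rank 0 received %d from rank %d\n", value, src);
        }
    } else {
        int value = rank * rank;  // stand-in for a per-process result
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```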

2021, Vol 26, pp. 1-67
Author(s): Patrick Dinklage, Jonas Ellert, Johannes Fischer, Florian Kurpicz, Marvin Löbel

We present new sequential and parallel algorithms for wavelet tree construction based on a new bottom-up technique. This technique makes use of the structure of the wavelet trees—refining the characters represented in a node of the tree with increasing depth—in an opposite way, by first computing the leaves (most refined), and then propagating this information upwards to the root of the tree. We first describe new sequential algorithms, both in RAM and external memory. Based on these results, we adapt these algorithms to parallel computers, where we address both shared memory and distributed memory settings. In practice, all our algorithms outperform previous ones in both time and memory efficiency, because we can compute all auxiliary information solely based on the information we obtained from computing the leaves. Most of our algorithms are also adapted to the wavelet matrix, a variant that is particularly suited for large alphabets.
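To make the bottom-up idea concrete, here is a minimal sequential sketch for a byte alphabet, written for this summary rather than taken from the paper: the character histogram (the leaves) is computed first, and the node sizes and borders of every level are then derived by merging child counts upwards. The authors' actual algorithms, including the external-memory and parallel variants, are considerably more refined.

```cpp
// Sketch of bottom-up, level-wise wavelet tree construction for a byte
// alphabet: leaves (character counts) first, then node sizes propagated
// upwards level by level. Illustrative only.
#include <cstddef>
#include <string>
#include <vector>

struct LevelwiseWaveletTree {
    std::vector<std::vector<bool>> bits;  // one bit vector per level, root = level 0
};

LevelwiseWaveletTree build_bottom_up(const std::string& text) {
    constexpr int levels = 8;            // bits per character
    constexpr int sigma  = 1 << levels;  // alphabet size

    LevelwiseWaveletTree wt;
    wt.bits.assign(levels, std::vector<bool>(text.size()));

    // 1. Leaves first: histogram of all characters.
    std::vector<std::size_t> count(sigma, 0);
    for (unsigned char c : text) ++count[c];

    // 2. Propagate the counts upwards, level by level.
    for (int level = levels - 1; level >= 0; --level) {
        const int nodes = 1 << level;

        // Node sizes on this level: merge the sizes of the two children.
        std::vector<std::size_t> size(nodes);
        for (int v = 0; v < nodes; ++v)
            size[v] = count[2 * v] + count[2 * v + 1];

        // Borders: where each node's bits start in this level's bit vector.
        std::vector<std::size_t> border(nodes, 0);
        for (int v = 1; v < nodes; ++v)
            border[v] = border[v - 1] + size[v - 1];

        // One scan over the text fills the bits of this level.
        for (unsigned char c : text) {
            const int node = c >> (levels - level);            // node containing c
            const bool bit = (c >> (levels - 1 - level)) & 1;  // bit stored at this depth
            wt.bits[level][border[node]++] = bit;
        }
        count = std::move(size);  // node sizes become the "children" one level up
    }
    return wt;
}
```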


1992, Vol 45 (3), pp. 325
Author(s): Carl Winstead, Qiyan Sun, Paul G. Hipes, Marco A. P. Lima, Vincent McKoy

We review recent progress in the study of low-energy collisions between electrons and polyatomic molecules which has resulted from the application of distributed-memory parallel computing to this challenging problem. Recent studies of electronically elastic and inelastic scattering from several molecular systems, including ethene, propene, cyclopropane, and disilane, are presented. We also discuss the potential of ab initio methods combined with cost-effective parallel computation to provide critical data for the modeling of materials-processing plasmas.


Algorithms, 2021, Vol 14 (12), pp. 342
Author(s): Alessandro Varsi, Simon Maskell, Paul G. Spirakis

Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle Filters (PFs) in order to perform state estimation for non-linear non-Gaussian dynamic models. As the models become more complex and accurate, the run-time of PF applications grows accordingly. Parallel computing can help to address this. However, resampling (and, hence, PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize using textbook parallel computing techniques. A state-of-the-art redistribution takes O((log₂ N)²) computations on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in O(log₂ N) on Shared Memory (SM) architectures, such as GPUs or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves O(log₂ N) time complexity. We also present empirical results indicating that our novel approach outperforms the O((log₂ N)²) approach.
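For context, the sketch below shows a standard sequential systematic resampling step for a particle filter (assuming normalized weights); it illustrates what has to be redistributed, and is not the authors' O(log₂ N) distributed-memory algorithm:

```cpp
// Sequential systematic resampling: returns, for each output slot, the index
// of the parent particle. Copying particles into these slots is the
// redistribution step. Assumes the weights sum to 1.
#include <cstddef>
#include <random>
#include <vector>

std::vector<std::size_t> systematic_resample(const std::vector<double>& weights,
                                             std::mt19937& rng) {
    const std::size_t n = weights.size();
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const double u0 = unif(rng) / static_cast<double>(n);

    std::vector<std::size_t> parents(n);
    double cumulative = weights[0];
    std::size_t i = 0;
    for (std::size_t j = 0; j < n; ++j) {
        const double u = u0 + static_cast<double>(j) / static_cast<double>(n);
        while (u > cumulative && i + 1 < n) cumulative += weights[++i];
        parents[j] = i;  // particle i survives into slot j
    }
    return parents;
}
```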


1993, Vol 2 (4), pp. 203-216
Author(s): Steve W. Otto

We discuss a set of parallel array classes, MetaMP, for distributed-memory architectures. The classes are implemented in C++ and interface to the PVM or Intel NX message-passing systems. An array class implements a partitioned array as a set of objects distributed across the nodes – a "collective" object. Object methods hide the low-level message-passing and implement meaningful array operations. These include transparent guard strips (or sharing regions) that support finite-difference stencils, reductions and multibroadcasts for support of pivoting and row operations, and interpolation/contraction operations for support of multigrid algorithms. The concept of guard strips is generalized to an object implementation of lightweight sharing mechanisms for finite element method (FEM) and particle-in-cell (PIC) algorithms. The sharing is accomplished through the mechanism of weak memory coherence and can be efficiently implemented. The price of the efficient implementation is memory usage and the need to explicitly specify the coherence operations. An intriguing feature of this programming model is that it maps well to both distributed-memory and shared-memory architectures.
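As an illustration of the guard-strip (halo) idea behind such a distributed array class, here is a hypothetical C++/MPI sketch in which each rank owns a contiguous block of a 1-D array plus one guard cell per side, refreshed from its neighbours before a stencil sweep. The class name and methods are invented for this example and are not the MetaMP interface.

```cpp
// Hypothetical distributed 1-D array with one guard cell per side. Guard
// cells are refreshed by exchanging boundary values with the neighbouring
// ranks; boundary ranks exchange with MPI_PROC_NULL, i.e. a no-op.
#include <mpi.h>
#include <vector>

class DistArray1D {
public:
    DistArray1D(int local_n, MPI_Comm comm)
        : comm_(comm), data_(local_n + 2, 0.0) {  // +2 guard cells
        MPI_Comm_rank(comm_, &rank_);
        MPI_Comm_size(comm_, &size_);
    }

    // Owned cells are 0 .. local_n-1; guard cells sit at -1 and local_n.
    double& operator()(int i) { return data_[i + 1]; }

    // Refresh the guard strips from the left and right neighbours.
    void exchange_guards() {
        const int n = static_cast<int>(data_.size()) - 2;
        const int left  = (rank_ == 0)         ? MPI_PROC_NULL : rank_ - 1;
        const int right = (rank_ == size_ - 1) ? MPI_PROC_NULL : rank_ + 1;
        // Send first owned cell left; receive right neighbour's first cell.
        MPI_Sendrecv(&data_[1],     1, MPI_DOUBLE, left,  0,
                     &data_[n + 1], 1, MPI_DOUBLE, right, 0,
                     comm_, MPI_STATUS_IGNORE);
        // Send last owned cell right; receive left neighbour's last cell.
        MPI_Sendrecv(&data_[n], 1, MPI_DOUBLE, right, 1,
                     &data_[0], 1, MPI_DOUBLE, left,  1,
                     comm_, MPI_STATUS_IGNORE);
    }

private:
    MPI_Comm comm_;
    int rank_ = 0, size_ = 0;
    std::vector<double> data_;
};
```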


2001, Vol 9 (2-3), pp. 163-173
Author(s): C.S. Ierotheou, S.P. Johnson, P.F. Leggett, M. Cross, E.W. Evans, ...

The shared-memory programming model can be an effective way to achieve parallelism on shared-memory parallel computers. Historically, however, the lack of a directive-based programming standard and limited scalability have affected its take-up. Recent advances in hardware and software technologies have improved both the performance of directive-based parallel programs and, with the introduction of OpenMP, their portability. In this study, the Computer Aided Parallelisation Toolkit has been extended to automatically generate OpenMP-based parallel programs with nominal user assistance. We categorize the different loop types and show how efficient directives can be placed using the toolkit's in-depth interprocedural analysis. Examples are taken from the NAS parallel benchmarks and a number of real-world application codes. This demonstrates the great potential of using the toolkit to quickly parallelise serial programs, as well as the good performance achievable on up to 300 processors for hybrid message-passing/directive parallelisations.
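The following is a minimal example (not taken from the NAS benchmarks or the toolkit's output) of the kind of loop-level directive such a tool places once analysis has shown the iterations to be independent; the reduction clause handles the accumulated sum:

```cpp
// A loop whose iterations are independent apart from a sum reduction,
// parallelised with a single OpenMP directive. Build with -fopenmp.
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<double> a(n, 1.0), b(n, 2.0);
    double sum = 0.0;

    // Each iteration touches only a[i] and b[i]; sum is a reduction variable.
    #pragma omp parallel for reduction(+ : sum) schedule(static)
    for (int i = 0; i < n; ++i) {
        a[i] = 0.5 * (a[i] + b[i]);
        sum += a[i];
    }

    std::printf("sum = %f\n", sum);
    return 0;
}
```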


1997, Vol 6 (2), pp. 201-214
Author(s): Luis M. Silva, João Gabriel Silva, Simon Chapple

Distributed shared memory (DSM) has been recognized as an alternative programming model for exploiting the parallelism in distributed-memory systems because it provides a higher level of abstraction than simple message passing. DSM combines the simple programming model of shared memory with the scalability of distributed-memory machines. This article presents DSMPI, a parallel library that runs on top of MPI and provides a DSM abstraction. It offers an easy-to-use programming interface, is fully portable, and supports heterogeneity. For the sake of flexibility, it supports different coherence protocols and consistency models. We present performance results obtained on a network of workstations and on a Cray T3D which show that DSMPI can be competitive with MPI for some applications.
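DSMPI's own interface is not reproduced here; as a rough modern analogue of layering a shared-memory-style abstraction on top of MPI, the sketch below uses MPI-3 one-sided communication (RMA), which postdates DSMPI, to let every rank update a single "shared" counter hosted on rank 0:

```cpp
// A "shared" integer hosted on rank 0, updated by all ranks through MPI-3
// one-sided communication. Illustrates shared-memory-style access on top of
// MPI; this is not the DSMPI API.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int* base = nullptr;
    MPI_Win win;
    // Rank 0 hosts the shared integer; the other ranks expose no memory.
    MPI_Win_allocate(rank == 0 ? sizeof(int) : 0, sizeof(int),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);
    if (rank == 0) *base = 0;

    MPI_Win_fence(0, win);
    int contribution = 1;
    // Every rank atomically adds its contribution to the shared counter.
    MPI_Accumulate(&contribution, 1, MPI_INT, /*target=*/0, /*disp=*/0,
                   1, MPI_INT, MPI_SUM, win);
    MPI_Win_fence(0, win);

    if (rank == 0) std::printf("shared counter = %d\n", *base);  // == number of ranks
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```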


Author(s): Wesley Petersen, Peter Arbenz

In the last few years, courses on parallel computation have been developed and offered at many institutions in the UK, Europe, and the US in recognition of the growing significance of this topic in mathematics and computer science. There is a clear need for texts that meet the needs of students and lecturers, and this book, based on the authors' lectures at ETH Zurich, is an ideal practical student guide to scientific computing on parallel computers, working up from the hardware instruction level to shared-memory machines and finally to distributed-memory machines. Aimed at advanced undergraduate and graduate students in applied mathematics, computer science, and engineering, it covers subjects including linear algebra, the fast Fourier transform, and Monte Carlo simulation, with examples in C and, in some cases, Fortran. This book is also ideal for practitioners and programmers.


Author(s): Michael P. Allen, Dominic J. Tildesley

Parallelization is essential for the effective use of modern high-performance computing facilities. This chapter summarizes some of the basic approaches that are commonly used in molecular simulation programs. The underlying shared-memory and distributed-memory architectures are explained. The concept of program threads and their use in parallelizing nested loops on a shared memory machine is described. Parallel tempering using message passing on a distributed memory machine is discussed and illustrated with an example code. Domain decomposition, and the implementation of constraints on parallel computers, are also explained.
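As a simple illustration of the thread-based, shared-memory approach described above (this sketch is not the chapter's example code), the outer loop of a pairwise double loop can be split cyclically across program threads, with each thread accumulating into its own partial sum:

```cpp
// Program threads parallelising the outer loop of a pairwise double loop on a
// shared-memory machine. The cyclic split balances the triangular workload;
// each thread writes only its own partial-sum slot, so there are no conflicts.
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int n = 4000;
    std::vector<double> x(n);
    for (int i = 0; i < n; ++i) x[i] = 0.01 * i;

    unsigned nthreads = std::thread::hardware_concurrency();
    if (nthreads == 0) nthreads = 4;  // fall back if the count is unknown

    std::vector<double> partial(nthreads, 0.0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t) {
        pool.emplace_back([&, t] {
            double sum = 0.0;
            for (int i = t; i < n; i += static_cast<int>(nthreads))  // cyclic split
                for (int j = i + 1; j < n; ++j)
                    sum += 1.0 / (1.0 + (x[i] - x[j]) * (x[i] - x[j]));
            partial[t] = sum;  // one slot per thread
        });
    }
    for (auto& th : pool) th.join();

    double total = 0.0;
    for (double p : partial) total += p;
    std::printf("pairwise sum = %f\n", total);
    return 0;
}
```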

