Introduction to programming shared-memory and distributed-memory parallel computers

We present new sequential and parallel algorithms for wavelet tree construction based on a new bottom-up technique. This technique makes use of the structure of the wavelet trees—refining the characters represented in a node of the tree with increasing depth—in an opposite way, by first computing the leaves (most refined), and then propagating this information upwards to the root of the tree. We first describe new sequential algorithms, both in RAM and external memory. Based on these results, we adapt these algorithms to parallel computers, where we address both shared memory and distributed memory settings. In practice, all our algorithms outperform previous ones in both time and memory efficiency, because we can compute all auxiliary information solely based on the information we obtained from computing the leaves. Most of our algorithms are also adapted to the wavelet matrix , a variant that is particularly suited for large alphabets.

Download Full-text

Teaching tools for parallel processing

Facta universitatis - series Electronics and Energetics ◽

10.2298/fuee0502219m ◽

2005 ◽

Vol 18 (2) ◽

pp. 219-224

Author(s):

Emina Milovanovic ◽

Natalija Stojanovic

Keyword(s):

Parallel Computing ◽

Parallel Processing ◽

Shared Memory ◽

Message Passing ◽

Distributed Memory ◽

Cost Effective ◽

Parallel Computers ◽

Free Software ◽

Teaching Tools ◽

Network Of Workstations

Because many universities lack the funds to purchase expensive parallel computers, cost effective alternatives are needed to teach students about parallel processing. Free software is available to support the three major paradigms of parallel computing. Parallaxis is a sophisticated SIMD simulator which runs on a variety of platforms.jBACI shared memory simulator supports the MIMD model of computing with a common shared memory. PVM and MPI allow students to treat a network of workstations as a message passing MIMD multicomputer with distributed memory. Each of this software tools can be used in a variety of courses to give students experience with parallel algorithms.

Download Full-text

Improved SSOR and incomplete Cholesky solution of linear equations on shared memory and distributed memory parallel computers

Numerical Linear Algebra with Applications ◽

10.1002/nla.1680010306 ◽

1994 ◽

Vol 1 (3) ◽

pp. 287-311 ◽

Cited By ~ 4

Author(s):

Wayne Joubert ◽

Thomas Oppe

Keyword(s):

Shared Memory ◽

Distributed Memory ◽

Linear Equations ◽

Parallel Computers

Download Full-text

Introduction to Parallel Computing

10.1093/oso/9780198515760.001.0001 ◽

2004 ◽

Cited By ~ 3

Author(s):

Wesley Petersen ◽

Peter Arbenz

Keyword(s):

Fourier Transform ◽

Graduate Students ◽

Computer Science ◽

Shared Memory ◽

Distributed Memory ◽

Applied Mathematics ◽

Parallel Computers ◽

Science And Engineering ◽

Distributed Memory Machines ◽

The Uk

In the last few years, courses on parallel computation have been developed and offered in many institutions in the UK, Europe and US as a recognition of the growing significance of this topic in mathematics and computer science. There is a clear need for texts that meet the needs of students and lecturers and this book, based on the author's lecture at ETH Zurich is an ideal practical student guide to scientific computing on parallel computers working up from a hardware instruction level, to shared memory machines and finally to distributed memory machines. Aimed at advanced undergraduate and graduate students in applied mathematics, computer science and engineering, subjects covered include linear algebra, fast Fourier transform, and Monte-Carlo simulations, including examples in C and in some cases Fortran. This book is also ideal for practitioners and programmers.

Download Full-text

Introduction to programming shared-memory and distributed-memory parallel computers

XRDS Crossroads The ACM Magazine for Students ◽

10.1145/1144382.1144384 ◽

2005 ◽

Vol 12 (1) ◽

pp. 2-2 ◽

Cited By ~ 1

Author(s):

Cory Quammen

Keyword(s):

Shared Memory ◽

Distributed Memory ◽

Parallel Computers

Download Full-text

Minimal Aggregated Shared Memory Messaging on Distributed Memory Supercomputers

2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) ◽

10.1109/ipdps.2016.72 ◽

2016 ◽

Author(s):

Benjamin F. Jamroz ◽

John M. Dennis

Keyword(s):

Shared Memory ◽

Distributed Memory

Download Full-text

Studies of Electron-Molecule Collisions on Distributed-memory Parallel Computers

Australian Journal of Physics ◽

10.1071/ph920325 ◽

1992 ◽

Vol 45 (3) ◽

pp. 325 ◽

Cited By ~ 10

Author(s):

Carl Winstead ◽

Qiyan Sun ◽

Paul G Hipes ◽

Marco AP Lima ◽

Vincent McKoy

Keyword(s):

Distributed Memory ◽

Recent Progress ◽

Cost Effective ◽

Parallel Computers ◽

Polyatomic Molecules ◽

Low Energy ◽

Materials Processing ◽

Challenging Problem ◽

Molecular Systems ◽

Critical Data

We review recent progress in the study of low-energy collisions between electrons and polyatomic molecules which has resulted from the application of distributed-memory parallel computing to this challenging problem. Recent studies of electronically elastic and inelastic scattering from several molecular systems, including ethene, propene, cyclopropane, and disilane, are presented. We also discuss the potential of ab initio methods combined with cost-effective parallel computation to provide critical data for the modeling of materials-processing plasmas.

Download Full-text

Parallel programming technologies on computer complexes

Radio Industry (Russia) ◽

10.21778/2413-9599-2020-30-3-28-33 ◽

2020 ◽

Vol 30 (3) ◽

pp. 28-33 ◽

Cited By ~ 1

Author(s):

S. A. Pryadko ◽

A. Yu. Troshin ◽

V. D. Kozlov ◽

A. E. Ivanov

Keyword(s):

Parallel Programming ◽

Shared Memory ◽

Programming Language ◽

Operating Systems ◽

Distributed Memory ◽

Programming Models ◽

Writing Programs ◽

Advantages And Disadvantages ◽

C Programming Language ◽

C Programming

The article describes various options for speeding up calculations on computer systems. These features are closely related to the architecture of these complexes. The objective of this paper is to provide necessary information when selecting the capability for the speeding process of solving the computation problem. The main features implemented using the following models are described: programming in systems with shared memory, programming in systems with distributed memory, and programming on graphics accelerators (video cards). The basic concept, principles, advantages, and disadvantages of each of the considered programming models are described. All standards for writing programs described in the article can be used both on Linux and Windows operating systems. The required libraries are available and compatible with the C/C++ programming language. The article concludes with recommendations on the use of a particular technology, depending on the type of task to be solved.

Download Full-text

An O(log2N) Fully-Balanced Resampling Algorithm for Particle Filters on Distributed Memory Architectures

Algorithms ◽

10.3390/a14120342 ◽

2021 ◽

Vol 14 (12) ◽

pp. 342

Author(s):

Alessandro Varsi ◽

Simon Maskell ◽

Paul G. Spirakis

Keyword(s):

Parallel Computing ◽

Shared Memory ◽

Time Complexity ◽

Distributed Memory ◽

Particle Filters ◽

Dynamic Models ◽

State Of The Art ◽

Novel Approach ◽

Non Gaussian ◽

Memory Architectures

Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle Filters (PFs) in order to perform state estimation for non-linear non-Gaussian dynamic models. As the models become more complex and accurate, the run-time of PF applications becomes increasingly slow. Parallel computing can help to address this. However, resampling (and, hence, PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize if using textbook parallel computing techniques. A state-of-the-art redistribution takes O((log2N)2) computations on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in O(log2N) on Shared Memory (SM) architectures, such as GPU or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves an O(log2N) time complexity. We also present empirical results that indicate that our novel approach outperforms the O((log2N)2) approach.

Download Full-text