Implementation of unstructured grid GMRES+LU-SGS method on shared-memory, cache-based parallel computers

We present new sequential and parallel algorithms for wavelet tree construction based on a new bottom-up technique. This technique exploits the structure of wavelet trees, in which the characters represented in a node are refined with increasing depth, but in the opposite direction: it first computes the leaves (the most refined level) and then propagates this information upwards to the root of the tree. We first describe new sequential algorithms, both in RAM and in external memory. Based on these results, we adapt the algorithms to parallel computers, addressing both shared-memory and distributed-memory settings. In practice, all our algorithms outperform previous ones in both time and memory efficiency, because all auxiliary information can be computed solely from the information obtained while computing the leaves. Most of our algorithms are also adapted to the wavelet matrix, a variant that is particularly suited to large alphabets.
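The bottom-up idea described in this abstract can be illustrated with a small sequential sketch. It is not taken from the paper: the function name `build_wavelet_tree_bottom_up`, the level-wise bit-vector layout, and the use of plain Python lists are assumptions made for illustration. The sketch counts symbol occurrences at the leaves once, then merges sibling counts to obtain the node sizes of each higher level, and uses those sizes to place each symbol's bit directly into its node.

```python
def build_wavelet_tree_bottom_up(text, bits_per_symbol):
    """Build a level-wise wavelet tree as one bit vector per level.

    `text` is a sequence of integers in [0, 2**bits_per_symbol).
    Illustrative sketch only; names and layout are assumptions.
    """
    n = len(text)
    levels = bits_per_symbol
    bit_vectors = [[0] * n for _ in range(levels)]

    # Leaves first: histogram over complete symbols (the most refined level).
    hist = [0] * (1 << levels)
    for c in text:
        hist[c] += 1

    # Propagate upwards: level `level` has one node per bit prefix of that length.
    for level in range(levels - 1, 0, -1):
        # Merge sibling counts to get the node sizes one level closer to the root.
        hist = [hist[2 * i] + hist[2 * i + 1] for i in range(1 << level)]

        # Exclusive prefix sum: starting offset of each node in this level's bit vector.
        borders = [0] * (1 << level)
        for i in range(1, 1 << level):
            borders[i] = borders[i - 1] + hist[i - 1]

        # Scan the text in order and write each symbol's bit into its node.
        for c in text:
            prefix = c >> (levels - level)
            pos = borders[prefix]
            borders[prefix] += 1
            bit_vectors[level][pos] = (c >> (levels - 1 - level)) & 1

    # The root level stores the most significant bit in text order.
    for i, c in enumerate(text):
        bit_vectors[0][i] = (c >> (levels - 1)) & 1

    return bit_vectors
```

For example, `build_wavelet_tree_bottom_up([2, 0, 3, 1, 0], 2)` yields the root bit vector `[1, 0, 1, 0, 0]` and the second-level bit vector `[0, 1, 0, 0, 1]`. In this sketch the only auxiliary data are the histogram and the node offsets, both derived from the leaf counts, which mirrors the abstract's claim about auxiliary information.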

A Shared Memory Cache Layer across Multiple Executors in Apache Spark

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9378179 ◽

2020 ◽

Author(s):

Wei Rang ◽

Donglin Yang ◽

Dazhao Cheng

Keyword(s):

Shared Memory ◽

Apache Spark ◽

Memory Cache

Teaching tools for parallel processing

Facta universitatis - series Electronics and Energetics ◽

10.2298/fuee0502219m ◽

2005 ◽

Vol 18 (2) ◽

pp. 219-224

Author(s):

Emina Milovanovic ◽

Natalija Stojanovic

Keyword(s):

Parallel Computing ◽

Parallel Processing ◽

Shared Memory ◽

Message Passing ◽

Distributed Memory ◽

Cost Effective ◽

Parallel Computers ◽

Free Software ◽

Teaching Tools ◽

Network Of Workstations

Because many universities lack the funds to purchase expensive parallel computers, cost-effective alternatives are needed to teach students about parallel processing. Free software is available to support the three major paradigms of parallel computing. Parallaxis is a sophisticated SIMD simulator that runs on a variety of platforms. The jBACI shared-memory simulator supports the MIMD model of computing with a common shared memory. PVM and MPI allow students to treat a network of workstations as a message-passing MIMD multicomputer with distributed memory. Each of these software tools can be used in a variety of courses to give students experience with parallel algorithms.

A Blocking Algorithm for Parallel 1-D FFT on Shared-Memory Parallel Computers

Lecture Notes in Computer Science - Applied Parallel Computing ◽

10.1007/3-540-48051-x_38 ◽

2002 ◽

pp. 380-389 ◽

Cited By ~ 6

Author(s):

Daisuke Takahashi

Keyword(s):

Shared Memory ◽

Parallel Computers ◽

Blocking Algorithm

Quantitative Performance Analysis of the SPEC OMPM2001 Benchmarks

Scientific Programming ◽

10.1155/2003/401032 ◽

2003 ◽

Vol 11 (2) ◽

pp. 105-124 ◽

Cited By ~ 12

Author(s):

Vishal Aslot ◽

Rudolf Eigenmann

Keyword(s):

Shared Memory ◽

Parallel Computers ◽

Quantitative Model ◽

Modern Architecture ◽

Easy Access ◽

Multiprocessor Systems ◽

Modern Computer ◽

Multiple Processors ◽

Quantitative Performance ◽

Shared Memory Multiprocessor

Modern computer systems have evolved to allow easy access to multiprocessor systems by supporting multiple processors on a single physical package. As multiprocessor hardware evolves, new ways of programming it are also developed. Some of these may merely adopt and standardize older paradigms. One such evolving standard for programming shared-memory parallel computers is the OpenMP API. The Standard Performance Evaluation Corporation (SPEC) has created a suite of parallel programs called SPEC OMP to compare and evaluate modern shared-memory multiprocessor systems using the OpenMP standard. We have studied these benchmarks in detail to understand their performance on a modern architecture. In this paper, we present detailed measurements of the benchmarks. We organize, summarize, and display our measurements using a Quantitative Model. We present a detailed discussion and derivation of the model. We also discuss the important loops in the SPEC OMPM2001 benchmarks and the reasons for less-than-ideal speedup on our platform.
