Effective On-Chip Communication for Message Passing Programs on Multi-Core Processors

Shared memory is the most popular parallel programming model for multi-core processors, while message passing is generally used for large distributed machines. However, as the number of cores on a chip increases, the relative merits of shared memory and message passing change, and we argue that message passing becomes a viable, high-performing parallel programming model. To test this hypothesis, we compare a shared memory architecture with a new message passing architecture on a suite of applications tuned for each system independently. Perhaps surprisingly, the fundamental behaviors of the applications studied in this work, when optimized for both models, are very similar, and both versions could execute efficiently on multicore architectures even though many of the implementations differ. Furthermore, if hardware is tuned for message passing by supporting bulk message transfer and eliminating unnecessary coherence overheads, and if effective support is available for global operations, then some applications would perform much better on a message passing architecture. Leveraging these insights, we design a message passing architecture that supports both memory-to-memory and cache-to-cache messaging in hardware. With the new architecture, message passing outperforms its shared memory counterpart on many of the applications, owing to the advantages of message passing hardware over cache coherence. In the best case, message passing achieves a speedup of up to 34% over its shared memory counterpart, with an average speedup of 10%. In the worst case, message passing is slower on two applications, CG (conjugate gradient) and FT (Fourier transform), because it handles their data sharing patterns less well than shared memory does. Overall, our analysis demonstrates the importance of considering message passing as a high-performing, hardware-supported programming model on future multicore architectures.
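As a rough illustration of the two programming models the abstract contrasts, the sketch below computes a global sum in both styles using Python threads. It is not the paper's architecture or benchmark code: the function names `shared_memory_sum` and `message_passing_sum` are hypothetical, and a `queue.Queue` merely stands in for a message channel.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Shared-memory style: every worker updates one shared total under a lock."""
    total = [0]  # shared state visible to all workers
    lock = threading.Lock()

    def worker(chunk):
        partial = sum(chunk)
        with lock:  # synchronization protects the shared location
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def message_passing_sum(chunks):
    """Message-passing style: workers own their data and send partial results."""
    inbox = queue.Queue()  # stands in for a message channel (assumption)

    def worker(chunk):
        inbox.put(sum(chunk))  # explicit send; no shared variable is written

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The receiver combines one message per worker (a simple global reduction).
    return sum(inbox.get() for _ in chunks)


data = list(range(1000))
chunks = [data[i::4] for i in range(4)]  # four workers, interleaved slices
print(shared_memory_sum(chunks), message_passing_sum(chunks))
```

Both functions return the same result. The difference lies in where coordination happens: through a protected shared location in the first case, and through explicit sends and a receive-side reduction in the second.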

Large-Scale Biomolecular Dynamics Using SMP Clusters

12th International Conference on Nuclear Engineering, Volume 1 ◽

10.1115/icone12-49573 ◽

2004 ◽

Author(s):

Masaaki Suzuki ◽

Hiroshi Okuda ◽

Genki Yagawa

Keyword(s):

Parallel Programming ◽

Message Passing ◽

Large Scale ◽

Message Passing Interface ◽

Md Simulation ◽

Programming Model ◽

Fast Multipole Method ◽

Long Distance ◽

Parallel Efficiency ◽

Parallel Programming Model

The authors apply a hybrid Message Passing Interface (MPI) / OpenMP parallel programming model to the molecular dynamics (MD) method for simulating a protein structure on a symmetric multiprocessor (SMP) cluster architecture. On such an architecture, the hybrid model, which uses a message passing library such as MPI for communication between SMP nodes and loop directives such as OpenMP for parallelization within an SMP node, can be expected to be the most effective. In this study, the parallel performance of the hybrid style is compared with that of the conventional flat parallel programming style, which uses only MPI, both with and without the fast multipole method (FMM) for computing long-distance interactions. The computing environment is the Hitachi SR8000/MPP at the University of Tokyo. The results are as follows.

Without FMM, the parallel efficiency on 16 SMP nodes (128 PEs) for an MD simulation with 33,402 atoms is:
- 90% with the hybrid style
- 75% with the flat-MPI style

With FMM, the parallel efficiency on 16 SMP nodes (128 PEs) for an MD simulation with 117,649 atoms is:
- 60% with the hybrid style
- 48% with the flat-MPI style
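To put the reported efficiencies in perspective, the short sketch below converts them into implied speedups. It assumes the conventional definition, efficiency = speedup / number of PEs, measured against a single PE; the abstract does not state which baseline the authors used, so the resulting figures are illustrative only. The function name `implied_speedup` is hypothetical.

```python
def implied_speedup(efficiency_percent, n_pes):
    """Speedup implied by a parallel efficiency, assuming E = S / p."""
    return efficiency_percent / 100 * n_pes


N_PES = 128  # 16 SMP nodes, as reported in the abstract

# (configuration, style) -> parallel efficiency in percent, from the abstract
reported = {
    ("without FMM", "hybrid"): 90,
    ("without FMM", "flat-MPI"): 75,
    ("with FMM", "hybrid"): 60,
    ("with FMM", "flat-MPI"): 48,
}

for (config, style), eff in reported.items():
    speedup = implied_speedup(eff, N_PES)
    print(f"{config:12s} {style:9s} efficiency {eff}% -> speedup ~{speedup:.1f}x")
```

Under that assumption, the hybrid style's advantage corresponds to roughly 19 extra PEs' worth of useful work without FMM and roughly 15 with FMM.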

MapReduce Parallel Programming Model: A State-of-the-Art Survey

International Journal of Parallel Programming ◽

10.1007/s10766-015-0395-0 ◽

2015 ◽

Vol 44 (4) ◽

pp. 832-866 ◽

Cited By ~ 24

Author(s):

Ren Li ◽

Haibo Hu ◽

Heng Li ◽

Yunsong Wu ◽

Jianxi Yang

Keyword(s):

Parallel Programming ◽

Programming Model ◽

State Of The Art ◽

Parallel Programming Model

Parallel programming model for the Epiphany many-core coprocessor using threaded MPI

Microprocessors and Microsystems ◽

10.1016/j.micpro.2016.02.006 ◽

2016 ◽

Vol 43 ◽

pp. 95-103 ◽

Cited By ~ 5

Author(s):

James A. Ross ◽

David A. Richie ◽

Song J. Park ◽

Dale R. Shires

Keyword(s):

Parallel Programming ◽

Programming Model ◽

Parallel Programming Model ◽

Many Core

2D-FMFI SAR application on HPC architectures with OmpSs parallel programming model

2012 NASA/ESA Conference on Adaptive Hardware and Systems (AHS) ◽

10.1109/ahs.2012.6268638 ◽

2012 ◽

Author(s):

Fisnik Kraja ◽

Arndt Bode ◽

Xavier Martorell

Keyword(s):

Parallel Programming ◽

Programming Model ◽

Parallel Programming Model

Interaction with the User in the SAPFOR System

Russian Digital Libraries Journal ◽

10.26907/1562-5419-2021-24-1-157-183 ◽

2021 ◽

Vol 24 (1) ◽

pp. 157-183

Author(s):

Никита Андреевич Катаев

Keyword(s):

Parallel Programming ◽

Program Transformation ◽

Heterogeneous Computing ◽

Programming Model ◽

Parallel Programs ◽

Parallel Program ◽

Program Parallelization ◽

Parallel Programming Model ◽

The One ◽

High Level

Automation of parallel programming is important at every stage of parallel program development. These stages include profiling the original program, transforming the program so that it achieves higher performance after parallelization, and, finally, constructing and optimizing the parallel program. It is also important to choose a suitable parallel programming model to express the parallelism available in a program. On the one hand, the parallel programming model should be capable of mapping the parallel program onto a variety of existing hardware resources. On the other hand, it should simplify the development of assistant tools, and it should allow the user to explore the parallel program that the assistant tools generate in a semi-automatic way. The SAPFOR (System FOR Automated Parallelization) system combines various approaches to the automation of parallel programming. Moreover, it allows the user to guide the parallelization if necessary. SAPFOR produces parallel programs according to the high-level DVMH parallel programming model, which simplifies the development of efficient parallel programs for heterogeneous computing clusters. This paper focuses on the approach to semi-automatic parallel programming that SAPFOR implements. We discuss the architecture of the system and present the interactive subsystem, which is used to guide SAPFOR through program parallelization. We used the interactive subsystem to parallelize programs from the NAS Parallel Benchmarks in a semi-automatic way. Finally, we compare the performance of manually written parallel programs with that of the programs the SAPFOR system builds.

Actors as a parallel programming model

STACS 91 - Lecture Notes in Computer Science ◽

10.1007/bfb0020798 ◽

2005 ◽

pp. 184-195 ◽

Cited By ~ 5

Author(s):

Françoise Baude ◽

Guy Vidal-Naquet

Keyword(s):

Parallel Programming ◽

Programming Model ◽

Parallel Programming Model

A practical parallel programming model

Specification of Parallel Algorithms - DIMACS Series in Discrete Mathematics and Theoretical Computer Science ◽

10.1090/dimacs/018/11 ◽

1994 ◽

pp. 143-160 ◽

Cited By ~ 1

Author(s):

Lawrence Snyder

Keyword(s):

Parallel Programming ◽

Programming Model ◽

Parallel Programming Model

Toward An Architecture Independent High Level Parallel Programming Model For Artificial Intelligence

Parallel Processing for Artificial Intelligence - Machine Intelligence and Pattern Recognition ◽

10.1016/b978-0-444-81837-9.50009-9 ◽

1994 ◽

pp. 57-66

Author(s):

Mark S. Berlin

Keyword(s):

Artificial Intelligence ◽

Parallel Programming ◽

Programming Model ◽

Parallel Programming Model ◽

High Level

Cluster-Enabled OpenMP: An OpenMP Compiler for the SCASH Software Distributed Shared Memory System

Scientific Programming ◽

10.1155/2001/605217 ◽

2001 ◽

Vol 9 (2-3) ◽

pp. 123-130 ◽

Cited By ~ 21

Author(s):

Mitsuhisa Sato ◽

Hiroshi Harada ◽

Atsushi Hasegawa ◽

Yutaka Ishikawa

Keyword(s):

Shared Memory ◽

Programming Model ◽

Distributed Shared Memory ◽

Memory System ◽

Data Mapping ◽

Loop Scheduling ◽

Parallel Programming Model ◽

Scheduling Method ◽

Shared Memory System ◽

Software Distributed Shared Memory

OpenMP is attracting widespread interest because of its easy-to-use parallel programming model for shared memory multiprocessors. We have implemented a "cluster-enabled" OpenMP compiler for a page-based software distributed shared memory system, SCASH, which runs on a cluster of PCs. It allows OpenMP programs to run transparently in a distributed memory environment. The compiler transforms OpenMP programs into parallel programs that use SCASH, so that shared global variables are allocated at run time in the shared address space of SCASH. A set of directives is added to specify the data mapping and a loop scheduling method that schedules iterations onto threads according to that data mapping. Our experimental results show that the data mapping can greatly affect the performance of OpenMP programs on the software distributed shared memory system. The performance of some NAS Parallel Benchmark programs in OpenMP is improved by using our extended directives.
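The idea of scheduling loop iterations according to the data mapping can be sketched in a few lines. The example below is not SCASH's directive syntax or the compiler's actual algorithm; it assumes a simple block mapping of array elements to nodes, and all function names (`block_owner`, `affinity_schedule`, `cyclic_schedule`, `remote_accesses`) are hypothetical. It counts how many iterations would touch data owned by another node under two schedules.

```python
def block_owner(index, n_elements, n_nodes):
    """Node that owns element `index` under a block data mapping (assumed)."""
    block = -(-n_elements // n_nodes)  # ceiling division: elements per node
    return index // block


def affinity_schedule(n_iterations, n_nodes):
    """Run iteration i on the node that owns element i."""
    schedule = {node: [] for node in range(n_nodes)}
    for i in range(n_iterations):
        schedule[block_owner(i, n_iterations, n_nodes)].append(i)
    return schedule


def cyclic_schedule(n_iterations, n_nodes):
    """Round-robin schedule that ignores where the data lives."""
    schedule = {node: [] for node in range(n_nodes)}
    for i in range(n_iterations):
        schedule[i % n_nodes].append(i)
    return schedule


def remote_accesses(schedule, n_elements, n_nodes):
    """Count iterations whose element is owned by a different node."""
    return sum(
        1
        for node, iterations in schedule.items()
        for i in iterations
        if block_owner(i, n_elements, n_nodes) != node
    )


N, NODES = 8, 4
print(remote_accesses(affinity_schedule(N, NODES), N, NODES))  # no remote accesses
print(remote_accesses(cyclic_schedule(N, NODES), N, NODES))    # most are remote
```

In a page-based software distributed shared memory system, each remote access can trigger page traffic between nodes, which is one plausible reason the abstract reports that data mapping strongly affects performance.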

The data parallel programming model: A semantic perspective

The Data Parallel Programming Model - Lecture Notes in Computer Science ◽

10.1007/3-540-61736-1_40 ◽

1996 ◽

pp. 4-26 ◽

Cited By ~ 9

Author(s):

Luc Bougé

Keyword(s):

Parallel Programming ◽

Programming Model ◽

Data Parallel ◽

Parallel Programming Model ◽

Data Parallel Programming
