Parallelism exploration in sequential algorithms via animation tool

Different high performance techniques, such as profiling, tracing, and instrumentation, have been used to tune and enhance the performance of parallel applications. However, these techniques do not show how to explore the potential of parallelism in a given application. Animating and visualizing the execution process of a sequential algorithm provide a thorough understanding of its usage and functionality. In this work, an interactive web-based educational animation tool was developed to assist users in analyzing sequential algorithms to detect parallel regions regardless of the used parallel programming model. The tool simplifies algorithms’ learning, and helps students to analyze programs efficiently. Our statistical t-test study on a sample of students showed a significant improvement in their perception of the mechanism and parallelism of applications and an increase in their willingness to learn algorithms and parallel programming.

Download Full-text

Study of parallel programming models on computer clusters with Intel MIC coprocessors

The International Journal of High Performance Computing Applications ◽

10.1177/1094342015580864 ◽

2015 ◽

Vol 31 (4) ◽

pp. 303-315 ◽

Cited By ~ 3

Author(s):

Miaoqing Huang ◽

Chenggang Lai ◽

Xuan Shi ◽

Zhijun Hao ◽

Haihang You

Keyword(s):

Parallel Programming ◽

High Performance ◽

Programming Model ◽

Fixed Number ◽

Parallel Applications ◽

Programming Models ◽

Communication Overhead ◽

Computer Clusters ◽

Parallel Programming Models ◽

Intel Mic

Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve the parallelism. In this work, we conduct a detailed study on the performance and scalability of the MIC processors under different programming models using the Beacon computer cluster. Our findings are as follows. (1) The native MPI programming model on the MIC processors is typically better than the offload programming model, which offloads the workload to MIC cores using OpenMP. (2) On top of the native MPI programming model, multithreading inside each MPI process can further improve the performance for parallel applications on computer clusters with MIC coprocessors. (3) Given a fixed number of MPI processes, it is a good strategy to schedule these MPI processes to as few MIC processors as possible to reduce the cross-processor communication overhead. (4) The hybrid MPI programming model, in which data processing is distributed to both MIC cores and CPU cores, can outperform the native MPI programming model.

Download Full-text

Hardware transactional memory: A high performance parallel programming model

Journal of Systems Architecture ◽

10.1016/j.sysarc.2010.06.006 ◽

2010 ◽

Vol 56 (8) ◽

pp. 384-391 ◽

Cited By ~ 2

Author(s):

Chen Fu ◽

Dongxin Wen ◽

Xiaoqun Wang ◽

Xiaozong Yang

Keyword(s):

Parallel Programming ◽

High Performance ◽

Transactional Memory ◽

Programming Model ◽

Hardware Transactional Memory ◽

Parallel Programming Model

Download Full-text

PDDP, A Data Parallel Programming Model

Scientific Programming ◽

10.1155/1996/857815 ◽

1996 ◽

Vol 5 (4) ◽

pp. 319-327

Author(s):

Karen H. Warren

Keyword(s):

Parallel Programming ◽

High Performance ◽

Parallel Machines ◽

Programming Model ◽

Data Distribution ◽

Interprocessor Communication ◽

Data Parallel ◽

Parallel Programming Model ◽

Data Objects ◽

Data Parallel Programming

PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements high-performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.

Download Full-text

Parallel Programming Models and Systems for High Performance Computing

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Emerging Research in Cloud Distributed Computing Systems ◽

10.4018/978-1-4666-8213-9.ch008 ◽

2015 ◽

pp. 254-292

Author(s):

Manjunath Gorentla Venkata ◽

Stephen Poole

Keyword(s):

Parallel Programming ◽

High Performance ◽

Programming Model ◽

Parallel System ◽

Programming Models ◽

System Architectures ◽

Parallel Programming Model ◽

Application Developers ◽

Performance Computing

A parallel programming model is an abstraction of a parallel system that allows expression of both algorithms and shared data structures. To accommodate the diversity in parallel system architectures and user requirements, there are a variety of programming models including the models providing a shared memory view or a distributed memory view of the system. The programming models are implemented as libraries, language extensions, or compiler directives. This chapter provides a discussion on programming models and its implementations aimed at application developers, system software researchers, and hardware architects. The first part provides an overview of the programming models. The second part is an in-depth discussion on high-performance networking interface to implement the programming model. The last part of the chapter discusses implementation of a programming model with a case study. Each part of the chapter concludes with a discussion on current research trends and its impact on future architectures.

Download Full-text

MapReduce Parallel Programming Model: A State-of-the-Art Survey

International Journal of Parallel Programming ◽

10.1007/s10766-015-0395-0 ◽

2015 ◽

Vol 44 (4) ◽

pp. 832-866 ◽

Cited By ~ 24

Author(s):

Ren Li ◽

Haibo Hu ◽

Heng Li ◽

Yunsong Wu ◽

Jianxi Yang

Keyword(s):

Parallel Programming ◽

Programming Model ◽

State Of The Art ◽

Parallel Programming Model

Download Full-text

Parallel programming model for the Epiphany many-core coprocessor using threaded MPI

Microprocessors and Microsystems ◽

10.1016/j.micpro.2016.02.006 ◽

2016 ◽

Vol 43 ◽

pp. 95-103 ◽

Cited By ~ 5

Author(s):

James A. Ross ◽

David A. Richie ◽

Song J. Park ◽

Dale R. Shires

Keyword(s):

Parallel Programming ◽

Programming Model ◽

Parallel Programming Model ◽

Many Core

Download Full-text

2D-FMFI SAR application on HPC architectures with OmpSs parallel programming model

2012 NASA/ESA Conference on Adaptive Hardware and Systems (AHS) ◽

10.1109/ahs.2012.6268638 ◽

2012 ◽

Author(s):

Fisnik Kraja ◽

Arndt Bode ◽

Xavier Martorell

Keyword(s):

Parallel Programming ◽

Programming Model ◽

Parallel Programming Model

Download Full-text

Interaction with the User in the SAPFOR System

Russian Digital Libraries Journal ◽

10.26907/1562-5419-2021-24-1-157-183 ◽

2021 ◽

Vol 24 (1) ◽

pp. 157-183

Author(s):

Никита Андреевич Катаев

Keyword(s):

Parallel Programming ◽

Program Transformation ◽

Heterogeneous Computing ◽

Programming Model ◽

Parallel Programs ◽

Parallel Program ◽

Program Parallelization ◽

Parallel Programming Model ◽

The One ◽

High Level

Automation of parallel programming is important at any stage of parallel program development. These stages include profiling of the original program, program transformation, which allows us to achieve higher performance after program parallelization, and, finally, construction and optimization of the parallel program. It is also important to choose a suitable parallel programming model to express parallelism available in a program. On the one hand, the parallel programming model should be capable to map the parallel program to a variety of existing hardware resources. On the other hand, it should simplify the development of the assistant tools and it should allow the user to explore the parallel program the assistant tools generate in a semi-automatic way. The SAPFOR (System FOR Automated Parallelization) system combines various approaches to automation of parallel programming. Moreover, it allows the user to guide the parallelization if necessary. SAPFOR produces parallel programs according to the high-level DVMH parallel programming model which simplify the development of efficient parallel programs for heterogeneous computing clusters. This paper focuses on the approach to semi-automatic parallel programming, which SAPFOR implements. We discuss the architecture of the system and present the interactive subsystem which is useful to guide the SAPFOR through program parallelization. We used the interactive subsystem to parallelize programs from the NAS Parallel Benchmarks in a semi-automatic way. Finally, we compare the performance of manually written parallel programs with programs the SAPFOR system builds.

Download Full-text

Actors as a parallel programming model

STACS 91 - Lecture Notes in Computer Science ◽

10.1007/bfb0020798 ◽

2005 ◽

pp. 184-195 ◽

Cited By ~ 5

Author(s):

Françoise Baude ◽

Guy Vidal-Naquet

Keyword(s):

Parallel Programming ◽

Programming Model ◽

Parallel Programming Model

Download Full-text

Machine Learning to Design an Auto-tuning System for the Best Compressed Format Detection for Parallel Sparse Computations

Parallel Processing Letters ◽

10.1142/s0129626421500195 ◽

2021 ◽

Author(s):

Olfa Hamdi-Larbi ◽

Ichrak Mehrez ◽

Thomas Dufaud

Keyword(s):

Machine Learning ◽

Numerical Method ◽

High Performance ◽

Programming Model ◽

Learning Algorithm ◽

Sparse Matrix ◽

Sparse Matrices ◽

Matrix Compression ◽

Target Architecture ◽

Parallel Programming Model

Many applications in scientific computing process very large sparse matrices on parallel architectures. The presented work in this paper is a part of a project where our general aim is to develop an auto-tuner system for the selection of the best matrix compression format in the context of high-performance computing. The target smart system can automatically select the best compression format for a given sparse matrix, a numerical method processing this matrix, a parallel programming model and a target architecture. Hence, this paper describes the design and implementation of the proposed concept. We consider a case study consisting of a numerical method reduced to the sparse matrix vector product (SpMV), some compression formats, the data parallel as a programming model and, a distributed multi-core platform as a target architecture. This study allows extracting a set of important novel metrics and parameters which are relative to the considered programming model. Our metrics are used as input to a machine-learning algorithm to predict the best matrix compression format. An experimental study targeting a distributed multi-core platform and processing random and real-world matrices shows that our system can improve in average up to 7% the accuracy of the machine learning.

Download Full-text