EXTENDING OPENMP FOR TASK PARALLELISM

2003 ◽  
Vol 13 (03) ◽  
pp. 341-352 ◽  
Author(s):  
Ami Marowka

In a wide variety of scientific parallel applications, both task and data parallelism must be exploited to achieve the best possible performance on a multiprocessor machine. These applications induce task-graph parallelism with coarse-grain granularity. Exploiting the available task-graph parallelism and combining it with data parallelism can increase the performance of parallel applications considerably, since an additional degree of parallelism is exploited. The OpenMP standard supports data parallelism but does not support task-graph parallelism. In this paper we present an integration of task-graph parallelism in OpenMP by extending the parallel sections construct with task-index and precedence-relations matrix clauses. There are many ways in which task-graph parallelism can be supported in a programming environment. A fundamental design decision is whether the programmer has to write programs with explicit precedence relations, or whether the generation of precedence relations is delegated to the compiler. One of the benefits provided by parallel programming models like OpenMP is that they liberate the programmer from dealing with the underlying details of communication and synchronization, which are cumbersome and error-prone tasks. If task-graph parallelism is to find acceptance, writing task-graph parallel programs must be no harder than writing data parallel programs, and therefore, in our design, precedence relations are described through simple programmer annotations, with implementation details handled by the system. This paper concludes with a description of several parallel application kernels that were developed to study the practical aspects of task-graph parallelism in OpenMP. The examples demonstrate that exploiting data and task parallelism in a single framework is the key to achieving good performance in a variety of applications.
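As a rough illustration of the kind of annotation the paper argues for, the sketch below contrasts what a programmer must do in standard OpenMP, where the dependence between sections has to be enforced by hand, with a hypothetical precedence annotation; the clause spellings in the comments are assumptions for illustration only, not the paper's exact proposed syntax.

#include <cstdio>

int main() {
    int a = 0, b = 0, c = 0;

    // Standard OpenMP today: the two sections run with no ordering
    // guarantee, so the dependence of task C on tasks A and B must be
    // enforced outside the parallel region.
    #pragma omp parallel sections
    {
        #pragma omp section   // task A
        a = 1;
        #pragma omp section   // task B
        b = 2;
    }
    c = a + b;                // task C: must wait for A and B

    // With the proposed extension, the precedence relation A,B -> C would
    // instead be declared on the sections themselves, along the lines of
    //   #pragma omp parallel sections task_index(...)
    //   { ... #pragma omp section precedence(...) ... }
    // (clause names illustrative), letting the runtime schedule the graph.

    std::printf("c = %d\n", c);
    return 0;
}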

2021 ◽  
Vol 21 ◽  
pp. 1-13
Author(s):  
Pin Xu ◽  
Masato Edahiro ◽  
Masaki Kondo

In this paper, we propose a method to automatically generate parallelized code from Simulink models while exploiting both task and data parallelism. Building on previous research, we propose a model-based parallelizer (MBP) that exploits task parallelism and assigns tasks to CPU cores using a hierarchical clustering method. We also propose a method in which data-parallel SYCL code is generated from Simulink models; computations with data parallelism are expressed in the form of S-Function Builder blocks and are executed in a heterogeneous computing environment. Most parts of the procedure can be automated with scripts, and the two methods can be applied together. In the evaluation, the data-parallel programs generated using our proposed method achieved a maximum speedup of approximately 547 times compared to sequential programs, without observable differences in the computed results. In addition, the programs generated while exploiting both task and data parallelism were confirmed to achieve better performance than those exploiting either one of the two.
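For concreteness, the following is a minimal SYCL sketch of the kind of data-parallel kernel an S-Function Builder block might be lowered to; the function name gain_block and the element-wise computation are illustrative assumptions, not the code actually emitted by the proposed generator.

#include <sycl/sycl.hpp>
#include <vector>

// Scale every sample of a signal by a constant gain, one work-item per element.
void gain_block(sycl::queue& q, std::vector<float>& data, float gain) {
    sycl::buffer<float, 1> buf(data.data(), sycl::range<1>(data.size()));
    q.submit([&](sycl::handler& h) {
        sycl::accessor acc(buf, h, sycl::read_write);
        h.parallel_for(sycl::range<1>(data.size()), [=](sycl::id<1> i) {
            acc[i] *= gain;
        });
    });
    q.wait();  // results are written back when the buffer goes out of scope
}

int main() {
    sycl::queue q{sycl::default_selector_v};
    std::vector<float> samples(1024, 1.0f);
    gain_block(q, samples, 2.5f);
    return 0;
}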


2000 ◽  
Vol 8 (4) ◽  
pp. 231-246 ◽  
Author(s):  
Magne Haveraaen

Data parallelism has appeared as a fruitful approach to the parallelisation of compute-intensive programs. Data parallelism has the advantage of mimicking the sequential (and deterministic) structure of programs, as opposed to task parallelism, where the explicit interaction of processes has to be programmed. In data parallelism, data structures, typically collection classes in the form of large arrays, are distributed over the processors of the target parallel machine. Trying to extract distribution aspects from conventional code often runs into problems with a lack of uniformity in the use of the data structures and in the expression of data dependency patterns within the code. Here we propose a framework with two conceptual classes, Machine and Collection. The Machine class abstracts hardware communication and distribution properties. This gives a programmer high-level access to the important parts of the low-level architecture. The Machine class may readily be used in the implementation of a Collection class, giving the programmer full control of the parallel distribution of data, as well as allowing a normal sequential implementation of this class. Any program using such a collection class will be parallelisable, without requiring any modification, by choosing between sequential and parallel versions at link time. Experiments with a commercial application, built using the Sophus library which takes this approach to parallelisation, show good parallel speed-ups without any adaptation of the application program being needed.
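A minimal sketch of the two conceptual classes, assuming interfaces inferred from the description above; the member names are illustrative and are not the Sophus library's actual API.

#include <cstddef>
#include <vector>

// Machine abstracts distribution and communication properties of the target:
// how many places data can live and how data moves between them.
class Machine {
public:
    explicit Machine(std::size_t nodes = 1) : nodes_(nodes) {}
    std::size_t nodes() const { return nodes_; }
    // In a parallel build this would wrap real communication; in the
    // sequential build it is a no-op, so the same Collection code links
    // against either version unchanged.
    void exchange(void* /*buf*/, std::size_t /*bytes*/, std::size_t /*peer*/) {}
private:
    std::size_t nodes_;
};

// Collection is a distributed array built on top of Machine: the element
// type is parameterised, and the Machine decides where the blocks live.
template <typename T>
class Collection {
public:
    Collection(const Machine& m, std::size_t global_size)
        : machine_(m), local_(global_size / m.nodes()) {}
    std::size_t local_size() const { return local_.size(); }
    T& operator[](std::size_t i) { return local_[i]; }
private:
    const Machine& machine_;
    std::vector<T> local_;   // this node's block of the global array
};

int main() {
    Machine m(4);                    // e.g. a 4-node target
    Collection<double> u(m, 1024);   // 256 elements per node
    for (std::size_t i = 0; i < u.local_size(); ++i) u[i] = 0.0;
    return 0;
}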


2014 ◽  
pp. 64-70
Author(s):  
Sergei Gorlatch ◽  
Henry Kehbel

We describe VisPar, a new visual tool intended to support the programmer in the process of designing complex parallel applications. The novel features of the tool are as follows: support of both task and data parallelism and mixtures thereof, use of analytical cost models for performance prediction, systematic program design by optimizing transformations, and visualization of the design process. We demonstrate the usage of VisPar on a relevant case study, the widely used JPEG compression algorithm, and report on the current status of the tool implementation.


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1825
Author(s):  
Donghyeon Kim ◽  
Seokwon Kang ◽  
Junsu Lim ◽  
Sunwook Jung ◽  
Woosung Kim ◽  
...  

As recent heterogeneous systems comprise multi-core CPUs and multiple GPUs, efficient allocation of multiple data-parallel applications has become a primary goal for achieving both maximum total performance and efficiency. However, efficient orchestration of multiple applications is highly challenging because detailed runtime status, such as the expected remaining time and the available memory size of each computing device, is hidden. To solve these problems, we propose a dynamic data-parallel application allocation framework called ADAMS. Evaluations show that our framework improves the average total device execution time by 1.85× over the round-robin policy in the non-shared-memory system with a small data set.
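The sketch below is not ADAMS's actual algorithm; it merely contrasts round-robin assignment with a load-aware policy that picks the device expected to become idle first, assuming the runtime exposes per-device estimates of remaining time and free memory.

#include <cstddef>
#include <vector>

struct Device {
    double remaining_time_s;   // estimated time to drain queued work
    std::size_t free_mem_mb;   // currently available device memory
};

// Round-robin: ignores runtime status entirely.
std::size_t pick_round_robin(std::size_t n_devices, std::size_t& next) {
    std::size_t d = next;
    next = (next + 1) % n_devices;
    return d;
}

// Load-aware: among devices with enough free memory, pick the one
// expected to become idle first.
std::size_t pick_least_loaded(const std::vector<Device>& devs,
                              std::size_t app_mem_mb) {
    std::size_t best = 0;
    double best_t = 1e300;
    for (std::size_t i = 0; i < devs.size(); ++i) {
        if (devs[i].free_mem_mb >= app_mem_mb &&
            devs[i].remaining_time_s < best_t) {
            best = i;
            best_t = devs[i].remaining_time_s;
        }
    }
    return best;
}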


2007 ◽  
Vol 15 (3) ◽  
pp. 137-155 ◽  
Author(s):  
Thomas Rauber ◽  
Gudula Rünger

On many parallel target platforms it can be advantageous to implement parallel applications as a collection of multiprocessor tasks that are executed concurrently and are internally implemented with fine-grain SPMD parallelism. A class of applications that can benefit from this programming style is solution methods for systems of ordinary differential equations. Many recent solvers have been designed with additional potential for method parallelism, but the actual effectiveness of mixed task and data parallelism depends on the specific communication and computation requirements imposed by the equation to be solved. In this paper we study mixed task and data parallel implementations for general linear methods, realized using a library for multiprocessor task programming. Experiments on a number of different platforms show good efficiency results.
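As an illustration of the programming style (not the paper's multiprocessor-task library), the sketch below runs two stages of a hypothetical method step as concurrent coarse-grain tasks, each internally data-parallel over its vector of unknowns, using nested OpenMP.

#include <vector>
#include <omp.h>

// One stage of a hypothetical step: y_i = y + h * k_i, computed element-wise.
static void stage(std::vector<double>& y_i, const std::vector<double>& y,
                  const std::vector<double>& k_i, double h) {
    #pragma omp parallel for                 // internal SPMD parallelism
    for (long j = 0; j < static_cast<long>(y.size()); ++j)
        y_i[j] = y[j] + h * k_i[j];
}

int main() {
    const std::size_t n = 1 << 20;
    std::vector<double> y(n, 1.0), k1(n, 0.5), k2(n, 0.25);
    std::vector<double> y1(n), y2(n);
    double h = 0.01;

    omp_set_max_active_levels(2);            // allow tasks to spawn inner teams
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section                  // multiprocessor task 1
        stage(y1, y, k1, h);
        #pragma omp section                  // multiprocessor task 2
        stage(y2, y, k2, h);
    }
    return 0;
}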


2001 ◽  
Vol 02 (03) ◽  
pp. 331-343
Author(s):  
Christopher McAvaney ◽  
Andrzej Goscinski

Parallel execution is an efficient means of processing vast amounts of data in a short time. Creating parallel applications has never been easy and requires considerable knowledge of both the task and the execution environment used to run the parallel processes. Creating parallel applications can be made easier by a compiler that automatically parallelises a supplied application, and executing the resulting application is simplified when a well-designed execution environment is used; such an environment transparently provides very powerful operations to the programmer. The aim of this research is to combine a parallelising compiler and an execution environment into a fully automated parallelisation and execution tool. The advantage of such a tool is that the user does not need to provide any additional input to gain the benefits of parallel execution. This report presents the tool and shows how it transparently supports the programmer in creating parallel applications and in executing them.

