EXTENDING OPENMP FOR TASK PARALLELISM

2003 ◽  
Vol 13 (03) ◽  
pp. 341-352 ◽  
Author(s):  
Ami Marowka

In a wide variety of scientific parallel applications, both task and data parallelism must be exploited to achieve the best possible performance on a multiprocessor machine. These applications induce task-graph parallelism with coarse-grain granularity. Exploiting the available task-graph parallelism and combining it with data parallelism can increase the performance of parallel applications considerably, since an additional degree of parallelism is exploited. The OpenMP standard supports data parallelism but does not support task-graph parallelism. In this paper we present an integration of task-graph parallelism in OpenMP by extending the parallel sections construct with task-index and precedence-relations matrix clauses. There are many ways in which task-graph parallelism can be supported in a programming environment. A fundamental design decision is whether the programmer has to write programs with explicit precedence relations, or whether the generation of precedence relations is delegated to the compiler. One of the benefits provided by parallel programming models like OpenMP is that they liberate the programmer from dealing with the underlying details of communication and synchronization, which are cumbersome and error-prone tasks. If task-graph parallelism is to find acceptance, writing task-graph parallel programs must be no harder than writing data parallel programs, and therefore, in our design, precedence relations are described through simple programmer annotations, with implementation details handled by the system. This paper concludes with a description of several parallel application kernels that were developed to study the practical aspects of task-graph parallelism in OpenMP. The examples demonstrate that exploiting data and task parallelism in a single framework is the key to achieving good performance in a variety of applications.
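As a rough illustration of the kind of annotation the paper argues for, the sketch below contrasts what a programmer must do in standard OpenMP, where the dependence between sections has to be enforced by hand, with a hypothetical precedence annotation; the clause spellings in the comments are assumptions for illustration only, not the paper's exact proposed syntax.

#include <cstdio>

int main() {
    int a = 0, b = 0, c = 0;

    // Standard OpenMP today: the two sections run with no ordering
    // guarantee, so the dependence of task C on tasks A and B must be
    // enforced outside the parallel region.
    #pragma omp parallel sections
    {
        #pragma omp section   // task A
        a = 1;
        #pragma omp section   // task B
        b = 2;
    }
    c = a + b;                // task C: must wait for A and B

    // With the proposed extension, the precedence relation A,B -> C would
    // instead be declared on the sections themselves, along the lines of
    //   #pragma omp parallel sections task_index(...)
    //   { ... #pragma omp section precedence(...) ... }
    // (clause names illustrative), letting the runtime schedule the graph.

    std::printf("c = %d\n", c);
    return 0;
}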

2021 ◽  
Vol 21 ◽  
pp. 1-13
Author(s):  
Pin Xu ◽  
Masato Edahiro ◽  
Masaki Kondo

In this paper, we propose a method to automatically generate parallelized code from Simulink models while exploiting both task and data parallelism. Building on previous research, we propose a model-based parallelizer (MBP) that exploits task parallelism and assigns tasks to CPU cores using a hierarchical clustering method. We also propose a method in which data-parallel SYCL code is generated from Simulink models; computations with data parallelism are expressed in the form of S-Function Builder blocks and are executed in a heterogeneous computing environment. Most parts of the procedure can be automated with scripts, and the two methods can be applied together. In the evaluation, the data-parallel programs generated using our proposed method achieved a maximum speedup of approximately 547 times compared to sequential programs, without observable differences in the computed results. In addition, the programs generated while exploiting both task and data parallelism were confirmed to achieve better performance than those exploiting either one of the two.
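For concreteness, the following is a minimal SYCL sketch of the kind of data-parallel kernel an S-Function Builder block might be lowered to; the function name gain_block and the element-wise computation are illustrative assumptions, not the code actually emitted by the proposed generator.

#include <sycl/sycl.hpp>
#include <vector>

// Scale every sample of a signal by a constant gain, one work-item per element.
void gain_block(sycl::queue& q, std::vector<float>& data, float gain) {
    sycl::buffer<float, 1> buf(data.data(), sycl::range<1>(data.size()));
    q.submit([&](sycl::handler& h) {
        sycl::accessor acc(buf, h, sycl::read_write);
        h.parallel_for(sycl::range<1>(data.size()), [=](sycl::id<1> i) {
            acc[i] *= gain;
        });
    });
    q.wait();  // results are written back when the buffer goes out of scope
}

int main() {
    sycl::queue q{sycl::default_selector_v};
    std::vector<float> samples(1024, 1.0f);
    gain_block(q, samples, 2.5f);
    return 0;
}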


2000 ◽  
Vol 8 (4) ◽  
pp. 231-246 ◽  
Author(s):  
Magne Haveraaen

Data parallelism has appeared as a fruitful approach to the parallelisation of compute-intensive programs. Data parallelism has the advantage of mimicking the sequential (and deterministic) structure of programs, as opposed to task parallelism, where the explicit interaction of processes has to be programmed. In data parallelism, data structures, typically collection classes in the form of large arrays, are distributed over the processors of the target parallel machine. Trying to extract distribution aspects from conventional code often runs into problems with a lack of uniformity in the use of the data structures and in the expression of data dependency patterns within the code. Here we propose a framework with two conceptual classes, Machine and Collection. The Machine class abstracts hardware communication and distribution properties. This gives a programmer high-level access to the important parts of the low-level architecture. The Machine class may readily be used in the implementation of a Collection class, giving the programmer full control of the parallel distribution of data, as well as allowing a normal sequential implementation of this class. Any program using such a collection class will be parallelisable, without requiring any modification, by choosing between sequential and parallel versions at link time. Experiments with a commercial application, built using the Sophus library which takes this approach to parallelisation, show good parallel speed-ups without any adaptation of the application program being needed.
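A minimal sketch of the two conceptual classes, assuming interfaces inferred from the description above; the member names are illustrative and are not the Sophus library's actual API.

#include <cstddef>
#include <vector>

// Machine abstracts distribution and communication properties of the target:
// how many places data can live and how data moves between them.
class Machine {
public:
    explicit Machine(std::size_t nodes = 1) : nodes_(nodes) {}
    std::size_t nodes() const { return nodes_; }
    // In a parallel build this would wrap real communication; in the
    // sequential build it is a no-op, so the same Collection code links
    // against either version unchanged.
    void exchange(void* /*buf*/, std::size_t /*bytes*/, std::size_t /*peer*/) {}
private:
    std::size_t nodes_;
};

// Collection is a distributed array built on top of Machine: the element
// type is parameterised, and the Machine decides where the blocks live.
template <typename T>
class Collection {
public:
    Collection(const Machine& m, std::size_t global_size)
        : machine_(m), local_(global_size / m.nodes()) {}
    std::size_t local_size() const { return local_.size(); }
    T& operator[](std::size_t i) { return local_[i]; }
private:
    const Machine& machine_;
    std::vector<T> local_;   // this node's block of the global array
};

int main() {
    Machine m(4);                    // e.g. a 4-node target
    Collection<double> u(m, 1024);   // 256 elements per node
    for (std::size_t i = 0; i < u.local_size(); ++i) u[i] = 0.0;
    return 0;
}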


2014 ◽  
pp. 64-70
Author(s):  
Sergei Gorlatch ◽  
Henry Kehbel

We describe VisPar, a new visual tool intended to support the programmer in the process of designing complex parallel applications. The novel features of the tool are as follows: support of both task and data parallelism and mixtures thereof, use of analytical cost models for performance prediction, systematic program design by optimizing transformations, and visualization of the design process. We demonstrate the usage of VisPar on a relevant case study, the widely used JPEG compression algorithm, and report on the current status of the tool implementation.


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1825
Author(s):  
Donghyeon Kim ◽  
Seokwon Kang ◽  
Junsu Lim ◽  
Sunwook Jung ◽  
Woosung Kim ◽  
...  

As recent heterogeneous systems comprise multi-core CPUs and multiple GPUs, efficient allocation of multiple data-parallel applications has become a primary goal for achieving both maximum total performance and efficiency. However, efficient orchestration of multiple applications is highly challenging because detailed runtime status, such as the expected remaining time and the available memory size of each computing device, is hidden. To solve these problems, we propose a dynamic data-parallel application allocation framework called ADAMS. Evaluations show that our framework improves the average total device execution time by 1.85× over the round-robin policy in the non-shared-memory system with a small data set.
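The sketch below is not ADAMS's actual algorithm; it merely contrasts round-robin assignment with a load-aware policy that picks the device expected to become idle first, assuming the runtime exposes per-device estimates of remaining time and free memory.

#include <cstddef>
#include <vector>

struct Device {
    double remaining_time_s;   // estimated time to drain queued work
    std::size_t free_mem_mb;   // currently available device memory
};

// Round-robin: ignores runtime status entirely.
std::size_t pick_round_robin(std::size_t n_devices, std::size_t& next) {
    std::size_t d = next;
    next = (next + 1) % n_devices;
    return d;
}

// Load-aware: among devices with enough free memory, pick the one
// expected to become idle first.
std::size_t pick_least_loaded(const std::vector<Device>& devs,
                              std::size_t app_mem_mb) {
    std::size_t best = 0;
    double best_t = 1e300;
    for (std::size_t i = 0; i < devs.size(); ++i) {
        if (devs[i].free_mem_mb >= app_mem_mb &&
            devs[i].remaining_time_s < best_t) {
            best = i;
            best_t = devs[i].remaining_time_s;
        }
    }
    return best;
}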


2007 ◽  
Vol 15 (3) ◽  
pp. 137-155 ◽  
Author(s):  
Thomas Rauber ◽  
Gudula Rünger

On many parallel target platforms it can be advantageous to implement parallel applications as a collection of multiprocessor tasks that are executed concurrently and are internally implemented with fine-grain SPMD parallelism. A class of applications that can benefit from this programming style is solution methods for systems of ordinary differential equations. Many recent solvers have been designed with additional potential for method parallelism, but the actual effectiveness of mixed task and data parallelism depends on the specific communication and computation requirements imposed by the equation to be solved. In this paper we study mixed task and data parallel implementations for general linear methods, realized using a library for multiprocessor task programming. Experiments on a number of different platforms show good efficiency results.
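As an illustration of the programming style (not the paper's multiprocessor-task library), the sketch below runs two stages of a hypothetical method step as concurrent coarse-grain tasks, each internally data-parallel over its vector of unknowns, using nested OpenMP.

#include <vector>
#include <omp.h>

// One stage of a hypothetical step: y_i = y + h * k_i, computed element-wise.
static void stage(std::vector<double>& y_i, const std::vector<double>& y,
                  const std::vector<double>& k_i, double h) {
    #pragma omp parallel for                 // internal SPMD parallelism
    for (long j = 0; j < static_cast<long>(y.size()); ++j)
        y_i[j] = y[j] + h * k_i[j];
}

int main() {
    const std::size_t n = 1 << 20;
    std::vector<double> y(n, 1.0), k1(n, 0.5), k2(n, 0.25);
    std::vector<double> y1(n), y2(n);
    double h = 0.01;

    omp_set_max_active_levels(2);            // allow tasks to spawn inner teams
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section                  // multiprocessor task 1
        stage(y1, y, k1, h);
        #pragma omp section                  // multiprocessor task 2
        stage(y2, y, k2, h);
    }
    return 0;
}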


2001 ◽  
Vol 02 (03) ◽  
pp. 331-343
Author(s):  
Christopher McAvaney ◽  
Andrzej Goscinski

Parallel execution is an efficient means of processing vast amounts of data in a short time. Creating parallel applications has never been easy and requires considerable knowledge of both the task and the execution environment used to run the parallel processes. Creating parallel applications can be made easier by a compiler that automatically parallelises a supplied application, and executing the resulting application is simplified when a well-designed execution environment is used; such an environment transparently provides very powerful operations to the programmer. The aim of this research is to combine a parallelising compiler and an execution environment into a fully automated parallelisation and execution tool. The advantage of such a tool is that the user does not need to provide any additional input to gain the benefits of parallel execution. This report presents the tool and shows how it transparently supports the programmer in creating parallel applications and in executing them.

