scholarly journals Correct program parallelisations

Author(s):  
S. Blom ◽  
S. Darabi ◽  
M. Huisman ◽  
M. Safari

AbstractA commonly used approach to develop deterministic parallel programs is to augment a sequential program with compiler directives that indicate which program blocks may potentially be executed in parallel. This paper develops a verification technique to reason about such compiler directives, in particular to show that they do not change the behaviour of the program. Moreover, the verification technique is tool-supported and can be combined with proving functional correctness of the program. To develop our verification technique, we propose a simple intermediate representation (syntax and semantics) that captures the main forms of deterministic parallel programs. This language distinguishes three kinds of basic blocks: parallel, vectorised and sequential blocks, which can be composed using three different composition operators: sequential, parallel and fusion composition. We show how a widely used subset of OpenMP can be encoded into this intermediate representation. Our verification technique builds on the notion of iteration contract to specify the behaviour of basic blocks; we show that if iteration contracts are manually specified for single blocks, then that is sufficient to automatically reason about data race freedom of the composed program. Moreover, we also show that it is sufficient to establish functional correctness on a linearised version of the original program to conclude functional correctness of the parallel program. Finally, we exemplify our approach on an example OpenMP program, and we discuss how tool support is provided.

1983 ◽  
Vol 6 (2) ◽  
pp. 171-183
Author(s):  
Zenon Sadowski

The paper contains an affirmative answer to the following question: Does nondeterministic sequential program equivalent with respect to the set of final valuations exist for every parallel program with semantics defined in paper[3]? It also includes an example of the construction of a nondeterministic sequential program equivalent with respect to the set of final valuations to the given parallel program.


1981 ◽  
Vol 4 (3) ◽  
pp. 531-549 ◽  
Author(s):  
Miklós Szijártó

The correspondence between sequential program schemes and formal languages is well known (Blikle and Mazurkiewicz (1972), Engelfriet (1974)). The situation is more complicated in the case of parallel program schemes, and trace languages (Mazurkiewicz (1977)) have been introduced to describe them. We introduce the concept of the closure of a language on a so called independence relation on the alphabet of the language, and formulate several theorems about them and the trace languages. We investigate the closedness properties of Chomsky classes under closure on independence relations, and as a special case we derive a new necessary and sufficient condition for the regularity of the commutative closure of a language.


2021 ◽  
Vol 24 (1) ◽  
pp. 157-183
Author(s):  
Никита Андреевич Катаев

Automation of parallel programming is important at any stage of parallel program development. These stages include profiling of the original program, program transformation, which allows us to achieve higher performance after program parallelization, and, finally, construction and optimization of the parallel program. It is also important to choose a suitable parallel programming model to express parallelism available in a program. On the one hand, the parallel programming model should be capable to map the parallel program to a variety of existing hardware resources. On the other hand, it should simplify the development of the assistant tools and it should allow the user to explore the parallel program the assistant tools generate in a semi-automatic way. The SAPFOR (System FOR Automated Parallelization) system combines various approaches to automation of parallel programming. Moreover, it allows the user to guide the parallelization if necessary. SAPFOR produces parallel programs according to the high-level DVMH parallel programming model which simplify the development of efficient parallel programs for heterogeneous computing clusters. This paper focuses on the approach to semi-automatic parallel programming, which SAPFOR implements. We discuss the architecture of the system and present the interactive subsystem which is useful to guide the SAPFOR through program parallelization. We used the interactive subsystem to parallelize programs from the NAS Parallel Benchmarks in a semi-automatic way. Finally, we compare the performance of manually written parallel programs with programs the SAPFOR system builds.


2001 ◽  
Vol 12 (03) ◽  
pp. 285-306 ◽  
Author(s):  
NORIYUKI FUJIMOTO ◽  
TOMOKI BABA ◽  
TAKASHI HASHIMOTO ◽  
KENICHI HAGIHARA

In this paper, we report a performance gap betweeen a schedule with small makespan on the task scheduling model and the corresponding parallel program on distributed memory parallel machines. The main reason of the gap is the software overhead in the interprocessor communication. Therefore, speedup ratios of schedules on the model do not approximate well to those of parallel programs on the machines. The purpose of the paper is to get a task scheduling algorithm that generates a schedule with good approximation to the corresponding parallel program and with small makespan. For this purpose, we propose algorithm BCSH that generates only bulk synchronous schedules. In those schedules, no-communication phases and communication phases appear alternately. All interprocessor communications are done only in the latter phases, and thus the corresponding parallel programs can make better use of the message packaging technique easily. It reduces many software overheads of messages form a source processor to the same destination processor to almost one software overhead, and improves the performance of a parallel program significantly. Finally, we show some experimental results of performance gaps on BCSH, Kruatrachue's algorithm DSH, and Ahmad et al's algorithm ECPFD. The schedules by DSH and ECPFD are famous for their small makespans, but message packaging can not be effectively applied to the corresponding program. The results show that a bulk synchronous schedule with small makespan has advantages that the gap is small and the corresponding program is a high performance parallel one.


2015 ◽  
Vol 19 (4) ◽  
pp. 48-58
Author(s):  
A. I. Legalov ◽  
O. V. Nepomnyaschy ◽  
I. V. Matkovsky ◽  
M. S. Kropacheva

The peculiarities of transforming functional dataflow parallel programs into programs with finite resources are analysed. It is considered how these transformations are affected by the usage of asynchronous lists, the return of delayed lists and the variation of the data arrival pace relative to the time of its processing. These transformations allow us to generate multiple programs with static parallelism based on one and the some functional dataflow parallel program.


2012 ◽  
Vol 21 (02) ◽  
pp. 1240005
Author(s):  
CHENG WANG ◽  
YOUFENG WU

Transactional memory addresses a number of important issues in lock-based parallel programs. Unfortunately, the semantics of transactions are different from those of critical sections defined by locks. The semantic differences make it difficult to correctly port existing lock-based programs to transaction-based programs. Experienced programmers accustomed to lock-based programming can easily make mistakes in transaction-based programming as parallel programs running correctly using locks can run incorrectly when critical sections are converted to using transactions. This problem becomes even more severe in porting lock-based programs to use the efficient software transactional memory. In this paper, we first identify three necessary properties in a program for the program execution using transactions to be equivalent to the program execution using locks. Assuming that the input lock-based program satisfies the necessary properties (i.e., a well-behaved parallel program), we next present a correctness condition to verify the transactional memory implementation in order for the program execution using transactions to be equivalent to the program execution using locks. Finally, we develop a correct and efficient software transactional memory implementation that satisfies the correctness condition so that locks in the well-behaved parallel programs can be converted to use the efficient software transactional memory easily and correctly.


2020 ◽  
Vol 14 ◽  
Author(s):  
Ying Huang ◽  
Xiaoyong Fang ◽  
Yihuang Zeng ◽  
Zhixin Huang ◽  
Hongtao Guo

Background: Embedded multi-core systems often have special limitations in sharing resources and storage capacity. These limitations often lead to parallel programs running with lower parallel efficiency due to bandwidth, data competition and other factors. Objective: In order to improve the performance of embedded multi-core systems, parallel strategies can be adjusted dynamically and adaptively for different parallel program structures. Method: A control mechanism of the thread count based on runtime information feedback is proposed, which enables the system to dynamically select the number of threads when the program runs best according to the structure characteristics of parallel programs. Then, an adaptive dynamic scheduling algorithm is proposed to solve the load imbalance in parallel program execution. Results: An optimization framework based on run-time architecture is presented, which consists of two parts: performance monitoring and control interface. It can take corresponding optimization strategies according to the running state of parallel programs. Conclusion: The performance of embedded multi-core system is improved.


2019 ◽  
Vol 15 (3-4) ◽  
pp. 289-306 ◽  
Author(s):  
Radha Nakade ◽  
Eric Mercer ◽  
Peter Aldous ◽  
Kyle Storey ◽  
Benjamin Ogles ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document