Correct program parallelisations

AbstractA commonly used approach to develop deterministic parallel programs is to augment a sequential program with compiler directives that indicate which program blocks may potentially be executed in parallel. This paper develops a verification technique to reason about such compiler directives, in particular to show that they do not change the behaviour of the program. Moreover, the verification technique is tool-supported and can be combined with proving functional correctness of the program. To develop our verification technique, we propose a simple intermediate representation (syntax and semantics) that captures the main forms of deterministic parallel programs. This language distinguishes three kinds of basic blocks: parallel, vectorised and sequential blocks, which can be composed using three different composition operators: sequential, parallel and fusion composition. We show how a widely used subset of OpenMP can be encoded into this intermediate representation. Our verification technique builds on the notion of iteration contract to specify the behaviour of basic blocks; we show that if iteration contracts are manually specified for single blocks, then that is sufficient to automatically reason about data race freedom of the composed program. Moreover, we also show that it is sufficient to establish functional correctness on a linearised version of the original program to conclude functional correctness of the parallel program. Finally, we exemplify our approach on an example OpenMP program, and we discuss how tool support is provided.

Download Full-text

On Connections Between Parallel Programs and Nondeterministic Sequential Programs

Fundamenta Informaticae ◽

10.3233/fi-1983-6203 ◽

1983 ◽

Vol 6 (2) ◽

pp. 171-183

Author(s):

Zenon Sadowski

Keyword(s):

Affirmative Answer ◽

Parallel Programs ◽

Parallel Program ◽

Sequential Programs ◽

Sequential Program ◽

The Given

The paper contains an affirmative answer to the following question: Does nondeterministic sequential program equivalent with respect to the set of final valuations exist for every parallel program with semantics defined in paper[3]? It also includes an example of the construction of a nondeterministic sequential program equivalent with respect to the set of final valuations to the given parallel program.

Download Full-text

A Classification and Closure Properties of Languages for Describing Concurrent System Behaviours

Fundamenta Informaticae ◽

10.3233/fi-1981-4305 ◽

1981 ◽

Vol 4 (3) ◽

pp. 531-549 ◽

Cited By ~ 1

Author(s):

Miklós Szijártó

Keyword(s):

Formal Languages ◽

Parallel Program ◽

Sufficient Condition ◽

Necessary And Sufficient Condition ◽

Concurrent System ◽

Closure Properties ◽

Sequential Program ◽

Necessary And Sufficient ◽

Special Case ◽

Independence Relations

The correspondence between sequential program schemes and formal languages is well known (Blikle and Mazurkiewicz (1972), Engelfriet (1974)). The situation is more complicated in the case of parallel program schemes, and trace languages (Mazurkiewicz (1977)) have been introduced to describe them. We introduce the concept of the closure of a language on a so called independence relation on the alphabet of the language, and formulate several theorems about them and the trace languages. We investigate the closedness properties of Chomsky classes under closure on independence relations, and as a special case we derive a new necessary and sufficient condition for the regularity of the commutative closure of a language.

Download Full-text

Interaction with the User in the SAPFOR System

Russian Digital Libraries Journal ◽

10.26907/1562-5419-2021-24-1-157-183 ◽

2021 ◽

Vol 24 (1) ◽

pp. 157-183

Author(s):

Никита Андреевич Катаев

Keyword(s):

Parallel Programming ◽

Program Transformation ◽

Heterogeneous Computing ◽

Programming Model ◽

Parallel Programs ◽

Parallel Program ◽

Program Parallelization ◽

Parallel Programming Model ◽

The One ◽

High Level

Automation of parallel programming is important at any stage of parallel program development. These stages include profiling of the original program, program transformation, which allows us to achieve higher performance after program parallelization, and, finally, construction and optimization of the parallel program. It is also important to choose a suitable parallel programming model to express parallelism available in a program. On the one hand, the parallel programming model should be capable to map the parallel program to a variety of existing hardware resources. On the other hand, it should simplify the development of the assistant tools and it should allow the user to explore the parallel program the assistant tools generate in a semi-automatic way. The SAPFOR (System FOR Automated Parallelization) system combines various approaches to automation of parallel programming. Moreover, it allows the user to guide the parallelization if necessary. SAPFOR produces parallel programs according to the high-level DVMH parallel programming model which simplify the development of efficient parallel programs for heterogeneous computing clusters. This paper focuses on the approach to semi-automatic parallel programming, which SAPFOR implements. We discuss the architecture of the system and present the interactive subsystem which is useful to guide the SAPFOR through program parallelization. We used the interactive subsystem to parallelize programs from the NAS Parallel Benchmarks in a semi-automatic way. Finally, we compare the performance of manually written parallel programs with programs the SAPFOR system builds.

Download Full-text

Compile-time support for efficient data race detection in shared-memory parallel programs

ACM SIGPLAN Notices ◽

10.1145/174267.171370 ◽

1993 ◽

Vol 28 (12) ◽

pp. 129-139 ◽

Cited By ~ 6

Author(s):

John Mellor-Crummey

Keyword(s):

Shared Memory ◽

Parallel Programs ◽

Data Race ◽

Data Race Detection ◽

Race Detection ◽

Efficient Data

Download Full-text

ON MESSAGE PACKAGING IN TASK SCHEDULING FOR DISTRIBUTED MEMORY PARALLEL MACHINES

International Journal of Foundations of Computer Science ◽

10.1142/s0129054101000497 ◽

2001 ◽

Vol 12 (03) ◽

pp. 285-306 ◽

Cited By ~ 2

Author(s):

NORIYUKI FUJIMOTO ◽

TOMOKI BABA ◽

TAKASHI HASHIMOTO ◽

KENICHI HAGIHARA

Keyword(s):

Task Scheduling ◽

High Performance ◽

Distributed Memory ◽

Parallel Machines ◽

Scheduling Algorithm ◽

Parallel Programs ◽

Parallel Program ◽

Interprocessor Communication ◽

Task Scheduling Algorithm ◽

Software Overhead

In this paper, we report a performance gap betweeen a schedule with small makespan on the task scheduling model and the corresponding parallel program on distributed memory parallel machines. The main reason of the gap is the software overhead in the interprocessor communication. Therefore, speedup ratios of schedules on the model do not approximate well to those of parallel programs on the machines. The purpose of the paper is to get a task scheduling algorithm that generates a schedule with good approximation to the corresponding parallel program and with small makespan. For this purpose, we propose algorithm BCSH that generates only bulk synchronous schedules. In those schedules, no-communication phases and communication phases appear alternately. All interprocessor communications are done only in the latter phases, and thus the corresponding parallel programs can make better use of the message packaging technique easily. It reduces many software overheads of messages form a source processor to the same destination processor to almost one software overhead, and improves the performance of a parallel program significantly. Finally, we show some experimental results of performance gaps on BCSH, Kruatrachue's algorithm DSH, and Ahmad et al's algorithm ECPFD. The schedules by DSH and ECPFD are famous for their small makespans, but message packaging can not be effectively applied to the corresponding program. The results show that a bulk synchronous schedule with small makespan has advantages that the gap is small and the corresponding program is a high performance parallel one.

Download Full-text

Efficient data race detection for distributed memory parallel programs

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11 ◽

10.1145/2063384.2063452 ◽

2011 ◽

Cited By ~ 19

Author(s):

Chang-Seo Park ◽

Koushik Sen ◽

Paul Hargrove ◽

Costin Iancu

Keyword(s):

Distributed Memory ◽

Parallel Programs ◽

Data Race ◽

Data Race Detection ◽

Race Detection ◽

Efficient Data

Download Full-text

Tail Recursion Transformation in Functional Dataflow Parallel Programs

Modeling and Analysis of Information Systems ◽

10.18255/1818-1015-2012-4-48-58 ◽

2015 ◽

Vol 19 (4) ◽

pp. 48-58

Author(s):

A. I. Legalov ◽

O. V. Nepomnyaschy ◽

I. V. Matkovsky ◽

M. S. Kropacheva

Keyword(s):

Parallel Programs ◽

Parallel Program ◽

Tail Recursion

The peculiarities of transforming functional dataflow parallel programs into programs with finite resources are analysed. It is considered how these transformations are affected by the usage of asynchronous lists, the return of delayed lists and the variation of the data arrival pace relative to the time of its processing. These transformations allow us to generate multiple programs with static parallelism based on one and the some functional dataflow parallel program.

Download Full-text

FROM LOCKS TO CORRECT AND EFFICIENT TRANSACTIONAL MEMORY

Journal of Circuits System and Computers ◽

10.1142/s0218126612400051 ◽

2012 ◽

Vol 21 (02) ◽

pp. 1240005

Author(s):

CHENG WANG ◽

YOUFENG WU

Keyword(s):

Transactional Memory ◽

Parallel Programs ◽

Parallel Program ◽

Program Execution ◽

Software Transactional Memory ◽

Correctness Condition ◽

Critical Sections ◽

Efficient Software

Transactional memory addresses a number of important issues in lock-based parallel programs. Unfortunately, the semantics of transactions are different from those of critical sections defined by locks. The semantic differences make it difficult to correctly port existing lock-based programs to transaction-based programs. Experienced programmers accustomed to lock-based programming can easily make mistakes in transaction-based programming as parallel programs running correctly using locks can run incorrectly when critical sections are converted to using transactions. This problem becomes even more severe in porting lock-based programs to use the efficient software transactional memory. In this paper, we first identify three necessary properties in a program for the program execution using transactions to be equivalent to the program execution using locks. Assuming that the input lock-based program satisfies the necessary properties (i.e., a well-behaved parallel program), we next present a correctness condition to verify the transactional memory implementation in order for the program execution using transactions to be equivalent to the program execution using locks. Finally, we develop a correct and efficient software transactional memory implementation that satisfies the correctness condition so that locks in the well-behaved parallel programs can be converted to use the efficient software transactional memory easily and correctly.

Download Full-text

Research on dynamic optimization method of embedded multi-core performance based on runtime

Recent Patents on Engineering ◽

10.2174/1872212114999200422142831 ◽

2020 ◽

Vol 14 ◽

Author(s):

Ying Huang ◽

Xiaoyong Fang ◽

Yihuang Zeng ◽

Zhixin Huang ◽

Hongtao Guo

Keyword(s):

Performance Monitoring ◽

Dynamic Scheduling ◽

Scheduling Algorithm ◽

Optimization Method ◽

Parallel Programs ◽

Parallel Program ◽

Monitoring And Control ◽

Core System ◽

Control Interface ◽

And Control

Background: Embedded multi-core systems often have special limitations in sharing resources and storage capacity. These limitations often lead to parallel programs running with lower parallel efficiency due to bandwidth, data competition and other factors. Objective: In order to improve the performance of embedded multi-core systems, parallel strategies can be adjusted dynamically and adaptively for different parallel program structures. Method: A control mechanism of the thread count based on runtime information feedback is proposed, which enables the system to dynamically select the number of threads when the program runs best according to the structure characteristics of parallel programs. Then, an adaptive dynamic scheduling algorithm is proposed to solve the load imbalance in parallel program execution. Results: An optimization framework based on run-time architecture is presented, which consists of two parts: performance monitoring and control interface. It can take corresponding optimization strategies according to the running state of parallel programs. Conclusion: The performance of embedded multi-core system is improved.

Download Full-text