Trace Generation and Deterministic Execution for Concurrent Programs

This paper proposes new algorithms for generating trace files and for the deterministic execution of concurrent programs under test. The proposed algorithms are essential for automating the coverage testing of concurrent programs: they allow new synchronizations to be executed automatically, increasing source-code coverage with a focus on non-determinism and on communication and synchronization edges. Our algorithms handle programs that combine multiple paradigms of communication and synchronization (collective, blocking and non-blocking point-to-point message passing, and shared memory). We validate them through experiments on nine representative benchmarks that exercise non-trivial aspects of synchronization found in real applications. The algorithms behave robustly and meet their objectives. We also report the overhead they introduce.
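
To make the idea of trace-based deterministic execution concrete, here is a minimal Python sketch of record/replay for a single wildcard receive point (a receive that may match any sender, a main source of non-determinism in message passing). It is not the authors' algorithm: threads stand in for processes, a `queue.Queue` stands in for the communication channel, and the names `run_senders`, `record` and `replay` are hypothetical. The record run logs which sender matched each receive; the replay run buffers early arrivals and forces every receive to match the logged sender, so the same sequence of matches is reproduced regardless of timing.

```python
import queue
import random
import threading
import time


def run_senders(channel, n_senders, msgs_per_sender, jitter=0.005):
    """Start hypothetical sender threads that post (sender_id, seq) with random delays."""
    def sender(sid):
        for seq in range(msgs_per_sender):
            time.sleep(random.uniform(0, jitter))  # nondeterministic timing
            channel.put((sid, seq))

    threads = [threading.Thread(target=sender, args=(s,)) for s in range(n_senders)]
    for t in threads:
        t.start()
    return threads


def record(n_senders=3, msgs_per_sender=4):
    """Record mode: accept messages in arrival order and log the sender of each receive."""
    channel = queue.Queue()
    threads = run_senders(channel, n_senders, msgs_per_sender)
    trace, received = [], []
    for _ in range(n_senders * msgs_per_sender):
        sid, seq = channel.get()      # wildcard receive: any sender may match
        trace.append(sid)             # trace entry = which sender matched this receive
        received.append((sid, seq))
    for t in threads:
        t.join()
    return trace, received


def replay(trace, n_senders=3, msgs_per_sender=4):
    """Replay mode: force the i-th receive to match the sender recorded in trace[i]."""
    channel = queue.Queue()
    threads = run_senders(channel, n_senders, msgs_per_sender)
    pending = {s: [] for s in range(n_senders)}   # early arrivals, buffered per sender
    received = []
    for wanted in trace:
        while not pending[wanted]:
            sid, seq = channel.get()
            pending[sid].append(seq)
        received.append((wanted, pending[wanted].pop(0)))
    for t in threads:
        t.join()
    return received
```

Collective operations, non-blocking requests and shared-memory accesses, which the abstract also covers, would need additional kinds of trace events beyond this sketch.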

Rainbow: An Operating System for Software-Hardware Multitasking on Dynamically Partially Reconfigurable FPGAs

International Journal of Reconfigurable Computing ◽

10.1155/2013/789134 ◽

2013 ◽

Vol 2013 ◽

pp. 1-40 ◽

Cited By ~ 8

Author(s):

Krzysztof Jozwik ◽

Shinya Honda ◽

Masato Edahiro ◽

Hiroyuki Tomiyama ◽

Hiroaki Takada

Keyword(s):

Operating System ◽

Shared Memory ◽

Reconfigurable Computing ◽

Message Passing ◽

Complete Model ◽

Dynamic Partial Reconfiguration ◽

System Calls ◽

Scalable Hardware ◽

Point To Point ◽

Novel Model

Dynamic Partial Reconfiguration technology, coupled with an Operating System for Reconfigurable Systems (OS4RS), allows for the implementation of a hardware task concept, that is, an active computing object that can contend for reconfigurable computing resources and request OS services in the same way a software task does in a conventional OS. In this work, we present a complete model and implementation of a lightweight OS4RS supporting preemptable and clock-scalable hardware tasks. We also propose a novel, lightweight scheduling mechanism that allows for timely, priority-based reservation of reconfigurable resources and aims to use preemption only when it benefits system performance. The architecture of the scheduler and the way it schedules allocations of hardware tasks result in shorter system-call latency, thereby reducing the overall OS overhead. Finally, we present a novel model and implementation of channel-based intertask communication and synchronization suitable for software-hardware multitasking with preemptable and clock-scalable hardware tasks. It allows communication to be optimized on a per-task basis and uses point-to-point message passing rather than shared-memory communication whenever possible. Extensive overhead tests of the OS4RS services, as well as application speedup tests, demonstrate the efficiency of our approach.
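
As a rough illustration of priority-based allocation with selective preemption, the Python sketch below manages a fixed number of reconfigurable regions. It is a hypothetical policy, not the Rainbow scheduler: the class `RegionScheduler`, its methods, and the rule "preempt only if no region is free and the requester strictly outranks the lowest-priority running task" are assumptions made for illustration. The paper's mechanism also reserves resources in time, which this sketch does not model.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class RegionScheduler:
    """Hypothetical priority-based allocator for n reconfigurable regions.

    A lower number means a higher priority. A running hardware task is preempted
    only when no region is free and the requester strictly outranks it.
    """
    n_regions: int
    running: dict = field(default_factory=dict)   # task name -> priority
    waiting: list = field(default_factory=list)   # heap of (priority, arrival order, name)
    _order: int = 0

    def request(self, name, prio):
        if len(self.running) < self.n_regions:
            self.running[name] = prio
            return f"allocate {name}"
        victim = max(self.running, key=self.running.get)  # lowest-priority running task
        if prio < self.running[victim]:
            # Preemption pays off only when a more urgent task would otherwise wait.
            self._enqueue(victim, self.running.pop(victim))
            self.running[name] = prio
            return f"preempt {victim} for {name}"
        self._enqueue(name, prio)                          # reserve: wait for a region
        return f"reserve {name}"

    def release(self, name):
        """Free the region held by `name` and hand it to the best waiting task."""
        del self.running[name]
        if self.waiting:
            prio, _, nxt = heapq.heappop(self.waiting)
            self.running[nxt] = prio
            return f"allocate {nxt}"
        return None

    def _enqueue(self, name, prio):
        heapq.heappush(self.waiting, (prio, self._order, name))
        self._order += 1
```

For example, with two regions, a request at priority 4 preempts a running priority-5 task but a request at priority 6 only reserves a slot and waits.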

Analyzing Concurrent Programs for Potential Programming Errors

Modern Software Engineering Concepts and Practices - Advances in Computer and Electrical Engineering ◽

10.4018/978-1-60960-215-4.ch016 ◽

2011 ◽

pp. 380-415

Author(s):

Qichang Chen ◽

Liqiang Wang ◽

Ping Guo ◽

He Huang

Keyword(s):

Shared Memory ◽

Message Passing ◽

Programming Models ◽

Concurrent Programs ◽

Sequential Programs ◽

Multithreaded Programming ◽

Testing And Debugging ◽

Race Condition ◽

Art Research ◽

Concurrency Errors

Today, multi-core/multi-processor hardware has become ubiquitous, leading to a fundamental turning point in software development. However, developing concurrent programs is difficult: concurrency introduces the possibility of errors that do not exist in sequential programs. This chapter introduces the major concurrent programming models, including multithreaded programming on shared memory and message-passing programming on distributed memory. It then reviews state-of-the-art research on detecting concurrency errors such as deadlocks, race conditions, and atomicity violations. Finally, the chapter surveys widely used tools for testing and debugging concurrent programs.
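
One classic way to detect potential deadlocks is to build a lock-order graph from execution traces and search it for cycles. The Python sketch below assumes a hypothetical trace format (per-thread lists of `("acq", lock)` and `("rel", lock)` events); an edge `a -> b` means some thread acquired `b` while holding `a`. A cycle signals a potential deadlock even if the observed run did not hang, and the check can report false positives (for example, when a common outer lock serializes the conflicting sections).

```python
from collections import defaultdict


def lock_order_graph(traces):
    """Build edges held -> acquired from per-thread traces of ("acq" | "rel", lock)."""
    graph = defaultdict(set)
    for events in traces.values():
        held = []
        for op, lock in events:
            if op == "acq":
                for h in held:
                    graph[h].add(lock)   # lock acquired while h is held
                held.append(lock)
            else:
                held.remove(lock)
    return graph


def has_cycle(graph):
    """Depth-first search with three colors; a back edge to a GRAY node is a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE

    def visit(u):
        color[u] = GRAY
        for v in graph.get(u, ()):
            if color[v] == GRAY:
                return True
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in list(graph))
```

Two threads that take locks `a` and `b` in opposite orders produce the cycle `a -> b -> a`, while threads that agree on one global order produce an acyclic graph.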

Data Flow Testing in Concurrent Programs with Message Passing and Shared Memory Paradigms

Procedia Computer Science ◽

10.1016/j.procs.2013.05.178 ◽

2013 ◽

Vol 18 ◽

pp. 149-158 ◽

Cited By ~ 6

Author(s):

Paulo S.L. Souza ◽

Simone S. Souza ◽

Murilo G. Rocha ◽

Rafael R. Prado ◽

Raphael N. Batista

Keyword(s):

Shared Memory ◽

Message Passing ◽

Data Flow ◽

Concurrent Programs ◽

Data Flow Testing

Causal-Consistent Replay Reversible Semantics for Message Passing Concurrent Programs

Fundamenta Informaticae ◽

10.3233/fi-2021-2005 ◽

2021 ◽

Vol 178 (3) ◽

pp. 229-266

Author(s):

Ivan Lanese ◽

Adrián Palacios ◽

Germán Vidal

Keyword(s):

Programming Language ◽

Message Passing ◽

Concurrent Programming ◽

Concurrent Systems ◽

Concurrent Programs ◽

Unified Framework ◽

Innovative Technique ◽

Dual Notion

Causal-consistent reversible debugging is an innovative technique for debugging concurrent systems. It allows one to go back in the execution, focusing on the actions that most likely caused a visible misbehavior. When such an action is selected, the debugger undoes it, including all and only its consequences. This operation is called a causal-consistent rollback. In this way, the user can avoid being distracted by the actions of other, unrelated processes. In this work, we introduce its dual notion: causal-consistent replay. We allow the user to record an execution of a running program and, in contrast to traditional replay debuggers, to reproduce a visible misbehavior inside the debugger, including all and only its causes. Furthermore, we present a unified framework that combines both causal-consistent replay and causal-consistent rollback. Although most of the ideas we present are rather general, we focus on a popular functional and concurrent programming language based on message passing: Erlang.
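
Replaying "all and only the causes" of an event amounts to computing its causal past under the happens-before relation. The Python sketch below does this for a simplified, hypothetical log format in which each process has an ordered list of `send`, `recv` and `local` events and every message has a unique id. It is not the authors' Erlang implementation, and it ignores process spawning, which also creates causal dependencies in Erlang. An event's causes are the earlier events of its own process plus, for a receive, the matching send and everything that caused it.

```python
def causal_past(log, target):
    """Return every event that happens-before `target`, plus target itself.

    log: dict pid -> list of events; an event is (kind, msg_id) with kind in
    {"send", "recv", "local"} and msg_id None for local events (hypothetical format).
    target: (pid, index) into log.
    """
    # Map each message id to the (pid, index) of the event that sent it.
    send_site = {
        ev[1]: (pid, i)
        for pid, events in log.items()
        for i, ev in enumerate(events)
        if ev[0] == "send"
    }
    past, stack = set(), [target]
    while stack:
        pid, i = stack.pop()
        if (pid, i) in past:
            continue
        past.add((pid, i))
        if i > 0:
            stack.append((pid, i - 1))      # program order within the process
        kind, msg = log[pid][i]
        if kind == "recv":
            stack.append(send_site[msg])    # a receive depends on its send
    return past
```

Replaying only the events in this set, in their recorded order, reproduces the target event while skipping unrelated activity such as later local steps of the sender or processes that never communicated with the target.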

Parallel Nonnegative Matrix Factorization via Newton Iteration

Parallel Processing Letters ◽

10.1142/s0129626416500146 ◽

2016 ◽

Vol 26 (03) ◽

pp. 1650014 ◽

Cited By ~ 3

Author(s):

Markus Flatz ◽

Marián Vajteršic

Keyword(s):

Shared Memory ◽

Matrix Factorization ◽

Message Passing ◽

Nonnegative Matrix Factorization ◽

Nonnegative Matrix ◽

Newton Iteration ◽

Parallel Execution ◽

Kkt Conditions ◽

Nonnegative Matrices ◽

First Order

The goal of Nonnegative Matrix Factorization (NMF) is to approximate a large nonnegative matrix by the product of two significantly smaller nonnegative matrices. This paper shows in detail how an NMF algorithm based on Newton iteration can be derived from the general Karush-Kuhn-Tucker (KKT) conditions for first-order optimality. The algorithm is suited for parallel execution on both shared-memory and message-passing systems. Both versions were implemented and tested, delivering satisfactory speedup results.
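
For reference, the first-order KKT conditions mentioned above can be written out under the common assumption of a squared Frobenius-norm objective (the abstract does not state which objective the paper uses). Here A is a nonnegative m × n matrix, W is m × k, H is k × n, ⊙ denotes the elementwise product, and all inequalities hold elementwise:

```latex
\begin{aligned}
&\min_{W \ge 0,\; H \ge 0} \; f(W,H) = \tfrac{1}{2}\,\lVert A - WH \rVert_F^2 \\
&\nabla_W f = (WH - A)\,H^{\top}, \qquad \nabla_H f = W^{\top}(WH - A) \\
&W \ge 0, \quad \nabla_W f \ge 0, \quad W \odot \nabla_W f = 0 \\
&H \ge 0, \quad \nabla_H f \ge 0, \quad H \odot \nabla_H f = 0
\end{aligned}
```

Because the nonnegativity constraints are linear, every local minimizer satisfies these complementarity conditions; the Newton-iteration algorithm is derived from conditions of this kind, and the derivation itself is given in the paper.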

An Efficient Parallel Algorithm for Extreme Eigenvalues of Sparse Nonsymmetric Matrices

The International Journal of Supercomputing Applications ◽

10.1177/109434209200600106 ◽

1992 ◽

Vol 6 (1) ◽

pp. 98-111 ◽

Cited By ~ 2

Author(s):

S. K. Kim ◽

A. T. Chronopoulos

Keyword(s):

Shared Memory ◽

Message Passing ◽

Sparse Matrices ◽

Data Locality ◽

Main Memory ◽

Global Memory ◽

Global Communication ◽

Step Method ◽

Arnoldi Algorithm ◽

Large Sparse Matrices

Main-memory accesses in shared-memory systems and global communications (synchronizations) in message-passing systems slow down computation. In this paper, the standard Arnoldi algorithm for approximating a small number of eigenvalues with the largest (or smallest) real parts of large sparse nonsymmetric matrices is restructured so that only one synchronization point is required per iteration; that is, each iteration requires one global communication on a message-passing distributed-memory machine or one global memory sweep on a shared-memory machine. We also introduce an s-step Arnoldi method for finding a few eigenvalues of large sparse nonsymmetric matrices. This method generates reduction matrices similar to those generated by the standard method. One iteration of the s-step Arnoldi algorithm corresponds to s iterations of the standard Arnoldi algorithm. The s-step method has improved data locality, minimized global communication, and superior parallel properties. Both algorithms are implemented on a 64-node NCUBE/7 hypercube and a CRAY-2, and performance results are presented.
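
To show where the synchronization points come from, below is a plain-Python sketch of the standard Arnoldi process with modified Gram-Schmidt, not the restructured or s-step variants described above. In iteration j, each inner product and the final norm would be a global reduction on a distributed-memory machine, so this version needs j + 2 synchronizations per iteration. The helper names `dot`, `matvec` and `arnoldi` are illustrative, and dense lists stand in for a sparse matrix.

```python
import math


def dot(x, y):
    """Inner product; a global reduction when vectors are distributed."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Dense matrix-vector product (a sparse kernel in practice)."""
    return [dot(row, x) for row in A]


def arnoldi(A, v0, k):
    """Standard Arnoldi iteration with modified Gram-Schmidt.

    Returns Q (up to k+1 orthonormal vectors) and H ((k+1) x k upper Hessenberg)
    such that A Q[j] = sum_i H[i][j] Q[i] for each completed step j.
    """
    beta = math.sqrt(dot(v0, v0))
    Q = [[x / beta for x in v0]]
    H = [[0.0] * k for _ in range(k + 1)]
    for j in range(k):
        w = matvec(A, Q[j])
        for i in range(j + 1):
            H[i][j] = dot(Q[i], w)                        # one reduction per i
            w = [wi - H[i][j] * qi for wi, qi in zip(w, Q[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))                # one more reduction
        if H[j + 1][j] == 0.0:                            # invariant subspace found
            break
        Q.append([wi / H[j + 1][j] for wi in w])
    return Q, H
```

The eigenvalue approximations (Ritz values) are the eigenvalues of the leading k × k block of H; computing them needs a small dense eigensolver, which is omitted here.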

Programming shared memory multiprocessors with deterministic message-passing concurrency

2008 Design, Automation and Test in Europe ◽

10.1145/1403375.1403735 ◽

2008 ◽

Cited By ~ 12

Author(s):

Stephen A. Edwards ◽

Nalini Vasudevan ◽

Olivier Tardieu

Keyword(s):

Shared Memory ◽

Message Passing ◽

Shared Memory Multiprocessors

Teaching tools for parallel processing

Facta universitatis - series Electronics and Energetics ◽

10.2298/fuee0502219m ◽

2005 ◽

Vol 18 (2) ◽

pp. 219-224

Author(s):

Emina Milovanovic ◽

Natalija Stojanovic

Keyword(s):

Parallel Computing ◽

Parallel Processing ◽

Shared Memory ◽

Message Passing ◽

Distributed Memory ◽

Cost Effective ◽

Parallel Computers ◽

Free Software ◽

Teaching Tools ◽

Network Of Workstations

Because many universities lack the funds to purchase expensive parallel computers, cost-effective alternatives are needed to teach students about parallel processing. Free software is available to support the three major paradigms of parallel computing. Parallaxis is a sophisticated SIMD simulator that runs on a variety of platforms. The jBACI shared-memory simulator supports the MIMD model of computing with a common shared memory. PVM and MPI allow students to treat a network of workstations as a message-passing MIMD multicomputer with distributed memory. Each of these software tools can be used in a variety of courses to give students experience with parallel algorithms.

Improving data transfer for model coupling

Geoscientific Model Development Discussions ◽

10.5194/gmdd-8-8981-2015 ◽

2015 ◽

Vol 8 (10) ◽

pp. 8981-9020 ◽

Cited By ~ 2

Author(s):

C. Zhang ◽

L. Liu ◽

G. Yang ◽

R. Li ◽

B. Wang

Keyword(s):

Performance Improvement ◽

Message Passing ◽

Message Passing Interface ◽

Data Transfer ◽

The Public ◽

Message Size ◽

Abstract Data ◽

Point To Point ◽

Component Models ◽

Size Variable

Data transfer, that is, transferring data fields between two component models or rearranging data fields among processes of the same component model, is a fundamental operation of a coupler. Most state-of-the-art couplers currently use an implementation based on the point-to-point (P2P) communication of the Message Passing Interface (MPI), called the "P2P implementation" for short. In this paper, we reveal the drawbacks of the P2P implementation, including low communication bandwidth due to small message sizes, a large and variable number of MPI messages, and jams during communication. To overcome these drawbacks, we propose a butterfly implementation for data transfer. Although the butterfly implementation outperforms the P2P implementation in many cases, it degrades performance in some cases, because the total message size it transfers is larger than that transferred by the P2P implementation. To improve data transfer in all cases, we design and implement an adaptive data transfer library that combines the advantages of both the butterfly and the P2P implementations. Performance evaluation shows that the adaptive data transfer library significantly improves data transfer performance in most cases and does not decrease performance in any case. The adaptive data transfer library is now publicly available and has been integrated into the coupler C-Coupler1 to improve its data transfer performance. We believe that it can also improve other couplers.
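
The trade-off described above (fewer, larger messages versus more total data moved) can be seen in a small simulation of a generic butterfly (hypercube) exchange. This Python sketch is not the C-Coupler1 library: data are abstract `(src, dst)` units, the process count is assumed to be a power of two, and `pick_implementation` with its linear cost `alpha * messages + beta * volume` is a hypothetical stand-in for the adaptive selection. At stage k, each process p sends to partner p XOR 2^k every unit whose destination differs from p in bit k, so after log2(P) stages every unit has reached its destination.

```python
def p2p_volume(items):
    """Units moved by direct point-to-point sends: each unit with src != dst, once."""
    return sum(1 for src, dst in items if src != dst)


def butterfly_transfer(items, n_procs):
    """Route (src, dst) units through log2(n_procs) butterfly stages.

    Returns final holdings per process, message count, and total unit volume moved.
    """
    assert n_procs & (n_procs - 1) == 0, "sketch assumes a power-of-two process count"
    holding = {p: [] for p in range(n_procs)}
    for src, dst in items:
        holding[src].append((src, dst))
    messages = volume = 0
    bit = 1
    while bit < n_procs:
        # Units whose destination differs from the holder in this bit must cross.
        outgoing = {p: [it for it in holding[p] if (it[1] ^ p) & bit] for p in holding}
        for p in holding:
            holding[p] = [it for it in holding[p] if not ((it[1] ^ p) & bit)]
        for p, out in outgoing.items():
            holding[p ^ bit].extend(out)
            messages += 1          # one message per process per stage
            volume += len(out)     # a unit may be forwarded at several stages
        bit <<= 1
    return holding, messages, volume


def pick_implementation(items, n_procs, alpha, beta):
    """Hypothetical selector comparing alpha * messages + beta * volume for both schemes."""
    _, b_msgs, b_vol = butterfly_transfer(items, n_procs)
    p2p_msgs = len({(s, d) for s, d in items if s != d})
    p2p_cost = alpha * p2p_msgs + beta * p2p_volume(items)
    bfly_cost = alpha * b_msgs + beta * b_vol
    return "butterfly" if bfly_cost < p2p_cost else "p2p"
```

For an all-to-all pattern on 8 processes with one unit per pair, the butterfly sends 24 messages but moves 96 units, while direct P2P sends 56 single-unit messages; under this toy model a high per-message cost favors the butterfly and a pure bandwidth cost favors P2P.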

Characterization of MPI Communication Primitives on a Heterogeneous Cluster

Scientific Research Journal ◽

10.24191/srj.v6i2.5631 ◽

2009 ◽

Vol 6 (2) ◽

pp. 23

Author(s):

Siti Arpah Ahmad ◽

Mohamed Faidz Mohamed Said ◽

Norazan Mohamed Ramli ◽

Mohd Nasir Taib

Keyword(s):

Message Passing ◽

High Performance ◽

Data Transfer ◽

The Other ◽

Small Cluster ◽

Heterogeneous Cluster ◽

Processing Power ◽

Basic Performance ◽

Point To Point

This paper focuses on the performance of basic communication primitives, namely the overlap of message transfer with computation in point-to-point communication, within a small cluster of four nodes. The mpptest program was employed to measure the basic performance of MPI message-passing routines over a variety of message sizes. mpptest can measure performance with many participating processes, thus exposing contention and scalability problems. This enables programmers to choose message sizes that isolate and evaluate sudden changes in performance. These issues are worth investigating because non-blocking calls allow the system to schedule communications even when many processes are running simultaneously. Understanding the characteristics of computation-communication overlap is also important, because high-performance kernels often strive to achieve such overlap, which benefits both data transfer and latency hiding. The results indicate that certain overlap sizes make greater use of node processing power, in either blocking or non-blocking send and receive operations. The results provide a detailed MPI characterization of performance with respect to the overlap of message transfer with computation in a small cluster system.
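
The overlap pattern being measured is: post a non-blocking operation (for example `MPI_Isend` or `MPI_Irecv`), do independent computation, then complete the request with `MPI_Wait`. The Python sketch below mimics that pattern with a worker thread rather than MPI, so it only illustrates the timing structure; `fake_transfer`, `fake_compute` and the sleep-based durations are hypothetical stand-ins. With real MPI, whether a transfer actually progresses in the background depends on the implementation and network hardware, which is what benchmarks such as mpptest help reveal.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transfer(data, latency):
    """Hypothetical stand-in for a message transfer: wait, then deliver a copy."""
    time.sleep(latency)
    return list(data)


def fake_compute(n, duration):
    """Hypothetical stand-in for computation that does not touch the message buffer."""
    time.sleep(duration)
    return sum(range(n))


def blocking(data, latency, duration):
    start = time.perf_counter()
    received = fake_transfer(data, latency)    # like a blocking send/receive pair
    result = fake_compute(1000, duration)      # starts only after the transfer ends
    return received, result, time.perf_counter() - start


def overlapped(data, latency, duration):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fake_transfer, data, latency)  # like MPI_Isend/MPI_Irecv
        result = fake_compute(1000, duration)                # work overlapped with transfer
        received = pending.result()                          # like MPI_Wait
    return received, result, time.perf_counter() - start
```

The blocking version takes roughly latency + duration, whereas the overlapped one takes roughly the larger of the two, which is the benefit high-performance kernels aim for.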
