MULTITHREADED PARALLELISM WITH OPENMP

2005 ◽  
Vol 15 (04) ◽  
pp. 367-378 ◽  
Author(s):  
RAIMI RUFAI ◽  
MUSLIM BOZYIGIT ◽  
JARALLA ALGHAMDI ◽  
MOATAZ AHMED

While multithreaded programming is an effective way to exploit concurrency, multithreaded programs are notoriously hard to write, debug and tune for performance. In this paper, we present OpenMP shared memory programming as a viable alternative and a much simpler way to write multithreaded programs. Through empirical results obtained by running a simple matrix multiplication program written in OpenMP C on a single-processor machine, we show that the drop in performance compared with the single-threaded version may be negligible even on a uniprocessor. This cost is well compensated for by the increased programmer productivity resulting from the ease of programming, debugging and tuning, and from the relative ease of acquiring OpenMP skills.

2003 ◽  
Vol 13 (03) ◽  
pp. 353-364 ◽  
Author(s):  
XIE YONG ◽  
HSU WEN-JING

This paper considers the problem of scheduling dynamic parallel computations to achieve linear speedup without using significantly more space per processor than that required for a single-processor execution. Earlier research in the Cilk project proposed the "strict" computational model, in which every dependency goes from a thread x only to one of x's ancestor threads, and guaranteed both linear speedup and linear expansion of space. However, Cilk threads are stateless, and the task graphs that the Cilk language can express are series-parallel graphs, a proper subset of arbitrary task graphs. Moreover, Cilk does not support applications with pipelining. We propose the "aligned" multithreaded computational model, which extends the "strict" model of Cilk. In the aligned model, dependencies can go from an arbitrary thread x not only to x's ancestor threads, but also to x's younger brother threads, i.e., threads spawned by x's parent thread after x. We use the same measures of time and space as Cilk: T1 is the time required to execute the computation on 1 processor, T∞ is the time required by an infinite number of processors, and S1 is the space required to execute the computation on 1 processor. We show that for any aligned computation, there exists an execution schedule that achieves both efficient time and efficient space. Specifically, we show that for an execution of any aligned multithreaded computation on P processors, the time required is bounded by O(T1/P + T∞), and the space required can be loosely bounded by O(λ·S1·P), where λ is the maximum number of younger brother threads that have the same parent thread and can be blocked during execution. If we assume that λ is a constant, and that the space requirements of elder and younger brother threads are the same, then the space required is bounded by O(S1·P).
We further show that the aligned multithreaded computational model supports pipelined applications, and we propose a multithreaded programming language based on the model and show that it can express arbitrary task graphs.


1991 ◽  
Vol 15 (3) ◽  
pp. 235-256 ◽  
Author(s):  
X. Cyril ◽  
J. Angeles ◽  
A. Misra

In this paper the formulation and simulation of the dynamical equations of multibody mechanical systems comprising both rigid and flexible links are accomplished in two steps: in the first step, each link is considered as an unconstrained body and its Euler-Lagrange (EL) equations are derived disregarding the kinematic couplings; in the second step, the individual-link equations, along with the associated constraint forces, are assembled to obtain the constrained dynamical equations of the multibody system. The constraint forces are then efficiently eliminated by multiplying the said equations by the transpose of the natural orthogonal complement of the kinematic velocity constraints, which yields the independent dynamical equations. The equations of motion are solved for the generalized accelerations using the Cholesky decomposition method and integrated using Gear's method for stiff differential equations. Finally, the dynamical behaviour of the Shuttle Remote Manipulator performing a typical manoeuvre is determined using the above approach.


Author(s):  
Dimitri J. Mavriplis

The implementation and performance of a hybrid OpenMP/MPI parallel communication strategy for an unstructured mesh computational fluid dynamics code is described. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Parallelism is obtained through domain decomposition for both communication models. Single processor computational rates as well as scalability curves are given on various architectures. For the architectures studied in this work, the OpenMP or hybrid OpenMP/MPI communication strategies achieved no appreciable performance benefit over an exclusive MPI communication strategy.


Author(s):  
P. Raghu ◽  
K. Sriram

Grid computing is a special type of parallel computing that unites pools of servers, storage systems, and networks into a single large virtual supercomputer. Grid computing has the advantages of solving complex problems in a shorter time and making better use of existing hardware: it can exploit underutilized resources to meet business requirements while minimizing additional costs. Many grid setup tools are available; in this paper, the Globus Toolkit, an open-source tool for grid-enabled applications, is considered. Initially, a grid is established between two systems running Linux, using the Globus Toolkit. A simple matrix multiplication program, capable of running both on the grid and on stand-alone systems, is developed. The application is executed on a single system while varying the order of the matrices; the same application is then split into two sub-jobs and run on the two grid machines with different matrix orders. Finally, the results of the executions are compared and presented in graphs. The work can be extended to determine the type of parallelization suitable for the application developed. Similarly, the FP-tree algorithm is taken, and its data sets are fed to the different grid machines and to a stand-alone system. A suitable load-balancing mechanism for grid applications is discussed. The sections of the paper are arranged as follows: introduction to grids, grid setup using the Globus Toolkit, splitting of the matrix application, the FP-tree algorithm, performance results, future work, conclusion and references.

