DESIGNING PARALLEL ALGORITHMS FOR HIERARCHICAL SMP CLUSTERS
Clusters of symmetric multiprocessor nodes (SMP clusters) are one of the most important parallel architectures at the moment. The architecture consists of shared-memory nodes with multiple processors and a fast interconnection network between the nodes. New programming models try to exploit this architecture by using threads in the nodes and using message-passing-libraries for inter-node communication. In order to develop efficient algorithms, it is necessary to consider the hybrid nature of the architecture and of the programming models. We present the κNUMA-model and a methodology that build a good base for designing efficient algorithms for SMP clusters. The κNUMA-model is a computational model that extends the bulk-synchronous parallel (BSP) model with the characteristics of SMP clusters and new hybrid programming models. The κNUMA-methodology suggests to develop efficient overall algorithms by developing efficient algorithms for each level in the hierarchy. We use the problem of personalized one-to-all-broadcast and the dense matrix-vector-multiplication for the presentation. The theoretical results of the analysis of the dense matrix-vector-multiplication are verified practically. We show results of experiments, made on a Linux-cluster of dual Pentium-III nodes.