Efficient task spawning for shared memory and message passing in many-core architectures

The goal of Nonnegative Matrix Factorization (NMF) is to represent a large nonnegative matrix approximately as the product of two significantly smaller nonnegative matrices. This paper shows in detail how an NMF algorithm based on Newton iteration can be derived from the general Karush-Kuhn-Tucker (KKT) conditions for first-order optimality. The algorithm is suited for parallel execution on both shared-memory and message-passing systems. Both versions were implemented and tested, and both delivered satisfactory speedups.
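
For reference, the KKT conditions mentioned above take the following form under one common set of assumptions. The sketch assumes the usual Frobenius-norm objective for factoring a nonnegative $A \in \mathbb{R}^{m \times n}$ as $WH$ with $W \in \mathbb{R}^{m \times k}$, $H \in \mathbb{R}^{k \times n}$ and $k \ll \min(m, n)$. These symbols are illustrative, and the paper's exact formulation may differ. A Newton-type method then searches for a point satisfying these conditions. The iteration itself is not reproduced here.

```latex
% Assumed objective: squared Frobenius-norm reconstruction error
\min_{W \ge 0,\; H \ge 0} \; f(W, H) = \tfrac{1}{2}\,\lVert A - WH \rVert_F^{2}

% Gradients of f with respect to each factor
\nabla_W f = (WH - A)\,H^{\mathsf{T}}, \qquad \nabla_H f = W^{\mathsf{T}}(WH - A)

% First-order (KKT) conditions; inequalities are elementwise, \circ is the Hadamard product
\begin{aligned}
W &\ge 0, & \nabla_W f &\ge 0, & W \circ \nabla_W f &= 0,\\
H &\ge 0, & \nabla_H f &\ge 0, & H \circ \nabla_H f &= 0.
\end{aligned}
```

The complementarity terms $W \circ \nabla_W f = 0$ and $H \circ \nabla_H f = 0$ say that each entry is either zero or sits where its partial derivative vanishes.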

An Efficient Parallel Algorithm for Extreme Eigenvalues of Sparse Nonsymmetric Matrices

The International Journal of Supercomputing Applications ◽

10.1177/109434209200600106 ◽

1992 ◽

Vol 6 (1) ◽

pp. 98-111 ◽

Cited By ~ 2

Author(s):

S. K. Kim ◽

A. T. Chronopoulos

Keyword(s):

Shared Memory ◽

Message Passing ◽

Sparse Matrices ◽

Data Locality ◽

Main Memory ◽

Global Memory ◽

Global Communication ◽

Step Method ◽

Arnoldi Algorithm ◽

Large Sparse Matrices

Main-memory accesses on shared-memory systems and global communications (synchronizations) on message-passing systems reduce computation speed. In this paper, the standard Arnoldi algorithm for approximating a small number of eigenvalues with the largest (or smallest) real parts of large, sparse, nonsymmetric matrices is restructured so that each iteration requires only one synchronization point: one global communication on a message-passing distributed-memory machine, or one global memory sweep on a shared-memory machine. We also introduce an s-step Arnoldi method for finding a few eigenvalues of large, sparse, nonsymmetric matrices. This method generates reduction matrices that are similar to those generated by the standard method, and one iteration of the s-step Arnoldi algorithm corresponds to s iterations of the standard algorithm. The s-step method has improved data locality, minimized global communication, and superior parallel properties. Both algorithms are implemented on a 64-node NCUBE/7 hypercube and a CRAY-2, and performance results are presented.
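
A minimal sketch of the *standard* Arnoldi algorithm shows the synchronization cost this work targets. It is not the paper's restructured one-synchronization variant or its s-step method. The sketch is plain Python, and a small dense matrix stands in for a sparse one. The helper names `matvec`, `dot` and `arnoldi` are hypothetical. The sketch assumes that on a distributed-memory machine each dot product would be a global reduction. Counting those reductions shows that iteration j (counting from 0) of modified Gram-Schmidt needs j + 2 of them, performed one after another.

```python
import math


def matvec(A, x):
    # Dense stand-in for the sparse matrix-vector product.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    # Assumption: on a distributed-memory machine this is a global reduction.
    return sum(a * b for a, b in zip(x, y))


def arnoldi(A, v0, m):
    """Standard Arnoldi with modified Gram-Schmidt.

    Returns the orthonormal basis V (up to m + 1 vectors), the (m + 1) x m
    upper Hessenberg matrix H, and the number of dot products (global
    reductions) performed in each iteration.
    """
    beta = math.sqrt(dot(v0, v0))
    V = [[x / beta for x in v0]]
    H = [[0.0] * m for _ in range(m + 1)]
    reductions = []
    for j in range(m):
        w = matvec(A, V[j])
        count = 0
        # Each inner product uses the w updated by the previous step, so
        # these j + 1 reductions cannot be merged into one.
        for i in range(j + 1):
            H[i][j] = dot(V[i], w)
            count += 1
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))  # one more reduction for the norm
        count += 1
        reductions.append(count)
        if H[j + 1][j] == 0.0:
            break  # invariant subspace found: the Ritz values are exact
        V.append([wk / H[j + 1][j] for wk in w])
    return V, H, reductions


# A small nonsymmetric example matrix.
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 2.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
V, H, reductions = arnoldi(A, [1.0, 1.0, 1.0, 1.0], 3)
print(reductions)  # [2, 3, 4]
```

The eigenvalues of the leading m × m block of `H` (the Ritz values) approximate the extreme eigenvalues of `A`. A small dense eigensolver, omitted here, would extract them. s-step methods generally compute several Krylov vectors before orthogonalizing them together, so many inner products can share a single reduction instead of running one after another as they do above.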

Programming shared memory multiprocessors with deterministic message-passing concurrency

2008 Design, Automation and Test in Europe ◽

10.1145/1403375.1403735 ◽

2008 ◽

Cited By ~ 12

Author(s):

Stephen A. Edwards ◽

Nalini Vasudevan ◽

Olivier Tardieu

Keyword(s):

Shared Memory ◽

Message Passing ◽

Shared Memory Multiprocessors

Teaching tools for parallel processing

Facta universitatis - series Electronics and Energetics ◽

10.2298/fuee0502219m ◽

2005 ◽

Vol 18 (2) ◽

pp. 219-224

Author(s):

Emina Milovanovic ◽

Natalija Stojanovic

Keyword(s):

Parallel Computing ◽

Parallel Processing ◽

Shared Memory ◽

Message Passing ◽

Distributed Memory ◽

Cost Effective ◽

Parallel Computers ◽

Free Software ◽

Teaching Tools ◽

Network Of Workstations

Because many universities lack the funds to purchase expensive parallel computers, cost-effective alternatives are needed to teach students about parallel processing. Free software is available to support the three major paradigms of parallel computing. Parallaxis is a sophisticated SIMD simulator that runs on a variety of platforms. The jBACI shared-memory simulator supports the MIMD model of computing with a common shared memory. PVM and MPI allow students to treat a network of workstations as a message-passing MIMD multicomputer with distributed memory. Each of these software tools can be used in a variety of courses to give students experience with parallel algorithms.