Iteratively solving sparse linear system based on PaRSEC task scheduling
With new architectures and new programming paradigms such as task-based scheduling emerging in parallel high-performance computing, it is of great importance to exploit these features when tuning monolithic computing codes. In this article, classical conjugate gradient algorithms, which solve the sparse linear system Ax = b in a Krylov subspace, are pipelined to execute as interdependent tasks on the Parallel Runtime Scheduling and Execution Controller (PaRSEC) runtime. Firstly, the sparse matrix A is split by rows to expose more coarse-grained parallelism. Secondly, to eliminate communication overhead, the partitioned sub-vectors are not assembled into one full vector in RAM before the sparse matrix–vector product (SpMV) operations are run. Moreover, in the SpMV computation, if all elements of a column in a split sub-matrix are zero, the corresponding product operations can be removed by reorganizing the sub-vectors. Finally, the latency of migrating sub-vectors is partially overlapped with the SpMV computation by further splitting the sparse matrix by columns on GPUs. In experiments, a series of tests demonstrates that optimal speedup and higher pipelining efficiency have been achieved for the pipelined task scheduling on the PaRSEC runtime. Fusing SpMV concurrency with dot product pipelining can achieve still higher speedup and efficiency.
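To make the row splitting and the zero-column elimination concrete, the following is a minimal sequential sketch in pure Python. It is not the article's PaRSEC implementation. The names `split_rows`, `needed_columns`, `block_spmv`, and `conjugate_gradient` are hypothetical, the dictionary-per-row storage is an assumption chosen for brevity, and the loop over row blocks stands in for tasks that a runtime would schedule concurrently.

```python
from typing import Dict, List, Tuple

Row = Dict[int, float]  # assumed storage: column index -> nonzero value


def split_rows(rows: List[Row], n_blocks: int) -> List[Tuple[int, List[Row]]]:
    """Split the matrix into contiguous row blocks: (first_row_index, block)."""
    n = len(rows)
    size = (n + n_blocks - 1) // n_blocks
    return [(start, rows[start:start + size]) for start in range(0, n, size)]


def needed_columns(block: List[Row]) -> List[int]:
    """Columns holding at least one nonzero in this block.

    Columns that are entirely zero within the block never appear here,
    so the matching entries of x are neither gathered nor multiplied.
    """
    return sorted({j for row in block for j in row})


def block_spmv(block: List[Row], cols: List[int], x_packed: List[float]) -> List[float]:
    """y_block = A_block @ x, reading only the packed entries of x listed in cols."""
    pos = {j: k for k, j in enumerate(cols)}
    return [sum(v * x_packed[pos[j]] for j, v in row.items()) for row in block]


def spmv(blocks: List[Tuple[int, List[Row]]], x: List[float]) -> List[float]:
    """Row-blocked SpMV; each loop iteration is an independent unit of work."""
    y: List[float] = []
    for _, block in blocks:
        cols = needed_columns(block)
        x_packed = [x[j] for j in cols]  # gather only what this block needs
        y.extend(block_spmv(block, cols, x_packed))
    return y


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(rows: List[Row], b: List[float], n_blocks: int = 2,
                       tol: float = 1e-10, max_iter: int = 100) -> List[float]:
    """Textbook CG for a symmetric positive definite A, using the blocked SpMV."""
    blocks = split_rows(rows, n_blocks)
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = spmv(blocks, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Usage: a 4x4 tridiagonal system whose exact solution is all ones.
A = [{0: 2.0, 1: -1.0},
     {0: -1.0, 1: 2.0, 2: -1.0},
     {1: -1.0, 2: 2.0, 3: -1.0},
     {2: -1.0, 3: 2.0}]
b = [1.0, 0.0, 0.0, 1.0]
solution = conjugate_gradient(A, b, n_blocks=2)
```

In this sketch the first row block touches only columns 0 to 2, so column 3 of the vector is never gathered for it. The two dot products per iteration are the synchronization points that a pipelined variant would try to overlap with the SpMV tasks.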