SPARSE COMPUTATION WITH PEI

1999 ◽  
Vol 10 (04) ◽  
pp. 425-442 ◽  
Author(s):  
FRÉDÉRIQUE VOISIN ◽  
GUY-RENÉ PERRIN

The PEI formalism was designed for reasoning about and developing parallel programs in the context of data parallelism. In this paper, we focus on the use of PEI to transform a program operating on dense matrices into a new program operating on sparse matrices, using the matrix-vector product as an example.
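
For readers unfamiliar with the transformation target, a minimal sketch (in Python with SciPy, not in PEI notation) contrasting the dense and sparse forms of the matrix-vector product:

    # A minimal sketch, assuming SciPy's CSR format; PEI's own notation is not shown.
    import numpy as np
    from scipy.sparse import csr_matrix

    A_dense = np.array([[4.0, 0.0, 0.0],
                        [0.0, 0.0, 2.0],
                        [1.0, 0.0, 3.0]])
    x = np.array([1.0, 2.0, 3.0])

    y_dense = A_dense @ x           # visits every entry, zeros included
    A_sparse = csr_matrix(A_dense)  # stores only the four nonzeros
    y_sparse = A_sparse @ x         # iterates over nonzeros only

    assert np.allclose(y_dense, y_sparse)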

2010 ◽  
Vol 09 (05) ◽  
pp. 825-846 ◽  
Author(s):  
WENWU CHEN ◽  
BILL POIRIER

The eigenvalue/eigenvector and linear-solve problems arising in computational quantum dynamics applications (e.g. rovibrational spectroscopy, reaction cross-sections) often involve large sparse matrices that exhibit a certain block structure. In such cases, specialized iterative methods that employ optimal separable basis (OSB) preconditioners (derived from a block Jacobi diagonalization procedure) have been found to be very efficient at reducing the required CPU effort on serial computing platforms. Recently [1, 2], a parallel implementation was introduced, based on a nonstandard domain decomposition scheme. Near-perfect parallel scalability was observed for the OSB preconditioner construction routines up to hundreds of nodes; however, the fundamental matrix–vector product operation itself was found not to scale well in general. In addition, the number of nodes had to be chosen selectively to ensure perfect load balancing. In this paper, two essential improvements are discussed: (1) a new algorithm for the matrix–vector product operation with greatly improved parallel scalability, and (2) a generalization to arbitrary numbers of nodes and basis sizes. These improvements render the resultant parallel quantum dynamics codes suitable for robust application to a wide range of real molecular problems running on massively parallel computing architectures.
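
As a rough illustration of the block Jacobi idea behind OSB preconditioning, here is a hedged Python/SciPy sketch (not the authors' parallel implementation; the matrix, block size, and solver are invented for the example): the diagonal blocks are inverted once and then reused as a preconditioner at every iteration.

    import numpy as np
    from scipy.sparse import identity, random as sprandom
    from scipy.sparse.linalg import LinearOperator, gmres

    n, b = 120, 12  # illustrative problem and block sizes
    rng = np.random.default_rng(0)
    A = (sprandom(n, n, density=0.05, random_state=rng) + 4.0 * identity(n)).tocsr()

    # Invert each diagonal block once, then reuse the inverses at every iteration.
    block_invs = [np.linalg.inv(A[i:i + b, i:i + b].toarray()) for i in range(0, n, b)]

    def apply_preconditioner(v):
        out = np.empty_like(v)
        for k, B_inv in enumerate(block_invs):
            out[k * b:(k + 1) * b] = B_inv @ v[k * b:(k + 1) * b]
        return out

    M = LinearOperator((n, n), matvec=apply_preconditioner)
    x, info = gmres(A, np.ones(n), M=M)
    assert info == 0  # converged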


Author(s):  
Rob H. Bisseling

This chapter introduces irregular algorithms and presents the example of parallel sparse matrix-vector multiplication (SpMV), which is the central operation in iterative linear system solvers. The irregular sparsity pattern of the matrix does not change during the multiplication, which may be repeated many times. This justifies putting a lot of effort into finding a good data distribution. The Mondriaan distribution of a sparse matrix is a useful non-Cartesian distribution that can be found by hypergraph-based partitioning. The Mondriaan package implements such a partitioning and also the newer medium-grain partitioning method. The chapter analyses the special cases of random sparse matrices and Laplacian matrices. It uses performance profiles and geometric means to compare different partitioning methods. Furthermore, it presents the hybrid-BSP model and a hybrid-BSP SpMV, which are aimed at hybrid distributed/shared-memory architectures. The parallel SpMV can be incorporated in applications, ranging from PageRank computation to artificial neural networks.
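
The economics of the approach can be seen in miniature below: a hedged Python/SciPy sketch in which one fixed sparsity pattern is reused across many multiplications (a plain power iteration stands in for the iterative solver; the Mondriaan partitioning itself is not reproduced).

    import numpy as np
    from scipy.sparse import random as sprandom

    rng = np.random.default_rng(1)
    A = sprandom(500, 500, density=0.01, random_state=rng).tocsr()

    # The one-time partitioning/setup cost would be paid here, then amortized below.
    x = np.ones(A.shape[1])
    for _ in range(50):            # many SpMVs over one unchanging pattern
        x = A @ x
        x /= np.linalg.norm(x)     # power iteration toward the dominant eigenvector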


1995 ◽  
Vol 05 (02) ◽  
pp. 263-274 ◽  
Author(s):  
MARK A. STALZER

Presented is a parallel algorithm based on the fast multipole method (FMM) for the Helmholtz equation. This variant of the FMM is useful for computing radar cross sections and antenna radiation patterns. The FMM decomposes the impedance matrix into sparse components, reducing the operation count of the matrix-vector multiplication in iterative solvers to O(N^(3/2)) (where N is the number of unknowns). The parallel algorithm divides the problem into groups and assigns the computation involved with each group to a processor node. Careful consideration is given to the communications costs. A time complexity analysis of the algorithm is presented and compared with empirical results from a Paragon XP/S running the lightweight Sandia/University of New Mexico operating system (SUNMOS). For a 90,000 unknown problem running on 60 nodes, the sparse representation fits in memory and the algorithm computes the matrix-vector product in 1.26 seconds. It sustains an aggregate rate of 1.4 Gflop/s. The corresponding dense matrix would occupy over 100 Gbytes and, assuming that I/O is free, would require on the order of 50 seconds to form the matrix-vector product.
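
The decomposition at the heart of the method can be caricatured as follows (a schematic Python sketch with invented dimensions and rank, not the Helmholtz FMM itself): the dense impedance matrix Z is replaced by a sparse near-field part plus a compressed far-field part, so the matvec cost drops accordingly.

    import numpy as np
    from scipy.sparse import csr_matrix

    rng = np.random.default_rng(2)
    n, r = 400, 8                                   # r: assumed far-field rank
    band = np.triu(np.tril(rng.normal(size=(n, n)), 5), -5)
    Z_near = csr_matrix(band)                       # sparse near-field interactions
    U = rng.normal(size=(n, r))                     # compressed far field: Z_far ~ U @ V
    V = rng.normal(size=(r, n))

    x = rng.normal(size=n)
    y = Z_near @ x + U @ (V @ x)                    # O(nnz + n*r), not O(n^2)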


2009 ◽  
Vol 42 (6) ◽  
pp. 1020-1029 ◽  
Author(s):  
Boris V. Strokopytov

A novel algorithm is described for multiplying a normal equation matrix by an arbitrary real vector using the fast Fourier transform technique during anisotropic crystallographic refinement. The matrix–vector algorithm allows one to solve normal matrix equations using the conjugate-gradients or conjugate-directions technique without explicit calculation of a normal matrix. The anisotropic version of the algorithm has been implemented in a new version of the computer program FMLSQ. The updated program has been tested on several protein structures at high resolution. In addition, rapid methods for preconditioner and normal matrix–vector product calculations are described.
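
A hedged sketch of the matrix-free idea (Python/SciPy; the FFT-based product of the paper is replaced by an explicit sparse design matrix, and the damping term is an invented stand-in for regularization): conjugate gradients only ever asks for products (A^T A)v, so the normal matrix itself is never formed.

    import numpy as np
    from scipy.sparse import random as sprandom
    from scipy.sparse.linalg import LinearOperator, cg

    rng = np.random.default_rng(3)
    A = sprandom(300, 200, density=0.05, random_state=rng).tocsr()
    b = rng.normal(size=300)

    def normal_matvec(v):
        # The paper evaluates this product via FFTs; here it is explicit.
        return A.T @ (A @ v) + 0.1 * v   # damping keeps the operator SPD

    N = LinearOperator((200, 200), matvec=normal_matvec)
    x, info = cg(N, A.T @ b)             # solves (A^T A + 0.1 I) x = A^T b
    assert info == 0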


Author(s):  
Maria Barreda ◽  
Manuel F Dolz ◽  
M Asunción Castaño

Modeling the performance and energy consumption of the sparse matrix-vector product (SpMV) is essential for off-line analysis, for example, to choose a target computer architecture that delivers the best performance/energy-consumption ratio. However, this task is especially complex given the memory-bound nature and irregular memory accesses of the SpMV, mainly dictated by the input sparse matrix. In this paper, we propose a Machine Learning (ML)-driven approach that leverages Convolutional Neural Networks (CNNs) to provide accurate estimates of the performance and energy consumption of the SpMV kernel. The proposed CNN-based models use a blockwise approach to make the CNN architecture independent of the matrix size. These models are trained to estimate execution time as well as total, package, and DRAM energy consumption at different processor frequencies. The experimental results reveal that the overall relative error ranges between 0.5% and 14%, while at the matrix level it does not exceed 10%. To demonstrate the applicability and accuracy of the SpMV CNN-based models, the study is complemented with an ad hoc time-energy model for the PageRank algorithm, a popular web information retrieval algorithm used by search engines that internally relies on the SpMV kernel.
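
The blockwise trick can be sketched as follows (a hedged Python example; the 32x32 grid and the feature definition are illustrative assumptions, not the authors' exact pipeline): any matrix is reduced to a fixed-size map of per-block nonzero densities, which gives the CNN an input shape independent of the matrix dimensions.

    import numpy as np
    from scipy.sparse import random as sprandom

    def block_density_image(A, grid=32):
        """Map a sparse matrix to a grid x grid image of per-block nonzero densities."""
        A = A.tocoo()
        img = np.zeros((grid, grid))
        rows = A.row * grid // A.shape[0]
        cols = A.col * grid // A.shape[1]
        np.add.at(img, (rows, cols), 1.0)
        return img / max(A.nnz, 1)

    A = sprandom(1000, 1500, density=0.001, random_state=0)
    features = block_density_image(A)    # fixed 32 x 32 CNN input, any matrix size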


2018 ◽  
Vol 4 ◽  
pp. e151 ◽  
Author(s):  
Bérenger Bramas ◽  
Pavel Kus

The sparse matrix-vector product (SpMV) is a fundamental operation in many scientific applications from various fields, so the High Performance Computing (HPC) community has continuously invested considerable effort in providing efficient SpMV kernels for modern CPU architectures. Although block-based kernels have been shown to achieve high performance, they are difficult to use in practice because of the zero padding they require. In this paper, we propose new kernels using the AVX-512 instruction set, which make it possible to use a blocking scheme without any zero padding in the matrix storage. We describe mask-based sparse matrix formats and their corresponding SpMV kernels, highly optimized in assembly language. Because the optimal block size depends on the matrix, we also provide a method to predict the best kernel to use, based on a simple interpolation of results from previous executions. We compare the performance of our approach against the Intel MKL CSR kernel and the CSR5 open-source package on a set of standard benchmark matrices, and show significant improvements in many cases, for both sequential and parallel executions. Finally, we provide the corresponding code in an open-source library called SPC5.
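
The storage idea can be emulated in plain Python (a hedged sketch; the real SPC5 kernels operate on AVX-512 registers in assembly, and the block layout below is a simplified stand-in): each block records a start column, a bitmask of occupied lanes, and only the nonzero values, so no zeros are ever padded in.

    import numpy as np

    def spmv_masked_blocks(blocks, x, n_rows):
        """blocks: list of (row, col_start, mask, values) over 8-wide column blocks."""
        y = np.zeros(n_rows)
        for row, col_start, mask, values in blocks:
            k = 0
            for lane in range(8):              # one bit per lane of the block
                if mask & (1 << lane):         # lane holds a nonzero value
                    y[row] += values[k] * x[col_start + lane]
                    k += 1
        return y

    # Row 0 has nonzeros at columns 0 and 3; row 1 has one at column 9.
    blocks = [(0, 0, 0b00001001, [4.0, 2.0]),
              (1, 8, 0b00000010, [5.0])]
    x = np.arange(16, dtype=float)
    print(spmv_masked_blocks(blocks, x, n_rows=2))   # [ 6. 45.]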


Author(s):  
Ernesto Dufrechou ◽  
Pablo Ezzatti ◽  
Enrique S Quintana-Ortí

More than ten years of research on efficient GPU routines for the sparse matrix-vector product (SpMV) have led to several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent publicly available routines using more than 3000 matrices from different applications, and apply machine learning techniques to anticipate which SpMV realization will perform best for each sparse matrix on a given parallel platform. Our numerical experiments confirm that these methods behave so differently depending on the matrix structure that identifying general rules to select the optimal method for a given matrix is extremely difficult, although some useful heuristics can be defined. Using a machine learning approach, we show that it is possible to obtain inexpensive classifiers that predict the best method for a given sparse matrix with over 80% accuracy, demonstrating that this approach can deliver important reductions in both execution time and energy consumption.
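
A hedged sketch of the selection step (Python with scikit-learn; the features, labels, and model choice are illustrative assumptions rather than the study's configuration): cheap structural features of each matrix feed a small classifier that names the routine expected to run fastest.

    import numpy as np
    from scipy.sparse import random as sprandom
    from sklearn.tree import DecisionTreeClassifier

    def matrix_features(A):
        """Cheap structural features of a CSR matrix: size, density, row statistics."""
        row_nnz = np.diff(A.indptr)
        return [A.shape[0], A.nnz / (A.shape[0] * A.shape[1]),
                row_nnz.mean(), row_nnz.std(), row_nnz.max()]

    # Placeholder training set: in practice, one feature row per benchmark matrix
    # and a label giving the routine measured fastest (e.g. 0=CSR, 1=CSR5, 2=blocked).
    rng = np.random.default_rng(4)
    X_train = rng.normal(size=(100, 5))
    y_train = rng.integers(0, 3, size=100)

    clf = DecisionTreeClassifier(max_depth=5).fit(X_train, y_train)
    A_new = sprandom(500, 500, density=0.01, random_state=5).tocsr()
    best = clf.predict([matrix_features(A_new)])[0]   # predicted fastest routine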

