Computing the sparse matrix vector product using block-based kernels without zero padding on processors with AVX-512 instructions

PeerJ Computer Science ◽

10.7717/peerj-cs.151 ◽

2018 ◽

Vol 4 ◽

pp. e151 ◽

Cited By ~ 3

Author(s):

Bérenger Bramas ◽

Pavel Kus

Keyword(s):

Open Source ◽

High Performance ◽

Sparse Matrix ◽

Assembly Language ◽

Memory Storage ◽

Vector Product ◽

Zero Padding ◽

The Matrix ◽

Block Based ◽

Matrix Vector

The sparse matrix-vector product (SpMV) is a fundamental operation in many scientific applications from various fields. The High Performance Computing (HPC) community has therefore continuously invested a lot of effort to provide an efficient SpMV kernel on modern CPU architectures. Although it has been shown that block-based kernels help to achieve high performance, they are difficult to use in practice because of the zero padding they require. In the current paper, we propose new kernels using the AVX-512 instruction set, which makes it possible to use a blocking scheme without any zero padding in the matrix memory storage. We describe mask-based sparse matrix formats and their corresponding SpMV kernels highly optimized in assembly language. Considering that the optimal blocking size depends on the matrix, we also provide a method to predict the best kernel to be used utilizing a simple interpolation of results from previous executions. We compare the performance of our approach to that of the Intel MKL CSR kernel and the CSR5 open-source package on a set of standard benchmark matrices. We show that we can achieve significant improvements in many cases, both for sequential and for parallel executions. Finally, we provide the corresponding code in an open source library, called SPC5.
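To make the idea of blocking without zero padding concrete, here is a minimal Python sketch. It is not the SPC5 implementation: the 1x8 block shape, the names `to_mask_blocks` and `spmv_mask_blocks`, and the array layout are assumptions chosen for illustration, and scalar Python loops stand in for the AVX-512 assembly kernels the abstract describes. Each block stores a starting column and an 8-bit mask, and only the nonzero values are kept in the packed value array.

```python
BLOCK_WIDTH = 8  # assumed block width: one 8-bit mask per block


def to_mask_blocks(dense):
    """Convert a dense matrix (list of rows) into a 1 x BLOCK_WIDTH mask-based format."""
    row_ptr = [0]     # index of the first block of each row
    block_col = []    # starting column of each block
    block_mask = []   # bit k set <=> column block_col + k holds a nonzero
    values = []       # nonzeros only, packed in order (no zero padding)
    for row in dense:
        n = len(row)
        j = 0
        while j < n:
            if row[j] == 0:
                j += 1
                continue
            # A block always starts on a nonzero and spans up to BLOCK_WIDTH columns.
            mask = 0
            for k in range(min(BLOCK_WIDTH, n - j)):
                if row[j + k] != 0:
                    mask |= 1 << k
                    values.append(row[j + k])
            block_col.append(j)
            block_mask.append(mask)
            j += BLOCK_WIDTH
        row_ptr.append(len(block_col))
    return row_ptr, block_col, block_mask, values


def spmv_mask_blocks(fmt, x):
    """Compute y = A * x from the mask-based format built above."""
    row_ptr, block_col, block_mask, values = fmt
    y = [0.0] * (len(row_ptr) - 1)
    v = 0  # cursor into the packed values array
    for i in range(len(y)):
        acc = 0.0
        for b in range(row_ptr[i], row_ptr[i + 1]):
            mask, col = block_mask[b], block_col[b]
            k = 0
            while mask:
                if mask & 1:  # the mask says where the next packed value belongs
                    acc += values[v] * x[col + k]
                    v += 1
                mask >>= 1
                k += 1
        y[i] = acc
    return y
```

In this sketch the storage cost is one column index and one mask byte per block plus exactly one entry per nonzero, whereas a padded block format would store `BLOCK_WIDTH` values per block regardless of how many are zero.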

Convolutional neural nets for estimating the run time and energy consumption of the sparse matrix-vector product

The International Journal of High Performance Computing Applications ◽

10.1177/1094342020953196 ◽

2020 ◽

pp. 109434202095319

Author(s):

Maria Barreda ◽

Manuel F Dolz ◽

M Asunción Castaño

Keyword(s):

Energy Consumption ◽

Computer Architecture ◽

Ad Hoc ◽

Sparse Matrix ◽

Neural Nets ◽

Vector Product ◽

Consumption Ratio ◽

The Matrix ◽

Memory Accesses ◽

Matrix Vector

Modeling the performance and energy consumption of the sparse matrix-vector product (SpMV) is essential to perform off-line analysis and, for example, choose a target computer architecture that delivers the best performance-energy consumption ratio. However, this task is especially complex given the memory-bound nature and irregular memory accesses of the SpMV, mainly dictated by the input sparse matrix. In this paper, we propose a Machine Learning (ML)-driven approach that leverages Convolutional Neural Networks (CNNs) to provide accurate estimations of the performance and energy consumption of the SpMV kernel. The proposed CNN-based models use a blockwise approach to make the CNN architecture independent of the matrix size. These models are trained to estimate execution time as well as total, package, and DRAM energy consumption at different processor frequencies. The experimental results reveal that the overall relative error ranges between 0.5% and 14%, while at the matrix level it does not exceed 10%. To demonstrate the applicability and accuracy of the SpMV CNN-based models, this study is complemented with an ad-hoc time-energy model for the PageRank algorithm, a popular algorithm for web information retrieval used by search engines, which internally relies on the SpMV kernel.
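The role of SpMV inside PageRank can be illustrated with a short Python sketch. This is a textbook power iteration, not the time-energy model from the paper: the function names, the damping factor of 0.85, the fixed iteration count, and the assumption that every node has at least one outgoing link are all choices made for this example. Each iteration performs one CSR sparse matrix-vector product, which is why SpMV cost dominates the run time of the algorithm.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Plain CSR sparse matrix-vector product y = A * x."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[p] * x[col_idx[p]]
        y.append(acc)
    return y


def build_transition_csr(n, edges):
    """Build the column-stochastic link matrix in CSR form.

    Entry (dst, src) is 1 / out_degree(src) for every edge src -> dst.
    Assumes every node has at least one outgoing edge (no dangling nodes).
    """
    out_deg = [0] * n
    for src, _ in edges:
        out_deg[src] += 1
    rows = [[] for _ in range(n)]
    for src, dst in edges:
        rows[dst].append((src, 1.0 / out_deg[src]))
    row_ptr, col_idx, vals = [0], [], []
    for row in rows:
        for col, val in sorted(row):
            col_idx.append(col)
            vals.append(val)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


def pagerank(n, edges, damping=0.85, iters=100):
    """Power iteration: one SpMV per step, followed by the damping update."""
    row_ptr, col_idx, vals = build_transition_csr(n, edges)
    rank = [1.0 / n] * n
    for _ in range(iters):
        spread = csr_spmv(row_ptr, col_idx, vals, rank)
        rank = [damping * s + (1.0 - damping) / n for s in spread]
    return rank
```

For example, `pagerank(3, [(0, 1), (1, 2), (2, 0)])` ranks the three nodes of a directed cycle equally, as expected from the symmetry of the graph.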

Compression and load balancing for efficient sparse matrix‐vector product on multicore processors and graphics processing units

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.6515 ◽

2021 ◽

Author(s):

José I. Aliaga ◽

Hartwig Anzt ◽

Thomas Grützmacher ◽

Enrique S. Quintana‐Ortí ◽

Andrés E. Tomás

Keyword(s):

Load Balancing ◽

Graphics Processing Units ◽

Sparse Matrix ◽

Multicore Processors ◽

Vector Product ◽

Graphics Processing ◽

Matrix Vector

Selecting optimal SpMV realizations for GPUs via machine learning

The International Journal of High Performance Computing Applications ◽

10.1177/1094342021990738 ◽

2021 ◽

pp. 109434202199073

Author(s):

Ernesto Dufrechou ◽

Pablo Ezzatti ◽

Enrique S Quintana-Ortí

Keyword(s):

Machine Learning ◽

Sparse Matrix ◽

Machine Learning Techniques ◽

Optimal Method ◽

Learning Techniques ◽

General Rules ◽

Machine Learning Approach ◽

The Matrix ◽

Time And Energy ◽

Matrix Vector

More than 10 years of research related to the development of efficient GPU routines for the sparse matrix-vector product (SpMV) have led to several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent routines that are publicly available using more than 3000 matrices from different applications, and apply machine learning techniques to anticipate which SpMV realization will perform best for each sparse matrix on a given parallel platform. Our numerical experiments confirm that the methods offer such varied behaviors depending on the matrix structure that the identification of general rules to select the optimal method for a given matrix becomes extremely difficult, though some useful strategies (heuristics) can be defined. Using a machine learning approach, we show that it is possible to obtain inexpensive classifiers that predict the best method for a given sparse matrix with over 80% accuracy, demonstrating that this approach can deliver important reductions in both execution time and energy consumption.

On sparse matrix-vector product optimization

The 3rd ACS/IEEE International Conference on Computer Systems and Applications, 2005. ◽

10.1109/aiccsa.2005.1387022 ◽

2005 ◽

Cited By ~ 2

Author(s):

N. Emad ◽

O. Hamdi-Larbi ◽

Z. Mahjoub

Keyword(s):

Sparse Matrix ◽

Vector Product ◽

Product Optimization ◽

Matrix Vector

Balanced CSR Sparse Matrix-Vector Product on Graphics Processors

Lecture Notes in Computer Science - Euro-Par 2017: Parallel Processing ◽

10.1007/978-3-319-64203-1_50 ◽

2017 ◽

pp. 697-709 ◽

Cited By ~ 3

Author(s):

Goran Flegar ◽

Enrique S. Quintana-Ortí

Keyword(s):

Sparse Matrix ◽

Vector Product ◽

Graphics Processors ◽

Matrix Vector

FPGA Coprocessor for Simulation of Neural Networks Using Compressed Matrix Storage

System and Circuit Design for Biologically-Inspired Intelligent Learning ◽

10.4018/978-1-60960-018-1.ch011 ◽

2011 ◽

pp. 255-275

Author(s):

Jörg Bornschein

Keyword(s):

Neural Network ◽

Neural Networks ◽

Sparse Matrix ◽

Receptive Fields ◽

The State ◽

Connectivity Matrix ◽

Vector Product ◽

Sparse Connectivity ◽

Matrix Vector ◽

Direct Implementation

An FPGA-based coprocessor has been implemented which simulates the dynamics of a large recurrent neural network composed of binary neurons. The design has been used for unsupervised learning of receptive fields. Since the number of neurons to be simulated (>10^4) exceeds the available FPGA logic capacity for direct implementation, a set of streaming processors has been designed. Given the state and activity vectors of the neurons at time t and a sparse connectivity matrix, these streaming processors calculate the state and activity vectors for time t + 1. The operation implemented by the streaming processors can be understood as a generalized form of a sparse matrix vector product (SpMxV). The largest dataset, the sparse connectivity matrix, is stored and processed in a compressed format to better utilize the available memory bandwidth.
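The following Python sketch shows one way a network update can be read as a generalized sparse matrix-vector product. The abstract does not give the update rule, so everything specific here is an assumption: the function name `step`, the CSR layout of the connectivity matrix, and the simple threshold rule that turns a neuron's state into a binary activity. Because the activities are binary, the product reduces to summing the weights of the active inputs.

```python
def step(row_ptr, col_idx, weights, activity, threshold=1.0):
    """One time step of a binary-neuron network stored in CSR form.

    Row i lists the incoming connections of neuron i. The threshold rule
    is a hypothetical stand-in for the (unspecified) neuron model.
    """
    new_state, new_activity = [], []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            # Binary activity: add the weight only if the source neuron fired.
            if activity[col_idx[p]]:
                s += weights[p]
        new_state.append(s)
        new_activity.append(1 if s >= threshold else 0)
    return new_state, new_activity
```

The loop streams through `col_idx` and `weights` exactly once per time step, which is the access pattern that makes a compressed matrix format attractive when memory bandwidth is the limiting resource.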

The Impact of Voltage-Frequency Scaling for the Matrix-Vector Product on the IBM POWER8

Euro-Par 2016: Parallel Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-43659-3_8 ◽

2016 ◽

pp. 103-116 ◽

Cited By ~ 1

Author(s):

Sandra Catalán ◽

A. Cristiano I. Malossi ◽

Costas Bekas ◽

Enrique S. Quintana-Ortí

Keyword(s):

Vector Product ◽

Frequency Scaling ◽

The Matrix ◽

Voltage Frequency ◽

The Impact ◽

Matrix Vector

A MEMORY EFFICIENT AND FAST SPARSE MATRIX VECTOR PRODUCT ON A GPU

Progress In Electromagnetics Research ◽

10.2528/pier11031607 ◽

2011 ◽

Vol 116 ◽

pp. 49-63 ◽

Cited By ~ 40

Author(s):

Adam Dziekonski ◽

Adam Lamecki ◽

Michal Mrozowski

Keyword(s):

Sparse Matrix ◽

Vector Product ◽

Matrix Vector ◽

Memory Efficient

Optimizing Sparse Matrix–Vector Product Computations Using Unroll and Jam

The International Journal of High Performance Computing Applications ◽

10.1177/1094342004038951 ◽

2004 ◽

Vol 18 (2) ◽

pp. 225-236 ◽

Cited By ~ 52

Author(s):

John Mellor-Crummey ◽

John Garvin

Keyword(s):

Sparse Matrix ◽

Vector Product ◽

Matrix Vector

Strategies for vectorizing the sparse matrix vector product on the CRAY XMP, CRAY 2, and CYBER 205

Journal of Computational Chemistry ◽

10.1002/jcc.540080508 ◽

1987 ◽

Vol 8 (5) ◽

pp. 636-644 ◽

Cited By ~ 3

Author(s):

Charles W. Bauschlicher ◽

Harry Partridge

Keyword(s):

Sparse Matrix ◽

Vector Product ◽

Matrix Vector
