Strategies for the Synthesis of Fast Algorithms for the Computation of the Matrix-vector Products

More than 10 years of research related to the development of efficient GPU routines for the sparse matrix-vector product (SpMV) have led to several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent publicly available routines on more than 3000 matrices from different applications, and apply machine learning techniques to anticipate which SpMV realization will perform best for each sparse matrix on a given parallel platform. Our numerical experiments confirm that the methods behave so differently depending on the matrix structure that identifying general rules to select the optimal method for a given matrix becomes extremely difficult, though some useful strategies (heuristics) can be defined. Using a machine learning approach, we show that it is possible to obtain inexpensive classifiers that predict the best method for a given sparse matrix with over 80% accuracy, demonstrating that this approach can deliver important reductions in both execution time and energy consumption.
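As a minimal sketch of the ingredients involved, assuming the common compressed sparse row (CSR) layout, the code below computes y = Ax and extracts simple row-length statistics of the kind that can serve as classifier features. The feature set, the function names and the toy selection rule are hypothetical illustrations; the study itself trains machine learning classifiers rather than relying on a fixed threshold.

```python
from statistics import mean, pstdev

def csr_spmv(indptr, indices, data, x):
    """y = A @ x for A stored in compressed sparse row (CSR) format."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Nonzeros of this row live in data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y

def row_features(indptr):
    """Hypothetical structural features (nonzeros per row) that a classifier
    could use to pick an SpMV kernel; not the paper's actual feature set."""
    lengths = [indptr[i + 1] - indptr[i] for i in range(len(indptr) - 1)]
    return {
        "n_rows": len(lengths),
        "nnz": indptr[-1],
        "mean_row": mean(lengths),
        "std_row": pstdev(lengths),
        "max_row": max(lengths),
    }

def pick_kernel(features):
    """Toy heuristic (hypothetical threshold): highly irregular row lengths
    suggest a load-balanced kernel, regular ones a row-per-thread kernel."""
    if features["std_row"] > features["mean_row"]:
        return "load-balanced"
    return "row-per-thread"
```

For example, the 3 x 3 matrix [[4, 0, 1], [0, 0, 2], [3, 5, 0]] is stored as indptr = [0, 2, 3, 5], indices = [0, 2, 2, 0, 1], data = [4, 1, 2, 3, 5]. A trained classifier would replace `pick_kernel` with a learned decision function over many such features.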

DMRG Approach to Fast Linear Algebra in the TT-Format

Computational Methods in Applied Mathematics ◽

10.2478/cmam-2011-0021 ◽

2011 ◽

Vol 11 (3) ◽

pp. 382-393 ◽

Cited By ~ 26

Author(s):

Ivan Oseledets

Keyword(s):

Linear Systems ◽

Linear Algebra ◽

Direct Method ◽

Fast Algorithms ◽

Approximate Solutions ◽

Vector Product ◽

The Matrix ◽

Minimization Scheme

Abstract In this paper, the concept of the DMRG minimization scheme is extended to several important operations in the TT-format, such as the matrix-by-vector product and the conversion from the canonical format to the TT-format. Fast algorithms are implemented, and a stabilization scheme based on randomization is proposed. The comparison with the direct method is performed on a sequence of matrices and vectors arising as approximate solutions of linear systems in the TT-format. A generated example is provided to show that randomization is indeed needed in some cases. The matrices and vectors used are available from the author or at http://spring.inm.ras.ru/osel
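The abstract compares the DMRG scheme against the direct method. As a rough sketch of that baseline only (not of the DMRG algorithm itself), the pure-Python code below forms the TT cores of y = Ax core by core. The nested-list core layouts and all function names are assumptions made for illustration. The sketch makes visible why compression is needed: each core of the result has rank equal to the product of the ranks of A and x.

```python
from itertools import product

# Assumed core layouts (nested lists), boundary ranks equal to 1:
#   TT-vector core  X_k[a][j][b]     shape r_{k-1} x n_k x r_k
#   TT-matrix core  A_k[a][i][j][b]  shape R_{k-1} x n_k x m_k x R_k

def tt_to_full(cores):
    """Contract TT-vector cores into a flat list (first mode index varies slowest)."""
    dims = [len(core[0]) for core in cores]
    out = []
    for idx in product(*(range(n) for n in dims)):
        row = [1.0]  # running 1 x r_k product of core slices
        for core, i in zip(cores, idx):
            r_next = len(core[0][0])
            row = [sum(row[a] * core[a][i][b] for a in range(len(row)))
                   for b in range(r_next)]
        out.append(row[0])
    return out

def ttm_to_full(cores):
    """Contract TT-matrix cores into a dense list-of-rows matrix."""
    rows = [len(core[0]) for core in cores]
    cols = [len(core[0][0]) for core in cores]
    mat = []
    for ii in product(*(range(n) for n in rows)):
        mat_row = []
        for jj in product(*(range(m) for m in cols)):
            v = [1.0]
            for core, i, j in zip(cores, ii, jj):
                r_next = len(core[0][0][0])
                v = [sum(v[a] * core[a][i][j][b] for a in range(len(v)))
                     for b in range(r_next)]
            mat_row.append(v[0])
        mat.append(mat_row)
    return mat

def tt_matvec_direct(a_cores, x_cores):
    """Direct TT matrix-by-vector product: Y_k(i) = sum_j A_k(i, j) kron X_k(j).
    Ranks multiply (R_k * r_k), so the result usually needs recompression."""
    y_cores = []
    for A, X in zip(a_cores, x_cores):
        ra0, n, m, ra1 = len(A), len(A[0]), len(A[0][0]), len(A[0][0][0])
        rx0, rx1 = len(X), len(X[0][0])
        core = [[[0.0] * (ra1 * rx1) for _ in range(n)] for _ in range(ra0 * rx0)]
        for a, b, i, a2, b2 in product(range(ra0), range(rx0), range(n),
                                       range(ra1), range(rx1)):
            # Kronecker index ordering: (a, b) -> a * rx0 + b.
            core[a * rx0 + b][i][a2 * rx1 + b2] = sum(
                A[a][i][j][a2] * X[b][j][b2] for j in range(m))
        y_cores.append(core)
    return y_cores
```

The dense helpers `tt_to_full` and `ttm_to_full` exist only to check the result on tiny examples; they are exponential in the number of cores.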

Application of Spline Collocation Method in Analysis of Beam and Continuous Beam

Journal of Mechanics ◽

10.1017/s1727719100004354 ◽

2003 ◽

Vol 19 (2) ◽

pp. 319-326 ◽

Cited By ~ 1

Author(s):

Lai-Yun Wu ◽

Yang-Tzung Chen

Keyword(s):

Collocation Method ◽

Differential Quadrature Method ◽

Spline Functions ◽

Continuous Beam ◽

Spline Collocation ◽

Spline Collocation Method ◽

The Matrix ◽

Weighting Coefficients ◽

Matrix Vector ◽

Excellent Accuracy

ABSTRACT In this paper, the spline collocation method (SCM) is successfully extended to solve generalized problems of beam structures. The spline functions in SCM are reformulated with the finite difference method in a systematic way that is easily understood by engineers. The use of SCM is further simplified by introducing a quintic table, so that the matrix-vector governing equation for the weighting coefficients can be formulated easily. SCM is first examined on generalized single-span beams under various types of loading and boundary conditions, and it is then extended to continuous beams with multiple spans. Comparison with available analytical results and, where available, with the differential quadrature method (DQM) shows excellent accuracy in deflection.
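The spline collocation method itself is not reproduced here. As a simpler stand-in, the sketch below uses a plain second-order finite-difference discretization (a swapped-in technique, not SCM) of EI w'''' = q for a simply supported beam under uniform load, which also reduces to matrix-vector (here tridiagonal) systems, and checks the midspan deflection against the classical analytical value 5qL^4/(384EI). Splitting the fourth-order equation into two second-order problems relies on the simply supported conditions w = w'' = 0 at both ends; all function names are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (no pivoting).
    lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def second_derivative_dirichlet(f_interior, h):
    """Solve u'' = f with u = 0 at both ends using central differences
    (u[i-1] - 2 u[i] + u[i+1]) / h^2 = f[i] at the interior nodes."""
    m = len(f_interior)
    return solve_tridiagonal([1.0] * m, [-2.0] * m, [1.0] * m,
                             [h * h * f for f in f_interior])

def simply_supported_deflection(length, ei, q, n=200):
    """Deflection at n + 1 equally spaced nodes for EI w'''' = q (uniform q)."""
    h = length / n
    v = second_derivative_dirichlet([q / ei] * (n - 1), h)  # v = w'', v = 0 at supports
    w = second_derivative_dirichlet(v, h)                     # w'' = v, w = 0 at supports
    return [0.0] + w + [0.0]
```

With L = EI = q = 1 and n = 200, the midspan value `w[100]` should agree with 5/384 to within the O(h^2) discretization error of the second solve.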

Advanced three-dimensional electromagnetic modeling using a nested integral equation approach

Geophysical Journal International ◽

10.1093/gji/ggab072 ◽

2021 ◽

Author(s):

Chaojian Chen ◽

Mikhail Kruglyakov ◽

Alexey Kuvshinov

Keyword(s):

Integral Equation ◽

Three Dimensional ◽

Region Of Interest ◽

Electromagnetic Modeling ◽

Multi Scale ◽

Scale Modeling ◽

Uniform Grids ◽

The Matrix ◽

Integral Equation Approach ◽

Matrix Vector

Summary Most of the existing three-dimensional (3-D) electromagnetic (EM) modeling solvers based on the integral equation (IE) method exploit the fast Fourier transform (FFT) to accelerate the matrix-vector multiplications. This in turn requires a laterally uniform discretization of the modeling domain. However, there is often a need for multi-scale modeling and inversion, for instance, to properly account for the effects of non-uniform distant structures while at the same time accurately modeling the effects of local anomalies. In such scenarios, the use of laterally uniform grids leads to excessive computational loads, both in memory and in time. To alleviate this problem, we developed an efficient 3-D EM modeling tool based on a multi-nested IE approach. Within this approach, IE modeling is first performed on a large domain with a (laterally uniform) coarse grid, and the results are then refined in the region of interest by modeling on a smaller domain with a (laterally uniform) denser grid. At the latter stage, the modeling results obtained at the previous stage are exploited. The lateral uniformity of the grids at each stage allows us to keep using the FFT for the matrix-vector multiplications. An important novelty of the paper is the development of a “rim domain” concept, which further improves the performance of the multi-nested IE approach. We verify the developed tool on both idealized and realistic 3-D conductivity models and demonstrate its efficiency and accuracy.
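The FFT acceleration mentioned in the summary rests on a standard fact: on a uniform grid, a translation-invariant kernel yields a matrix whose entries depend only on index differences (Toeplitz structure), and such a matrix can be embedded in a larger circulant matrix whose product with a vector is a circular convolution computable by FFT. The sketch below is a 1-D, scalar analogue with hypothetical function names, not the authors' 3-D solver; the same idea extends to the two lateral directions of a 3-D grid.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Inverse is unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def toeplitz_matvec_direct(col, row, x):
    """Reference O(n^2) product with T[i][j] = col[i - j] (i >= j), row[j - i] (j > i)."""
    n = len(x)
    return [sum((col[i - j] if i >= j else row[j - i]) * x[j] for j in range(n))
            for i in range(n)]

def toeplitz_matvec_fft(col, row, x):
    """Same product in O(n log n): embed T in a circulant matrix of power-of-two
    size >= 2n - 1 and evaluate the resulting circular convolution with FFTs."""
    n = len(x)
    size = 1
    while size < 2 * n - 1:
        size *= 2
    # First column of the circulant: col, zero padding, then row[n-1], ..., row[1].
    c = list(col) + [0.0] * (size - 2 * n + 1) + list(reversed(row[1:]))
    xp = list(x) + [0.0] * (size - n)
    prod = [u * v for u, v in zip(fft(c), fft(xp))]
    y = fft(prod, invert=True)
    return [(y[i] / size).real for i in range(n)]
```

Only the first n entries of the circular convolution are kept; the zero padding guarantees that wrap-around terms never mix into them.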

Improved Generalized Single-Source Tangential Equivalence Principle Algorithm with Contact-Region Modeling Method for Array Structures

International Journal of Antennas and Propagation ◽

10.1155/2018/9875041 ◽

2018 ◽

Vol 2018 ◽

pp. 1-11

Author(s):

Yao Han ◽

Hanru Shao ◽

Jianfeng Dong

Keyword(s):

Equivalence Principle ◽

Contact Region ◽

Near Field ◽

Single Source ◽

Matrix Vector Multiplication ◽

The Matrix ◽

Array Structures ◽

Fast Multipole Algorithm ◽

Matrix Vector ◽

Equivalence Principle Algorithm

An improved generalized single-source tangential equivalence principle algorithm (GSST-EPA) is proposed for analyzing array structures with connected elements. To exploit the advantages of GSST-EPA, the connected array elements are decomposed and computed by a contact-region modeling (CRM) method, so that every element has the same mesh. The unknowns of the elements can then be transferred onto the equivalence surfaces by GSST-EPA. Because every element has the same mesh, the scattering matrix in GSST-EPA needs to be solved and stored only once. The shift invariance of the translation matrices is also used to reduce the computation of near-field interactions. Furthermore, the multilevel fast multipole algorithm (MLFMA) is used to accelerate the matrix-vector multiplication in GSST-EPA. Numerical results demonstrate the accuracy and efficiency of the proposed method.
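The shift invariance mentioned in the abstract can be illustrated with a toy model: for identical elements on a regular grid, the translation between two elements depends only on their relative offset, so it can be computed once per distinct offset and reused. The sketch below uses a 1-D row of elements with unit spacing and a hypothetical scalar kernel (the free-space Green's function) standing in for a translation matrix; it is not GSST-EPA or MLFMA, and the wavenumber value is an arbitrary assumption.

```python
import cmath
from functools import lru_cache

WAVENUMBER = 2 * cmath.pi  # hypothetical: wavelength of 1 grid spacing

@lru_cache(maxsize=None)
def translation(offset):
    """Hypothetical scalar 'translation' between two elements whose index
    offset is `offset`: exp(-jkr) / (4*pi*r) with r = |offset|.
    A real translation matrix depends on the full offset vector, so the
    cache is keyed on the signed offset even though this kernel is symmetric."""
    r = abs(offset)
    return cmath.exp(-1j * WAVENUMBER * r) / (4 * cmath.pi * r)

def near_field(sources):
    """y[p] = sum over q != p of translation(p - q) * sources[q].
    Only 2N - 2 distinct offsets occur, instead of N * (N - 1) pairs."""
    n = len(sources)
    return [sum(translation(p - q) * sources[q] for q in range(n) if q != p)
            for p in range(n)]
```

`translation.cache_info()` reports how many kernel evaluations were actually performed, which makes the saving from shift invariance easy to inspect.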

The Impact of Voltage-Frequency Scaling for the Matrix-Vector Product on the IBM POWER8

Euro-Par 2016: Parallel Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-43659-3_8 ◽

2016 ◽

pp. 103-116 ◽

Cited By ~ 1

Author(s):

Sandra Catalán ◽

A. Cristiano I. Malossi ◽

Costas Bekas ◽

Enrique S. Quintana-Ortí

Keyword(s):

Vector Product ◽

Frequency Scaling ◽

The Matrix ◽

Voltage Frequency ◽

The Impact ◽

Matrix Vector

A submatrix algorithm for the matrix-vector multiplication of very large matrices

Journal of Computational Chemistry ◽

10.1002/jcc.540100307 ◽

1989 ◽

Vol 10 (3) ◽

pp. 344-345

Author(s):

Roland Lindh ◽

Per-Åke Malmquist

Keyword(s):

Matrix Vector Multiplication ◽

The Matrix ◽

Matrix Vector

Automatic generation of fast algorithms for matrix–vector multiplication

International Journal of Computer Mathematics ◽

10.1080/00207160.2017.1294252 ◽

2017 ◽

Vol 95 (3) ◽

pp. 626-644 ◽

Cited By ~ 2

Author(s):

B. Andreatto ◽

A. Cariow

Keyword(s):

Fast Algorithms ◽

Automatic Generation ◽

Matrix Vector Multiplication ◽

Matrix Vector

REACHABILITY PROBLEMS FOR PRODUCTS OF MATRICES IN SEMIRINGS

International Journal of Algebra and Computation ◽

10.1142/s021819670600313x ◽

2006 ◽

Vol 16 (03) ◽

pp. 603-627 ◽

Cited By ~ 13

Author(s):

STÉPHANE GAUBERT ◽

RICARDO D. KATZ

Keyword(s):

Matrix Product ◽

Reachability Problem ◽

Tropical Semiring ◽

The Matrix ◽

Left And Right ◽

Matrix Vector ◽

Products Of Matrices

We consider the following matrix reachability problem: given r square matrices with entries in a semiring, is there a product of these matrices which attains a prescribed matrix? Similarly, we define the vector (resp. scalar) reachability problem, by requiring that the matrix product, acting by right multiplication on a prescribed row vector, gives another prescribed row vector (resp. when multiplied on the left and right by prescribed row and column vectors, gives a prescribed scalar). We show that over any semiring, scalar reachability reduces to vector reachability which is equivalent to matrix reachability, and that for any of these problems, the specialization to any r ≥ 2 is equivalent to the specialization to r = 2. As an application of these reductions and of a theorem of Krob, we show that when r = 2, the vector and matrix reachability problems are undecidable over the max-plus semiring (ℤ∪{-∞}, max ,+). These reductions also improve known results concerning the classical zero corner problem. Finally, we show that the matrix, vector, and scalar reachability problems are decidable over semirings whose elements are "positive", like the tropical semiring (ℤ∪{+∞}, min ,+).
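To make the max-plus setting concrete, the sketch below implements the row-vector-times-matrix product in (max, +) and a bounded breadth-first search for the vector reachability problem. Because the unbounded problem is undecidable over the max-plus semiring when r = 2, such a search can only exhibit a product when one exists within the length bound; it can never prove unreachability. The function names and the bound `max_len` are illustrative assumptions, and products are taken to be nonempty.

```python
NEG_INF = float("-inf")  # the semiring zero of (max, +)

def maxplus_vecmat(u, m):
    """Row vector times matrix in the max-plus semiring:
    (u ⊗ M)_j = max_i (u_i + M[i][j])."""
    return [max(u[i] + m[i][j] for i in range(len(u))) for j in range(len(m[0]))]

def vector_reachable(u, v, matrices, max_len):
    """Bounded search for indices i1..ik (1 <= k <= max_len) such that
    u ⊗ M_i1 ⊗ ... ⊗ M_ik == v. Returns the index word, or None if no such
    product exists up to max_len (which proves nothing about longer ones)."""
    target = tuple(v)
    frontier = [((), tuple(u))]
    seen = {tuple(u)}  # a repeated vector has the same continuations
    for _ in range(max_len):
        nxt = []
        for word, vec in frontier:
            for k, m in enumerate(matrices):
                w = tuple(maxplus_vecmat(list(vec), m))
                if w == target:
                    return word + (k,)
                if w not in seen:
                    seen.add(w)
                    nxt.append((word + (k,), w))
        frontier = nxt
    return None
```

For instance, with M0 = [[0, -inf], [-inf, 1]] (add 1 to the second coordinate) and M1 = [[-inf, 0], [0, -inf]] (swap coordinates), the row vector (0, 0) reaches (2, 0) via M0, M0, M1.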

Algorithmic Structures for Realizing Short-Length Circular Convolutions with Reduced Complexity

Electronics ◽

10.3390/electronics10222800 ◽

2021 ◽

Vol 10 (22) ◽

pp. 2800

Author(s):

Aleksandr Cariow ◽

Janusz P. Paplinski

Keyword(s):

Energy Efficient ◽

Hardware Implementation ◽

Circulant Matrix ◽

Short Length ◽

Convolution Operation ◽

Naive Method ◽

The Matrix ◽

Reduced Complexity ◽

Matrix Vector ◽

Naive Approach

A set of efficient algorithmic solutions suitable for fully parallel hardware implementation of short-length circular convolution cores is proposed. The advantage of the presented algorithms is that they require significantly fewer multiplications than the naive method of implementing this operation. During the synthesis of the presented algorithms, the matrix notation of the cyclic convolution operation was used, which made it possible to represent this operation as a matrix–vector product. Because the matrix multiplicand is a circulant matrix, it can be factorized, which reduces the number of multiplications needed to compute the product. The proposed algorithms are oriented towards a fully parallel hardware implementation, and compared with a naive fully parallel implementation they require significantly fewer hardwired multipliers. Since a hardwired multiplier occupies much more VLSI area and consumes more power than a hardwired adder, the proposed solutions are resource efficient and energy efficient in hardware. We considered circular convolutions for sequences of lengths N = 2, 3, 4, 5, 6, 7, 8, and 9.
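The idea of saving multiplications by factorizing the circulant matrix can be seen in the smallest case, N = 2, using the well-known diagonalization of a 2 x 2 circulant by the Hadamard (2-point DFT) transform. This is a textbook illustration under that assumption, not necessarily the authors' structure for N = 2, and their algorithms for larger N are not reproduced; the function names are hypothetical.

```python
def circulant_matvec(a, x):
    """Naive circular convolution y[n] = sum_k a[k] * x[(n - k) mod N],
    i.e. a circulant matrix-vector product: N * N multiplications."""
    n = len(a)
    return [sum(a[k] * x[(i - k) % n] for k in range(n)) for i in range(n)]

def circular_convolution_2(a, x):
    """N = 2 with 2 run-time multiplications instead of 4.
    The circulant [[a0, a1], [a1, a0]] equals H diag(s, d) H with
    H = [[1, 1], [1, -1]], s = (a0 + a1) / 2 and d = (a0 - a1) / 2."""
    s = (a[0] + a[1]) / 2   # filter-dependent constant, can be precomputed
    d = (a[0] - a[1]) / 2   # filter-dependent constant, can be precomputed
    u = s * (x[0] + x[1])   # multiplication 1
    v = d * (x[0] - x[1])   # multiplication 2
    return [u + v, u - v]
```

If a0 and a1 are fixed coefficients, as is common for convolution cores, s and d can be stored as constants (halving is a shift in fixed-point arithmetic), leaving 2 hardwired multipliers against 4 for the naive circulant product.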
