Strategies for the Synthesis of Fast Algorithms for the Computation of the Matrix-vector Products

Author(s):  
Aleksandr Cariow
Author(s):  
Ernesto Dufrechou ◽  
Pablo Ezzatti ◽  
Enrique S Quintana-Ortí

More than 10 years of research on efficient GPU routines for the sparse matrix-vector product (SpMV) have produced several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent, publicly available routines using more than 3000 matrices from different applications, and apply machine learning techniques to anticipate which SpMV realization will perform best for each sparse matrix on a given parallel platform. Our numerical experiments confirm that the methods behave so differently depending on the matrix structure that identifying general rules to select the optimal method for a given matrix becomes extremely difficult, though some useful strategies (heuristics) can be defined. Using a machine learning approach, we show that it is possible to obtain inexpensive classifiers that predict the best method for a given sparse matrix with over 80% accuracy, demonstrating that this approach can deliver important reductions in both execution time and energy consumption.
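For readers unfamiliar with the operation, the following is a minimal sketch of SpMV in plain Python, assuming the compressed sparse row (CSR) layout. The abstract does not say which storage formats the evaluated GPU routines use, so CSR and the names `csr_spmv`, `values`, `col_idx`, and `row_ptr` are illustrative assumptions rather than the authors' code.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    values  : nonzero entries, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Only the stored nonzeros of row i contribute to y[i].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Illustrative 3x3 matrix (made up for this sketch):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]
y = csr_spmv(values, col_idx, row_ptr, x)
```

The inner loop length varies from row to row, which is one reason the best-performing realization can depend on the sparsity pattern of the matrix.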


2011 ◽  
Vol 11 (3) ◽  
pp. 382-393 ◽  
Author(s):  
Ivan Oseledets

Abstract: In this paper, the concept of the DMRG minimization scheme is extended to several important operations in the TT-format, like the matrix-by-vector product and the conversion from the canonical format to the TT-format. Fast algorithms are implemented and a stabilization scheme based on randomization is proposed. The comparison with the direct method is performed on a sequence of matrices and vectors coming as approximate solutions of linear systems in the TT-format. A generated example is provided to show that randomization is really needed in some cases. The matrices and vectors used are available from the author or at http://spring.inm.ras.ru/osel


2003 ◽  
Vol 19 (2) ◽  
pp. 319-326 ◽  
Author(s):  
Lai-Yun Wu ◽  
Yang-Tzung Chen

Abstract: In this paper, the spline collocation method (SCM) is successfully extended to solve generalized problems of beam structures. The spline functions in SCM are reformulated by the finite difference method in a systematic way that is easily understood by engineers. The manipulation of SCM is further simplified by the introduction of a quintic table, so that the matrix-vector governing equation can be easily formulated and solved for the weighting coefficients. SCM is first examined on problems of a generalized single-span beam under various types of loading and boundary conditions, and is then extended to problems of continuous beams with multiple spans. Comparison with available analytical results and, where available, with results from the differential quadrature method (DQM) shows excellent accuracy in deflection.


Author(s):  
Chaojian Chen ◽  
Mikhail Kruglyakov ◽  
Alexey Kuvshinov

Summary: Most of the existing three-dimensional (3-D) electromagnetic (EM) modeling solvers based on the integral equation (IE) method exploit the fast Fourier transform (FFT) to accelerate the matrix-vector multiplications. This in turn requires a laterally-uniform discretization of the modeling domain. However, there is often a need for multi-scale modeling and inversion, for instance, to properly account for the effects of non-uniform distant structures and, at the same time, to accurately model the effects of local anomalies. In such scenarios, the use of laterally-uniform grids leads to excessive computational loads, both in terms of memory and time. To alleviate this problem, we developed an efficient 3-D EM modeling tool based on a multi-nested IE approach. Within this approach, the IE modeling is first performed on a large domain with a (laterally-uniform) coarse grid, and then the results are refined in the region of interest by performing modeling on a smaller domain with a (laterally-uniform) denser grid. At the latter stage, the modeling results obtained at the previous stage are exploited. The lateral uniformity of the grids at each stage allows us to keep using the FFT for the matrix-vector multiplications. An important novelty of the paper is the development of a “rim domain” concept, which further improves the performance of the multi-nested IE approach. We verify the developed tool on both idealized and realistic 3-D conductivity models, and demonstrate its efficiency and accuracy.
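The link between a uniform grid and the FFT can be illustrated with a one-dimensional stand-in: on a uniform grid a translation-invariant kernel yields a matrix with convolution structure, and such a product can be evaluated through Fourier transforms. The sketch below assumes a circulant matrix defined by its first column `c` and a power-of-two length; it is not the authors' 3-D solver, and the function names are hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def circulant_matvec_fft(c, x):
    """y = C x, where C[i][j] = c[(i - j) % n], in O(n log n) operations."""
    n = len(c)
    C = fft(c)
    X = fft(x)
    # Pointwise product in the Fourier domain equals circular convolution.
    Y = [C[k] * X[k] for k in range(n)]
    y = fft(Y, invert=True)
    return [v.real / n for v in y]  # divide by n to normalize the inverse


def circulant_matvec_direct(c, x):
    """Reference O(n^2) product used only to check the fast version."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


c = [1.0, 2.0, 3.0, 4.0]
x = [1.0, 0.0, -1.0, 2.0]
fast = circulant_matvec_fft(c, x)
slow = circulant_matvec_direct(c, x)
```

A non-uniform grid would destroy the `c[(i - j) % n]` structure, which is why each stage of the nested approach keeps its own grid laterally uniform.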


2018 ◽  
Vol 2018 ◽  
pp. 1-11
Author(s):  
Yao Han ◽  
Hanru Shao ◽  
Jianfeng Dong

An improved generalized single-source tangential equivalence principle algorithm (GSST-EPA) is proposed for analyzing array structures with connected elements. To exploit the advantages of GSST-EPA, the connected array elements are decomposed and computed by a contact-region modeling (CRM) method, which ensures that every element has the same mesh. The unknowns of the elements can be transferred onto the equivalence surfaces by GSST-EPA. Because every element has the same mesh, the scattering matrix in GSST-EPA needs to be solved and stored only once. The shift invariance of the translation matrices is also used to reduce the computation of near-field interactions. Furthermore, the multilevel fast multipole algorithm (MLFMA) is used to accelerate the matrix-vector multiplication in the GSST-EPA. Numerical results are shown to demonstrate the accuracy and efficiency of the proposed method.


Author(s):  
Sandra Catalán ◽  
A. Cristiano I. Malossi ◽  
Costas Bekas ◽  
Enrique S. Quintana-Ortí

1989 ◽  
Vol 10 (3) ◽  
pp. 344-345
Author(s):  
Roland Lindh ◽  
Per-Åke Malmquist

2006 ◽  
Vol 16 (03) ◽  
pp. 603-627 ◽  
Author(s):  
STÉPHANE GAUBERT ◽  
RICARDO D. KATZ

We consider the following matrix reachability problem: given r square matrices with entries in a semiring, is there a product of these matrices which attains a prescribed matrix? Similarly, we define the vector (resp. scalar) reachability problem, by requiring that the matrix product, acting by right multiplication on a prescribed row vector, gives another prescribed row vector (resp. when multiplied on the left and right by prescribed row and column vectors, gives a prescribed scalar). We show that over any semiring, scalar reachability reduces to vector reachability which is equivalent to matrix reachability, and that for any of these problems, the specialization to any r ≥ 2 is equivalent to the specialization to r = 2. As an application of these reductions and of a theorem of Krob, we show that when r = 2, the vector and matrix reachability problems are undecidable over the max-plus semiring (ℤ∪{−∞}, max, +). These reductions also improve known results concerning the classical zero corner problem. Finally, we show that the matrix, vector, and scalar reachability problems are decidable over semirings whose elements are "positive", like the tropical semiring (ℤ∪{+∞}, min, +).
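To make the vector reachability problem concrete, here is a small sketch over the max-plus semiring, where "addition" is max and "multiplication" is ordinary +. The helper names and the two example matrices are hypothetical. The search is bounded by `max_len`, so it can only confirm reachability by short products; it is not a decision procedure, consistent with the undecidability result stated above.

```python
NEG_INF = float("-inf")  # the max-plus "zero" element


def maxplus_vecmat(v, A):
    """Row vector times matrix in max-plus: (v A)_j = max_i (v_i + A[i][j])."""
    return [max(v[i] + A[i][j] for i in range(len(v)))
            for j in range(len(A[0]))]


def reachable_within(start, target, matrices, max_len):
    """Check whether some product of 1..max_len matrices maps start to target.

    Explores all products breadth-first, keeping only the distinct vectors
    reached at each length. A False result says nothing about longer products.
    """
    goal = tuple(target)
    frontier = {tuple(start)}
    for _ in range(max_len):
        frontier = {tuple(maxplus_vecmat(list(v), A))
                    for v in frontier for A in matrices}
        if goal in frontier:
            return True
    return False


# Two illustrative 2x2 max-plus matrices (r = 2).
A = [[1, NEG_INF],
     [NEG_INF, 2]]   # adds 1 to the first entry and 2 to the second
B = [[NEG_INF, 0],
     [0, NEG_INF]]   # swaps the two entries

found_short = reachable_within([0, 0], [2, 1], [A, B], max_len=1)
found_long = reachable_within([0, 0], [2, 1], [A, B], max_len=2)
```

Here the target `[2, 1]` is reached by applying `A` and then `B`, so it is found with `max_len=2` but not with `max_len=1`.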


Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2800
Author(s):  
Aleksandr Cariow ◽  
Janusz P. Paplinski

A set of efficient algorithmic solutions suitable for the fully parallel hardware implementation of short-length circular convolution cores is proposed. The advantage of the presented algorithms is that they require significantly fewer multiplications than the naive method of implementing this operation. During the synthesis of the presented algorithms, the matrix notation of the cyclic convolution operation was used, which made it possible to represent this operation using the matrix–vector product. The fact that the matrix multiplicand is a circulant matrix allows its successful factorization, which leads to a decrease in the number of multiplications when calculating such a product. The proposed algorithms are oriented towards a completely parallel hardware implementation, but in comparison with a naive approach to a completely parallel hardware implementation, they require a significantly smaller number of hardwired multipliers. Since a hardwired multiplier occupies a much larger area on the VLSI chip and consumes more power than a hardwired adder, the proposed solutions are resource efficient and energy efficient in terms of their hardware implementation. We considered circular convolutions for sequences of lengths N = 2, 3, 4, 5, 6, 7, 8, and 9.
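The saving in multiplications can be seen in the smallest case, N = 2. The sketch below uses the classical sum/difference factorization of the 2x2 circulant matrix. It is offered as an illustration of the idea, not as the algorithm published by the authors, and the function names are hypothetical.

```python
def circconv2_naive(h, x):
    """Length-2 circular convolution as a circulant matrix-vector product.

    [y0]   [h0 h1] [x0]
    [y1] = [h1 h0] [x1]      -> 4 multiplications, 2 additions
    """
    y0 = h[0] * x[0] + h[1] * x[1]
    y1 = h[1] * x[0] + h[0] * x[1]
    return [y0, y1]


def circconv2_fast(h, x):
    """Same result with 2 multiplications by x-dependent terms.

    s0 and s1 depend only on h, so for a fixed kernel they can be
    precomputed and do not count as run-time multipliers.
    """
    s0 = (h[0] + h[1]) / 2
    s1 = (h[0] - h[1]) / 2
    a = s0 * (x[0] + x[1])   # multiplication 1
    b = s1 * (x[0] - x[1])   # multiplication 2
    return [a + b, a - b]


h = [3, 5]
x = [2, 7]
y_naive = circconv2_naive(h, x)
y_fast = circconv2_fast(h, x)
```

The fast version trades two multiplications for two extra additions, which matches the hardware argument above that adders are cheaper than multipliers.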

