general matrix Latest Research Papers

Double-precision general matrix multiplication (DGEMM) is an essential kernel for measuring the potential performance of an HPC platform. ARMv8-based system-on-chips (SoCs) have become the candidates for the next-generation HPC systems with their highly competitive performance and energy efficiency. Therefore, it is meaningful to design high-performance DGEMM for ARMv8-based SoCs. However, as ARMv8-based SoCs integrate increasing cores, modern CPU uses non-uniform memory access (NUMA). NUMA restricts the performance and scalability of DGEMM when many threads access remote NUMA domains. This poses a challenge to develop high-performance DGEMM on multi-NUMA architecture. We present a NUMA-aware method to reduce the number of cross-die and cross-chip memory access events. The critical enabler for NUMA-aware DGEMM is to leverage two levels of parallelism between and within nodes in a purely threaded implementation, which allows the task independence and data localization of NUMA nodes. We have implemented NUMA-aware DGEMM in the OpenBLAS and evaluated it on a dual-socket server with 48-core processors based on the Kunpeng920 architecture. The results show that NUMA-aware DGEMM has effectively reduced the number of cross-die and cross-chip memory access, resulting in enhancing the scalability of DGEMM significantly and increasing the performance of DGEMM by 17.1% on average, with the most remarkable improvement being 21.9%.

Download Full-text

Eigenvalue Estimates via Pseudospectra

Mathematics ◽

10.3390/math9151729 ◽

2021 ◽

Vol 9 (15) ◽

pp. 1729

Author(s):

Georgios Katsouleas ◽

Vasiliki Panagakou ◽

Panayiotis Psarrakos

Keyword(s):

Complex Plane ◽

Large Scale ◽

Matrix Polynomial ◽

Parallel Implementation ◽

Boundary Curve ◽

Recursive Procedure ◽

Eigenvalue Estimates ◽

General Matrix ◽

Initial Term ◽

Grid Algorithms

In this note, given a matrix A∈Cn×n (or a general matrix polynomial P(z), z∈C) and an arbitrary scalar λ0∈C, we show how to define a sequence μkk∈N which converges to some element of its spectrum. The scalar λ0 serves as initial term (μ0=λ0), while additional terms are constructed through a recursive procedure, exploiting the fact that each term μk of this sequence is in fact a point lying on the boundary curve of some pseudospectral set of A (or P(z)). Then, the next term in the sequence is detected in the direction which is normal to this curve at the point μk. Repeating the construction for additional initial points, it is possible to approximate peripheral eigenvalues, localize the spectrum and even obtain spectral enclosures. Hence, as a by-product of our method, a computationally cheap procedure for approximate pseudospectra computations emerges. An advantage of the proposed approach is that it does not make any assumptions on the location of the spectrum. The fact that all computations are performed on some dynamically chosen locations on the complex plane which converge to the eigenvalues, rather than on a large number of predefined points on a rigid grid, can be used to accelerate conventional grid algorithms. Parallel implementation of the method or use in conjunction with randomization techniques can lead to further computational savings when applied to large-scale matrices.

Download Full-text

Efficient Evaluation of Matrix Polynomials beyond the Paterson–Stockmeyer Method

Mathematics ◽

10.3390/math9141600 ◽

2021 ◽

Vol 9 (14) ◽

pp. 1600

Author(s):

Jorge Sastre ◽

Javier Ibáñez

Keyword(s):

Evaluation Method ◽

Matrix Polynomial ◽

Matrix Product ◽

Product Evaluations ◽

Polynomial Evaluation ◽

General Matrix ◽

Polynomial Degree ◽

Matrix Polynomials ◽

Matrix Products ◽

The Matrix

Recently, two general methods for evaluating matrix polynomials requiring one matrix product less than the Paterson–Stockmeyer method were proposed, where the cost of evaluating a matrix polynomial is given asymptotically by the total number of matrix product evaluations. An analysis of the stability of those methods was given and the methods have been applied to Taylor-based implementations for computing the exponential, the cosine and the hyperbolic tangent matrix functions. Moreover, a particular example for the evaluation of the matrix exponential Taylor approximation of degree 15 requiring four matrix products was given, whereas the maximum polynomial degree available using Paterson–Stockmeyer method with four matrix products is 9. Based on this example, a new family of methods for evaluating matrix polynomials more efficiently than the Paterson–Stockmeyer method was proposed, having the potential to achieve a much higher efficiency, i.e., requiring less matrix products for evaluating a matrix polynomial of certain degree, or increasing the available degree for the same cost. However, the difficulty of these family of methods lies in the calculation of the coefficients involved for the evaluation of general matrix polynomials and approximations. In this paper, we provide a general matrix polynomial evaluation method for evaluating matrix polynomials requiring two matrix products less than the Paterson-Stockmeyer method for degrees higher than 30. Moreover, we provide general methods for evaluating matrix polynomial approximations of degrees 15 and 21 with four and five matrix product evaluations, respectively, whereas the maximum available degrees for the same cost with the Paterson–Stockmeyer method are 9 and 12, respectively. Finally, practical examples for evaluating Taylor approximations of the matrix cosine and the matrix logarithm accurately and efficiently with these new methods are given.

Download Full-text

An Approximate GEMM Unit for Energy-Efficient Object Detection

Sensors ◽

10.3390/s21124195 ◽

2021 ◽

Vol 21 (12) ◽

pp. 4195

Author(s):

Ratko Pilipović ◽

Vladimir Risojević ◽

Janko Božič ◽

Patricio Bulić ◽

Uroš Lotrič

Keyword(s):

Neural Network ◽

Artificial Intelligence ◽

Neural Networks ◽

Convolutional Neural Network ◽

Convolutional Neural Networks ◽

Energy Efficient ◽

Graphics Processing Units ◽

Matrix Multiplication ◽

Approximate Computing ◽

General Matrix

Edge computing brings artificial intelligence algorithms and graphics processing units closer to data sources, making autonomy and energy-efficient processing vital for their design. Approximate computing has emerged as a popular strategy for energy-efficient circuit design, where the challenge is to achieve the best tradeoff between design efficiency and accuracy. The essential operation in artificial intelligence algorithms is the general matrix multiplication (GEMM) operation comprised of matrix multiplication and accumulation. This paper presents an approximate general matrix multiplication (AGEMM) unit that employs approximate multipliers to perform matrix–matrix operations on four-by-four matrices given in sixteen-bit signed fixed-point format. The synthesis of the proposed AGEMM unit to the 45 nm Nangate Open Cell Library revealed that it consumed only up to 36% of the area and 25% of the energy required by the exact general matrix multiplication unit. The AGEMM unit is ideally suited to convolutional neural networks, which can adapt to the error induced in the computation. We evaluated the AGEMM units’ usability for honeybee detection with the YOLOv4-tiny convolutional neural network. The results implied that we can deploy the AGEMM units in convolutional neural networks without noticeable performance degradation. Moreover, the AGEMM unit’s employment can lead to more area- and energy-efficient convolutional neural network processing, which in turn could prolong sensors’ and edge nodes’ autonomy.

Download Full-text

KernelFaRer

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3459010 ◽

2021 ◽

Vol 18 (3) ◽

pp. 1-22

Author(s):

João P. L. De Carvalho ◽

Braedy Kuzma ◽

Ivan Korostelev ◽

José Nelson Amaral ◽

Christopher Barton ◽

...

Keyword(s):

Deep Learning ◽

Linear Algebra ◽

Pattern Matching ◽

Source Code ◽

Matrix Multiplication ◽

Fully Integrated ◽

General Matrix ◽

The Past ◽

Compiler Framework ◽

Alternative Solutions

Well-crafted libraries deliver much higher performance than code generated by sophisticated application programmers using advanced optimizing compilers. When a code pattern for which a well-tuned library implementation exists is found in the source code of an application, the highest performing solution is to replace the pattern with a call to the library. Idiom-recognition solutions in the past either required pattern matching machinery that was outside of the compilation framework or provided a very brittle solution that would fail even for minor variants in the pattern source code. This article introduces Kernel Find & Replacer ( KernelFaRer ), an idiom recognizer implemented entirely in the existing LLVM compiler framework. The versatility of KernelFaRer is demonstrated by matching and replacing two linear algebra idioms, general matrix-matrix multiplication (GEMM), and symmetric rank-2k update (SYR2K). Both GEMM and SYR2K are used extensively in scientific computation, and GEMM is also a central building block for deep learning and computer graphics algorithms. The idiom recognition in KernelFaRer is much more robust than alternative solutions, has a much lower compilation overhead, and is fully integrated in the broadly used LLVM compilation tools. KernelFaRer replaces existing GEMM and SYR2K idioms with computations performed by BLAS, Eigen, MKL (Intel’s x86), ESSL (IBM’s PowerPC), and BLIS (AMD). Gains in performance that reach 2000× over hand-crafted source code compiled at the highest optimization level demonstrate that replacing application code with library call is a performant solution.

Download Full-text

Binary Darboux transformation for general matrix mKdV equations and reduced counterparts

Chaos Solitons & Fractals ◽

10.1016/j.chaos.2021.110824 ◽

2021 ◽

Vol 146 ◽

pp. 110824

Author(s):

Wen-Xiu Ma

Keyword(s):

Darboux Transformation ◽

General Matrix ◽

Binary Darboux Transformation

Download Full-text

Low Latency YOLO v3-Tiny Accelerator for Low-Cost FPGA using General Matrix Multiplication Principle

IEEE Access ◽

10.1109/access.2021.3120629 ◽

2021 ◽

pp. 1-1

Author(s):

Trio Adiono ◽

Adiwena Putra ◽

Nana Sutisna ◽

Infall Syafalni ◽

Rahmat Mulyawan

Keyword(s):

Low Cost ◽

Matrix Multiplication ◽

Low Latency ◽

General Matrix

Download Full-text

Linear k-power Preservers on Tensor Products of Matrices

Journal of Mathematics Research ◽

10.5539/jmr.v12n6p110 ◽

2020 ◽

Vol 12 (6) ◽

pp. 110

Author(s):

Le Yan ◽

Yang Zhang

Keyword(s):

Tensor Products ◽

Linear Operators ◽

General Matrix ◽

Linear Map ◽

Preserver Problems ◽

Matrix Spaces ◽

Products Of Matrices

Invariants and the study of the map preserving a certain invariant play vital roles in the study of the theoretical mathematics. The preserver problems are the researches on linear operators that preserve certain invariants between matrix sets. Based on the result of linear $k$-power preservers on general matrix spaces, in terms of the advantages of matrix tensor products which is not limited by the size of matrices as well as the immense actual background, the study of the structure of the linear $k$-power preservers on tensor products of matrices is essential, which is coped with in this paper. That is to characterize a linear map $f:M_{m_{1}\cdots m_{l}}\rightarrow M_{m_{1}\cdots m_{l}}$ satisfying $f(X_{1}\otimes \cdots \otimes X_{l})^{k}=f\left( (X_{1}\otimes \cdots \otimes X_{l})^{k}\right) $ for all $X_{1}\otimes \cdots \otimes X_{l}\in M_{m_{1}\cdots m_{l}}$.

Download Full-text

general matrix
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Comparing the Performance of General Matrix Multiplication Routine on Heterogeneous Computing Systems

TernGEMM: GEneral Matrix Multiply Library with Ternary Weights for Fast DNN Inference

NUMA-Aware DGEMM Based on 64-Bit ARMv8 Multicore Processors Architecture

Eigenvalue Estimates via Pseudospectra

Efficient Evaluation of Matrix Polynomials beyond the Paterson–Stockmeyer Method

An Approximate GEMM Unit for Energy-Efficient Object Detection

KernelFaRer

Binary Darboux transformation for general matrix mKdV equations and reduced counterparts

Low Latency YOLO v3-Tiny Accelerator for Low-Cost FPGA using General Matrix Multiplication Principle

Linear k-power Preservers on Tensor Products of Matrices

Export Citation Format

general matrixRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Comparing the Performance of General Matrix Multiplication Routine on Heterogeneous Computing Systems

TernGEMM: GEneral Matrix Multiply Library with Ternary Weights for Fast DNN Inference

NUMA-Aware DGEMM Based on 64-Bit ARMv8 Multicore Processors Architecture

Eigenvalue Estimates via Pseudospectra

Efficient Evaluation of Matrix Polynomials beyond the Paterson–Stockmeyer Method

An Approximate GEMM Unit for Energy-Efficient Object Detection

KernelFaRer

Binary Darboux transformation for general matrix mKdV equations and reduced counterparts

Low Latency YOLO v3-Tiny Accelerator for Low-Cost FPGA using General Matrix Multiplication Principle

Linear k-power Preservers on Tensor Products of Matrices

general matrix
Recently Published Documents