Near-Optimal Algorithms for Linear Algebra in the Current Matrix Multiplication Time

2019 ◽

pp. 144-157

Author(s):

A. Myasishchev ◽

S. Lienkov ◽

V. Dzhulii ◽

I. Muliar

Keyword(s):

Linear Algebra ◽

Linear Equations ◽

Matrix Multiplication ◽

Performance Comparison ◽

Double Precision ◽

Graphics Processors ◽

Performance Study ◽

Software Module ◽

Maximum Acceleration ◽

Computational Procedures

Research goals and objectives: the purpose of the article is to study the feasibility of graphics processors using in solving linear equations systems and calculating matrix multiplication as compared with conventional multi-core processors. The peculiarities of the MAGMA and CUBLAS libraries use for various graphics processors are considered. A performance comparison is made between the Tesla C2075 and GeForce GTX 480 GPUs and a six-core AMD processor. Subject of research: the software is developed basing on the MAGMA and CUBLAS libraries for the purpose of the NVIDIA Tesla C2075 and GeForce GTX 480 GPUs performance study for linear equation systems solving and matrix multiplication calculating. Research methods used: libraries were used to parallelize the linear algebra problems solution. For GPUs, these are MAGMA and CUBLAS, for multi-core processors, the ScaLAPACK and ATLAS libraries. To study the operational speed there are used methods and algorithms of computational procedures parallelization similar to these libraries. A software module has been developed for linear equations systems solving and matrix multiplication calculating by parallel systems. Results of the research: it has been determined that for double-precision numbers the GPU GeForce GTX 480 and the GPU Tesla C2075 performance is approximately 3.5 and 6.3 times higher than that of the AMD CPU. And the GPU GeForce GTX 480 performance is 1.3 times higher than the GPU Tesla C2075 performance for single precision numbers. To achieve maximum performance of the NVIDIA CUDA GPU, you need to use the MAGMA or CUBLAS libraries, which accelerate the calculations by about 6.4 times as compared to the traditional programming method. It has been determined that in equations systems solving on a 6-core CPU, it is possible to achieve a maximum acceleration of 3.24 times as compared to calculations on the 1st core using the ScaLAPACK and ATLAS libraries instead of 6-fold theoretical acceleration. Therefore, it is impossible to efficiently use processors with a large number of cores with considered libraries. It is demonstrated that the advantage of the GPU over the CPU increases with the number of equations.

Download Full-text

Multiplication of medium-density matrices using TensorFlow on multicore CPUs

Tehnički glasnik ◽

10.31803/tg-20191104183930 ◽

2019 ◽

Vol 13 (4) ◽

pp. 286-290

Author(s):

Siraphob Theeracheep ◽

Jaruloj Chongstitvatana

Keyword(s):

Machine Learning ◽

Image Processing ◽

Linear Algebra ◽

Matrix Multiplication ◽

Medium Density ◽

Density Matrices ◽

Input Matrix ◽

Programming Paradigm ◽

Dataflow Programming ◽

Multicore Cpus

Matrix multiplication is an essential part of many applications, such as linear algebra, image processing and machine learning. One platform used in such applications is TensorFlow, which is a machine learning library whose structure is based on dataflow programming paradigm. In this work, a method for multiplication of medium-density matrices on multicore CPUs using TensorFlow platform is proposed. This method, called tbt_matmul, utilizes TensorFlow built-in methods tf.matmul and tf.sparse_matmul. By partitioning each input matrix into four smaller sub-matrices, called tiles, and applying an appropriate multiplication method to each pair depending on their density, the proposed method outperforms the built-in methods for matrices of medium density and matrices of significantly uneven distribution of non-zeros.

Download Full-text

Communication lower bounds and optimal algorithms for numerical linear algebra

Acta Numerica ◽

10.1017/s0962492914000038 ◽

2014 ◽

Vol 23 ◽

pp. 1-155 ◽

Cited By ~ 38

Author(s):

G. Ballard ◽

E. Carson ◽

J. Demmel ◽

M. Hoemmen ◽

N. Knight ◽

...

Keyword(s):

Linear Algebra ◽

Lower Bounds ◽

Krylov Subspace ◽

Sparse Matrices ◽

Arithmetic Operation ◽

Matrix Multiplication ◽

Direct Methods ◽

Numerical Linear Algebra ◽

Theory And Practice ◽

Large Speed

The traditional metric for the efficiency of a numerical algorithm has been the number of arithmetic operations it performs. Technological trends have long been reducing the time to perform an arithmetic operation, so it is no longer the bottleneck in many algorithms; rather, communication, or moving data, is the bottleneck. This motivates us to seek algorithms that move as little data as possible, either between levels of a memory hierarchy or between parallel processors over a network. In this paper we summarize recent progress in three aspects of this problem. First we describe lower bounds on communication. Some of these generalize known lower bounds for dense classical (O(n3)) matrix multiplication to all direct methods of linear algebra, to sequential and parallel algorithms, and to dense and sparse matrices. We also present lower bounds for Strassen-like algorithms, and for iterative methods, in particular Krylov subspace methods applied to sparse matrices. Second, we compare these lower bounds to widely used versions of these algorithms, and note that these widely used algorithms usually communicate asymptotically more than is necessary. Third, we identify or invent new algorithms for most linear algebra problems that do attain these lower bounds, and demonstrate large speed-ups in theory and practice.

Download Full-text

Randomized algorithms in numerical linear algebra

Acta Numerica ◽

10.1017/s0962492917000058 ◽

2017 ◽

Vol 26 ◽

pp. 95-135 ◽

Cited By ~ 5

Author(s):

Ravindran Kannan ◽

Santosh Vempala

Keyword(s):

Linear Algebra ◽

Randomized Algorithms ◽

Matrix Multiplication ◽

Fast Algorithms ◽

Numerical Linear Algebra ◽

Low Rank ◽

Low Rank Approximation ◽

Rank Approximation

This survey provides an introduction to the use of randomization in the design of fast algorithms for numerical linear algebra. These algorithms typically examine only a subset of the input to solve basic problems approximately, including matrix multiplication, regression and low-rank approximation. The survey describes the key ideas and gives complete proofs of the main results in the field. A central unifying idea is sampling the columns (or rows) of a matrix according to their squared lengths.

Download Full-text

A Linear Algebra Approach to Fast DNA Mixture Analysis Using GPUs

10.1101/174813 ◽

2017 ◽

Author(s):

Siddharth Samsi ◽

Brian Helfer ◽

Jeremy Kepner ◽

Albert Reuther ◽

Darrell O. Ricke

Keyword(s):

Linear Algebra ◽

Dna Sequences ◽

Tandem Repeats ◽

Matrix Multiplication ◽

Forensic Analysis ◽

Dense Matrix ◽

Nucleotide Polymorphisms ◽

Base Pairs ◽

Algebra Approach ◽

Speed Up

AbstractAnalysis of DNA samples is an important tool in forensics, and the speed of analysis can impact investigations. Comparison of DNA sequences is based on the analysis of short tandem repeats (STRs), which are short DNA sequences of 2-5 base pairs. Current forensics approaches use 20 STR loci for analysis. The use of single nucleotide polymorphisms (SNPs) has utility for analysis of complex DNA mixtures. The use of tens of thousands of SNPs loci for analysis poses significant computational challenges because the forensic analysis scales by the product of the loci count and number of DNA samples to be analyzed. In this paper, we discuss the implementation of a DNA sequence comparison algorithm by re-casting the algorithm in terms of linear algebra primitives. By developing an overloaded matrix multiplication approach to DNA comparisons, we can leverage advances in GPU hardware and algoithms for dense matrix multiplication (DGEMM) to speed up DNA sample comparisons. We show that it is possible to compare 2048 unknown DNA samples with 20 million known samples in under 6 seconds using a NVIDIA K80 GPU.

Download Full-text

Sharing Teaching Ideas: Bisymmetric Matrices: Some Elementary New Problems

Mathematics Teacher ◽

10.5951/mt.82.8.0622 ◽

1989 ◽

Vol 82 (8) ◽

pp. 622-623

Author(s):

Samuel Councilman

Keyword(s):

Linear Algebra ◽

Matrix Multiplication ◽

Scalar Multiplication ◽

Class Discussion ◽

Teaching Ideas

In introductory linear algebra courses one continually seeks interesting sets of matrices that are closed under the operations of matrix addition, scalar multiplication, and if possible, matrix multiplication. Most texts mention symmetric and antisymmetric matrices and ask the reader to show that these sets are closed under matrix addition and scalar multiplication but fail to be closed under matrix multiplication. Few textbooks, if any, suggest an investigation of the set of matrices that are symmetric with respect to both diagonals, namely bisymmetric matrices. The following is a sequence of relatively straightforward problems that can be used as homework, class discussion, or even examination material in elementary linear algebra classes.

Download Full-text

Rationale for Matrix Multiplication in Linear Algebra Textbooks

Challenges and Strategies in Teaching Linear Algebra - ICME-13 Monographs ◽

10.1007/978-3-319-66811-6_5 ◽

2018 ◽

pp. 103-125

Author(s):

John Paul Cook ◽

Dov Zazkis ◽

Adam Estrup

Keyword(s):

Linear Algebra ◽

Matrix Multiplication ◽

Algebra Textbooks

Download Full-text

Parallel numerical linear algebra

Acta Numerica ◽

10.1017/s096249290000235x ◽

1993 ◽

Vol 2 ◽

pp. 111-197 ◽

Cited By ~ 93

Author(s):

James W. Demmel ◽

Michael T. Heath ◽

Henk A. van der Vorst

Keyword(s):

Eigenvalue Problem ◽

Linear Algebra ◽

Parallel Machines ◽

Sparse Matrices ◽

Matrix Multiplication ◽

Iterative Algorithms ◽

Numerical Linear Algebra ◽

Software Systems ◽

Open Problems ◽

Value Decomposition

We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of paralled processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band and sparse matrices.

Download Full-text