A framework for high-performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low-level kernels

2002 ◽  
Vol 14 (10) ◽  
pp. 805-839 ◽  
Author(s):  
Vinod Valsalam ◽  
Anthony Skjellum

Anatomy of high-performance matrix multiplication

2008 ◽ 
Vol 34 (3) ◽  
pp. 1-25 ◽  
Author(s):  
Kazushige Goto ◽  
Robert A. van de Geijn

2016 ◽  
Vol 72 (3) ◽  
pp. 804-844 ◽  
Author(s):  
Vasilios Kelefouras ◽  
A. Kritikakou ◽  
Iosif Mporas ◽  
Vasilios Kolonias

Author(s):  
Nan Yuan ◽  
Yongbin Zhou ◽  
Guangming Tan ◽  
Junchao Zhang ◽  
Dongrui Fan

Author(s):  
E. A. Ashcroft ◽  
A. A. Faustini ◽  
R. Jaggannathan ◽  
W. W. Wadge

In Chapter 1, we saw how Lucid can be used to express solutions to standard problems such as sorting and matrix multiplication. One of Lucid's unique characteristics is that it can serve not only as a programming language but also as a "composition" language: instead of specifying computations themselves, Lucid can express how computation components written in some other language are "glued" together to form a coherent application. Applications composed this way can enjoy some of Lucid's practical benefits, such as high performance through the exploitation of implicit parallelism and robustness through software fault tolerance (a small model of this idea is sketched below). In this chapter, we discuss one such use of Lucid: as part of a hybrid language for constructing parallel applications that execute on conventional parallel computers.

A conventional parallel computer consists either of a number of processors, each with local memory, interconnected by a network (as in distributed-memory architectures), or of a number of processors that share memory, possibly through an interconnection network (as in shared-memory architectures). The past decade has seen conventional parallel computers evolve from the Denelcor HEP through the CM-2 and Intel Hypercube to the CM-5, Intel Paragon, Cray T3D, and IBM SP-2. Even networks of workstations (workstation clusters) are viewed as low-cost ("poor man's") parallel computers.

Programming conventional parallel computers has proven far more challenging than expected. Part of the reason is the continued use of low-level, explicitly parallel programming models such as PVM [42] and Linda [10]. Two factors have fueled the continuing use of such languages despite their limited success:

1. The need to reuse existing sequential code, because the cost of rewriting legacy applications from scratch is considered prohibitive in both economic and technical terms.

2. The need to run on conventional parallel computers that view a "parallel program" at a low level: as a collection of sequential processes that frequently synchronize and communicate with each other using some form of message passing.
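The composition idea is easier to see with a concrete model. The following is a minimal sketch, not the chapter's notation: it is written in Haskell rather than Lucid or the hybrid language discussed in the chapter, modeling Lucid streams as lazy lists. The operators fby and next mirror Lucid's stream primitives, and glue is a hypothetical combinator standing in for foreign sequential components wired together declaratively; none of these names come from the chapter.

    -- A minimal sketch, not the chapter's notation: Lucid streams are
    -- modeled as Haskell lazy lists to show how declarative wiring
    -- leaves the evaluation order implicit.
    module Main where

    -- Lucid's "fby" (followed by): a stream beginning with x and
    -- continuing as xs.
    fby :: a -> [a] -> [a]
    fby x xs = x : xs

    -- Lucid's "next": the stream xs advanced by one step.
    next :: [a] -> [a]
    next = tail

    -- The Lucid equation  total = 0 fby (total + x)  transcribed
    -- directly; the self-reference is well-defined under laziness.
    total :: Num a => [a] -> [a]
    total xs = t
      where
        t = 0 `fby` zipWith (+) t xs

    -- "Glue" in miniature: a sequential worker (a stand-in for a
    -- component written in another language) lifted pointwise over a
    -- stream. The applications are independent of one another, so an
    -- implementation is free to evaluate them in parallel.
    glue :: (a -> b) -> [a] -> [b]
    glue worker = map worker

    main :: IO ()
    main = do
      print (take 6 (total [1 ..]))      -- [0,1,3,6,10,15]
      print (take 5 (glue (* 2) [1 ..])) -- [2,4,6,8,10]

Because the equations state only data dependencies, never an execution order, independent stream elements may be evaluated in parallel; this is the source of the implicit parallelism mentioned above.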

