A framework for high-performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low-level kernels

2002 ◽  
Vol 14 (10) ◽  
pp. 805-839 ◽  
Author(s):  
Vinod Valsalam ◽  
Anthony Skjellum

Anatomy of high-performance matrix multiplication

2008 ◽ 
Vol 34 (3) ◽  
pp. 1-25 ◽  
Author(s):  
Kazushige Goto ◽  
Robert A. van de Geijn

2016 ◽  
Vol 72 (3) ◽  
pp. 804-844 ◽  
Author(s):  
Vasilios Kelefouras ◽  
A. Kritikakou ◽  
Iosif Mporas ◽  
Vasilios Kolonias

Author(s):  
Nan Yuan ◽  
Yongbin Zhou ◽  
Guangming Tan ◽  
Junchao Zhang ◽  
Dongrui Fan

Author(s):  
E. A. Ashcroft ◽  
A. A. Faustini ◽  
R. Jaggannathan ◽  
W. W. Wadge

In Chapter 1, we saw how Lucid can be used to express solutions to standard problems such as sorting and matrix multiplication. One of Lucid's unique characteristics is that it can serve not only as a programming language but also as a "composition" language: instead of specifying computations themselves, Lucid can express how computation components written in some other language are "glued" together to form a coherent application. Applications composed this way can enjoy some of Lucid's practical benefits, such as high performance through the exploitation of implicit parallelism and robustness through software fault tolerance (a small model of this idea is sketched below). In this chapter, we discuss one such use of Lucid: as part of a hybrid language for constructing parallel applications that execute on conventional parallel computers.

A conventional parallel computer consists either of a number of processors, each with local memory, interconnected by a network (as in distributed-memory architectures), or of a number of processors that share memory, possibly through an interconnection network (as in shared-memory architectures). The past decade has seen conventional parallel computers evolve from the Denelcor HEP through the CM-2 and Intel Hypercube to the CM-5, Intel Paragon, Cray T3D, and IBM SP-2. Even networks of workstations (workstation clusters) are viewed as low-cost ("poor man's") parallel computers.

Programming conventional parallel computers has proven far more challenging than expected. Part of the reason is the continued use of low-level, explicitly parallel programming models such as PVM [42] and Linda [10]. Two factors have fueled the continuing use of such languages despite their limited success:

1. The need to reuse existing sequential code, because the cost of rewriting legacy applications from scratch is considered prohibitive in both economic and technical terms.

2. The need to run on conventional parallel computers that view a "parallel program" at a low level: as a collection of sequential processes that frequently synchronize and communicate with each other using some form of message passing.
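The composition idea is easier to see with a concrete model. The following is a minimal sketch, not the chapter's notation: it is written in Haskell rather than Lucid or the hybrid language discussed in the chapter, modeling Lucid streams as lazy lists. The operators fby and next mirror Lucid's stream primitives, and glue is a hypothetical combinator standing in for foreign sequential components wired together declaratively; none of these names come from the chapter.

    -- A minimal sketch, not the chapter's notation: Lucid streams are
    -- modeled as Haskell lazy lists to show how declarative wiring
    -- leaves the evaluation order implicit.
    module Main where

    -- Lucid's "fby" (followed by): a stream beginning with x and
    -- continuing as xs.
    fby :: a -> [a] -> [a]
    fby x xs = x : xs

    -- Lucid's "next": the stream xs advanced by one step.
    next :: [a] -> [a]
    next = tail

    -- The Lucid equation  total = 0 fby (total + x)  transcribed
    -- directly; the self-reference is well-defined under laziness.
    total :: Num a => [a] -> [a]
    total xs = t
      where
        t = 0 `fby` zipWith (+) t xs

    -- "Glue" in miniature: a sequential worker (a stand-in for a
    -- component written in another language) lifted pointwise over a
    -- stream. The applications are independent of one another, so an
    -- implementation is free to evaluate them in parallel.
    glue :: (a -> b) -> [a] -> [b]
    glue worker = map worker

    main :: IO ()
    main = do
      print (take 6 (total [1 ..]))      -- [0,1,3,6,10,15]
      print (take 5 (glue (* 2) [1 ..])) -- [2,4,6,8,10]

Because the equations state only data dependencies, never an execution order, independent stream elements may be evaluated in parallel; this is the source of the implicit parallelism mentioned above.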

