Dense Matrix Multiplication Algorithms and Performance Evaluation of HPCC in 81 Nodes IBM Power 8 Architecture

Eduardo Patricio Estévez Ruiz; Giovanny Eduardo Caluña Chicaiza; Fabian Rodolfo Jiménez Patiño; Joaquín Cayetano López Lago; Saravana Prakash Thirumuruganandham

doi:10.3390/computation9080086

Computation ◽

2021 ◽

Vol 9 (8) ◽

pp. 86

Keyword(s):

Performance Evaluation ◽

System Performance ◽

High Performance ◽

Matrix Multiplication ◽

Dense Matrix ◽

Current Configuration ◽

Performance Factors ◽

Reasonable Cost ◽

And Performance ◽

Performance Computing

Optimizing HPC systems based on performance factors and bottlenecks is essential for designing an HPC infrastructure with the best characteristics at a reasonable cost. Such insight can only be achieved through a detailed analysis of existing HPC systems and the execution of their workloads. “Quinde I” is the only, and most powerful, supercomputer in Ecuador and is currently listed third in South America. It was built with IBM Power 8 servers. In this work, we measured its performance using different High-Performance Computing (HPC) parameters and compared the results with theoretical values and with values obtained from tests on similar models. To measure its performance, we compiled and ran different benchmarks with the optimization flags specific to Power 8 to obtain the maximum performance from the current configuration of the hardware installed by the vendor. The benchmark inputs were varied to analyze their impact on system performance. In addition, we compiled and compared the performance of two algorithms for dense matrix multiplication, SRUMMA and DGEMM.
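As a rough illustration of the DGEMM operation named in the abstract above, the following Python sketch computes C <- alpha*A*B + beta*C with plain loops and converts a timing into GFLOP/s. It is not the benchmark code used in the study: the function names (dgemm, dgemm_flops, measured_gflops) are hypothetical, the matrices are small lists of lists, and the 2*m*n*k operation count is the usual convention for dense matrix multiplication. SRUMMA, which is a parallel algorithm, is not sketched here.

```python
import time


def dgemm(alpha, a, b, beta, c):
    """Reference C <- alpha*A*B + beta*C for row-major lists of lists.

    Hypothetical helper for illustration; returns a new matrix.
    """
    m, k, n = len(a), len(b), len(b[0])
    # Start from the scaled input matrix C.
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for p in range(k):
            aip = alpha * a[i][p]
            for j in range(n):
                out[i][j] += aip * b[p][j]
    return out


def dgemm_flops(m, n, k):
    """Conventional operation count: one multiply and one add per inner step."""
    return 2 * m * n * k


def measured_gflops(n, repeats=3):
    """Time an n x n multiplication and report the best rate in GFLOP/s."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    c = [[0.0] * n for _ in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dgemm(1.0, a, b, 0.0, c)
        best = min(best, time.perf_counter() - start)
    return dgemm_flops(n, n, n) / best / 1e9


if __name__ == "__main__":
    print(f"{measured_gflops(64):.4f} GFLOP/s (pure Python, n=64)")
```

A pure-Python loop runs orders of magnitude below an optimized BLAS library, so the printed rate is only useful for showing how a measured figure is derived from a timing before it is compared with a theoretical peak.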

Study on Dense Matrix Multiplication Algorithms and Performance Evaluation of HPCC in 81 Nodes IBM Power 8 Architecture

10.9734/bpi/ramrcs/v5/14371d ◽

2021 ◽

pp. 105-125

Author(s):

Eduardo Patricio Estévez Ruiz ◽

Giovanny Eduardo Caluña Chicaiza ◽

Fabian Rodolfo Jiménez Patiño ◽

Joaquín Cayetano López Lago ◽

Saravana Prakash Thirumuruganandham

Keyword(s):

Performance Evaluation ◽

Matrix Multiplication ◽

Dense Matrix ◽

And Performance

RAPID for high-performance computing systems: architecture and performance evaluation

Applied Optics ◽

10.1364/ao.45.006326 ◽

2006 ◽

Vol 45 (25) ◽

pp. 6326 ◽

Cited By ~ 7

Author(s):

Avinash Karanth Kodi ◽

Ahmed Louri

Keyword(s):

Performance Evaluation ◽

High Performance Computing ◽

High Performance ◽

Computing Systems ◽

Systems Architecture ◽

And Performance ◽

Performance Computing

Simulator considering modeling and performance evaluation for high-performance computing of collaborative-based mobile cloud infrastructure

The Journal of Supercomputing ◽

10.1007/s11227-019-02882-x ◽

2019 ◽

Vol 75 (8) ◽

pp. 4459-4471 ◽

Cited By ~ 1

Author(s):

Hyun-Woo Kim ◽

Jungho Kang ◽

Young-Sik Jeong

Keyword(s):

Performance Evaluation ◽

High Performance Computing ◽

High Performance ◽

Mobile Cloud ◽

Cloud Infrastructure ◽

And Performance ◽

Performance Computing

Performance Evaluation of Container-Based Virtualization for High Performance Computing Environments

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing ◽

10.1109/pdp.2013.41 ◽

2013 ◽

Cited By ~ 197

Author(s):

M. G. Xavier ◽

M. V. Neves ◽

F. D. Rossi ◽

T. C. Ferreto ◽

T. Lange ◽

...

Keyword(s):

Performance Evaluation ◽

High Performance Computing ◽

High Performance ◽

Computing Environments ◽

Performance Computing

Self-assembly of porphyrin on the surface of a novel composite high performance photocatalyst for the degradation of organic dye from water: Characterization and performance evaluation

Journal of Environmental Chemical Engineering ◽

10.1016/j.jece.2021.106034 ◽

2021 ◽

pp. 106034

Author(s):

Duong Duc La ◽

Tuan Anh Nguyen ◽

X. Sang Nguyen ◽

Tuan N. Truong ◽

H. Phuong Nguyen T. ◽

...

Keyword(s):

Performance Evaluation ◽

Self Assembly ◽

High Performance ◽

Organic Dye ◽

And Performance

SYSTEMC IMPLEMENTATION AND PERFORMANCE EVALUATION OF A DECOUPLED GENERAL-PURPOSE MATRIX PROCESSOR

Parallel Processing Letters ◽

10.1142/s0129626410000090 ◽

2010 ◽

Vol 20 (02) ◽

pp. 103-121 ◽

Cited By ~ 1

Author(s):

MOSTAFA I. SOLIMAN ◽

ABDULMAJID F. Al-JUNAID

Keyword(s):

Performance Evaluation ◽

Matrix Multiplication ◽

General Purpose ◽

System Level ◽

Memory Latency ◽

Single Chip ◽

Wide Range ◽

Matrix Unit ◽

And Performance ◽

Vector Matrix

Technological advances in IC manufacturing provide us with the capability to integrate more and more functionality into a single chip. Today's modern processors have nearly one billion transistors on a single chip. With the increasing complexity of today's systems, designs have to be modeled at a high level of abstraction before partitioning into hardware and software components for final implementation. This paper explains in detail the implementation and performance evaluation of a matrix processor called Mat-Core with SystemC (a system-level modeling language). Mat-Core is a research processor aiming at exploiting the increasing number of transistors per IC to improve the performance of a wide range of applications. It extends a general-purpose scalar processor with a matrix unit. To hide memory latency, the extended matrix unit is decoupled into two components, address generation and data computation, which communicate through data queues. Like vector architectures, the data computation unit is organized in parallel lanes. However, on parallel lanes, Mat-Core can execute matrix-scalar, matrix-vector, and matrix-matrix instructions in addition to vector-scalar and vector-vector instructions. For controlling the execution of vector/matrix instructions on the matrix core, this paper extends the well-known scoreboard technique. Furthermore, the performance of Mat-Core is evaluated on vector and matrix kernels. Our results show that the performance of a four-lane Mat-Core with matrix registers of size 4 × 4 (16 elements each), a queue size of 10, a startup time of 6 clock cycles, and a memory latency of 10 clock cycles is about 0.94, 1.3, 2.3, 1.6, 2.3, and 5.5 FLOPs per clock cycle, achieved on scalar-vector multiplication, SAXPY, Givens, rank-1 update, vector-matrix multiplication, and matrix-matrix multiplication, respectively.
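The decoupling idea in the abstract above, where operand fetching runs ahead of computation and the two sides communicate through a bounded queue, can be pictured with a toy software model. The Python sketch below applies it to the SAXPY kernel (a*x + y). It is an assumption-laden illustration only: decoupled_saxpy and its queue_size parameter are hypothetical names, the model is sequential, and it does not reproduce Mat-Core's SystemC design, lanes, or timing.

```python
from collections import deque


def decoupled_saxpy(a, x, y, queue_size=10):
    """Compute a*x + y with a fetch stage feeding a compute stage via a queue.

    Toy model: the "address generation" stage fetches operand pairs ahead of
    time, and the "data computation" stage consumes them one at a time.
    """
    if queue_size < 1:
        raise ValueError("queue_size must be at least 1")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    queue = deque()   # bounded data queue between the two stages
    result = []
    next_index = 0    # next element the fetch stage will load
    n = len(x)

    while len(result) < n:
        # Fetch stage: run ahead until the queue is full or the input ends.
        while next_index < n and len(queue) < queue_size:
            queue.append((x[next_index], y[next_index]))
            next_index += 1
        # Compute stage: consume one operand pair per step.
        xi, yi = queue.popleft()
        result.append(a * xi + yi)
    return result


if __name__ == "__main__":
    print(decoupled_saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

In hardware the two stages run concurrently, so a queue that stays non-empty lets computation proceed while later memory requests are still in flight; the sequential loop here only shows the producer and consumer roles.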

High-Performance Workwear for Coal Miners in Northern China: Design and Performance Evaluation

Autex Research Journal ◽

10.2478/aut-2021-0020 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Ying Ke ◽

Qing Zheng ◽

Faming Wang ◽

Min Wang ◽

Yi Wang

Keyword(s):

Performance Evaluation ◽

Thermal Comfort ◽

High Performance ◽

Northern China ◽

Coal Miners ◽

Climate Chamber ◽

Local Skin ◽

Worker Performance ◽

And Performance ◽

Skin Temperatures

The design of workwear has significant effects on worker performance. However, the current workwear for coal miners in Northern China is poor in fit and thermal comfort. In this study, new workwear (NEW) for coal miners was developed with design features providing better cold protection and movement comfort than a commonly worn workwear (CON). To evaluate the effectiveness of NEW, we conducted human trials using simulated work movements (i.e., sitting, shoveling, squatting, and crawling) in a climate chamber (10°C, 75% RH). Physiological measurements and perceptual responses were obtained. The results demonstrated that the local skin temperatures at the chest, scapula, thigh, and calf, the mean skin temperatures, and thermal comfort in NEW were significantly higher than those in CON. NEW also improved movement comfort. We conclude that NEW can meet the cold-protection and mobility requirements.

SeisNoise.jl: Ambient Seismic Noise Cross Correlation on the CPU and GPU in Julia

Seismological Research Letters ◽

10.1785/0220200192 ◽

2020 ◽

Vol 92 (1) ◽

pp. 517-527

Author(s):

Timothy Clements ◽

Marine A. Denolle

Keyword(s):

Seismic Noise ◽

High Performance ◽

Cross Correlation ◽

Graphic Processing Unit ◽

Ambient Seismic Noise ◽

Processing Unit ◽

Central Processing ◽

And Performance ◽

Noise Cross Correlation ◽

Performance Computing

We introduce SeisNoise.jl, a library for high-performance ambient seismic noise cross correlation, written entirely in the computing language Julia. Julia is a new language, with syntax and a learning curve similar to MATLAB (see Data and Resources), R, or Python and performance close to Fortran or C. SeisNoise.jl is compatible with high-performance computing resources, using both the central processing unit and the graphic processing unit. SeisNoise.jl is a modular toolbox, giving researchers common tools and data structures to design custom ambient seismic cross-correlation workflows in Julia.

Performance evaluation of Amazon Elastic Compute Cloud for NASA high-performance computing applications

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.3029 ◽

2013 ◽

Vol 28 (4) ◽

pp. 1041-1055 ◽

Cited By ~ 9

Author(s):

Piyush Mehrotra ◽

Jahed Djomehri ◽

Steve Heistand ◽

Robert Hood ◽

Haoqiang Jin ◽

...

Keyword(s):

Performance Evaluation ◽

High Performance Computing ◽

High Performance ◽

Performance Computing

AN ASSOCIATIVE DATA PARALLEL COMPILATION MODEL FOR TIGHT INTEGRATION OF HIGH PERFORMANCE KNOWLEDGE RETRIEVAL AND COMPUTING

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213094000078 ◽

1994 ◽

Vol 03 (01) ◽

pp. 97-125 ◽

Cited By ~ 3

Author(s):

ARVIND K. BANSAL

Keyword(s):

Performance Evaluation ◽

High Performance ◽

Loose Coupling ◽

Abstract Machine ◽

Data Movement ◽

Left Hand ◽

Low Level ◽

Data Parallel ◽

Data Alignment ◽

And Performance

Associative computation is characterized by the intertwining of search by content and data parallel computation. An algebra for associative computation is described. A compilation-based model and a novel abstract machine for associative logic programming are presented. The model uses loose coupling of the left-hand side of the program, treated as data, and the right-hand side of the program, treated as low-level code. This representation achieves efficiency through associative computation and data alignment during goal reduction and during execution of low-level abstract instructions. Data alignment reduces the overhead of data movement. Novel schemes are described for associative manipulation of aliased uninstantiated variables and for data parallel goal reduction in the presence of multiple occurrences of the same variables in a goal. The architecture, behavior, and performance evaluation of the model are presented.