SYSTEMC IMPLEMENTATION AND PERFORMANCE EVALUATION OF A DECOUPLED GENERAL-PURPOSE MATRIX PROCESSOR

Advances in IC manufacturing make it possible to integrate more and more functionality into a single chip; modern processors have nearly one billion transistors on a single chip. With the increasing complexity of today's systems, designs have to be modeled at a high level of abstraction before being partitioned into hardware and software components for final implementation. This paper explains in detail the implementation and performance evaluation of a matrix processor called Mat-Core in SystemC, a system-level modeling language. Mat-Core is a research processor that aims to exploit the increasing number of transistors per IC to improve the performance of a wide range of applications. It extends a general-purpose scalar processor with a matrix unit. To hide memory latency, the matrix unit is decoupled into two components, address generation and data computation, which communicate through data queues. Like vector architectures, the data computation unit is organized in parallel lanes. However, on these parallel lanes, Mat-Core can execute matrix-scalar, matrix-vector, and matrix-matrix instructions in addition to vector-scalar and vector-vector instructions. To control the execution of vector/matrix instructions on the matrix core, this paper extends the well-known scoreboard technique. Furthermore, the performance of Mat-Core is evaluated on vector and matrix kernels. Our results show that a four-lane Mat-Core with 4 × 4 matrix registers (16 elements each), a queue size of 10, a start-up time of 6 clock cycles, and a memory latency of 10 clock cycles achieves about 0.94, 1.3, 2.3, 1.6, 2.3, and 5.5 FLOPs per clock cycle on scalar-vector multiplication, SAXPY, Givens, rank-1 update, vector-matrix multiplication, and matrix-matrix multiplication, respectively.
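
For readers unfamiliar with the kernels named in the abstract, the following is a minimal scalar reference sketch in Python. It is not taken from the paper and does not model Mat-Core or SystemC. The function names and the FLOP-counting convention (one FLOP per multiply or add, and 2 per multiply-accumulate in the dot-product kernels) are assumptions made for illustration.

```python
# Scalar reference versions of the kernels named in the abstract.
# Each function returns its result(s) followed by an assumed FLOP count.
# Names and counting convention are illustrative, not from the paper.

def scalar_vector_mul(a, x):
    # a * x: one multiply per element
    return [a * xi for xi in x], len(x)

def saxpy(a, x, y):
    # a * x + y: one multiply and one add per element
    return [a * xi + yi for xi, yi in zip(x, y)], 2 * len(x)

def givens(c, s, x, y):
    # Plane rotation applied to the vector pair (x, y):
    # four multiplies and two add/subtracts per element pair
    xr = [c * xi + s * yi for xi, yi in zip(x, y)]
    yr = [c * yi - s * xi for xi, yi in zip(x, y)]
    return xr, yr, 6 * len(x)

def rank1_update(A, x, y):
    # A + x * y^T: one multiply and one add per matrix element
    rows, cols = len(x), len(y)
    out = [[A[i][j] + x[i] * y[j] for j in range(cols)] for i in range(rows)]
    return out, 2 * rows * cols

def vector_matrix_mul(x, A):
    # x^T * A, counted as one multiply-accumulate per matrix element
    rows, cols = len(A), len(A[0])
    out = [sum(x[i] * A[i][j] for i in range(rows)) for j in range(cols)]
    return out, 2 * rows * cols

def matrix_matrix_mul(A, B):
    # A * B, counted as one multiply-accumulate per (i, j, k) triple
    n, k, m = len(A), len(B), len(B[0])
    out = [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
           for i in range(n)]
    return out, 2 * n * k * m

def flops_per_cycle(flops, cycles):
    # The throughput metric quoted in the abstract
    return flops / cycles
```

Under this convention the arithmetic per data element grows from scalar-vector multiplication to matrix-matrix multiplication, which fits the ordering of the reported FLOPs-per-cycle figures at the two ends of the list.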

System-level Simulation and Performance Evaluation

Nano- and Microscience, Engineering, Technology and Medicine - Microelectrofluidic Systems ◽

10.1201/9781420040494.ch4 ◽

2002 ◽

Keyword(s):

Performance Evaluation ◽

System Level ◽

System Level Simulation ◽

And Performance

Multi-Softcore Architecture on FPGA

International Journal of Reconfigurable Computing ◽

10.1155/2014/979327 ◽

2014 ◽

Vol 2014 ◽

pp. 1-13 ◽

Cited By ~ 4

Author(s):

Mouna Baklouti ◽

Mohamed Abid

Keyword(s):

High Performance ◽

Design Methodology ◽

Matrix Multiplication ◽

Rapid Prototype ◽

General Purpose ◽

Parallel Applications ◽

Multicore Systems ◽

Processor Core ◽

Nios II ◽

Wide Range

To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, they are mostly based on a custom-logic design methodology. Designing parallel multicore systems from available standard intellectual property (IP) blocks while maintaining high performance is also a challenging issue. Softcore processors and field-programmable gate arrays (FPGAs) are a cheap and fast option for developing and testing such systems. This paper describes an FPGA-based design methodology for implementing a rapid prototype of parametric multicore systems. A study of the viability of building the SoC with the NIOS II soft-processor core from Altera is also presented. The NIOS II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and several parallel applications are used to test the speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU (29.5% faster for the FIR filter and 23.6% faster for the matrix-matrix multiplication).
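
The abstract refers to the speedup and efficiency of the parallel system without defining them. As a hedged illustration, the sketch below uses the conventional definitions (speedup as serial time over parallel time, efficiency as speedup per core). Whether the paper uses exactly these definitions is an assumption, and the function names and timings are hypothetical.

```python
# Conventional parallel-performance metrics (assumed definitions,
# not confirmed by the abstract).

def speedup(t_serial, t_parallel):
    # How many times faster the parallel run is than the serial run
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_cores):
    # Fraction of ideal linear speedup achieved on n_cores
    return speedup(t_serial, t_parallel) / n_cores

# Hypothetical timings in seconds, for illustration only
s = speedup(8.0, 2.0)        # 4.0
e = efficiency(8.0, 2.0, 8)  # 0.5
```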

Implementation and performance evaluation of 256-QAM in Vienna system level simulator

2018 20th International Conference on Advanced Communication Technology (ICACT) ◽

10.23919/icact.2018.8323830 ◽

2018 ◽

Cited By ~ 1

Author(s):

Igor Kim ◽

Jungsun Um ◽

Seungkeun Park

Keyword(s):

Performance Evaluation ◽

System Level ◽

And Performance ◽

Vienna System

Mitigating State-Drift in Memristor Crossbar Arrays for Vector Matrix Multiplication

10.5772/intechopen.100246 ◽

2021 ◽

Author(s):

Amirali Amirsoleimani ◽

Tony Liu ◽

Fabien Alibart ◽

Serge Eccofey ◽

Yao-Feng Chang ◽

...

Keyword(s):

Matrix Multiplication ◽

Optimization Techniques ◽

Performance Improvements ◽

Network Applications ◽

Network Layers ◽

Adaptive Inference ◽

Computing Platforms ◽

And Performance ◽

Memristor Crossbar ◽

Vector Matrix

In this chapter, we review recent progress on resistance drift mitigation techniques for resistive switching memory devices (specifically memristors) and the impact of drift on accuracy in deep neural network applications. In the first section of the chapter, we investigate the importance of soft errors and their detrimental impact on the performance of memristor-based vector–matrix multiplication (VMM) platforms, especially the memristance state-drift induced by long-term recurring inference operations with sub-threshold stress voltage. We also briefly review some currently developed state-drift mitigation methods. In the next section of the chapter, we discuss an adaptive inference technique with low hardware overhead that mitigates memristance drift in memristive VMM platforms by using optimization techniques to adjust the inference voltage characteristic associated with different network layers. We also present simulation results and the performance improvements achieved by applying the proposed inference technique, taking non-idealities into account, for various deep network applications on memristor crossbar arrays. This chapter suggests that a simple, low-overhead inference technique can revive the functionality and enhance the performance of memristor-based VMM arrays and significantly increase their lifetime, which can be a very important factor in making this technology a mainstream player in future in-memory computing platforms.

Study on Dense Matrix Multiplication Algorithms and Performance Evaluation of HPCC in 81 Nodes IBM Power 8 Architecture

10.9734/bpi/ramrcs/v5/14371d ◽

2021 ◽

pp. 105-125

Author(s):

Eduardo Patricio Estévez Ruiz ◽

Giovanny Eduardo Caluña Chicaiza ◽

Fabian Rodolfo Jiménez Patiño ◽

Joaquín Cayetano López Lago ◽

Saravana Prakash Thirumuruganandham

Keyword(s):

Performance Evaluation ◽

Matrix Multiplication ◽

Dense Matrix ◽

And Performance

Cycle-Accurate System-Level Modeling and Performance Evaluation

Industrial Information Technology - EDA for IC System Design, Verification, and Testing ◽

10.1201/9781420007947.sec3 ◽

2006 ◽

pp. 12-1-12-18

Author(s):

Marcello Coppola ◽

Miltos Grammatikakis

Keyword(s):

Performance Evaluation ◽

System Level ◽

System Level Modeling ◽

And Performance

Cycle-Accurate System-Level Modeling and Performance Evaluation

EDA for IC System Design, Verification, and Testing ◽

10.1201/9781420007947-12 ◽

2018 ◽

pp. 12-1-12-18

Keyword(s):

Performance Evaluation ◽

System Level ◽

System Level Modeling ◽

And Performance

System-Level Modeling and Performance Evaluation of Multi-Hop 802.16j Systems

2008 International Wireless Communications and Mobile Computing Conference ◽

10.1109/iwcmc.2008.62 ◽

2008 ◽

Cited By ~ 25

Author(s):

Hui Zeng ◽

Chenxi Zhu

Keyword(s):

Performance Evaluation ◽

System Level ◽

System Level Modeling ◽

And Performance

A SURVEY OF TECHNIQUES FOR MANAGING AND LEVERAGING CACHES IN GPUs

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126614300025 ◽

2014 ◽

Vol 23 (08) ◽

pp. 1430002 ◽

Cited By ~ 11

Author(s):

SPARSH MITTAL

Keyword(s):

Graphics Processing Units ◽

High Performance ◽

Heterogeneous Computing ◽

General Purpose ◽

System Level ◽

Cache Management ◽

Full Potential ◽

Wide Range ◽

Computing Platforms ◽

Graphics Processing

Initially introduced as special-purpose accelerators for graphics applications, graphics processing units (GPUs) have now emerged as general-purpose computing platforms for a wide range of applications. To address the requirements of these applications, modern GPUs include sizable hardware-managed caches. However, several factors, such as the unique architecture of GPUs and the rise of CPU–GPU heterogeneous computing, demand effective management of caches to achieve high performance and energy efficiency. Recently, several techniques have been proposed for this purpose. In this paper, we survey several architectural and system-level techniques proposed for managing and leveraging GPU caches. We also discuss the importance and challenges of cache management in GPUs. The aim of this paper is to give readers insight into cache management techniques for GPUs and to motivate them to propose even better techniques for leveraging the full potential of caches in the GPUs of tomorrow.
