Approximate vector matrix multiplication implementations for neuromorphic applications using memristive crossbars

Author(s):  
Walt Woods ◽  
Christof Teuscher
2010 ◽  
Vol 20 (02) ◽  
pp. 103-121 ◽  
Author(s):  
Mostafa I. Soliman ◽  
Abdulmajid F. Al-Junaid

Technological advances in IC manufacturing provide the capability to integrate more and more functionality into a single chip. Today's modern processors have nearly one billion transistors on a single chip. With the increasing complexity of today's systems, designs have to be modeled at a high level of abstraction before partitioning into hardware and software components for final implementation. This paper explains in detail the implementation and performance evaluation of a matrix processor called Mat-Core with SystemC (a system-level modeling language). Mat-Core is a research processor aiming at exploiting the increasing number of transistors per IC to improve the performance of a wide range of applications. It extends a general-purpose scalar processor with a matrix unit. To hide memory latency, the extended matrix unit is decoupled into two components, address generation and data computation, which communicate through data queues. Like vector architectures, the data computation unit is organized in parallel lanes. However, on parallel lanes, Mat-Core can execute matrix-scalar, matrix-vector, and matrix-matrix instructions in addition to vector-scalar and vector-vector instructions. To control the execution of vector/matrix instructions on the matrix core, this paper extends the well-known scoreboard technique. Furthermore, the performance of Mat-Core is evaluated on vector and matrix kernels. Our results show that a four-lane Mat-Core with 4 × 4 (16-element) matrix registers, a queue size of 10, a start-up time of 6 clock cycles, and a memory latency of 10 clock cycles achieves about 0.94, 1.3, 2.3, 1.6, 2.3, and 5.5 FLOPs per clock cycle on scalar-vector multiplication, SAXPY, Givens rotation, rank-1 update, vector-matrix multiplication, and matrix-matrix multiplication, respectively.
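For readers unfamiliar with the benchmark kernels, they correspond to standard BLAS-style operations; the following NumPy sketch (array sizes and variable names are illustrative assumptions, not taken from the paper) shows what each kernel computes:

```python
import numpy as np

# Illustrative sizes and values only; the paper evaluates 4x4 matrix
# registers (16 elements each) on a four-lane Mat-Core.
n = 4
a = 2.0
x, y = np.random.rand(n), np.random.rand(n)
A, B = np.random.rand(n, n), np.random.rand(n, n)

sv = a * x                                     # scalar-vector multiplication
saxpy = a * x + y                              # SAXPY: y <- a*x + y
c, s = np.cos(0.3), np.sin(0.3)
x_rot, y_rot = c * x + s * y, -s * x + c * y   # Givens plane rotation of (x, y)
r1 = A + a * np.outer(x, y)                    # rank-1 update: A <- A + a*x*y^T
vm = x @ A                                     # vector-matrix multiplication
mm = A @ B                                     # matrix-matrix multiplication
```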


Author(s):  
Panni Wang ◽  
Feng Xu ◽  
Bo Wang ◽  
Bin Gao ◽  
Huaqiang Wu ◽  
...  

2020 ◽  
Vol 3 ◽  
pp. 450-453
Author(s):  
Mandovi Mukherjee ◽  
Yun Long ◽  
Jongseok Woo ◽  
Daehyun Kim ◽  
Nael Mizanur Rahman ◽  
...  

2010 ◽  
Vol 49 (12) ◽  
pp. 2352 ◽  
Author(s):  
Xianchao Wang ◽  
Junjie Peng ◽  
Mei Li ◽  
Zhangyi Shen ◽  
Ouyang Shan

2020 ◽  
Vol 20 (3) ◽  
pp. 242-248
Author(s):  
Suhyeon Kim ◽  
Myung-Hyun Baek ◽  
Sungmin Hwang ◽  
Taejin Jang ◽  
Kyungchul Park ◽  
...  

2021 ◽  
Author(s):  
Amirali Amirsoleimani ◽  
Tony Liu ◽  
Fabien Alibart ◽  
Serge Ecoffey ◽  
Yao-Feng Chang ◽  
...  

In this chapter, we review recent progress on resistance-drift mitigation techniques for resistive switching memory devices (specifically memristors) and their impact on accuracy in deep neural network applications. In the first section of the chapter, we investigate the importance of soft errors and their detrimental impact on the performance of memristor-based vector–matrix multiplication (VMM) platforms, especially the memristance state drift induced by long-term recurring inference operations with sub-threshold stress voltage. We also briefly review some recently developed state-drift mitigation methods. In the next section of the chapter, we discuss an adaptive inference technique with low hardware overhead that mitigates memristance drift in memristive VMM platforms by using optimization techniques to adjust the inference voltage characteristic associated with different network layers. We also present simulation results and the performance improvements achieved by applying the proposed inference technique, taking non-idealities into account, for various deep network applications on memristor crossbar arrays. This chapter suggests that a simple, low-overhead inference technique can revive the functionality, enhance the performance, and significantly increase the lifetime of memristor-based VMM arrays, which can be a very important factor in making this technology a mainstream player in future in-memory computing platforms.
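The chapter's adaptive technique rests on optimizing the inference (read) voltage per layer rather than reprogramming drifted devices. As a rough, hypothetical illustration of that idea only (the drift model, array sizes, and the single scaling factor per layer below are assumptions for demonstration, not the chapter's actual optimization), a crossbar VMM with drifted conductances can be partially recalibrated as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def vmm(voltages, conductances):
    """Ideal crossbar VMM: output currents are the input voltage vector
    multiplied by the conductance matrix (Ohm's and Kirchhoff's laws)."""
    return voltages @ conductances

# Hypothetical per-layer conductance matrices (programmed weights).
layers = [rng.uniform(1e-6, 1e-4, size=(16, 16)) for _ in range(3)]

# Calibration targets recorded right after programming, before any drift.
probe = rng.uniform(0.0, 0.2, size=16)                 # fixed probe input
targets = [vmm(probe, G) for G in layers]

# Simulated state drift: each device loses a small fraction of its conductance.
drifted = [G * rng.uniform(0.85, 0.95, size=G.shape) for G in layers]

# Adaptive inference (illustrative): rescale each layer's read voltage by one
# factor chosen to match the calibration target in a least-squares sense.
for i, (G_d, t) in enumerate(zip(drifted, targets)):
    out = vmm(probe, G_d)
    alpha = float(np.dot(out, t) / np.dot(out, out))   # closed-form 1-D fit
    corrected = vmm(alpha * probe, G_d)
    err_before = np.linalg.norm(out - t) / np.linalg.norm(t)
    err_after = np.linalg.norm(corrected - t) / np.linalg.norm(t)
    print(f"layer {i}: relative error {err_before:.3f} -> {err_after:.3f}")
```

A single scalar per layer is far cruder than the layer-wise voltage optimization discussed in the chapter, but it conveys why adjusting the inference voltage, instead of reprogramming every device, keeps the hardware overhead low.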


2020 ◽  
Vol 2 (10) ◽  
pp. 2000134
Author(s):  
Tobias Ziegler ◽  
Rainer Waser ◽  
Dirk J. Wouters ◽  
Stephan Menzel

2010 ◽  
Vol 4 (4) ◽  
pp. 159-164 ◽  
Author(s):  
C. Yang ◽  
L. Wu ◽  
Y.Y. Huang ◽  
Y.H. Zhang ◽  
H. Yang ◽  
...  
