Mitigating State-Drift in Memristor Crossbar Arrays for Vector Matrix Multiplication

2021 ◽  
Author(s):  
Amirali Amirsoleimani ◽  
Tony Liu ◽  
Fabien Alibart ◽  
Serge Eccofey ◽  
Yao-Feng Chang ◽  
...  

In this chapter, we review recent progress on resistance-drift mitigation techniques for resistive switching memory devices (specifically memristors) and their impact on accuracy in deep neural network applications. In the first section of the chapter, we investigate the importance of soft errors and their detrimental impact on the performance of memristor-based vector-matrix multiplication (VMM) platforms, especially the memristance state-drift induced by long-term recurring inference operations with sub-threshold stress voltage. We also briefly review several recently developed state-drift mitigation methods. In the next section of the chapter, we discuss an adaptive inference technique with low hardware overhead that mitigates memristance drift in memristive VMM platforms by using optimization techniques to adjust the inference voltage characteristics associated with different network layers. We also present simulation results and the performance improvements achieved by applying the proposed inference technique, taking non-idealities into account, for various deep network applications on memristor crossbar arrays. This chapter suggests that a simple, low-overhead inference technique can revive the functionality and enhance the performance of memristor-based VMM arrays and significantly increase their lifetime, which can be an important factor in making this technology a mainstream player in future in-memory computing platforms.
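The adaptive idea described above can be illustrated with a minimal sketch. The drift model, parameter values, and function names below are hypothetical, not taken from the chapter: conductance decays slightly with each sub-threshold read, and the inference (read) voltage is rescaled so the crossbar output current tracks its calibrated value.

```python
# Hypothetical illustration: memristance state-drift under recurring
# inference reads, compensated by adapting the read voltage.

def drifted_conductance(g0, n_reads, drift_per_read=1e-4):
    """Conductance after n_reads sub-threshold read operations
    (simple multiplicative drift model, illustrative only)."""
    return g0 * (1.0 - drift_per_read) ** n_reads

def adaptive_read_voltage(v0, g0, g_now):
    """Scale the inference voltage so the output current I = G * V
    is restored to its calibrated value g0 * v0."""
    return v0 * g0 / g_now

g0, v0 = 1e-3, 0.2              # calibrated conductance (S) and read voltage (V)
g = drifted_conductance(g0, n_reads=10_000)

i_drifted = g * v0              # current without compensation (too low)
i_adapted = g * adaptive_read_voltage(v0, g0, g)
target = g0 * v0                # current the network was trained against
```

In a real crossbar the adjustment would be per-layer and constrained by device limits, as the chapter's optimization formulation suggests; this sketch only shows the compensating principle for a single cell.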

2010 ◽  
Vol 20 (02) ◽  
pp. 103-121 ◽  
Author(s):  
MOSTAFA I. SOLIMAN ◽  
ABDULMAJID F. Al-JUNAID

Technological advances in IC manufacturing provide the capability to integrate more and more functionality into a single chip. Today's modern processors have nearly one billion transistors on a single chip. With the increasing complexity of today's systems, designs have to be modeled at a high level of abstraction before partitioning into hardware and software components for final implementation. This paper explains in detail the implementation and performance evaluation of a matrix processor called Mat-Core with SystemC (a system-level modeling language). Mat-Core is a research processor aiming to exploit the increasing number of transistors per IC to improve the performance of a wide range of applications. It extends a general-purpose scalar processor with a matrix unit. To hide memory latency, the extended matrix unit is decoupled into two components, address generation and data computation, which communicate through data queues. Like vector architectures, the data computation unit is organized in parallel lanes. However, on its parallel lanes, Mat-Core can execute matrix-scalar, matrix-vector, and matrix-matrix instructions in addition to vector-scalar and vector-vector instructions. To control the execution of vector/matrix instructions on the matrix core, this paper extends the well-known scoreboard technique. Furthermore, the performance of Mat-Core is evaluated on vector and matrix kernels. Our results show that a four-lane Mat-Core with 4 × 4 (16-element) matrix registers, a queue size of 10, a startup time of 6 clock cycles, and a memory latency of 10 clock cycles achieves about 0.94, 1.3, 2.3, 1.6, 2.3, and 5.5 FLOPs per clock cycle on scalar-vector multiplication, SAXPY, Givens rotation, rank-1 update, vector-matrix multiplication, and matrix-matrix multiplication, respectively.
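The lane organization described above can be sketched functionally. This is not the Mat-Core ISA or its SystemC model; it is a hypothetical illustration of how a vector-matrix multiply might be partitioned across parallel lanes, with each lane owning an interleaved slice of the output columns.

```python
# Illustrative sketch (not the actual Mat-Core design): vector-matrix
# multiply split across parallel lanes; lane p computes every p-th column.

def vector_matrix_lanes(v, M, n_lanes=4):
    rows, cols = len(M), len(M[0])
    assert len(v) == rows
    out = [0.0] * cols
    for lane in range(n_lanes):
        # Each lane handles the interleaved column set {lane, lane+n_lanes, ...};
        # in hardware these loops would run concurrently.
        for j in range(lane, cols, n_lanes):
            out[j] = sum(v[i] * M[i][j] for i in range(rows))
    return out

v = [1.0, 2.0]
M = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]
result = vector_matrix_lanes(v, M)   # [1.0, 2.0, 4.0, 7.0]
```

Interleaving columns across lanes keeps the per-lane work balanced regardless of matrix width, which is one plausible reason for the lane organization borrowed from vector architectures.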


MRS Bulletin ◽  
1997 ◽  
Vol 22 (10) ◽  
pp. 49-54 ◽  
Author(s):  
E. Todd Ryan ◽  
Andrew J. McKerrow ◽  
Jihperng Leu ◽  
Paul S. Ho

Continuing improvement in device density and performance has significantly affected the dimensions and complexity of the wiring structure for on-chip interconnects. These enhancements have led to a reduction in the wiring pitch and an increase in the number of wiring levels to meet demands for density and performance. As device dimensions shrink below 0.25 μm, the propagation delay, crosstalk noise, and power dissipation due to resistance-capacitance (RC) coupling become significant. Accordingly, interconnect delay now constitutes a major fraction of the total delay limiting overall chip performance. Equally important is the processing complexity due to the increase in the number of wiring levels, which inevitably drives up cost by lowering manufacturing yield through added defects and processing steps. To address these problems, new materials for use as metal lines and interlayer dielectrics (ILDs), as well as alternative architectures, have surfaced to replace the current Al(Cu)/SiO2 interconnect technology. These alternative architectures will require the introduction of low-dielectric-constant (low-k) materials as the interlayer dielectrics and/or low-resistivity conductors such as copper. The electrical and thermomechanical properties of SiO2 are ideal for ILD applications, and a change to a material with different properties has important process-integration implications. To facilitate the choice of an alternative ILD, it is necessary to establish general criteria for evaluating the thin-film properties of candidate low-k materials, which can later be correlated with process-integration problems.
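A back-of-the-envelope calculation shows why lowering k (or resistivity) directly reduces interconnect delay: in a first-order model the RC product scales linearly with the dielectric constant. The geometry, parallel-plate capacitance formula, and k = 2.7 low-k value below are illustrative assumptions, not figures from the article.

```python
# First-order RC delay of a wire: R = rho * L / (w * t), and a single
# parallel-plate sidewall capacitance C = k * eps0 * L * t / s.
# Illustrative numbers only.

EPS0 = 8.854e-12  # vacuum permittivity, F/m

def rc_delay(rho, k, length, width, thickness, spacing):
    r = rho * length / (width * thickness)          # wire resistance, ohm
    c = k * EPS0 * length * thickness / spacing     # sidewall capacitance, F
    return r * c                                    # delay, s

# Same Al wire geometry, SiO2 (k ~ 3.9) vs. a hypothetical k = 2.7 low-k ILD
geo = dict(length=1e-3, width=0.25e-6, thickness=0.5e-6, spacing=0.25e-6)
delay_sio2 = rc_delay(rho=2.7e-8, k=3.9, **geo)
delay_lowk = rc_delay(rho=2.7e-8, k=2.7, **geo)
ratio = delay_lowk / delay_sio2   # = 2.7 / 3.9, about a 31% reduction
```

Note that R and C scale with length and length-squared effects compound in long global wires, which is why the delay penalty grows as chips scale down while die sizes do not.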


Author(s):  
Xiaomo Jiang ◽  
Craig Foster

Gas turbine simple- and combined-cycle plants are built and operated for high availability, reliability, and performance in order to provide the customer with sufficient operating revenues and reduced fuel costs while enhancing dispatch competitiveness. A tremendous amount of operational data is collected from the everyday operation of a power plant, and turning this data into knowledge, and further into solutions, through advanced state-of-the-art analytics has become an increasingly important but challenging task. This paper presents an integrated system and methodology for this purpose, automating multi-level, multi-paradigm, multi-facet performance monitoring and anomaly detection for heavy-duty gas turbines. The system provides an intelligent platform to drive site-specific performance improvements, mitigate outage risk, rationalize operational patterns, and enhance maintenance schedules and service offerings by taking appropriate proactive actions. The paper also describes the components of the system, including data sensing, hardware and operational anomaly detection, proactive expert actions, site-specific degradation assessment, and water-wash effectiveness monitoring and analytics. As demonstrated in two examples, this remote performance monitoring aims to improve equipment efficiency by converting data into knowledge and solutions that drive value for customers, including lower operating fuel costs and increased power sales and life-cycle value.
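One simple paradigm of the monitoring described above is statistical anomaly detection on a sensor channel. The sketch below is a generic rolling z-score detector, not the paper's method; the temperature trace, window size, and threshold are invented for illustration.

```python
# Hypothetical sketch: flag sensor readings that deviate more than
# z_thresh standard deviations from the preceding window of readings.

from statistics import mean, stdev

def detect_anomalies(readings, window=5, z_thresh=3.0):
    flagged = []
    for i in range(window, len(readings)):
        hist = readings[i - window:i]
        mu, sigma = mean(hist), stdev(hist)
        if sigma > 0 and abs(readings[i] - mu) / sigma > z_thresh:
            flagged.append(i)
    return flagged

# Steady exhaust-temperature trace (degrees C) with one step anomaly at index 8
temps = [520.0, 521.0, 519.5, 520.5, 520.0, 519.8, 520.2, 520.1, 560.0, 520.3]
anomalies = detect_anomalies(temps)   # [8]
```

A production system would combine many channels, physics-based degradation models, and expert rules, as the paper's multi-paradigm framing implies; a single-channel z-score is only the simplest building block.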


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 253
Author(s):  
Yosang Jeong ◽  
Hoon Ryu

The non-equilibrium Green’s function (NEGF) formalism is utilized in the field of nanoscience to predict transport behaviors of electronic devices. This work explores how much performance improvement can be obtained for quantum transport simulations with the aid of manycore computing, where the core numerical operation involves a recursive process of matrix multiplication. The major techniques adopted for performance enhancement are data restructuring, matrix tiling, thread scheduling, and offload computing, and we present technical details on how they are applied to optimize the performance of simulations on computing hardware including Intel Xeon Phi Knights Landing (KNL) systems and NVIDIA general-purpose graphics processing unit (GPU) devices. With a target structure of a silicon nanowire that consists of 100,000 atoms and is described with an atomistic tight-binding model, the effects of the optimization techniques on simulation performance are rigorously tested in a KNL node equipped with two Quadro GV100 GPU devices, and we observe that computation is accelerated by a factor of up to ∼20 against the unoptimized case. The feasibility of handling large-scale workloads in a huge computing environment is also examined with nanowire simulations over a wide energy range, where good scalability is achieved up to 2048 KNL nodes.
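The "recursive process of matrix multiplication" at the heart of such NEGF solvers can be illustrated in its simplest form. The sketch below reduces the recursive Green's function forward sweep to scalar (1×1) blocks for a 1D tight-binding chain; the energy, hopping, and chain length are illustrative, and real solvers like the one in this work recurse over large dense blocks, which is what the tiling and offloading optimizations target.

```python
# Scalar sketch of the NEGF forward recursion for a 1D tight-binding chain:
#   g_i = [ (E + i*eta) - h_i - t^2 * g_{i-1} ]^-1
# In production codes h_i and g_i are dense matrix blocks and the t^2 * g
# term is a pair of matrix multiplications per site.

def left_connected_greens(E, onsite, hopping, eta=1e-6):
    z = complex(E, eta)           # small imaginary part for stability
    g = []
    for i, h in enumerate(onsite):
        sigma = hopping ** 2 * g[-1] if i > 0 else 0.0
        g.append(1.0 / (z - h - sigma))
    return g

# Outside the band (|E| > 2t) the recursion converges to the semi-infinite
# lead surface value (E - sqrt(E^2 - 4 t^2)) / (2 t^2).
g_chain = left_connected_greens(E=3.0, onsite=[0.0] * 100, hopping=1.0)
```

Because each site's block depends on the previous site's result, the recursion itself is sequential; the parallelism exploited on KNL and GPUs comes from within each block-level matrix multiplication and from running many independent energy points concurrently.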


AIHA Journal ◽  
2003 ◽  
Vol 64 (5) ◽  
pp. 660-667 ◽  
Author(s):  
Katharyn A. Grant ◽  
John G. Garland ◽  
Todd C. Joachim ◽  
Andrew Wallen ◽  
Twyla Vital

Author(s):  
K. Boddenberg ◽  
B. Kock ◽  
M. Dorfman ◽  
L. Russo ◽  
M. Nestler

Air separation plants use centrifugal compressors in which air and electrical energy are the only raw materials used in the production process. Energy costs therefore play a crucial role, and the compressors are heavily penalized when guaranteed performance levels are not achieved. To improve performance, abradable coatings, previously used in the gas turbine industry, have been designed into turbocompressors. This paper shows the optimization of, and the performance improvements from, a new aluminium silicon-boron nitride material.

