Memory Access Optimization of a Neural Network Accelerator Based on Memory Controller

Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 438
Author(s):  
Rongshan Wei ◽  
Chenjia Li ◽  
Chuandong Chen ◽  
Guangyu Sun ◽  
Minghua He

Special-purpose accelerator architectures have achieved great success in processor design and are a growing trend in computer architecture. However, because the memory access pattern of an accelerator is relatively complicated, its memory access performance is relatively poor, which limits the overall performance gains of hardware accelerators. Moreover, memory controllers for hardware accelerators have scarcely been researched. We consider a dedicated accelerator memory controller essential for improving memory access performance. To this end, we propose a dynamic random access memory (DRAM) controller for neural network accelerators, called NNAMC, which monitors the accelerator's memory access stream and directs it to the bank whose address mapping scheme best suits the memory access characteristics. NNAMC includes a stream access prediction unit (SAPU), which analyzes in hardware the type of data stream accessed by the accelerator, and it assigns address mappings to different banks using a bank partitioning model (BPM). The image mapping method and hardware architecture were analyzed in a practical neural network accelerator. In the experiments, NNAMC achieved significantly lower access latency than competing address mapping schemes: it increased the row buffer hit ratio by 13.68% on average (up to 26.17%), reduced the system access latency by 26.3% on average (up to 37.68%), and lowered the hardware cost. We also confirmed that NNAMC adapts efficiently to different network parameters.
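To make the row buffer hit ratio mentioned in this abstract concrete, the following is a minimal sketch, not NNAMC's actual design. It assumes a hypothetical DRAM geometry (`NUM_BANKS`, `COLS_PER_ROW`), a simple row:bank:column mapping (`row_interleaved`), and an open-row policy in which each bank keeps its last accessed row open. It compares two interleaved access streams that land in the same bank with the same streams placed in separate banks, which is the intuition behind partitioning banks by stream.

```python
from typing import Callable, Dict, Iterable, List, Tuple

NUM_BANKS = 8        # hypothetical bank count
COLS_PER_ROW = 1024  # hypothetical number of addressable columns per DRAM row


def row_interleaved(addr: int) -> Tuple[int, int]:
    """Map an address to (bank, row): a full row is filled before moving to the next bank."""
    block = addr // COLS_PER_ROW
    return block % NUM_BANKS, block // NUM_BANKS


def row_buffer_hit_ratio(
    addrs: Iterable[int], mapping: Callable[[int], Tuple[int, int]]
) -> float:
    """Fraction of accesses that find their row already open in the target bank."""
    open_rows: Dict[int, int] = {}  # bank -> row currently held in its row buffer
    hits = total = 0
    for addr in addrs:
        bank, row = mapping(addr)
        if open_rows.get(bank) == row:
            hits += 1
        open_rows[bank] = row  # open-row policy: the accessed row stays open
        total += 1
    return hits / total if total else 0.0


def interleave(base_a: int, base_b: int, n: int) -> List[int]:
    """Alternate between two sequential streams starting at base_a and base_b."""
    return [base + i for i in range(n) for base in (base_a, base_b)]


# Two streams whose rows collide in bank 0 versus the same streams in banks 0 and 1.
same_bank = interleave(0, NUM_BANKS * COLS_PER_ROW, 1024)
split_banks = interleave(0, COLS_PER_ROW, 1024)

if __name__ == "__main__":
    print(row_buffer_hit_ratio(same_bank, row_interleaved))
    print(row_buffer_hit_ratio(split_banks, row_interleaved))
```

In this toy setting the colliding streams never hit the row buffer, while the separated streams miss only on the first access to each bank. A real controller would also have to model timing constraints, refresh, and request scheduling, none of which are represented here.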

2018 ◽  
Vol 27 (08) ◽  
pp. 1850126
Author(s):  
Dong-Ik Jeon ◽  
Min-Kyu Lee ◽  
Ji-Chan Kim ◽  
Ki-Seok Chung

The main memory system has become crucial not only because it has to meet an increasing bandwidth requirement, but also because it has to seamlessly support many concurrently executing applications. To improve memory performance, a memory controller with efficient arbitration is necessary. It is well known that memory performance depends on the memory access patterns. Offline performance analysis has difficulty analyzing Dynamic Random Access Memory (DRAM) performance accurately because a huge set of trace patterns is needed. This paper proposes a novel profiler that is synthesized with a memory controller in order to monitor and analyze the memory controller performance at runtime. Five key metrics for performance evaluation are defined, and they are monitored and evaluated at runtime by the proposed profiler. A prototype system with a processor core, a memory controller, DRAM modules, and peripheral devices is implemented on a field-programmable gate array (FPGA) board to carry out the experiments. It has been observed that the worst latency overhead differs for each benchmark. In addition, a new overall overhead estimation method is proposed to estimate the memory access latency overhead in time, and this method can be used to evaluate the performance of a given memory arbitration method depending on the running applications.


2013 ◽  
Vol 41 (3) ◽  
pp. 380-391 ◽  
Author(s):  
Young Hoon Son ◽  
O. Seongil ◽  
Yuhwan Ro ◽  
Jae W. Lee ◽  
Jung Ho Ahn

Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1454
Author(s):  
Yoshihiro Sugiura ◽  
Toru Tanzawa

This paper describes how one can reduce the memory access time with pre-emphasis (PE) pulses even in non-volatile random-access memory. Optimum PE pulse widths and resultant minimum word-line (WL) delay times are investigated as a function of column address. The impact on optimum PE pulses of process variation in the time constant of WL, the cell current, and the resistance of deciding path is discussed. Optimum PE pulse widths and resultant minimum WL delay times are modeled with fitting curves as a function of the column address of the accessed memory cell, which lets designers set the optimum timing for WL and bit-line (BL) operations and thereby reduce the average memory access time.


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 816
Author(s):  
Pingping Liu ◽  
Xiaokang Yang ◽  
Baixin Jin ◽  
Qiuzhan Zhou

Diabetic retinopathy (DR) is a common complication of diabetes mellitus (DM), and it is necessary to diagnose DR in the early stages of treatment. With the rapid development of convolutional neural networks in image processing, deep learning methods have achieved great success in medical image processing, and various medical lesion detection systems have been proposed to detect fundus lesions. At present, however, the image classification process for diabetic retinopathy ignores the fine-grained properties of the diseased image, and most retinopathy image data sets are severely unevenly distributed, which greatly limits the ability of the network to predict the classification of lesions. We propose a new non-homologous bilinear pooling convolutional neural network model and combine it with the attention mechanism to further improve the network’s ability to extract specific features of the image. The experimental results show that, compared with the most popular fundus image classification models, the proposed network model greatly improves prediction accuracy while maintaining computational efficiency.
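As a rough illustration of the bilinear pooling step this abstract refers to, the sketch below combines two feature maps by averaging the outer product of their channel vectors over all spatial locations, followed by the signed square root and L2 normalization commonly used with bilinear pooling. It assumes that "non-homologous" means the two feature maps come from different backbone networks. The function name `bilinear_pool` and the array shapes are hypothetical, and the attention mechanism is not modeled; this is not the authors' implementation.

```python
import numpy as np


def bilinear_pool(feat_a: np.ndarray, feat_b: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Bilinear pooling of two feature maps with shapes (C_a, H, W) and (C_b, H, W).

    Returns a normalized descriptor of length C_a * C_b.
    """
    if feat_a.shape[1:] != feat_b.shape[1:]:
        raise ValueError("feature maps must share the same spatial size")
    c_a, h, w = feat_a.shape
    c_b = feat_b.shape[0]

    # Flatten spatial dimensions: one column per image location.
    a = feat_a.reshape(c_a, h * w)
    b = feat_b.reshape(c_b, h * w)

    # Average over locations of the outer product of the two channel vectors.
    pooled = (a @ b.T) / (h * w)

    # Signed square root, then L2 normalization.
    vec = pooled.reshape(-1)
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    return vec / (np.linalg.norm(vec) + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for the outputs of two different (assumed) backbone networks.
    descriptor = bilinear_pool(rng.standard_normal((4, 3, 3)), rng.standard_normal((5, 3, 3)))
    print(descriptor.shape)
```

The resulting descriptor captures pairwise channel interactions between the two feature extractors, which is why bilinear pooling is often used for fine-grained classification; a classifier layer would normally be applied on top of it.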


2012 ◽  
Vol 198-199 ◽  
pp. 523-527
Author(s):  
Fang Yuan Chen ◽  
Dong Song Zhang ◽  
Zhi Ying Wang

Worst-Case Execution Time (WCET) is crucial in real-time systems, and estimating it is very challenging in multicore processors because of possible runtime inter-thread interferences caused by shared resources. This paper proposes a novel approach to analyze runtime inter-core interferences for consecutive or inconsecutive concurrent programs. Our approach can reasonably estimate runtime inter-core interferences in the shared cache by introducing lifetime and instruction fetching timing relations analysis into the address mapping method. Compared with the method based on lifetime alone, our proposed approach efficiently improves the tightness of WCET estimation.

