Design and Implementation of Sharable Multi-Channel On-Chip Memory for Embedded CMP System

2011 ◽  
Vol 217-218 ◽  
pp. 1147-1152 ◽  
Author(s):  
Cai Xia Liu ◽  
Zhi Bin Zhang ◽  
Feng Qi Wei ◽  
Xiao Dong Xu

A shared multi-channel on-chip memory architecture (SMC-OCM) for embedded CMPs is proposed in this article. To implement the SMC-OCM architecture, a sharable multi-channel on-chip memory (MC-OCM) is designed and implemented on an FPGA. The multiple data channels of the MC-OCM ensure good parallel responsiveness of the SMC-OCM system. Experiments show that the access latency of SMC-OCM is lower than that of state-of-the-art designs, and that the SMC-OCM architecture satisfies the memory-system performance requirements of embedded applications.
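
As an illustration only, and not the authors' actual FPGA design, the C sketch below shows how a simple interleaved address map can spread consecutive blocks across independent on-chip memory channels so that requests from several cores can be served in parallel. NUM_CHANNELS, INTERLEAVE_BYTES, and both helper functions are hypothetical names and parameters, not taken from the paper.

    #include <stdio.h>

    /* Hypothetical parameters, not taken from the paper. */
    #define NUM_CHANNELS      4u    /* independent data channels of the MC-OCM */
    #define INTERLEAVE_BYTES  64u   /* interleaving granularity in bytes       */

    /* Map an address to a channel so that consecutive 64-byte blocks land on
     * different channels, letting several cores access on-chip memory in parallel. */
    static unsigned channel_of(unsigned addr)
    {
        return (addr / INTERLEAVE_BYTES) % NUM_CHANNELS;
    }

    /* Offset of the address inside its channel's local address space. */
    static unsigned offset_in_channel(unsigned addr)
    {
        unsigned block = addr / INTERLEAVE_BYTES;
        return (block / NUM_CHANNELS) * INTERLEAVE_BYTES + addr % INTERLEAVE_BYTES;
    }

    int main(void)
    {
        for (unsigned addr = 0; addr < 8 * INTERLEAVE_BYTES; addr += INTERLEAVE_BYTES)
            printf("addr 0x%03x -> channel %u, offset 0x%03x\n",
                   addr, channel_of(addr), offset_in_channel(addr));
        return 0;
    }

How addresses are actually distributed over the MC-OCM channels may well differ; the point is only that a channel-interleaved map is what allows independent requests to proceed concurrently.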

2020 ◽  
Vol 12 (2) ◽  
pp. 116-121
Author(s):  
Rastislav Struharik ◽  
Vuk Vranjković

Data movement between Convolutional Neural Network (CNN) accelerators and off-chip memory is a critical contributor to overall power consumption, and minimizing it is particularly important for low-power embedded applications. Specific CNN compute patterns offer the possibility of significant data reuse, which leads to the idea of using specialized on-chip cache memories that can substantially reduce power consumption. However, due to the unique caching pattern present within CNNs, standard cache memories are not efficient. In this paper, a novel on-chip cache memory architecture based on input feature map striping is proposed, which requires significantly less on-chip memory than previously proposed solutions. Experimental results show that the proposed cache architecture can reduce on-chip memory size by a factor of 16 or more, while increasing power consumption by no more than 15%, compared to some of the previously proposed solutions.
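
For intuition only, the small C program below illustrates why striping the input feature map shrinks the on-chip buffer: with a K×K convolution, only the K input rows currently covered by the sliding window need to be resident, and they can be kept in a ring of row buffers refilled one row at a time. The dimensions H, W, C, and K are illustrative assumptions, not values from the paper, and the reduction reported there depends on the baselines being compared.

    #include <stdio.h>

    /* Illustrative feature-map and kernel dimensions (assumptions, not from the paper). */
    enum { H = 224, W = 224, C = 64, K = 3 };  /* input height, width, channels, kernel size */

    int main(void)
    {
        /* Full input feature map kept on chip: H * W * C values.              */
        /* Striped cache: only the K rows under the sliding window, reused as  */
        /* the window moves down one row at a time (ring of K row buffers).    */
        unsigned long full_buffer    = (unsigned long)H * W * C;
        unsigned long striped_buffer = (unsigned long)K * W * C;

        printf("full on-chip buffer : %lu values\n", full_buffer);
        printf("striped buffer      : %lu values\n", striped_buffer);
        printf("reduction factor    : ~%lux\n", full_buffer / striped_buffer);
        return 0;
    }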


2012 ◽  
Vol 629 ◽  
pp. 542-547
Author(s):  
Cai Xia Liu ◽  
Xiao Qing Tian ◽  
Zhi Bin Zhang

A shared multi-channel on-chip memory CMP architecture is proposed in this article to efficiently support embedded applications. Because the multi-channel on-chip memory is a scarce resource, an optimal space management mechanism is proposed for it, comprising an automatic space allocation strategy based on the application's parallelization mapping pattern and an optimal space utilization scheme. An ILP-model-based analysis of system performance verifies that the proposed space management mechanism can fully exploit the multi-channel on-chip memory to improve system performance.
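
As a hedged illustration of the kind of model involved, and not the paper's actual formulation, the allocation decision can be viewed as a knapsack-style ILP: choose which data objects to place in the scarce multi-channel on-chip memory so that the estimated benefit is maximized without exceeding capacity,

    \max \sum_{i=1}^{n} b_i x_i \quad \text{s.t.} \quad \sum_{i=1}^{n} s_i x_i \le S, \qquad x_i \in \{0,1\},

where b_i is the estimated access-cost saving of placing data object i on chip, s_i is its size, S is the on-chip capacity, and x_i indicates whether object i is allocated. These symbols are generic placeholders; the paper's model must additionally reflect the parallelization mapping pattern and the per-channel organization, which a single-capacity knapsack does not capture.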


Author(s):  
Junfeng Song ◽  
Xianshu Luo ◽  
Yanzhe Tang ◽  
Chao Li ◽  
Lianxi Jia ◽  
...  

Author(s):  
J. Santhi ◽  
L. Srinivas

Multi-pattern matching is known to require intensive memory accesses and is often a performance bottleneck. Hence, specialized hardware-accelerated algorithms are being developed for line-speed packet processing. While several pattern matching algorithms have already been developed for such applications, we find that most of them suffer from scalability issues. We present a hardware-implementable pattern matching algorithm for content filtering applications that is scalable in terms of speed, the number of patterns, and the pattern length. We modify the classic Aho-Corasick algorithm to consider multiple characters at a time for higher throughput. Furthermore, we suppress a large fraction of memory accesses by using Bloom filters implemented with a small amount of on-chip memory. The resulting algorithm can support matching of several thousand patterns at more than 10 Gbps with less than 50 Kbytes of embedded memory and a few megabytes of external SRAM.
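
Purely as a sketch of the filtering idea, and not the paper's hardware design, the C program below models the on-chip Bloom filter that gates accesses to external memory: the expensive lookup in the off-chip multi-character Aho-Corasick transition table would be issued only when the filter reports a possible match. The hash function, bit-array size, and example strings are illustrative assumptions.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative parameters, not the paper's configuration. */
    #define BLOOM_BITS   (256u * 1024u)   /* 32 KB bit array, modelling on-chip memory */
    #define BLOOM_HASHES 4u

    static uint8_t bloom[BLOOM_BITS / 8];

    /* Salted FNV-1a hash (an illustrative choice of hash family). */
    static uint32_t bloom_hash(const uint8_t *key, size_t len, uint32_t salt)
    {
        uint32_t h = 2166136261u ^ salt;
        for (size_t i = 0; i < len; i++) { h ^= key[i]; h *= 16777619u; }
        return h % BLOOM_BITS;
    }

    /* Programmed once with the pattern substrings handled per matching step. */
    static void bloom_insert(const uint8_t *key, size_t len)
    {
        for (uint32_t k = 0; k < BLOOM_HASHES; k++) {
            uint32_t b = bloom_hash(key, len, k);
            bloom[b / 8] |= (uint8_t)(1u << (b % 8));
        }
    }

    /* Returns false only when no stored substring can match, so the off-chip
     * lookup of the Aho-Corasick transition can safely be skipped. */
    static bool bloom_maybe_contains(const uint8_t *key, size_t len)
    {
        for (uint32_t k = 0; k < BLOOM_HASHES; k++) {
            uint32_t b = bloom_hash(key, len, k);
            if (!(bloom[b / 8] & (1u << (b % 8))))
                return false;
        }
        return true;   /* possible match: go to external SRAM */
    }

    int main(void)
    {
        bloom_insert((const uint8_t *)"http", 4);   /* example 4-character blocks */
        bloom_insert((const uint8_t *)"evil", 4);

        printf("\"evil\" -> %s\n", bloom_maybe_contains((const uint8_t *)"evil", 4)
               ? "maybe stored, check SRAM" : "skip SRAM access");
        printf("\"good\" -> %s\n", bloom_maybe_contains((const uint8_t *)"good", 4)
               ? "maybe stored, check SRAM" : "skip SRAM access");
        return 0;
    }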


2013 ◽  
pp. 140-159
Author(s):  
Giorgio C. Buttazzo

The number of computer-controlled systems in our daily life has increased dramatically. Processors and microcontrollers are embedded in most of the devices we use every day, such as mobile phones, cameras, media players, navigators, washing machines, biomedical devices, and cars. The complexity of such systems is increasing exponentially, driven by the demand for new products with extra functionality, higher performance requirements, and lower energy consumption. To cope with such a complex scenario, many embedded systems are adopting more powerful and highly integrated hardware components, such as multi-core systems, network-on-chip architectures, inertial subsystems, and special-purpose co-processors. However, developing, analyzing, and testing application software on these architectures is not easy, and new methodologies are being investigated in the research community to guarantee high predictability and efficiency in next-generation embedded devices. This chapter presents some recent approaches proposed within the real-time research community aimed at achieving predictability, high modularity, efficiency, and adaptability in modern embedded computing systems.

