Fast and Accurate Code Placement of Embedded Software for Hybrid On-Chip Memory Architecture

Author(s):  
Zimeng Zhou ◽  
Lei Ju ◽  
Zhiping Jia ◽  
Xin Li
Author(s):  
Katalin Popovici ◽  
Frédéric Rousseau ◽  
Ahmed A. Jerraya ◽  
Marilyn Wolf

2020 ◽  
Vol 12 (2) ◽  
pp. 116-121
Author(s):  
Rastislav Struharik ◽  
Vuk Vranjković

Data movement between Convolutional Neural Network (CNN) accelerators and off-chip memory is a major contributor to overall power consumption, and minimizing it is particularly important for low-power embedded applications. Specific CNN compute patterns offer the possibility of significant data reuse, leading to the idea of using specialized on-chip cache memories that can substantially reduce power consumption. However, due to the unique caching pattern present within CNNs, standard cache memories are not efficient. In this paper, a novel on-chip cache memory architecture based on the idea of input feature map striping is proposed, which requires significantly fewer on-chip memory resources than previously proposed solutions. Experimental results show that the proposed cache architecture can reduce on-chip memory size by a factor of 16 or more, while increasing power consumption by no more than 15%, compared to some of the previously proposed solutions.
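
To make the size argument concrete, the sketch below estimates the on-chip buffer needed when an input feature map is cached as a horizontal stripe of rows rather than in full. It is only an illustration of the striping idea under assumed layer dimensions and parameter names (stripe_rows, kernel_rows); it is not the cache architecture evaluated in the paper.

    # Illustrative sketch only (not the paper's design): compares the on-chip
    # buffer for a full input feature map with that for a stripe of rows plus
    # the halo rows a convolution window needs. All sizes are example values.

    def buffer_bytes_full(height, width, channels, bytes_per_elem=1):
        """On-chip bytes needed to hold a complete input feature map."""
        return height * width * channels * bytes_per_elem

    def buffer_bytes_striped(stripe_rows, width, channels, kernel_rows=3, bytes_per_elem=1):
        """On-chip bytes for a stripe: the stripe itself plus (kernel_rows - 1)
        halo rows kept so the convolution window stays on-chip."""
        return (stripe_rows + kernel_rows - 1) * width * channels * bytes_per_elem

    if __name__ == "__main__":
        H, W, C = 224, 224, 64                       # example layer size
        full = buffer_bytes_full(H, W, C)
        striped = buffer_bytes_striped(stripe_rows=8, width=W, channels=C)
        print(f"full feature map buffer : {full / 1024:.0f} KiB")
        print(f"striped buffer          : {striped / 1024:.0f} KiB")
        print(f"reduction factor        : {full / striped:.1f}x")

With these example dimensions the stripe buffer comes out roughly 20x smaller than the full-feature-map buffer, which is in the same spirit as the 16x reduction reported above, although the exact factor depends on the layer and stripe sizes chosen.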


Author(s):  
J. Santhi ◽  
L. Srinivas

Multi-pattern matching is known to require intensive memory accesses and is often a performance bottleneck. Hence, specialized hardware-accelerated algorithms are being developed for line-speed packet processing. While several pattern matching algorithms have already been developed for such applications, we find that most of them suffer from scalability issues. We present a hardware-implementable pattern matching algorithm for content filtering applications that is scalable in terms of speed, the number of patterns, and the pattern length. We modify the classic Aho-Corasick algorithm to consider multiple characters at a time for higher throughput. Furthermore, we suppress a large fraction of memory accesses by using Bloom filters implemented with a small amount of on-chip memory. The resulting algorithm can match several thousand patterns at more than 10 Gbps using less than 50 KB of embedded memory and a few megabytes of external SRAM.
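
The following is a minimal software sketch of the Bloom-filter pre-check idea described above: an on-chip Bloom filter built from pattern prefixes is queried at each input offset, and the slower exact match (standing in for the off-chip automaton lookup) runs only on a Bloom hit. The hash choice, filter size, and the use of a plain prefix comparison instead of the paper's modified multi-character Aho-Corasick automaton are assumptions made for illustration.

    # Minimal sketch (not the paper's hardware design): a Bloom filter over
    # pattern prefixes filters out most positions before the exact match step.

    import hashlib

    class BloomFilter:
        def __init__(self, size_bits=8192, num_hashes=4):
            self.size = size_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(size_bits // 8)

        def _positions(self, item: bytes):
            for i in range(self.num_hashes):
                h = hashlib.sha256(i.to_bytes(1, "big") + item).digest()
                yield int.from_bytes(h[:4], "big") % self.size

        def add(self, item: bytes):
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def might_contain(self, item: bytes) -> bool:
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def scan(text: bytes, patterns):
        """Query the 'on-chip' Bloom filter at each offset; only a hit triggers
        the exact ('off-chip') pattern comparison."""
        prefix_len = min(len(p) for p in patterns)
        bloom = BloomFilter()
        for p in patterns:
            bloom.add(p[:prefix_len])

        matches, exact_lookups = [], 0
        for i in range(len(text) - prefix_len + 1):
            if bloom.might_contain(text[i:i + prefix_len]):
                exact_lookups += 1
                for p in patterns:          # exact check stands in for the automaton
                    if text.startswith(p, i):
                        matches.append((i, p.decode()))
        return matches, exact_lookups

    if __name__ == "__main__":
        pats = [b"attack", b"worm", b"virus"]
        print(scan(b"benign traffic ... worm payload ... attack", pats))

Because a Bloom filter has no false negatives, the pre-check never discards a real match; false positives only cost an occasional extra exact lookup, which is why a small on-chip filter can suppress most accesses to the larger external memory.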


2020 ◽  
Vol 25 (3) ◽  
pp. 46 ◽  
Author(s):  
Mario Kovač ◽  
Philippe Notton ◽  
Daniel Hofman ◽  
Josip Knezović

In this paper, we present an overview of the European Processor Initiative (EPI), one of the cornerstones of the EuroHPC Joint Undertaking, a new European Union strategic entity focused on pooling the Union's and national resources on HPC to acquire, build, and deploy the most powerful supercomputers in the world within Europe. EPI started its activities in December 2018. Its first three years brought together processor and platform designers and experts in embedded software, middleware, applications, and usage from 10 EU countries to co-design Europe's first HPC Systems on Chip and accelerators around its unique Common Platform (CP) technology. One of EPI's core activities also takes place in the automotive sector, providing architectural solutions for a novel embedded high-performance computing (eHPC) platform and ensuring the overall economic viability of the initiative.

