Scratchpad memory: a design alternative for cache on-chip memory in embedded systems

Author(s):  
R. Banakar ◽  
S. Steinke ◽  
Bo-Sik Lee ◽  
M. Balakrishnan ◽  
P. Marwedel

Author(s):  
Kavita Tabbassum ◽  
Shah Nawaz Talpur ◽  
Sanam Narejo ◽  
Noor-u-Zaman Leghari

Consuming the conventional approaches, processors are incapable to achieve effective energy reduction. In upcoming processors on-chip memory system will be the major restriction. On-chip memories are managed by the software SMCs (Software Managed Chips), and are work with caches (on-chip), where inside a block of caches software can explicitly read as well as write specific or complete memory references, or work separately just like scratchpad memory. In embedded systems Scratch memory is generally used as an addition to caches or as a substitute of cache, but due to their comprehensive ease of programmability cache containing architectures are still to be chosen in numerous applications. In contrast to conventional caches in embedded schemes because of their better energy and silicon range effectiveness SPM (Scratch-Pad Memories) are being progressively used. Power consumption of ported applications can significantly be lower as well as portability of scratchpad architectures will be advanced with the language agnostic software management method which is suggested in this manuscript. To enhance the memory configuration and optimization on relevant architectures based on SPM, the variety of current methods are reviewed for finding the chances of optimizations and usage of new methods as well as their applicability to numerous schemes of memory management are also discussed in this paper.



Author(s):  
A. Ferrerón Labari ◽  
D. Suárez Gracia ◽  
V. Viñals Yúfera

In the last years, embedded systems have evolved so that they offer capabilities we could only find before in high performance systems. Portable devices already have multiprocessors on-chip (such as PowerPC 476FP or ARM Cortex A9 MP), usually multi-threaded, and a powerful multi-level cache memory hierarchy on-chip. As most of these systems are battery-powered, the power consumption becomes a critical issue. Achieving high performance and low power consumption is a high complexity challenge where some proposals have been already made. Suarez et al. proposed a new cache hierarchy on-chip, the LP-NUCA (Low Power NUCA), which is able to reduce the access latency taking advantage of NUCA (Non-Uniform Cache Architectures) properties. The key points are decoupling the functionality, and utilizing three specialized networks on-chip. This structure has been proved to be efficient for data hierarchies, achieving a good performance and reducing the energy consumption. On the other hand, instruction caches have different requirements and characteristics than data caches, contradicting the low-power embedded systems requirements, especially in SMT (simultaneous multi-threading) environments. We want to study the benefits of utilizing small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. Thus, we need to re-evaluate completely our previous design in terms of structure design, interconnection networks (including topologies, flow control and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In CMP environments (chip multiprocessors) with parallel workloads, coherence plays an important role, and must be taken into consideration.



Author(s):  
Nilanjan Mukherjee ◽  
Artur Pogiel ◽  
Janusz Rajski ◽  
Jerzy Tyszer
Keyword(s):  


Author(s):  
Xiaohan Tao ◽  
Jianmin Pang ◽  
Jinlong Xu ◽  
Yu Zhu

AbstractThe heterogeneous many-core architecture plays an important role in the fields of high-performance computing and scientific computing. It uses accelerator cores with on-chip memories to improve performance and reduce energy consumption. Scratchpad memory (SPM) is a kind of fast on-chip memory with lower energy consumption compared with a hardware cache. However, data transfer between SPM and off-chip memory can be managed only by a programmer or compiler. In this paper, we propose a compiler-directed multithreaded SPM data transfer model (MSDTM) to optimize the process of data transfer in a heterogeneous many-core architecture. We use compile-time analysis to classify data accesses, check dependences and determine the allocation of data transfer operations. We further present the data transfer performance model to derive the optimal granularity of data transfer and select the most profitable data transfer strategy. We implement the proposed MSDTM on the GCC complier and evaluate it on Sunway TaihuLight with selected test cases from benchmarks and scientific computing applications. The experimental result shows that the proposed MSDTM improves the application execution time by 5.49$$\times$$ × and achieves an energy saving of 5.16$$\times$$ × on average.



Micromachines ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1450
Author(s):  
Xiang Wang ◽  
Zhun Zhang ◽  
Qiang Hao ◽  
Dongdong Xu ◽  
Jiqing Wang ◽  
...  

The hardware security of embedded systems is raising more and more concerns in numerous safety-critical applications, such as in the automotive, aerospace, avionic, and railway systems. Embedded systems are gaining popularity in these safety-sensitive sectors with high performance, low power, and great reliability, which are ideal control platforms for executing instruction operation and data processing. However, modern embedded systems are still exposing many potential hardware vulnerabilities to malicious attacks, including software-level and hardware-level attacks; these can cause program execution failure and confidential data leakage. For this reason, this paper presents a novel embedded system by integrating a hardware-assisted security monitoring unit (SMU), for achieving a reinforced system-on-chip (SoC) on ensuring program execution and data processing security. This architecture design was implemented and evaluated on a Xilinx Virtex-5 FPGA development board. Based on the evaluation of the SMU hardware implementation in terms of performance overhead, security capability, and resource consumption, the experimental results indicate that the SMU does not lead to a significant speed degradation to processor while executing different benchmarks, and its average performance overhead reduces to 2.18% on typical 8-KB I/D-Caches. Security capability evaluation confirms the monitoring effectiveness of SMU against both instruction and data tampering attacks. Meanwhile, the SoC satisfies a good balance between high-security and resource overhead.



Author(s):  
Haoyuan Ying ◽  
Klaus Hofmann ◽  
Thomas Hollstein

Due to the growing demand on high performance and low power in embedded systems, many core architectures are proposed the most suitable solutions. While the design concentration of many core embedded systems is switching from computation-centric to communication-centric, Network-on-Chip (NoC) is one of the best interconnect techniques for such architectures because of the scalability and high communication bandwidth. Formalized and optimized system-level design methods for NoC-based many core embedded systems are desired to improve the system performance and to reduce the power consumption. In order to understand the design optimization methods in depth, a case study of optimizing many core embedded systems based on 3-Dimensional (3D) NoC with irregular vertical link distribution topology through task mapping, core placement, routing, and topology generation is demonstrated in this chapter. Results of cycle-accurate simulation experiments prove the validity and efficiency of the design methods. Specific to the case study configuration, in maximum 60% vertical links can be saved while maintaining the system efficiency in comparison to full vertical link connection 3D NoCs by applying the design optimization methods.



DYNA ◽  
2017 ◽  
Vol 84 (201) ◽  
pp. 202 ◽  
Author(s):  
Maribell Sacanamboy Franco ◽  
Freddy Bolaños-Martinez ◽  
Álvaro Bernal-Noreña ◽  
Rubén Nieto-Londoño

Los sistemas de red en chip (NoC) fueron desarrollados originalmente para proporcionar un alto rendimiento, mediante la disponibilidad de varias unidades de procesamiento, conectadas a través de una red cableada dentro del circuito integrado. Wireless NoC (WiNoC o WNoC) son una evolución natural de los sistemas NoC, que integran una comunicación jerárquica dentro del chip para mejorar la escalabilidad. El mapeo de tareas en los sistemas WNoC representa un proceso desafiante, que a menudo implica varios objetivos de optimización, como potencia, rendimiento, productividad, uso de recursos y métricas de red. Este artículo describe un algoritmo genético basado en un enfoque para encontrar soluciones óptimas de asignación de tareas en tiempo de diseño, para sistemas embebidos que trabajan sobre un WiNoC. Los objetivos de optimización fueron: Aceleración, Consumo de Energía y Ancho de Banda. La red de destino utilizada para la simulación puede ser vista como un WiNoC jerárquica de dos niveles. El primer nivel corresponde a un conjunto de subredes que están conectadas por cables y son de tipo malla. El segundo nivel corresponde a una topología en estrella de enlaces inalámbricos, que conectan las subredes de primer nivel. El algoritmo propuesto muestra un buen desempeño en relación con los objetivos de optimización y la WiNoC heterogéneo simulada.



Author(s):  
Da-Wei Chang ◽  
Ing-Chao Lin ◽  
Yu-Shiang Chien ◽  
Chin-Lun Lin ◽  
Alvin W.-Y Su ◽  
...  


IET Software ◽  
2013 ◽  
Vol 7 (1) ◽  
pp. 56-64 ◽  
Author(s):  
Padraig Fogarty ◽  
Ciaran MacNamee ◽  
Donal Heffernan


2014 ◽  
Vol 29 (4) ◽  
pp. 433-451
Author(s):  
Huafeng Yu ◽  
Abdoulaye Gamatié ◽  
Éric Rutten ◽  
Jean-Luc Dekeyser

AbstractSystem adaptivity is increasingly demanded in high-performance embedded systems, particularly in multimedia system-on-chip (SoC), owing to growing quality-of-service requirements. This paper presents a reactive control model that has been introduced in Gaspard, our framework dedicated to SoC hardware/software co-design. This model aims at expressing adaptivity as well as reconfigurability in systems performing data-intensive computations. It is generic enough to be used for description in the different parts of an embedded system, for example, specification of how different data-intensive algorithms can be chosen according to some computation modes at the functional level; and expression of how hardware components can be selected via the usage of a library of intellectual properties according to execution performances. The transformation of this model toward synchronous languages is also presented, in order to allow an automatic code generation usable for formal verification, based on techniques such as model checking and controller synthesis, as illustrated in the paper. This work, based on Model-Driven Engineering and the standard UML MARTE profile, has been implemented in Gaspard.



Sign in / Sign up

Export Citation Format

Share Document