Efficient Instruction and Data Caching for High Performance Embedded Processors

Author(s):  
A. Ferrerón Labari ◽  
D. Suárez Gracia ◽  
V. Viñals Yúfera

In the last years, embedded systems have evolved so that they offer capabilities we could only find before in high performance systems. Portable devices already have multiprocessors on-chip (such as PowerPC 476FP or ARM Cortex A9 MP), usually multi-threaded, and a powerful multi-level cache memory hierarchy on-chip. As most of these systems are battery-powered, the power consumption becomes a critical issue. Achieving high performance and low power consumption is a high complexity challenge where some proposals have been already made. Suarez et al. proposed a new cache hierarchy on-chip, the LP-NUCA (Low Power NUCA), which is able to reduce the access latency taking advantage of NUCA (Non-Uniform Cache Architectures) properties. The key points are decoupling the functionality, and utilizing three specialized networks on-chip. This structure has been proved to be efficient for data hierarchies, achieving a good performance and reducing the energy consumption. On the other hand, instruction caches have different requirements and characteristics than data caches, contradicting the low-power embedded systems requirements, especially in SMT (simultaneous multi-threading) environments. We want to study the benefits of utilizing small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. Thus, we need to re-evaluate completely our previous design in terms of structure design, interconnection networks (including topologies, flow control and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In CMP environments (chip multiprocessors) with parallel workloads, coherence plays an important role, and must be taken into consideration.

Nanophotonics ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. 937-945
Author(s):  
Ruihuan Zhang ◽  
Yu He ◽  
Yong Zhang ◽  
Shaohua An ◽  
Qingming Zhu ◽  
...  

AbstractUltracompact and low-power-consumption optical switches are desired for high-performance telecommunication networks and data centers. Here, we demonstrate an on-chip power-efficient 2 × 2 thermo-optic switch unit by using a suspended photonic crystal nanobeam structure. A submilliwatt switching power of 0.15 mW is obtained with a tuning efficiency of 7.71 nm/mW in a compact footprint of 60 μm × 16 μm. The bandwidth of the switch is properly designed for a four-level pulse amplitude modulation signal with a 124 Gb/s raw data rate. To the best of our knowledge, the proposed switch is the most power-efficient resonator-based thermo-optic switch unit with the highest tuning efficiency and data ever reported.


2020 ◽  
Vol 2 (3) ◽  
pp. 158-168
Author(s):  
Muhammad Raza Naqvi

Mostly communication now days is done through SoC (system on chip) models so, NoC (network on chip) architecture is most appropriate solution for better performance. However, one of major flaws in this architecture is power consumption. To gain high performance through this type of architecture it is necessary to confirm power consumption while designing this. Use of power should be diminished in every region of network chip architecture. Lasting power consumption can be lessened by reaching alterations in network routers and other devices used to form that network. This research mainly focusses on state-of-the-art methods for designing NoC architecture and techniques to reduce power consumption in those architectures like, network architecture, network links between nodes, network design, and routers.


2016 ◽  
Vol 25 (09) ◽  
pp. 1650115
Author(s):  
Shuai Wang ◽  
Tao Jin ◽  
Chuanlei Zheng ◽  
Guangshan Duan

The degradation of CMOS devices over the lifetime can cause severe threat to the system performance and reliability at deep submicron semiconductor technologies. The negative bias temperature instability (NBTI) is among the most important sources of the aging mechanisms. Applying the traditional guardbanding technique to address the decreased speed of devices is too costly. On-chip memory structures, such as register files and on-chip caches, suffer a very high NBTI stress. In this paper, we propose the aging-aware design to combat the NBTI-induced aging in integer register files, data caches and instruction caches in high-performance microprocessors. The proposed aging-aware design can mitigate the negative aging effects by balancing the duty cycle ratio of the internal bits in on-chip memory structures. Besides the aging problem, the power consumption is also one of the most prominent issues in microprocessor design. Therefore, we further propose to apply the low power schemes to different memory structures under aging-aware design. The proposed low power aging-aware design can also achieve a significant power reduction, which will further reduce the temperature and NBTI degradation of the on-chip memory structures. Our experimental results show that our aging-aware design can effectively reduce the NBTI stress with 30.8%, 64.5% and 72.0% power saving for the integer register file, data cache and instruction cache, respectively.


Universal interconnection networks are prime performance tailback for high performance SoCs (Systems-on-Chip). Since shrinking the size of the ICs (Integrated Circuits) is the main aim, NoC (Network-on-Chip), being a segmental and mountable design tactic is a propitious substitute to outmoded bus-mode architectures. NoC combined with 3D-Routers and label switching technique can guarantee low power consumption, QoS along with less latency. In the proposed work, 3D NoCs are proven to be more advantageous by achieving 39.9% reduction in Area, 1.7% reduction in Power Consumption, and 11.3% reduction in Memory usage.


2015 ◽  
Vol 24 (09) ◽  
pp. 1550141 ◽  
Author(s):  
Erulappan Sakthivel ◽  
Veluchamy Malathi ◽  
Muruganantham Arunraja

In recent days, network-on-chip (NoC) researchers focus mainly on the area reduction and low power consumption both in architectural and algorithmic approach. To achieve low power and high performance in NoC architecture, sense amplifiers (SAs) introduced which can consume less power under various traffic conditions. In order to analyze the performance of architectural NoC design before fabrication level, the new simulator is developed based on multi core processor with improved sense amplifier (MCPSA) in this work. The MCPSA simulator provides user, the flexibility of incorporating various traffic configurations and routing algorithm with user reconfigurable option. In addition, the different SA model can be put into the simulation in plug and play manner for evaluation. The NoC case studies are presented to demonstrate the NoC architecture with double tail sense amplifier (DTSA) and modified-DTSA (M-DTSA) design. The performance metric such as delay, data rate and power consumption is evaluated. The main idea of this new simulator is to interface multisim environment (MSE) into a NoC environment for validating any DTSA.


2018 ◽  
Vol 15 (6) ◽  
pp. 792-803
Author(s):  
Sudhakar Jyothula

PurposeThe purpose of this paper is to design a low power clock gating technique using Galeor approach by assimilated with replica path pulse triggered flip flop (RP-PTFF).Design/methodology/approachIn the present scenario, the inclination of battery for portable devices has been increasing tremendously. Therefore, battery life has become an essential element for portable devices. To increase the battery life of portable devices such as communication devices, these have to be made with low power requirements. Hence, power consumption is one of the main issues in CMOS design. To reap a low-power battery with optimum delay constraints, a new methodology is proposed by using the advantages of a low leakage GALEOR approach. By integrating the proposed GALEOR technique with conventional PTFFs, a reduction in power consumption is achieved.FindingsThe design was implemented in mentor graphics EDA tools with 130 nm technology, and the proposed technique is compared with existing conventional PTFFs in terms of power consumption. The average power consumed by the proposed technique (RP-PTFF clock gating with the GALEOR technique) is reduced to 47 per cent compared to conventional PTFF for 100 per cent switching activity.Originality/valueThe study demonstrates that RP-PTFF with clock gating using the GALEOR approach is a design that is superior to the conventional PTFFs.


2018 ◽  
Vol 7 (2-1) ◽  
pp. 417
Author(s):  
Beulah Hemalatha S ◽  
Vigneswaran T

Application specific reconfiguration of On-chip communication link is a fast growing research area in system on chip (SoC) based system design. Optimization of the communication link is important to achieve a trade-off between efficient communication and low power consumption. So achieving both efficient communication and low power consumption requires a special optimization mechanism. Such Optimization problems can be solved using a genetic algorithm. Here, in this paper genetic algorithm based On-chip communication link reconfiguration is presented. The algorithm will optimize efficiency of communication link with constrain of low power consumption. The parameters involved in power consumption and efficient communication link are coded in the chromosomes. By evolutionary iteration the optimal parameters of the communication link are derived that is used for the communication link successfully in the simulated system. The performance of the simulated system is analyzed which shows the out performance of the proposed system.


Author(s):  
GOPALA KRISHNA.M ◽  
UMA SANKAR.CH ◽  
NEELIMA. S ◽  
KOTESWARA RAO.P

In this paper, presents circuit design of a low-power delay buffer. The proposed delay buffer uses several new techniques to reduce its power consumption. Since delay buffers are accessed sequentially, it adopts a ring-counter addressing scheme. In the ring counter, double-edge-triggered (DET) flip-flops are utilized to reduce the operating frequency by half and the C-element gated-clock strategy is proposed. Both total transistor count and the number of clocked transistors are significantly reduced to improve power consumption and speed in the flip-flop. The number of transistors is reduced by 56%-60% and the Area-Speed-Power product is reduced by 56%-63% compared to other double edge triggered flip-flops. This design is suitable for high-speed, low-power CMOS VLSI design applications.


Sign in / Sign up

Export Citation Format

Share Document