Router-integrated Cache Hierarchy Design for Highly Parallel Computing in Efficient CMP Systems

In current Chip Multi-Processor (CMP) systems, data sharing in the cache hierarchy is a critical issue that costs many clock cycles to maintain data coherence. As the number of integrated cores increases, a single shared cache serves too many processing threads to maintain shared data efficiently. In this work, an enhanced router network is integrated within the private cache level to quickly interconnect shared-data accesses across different threads. Experimental pattern analysis classifies all shared data in the private cache level into seven access types. Both shared accesses and thread-crossed accesses can then be rapidly detected and handled in the proposed router network. As a result, the access latency of the private cache decreases and the conventional coherence traffic problem is alleviated. Processing in the proposed path consists of three steps. First, the target accesses are detected by exploring the router network. Then, the proposed replacement logic handles those accesses to maintain data coherence. Finally, those accesses are delivered by the proposed data deliverer. Thus, harmful data-sharing accesses are resolved within the first chip layer of the 3D-IC structure. The proposed system is also implemented on a cycle-precise simulation platform, and experimental results show that our model improves the Instructions Per Cycle (IPC) of on-chip execution by up to 31.85 percent and reduces energy consumption by about 17.61 percent compared with the base system.

Efficient Instruction and Data Caching for High Performance Embedded Processors

Jornada de Jóvenes Investigadores del I3A ◽

10.26754/jji-i3a.201201788 ◽

1970 ◽

pp. 9

Author(s):

A. Ferrerón Labari ◽

D. Suárez Gracia ◽

V. Viñals Yúfera

Keyword(s):

Embedded Systems ◽

Power Consumption ◽

Low Power ◽

Interconnection Networks ◽

High Performance ◽

Critical Issue ◽

Content Management ◽

Structure Design ◽

Portable Devices ◽

On Chip

In recent years, embedded systems have evolved to offer capabilities previously found only in high-performance systems. Portable devices already have on-chip multiprocessors (such as PowerPC 476FP or ARM Cortex A9 MP), usually multi-threaded, and a powerful on-chip multi-level cache memory hierarchy. As most of these systems are battery-powered, power consumption becomes a critical issue. Achieving both high performance and low power consumption is a highly complex challenge for which some proposals have already been made. Suarez et al. proposed a new on-chip cache hierarchy, the LP-NUCA (Low Power NUCA), which reduces access latency by taking advantage of the properties of NUCA (Non-Uniform Cache Architectures). The key points are decoupling the functionality and using three specialized on-chip networks. This structure has proved efficient for data hierarchies, achieving good performance and reducing energy consumption. Instruction caches, on the other hand, have different requirements and characteristics than data caches, and these conflict with the requirements of low-power embedded systems, especially in SMT (simultaneous multi-threading) environments. We want to study the benefits of using small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. To do so, we need to completely re-evaluate our previous design in terms of structure design, interconnection networks (including topologies, flow control and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In CMP (chip multiprocessor) environments with parallel workloads, coherence plays an important role and must be taken into consideration.

INTRA-APPLICATION ENERGY REDUCTION FOR MICROPROCESSOR LOW-POWER DESIGN

Journal of Circuits System and Computers ◽

10.1142/s0218126609005010 ◽

2009 ◽

Vol 18 (01) ◽

pp. 181-198 ◽

Cited By ~ 1

Author(s):

XIAO XIN XIA ◽

TENG TIOW TAY

Keyword(s):

Energy Consumption ◽

Low Power ◽

Sampling Method ◽

Energy Savings ◽

Low Power Design ◽

Energy Reduction ◽

Software Optimization ◽

Application Identification ◽

Instructions Per Cycle ◽

Important Design

Energy consumption is one of the most important design constraints for modern microprocessors, and designers have proposed many energy-saving techniques. Beyond traditional low-power hardware designs, software optimization is becoming a significant strategy for lowering microprocessor energy consumption. This paper describes an intra-application identification and reconfiguration mechanism for microprocessor energy reduction. Our mechanism employs a statistical sampling method during training runs to identify code sections within an application that have appropriate IPC (Instructions per Cycle) values and could contribute to runtime energy reduction, and then profiles them to dynamically scale the voltage and frequency of the microprocessor at appropriate points during execution. In our simulation, our approach achieves average energy savings of 39% with minor performance degradation, compared to a processor running at a fixed voltage and speed.
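The idea of choosing a voltage/frequency level per code section from sampled IPC values can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two operating points in `LEVELS`, the `ipc_threshold` value, the rule that low-IPC sections get the lower level, and the simple V² scaling of dynamic energy per cycle.

```python
from statistics import mean

# Hypothetical (frequency in GHz, voltage in V) operating points; not from the paper.
LEVELS = {"low": (1.0, 0.9), "high": (2.0, 1.2)}

def classify_sections(samples, ipc_threshold=0.8):
    """Map each code section to an operating point from its sampled mean IPC.

    samples: dict of section name -> list of IPC values sampled in training runs.
    Assumption: sections whose mean IPC is below the threshold are treated as
    stall-dominated and assigned the low voltage/frequency level.
    """
    plan = {}
    for section, ipcs in samples.items():
        plan[section] = "low" if mean(ipcs) < ipc_threshold else "high"
    return plan

def relative_dynamic_energy(level, baseline="high"):
    """Dynamic energy per cycle relative to the baseline, using E ~ C * V^2."""
    v = LEVELS[level][1]
    v0 = LEVELS[baseline][1]
    return (v / v0) ** 2
```

For example, `classify_sections({"loop_a": [0.4, 0.5], "loop_b": [1.5, 1.7]})` assigns `loop_a` to the low level and `loop_b` to the high level under these assumed thresholds.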

Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architecture

The Journal of Supercomputing ◽

10.1007/s11227-021-03853-x ◽

2021 ◽

Author(s):

Xiaohan Tao ◽

Jianmin Pang ◽

Jinlong Xu ◽

Yu Zhu

Keyword(s):

Energy Consumption ◽

High Performance ◽

Scientific Computing ◽

Data Transfer ◽

Performance Model ◽

Experimental Result ◽

Transfer Model ◽

Scratchpad Memory ◽

On Chip ◽

Many Core

The heterogeneous many-core architecture plays an important role in the fields of high-performance computing and scientific computing. It uses accelerator cores with on-chip memories to improve performance and reduce energy consumption. Scratchpad memory (SPM) is a kind of fast on-chip memory with lower energy consumption compared with a hardware cache. However, data transfer between SPM and off-chip memory can be managed only by a programmer or compiler. In this paper, we propose a compiler-directed multithreaded SPM data transfer model (MSDTM) to optimize the process of data transfer in a heterogeneous many-core architecture. We use compile-time analysis to classify data accesses, check dependences and determine the allocation of data transfer operations. We further present the data transfer performance model to derive the optimal granularity of data transfer and select the most profitable data transfer strategy. We implement the proposed MSDTM on the GCC compiler and evaluate it on Sunway TaihuLight with selected test cases from benchmarks and scientific computing applications. The experimental result shows that the proposed MSDTM improves the application execution time by 5.49× and achieves an energy saving of 5.16× on average.

Mobile Networks-on-Chip Mapping Algorithms for Optimization of Latency and Energy Consumption

Mobile Networks and Applications ◽

10.1007/s11036-021-01827-0 ◽

2021 ◽

Author(s):

Arvind Kumar ◽

Vivek Kumar Sehgal ◽

Gaurav Dhiman ◽

S. Vimal ◽

Ashutosh Sharma ◽

...

Keyword(s):

Energy Consumption ◽

Mobile Networks ◽

Mapping Algorithms ◽

Networks On Chip ◽

On Chip

Influence of tree species and machine settings on chip quality and specific energy consumption of a stationary drum chipper

Biomass and Bioenergy ◽

10.1016/j.biombioe.2021.106305 ◽

2021 ◽

Vol 155 ◽

pp. 106305

Author(s):

Daniel Kuptz ◽

Hans Hartmann

Keyword(s):

Energy Consumption ◽

Specific Energy ◽

Tree Species ◽

Specific Energy Consumption ◽

Chip Quality ◽

On Chip

Research on Energy Aware Routing for Wireless Sensor Networks

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.303-306.191 ◽

2013 ◽

Vol 303-306 ◽

pp. 191-196

Author(s):

Wei Zhang ◽

Ling Hua Zhang

Keyword(s):

Energy Consumption ◽

Optimal Path ◽

Critical Issue ◽

Residual Energy ◽

Sensor Nodes ◽

Energy Aware ◽

Energy Aware Routing ◽

Transmission Energy Consumption ◽

Transmission Energy ◽

Improved Algorithm

Energy-aware routing is a critical issue in wireless sensor networks (WSNs). Prior work on energy-aware routing has focused on transmission energy consumption and residual energy but often does not consider path hop length, which leads to unnecessary power consumption at sensor nodes. The improved algorithm adds control of routing hops. Simulation shows that the improved algorithm is feasible, effectively reducing network delay and path energy consumption. Because a WSN is dynamic, we finally propose dynamic hop control to adapt to the WSN and select the optimal path.
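One way to picture hop-controlled, energy-aware path selection is a hop-bounded shortest-path search. The sketch below is not the paper's algorithm: it swaps in a standard hop-limited Bellman-Ford relaxation, and the function name `min_energy_path`, the `links` representation, and the per-link energy costs are all hypothetical. Residual node energy is not modelled.

```python
import math

def min_energy_path(links, source, sink, max_hops):
    """Cheapest path from source to sink using at most max_hops links.

    links: dict mapping a directed link (u, v) -> transmission energy cost.
    Returns (total_energy, path), or (math.inf, None) if no path fits the limit.
    """
    # best[node] = (energy, path) reachable within the rounds processed so far
    best = {source: (0.0, [source])}
    for _ in range(max_hops):
        updated = dict(best)
        for (u, v), cost in links.items():
            if u in best:
                # Relax from the previous round only, so each round adds one hop.
                cand = best[u][0] + cost
                if cand < updated.get(v, (math.inf, None))[0]:
                    updated[v] = (cand, best[u][1] + [v])
        best = updated
    return best.get(sink, (math.inf, None))
```

With a tight hop limit the search may prefer a more expensive direct link; relaxing the limit lets it pick a cheaper multi-hop route, which is the trade-off that hop control exposes.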

Research on Network-on-Chip Dynamic and Adaptive Algorithm and Choice Strategy

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.539.296 ◽

2014 ◽

Vol 539 ◽

pp. 296-302

Author(s):

Dong Li

Keyword(s):

Energy Consumption ◽

Execution Time ◽

Adaptive Algorithm ◽

Network On Chip ◽

Time Dimension ◽

Choice Strategy ◽

Bus Structure ◽

On Chip ◽

Mapping Scheme ◽

Single Objective

With the further increase in the number of on-chip devices, the bus structure no longer meets requirements. To improve communication between components, chip designers need to explore a new structure for interconnecting on-chip devices. The paper proposes a network-on-chip (NoC) dynamic and adaptive algorithm that uses an NoC platform with a 2-dimensional mesh as the carrier, incorporates communication energy consumption and delay into a unified cost function, and uses ant colony optimization to produce an NoC mapping oriented toward energy consumption and delay. The experiment indicates that, compared with random mapping, single-objective optimization saves 30%~47% in communication energy consumption and 20%~39% in execution time, respectively, and joint-objective optimization can further exploit the potential of the time dimension in an energy-dominated mapping scheme.
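A unified cost function over energy and delay on a 2-D mesh can be sketched as follows. This is an illustration under stated assumptions, not the paper's formulation: energy is modelled as traffic volume times Manhattan hop count, delay as the largest hop count, and `alpha` is a hypothetical weight. Ant colony optimization is not implemented here; an exhaustive search over placements is swapped in, which is only feasible for very small meshes.

```python
from itertools import permutations

def manhattan(a, b):
    """Hop count between two tiles on a 2-D mesh with XY routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def mapping_cost(mapping, traffic, alpha=0.5):
    """Weighted sum of communication energy and delay for one mapping.

    mapping: dict task -> (x, y) tile on the mesh.
    traffic: dict (src_task, dst_task) -> communication volume.
    Energy is modelled as volume * hops and delay as the largest hop count;
    both models and the weight alpha are illustrative assumptions.
    """
    hops = {pair: manhattan(mapping[pair[0]], mapping[pair[1]]) for pair in traffic}
    energy = sum(vol * hops[pair] for pair, vol in traffic.items())
    delay = max(hops.values())
    return alpha * energy + (1 - alpha) * delay

def best_mapping(tasks, tiles, traffic, alpha=0.5):
    """Exhaustive search over placements (a stand-in for ant colony optimization)."""
    candidates = (dict(zip(tasks, perm)) for perm in permutations(tiles, len(tasks)))
    return min(candidates, key=lambda m: mapping_cost(m, traffic, alpha))
```

Setting `alpha=1.0` or `alpha=0.0` reduces the function to a single-objective cost, while intermediate values give a joint objective.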

Ultrafast control of vortex microlasers

Science ◽

10.1126/science.aba4597 ◽

2020 ◽

Vol 367 (6481) ◽

pp. 1018-1021 ◽

Cited By ~ 43

Author(s):

Can Huang ◽

Chen Zhang ◽

Shumin Xiao ◽

Yuhan Wang ◽

Yubin Fan ◽

...

Keyword(s):

Energy Consumption ◽

High Speed ◽

Optical Switching ◽

Quantum Information Processing ◽

All Optical ◽

Low Energy Consumption ◽

Linearly Polarized ◽

Terahertz Frequencies ◽

On Chip ◽

All Optical Switching

The development of classical and quantum information–processing technology calls for on-chip integrated sources of structured light. Although integrated vortex microlasers have been previously demonstrated, they remain static and possess relatively high lasing thresholds, making them unsuitable for high-speed optical communication and computing. We introduce perovskite-based vortex microlasers and demonstrate their application to ultrafast all-optical switching at room temperature. By exploiting both mode symmetry and far-field properties, we reveal that the vortex beam lasing can be switched to linearly polarized beam lasing, or vice versa, with switching times of 1 to 1.5 picoseconds and energy consumption that is orders of magnitude lower than in previously demonstrated all-optical switching. Our results provide an approach that breaks the long-standing trade-off between low energy consumption and high-speed nanophotonics, introducing vortex microlasers that are switchable at terahertz frequencies.
