Router-integrated Cache Hierarchy Design for Highly Parallel Computing in Efficient CMP Systems

Electronics ◽  
2019 ◽  
Vol 8 (11) ◽  
pp. 1363 ◽  
Author(s):  
Zhao ◽  
Jia ◽  
Watanabe

In current Chip Multi-Processor (CMP) systems, data sharing in the cache hierarchy is a critical issue: maintaining data coherence costs many clock cycles. As the number of integrated cores increases, the single shared cache serves too many processing threads to maintain shared data efficiently. In this work, an enhanced router network is integrated within the private cache level to quickly interconnect shared-data accesses from different threads. Experimental pattern analysis classifies all shared data in the private cache level into seven access types. Both shared accesses and thread-crossed accesses can then be rapidly detected and handled in the proposed router network. As a result, the access latency of the private cache decreases, and the conventional coherence traffic problem is alleviated. Processing in the proposed path consists of three steps. First, the target accesses are detected by searching the router network. Then, the proposed replacement logic handles those accesses to maintain data coherence. Finally, the accesses are delivered by the proposed data deliverer. Thus, harmful data-sharing accesses are resolved within the first chip layer of the 3D-IC structure. The proposed system is also implemented in a cycle-precise simulation platform, and experimental results show that the model improves the Instructions Per Cycle (IPC) of on-chip execution by up to 31.85 percent, while saving about 17.61 percent of energy consumption compared to the base system.

Author(s):  
A. Ferrerón Labari ◽  
D. Suárez Gracia ◽  
V. Viñals Yúfera

In recent years, embedded systems have evolved to offer capabilities previously found only in high-performance systems. Portable devices already have on-chip multiprocessors (such as the PowerPC 476FP or ARM Cortex A9 MP), usually multi-threaded, and a powerful on-chip multi-level cache memory hierarchy. As most of these systems are battery-powered, power consumption becomes a critical issue. Achieving both high performance and low power consumption is a highly complex challenge for which some proposals have already been made. Suarez et al. proposed a new on-chip cache hierarchy, the LP-NUCA (Low Power NUCA), which reduces access latency by taking advantage of NUCA (Non-Uniform Cache Architecture) properties. The key points are decoupling the functionality and using three specialized on-chip networks. This structure has proven efficient for data hierarchies, achieving good performance and reducing energy consumption. On the other hand, instruction caches have different requirements and characteristics than data caches, and these conflict with the requirements of low-power embedded systems, especially in SMT (simultaneous multi-threading) environments. We want to study the benefits of using small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. To that end, we need to completely re-evaluate our previous design in terms of structure design, interconnection networks (including topologies, flow control, and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In CMP (chip multiprocessor) environments with parallel workloads, coherence plays an important role and must be taken into consideration.


2009 ◽  
Vol 18 (01) ◽  
pp. 181-198 ◽  
Author(s):  
XIAO XIN XIA ◽  
TENG TIOW TAY

Energy consumption is one of the most important design constraints for modern microprocessors, and designers have proposed many energy-saving techniques. Beyond traditional low-power hardware designs, software optimization is becoming a significant strategy for lowering microprocessor energy consumption. This paper describes an intra-application identification and reconfiguration mechanism for microprocessor energy reduction. The mechanism employs a statistical sampling method during training runs to identify code sections within an application that have appropriate IPC (Instructions per Cycle) values and could contribute to reducing program runtime energy. It then profiles them to dynamically scale the voltage and frequency of the microprocessor at appropriate points during execution. In simulation, the approach achieves average energy savings of 39% with minor performance degradation, compared to a processor running at a fixed voltage and speed.
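
As a rough illustration of IPC-driven voltage and frequency scaling, the sketch below maps a sampled IPC value to an operating point and estimates the relative dynamic energy per cycle. It is not the authors' mechanism: the operating points, the IPC thresholds, the assumption that low-IPC sections are the ones worth slowing down, and the names `OperatingPoint`, `choose_point`, and `relative_dynamic_energy` are all hypothetical. The only physical relation used is the standard approximation that dynamic energy per cycle scales with the square of the supply voltage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_ghz: float
    volts: float


# Hypothetical operating points, ordered from slowest to fastest.
POINTS = [
    OperatingPoint(1.0, 0.8),
    OperatingPoint(1.5, 1.0),
    OperatingPoint(2.0, 1.2),
]


def choose_point(ipc, low=0.5, high=1.5):
    """Pick an operating point from a sampled IPC value.

    Assumption (not from the abstract): sections with low IPC are
    treated as stall-dominated, so they are run at the slowest point.
    The thresholds `low` and `high` are illustrative only.
    """
    if ipc < low:
        return POINTS[0]
    if ipc < high:
        return POINTS[1]
    return POINTS[2]


def relative_dynamic_energy(point, baseline=POINTS[-1]):
    """Dynamic energy per cycle relative to the baseline point.

    Uses the common approximation E ~ C * V^2, so the ratio depends
    only on the two supply voltages.
    """
    return (point.volts / baseline.volts) ** 2
```

For example, `choose_point(0.3)` selects the 1.0 GHz point, whose per-cycle dynamic energy relative to the fastest point is `(0.8 / 1.2) ** 2`, a little under half.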


Author(s):  
Xiaohan Tao ◽  
Jianmin Pang ◽  
Jinlong Xu ◽  
Yu Zhu

The heterogeneous many-core architecture plays an important role in the fields of high-performance computing and scientific computing. It uses accelerator cores with on-chip memories to improve performance and reduce energy consumption. Scratchpad memory (SPM) is a kind of fast on-chip memory with lower energy consumption compared with a hardware cache. However, data transfer between SPM and off-chip memory can be managed only by a programmer or compiler. In this paper, we propose a compiler-directed multithreaded SPM data transfer model (MSDTM) to optimize the process of data transfer in a heterogeneous many-core architecture. We use compile-time analysis to classify data accesses, check dependences and determine the allocation of data transfer operations. We further present the data transfer performance model to derive the optimal granularity of data transfer and select the most profitable data transfer strategy. We implement the proposed MSDTM on the GCC compiler and evaluate it on Sunway TaihuLight with selected test cases from benchmarks and scientific computing applications. The experimental result shows that the proposed MSDTM improves the application execution time by 5.49× and achieves an energy saving of 5.16× on average.


Author(s):  
Arvind Kumar ◽  
Vivek Kumar Sehgal ◽  
Gaurav Dhiman ◽  
S. Vimal ◽  
Ashutosh Sharma ◽  
...  

2013 ◽  
Vol 303-306 ◽  
pp. 191-196
Author(s):  
Wei Zhang ◽  
Ling Hua Zhang

Energy-aware routing is a critical issue in wireless sensor networks (WSNs). Prior work on energy-aware routing has focused on transmission energy consumption and residual energy but often does not consider path hop length, which leads to unnecessary power consumption at sensor nodes. The improved algorithm adds control over the number of routing hops. Simulation shows that the improved algorithm is feasible and effectively reduces both network delay and path energy consumption. Because a WSN is dynamic, we finally propose dynamic hop control to adapt to the network and select the optimal path.
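
The idea of bounding the hop count while minimizing path energy can be sketched with a hop-limited Bellman-Ford relaxation. This is a generic technique chosen for illustration, not the algorithm from the paper; the function name `min_energy_path`, the edge-cost dictionary, and the example topology are assumptions.

```python
import math


def min_energy_path(edges, source, target, max_hops):
    """Cheapest path from source to target using at most max_hops hops.

    edges maps directed links (u, v) to a per-link energy cost; for an
    undirected link, add both (u, v) and (v, u).
    Returns (cost, path), or (math.inf, None) if target is unreachable
    within the hop limit.
    """
    # best[node] = (cost, path) over paths with at most k hops so far.
    best = {source: (0.0, [source])}
    for _ in range(max_hops):
        nxt = dict(best)
        for (u, v), cost in edges.items():
            if u in best:  # relax only from the previous round's results
                cand = best[u][0] + cost
                if cand < nxt.get(v, (math.inf, None))[0]:
                    nxt[v] = (cand, best[u][1] + [v])
        best = nxt
    return best.get(target, (math.inf, None))


# Hypothetical topology: a cheap three-hop route and a costly direct link.
EXAMPLE_EDGES = {
    ("A", "B"): 1.0,
    ("B", "C"): 1.0,
    ("C", "D"): 1.0,
    ("A", "D"): 5.0,
}
```

With `max_hops=3` the cheap route `A, B, C, D` is chosen; tightening the limit to two hops forces the direct but more expensive link `A, D`, which shows the trade-off between hop count and energy that a hop-control parameter exposes.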


2014 ◽  
Vol 539 ◽  
pp. 296-302
Author(s):  
Dong Li

As the number of on-chip devices continues to increase, the bus structure no longer meets communication requirements. To improve communication between components, chip designers need to explore a new structure for interconnecting on-chip devices. The paper proposes a dynamic and adaptive network-on-chip (NoC) algorithm that uses a NoC platform with a 2-dimensional mesh as the carrier, incorporates communication energy consumption and delay into a unified cost function, and uses ant colony optimization to realize NoC mapping oriented toward energy consumption and delay. The experiment indicates that, compared with random mapping, single-objective optimization saves 30% to 47% in communication energy consumption and 20% to 39% in execution time, respectively, and that joint-objective optimization can further exploit the potential of the time dimension in mapping schemes dominated by energy.
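
A unified cost function of the kind described can be sketched as a weighted sum of an energy term and a delay term for a given task-to-tile placement on a 2D mesh. The abstract does not give the actual formula, so everything here is an assumption: energy is approximated as traffic volume times Manhattan hop distance, delay as the longest hop distance of any flow, and `alpha`, `manhattan`, and `mapping_cost` are hypothetical names. The two terms are left unnormalized, and the ant colony search itself is not shown; a search procedure would call `mapping_cost` to score candidate placements.

```python
def manhattan(a, b):
    """Hop distance between two (x, y) tiles under XY routing on a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mapping_cost(traffic, placement, alpha=0.5):
    """Weighted cost of a task placement (lower is better).

    traffic:   {(src_task, dst_task): volume}
    placement: {task: (x, y)}
    alpha:     weight on the energy proxy; (1 - alpha) weights delay.
    Both terms are illustrative hop-based proxies, not the paper's model.
    """
    energy = sum(
        vol * manhattan(placement[s], placement[d])
        for (s, d), vol in traffic.items()
    )
    delay = max(manhattan(placement[s], placement[d]) for (s, d) in traffic)
    return alpha * energy + (1 - alpha) * delay


# Hypothetical traffic: a heavy flow a->b and a light flow b->c.
TRAFFIC = {("a", "b"): 10, ("b", "c"): 1}
GOOD = {"a": (0, 0), "b": (0, 1), "c": (1, 1)}  # heavy flow is one hop
BAD = {"a": (0, 0), "b": (1, 1), "c": (0, 1)}   # heavy flow is two hops
```

Placing the heavily communicating tasks on adjacent tiles gives the lower cost, which is the property a mapping search would exploit.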


Science ◽  
2020 ◽  
Vol 367 (6481) ◽  
pp. 1018-1021 ◽  
Author(s):  
Can Huang ◽  
Chen Zhang ◽  
Shumin Xiao ◽  
Yuhan Wang ◽  
Yubin Fan ◽  
...  

The development of classical and quantum information–processing technology calls for on-chip integrated sources of structured light. Although integrated vortex microlasers have been previously demonstrated, they remain static and possess relatively high lasing thresholds, making them unsuitable for high-speed optical communication and computing. We introduce perovskite-based vortex microlasers and demonstrate their application to ultrafast all-optical switching at room temperature. By exploiting both mode symmetry and far-field properties, we reveal that the vortex beam lasing can be switched to linearly polarized beam lasing, or vice versa, with switching times of 1 to 1.5 picoseconds and energy consumption that is orders of magnitude lower than in previously demonstrated all-optical switching. Our results provide an approach that breaks the long-standing trade-off between low energy consumption and high-speed nanophotonics, introducing vortex microlasers that are switchable at terahertz frequencies.

