instructions per cycle: Recently Published Documents

TOTAL DOCUMENTS: 7 (five years: 1)
H-INDEX: 1 (five years: 1)

Electronics, 2019, Vol 8 (11), pp. 1363
Author(s): Zhao, Jia, Watanabe

In current Chip Multi-Processor (CMP) systems, data sharing in the cache hierarchy is a critical issue that costs many clock cycles to maintain coherence. As the number of integrated cores increases, the single shared cache serves too many processing threads to maintain shared data efficiently. In this work, an enhanced router network is integrated into the private cache level to rapidly interconnect shared-data accesses from different threads. Experimental pattern analysis classifies all shared data at the private cache level into seven access types; both shared accesses and thread-crossed accesses can then be quickly detected and handled in the proposed router network. As a result, private cache access latency is reduced and the conventional coherence-traffic problem is alleviated. The proposed path proceeds in three steps. First, target accesses are detected by exploring the router network. Second, the proposed replacement logic handles those accesses to maintain data coherence. Finally, the accesses are delivered by the proposed data deliverer. Thus, harmful data-sharing accesses are resolved within the first chip layer of the 3D-IC structure. The proposed system is implemented in a cycle-precise simulation platform, and experimental results show that the model improves on-chip Instructions Per Cycle (IPC) by up to 31.85 percent while saving about 17.61 percent of energy compared to the base system.
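The detection step the abstract describes can be illustrated with a toy sketch (this is not the paper's hardware implementation; the trace format and labels are hypothetical): an address touched by more than one thread counts as shared, and an access whose previous toucher was a different thread counts as thread-crossed.

```python
from collections import defaultdict

# Toy software analogue of shared / thread-crossed access detection in a
# stream of private-cache accesses, given as (thread_id, address) pairs.
def classify_accesses(trace):
    owners = defaultdict(set)   # address -> set of threads that touched it
    last_thread = {}            # address -> thread of the previous access
    labels = []
    for thread, addr in trace:
        crossed = addr in last_thread and last_thread[addr] != thread
        owners[addr].add(thread)
        shared = len(owners[addr]) > 1
        last_thread[addr] = thread
        labels.append((thread, addr, shared, crossed))
    return labels

trace = [(0, 0x100), (1, 0x100), (0, 0x200), (1, 0x100)]
for thread, addr, shared, crossed in classify_accesses(trace):
    print(thread, hex(addr), shared, crossed)
```

A real router network would make this decision in hardware per access; the sketch only shows which accesses such logic would flag.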


2016, Vol 2016, pp. 1-17
Author(s): Biying Zhang, Zhongchuan Fu, Hongsong Chen, Gang Cui

In this paper, a probabilistic method is presented to analyze the temperature and maximum frequency of multicore processors while accounting for workload variation. First, at the microarchitecture level, dynamic power is modeled as a linear function of IPC (instructions per cycle), and leakage power is approximated as a linear function of temperature. Second, the microarchitecture-level hotspot temperatures of both active and inactive cores are derived as linear functions of IPC; assuming the IPCs of all cores follow the same normal distribution, the hotspot temperatures are then also normally distributed. Third, the probabilistic distribution over the set of discrete frequencies is determined. The experimental results show that hotspot temperatures of multicore processors are not deterministic and vary significantly, and that the number of active cores and the running frequency together determine the probabilistic distribution of hotspot temperatures. The number of active cores not only yields different frequency distributions but also different probabilities of triggering DFS (dynamic frequency scaling).
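The key propagation step is standard: if hotspot temperature is modeled as a linear function T = a·IPC + b and IPC ~ Normal(μ, σ²), then T ~ Normal(a·μ + b, (a·σ)²). A minimal sketch, with illustrative coefficients that are not taken from the paper:

```python
# Propagate a normal IPC distribution through a linear temperature model
# T = a*IPC + b; a linear map of a normal variable is again normal.
def hotspot_distribution(a, b, ipc_mean, ipc_std):
    """Return (mean, std) of the hotspot temperature distribution."""
    return a * ipc_mean + b, abs(a) * ipc_std

# Hypothetical coefficients: 12 degrees per unit of IPC, 45-degree offset.
mean, std = hotspot_distribution(a=12.0, b=45.0, ipc_mean=1.5, ipc_std=0.2)
print(mean, std)   # roughly 63.0 and 2.4
```

The same closure under linear maps is what lets the paper derive a closed-form normal distribution for hotspot temperatures from the assumed IPC distribution.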


2013, Vol 16 (4), pp. 33-42
Author(s): Quynh Ngoc Do, Hoang Nguyen Thanh Hau

In a single-issue microprocessor, program code executes at a maximum (ideal) rate of one instruction per cycle. In practice, because of branch instructions, this rate is less than 1. A superscalar architecture, applied to a 32-bit RISC microprocessor, enables two instructions to be handled in a single machine cycle. To further increase processing speed, out-of-order execution is also applied, so that an instruction can be processed as soon as its operands are ready. As a result, a microprocessor that can complete two instructions per cycle is obtained.
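Why branches pull the rate below 1 can be shown with a back-of-the-envelope model: the effective CPI is the ideal CPI plus the branch fraction times the misprediction rate times the flush penalty. The numbers below are illustrative, not from the paper.

```python
# Effective IPC under branch mispredictions for an otherwise ideal pipeline.
def effective_ipc(ideal_cpi, branch_frac, mispredict_rate, penalty_cycles):
    cpi = ideal_cpi + branch_frac * mispredict_rate * penalty_cycles
    return 1.0 / cpi

# 20% branches, 10% of them mispredicted, 3-cycle flush penalty,
# on a single-issue core with an ideal CPI of 1:
print(effective_ipc(1.0, 0.20, 0.10, 3))   # about 0.943
```

A two-wide superscalar design halves the ideal CPI in this model, which is the headroom the out-of-order mechanism then tries to preserve.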


Author(s): Yong Chen, Huaiyu Zhu, Philip C. Roth, Hui Jin, Xian-He Sun

Data prefetching is widely used in high-end computing systems to accelerate data accesses and to bridge the growing performance gap between processor and memory. Context-based prefetching has become a primary focus of study in recent years due to its general applicability. However, current context-based prefetchers adopt context analysis of only a single order, which suffers from low prefetching coverage and thus limits overall prefetching effectiveness. In addition, existing approaches usually consider the context of the address stream of a single instruction rather than the combined address stream of all instructions, which further limits effectiveness. In this study, we propose a new context-based prefetcher called the Global-aware and Multi-order Context-based (GMC) prefetcher. The GMC prefetcher uses multi-order, local and global context analysis to increase prefetching coverage while maintaining prefetching accuracy. In extensive simulation testing of the SPEC CPU2006 benchmarks with an enhanced CMP$im simulator, the proposed GMC prefetcher outperformed existing prefetchers and effectively reduced data-access latency. The average Instructions Per Cycle (IPC) improvement of the SPEC CINT2006 and CFP2006 benchmarks with GMC prefetching was over 55% and 44%, respectively.
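For readers unfamiliar with context-based prefetching, a minimal first-order version can be sketched as a Markov table keyed on the last miss address; the actual GMC design is multi-order and global-aware, so this sketch only shows the baseline idea the paper extends.

```python
# First-order context-based prefetcher sketch: learn address transitions
# and predict the last-seen successor of the current address.
class ContextPrefetcher:
    def __init__(self):
        self.table = {}   # last address -> most recently seen next address
        self.prev = None

    def access(self, addr):
        prediction = self.table.get(addr)   # candidate prefetch target
        if self.prev is not None:
            self.table[self.prev] = addr    # learn the observed transition
        self.prev = addr
        return prediction

pf = ContextPrefetcher()
for a in [0x10, 0x20, 0x30, 0x10]:
    pf.access(a)
print(hex(pf.access(0x20)))   # learned 0x20 -> 0x30, so predicts 0x30
```

A single-order table like this misses patterns that only emerge from longer histories or from interleaved instruction streams, which is exactly the coverage gap the multi-order, global analysis targets.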


2009, Vol 18 (01), pp. 181-198
Author(s): Xiao Xin Xia, Teng Tiow Tay

Energy consumption is one of the most important design constraints for modern microprocessors, and designers have proposed many energy-saving techniques. Looking beyond traditional hardware low-power designs, software optimization is becoming a significant strategy for lowering microprocessor energy consumption. This paper describes an intra-application identification and reconfiguration mechanism for microprocessor energy reduction. The mechanism employs statistical sampling during training runs to identify code sections within an application whose IPC (Instructions Per Cycle) values make them suitable contributors to runtime energy reduction, and then profiles them so that the voltage and frequency of the microprocessor can be scaled dynamically at appropriate points during execution. In our simulations, the approach achieves average energy savings of 39% with minor performance degradation, compared to a processor running at a fixed voltage and speed.
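The selection policy can be sketched as a threshold table: low-IPC, memory-bound code sections lose little performance when the core is slowed, so they are good DVFS targets. The thresholds and voltage/frequency levels below are hypothetical, not the paper's values.

```python
# IPC-guided DVFS level selection sketch with made-up operating points.
LEVELS = [  # (min sampled IPC to qualify, frequency in GHz, voltage in V)
    (1.5, 2.0, 1.20),
    (0.8, 1.5, 1.00),
    (0.0, 1.0, 0.85),
]

def pick_level(sampled_ipc):
    """Return the (freq_ghz, volt) operating point for a code section."""
    for threshold, freq, volt in LEVELS:
        if sampled_ipc >= threshold:
            return freq, volt

print(pick_level(1.8))   # compute-bound section: full speed
print(pick_level(0.4))   # memory-bound section: lowest level
```

Because dynamic power scales roughly with f·V², dropping both frequency and voltage for the memory-bound sections is where the bulk of the reported savings would come from.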


2003, Vol 16 (1), pp. 83-91
Author(s): Pece Mitrevski, Marjan Gusev

Fluid Stochastic Petri Nets are used to capture the dynamic behavior of an ILP processor, and discrete-event simulation is applied to assess the performance potential of predictions and speculative execution in boosting the performance of ILP processors that fetch, issue, execute and commit a large number of instructions per cycle.

