instructions per cycle: Recently Published Documents

TOTAL DOCUMENTS: 7 (five years: 1)
H-INDEX: 1 (five years: 1)

Electronics, 2019, Vol 8 (11), pp. 1363
Author(s): Zhao, Jia, Watanabe

In current Chip Multi-Processor (CMP) systems, data sharing in the cache hierarchy is a critical issue that costs many clock cycles to maintain coherence. As the number of integrated cores increases, the single shared cache serves too many processing threads to maintain shared data efficiently. In this work, an enhanced router network is integrated into the private cache level to rapidly interconnect shared-data accesses from different threads. Experimental pattern analysis classifies all shared data at the private cache level into seven access types; both shared accesses and thread-crossed accesses can then be quickly detected and handled in the proposed router network. As a result, private cache access latency is reduced and the conventional coherence-traffic problem is alleviated. The proposed path proceeds in three steps. First, target accesses are detected by exploring the router network. Second, the proposed replacement logic handles those accesses to maintain data coherence. Finally, the accesses are delivered by the proposed data deliverer. Thus, harmful data-sharing accesses are resolved within the first chip layer of the 3D-IC structure. The proposed system is implemented in a cycle-precise simulation platform, and experimental results show that the model improves on-chip Instructions Per Cycle (IPC) by up to 31.85 percent while saving about 17.61 percent of energy compared to the base system.
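The detection step the abstract describes can be illustrated with a toy sketch (this is not the paper's hardware implementation; the trace format and labels are hypothetical): an address touched by more than one thread counts as shared, and an access whose previous toucher was a different thread counts as thread-crossed.

```python
from collections import defaultdict

# Toy software analogue of shared / thread-crossed access detection in a
# stream of private-cache accesses, given as (thread_id, address) pairs.
def classify_accesses(trace):
    owners = defaultdict(set)   # address -> set of threads that touched it
    last_thread = {}            # address -> thread of the previous access
    labels = []
    for thread, addr in trace:
        crossed = addr in last_thread and last_thread[addr] != thread
        owners[addr].add(thread)
        shared = len(owners[addr]) > 1
        last_thread[addr] = thread
        labels.append((thread, addr, shared, crossed))
    return labels

trace = [(0, 0x100), (1, 0x100), (0, 0x200), (1, 0x100)]
for thread, addr, shared, crossed in classify_accesses(trace):
    print(thread, hex(addr), shared, crossed)
```

A real router network would make this decision in hardware per access; the sketch only shows which accesses such logic would flag.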


2016, Vol 2016, pp. 1-17
Author(s): Biying Zhang, Zhongchuan Fu, Hongsong Chen, Gang Cui

In this paper, a probabilistic method is presented to analyze the temperature and maximum frequency of multicore processors while accounting for workload variation. First, at the microarchitecture level, dynamic power is modeled as a linear function of IPC (instructions per cycle), and leakage power is approximated as a linear function of temperature. Second, the microarchitecture-level hotspot temperatures of both active and inactive cores are derived as linear functions of IPC; assuming the IPCs of all cores follow the same normal distribution, the hotspot temperatures are then also normally distributed. Third, the probabilistic distribution over the set of discrete frequencies is determined. The experimental results show that hotspot temperatures of multicore processors are not deterministic and vary significantly, and that the number of active cores and the running frequency together determine the probabilistic distribution of hotspot temperatures. The number of active cores not only yields different frequency distributions but also different probabilities of triggering DFS (dynamic frequency scaling).
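The key propagation step is standard: if hotspot temperature is modeled as a linear function T = a·IPC + b and IPC ~ Normal(μ, σ²), then T ~ Normal(a·μ + b, (a·σ)²). A minimal sketch, with illustrative coefficients that are not taken from the paper:

```python
# Propagate a normal IPC distribution through a linear temperature model
# T = a*IPC + b; a linear map of a normal variable is again normal.
def hotspot_distribution(a, b, ipc_mean, ipc_std):
    """Return (mean, std) of the hotspot temperature distribution."""
    return a * ipc_mean + b, abs(a) * ipc_std

# Hypothetical coefficients: 12 degrees per unit of IPC, 45-degree offset.
mean, std = hotspot_distribution(a=12.0, b=45.0, ipc_mean=1.5, ipc_std=0.2)
print(mean, std)   # roughly 63.0 and 2.4
```

The same closure under linear maps is what lets the paper derive a closed-form normal distribution for hotspot temperatures from the assumed IPC distribution.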


2013, Vol 16 (4), pp. 33-42
Author(s): Quynh Ngoc Do, Hoang Nguyen Thanh Hau

In a single-issue microprocessor, program code executes at a maximum (ideal) rate of one instruction per cycle. In practice, because of branch instructions, this rate is less than 1. A superscalar architecture, applied to a 32-bit RISC microprocessor, enables two instructions to be handled in a single machine cycle. To further increase processing speed, out-of-order execution is also applied, so that an instruction can be processed as soon as its operands are ready. As a result, a microprocessor that can complete two instructions per cycle is obtained.
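Why branches pull the rate below 1 can be shown with a back-of-the-envelope model: the effective CPI is the ideal CPI plus the branch fraction times the misprediction rate times the flush penalty. The numbers below are illustrative, not from the paper.

```python
# Effective IPC under branch mispredictions for an otherwise ideal pipeline.
def effective_ipc(ideal_cpi, branch_frac, mispredict_rate, penalty_cycles):
    cpi = ideal_cpi + branch_frac * mispredict_rate * penalty_cycles
    return 1.0 / cpi

# 20% branches, 10% of them mispredicted, 3-cycle flush penalty,
# on a single-issue core with an ideal CPI of 1:
print(effective_ipc(1.0, 0.20, 0.10, 3))   # about 0.943
```

A two-wide superscalar design halves the ideal CPI in this model, which is the headroom the out-of-order mechanism then tries to preserve.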


Author(s): Yong Chen, Huaiyu Zhu, Philip C. Roth, Hui Jin, Xian-He Sun

Data prefetching is widely used in high-end computing systems to accelerate data accesses and to bridge the growing performance gap between processor and memory. Context-based prefetching has become a primary focus of study in recent years due to its general applicability. However, current context-based prefetchers adopt context analysis of only a single order, which suffers from low prefetching coverage and thus limits overall prefetching effectiveness. In addition, existing approaches usually consider the context of the address stream of a single instruction rather than the combined address stream of all instructions, which further limits effectiveness. In this study, we propose a new context-based prefetcher called the Global-aware and Multi-order Context-based (GMC) prefetcher. The GMC prefetcher uses multi-order, local and global context analysis to increase prefetching coverage while maintaining prefetching accuracy. In extensive simulation testing of the SPEC CPU2006 benchmarks with an enhanced CMP$im simulator, the proposed GMC prefetcher outperformed existing prefetchers and effectively reduced data-access latency. The average Instructions Per Cycle (IPC) improvement of the SPEC CINT2006 and CFP2006 benchmarks with GMC prefetching was over 55% and 44%, respectively.
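For readers unfamiliar with context-based prefetching, a minimal first-order version can be sketched as a Markov table keyed on the last miss address; the actual GMC design is multi-order and global-aware, so this sketch only shows the baseline idea the paper extends.

```python
# First-order context-based prefetcher sketch: learn address transitions
# and predict the last-seen successor of the current address.
class ContextPrefetcher:
    def __init__(self):
        self.table = {}   # last address -> most recently seen next address
        self.prev = None

    def access(self, addr):
        prediction = self.table.get(addr)   # candidate prefetch target
        if self.prev is not None:
            self.table[self.prev] = addr    # learn the observed transition
        self.prev = addr
        return prediction

pf = ContextPrefetcher()
for a in [0x10, 0x20, 0x30, 0x10]:
    pf.access(a)
print(hex(pf.access(0x20)))   # learned 0x20 -> 0x30, so predicts 0x30
```

A single-order table like this misses patterns that only emerge from longer histories or from interleaved instruction streams, which is exactly the coverage gap the multi-order, global analysis targets.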


2009, Vol 18 (01), pp. 181-198
Author(s): Xiao Xin Xia, Teng Tiow Tay

Energy consumption is one of the most important design constraints for modern microprocessors, and designers have proposed many energy-saving techniques. Looking beyond traditional hardware low-power designs, software optimization is becoming a significant strategy for lowering microprocessor energy consumption. This paper describes an intra-application identification and reconfiguration mechanism for microprocessor energy reduction. The mechanism employs statistical sampling during training runs to identify code sections within an application whose IPC (Instructions Per Cycle) values make them suitable contributors to runtime energy reduction, and then profiles them so that the voltage and frequency of the microprocessor can be scaled dynamically at appropriate points during execution. In our simulations, the approach achieves average energy savings of 39% with minor performance degradation, compared to a processor running at a fixed voltage and speed.
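The selection policy can be sketched as a threshold table: low-IPC, memory-bound code sections lose little performance when the core is slowed, so they are good DVFS targets. The thresholds and voltage/frequency levels below are hypothetical, not the paper's values.

```python
# IPC-guided DVFS level selection sketch with made-up operating points.
LEVELS = [  # (min sampled IPC to qualify, frequency in GHz, voltage in V)
    (1.5, 2.0, 1.20),
    (0.8, 1.5, 1.00),
    (0.0, 1.0, 0.85),
]

def pick_level(sampled_ipc):
    """Return the (freq_ghz, volt) operating point for a code section."""
    for threshold, freq, volt in LEVELS:
        if sampled_ipc >= threshold:
            return freq, volt

print(pick_level(1.8))   # compute-bound section: full speed
print(pick_level(0.4))   # memory-bound section: lowest level
```

Because dynamic power scales roughly with f·V², dropping both frequency and voltage for the memory-bound sections is where the bulk of the reported savings would come from.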


2003, Vol 16 (1), pp. 83-91
Author(s): Pece Mitrevski, Marjan Gusev

Fluid Stochastic Petri Nets are used to capture the dynamic behavior of an ILP processor, and discrete-event simulation is applied to assess the performance potential of predictions and speculative execution in boosting the performance of ILP processors that fetch, issue, execute and commit a large number of instructions per cycle.

