WATT MATTERS MOST? DESIGN SPACE EXPLORATION OF HIGH-PERFORMANCE MICROPROCESSORS FOR POWER-PERFORMANCE EFFICIENCY

PEDRO TRANCOSO

doi:10.1142/s0218126607003721

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126607003721 ◽

2007 ◽

Vol 16 (03) ◽

pp. 357-378

Author(s):

PEDRO TRANCOSO

Keyword(s):

High Performance ◽

Design Space Exploration ◽

High Sensitivity ◽

Clock Frequency ◽

Power Performance ◽

Large Power ◽

Multiple Parameters ◽

And Performance ◽

The One ◽

Power Awareness

Computer systems have evolved significantly in recent years, leading to high-performance systems. This, however, has come at the cost of large power dissipation. As such, power-awareness has become a major factor in processor design, and it is important to have a complete understanding of the power and performance behavior of all processor components. To this end, the current work presents a comprehensive analysis of power-performance efficiency for different high-end microarchitecture configurations using three different workloads: multimedia, scientific, and database. The objectives of this work are: (1) to analyze and compare the power-performance efficiency for different workloads; (2) to present a sensitivity analysis for the microarchitecture parameters in order to identify which ones are more sensitive to changes in terms of power-performance efficiency; and (3) to propose power-performance efficient configurations for each workload. The simulation results show that the multimedia workload achieves the highest efficiency, while the database workload is the most sensitive to parameter changes. The results also show that the parameter sensitivity depends significantly on the workload. While the issue width and clock frequency present very high sensitivity across all workloads (approximately 100%), for the database workload the first-level instruction cache size shows an even higher sensitivity (149%). The correct configuration of these microarchitecture parameters is therefore essential. A careless configuration of a single parameter from a baseline setup may reduce the power-performance efficiency by up to 99%. Finally, carefully tuning multiple parameters simultaneously may result in gains of up to 154% over the power-performance efficiency of the baseline configuration.

Core-Level Modeling and Frequency Prediction for DSP Applications on FPGAs

International Journal of Reconfigurable Computing ◽

10.1155/2015/784672 ◽

2015 ◽

Vol 2015 ◽

pp. 1-20

Author(s):

Gongyu Wang ◽

Greg Stitt ◽

Herman Lam ◽

Alan George

Keyword(s):

High Performance ◽

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

Core Level ◽

Prediction Methods ◽

Clock Frequency ◽

Worst Case ◽

Model Based ◽

Dsp Applications

Field-programmable gate arrays (FPGAs) provide a promising technology that can improve performance of many high-performance computing and embedded applications. However, unlike software design tools, the relatively immature state of FPGA tools significantly limits productivity and consequently prevents widespread adoption of the technology. For example, the lengthy design-translate-execute (DTE) process often must be iterated to meet the application requirements. Previous works have enabled model-based, design-space exploration to reduce DTE iterations but are limited by a lack of accurate model-based prediction of key design parameters, the most important of which is clock frequency. In this paper, we present a core-level modeling and design (CMD) methodology that enables modeling of FPGA applications at an abstract level and yet produces accurate predictions of parameters such as clock frequency, resource utilization (i.e., area), and latency. We evaluate CMD’s prediction methods using several high-performance DSP applications on various families of FPGAs and show an average clock-frequency prediction error of 3.6%, with a worst-case error of 20.4%, compared to the best of existing high-level prediction methods, 13.9% average error with 48.2% worst-case error. We also demonstrate how such prediction enables accurate design-space exploration without coding in a hardware-description language (HDL), significantly reducing the total design time.

VLSI ARCHITECTURE OF PARALLEL MULTIPLIER–ACCUMULATOR BASED ON RADIX-2 MODIFIED BOOTH ALGORITHM

International Journal of Electronics and Electrical Engineering ◽

10.47893/ijeee.2012.1009 ◽

2012 ◽

pp. 40-46

Author(s):

Mr. M.V. Sathish ◽

Mrs. Sailaja

Keyword(s):

Signal Processing ◽

High Speed ◽

High Performance ◽

Vlsi Architecture ◽

Clock Frequency ◽

Parallel Multiplier ◽

Hybrid Type ◽

Standard Design ◽

Overall Performance ◽

And Performance

This paper presents a new architecture of multiplier-and-accumulator (MAC) for high-speed arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry-save adder (CSA), the performance was improved. Since the accumulator, which has the largest delay in the MAC, was merged into the CSA, the overall performance was elevated. The proposed CSA tree uses a 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and has a modified array for the sign extension in order to increase the bit density of the operands. The proposed MAC showed superior properties to the standard design in many ways, with twice the performance of previous research at a similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance, such as signal processing.
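
As a rough illustration of the radix-2 Booth recoding named in the abstract above, the following Python sketch multiplies two signed integers by scanning adjacent multiplier bits. It is a plain two's-complement software model, not the paper's 1’s-complement variant or its sign-extension array; the function name `booth_radix2_multiply` and the `bits` parameter are assumptions made for this example.

```python
def booth_radix2_multiply(multiplicand: int, multiplier: int, bits: int = 8) -> int:
    """Multiply two signed integers with radix-2 Booth recoding.

    `multiplier` must fit in a `bits`-wide two's-complement word.
    Hypothetical helper for illustration only.
    """
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= multiplier <= highest:
        raise ValueError("multiplier does not fit in the given bit width")

    pattern = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    product = 0
    previous_bit = 0  # implicit bit to the right of the LSB
    for i in range(bits):
        current_bit = (pattern >> i) & 1
        # Booth digit: 0 for bit pairs 00/11, +1 for 01, -1 for 10
        digit = previous_bit - current_bit
        product += digit * (multiplicand << i)
        previous_bit = current_bit
    return product
```

For example, `booth_radix2_multiply(7, -3)` adds and subtracts shifted copies of 7 according to the recoded digits of -3 and returns -21.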

The Role of Velocity Based Training in the Strength Periodization for Modern Athletes

Journal of Functional Morphology and Kinesiology ◽

10.3390/jfmk3040055 ◽

2018 ◽

Vol 3 (4) ◽

pp. 55 ◽

Cited By ~ 3

Author(s):

Aristide Guerriero ◽

Carlo Varalda ◽

Maria Piacentini

Keyword(s):

Resistance Training ◽

High Performance ◽

Daily Routine ◽

Power Performance ◽

Strength And Conditioning ◽

Measurement Units ◽

And Performance ◽

Correct Understanding ◽

Level Of Effort

Resistance training (RT) is considered the most important method to improve the athlete’s strength and rate of force development (RFD). In the last decade, the importance of monitoring velocity during RT has drastically grown, because of an increased availability of linear position transducers (LPT) and inertial measurement units (IMU). The purpose of this review is to analyze the existing literature on testing techniques and performance strategies used to enhance strength and power performance of elite athletes, by monitoring the velocity of resistance training. The authors focus in particular on the level of effort of resistance training defined by velocity; how the loss of velocity correlates with the degree of fatigue and how it can be used to enhance the performance of competitive athletes; the use of LPT as part of the daily routine of the strength and conditioning programs in competitive sport. It is therefore critical for the sports scientists to have a correct understanding of the basic concepts of the velocity-based training and their application to elite sports. The ultimate goal is to give some indications on the velocity-based resistance training integration in the programs of different sports in the high performance environment.

Framework for Design Exploration and Performance Analysis of RF-NoC Manycore Architecture

Journal of Low Power Electronics and Applications ◽

10.3390/jlpea10040037 ◽

2020 ◽

Vol 10 (4) ◽

pp. 37

Author(s):

Habiba Lahdhiri ◽

Jordane Lorandel ◽

Salvatore Monteleone ◽

Emmanuelle Bourdel ◽

Maurizio Palesi

Keyword(s):

High Performance ◽

Design Space Exploration ◽

Routing Algorithm ◽

Long Distance ◽

Promising Solution ◽

And Performance ◽

On Chip ◽

Many Core ◽

High Degree ◽

Real Traffic

The Network-on-chip (NoC) paradigm has been proposed as a promising solution to enable the handling of a high degree of integration in multi-/many-core architectures. Despite their advantages, wired NoC infrastructures are facing several performance issues regarding multi-hop long-distance communications. RF-NoC is an attractive solution offering high performance and multicast/broadcast capabilities. However, managing RF links is a critical aspect that relies on both application-dependent and architectural parameters. This paper proposes a design space exploration framework for OFDMA-based RF-NoC architecture, which takes advantage of both real application benchmarks simulated using Sniper and RF-NoC architecture modeled using Noxim. We adopted the proposed framework to finely configure a routing algorithm, working with real traffic, achieving up to 45% of delay reduction, compared to a wired NoC setup in similar conditions.

An Economic Design of Rectifying Double Acceptance Sampling Plans via Maxima Nomination Sampling

Stochastics and Quality Control ◽

10.1515/eqc-2017-0018 ◽

2017 ◽

Vol 32 (2) ◽

Cited By ~ 1

Author(s):

Mansooreh Razmkhah ◽

Bahram Sadeghpour Gildeh ◽

Jafar Ahmadi

Keyword(s):

Loss Function ◽

High Performance ◽

High Sensitivity ◽

Simple Random Sampling ◽

Quality Characteristics ◽

Total Loss ◽

Acceptance Sampling ◽

Sampling Plans ◽

Acceptance Sampling Plans ◽

The One

In industry, when a lot of items is sent for inspection, double acceptance sampling plans (DASP) are considered as a way to decide on acceptance or rejection of the lot. If the lot contains items with high sensitivity, then measuring the quality characteristics is destructive or costly, so a method is needed to decide whether the lot has high performance. Using the ranked set sampling (RSS) method makes the decision on whether or not to accept a lot stricter and more accurate. Moreover, it is affordable and imposes no extra costs on the buyer or the producer. In this paper, using a special type of RSS called maxima nomination sampling (MNS), we design a DASP with regard to the total loss function. The results indicate that the total loss function obtained by the MNS method has lower values than the one obtained using the simple random sampling (SRS) method.
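
To make the double acceptance sampling decision concrete, here is a minimal Python sketch of the textbook acceptance probability under simple random sampling with a binomial defect model. It does not model MNS, rectification, or the loss function studied in the paper; the rule used (accept on the first sample if defects ≤ c1, reject if defects > c2, otherwise draw a second sample and accept if the combined defects ≤ c2) and all function and parameter names are assumptions for illustration.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k defectives in n items) for defect rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(at most k defectives in n items); 0 if k is negative."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1))


def double_sampling_accept_prob(p: float, n1: int, c1: int, n2: int, c2: int) -> float:
    """Acceptance probability of a classic double sampling plan (assumed rule)."""
    # Accept outright when the first sample has at most c1 defectives.
    accept_first = binom_cdf(c1, n1, p)
    # Otherwise, for c1 < d1 <= c2, take a second sample and accept
    # when the combined count d1 + d2 does not exceed c2.
    accept_second = sum(
        binom_pmf(d1, n1, p) * binom_cdf(c2 - d1, n2, p)
        for d1 in range(c1 + 1, min(c2, n1) + 1)
    )
    return accept_first + accept_second
```

Evaluating `double_sampling_accept_prob` over a range of defect rates `p` traces the operating characteristic curve of the plan, which is the usual starting point before a cost or loss function is layered on top.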

A New VLSI Architecture of Parallel Multiplier–Accumulator Based on Radix-2 Modified Booth Algorithm

International Journal of Instrumentation Control and Automation ◽

10.47893/ijica.2011.1036 ◽

2011 ◽

pp. 196-202

Author(s):

P. Sasi Bala ◽

S. Raghavendra

Keyword(s):

High Speed ◽

High Performance ◽

Vlsi Architecture ◽

Alpha Power ◽

Clock Frequency ◽

Parallel Multiplier ◽

Standard Design ◽

Overall Performance ◽

And Performance ◽

Least Significant Bits

In this paper, we propose a new architecture of multiplier-and-accumulator (MAC) for high-speed arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry-save adder (CSA), the performance was improved. Since the accumulator, which has the largest delay in the MAC, was merged into the CSA, the overall performance was elevated. The proposed CSA tree uses a 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and has a modified array for the sign extension in order to increase the bit density of the operands. The CSA propagates the carries to the least significant bits of the partial products and generates the least significant bits in advance to decrease the number of input bits of the final adder. Also, the proposed MAC accumulates the intermediate results in the form of sum and carry bits instead of the output of the final adder, which makes it possible to optimize the pipeline scheme to improve the performance. The proposed architecture was synthesized with 250, 180, 130, and 90 nm standard CMOS libraries. Based on theoretical and experimental estimation, we analyzed results such as the amount of hardware resources, delay, and pipelining scheme. We used Sakurai’s alpha power law for the delay modeling. The proposed MAC showed superior properties to the standard design in many ways, with twice the performance of previous research at a similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance, such as signal processing.
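
The idea of accumulating intermediate results as separate sum and carry words, with a single carry-propagating addition at the very end, can be sketched in Python as follows. This is a software model under stated assumptions (non-negative operands, products formed with `*` rather than a partial-product tree), not the architecture from the paper; `carry_save_add` and `mac_carry_save` are hypothetical names.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compression: reduce three words to a sum word and a carry word.

    No carry ripples between bit positions; a + b + c == sum + carry.
    """
    partial_sum = a ^ b ^ c                       # per-bit sum without carries
    carry = ((a & b) | (b & c) | (a & c)) << 1    # per-bit majority, shifted left
    return partial_sum, carry


def mac_carry_save(pairs: list[tuple[int, int]]) -> int:
    """Accumulate x*y over all pairs, keeping the running total in
    carry-save form. Assumes non-negative operands."""
    sum_word, carry_word = 0, 0
    for x, y in pairs:
        sum_word, carry_word = carry_save_add(sum_word, carry_word, x * y)
    # The only carry-propagating addition happens once, at the end.
    return sum_word + carry_word
```

Because the running total never passes through a full adder inside the loop, the long carry chain is taken off the per-iteration critical path, which is the motivation the abstract gives for merging the accumulator into the CSA.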

Design Space Exploration of High-Performance Parallel Architectures

Journal of Integrated Circuits and Systems ◽

10.29292/jics.v3i1.279 ◽

2008 ◽

Vol 3 (1) ◽

pp. 32-38

Author(s):

Enric Musoll ◽

Mario Nemirovsky

Keyword(s):

Power Efficiency ◽

High Performance ◽

Design Space Exploration ◽

Parallel Architecture ◽

Parallel Architectures ◽

Power Performance ◽

Power Budget ◽

Performance Goal ◽

Power Efficient ◽

On Chip

High-performance single-threaded processors achieve their performance goal partly by relying, among other architectural techniques, on speculation and large on-chip caches. The hardware to support these techniques usually occupies a large portion of the overall processor real estate, and therefore it consumes a significant amount of power that is sometimes not optimally used toward doing useful work. In this work, we study the intuitive fact that architectures with hardware support for threads are more power efficient than a more traditional single-threaded superscalar architecture. Toward this goal, we have created a model of the power, performance and area of several parallel architectures. This model shows that a parallel architecture can be designed so that (a) it requires less area and power (to reach the same performance), or (b) it achieves better power efficiency and less area (for the same power budget), or (c) it has higher performance and better power efficiency (for the same area constraint), when compared to a single-threaded superscalar architecture.

FUNDING AND PERFORMANCE OF HIGH-LEVEL INDIVIDUAL SPORT IN BENIN

International Journal of Advanced Research ◽

10.21474/ijar01/12037 ◽

2020 ◽

Vol 8 (11) ◽

pp. 532-537

Author(s):

Akouete Coffi David ◽

Ahounouaïkpe Fifamin Judith ◽

Hounsou Semako Julien ◽

Dansou H. Pierre ◽

...

Keyword(s):

High Performance ◽

Public Funding ◽

Research Work ◽

The Other ◽

Other Hand ◽

And Performance ◽

The One ◽

High Level ◽

Individual Sports

This research work, entitled Financing of high performance individual sport in Benin, aims to analyze, on the one hand, the effects of insufficient funding of high-performance athletes in Benin on the development of individual sport and the support of its elites and, on the other hand, the type of funding that would be best suited to this situation in Benin. It focuses on three sources of funding for sport: public funding, self-funding and other sources of funding. The results of the study show, on the one hand, that the insufficiency of the budgets allocated to high-performance individual sports constitutes in part an obstacle to the development of this type of sport and, on the other hand, that public funding does not favor the improvement of the performance of high-performance individual athletes, compared to other sources of sport funding.

Crankshaft of RMP Analysis Based on CAE

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.467-469.1921 ◽

2011 ◽

Vol 467-469 ◽

pp. 1921-1926

Author(s):

Bin Peng

Keyword(s):

High Pressure ◽

Stress Analysis ◽

High Performance ◽

High Reliability ◽

Movement Analysis ◽

Large Power ◽

Main Stress ◽

And Performance ◽

Main Components ◽

Dangerous Condition

The crankshaft is one of the main components of a reciprocating mud pump (RMP). Its stress significantly affects the reliability and performance of high-pressure, large-power RMPs. In this study, CAE stress analysis was applied to secure the high reliability of the main components and improve the performance of the RMP. Through movement analysis and computation for various components of the RMP, the load on the crankshaft was obtained. By applying CAE stress analysis to three kinds of dangerous working conditions of the crankshaft, the distribution characteristics of the maximum main stress, the minimum main stress and the Mises stress were obtained. The most dangerous condition and position were identified through the stress analysis. The analysis results provide theoretical support for the design and development of high-pressure, large-power, high-performance and high-reliability RMPs.

Multilevel MPSoC Performance Evaluation: New ISSPT Model

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v5i5.pp975-983 ◽

2015 ◽

Vol 5 (5) ◽

pp. 975

Author(s):

A. Alali ◽

I. Assayad ◽

M. Sadik

Keyword(s):

Performance Evaluation ◽

High Performance ◽

Design Space Exploration ◽

Estimation Error ◽

Performance Estimation ◽

Design Flow ◽

Systems On Chip ◽

And Performance ◽

On Chip ◽

Definition Of

To deploy the enormous hardware resources available in Multi-Processor Systems-on-Chip (MPSoC) efficiently, rapidly and accurately, methods of Design Space Exploration (DSE) are needed to evaluate the different design alternatives. In this paper, we present a framework that makes fast simulation and performance evaluation of MPSoC possible early in the design flow, thus reducing the time-to-market. In this framework and within the Transaction Level Modeling (TLM) approach, we present a new definition of the ISS level by introducing two complementary modeling sublevels, ISST and ISSPT. For the latter, we illustrate an arbiter modeling approach that allows high-performance MPSoC communication. A round-robin method is chosen because it is simple, minimizes the communication latency and has an acceptable speed-up. Two applications are tested and used to validate our platform: Game of Life and JPEG Encoder. The performance of the proposed approach has been analyzed on our MPSoC platform based on multi-MicroBlaze. Simulation results show that the ISSPT sublevel gives a high simulation speedup factor of up to 32 with a negligible performance estimation error margin.
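
The round-robin arbitration mentioned in the abstract above can be sketched in a few lines of Python. This is a generic behavioral model, not the authors' TLM arbiter: the class name `RoundRobinArbiter`, the `grant` method, and the boolean request-vector interface are assumptions for illustration.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Grant a shared bus to one requesting master per cycle, rotating
    priority so the most recently served master becomes lowest priority."""

    def __init__(self, n_masters: int) -> None:
        self.n_masters = n_masters
        # Start so that master 0 has the highest priority on the first grant.
        self.last_granted = n_masters - 1

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index of the granted master, or None if nobody requests."""
        for offset in range(1, self.n_masters + 1):
            candidate = (self.last_granted + offset) % self.n_masters
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None
```

With three masters all requesting continuously, successive calls to `grant` return 0, 1, 2, 0, and so on, so no master can starve another; this fairness and the small amount of state are the usual reasons a round-robin policy is described as simple.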
