WATT MATTERS MOST? DESIGN SPACE EXPLORATION OF HIGH-PERFORMANCE MICROPROCESSORS FOR POWER-PERFORMANCE EFFICIENCY

2007 ◽  
Vol 16 (03) ◽  
pp. 357-378
Author(s):  
PEDRO TRANCOSO

Computer systems have evolved significantly in recent years, leading to high-performance systems. This, however, has come at the cost of large power dissipation. As such, power-awareness has become a major factor in processor design. Therefore, it is important to have a complete understanding of the power and performance behavior of all processor components. To achieve this, the current work presents a comprehensive analysis of power-performance efficiency for different high-end microarchitecture configurations using three different workloads: multimedia, scientific, and database. The objectives of this work are: (1) to analyze and compare the power-performance efficiency for different workloads; (2) to present a sensitivity analysis for the microarchitecture parameters in order to identify which ones are more sensitive to changes in terms of power-performance efficiency; and (3) to propose power-performance efficient configurations for each workload. The simulation results show that the multimedia workload achieves the highest efficiency, but the database workload is the most sensitive to parameter changes. The results also show that the parameter sensitivity depends significantly on the workload. While the issue width and clock frequency present very high sensitivity across all workloads (approximately 100%), for the database workload the first-level instruction cache size shows an even higher sensitivity (149%). The correct configuration of these microarchitecture parameters is essential. A careless configuration of a single parameter from a baseline setup may result in a loss of power-performance efficiency of up to 99%. Finally, carefully tuning multiple parameters simultaneously may result in gains of up to 154% over the power-performance efficiency of the baseline configuration.

2015 ◽  
Vol 2015 ◽  
pp. 1-20
Author(s):  
Gongyu Wang ◽  
Greg Stitt ◽  
Herman Lam ◽  
Alan George

Field-programmable gate arrays (FPGAs) provide a promising technology that can improve performance of many high-performance computing and embedded applications. However, unlike software design tools, the relatively immature state of FPGA tools significantly limits productivity and consequently prevents widespread adoption of the technology. For example, the lengthy design-translate-execute (DTE) process often must be iterated to meet the application requirements. Previous works have enabled model-based, design-space exploration to reduce DTE iterations but are limited by a lack of accurate model-based prediction of key design parameters, the most important of which is clock frequency. In this paper, we present a core-level modeling and design (CMD) methodology that enables modeling of FPGA applications at an abstract level and yet produces accurate predictions of parameters such as clock frequency, resource utilization (i.e., area), and latency. We evaluate CMD’s prediction methods using several high-performance DSP applications on various families of FPGAs and show an average clock-frequency prediction error of 3.6%, with a worst-case error of 20.4%, compared to the best of existing high-level prediction methods, 13.9% average error with 48.2% worst-case error. We also demonstrate how such prediction enables accurate design-space exploration without coding in a hardware-description language (HDL), significantly reducing the total design time.
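
The abstract above reports prediction quality as an average and a worst-case percentage error. As a minimal sketch of how such figures are usually computed, assuming the error is the absolute difference relative to the measured value (the abstract does not state the exact definition), the following uses hypothetical clock frequencies that are not data from the paper:

```python
def percent_errors(predicted, measured):
    """Absolute percentage error of each prediction relative to its measured value."""
    return [abs(p - m) / m * 100.0 for p, m in zip(predicted, measured)]


def summarize(predicted, measured):
    """Return (average error, worst-case error) in percent."""
    errors = percent_errors(predicted, measured)
    return sum(errors) / len(errors), max(errors)


# Hypothetical clock frequencies in MHz, for illustration only.
predicted_mhz = [200.0, 150.0, 310.0]
measured_mhz = [210.0, 150.0, 300.0]

avg_err, worst_err = summarize(predicted_mhz, measured_mhz)
print(f"average error: {avg_err:.2f}%, worst-case error: {worst_err:.2f}%")
```

Reporting both numbers matters for design-space exploration: a low average can hide a single badly mispredicted design point, which the worst-case figure exposes.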


Author(s):  
Mr.M.V. Sathish ◽  
Mrs. Sailaja

This paper proposes a new architecture of multiplier-and-accumulator (MAC) for high-speed arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator, which has the largest delay in the MAC, was merged into the CSA, the overall performance was elevated. The proposed CSA tree uses 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and has a modified array for the sign extension in order to increase the bit density of the operands. The proposed MAC showed superior properties to the standard design in many ways, and twice the performance of the previous research at a similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance, such as signal processing.


2018 ◽  
Vol 3 (4) ◽  
pp. 55 ◽  
Author(s):  
Aristide Guerriero ◽  
Carlo Varalda ◽  
Maria Piacentini

Resistance training (RT) is considered the most important method to improve the athlete’s strength and rate of force development (RFD). In the last decade, the importance of monitoring velocity during RT has drastically grown, because of an increased availability of linear position transducers (LPT) and inertial measurement units (IMU). The purpose of this review is to analyze the existing literature on testing techniques and performance strategies used to enhance strength and power performance of elite athletes, by monitoring the velocity of resistance training. The authors focus in particular on the level of effort of resistance training defined by velocity; how the loss of velocity correlates with the degree of fatigue and how it can be used to enhance the performance of competitive athletes; the use of LPT as part of the daily routine of the strength and conditioning programs in competitive sport. It is therefore critical for the sports scientists to have a correct understanding of the basic concepts of the velocity-based training and their application to elite sports. The ultimate goal is to give some indications on the velocity-based resistance training integration in the programs of different sports in the high performance environment.


2020 ◽  
Vol 10 (4) ◽  
pp. 37
Author(s):  
Habiba Lahdhiri ◽  
Jordane Lorandel ◽  
Salvatore Monteleone ◽  
Emmanuelle Bourdel ◽  
Maurizio Palesi

The Network-on-chip (NoC) paradigm has been proposed as a promising solution to enable the handling of a high degree of integration in multi-/many-core architectures. Despite their advantages, wired NoC infrastructures are facing several performance issues regarding multi-hop long-distance communications. RF-NoC is an attractive solution offering high performance and multicast/broadcast capabilities. However, managing RF links is a critical aspect that relies on both application-dependent and architectural parameters. This paper proposes a design space exploration framework for OFDMA-based RF-NoC architecture, which takes advantage of both real application benchmarks simulated using Sniper and RF-NoC architecture modeled using Noxim. We adopted the proposed framework to finely configure a routing algorithm, working with real traffic, achieving up to 45% of delay reduction, compared to a wired NoC setup in similar conditions.


2017 ◽  
Vol 32 (2) ◽  
Author(s):  
Mansooreh Razmkhah ◽  
Bahram Sadeghpour Gildeh ◽  
Jafar Ahmadi

In industry, when a lot of items is sent for inspection, double acceptance sampling plans (DASP) are used to decide whether to accept or reject the lot. If the lot contains items with high sensitivity, measuring the quality characteristics is destructive or costly, so we seek a decision method with high performance. Using the ranked set sampling (RSS) method makes the decision on whether to accept a lot stricter and more accurate. Moreover, it is affordable and does not impose extra costs on the buyer or the producer. In this paper, using a special type of RSS called maxima nomination sampling (MNS), we design a DASP with regard to the total loss function. The results indicate that the total loss function obtained with the MNS method has lower values than the one obtained with the simple random sampling (SRS) method.
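
For readers unfamiliar with double acceptance sampling, the sketch below computes the acceptance probability of the conventional textbook plan under simple random sampling with a binomial defect model. It is not the MNS-based design or the loss function of the paper. The plan parameters (n1, c1, n2, c2) and the rule used (accept if the first sample has at most c1 defectives, reject if it has more than c2, otherwise draw a second sample and accept if the combined count is at most c2) are assumptions made for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k defectives in a sample of n with defect rate p."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def acceptance_probability(p, n1, c1, n2, c2):
    """Acceptance probability of a conventional double sampling plan (SRS, binomial)."""
    # Accept immediately when the first sample has at most c1 defectives.
    pa = sum(binom_pmf(d1, n1, p) for d1 in range(c1 + 1))
    # For c1 < d1 <= c2 a second sample is drawn; accept if d1 + d2 <= c2.
    for d1 in range(c1 + 1, c2 + 1):
        second_ok = sum(binom_pmf(d2, n2, p) for d2 in range(c2 - d1 + 1))
        pa += binom_pmf(d1, n1, p) * second_ok
    return pa


# Hypothetical plan: first sample of 50 (c1 = 1), second sample of 100 (c2 = 3).
for defect_rate in (0.01, 0.05, 0.10):
    pa = acceptance_probability(defect_rate, n1=50, c1=1, n2=100, c2=3)
    print(f"p = {defect_rate:.2f}: P(accept) = {pa:.4f}")
```

Evaluating the function over a range of defect rates gives the operating characteristic curve of the plan, the usual baseline against which an alternative sampling scheme would be compared.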


Author(s):  
P.Sasi Bala ◽  
S. Raghavendra

In this paper, we propose a new architecture of multiplier-and-accumulator (MAC) for high-speed arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator, which has the largest delay in the MAC, was merged into the CSA, the overall performance was elevated. The proposed CSA tree uses 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and has a modified array for the sign extension in order to increase the bit density of the operands. The CSA propagates the carries to the least significant bits of the partial products and generates the least significant bits in advance to decrease the number of input bits of the final adder. Also, the proposed MAC accumulates the intermediate results in the form of sum and carry bits instead of the output of the final adder, which made it possible to optimize the pipeline scheme to improve the performance. The proposed architecture was synthesized with 250, 180, 130, and 90 nm standard CMOS libraries. Based on the theoretical and experimental estimation, we analyzed the results, such as the amount of hardware resources, delay, and pipelining scheme. We used Sakurai’s alpha power law for the delay modeling. The proposed MAC showed superior properties to the standard design in many ways, and twice the performance of the previous research at a similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance, such as signal processing.
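
The idea of accumulating in sum-and-carry form, rather than running a full carry-propagate addition on every cycle, can be illustrated in software. The sketch below is a behavioral model only, under stated assumptions: a 32-bit unsigned accumulator, a generic 3:2 carry-save step, and Python's `*` standing in for the partial-product tree. It does not model the Booth encoding, the sign-extension array, or the hybrid CSA of the proposed design.

```python
WIDTH = 32                 # assumed accumulator width
MASK = (1 << WIDTH) - 1    # keeps values within WIDTH bits (arithmetic modulo 2**WIDTH)


def carry_save_add(a, b, c):
    """3:2 compression: reduce three words to a sum word and a carry word.

    No carry ripples across bit positions here; each output bit depends only
    on the three input bits at the same position.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word & MASK, carry_word & MASK


def mac(pairs):
    """Multiply-accumulate over (x, y) pairs, keeping the accumulator in carry-save form."""
    acc_sum, acc_carry = 0, 0
    for x, y in pairs:
        product = (x * y) & MASK  # stand-in for the partial-product tree
        acc_sum, acc_carry = carry_save_add(acc_sum, acc_carry, product)
    # A single carry-propagate addition resolves the final result.
    return (acc_sum + acc_carry) & MASK


print(mac([(3, 4), (5, 6), (7, 8)]))
```

Because the slow carry-propagate addition happens once at the end instead of once per accumulation, the per-iteration critical path is shorter, which is the property the abstract attributes to merging the accumulator into the CSA.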


2008 ◽  
Vol 3 (1) ◽  
pp. 32-38
Author(s):  
Enric Musoll ◽  
Mario Nemirovsky

High-performance single-threaded processors achieve their performance goal partly by relying, among other architectural techniques, on speculation and large on-chip caches. The hardware to support these techniques usually occupies a large portion of the overall processor real estate, and therefore it consumes a significant amount of power that sometimes is not optimally used toward doing useful work. In this work, we study the intuitive fact that architectures with hardware support for threads are more power efficient than a more traditional single-threaded superscalar architecture. Toward this goal, we have created a model of the power, performance and area of several parallel architectures. This model shows that a parallel architecture can be designed so that (a) it requires less area and power (to reach the same performance), or (b) it achieves better power efficiency and less area (for the same power budget), or (c) it has higher performance and better power efficiency (for the same area constraint), when compared to a single-threaded superscalar architecture.


2020 ◽  
Vol 8 (11) ◽  
pp. 532-537
Author(s):  
Akouete Coffi David ◽  
Ahounouaïkpe Fifamin Judith ◽  
Hounsou Semako Julien ◽  
Dansou H. Pierre ◽  
...  

This research work, entitled "Financing of high performance individual sport in Benin", aims to analyze, on the one hand, the effects of insufficient funding of high performance athletes in Benin on the development of individual sport and the support of its elites and, on the other hand, the type of funding that would be best suited to this situation in Benin. It focuses on three sources of funding for sport: public funding, self-funding and other sources of funding. The results of the study show that, on the one hand, the insufficiency of the budgets allocated to high-performance individual sports constitutes in part an obstacle to the development of this type of sport and, on the other hand, that public funding does not favor the improvement of the performance of high performance individual athletes, compared to other sources of sport funding.


2011 ◽  
Vol 467-469 ◽  
pp. 1921-1926
Author(s):  
Bin Peng

The crankshaft is one of the main components in a reciprocating mud pump (RMP). Its stress significantly affects the reliability and performance of high-pressure, large-power RMPs. In this study, CAE stress analysis was applied to secure the high reliability of the main components and improve the performance of the RMP. Through movement analysis and computation for the various components of the RMP, the load on the crankshaft was obtained. By applying CAE stress analysis to three kinds of dangerous working conditions of the crankshaft, the distribution characteristics of the maximum principal stress, the minimum principal stress and the Mises stress were obtained. The most dangerous condition and position were identified through stress analysis. The analysis results provide theoretical support for the design and development of high-pressure, large-power, high-performance and high-reliability RMPs.


Author(s):  
A. Alali ◽  
I. Assayad ◽  
M. Sadik

To deploy the enormous hardware resources available in Multi Processor Systems-on-Chip (MPSoC) efficiently, rapidly and accurately, methods of Design Space Exploration (DSE) are needed to evaluate the different design alternatives. In this paper, we present a framework that makes fast simulation and performance evaluation of MPSoC possible early in the design flow, thus reducing the time-to-market. In this framework, and within the Transaction Level Modeling (TLM) approach, we present a new definition of the ISS level by introducing two complementary modeling sublevels, ISST and ISSPT. With the latter, we illustrate an arbiter modeling approach that allows high-performance MPSoC communication. A round-robin method is chosen because it is simple, minimizes the communication latency and has an acceptable speed-up. Two applications are tested and used to validate our platform: Game of Life and a JPEG encoder. The performance of the proposed approach has been analyzed on our MPSoC platform based on multi-MicroBlaze. Simulation results show that the ISSPT sublevel gives a high simulation speedup factor of up to 32 with a negligible performance estimation error margin.
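
The abstract above names round-robin as its arbitration method. As a minimal behavioral sketch of generic round-robin arbitration (not the TLM arbiter model of the paper; the class name, the number of masters and the request format are all hypothetical), the arbiter below grants the shared bus to the first requesting master after the one served last:

```python
class RoundRobinArbiter:
    """Grants a shared resource to requesting masters in rotating priority order."""

    def __init__(self, num_masters):
        self.num_masters = num_masters
        # Start so that master 0 has the highest priority on the first grant.
        self.last_granted = num_masters - 1

    def grant(self, requests):
        """requests[i] is True when master i wants the bus; returns an index or None."""
        for offset in range(1, self.num_masters + 1):
            candidate = (self.last_granted + offset) % self.num_masters
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None  # no master is requesting


arbiter = RoundRobinArbiter(num_masters=3)
# All three masters request continuously: grants rotate fairly.
print([arbiter.grant([True, True, True]) for _ in range(4)])
```

Since the master served last drops to the lowest priority, no requester can be starved, and the waiting time of any master is bounded by the number of competing masters.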

