performance loss
Recently Published Documents


TOTAL DOCUMENTS

432
(FIVE YEARS 179)

H-INDEX

23
(FIVE YEARS 6)

2022 ◽  
Vol 21 (1) ◽  
pp. 1-20
Author(s):  
Tommaso Marinelli ◽  
J. Ignacio Gómez Pérez ◽  
Christian Tenllado ◽  
Manu Komalan ◽  
Mohit Gupta ◽  
...  

As technology scaling advances, the limitations of traditional memories in terms of density and energy become more evident. Modern caches occupy a large fraction of a CPU's physical area, and high static leakage limits the overall efficiency of systems, including IoT/edge devices. Several alternatives to CMOS SRAM memories have been studied over the past few decades, some of which already represent a viable replacement for different levels of the cache hierarchy. One of the most promising technologies is spin-transfer torque magnetic RAM (STT-MRAM), due to its small basic cell design, almost absent static current, and non-volatility as an added value. However, nothing comes for free, and designers have to deal with other limitations, such as higher latencies and dynamic energy consumption for write operations compared to reads. The goal of this work is to explore several microarchitectural parameters that may overcome some of those drawbacks when using STT-MRAM as the last-level cache (LLC) in embedded devices. Such parameters include the number of cache banks, the number of miss status handling registers (MSHRs) and write-buffer entries, and the presence of hardware prefetchers. We show that an effective tuning of those parameters may virtually remove any performance loss while saving more than 60% of the LLC energy on average. The analysis is then extended by comparing the energy results from calibrated technology models with data obtained from freely available tools, highlighting the importance of using accurate models for architectural exploration.
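
As a rough illustration of the kind of design-space sweep the paper describes, the following minimal Python sketch enumerates the four tuning knobs and keeps the lowest-energy configuration that stays within 1% of a baseline cycle count. All parameter values and the analytic `run_simulation` stand-in are hypothetical; the authors used calibrated technology models and a cycle-accurate simulator.

```python
import itertools

BASELINE_CYCLES = 1_000_000   # hypothetical SRAM-LLC baseline cycle count

def run_simulation(banks, mshrs, wbs, prefetch):
    """Toy analytic stand-in for a cycle-accurate simulator: more banks,
    MSHRs and write-buffer entries hide STT-MRAM write latency, while the
    prefetcher trades extra energy for speed. Returns (cycles, energy_nJ)."""
    cycles = BASELINE_CYCLES * (1.15 - 0.02 * banks - 0.002 * mshrs
                                - 0.004 * wbs - (0.05 if prefetch else 0.0))
    energy = 400_000 * (0.40 + 0.01 * banks + (0.08 if prefetch else 0.0))
    return cycles, energy

best = None
for cfg in itertools.product([1, 2, 4, 8],      # cache banks
                             [4, 8, 16, 32],    # MSHR entries
                             [4, 8, 16],        # write-buffer entries
                             [False, True]):    # hardware prefetcher
    cycles, energy = run_simulation(*cfg)
    # Keep the lowest-energy point within 1% of the baseline performance.
    if cycles <= 1.01 * BASELINE_CYCLES and (best is None or energy < best[0]):
        best = (energy, *cfg)

print("best (energy_nJ, banks, mshrs, wb, prefetch):", best)
```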


Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 94
Author(s):  
Chung Ho Duc ◽  
Sang Quang Nguyen ◽  
Chi-Bao Le ◽  
Ngo Tan Vu Khanh

In this paper, we evaluate the outage performance of a non-orthogonal multiple access (NOMA)-enabled unmanned aerial vehicle (UAV) system in which two users on the ground are served simultaneously by a UAV to improve spectral efficiency. In practice, hardware impairments at the transceiver cause distortion noise, which degrades the performance of wireless systems; hardware impairment is therefore an unavoidable factor in the system design process. Hence, we take into account the effects of hardware impairment (HI) on the performance of the proposed system. In this setting, to evaluate the system performance, closed-form expressions for the outage probability of the two NOMA users and for the ergodic capacity are derived, along with their asymptotic expressions in the high signal-to-noise ratio (SNR) regime. Finally, based on Monte-Carlo simulations, we verify the analytical expressions and investigate the effects of the main system parameters, i.e., the transmit SNR and the level of HI, on the system performance metrics. The results show that the performance of the near NOMA user is better than that of the far NOMA user in the case of perfect hardware; however, in the case of hardware impairment, an inversion occurs at high UAV transmit power in terms of the ergodic capacity.
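
A minimal Monte-Carlo sketch of the verification step, assuming the common aggregate-impairment model (distortion power scaling with signal power through a level kappa), Rayleigh fading, and perfect successive interference cancellation; the power-allocation coefficients and target rates below are illustrative, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
N       = 1_000_000        # Monte-Carlo trials
snr_db  = 20.0             # transmit SNR
rho     = 10 ** (snr_db / 10)
kappa   = 0.1              # aggregate HI level (kappa = 0: perfect hardware)
a_far, a_near = 0.8, 0.2   # NOMA power-allocation coefficients
R_far = R_near = 1.0       # target rates (bits/s/Hz)
th_far, th_near = 2 ** R_far - 1, 2 ** R_near - 1

# Rayleigh fading power gains; the near user has the stronger average channel.
g_near = rng.exponential(scale=2.0, size=N)
g_far  = rng.exponential(scale=0.5, size=N)

# Far user decodes its own symbol, treating the near user's signal as
# interference; HI adds a distortion term kappa^2 scaling with signal power.
sindr_far = rho * g_far * a_far / (rho * g_far * (a_near + kappa**2) + 1)

# Near user removes the far user's symbol first (perfect SIC assumed here).
sindr_near = rho * g_near * a_near / (rho * g_near * kappa**2 + 1)

print("outage far :", np.mean(sindr_far < th_far))
print("outage near:", np.mean(sindr_near < th_near))
```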


Author(s):  
I. Yu. Sesin ◽  
R. G. Bolbakov

General-Purpose computing on Graphics Processing Units (GPGPU) is a powerful technique for offloading parallel data-processing tasks to Graphics Processing Units (GPUs). It finds use in a variety of domains, from science and commerce to hobbyist projects. GPU-run general-purpose programs inevitably run into performance issues stemming from branch predication. Predication is a GPU feature that executes both sides of a conditional branch and masks out the results of the side not taken. This leads to considerable performance losses for GPU programs that have large amounts of code hidden behind conditional operators. This paper analyzes existing approaches to improving software performance in the context of relieving the aforementioned performance loss. A description of these approaches is provided, along with their upsides, downsides, the extent of their applicability, and whether they address the outlined problem. The covered approaches include optimizing compilers, JIT compilation, branch predictors, speculative execution, adaptive optimization, run-time algorithm specialization, and profile-guided optimization. We show that these methods mostly cater to CPU-specific issues and are generally not applicable as far as the branch-predication performance loss is concerned. Lastly, we outline the need for a separate performance-improving approach that addresses the specifics of branch predication and the GPGPU workflow.
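
The cost described above can be mimicked on a CPU with NumPy's masked-selection semantics: both branch bodies are evaluated for every data lane, and the condition only selects which result each lane keeps, which is exactly what predication does to a GPU warp. A minimal sketch with illustrative values:

```python
import numpy as np

x = np.linspace(-4, 4, 1024)   # one "warp's" worth of data lanes
cond = x > 0

cheap     = x * 2.0                             # branch some lanes would take
expensive = np.sin(x) ** 3 + np.exp(-x ** 2)    # branch the other lanes take

# Predicated execution: BOTH 'cheap' and 'expensive' were computed for all
# 1024 lanes; np.where merely masks which result survives per lane, so the
# total work done is the sum of both branch bodies.
y = np.where(cond, cheap, expensive)
```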


2021 ◽  
Author(s):  
Ashima Malhotra ◽  
Shraman Goswami ◽  
Pradeep A M

Abstract The aerodynamic performance of a compressor rotor is known to deteriorate due to surface roughness. It is important to understand this deterioration, as it impacts the overall performance of the engine. This paper therefore numerically investigates the impact of roughness on the performance of an axial compressor rotor at different rotational speeds. In this numerical study, simulations are carried out for NASA Rotor 37 at 100%, 80%, and 60% of its design speed, with and without roughness on the blade surface. These speeds are chosen because they represent different flow regimes: the front stages of a multistage compressor usually operate in a supersonic or transonic regime, whereas the middle and aft stages operate in a subsonic regime. Thus, these performance characteristics can give an estimate of the impact on the performance of a multistage compressor. At 100% (design) speed the relative flow is supersonic, at 80% of design speed it is transonic, and at 60% of design speed it is subsonic. Detailed flow-field investigations are carried out to understand the underlying flow physics. The results indicate that, for the same amount of roughness, the degradation in performance is largest at 100% speed, where the rotor is supersonic, and smallest at 60% speed, where the rotor is subsonic. Thus, the rotor shock system plays an important role in determining the performance loss due to roughness. It is also observed that the loss increases with span at the 100% and 80% speeds, but at 60% speed the loss is almost constant from hub to shroud. This is because the shock strength increases with span at the 100% and 80% speeds, whereas at 60% speed the flow is subsonic.
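
For context, surface roughness typically enters RANS simulations of this kind through an equivalent sand-grain height $k_s$ that shifts the turbulent log-law; the fully rough textbook form is shown below, and may differ from the exact wall treatment used in this study.

```latex
% Fully rough log-law with equivalent sand-grain roughness k_s
% (standard form; the paper's solver settings are not stated):
\[
u^{+} = \frac{1}{\kappa}\,\ln\!\frac{y}{k_s} + 8.5,
\qquad \kappa \approx 0.41,
\]
% where $u^{+}$ is the wall-normalized velocity and $y$ the wall distance.
```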


Author(s):  
Jie Wang ◽  
Kaibin Tian ◽  
Dayong Ding ◽  
Gang Yang ◽  
Xirong Li

Expanding visual categorization into a novel domain without the need for extra annotation has been a long-term interest for multimedia intelligence. Previously, this challenge has been approached by unsupervised domain adaptation (UDA). Given labeled data from a source domain and unlabeled data from a target domain, UDA seeks a deep representation that is both discriminative and domain-invariant. While UDA focuses on the target domain, we argue that performance on both the source and target domains matters, since in practice it is unknown which domain a test example comes from. In this article, we extend UDA by proposing a new task called unsupervised domain expansion (UDE), which aims to adapt a deep model to the target domain with its unlabeled data while maintaining the model's performance on the source domain. We propose Knowledge Distillation Domain Expansion (KDDE) as a general method for the UDE task. Its domain-adaptation module can be instantiated with any existing model. We develop a knowledge distillation-based learning mechanism that enables KDDE to optimize a single objective in which the source and target domains are treated equally. Extensive experiments on two major benchmarks, i.e., Office-Home and DomainNet, show that KDDE compares favorably against four competitive baselines, i.e., DDC, DANN, DAAN, and CDAN, for both the UDA and UDE tasks. Our study also reveals that current UDA models improve their performance on the target domain at the cost of a noticeable performance loss on the source domain.
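
A minimal PyTorch-style sketch of a distillation objective in which the two domains are weighted equally, in the spirit of KDDE's single objective; the choice of teachers and the temperature below are common-practice assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def kdde_loss(student, teacher_src, teacher_tgt, x_src, x_tgt, T=2.0):
    """KL-distillation loss averaged with equal weight over both domains.
    teacher_src / teacher_tgt are assumed frozen models that perform well
    on the source and (adapted) target domains, respectively."""
    def kd(logits_s, logits_t):
        # Standard temperature-scaled KL distillation term.
        return F.kl_div(F.log_softmax(logits_s / T, dim=1),
                        F.softmax(logits_t / T, dim=1),
                        reduction="batchmean") * T * T

    loss_src = kd(student(x_src), teacher_src(x_src).detach())
    loss_tgt = kd(student(x_tgt), teacher_tgt(x_tgt).detach())
    return 0.5 * (loss_src + loss_tgt)   # source and target treated equally
```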


Author(s):  
Cenk Yavuz

Abstract: Today, as the number of office workers and artificial lighting installations keeps growing, the photometric flicker phenomenon has serious effects, yet it remains poorly understood and most people are unaware of it. Flicker, a direct result of using ballasts or drivers with a low power factor and without the necessary filtering, leads to reduced visual performance and losses of attention and perception. Considering that the conversion to LED luminaires is still incomplete in many office buildings in the country, it is important to investigate the flicker effect in interior spaces considered to offer similar lighting levels and conditions, and to establish a concrete assessment by correlating it with the average age of office workers. For this reason, in this study, various tests and experiments were carried out with volunteer participants aged 18-30, 31-45, and 46 and older, none of whom had significant vision problems; the outputs are intended to shed light on the relationship between age and lighting conditions. Keywords: Photometric Flicker, Interior lighting, Age and lighting relationship, Disruptive effects in lighting
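
For reference, photometric flicker is commonly quantified with the two standard metrics below (as in, e.g., IEEE Std 1789); the study's own measurement protocol is not detailed in the abstract.

```latex
% Standard flicker metrics:
\[
\text{Percent Flicker} = 100\,\% \times \frac{L_{\max} - L_{\min}}{L_{\max} + L_{\min}},
\qquad
\text{Flicker Index} = \frac{A_1}{A_1 + A_2},
\]
% where $L_{\max}$, $L_{\min}$ are the extrema of the light output over one
% cycle and $A_1$ ($A_2$) is the area of the waveform above (below) its mean.
```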


2021 ◽  
Vol 11 (22) ◽  
pp. 10920
Author(s):  
Junjun Jin ◽  
Zhiliang Lu ◽  
Tongqing Guo ◽  
Di Zhou ◽  
Qiaozhong Li

Dynamic stall in clean air flow has been well studied, but its exploration in air–particle (air–raindrop or air–sand) flow is still lacking. The aerodynamic performance loss of aircraft (NACA0012) and wind turbine (S809) airfoils, and the differences between them during the hysteresis loop at different pitching parameters, are also poorly understood. As shown in this paper, the reduced frequency has little effect on the value of the maximum lift-coefficient increment caused by particles, but a larger reduced frequency can enhance the hysteresis effect and shift the angle of attack at which the maximum increment is obtained from the upstroke to the downstroke. The large lift-coefficient increments of the two airfoils, and the difference between them, follow a similar trend with the reduced frequency. Compared to those of the NACA0012 airfoil, the increments of the S809 airfoil are clearly greater at all three mean angles of attack, especially at 8°, which is the commonly used operating angle. In addition, the angle of attack at which the maximum lift coefficient is obtained can be significantly changed by particles in two regions: one under the effect of deep stall, the other under the effect of light stall at a low reduced frequency.
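
For readers outside the field, the reduced frequency varied in this study is the customary non-dimensional pitching rate:

```latex
\[
k = \frac{\omega c}{2 U_\infty},
\]
% with pitching angular frequency $\omega$, airfoil chord $c$, and freestream
% velocity $U_\infty$; a larger $k$ means faster pitching relative to the
% convective time scale and hence a stronger hysteresis loop.
```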


Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7646
Author(s):  
Jie Tian ◽  
Hao Chen ◽  
Zhigang Wang ◽  
Xianhua Shi ◽  
Zhengyu Ji ◽  
...  

Recently, the concept of spatial and direction modulation (SDM) has been developed to reap the advantages of both spatial modulation (SM) and directional modulation (DM). On the one hand, DM ensures transmission security in the expected direction. On the other hand, the structure of SM-aided distributed receivers can enhance security even if the eavesdropper is located in the same direction as the legitimate receiver. However, these advantages rest on the assumption that the eavesdropper is not equipped with distributed receivers; when the eavesdropper is also equipped with distributed receivers, information security can no longer be guaranteed. To alleviate this problem, we consider a joint design of SDM and covert information mapping (CIM) to conceive a more robust CIM-SDM structure. Furthermore, the detection performance at both the eavesdropper and the legitimate user is quantified through theoretical derivation. In general, both the analysis and the simulation results support that the proposed CIM-SDM structure provides more robust security than the original SDM, even under the extreme condition of distributed receivers at the eavesdropper, at the cost of a moderate performance loss at the legitimate user.
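
To make the SM half of SDM concrete, the toy mapping below spends two bits on the index of the active transmit antenna and two bits on a QPSK symbol; it illustrates plain spatial modulation only, not the paper's CIM-SDM construction.

```python
import numpy as np

N_TX = 4                      # transmit antennas (2 bits select one)
qpsk = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)  # 2 bits/symbol

def sm_map(bits4):
    """Map 4 bits -> (active antenna index, QPSK symbol): the antenna index
    itself carries information, which is the essence of spatial modulation."""
    antenna = bits4[0] * 2 + bits4[1]          # spatial (antenna-index) bits
    symbol  = qpsk[bits4[2] * 2 + bits4[3]]    # constellation bits
    return antenna, symbol

bits = np.random.default_rng(1).integers(0, 2, size=4)
ant, sym = sm_map(bits)
print(f"bits {bits} -> antenna {ant}, symbol {sym:.2f}")
```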


2021 ◽  
Author(s):  
Mehdi Safarpour

Operating at reduced voltages promises substantial energy-efficiency improvements; however, the downside is a significant down-scaling of the clock frequency. This paper proposes vision chips as an excellent fit for low-voltage operation. Low-level sensory data processing in many Internet-of-Things (IoT) devices pursues energy efficiency by utilizing sleep modes or by slowing the clock to the minimum. To curb the share of stand-by power dissipation in those designs, near-threshold/sub-threshold operating points or ultra-low-leakage fabrication processes are employed. These limit the clock rates significantly, reducing the computing throughput of the individual processing cores. In this contribution, we explore compensating for the performance loss of operating in the near-threshold region ($V_{dd} = 0.6$ V) through massive parallelization. The benefits of near-threshold operation and massive parallelism are, respectively, optimum energy consumption per operation and minimized memory round-trips. The Processing Elements (PEs) of the design are based on the Transport Triggered Architecture. The fine-grained programmable parallel solution allows fast and efficient computation of learnable low-level features (e.g., local binary descriptors and convolutions); other operations, including max-pooling, have also been implemented. The programmable design achieves excellent energy efficiency for Local Binary Pattern computations. Our results demonstrate that the inherent properties of the processor and of vision applications allow voltage and clock frequency to be scaled down aggressively without compromising performance.
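
The trade-off exploited here follows from the usual first-order scaling laws (illustrative numbers, not the paper's measurements):

```latex
% Alpha-power-law model of the voltage/parallelism trade-off:
\[
E_{\mathrm{op}} \propto V_{dd}^{2},
\qquad
f_{\max} \propto \frac{(V_{dd} - V_{th})^{\alpha}}{V_{dd}},
\quad \alpha \approx 1.3 .
\]
% Scaling $V_{dd}$ from a nominal 1.0\,V down to 0.6\,V cuts the energy per
% operation to roughly $(0.6/1.0)^{2} = 36\%$ while $f_{\max}$ drops
% several-fold; replicating $N$ processing elements restores the lost
% throughput, which is the compensation strategy explored above.
```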


