software optimization
Recently Published Documents


TOTAL DOCUMENTS

125
(FIVE YEARS 38)

H-INDEX

9
(FIVE YEARS 4)

Author(s):  
I. Yu. Sesin ◽  
R. G. Bolbakov

General-Purpose computing on Graphics Processing Units (GPGPU) is a powerful technology for offloading parallel data-processing tasks to Graphics Processing Units (GPUs). It finds use in a variety of domains, from science and commerce to hobbyist projects. General-purpose programs run on GPUs inevitably encounter performance issues stemming from branch predication. Predication is a GPU execution feature under which both sides of a conditional branch are executed, with the results of the untaken path masked out. This leads to considerable performance losses for GPU programs that place large amounts of code behind conditional operators. This paper analyses existing approaches to improving software performance in the context of relieving this loss. We describe each approach, its upsides and downsides, the extent of its applicability, and whether it addresses the outlined problem. The covered approaches include optimizing compilers, JIT compilation, branch prediction, speculative execution, adaptive optimization, run-time algorithm specialization, and profile-guided optimization. We show that these methods mostly target CPU-specific issues and are generally not applicable to predication-related performance loss. Lastly, we outline the need for a dedicated performance-improvement approach that addresses the specifics of branch predication and the GPGPU workflow.
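As a rough illustration of the effect described above (our sketch, not from the paper), the following NumPy snippet emulates how predicated SIMT execution evaluates both sides of a divergent branch for every lane and then masks the results, so the cost is roughly the sum of both branch bodies:

```python
import numpy as np

def predicated_select(x):
    """Emulate a predicated GPU branch over a vector of lanes.

    Both branch bodies are evaluated for every lane; a per-lane
    predicate then selects which result is kept, mirroring how
    predication masks out the untaken path.
    """
    mask = x > 0.0                      # per-lane predicate
    if_body = np.sqrt(np.abs(x)) * 2.0  # executed for all lanes
    else_body = x * x + 1.0             # also executed for all lanes
    return np.where(mask, if_body, else_body)

print(predicated_select(np.array([-2.0, 0.5, 3.0])))
```

The more work each branch body contains, the larger the wasted fraction, which is exactly the loss the paper sets out to address.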


TEM Journal ◽  
2021 ◽  
pp. 2001-2006
Author(s):  
Syafii Syafii ◽  
Pinto Anugrah ◽  
Heru Dibyo Laksono ◽  
Herris Yamashika

This paper presents the economic feasibility of hybrid microgrid power systems for three remote islands of Sumatra, Indonesia. The microgrid systems were simulated and analysed using the HOMER Pro software. Optimization results showed that the combination of photovoltaics (PV), diesel generation (G), and batteries (Batt) was the most economical microgrid configuration for the Mandeh and Lagundri Island areas. For the Mentawai area, the combination of PV, wind turbine (WT), G, and Batt was optimal, since that area has a higher wind speed than the other two. The Mandeh area has the highest solar radiation of the three, resulting in the lowest CoE of $0.096/kWh as well as the lowest investment and operational costs. For the fixed 100 kW PV scenario, the optimal configuration includes 86 kW supplied by WT for the Lagundri location and 67 kW supplied by WT for the Mentawai area, while WT installation is not recommended for the Mandeh location. The power management analysis showed that the averages and patterns of weather parameters, including solar radiation and wind speed, affect both PV and wind power production.
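HOMER reports the cost of energy as the total annualized system cost divided by the energy served; a minimal sketch of that standard calculation, using hypothetical inputs rather than the paper's actual figures, looks like this:

```python
def crf(rate, years):
    """Capital recovery factor for a real discount rate over a lifetime."""
    return rate * (1 + rate) ** years / ((1 + rate) ** years - 1)

def cost_of_energy(capital_usd, annual_om_usd, years, rate, energy_kwh):
    """CoE in $/kWh: annualized total cost divided by annual energy served."""
    annualized = capital_usd * crf(rate, years) + annual_om_usd
    return annualized / energy_kwh

# Hypothetical inputs, not taken from the paper:
print(f"CoE = ${cost_of_energy(250_000, 12_000, 25, 0.06, 350_000):.3f}/kWh")
```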


2021 ◽  
Vol 897 (1) ◽  
pp. 012015
Author(s):  
Ronald Ayala Ramírez ◽  
Javier Tenesaca Chacaguasay ◽  
Juan Lata García

Abstract Recently, the idea of hybrid energy systems (HES) has attracted interest for the electrification of isolated or energy-deficient areas. This paper examines the modelling and optimal sizing of a hybrid microgrid under different dispatch strategies. The sizing of the HES components, namely photovoltaic panels, batteries, an inverter, and a diesel generator, has been optimized under three strategies: (i) load following, (ii) cycle charging, and (iii) combined dispatch. The case study is a rural community in Ecuador with a load of 17 kW. Using the HOMER software, the best HES optimization was achieved with the combined dispatch (CD) strategy, which yielded the minimum net present cost (NPC), initial capital, and levelized cost of energy (LCOE) of $90,073.10, $21,208, and $0.2016/kWh, respectively. The conclusions offer guidance on the resource and generation mix essential for the optimal operation of an islanded microgrid under different dispatch scenarios.
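The dispatch strategies differ in how the diesel generator is committed once PV and the battery cannot cover the load. The toy step below (our simplification with made-up signatures, not HOMER's actual algorithm) captures the core distinction between load following and cycle charging; combined dispatch chooses between the two each timestep based on projected cost:

```python
def dispatch_step(load_kw, pv_kw, soc_kwh, capacity_kwh, gen_kw, strategy):
    """One simplified dispatch step; returns (generator kW, battery charge kW)."""
    deficit = load_kw - pv_kw
    if deficit <= 0:
        return 0.0, min(-deficit, capacity_kwh - soc_kwh)  # PV surplus charges
    if soc_kwh >= deficit:
        return 0.0, -deficit               # battery alone covers the deficit
    if strategy == "load_following":
        return min(deficit, gen_kw), 0.0   # generator makes only what is needed
    if strategy == "cycle_charging":
        surplus = gen_kw - deficit         # generator runs at rated power and
        return gen_kw, min(surplus, capacity_kwh - soc_kwh)  # charges the battery
    raise ValueError(f"unknown strategy: {strategy}")
```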


Author(s):  
Constantino Álvarez Casado ◽  
Miguel Bordallo López

Abstract Face alignment is a crucial component in most face analysis systems. It focuses on identifying the locations of several keypoints of the human face in images or videos. Although several methods and models are available to developers in popular computer vision libraries, they still struggle with challenges such as insufficient illumination, extreme head poses, and occlusions, especially when constrained by the needs of real-time applications. In this article, we propose a set of training strategies and implementations based on data augmentation and software optimization techniques that improve a large variety of models belonging to several real-time face alignment algorithms. We also propose an extended set of evaluation metrics that enable novel evaluations and mitigate the typical problems found in real-time tracking contexts. The experimental results show that models produced with the proposed techniques are faster, smaller, more accurate, more robust under specific challenging conditions, and smoother in tracking systems. In addition, the training strategy proves applicable across different types of devices and algorithms, making it versatile for both academic and industrial use.
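As a concrete, hypothetical example of the kind of geometry-preserving augmentation such a training strategy might use (not the authors' exact pipeline), the snippet below applies illumination jitter and a random occlusion patch while leaving the landmark coordinates untouched:

```python
import numpy as np

def augment(image, keypoints, rng):
    """Brightness jitter plus a random occluder on an HxWx3 float image.

    Photometric-only changes leave the (N, 2) keypoint array unchanged.
    """
    out = np.clip(image * rng.uniform(0.5, 1.5), 0.0, 1.0)   # illumination
    h, w = out.shape[:2]
    ph, pw = rng.integers(h // 8, h // 3), rng.integers(w // 8, w // 3)
    y0, x0 = rng.integers(0, h - ph), rng.integers(0, w - pw)
    out[y0:y0 + ph, x0:x0 + pw] = rng.uniform(0.0, 1.0)      # occluder
    return out, keypoints

rng = np.random.default_rng(0)
img, kps = augment(np.full((128, 128, 3), 0.5), np.zeros((68, 2)), rng)
```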


2021 ◽  
Vol 13 (8) ◽  
pp. 4324
Author(s):  
Young Beom Kim ◽  
Taek-Young Youn ◽  
Seog Chung Seo

Since the Keccak algorithm was selected by the US National Institute of Standards and Technology (NIST) in 2015 as the standard SHA-3 hash algorithm to replace the currently used SHA-2 algorithm, various optimization methods have been studied for parallel and hardware environments. In software environments, however, the SHA-3 algorithm is much slower than the existing SHA-2 family; as a result, SHA-3 sees little use in constrained settings built on embedded devices, such as Wireless Sensor Network (WSN) environments. In this article, we propose a generally applicable software optimization method that breaks through the speed limit of SHA-3. We combine the θ, π, and ρ processes into one, reducing memory accesses to the internal state more efficiently than conventional software methods. In addition, we present a new SHA-3 implementation of the proposed method for the most constrained environment, the 8-bit AVR microcontroller. This implementation method, which we call the chaining optimization methodology, implicitly performs the π process of the f-function while minimizing memory accesses to the internal state of SHA-3. It achieves up to a 26.1% performance improvement over the previous implementation on an AVR microcontroller and substantially narrows the performance gap with the SHA-2 family. Finally, we apply our SHA-3 implementation to the Hash Deterministic Random Bit Generator (Hash_DRBG), one of the higher-level algorithms built on a hash function, to demonstrate the applicability of our chaining optimization methodology on 8-bit AVR MCUs.
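For reference, the θ, ρ, and π steps of the Keccak-f permutation can be fused into a single pass over the 5×5 lane state, which is the kind of combination the abstract describes. The plain-Python sketch below shows that fusion on 64-bit lanes; it illustrates the memory-access pattern only and is not the authors' AVR implementation:

```python
MASK = (1 << 64) - 1
ROT = [[0, 36, 3, 41, 18], [1, 44, 10, 45, 2], [62, 6, 43, 15, 61],
       [28, 55, 25, 21, 56], [27, 20, 39, 8, 14]]  # rho offsets r[x][y]

def rol64(v, n):
    return ((v << n) | (v >> (64 - n))) & MASK if n else v

def theta_rho_pi(A):
    """Fused theta, rho, and pi over the 5x5 state A[x][y] of 64-bit lanes."""
    C = [A[x][0] ^ A[x][1] ^ A[x][2] ^ A[x][3] ^ A[x][4] for x in range(5)]
    D = [C[(x - 1) % 5] ^ rol64(C[(x + 1) % 5], 1) for x in range(5)]
    B = [[0] * 5 for _ in range(5)]
    for x in range(5):
        for y in range(5):
            # theta (XOR D), rho (rotate), and pi (relocate) in one write
            B[y][(2 * x + 3 * y) % 5] = rol64(A[x][y] ^ D[x], ROT[x][y])
    return B
```

Each lane is read once and written once to its π-permuted position, which is where the reduction in state accesses comes from.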


Author(s):  
Chi-Ming Marvin Chung ◽  
Vincent Hwang ◽  
Matthias J. Kannwischer ◽  
Gregor Seiler ◽  
Cheng-Jhih Shih ◽  
...  

In this paper, we show how multiplication in the polynomial rings used in the NIST PQC finalists Saber and NTRU can be efficiently implemented using the number-theoretic transform (NTT). We obtain superior performance compared to the previous state-of-the-art implementations using Toom–Cook multiplication on both of NIST's primary software optimization targets, AVX2 and Cortex-M4. Interestingly, these two platforms require different approaches: on the Cortex-M4, we use a 32-bit NTT-based polynomial multiplication, while on Intel we use two 16-bit NTT-based polynomial multiplications and combine the products using the Chinese Remainder Theorem (CRT). For Saber, the performance gain is particularly pronounced: on Cortex-M4, the Saber NTT-based matrix-vector multiplication is 61% faster than the Toom–Cook multiplication, resulting in 22% fewer cycles for Saber encapsulation. For NTRU, the speed-up is less impressive, but NTT-based multiplication still outperforms Toom–Cook for all parameter sets on Cortex-M4. The NTT-based polynomial multiplication for NTRU-HRSS is 10% faster than Toom–Cook, which results in a 6% cost reduction for encapsulation. On AVX2, we obtain speed-ups for three out of four NTRU parameter sets. As a further illustration, we also include code for AVX2 and Cortex-M4 for LAC, the Chinese Association for Cryptologic Research competition award winner (and a NIST round 2 candidate), which outperforms existing code.
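To make the two-transform AVX2 strategy concrete, here is a small sketch (ours, with hypothetical moduli) of multiplying in Z_q[x]/(x^n + 1) modulo two NTT-friendly 16-bit primes and recombining the products via the CRT; a schoolbook negacyclic product stands in for the actual NTTs to keep it short:

```python
def negacyclic_mul(a, b, q, n):
    """Schoolbook product in Z_q[x]/(x^n + 1); stand-in for an NTT multiply."""
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                c[i + j] = (c[i + j] + ai * bj) % q
            else:
                c[i + j - n] = (c[i + j - n] - ai * bj) % q  # x^n = -1
    return c

def crt_combine(c1, c2, q1, q2):
    """Lift per-coefficient residues mod q1 and q2 to residues mod q1*q2."""
    inv = pow(q1, -1, q2)
    return [(x1 + ((x2 - x1) * inv % q2) * q1) % (q1 * q2)
            for x1, x2 in zip(c1, c2)]

q1, q2 = 7681, 12289            # hypothetical NTT-friendly 16-bit primes
a, b = [3, 1, 4, 1], [2, 7, 1, 8]
c = crt_combine(negacyclic_mul(a, b, q1, 4), negacyclic_mul(a, b, q2, 4), q1, q2)
```

The combined result is exact whenever the true product's coefficients fit within q1·q2, which is what makes two cheap 16-bit transforms equivalent to one wide one.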

