Optimization Techniques for Verification of Out-of-Order Execution Machines

2010 ◽  
Vol 2010 ◽  
pp. 1-7
Author(s):  
Sudarshan K. Srinivasan

We develop two optimization techniques, flush-machine and collapsed flushing, to improve the efficiency of automatic refinement-based verification of out-of-order (ooo) processor models. Refinement is a notion of equivalence that can be used to check that an ooo processor correctly implements all behaviors of its instruction set architecture (ISA), including deadlock detection. The optimization techniques work by reducing the computational complexity of the refinement map, a function central to refinement proofs that maps ooo processor model states to ISA states. This has a direct impact on the efficiency of verification, which is studied using 23 ooo processor models. Flush-machine is a novel optimization technique. Collapsed flushing has been employed previously in the context of in-order processors. We show how to apply collapsed flushing to ooo processor models. Using both optimizations together, we can handle 9 ooo models that could not be verified using standard flushing. The optimizations also provided a speedup of 23.29 over standard flushing.

Author(s):  
Dae-Hwan Kim

Thumb-2 is the most recent instruction set architecture for ARM processors, which are among the most widely used embedded processors. In this paper, two extensions are proposed to improve the performance of the Thumb-2 instruction set architecture: addressing mode extensions, and sign/zero extensions combined with data processing instructions. To speed up access to an element of aggregate data, the proposed approach first introduces three new addressing modes for load and store instructions: register-plus-immediate offset addressing mode, negative register offset addressing mode, and post-increment register offset addressing mode. Register-plus-immediate offset addressing mode permits two offsets, and negative register offset addressing mode allows the offset to be the negated content of a register. Post-increment register offset addressing mode automatically modifies the offset address after the memory operation. The second extension is sign/zero extension combined with a data processing instruction, which allows the result of a data processing operation to be sign- or zero-extended to accelerate type conversion. Several of the least frequently used instructions are reduced to provide encoding space for the new extensions. Experiments show that the proposed approach improves performance by an average of 8.6% compared to the Thumb-2 instruction set architecture.
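The sign/zero extension idea can be illustrated with a small sketch. The code below models registers as 32-bit unsigned integers and shows what it means to extend the result of an addition in one step. The function names, the 32-bit width, and the choice of addition as the data processing operation are assumptions made for illustration; they are not the encodings or instruction names proposed in the paper.

```python
def zero_extend(value: int, from_bits: int) -> int:
    """Keep the low `from_bits` bits; all upper bits become zero."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value: int, from_bits: int, to_bits: int = 32) -> int:
    """Copy bit `from_bits - 1` into the upper bits of a `to_bits`-wide word."""
    low = value & ((1 << from_bits) - 1)
    sign_bit = 1 << (from_bits - 1)
    signed = (low ^ sign_bit) - sign_bit      # two's-complement interpretation
    return signed & ((1 << to_bits) - 1)      # back to an unsigned register image


def add_then_extend(a: int, b: int, from_bits: int, signed: bool) -> int:
    """Hypothetical fused operation: 32-bit add, then extend the result."""
    total = (a + b) & 0xFFFFFFFF
    if signed:
        return sign_extend(total, from_bits)
    return zero_extend(total, from_bits)


if __name__ == "__main__":
    # 0x7F + 1 = 0x80, which is negative when read as a signed 8-bit value.
    print(hex(add_then_extend(0x7F, 1, 8, signed=True)))
    print(hex(add_then_extend(0x7F, 1, 8, signed=False)))
```

Without a combined instruction, the add and the extension would be two separate steps; the sketch only shows the arithmetic that such a fused operation would have to perform.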


2021 ◽  
Vol 13 (3) ◽  
pp. 1274
Author(s):  
Loau Al-Bahrani ◽  
Mehdi Seyedmahmoudian ◽  
Ben Horan ◽  
Alex Stojcevski

Few non-traditional optimization techniques are applied to the dynamic economic dispatch (DED) of large-scale thermal power units (TPUs), e.g., 1000 TPUs, that consider the effects of valve-point loading with ramp-rate limitations. This is a complicated multiple mode problem. In this investigation, a novel optimization technique, namely, a multi-gradient particle swarm optimization (MG-PSO) algorithm with two stages for exploring and exploiting the search space, is employed as an optimization tool. The M particles (explorers) in the first stage are used to explore new neighborhoods, whereas the M particles (exploiters) in the second stage are used to exploit the best neighborhood. The negative gradient variation of the M particles in both stages balances the global and local search capabilities. The algorithm is validated on five medium-scale to very large-scale power systems. The MG-PSO algorithm effectively reduces the difficulty of handling the large-scale DED problem, and simulation results confirm this algorithm’s suitability for such a complicated multi-objective problem at varying fitness performance measures and consistency. This algorithm is also applied to estimate the required generation in 24 h to meet load demand changes. This investigation provides useful technical references for economic dispatch operators to update their power system programs in order to achieve economic benefits.
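For orientation, the following is a minimal sketch of a plain global-best particle swarm optimizer, the baseline that variants such as MG-PSO build on. It is not the two-stage MG-PSO algorithm from the abstract: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the sphere test function, and the bounds are all assumed values chosen only to make the example runnable.

```python
import random


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum at 0."""
    return sum(v * v for v in x)


def pso(objective, dim, bounds, n_particles=20, iterations=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                     # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]    # global best so far

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    best, value = pso(sphere, dim=2, bounds=(-5.0, 5.0))
    print(best, value)
```

A real DED formulation would replace `sphere` with a fuel-cost function that includes valve-point terms and would enforce ramp-rate and power-balance constraints, none of which are modeled here.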


2021 ◽  
Vol 7 (4) ◽  
pp. 64
Author(s):  
Tanguy Ophoff ◽  
Cédric Gullentops ◽  
Kristof Van Beeck ◽  
Toon Goedemé

Object detection models are usually trained and evaluated on highly complicated, challenging academic datasets, which results in deep networks requiring lots of computations. However, a lot of operational use-cases consist of more constrained situations: they have a limited number of classes to be detected, less intra-class variance, less lighting and background variance, constrained or even fixed camera viewpoints, etc. In these cases, we hypothesize that smaller networks could be used without deteriorating the accuracy. However, there are multiple reasons why this does not happen in practice. Firstly, overparameterized networks tend to learn better, and secondly, transfer learning is usually used to reduce the necessary amount of training data. In this paper, we investigate how much we can reduce the computational complexity of a standard object detection network in such constrained object detection problems. As a case study, we focus on a well-known single-shot object detector, YoloV2, and combine three different techniques to reduce the computational complexity of the model without reducing its accuracy on our target dataset. To investigate the influence of the problem complexity, we compare two datasets: a prototypical academic (Pascal VOC) and a real-life operational (LWIR person detection) dataset. The three optimization steps we exploited are: swapping all the convolutions for depth-wise separable convolutions, performing pruning, and using weight quantization. The results of our case study indeed substantiate our hypothesis that the more constrained a problem is, the more the network can be optimized. On the constrained operational dataset, combining these optimization techniques allowed us to reduce the computational complexity by a factor of 349, as compared to only a factor of 9.8 on the academic dataset. When running a benchmark on an Nvidia Jetson AGX Xavier, our fastest model runs more than 15 times faster than the original YoloV2 model, whilst increasing the accuracy by 5% Average Precision (AP).
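To give a feel for why the first of the three steps saves computation, the sketch below compares the weight count of a standard convolution with that of a depth-wise separable one (a per-channel k x k filter followed by a 1 x 1 pointwise convolution). The layer shape used in the example (3 x 3 kernel, 256 input channels, 512 output channels) is an assumed illustration, not a layer taken from the paper, and biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depth-wise separable convolution (biases ignored)."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 256, 512  # assumed example layer
    std = standard_conv_params(k, c_in, c_out)
    sep = separable_conv_params(k, c_in, c_out)
    print(std, sep, round(std / sep, 2))
```

For this assumed layer the counts are 1,179,648 versus 133,376 weights, a reduction of about 8.8 times. The same ratio applies to multiply-accumulate operations per output position, which is why the swap reduces computational complexity before pruning and quantization are applied.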


2021 ◽  
Vol 13 (12) ◽  
pp. 6644
Author(s):  
Ali Selim ◽  
Salah Kamel ◽  
Amal A. Mohamed ◽  
Ehab E. Elattar

In recent years, the integration of distributed generators (DGs) in radial distribution systems (RDS) has received considerable attention in power system research. The major purpose of DG integration is to decrease the power losses and improve the voltage profiles that directly lead to improving the overall efficiency of the power system. Therefore, this paper proposes a hybrid optimization technique based on analytical and metaheuristic algorithms for optimal DG allocation in RDS. In the proposed technique, the loss sensitivity factor (LSF) is utilized to reduce the search space of the DG locations, while the analytical technique is used to calculate initial DG sizes based on a mathematical formulation. Then, a metaheuristic sine cosine algorithm (SCA) is applied to identify the optimal DG allocation based on the LSF and analytical techniques instead of using random initialization. To prove the superiority and high performance of the proposed hybrid technique, two standard RDSs, IEEE 33-bus and 69-bus, are considered. Additionally, a comparison between the proposed techniques, standard SCA, and other existing optimization techniques is carried out. The main findings confirmed the enhancement in the convergence of the proposed technique compared with the standard SCA and the ability to allocate multiple DGs in RDS.
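The metaheuristic stage can be sketched with the commonly published sine cosine algorithm update rule, in which each candidate moves toward the best solution found so far along a sine or cosine term whose amplitude shrinks over time. This is a generic sketch, not the paper's hybrid method: the objective below is a toy function rather than a power-loss calculation, and the caller-supplied `population` merely stands in for the LSF-based and analytical initialization that the abstract describes. The constant `a = 2.0` and the sampling ranges follow the usual SCA description and are assumptions here.

```python
import math
import random


def sca(objective, population, bounds, iterations=200, a=2.0, seed=0):
    """Sine cosine algorithm starting from a given (non-random) population."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [p[:] for p in population]
    best = min(pop, key=objective)[:]          # destination point
    best_val = objective(best)

    for t in range(iterations):
        r1 = a - t * (a / iterations)          # amplitude decays linearly to 0
        for x in pop:
            for d in range(len(x)):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                # r4 selects the sine or the cosine branch of the update.
                trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                x[d] += r1 * trig * abs(r3 * best[d] - x[d])
                x[d] = min(hi, max(lo, x[d]))  # keep inside the bounds
            val = objective(x)
            if val < best_val:
                best, best_val = x[:], val
    return best, best_val


if __name__ == "__main__":
    # Hypothetical starting points standing in for analytically computed DG sizes.
    start = [[3.0, -4.0], [1.0, 2.0], [-2.5, 0.5]]
    print(sca(lambda v: sum(c * c for c in v), start, bounds=(-5.0, 5.0)))
```

Seeding the search with informed starting points, as the paper does with LSF and the analytical technique, only changes the `population` argument in this sketch; the update rule itself is unchanged.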


2016 ◽  
Vol 2016 ◽  
pp. 1-9
Author(s):  
Fayiz Abu Khadra ◽  
Jaber Abu Qudeiri ◽  
Mohammed Alkahtani

A control methodology based on a nonlinear control algorithm and an optimization technique is presented in this paper. A controller called “the robust integral of the sign of the error” (in short, RISE) is applied to control chaotic systems. The optimum RISE controller parameters are obtained via genetic algorithm optimization techniques. The RISE control methodology is implemented on two chaotic systems, namely, the Duffing-Holmes and Van der Pol systems. Numerical simulations showed the good performance of the optimized RISE controller in the tracking task and its ability to ensure robustness with respect to bounded external disturbances.


2022 ◽  
Vol 2022 ◽  
pp. 1-18
Author(s):  
Dereje Tekilu Aseffa ◽  
Harish Kalla ◽  
Satyasis Mishra

Money transactions can be performed by automated self-service machines like ATMs for money deposits and withdrawals, banknote counters and coin counters, automatic vending machines, and automatic smart card charging machines. These devices provide four important functions: banknote recognition, counterfeit banknote detection, serial number recognition, and fitness classification. Therefore, we need a robust system that can recognize banknotes and classify them into denominations that can be used in these automated machines. However, the most widely available banknote detectors are hardware systems that use optical and magnetic sensors to detect and validate banknotes. These banknote detectors are usually designed for the banknotes of a specific country. Reprogramming such a system to detect other banknotes is very difficult. In addition, researchers have developed banknote recognition systems using deep learning technology such as CNN and R-CNN. However, in these systems, the dataset used for training is relatively small, and the banknote recognition accuracy is lower. The existing systems also do not include implementation and development on embedded systems. In this research work, we collected various Ethiopian banknotes of different ages and conditions and applied various optimization techniques to CNN architectures to identify fake notes. Experimental analysis has been carried out with different CNN models such as InceptionV3, MobileNetV2, XceptionNet, and ResNet50. MobileNetV2 with the RMSProp optimization technique and a batch size of 32 is found to be a robust and reliable Ethiopian banknote detector and achieved a superior accuracy of 96.4% in comparison to the other CNN models. The selected model, MobileNetV2 with RMSProp optimization, has been implemented on an embedded platform using a Raspberry Pi 3 B+ and other peripherals. Further, real-time identification of fake notes in a Web-based user interface (UI) has also been proposed in the research.
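Since RMSProp is the optimizer singled out above, a minimal sketch of its update rule on a single scalar weight may help. The rule keeps a running average of squared gradients and divides each step by its square root. The learning rate, decay factor `rho`, `eps`, and the toy loss `w**2` are assumed values for illustration; the hyperparameters used in the paper, other than the batch size of 32, are not given in the abstract.

```python
import math


def rmsprop_step(w, grad, avg_sq, lr=0.01, rho=0.9, eps=1e-8):
    """One RMSProp update; returns the new weight and the new running average."""
    avg_sq = rho * avg_sq + (1.0 - rho) * grad * grad   # running mean of grad^2
    w = w - lr * grad / (math.sqrt(avg_sq) + eps)       # scaled gradient step
    return w, avg_sq


def minimize_quadratic(w0=1.0, steps=50, lr=0.01):
    """Toy usage: minimize f(w) = w**2, whose gradient is 2*w."""
    w, avg_sq = w0, 0.0
    for _ in range(steps):
        w, avg_sq = rmsprop_step(w, 2.0 * w, avg_sq, lr=lr)
    return w


if __name__ == "__main__":
    print(minimize_quadratic())
```

In a deep learning framework the same rule is applied element-wise to every weight tensor, with gradients averaged over each mini-batch.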


2021 ◽  
Author(s):  
Chinmay Shah ◽  
Richard Wies

The conventional power distribution network is being transformed drastically due to the high penetration of renewable energy sources (RES) and energy storage. Optimal scheduling and dispatch are important to better harness the energy from intermittent RES. Traditional centralized optimization techniques limit the size of the problem, and hence distributed techniques are adopted. The distributed optimization technique partitions the power distribution network into sub-networks, each of which solves its local subproblem and exchanges information with the neighboring sub-networks for the global update. This paper presents an adaptive spectral graph partitioning algorithm based on vertex migration that keeps the computational load balanced for synchronization while maintaining active power balance and sub-network resiliency. The parameters that define the resiliency metrics of power distribution networks are discussed and leveraged for better operation of sub-networks in grid-connected mode as well as islanded mode. The adaptive partition of the IEEE 123-bus network into resilient sub-networks is demonstrated in this paper.

