Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization

2021 ◽  
Author(s):  
Qi Zhang ◽  
Jiaqiao Hu

Many systems arising in applications from engineering design, manufacturing, and healthcare require the use of simulation optimization (SO) techniques to improve their performance. In “Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization,” Q. Zhang and J. Hu propose a randomized approach that integrates ideas from actor-critic reinforcement learning within a class of adaptive search algorithms for solving SO problems. The approach fully retains the previous simulation data and incorporates them into an approximation architecture to exploit knowledge of the objective function in searching for improved solutions. The authors provide a finite-time analysis for the method when only a single simulation observation is collected at each iteration. The method works well on a diverse set of benchmark problems and has the potential to yield good performance for complex problems using expensive simulation experiments for performance evaluation.
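The general recipe described here — retain every simulation observation, fit an approximation architecture (a "critic") to the accumulated data, and use it to steer a sampling distribution (an "actor") toward better solutions — can be illustrated with a minimal sketch. This is not the authors' algorithm: the quadratic surrogate, the toy objective, and the update rules below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_objective(x):
    # toy 1-D objective with simulation noise (illustrative stand-in)
    return -(x - 1.5) ** 2 + rng.normal(scale=0.1)

# Retain every simulation observation and fit a simple quadratic
# surrogate ("critic") to guide where the sampler ("actor") searches.
xs, ys = [], []
mean, std = 0.0, 2.0              # sampling-distribution parameters
for it in range(200):
    x = rng.normal(mean, std)     # actor: propose a candidate solution
    y = noisy_objective(x)        # one simulation observation per iteration
    xs.append(x); ys.append(y)
    # critic: least-squares quadratic fit over ALL retained data
    coeffs = np.polyfit(xs, ys, deg=2) if len(xs) >= 3 else None
    if coeffs is not None and coeffs[0] < 0:
        # move the sampler toward the surrogate's maximizer
        x_star = -coeffs[1] / (2 * coeffs[0])
        mean += 0.2 * (x_star - mean)
        std = max(0.95 * std, 0.05)   # slowly concentrate the search

print(round(mean, 2))  # the sampler's mean should approach the optimum near 1.5
```

The key feature mirrored from the abstract is that the surrogate is refit on the full history of simulation data at every iteration, with a single new observation collected each time.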

2018 ◽  
Vol 66 (6) ◽  
pp. 1713-1727 ◽  
Author(s):  
Seksan Kiatsupaibul ◽  
Robert L. Smith ◽  
Zelda B. Zabinsky

2006 ◽  
Vol 16 (07) ◽  
pp. 2081-2091 ◽  
Author(s):  
GEORGE D. MAGOULAS ◽  
ARISTOKLIS ANASTASIADIS

This paper explores the use of the nonextensive q-distribution in the context of adaptive stochastic searching. The proposed approach consists of generating the "probability" of moving from one point of the search space to another through a probability distribution characterized by the q entropic index of the nonextensive entropy. The potential benefits of this technique are investigated by incorporating it in two different adaptive search algorithmic models to create new modifications of the diffusion method and the particle swarm optimizer. The performance of the modified search algorithms is evaluated in a number of nonlinear optimization and neural network training benchmark problems.
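The core device here is the Tsallis q-exponential, which generalizes the Boltzmann acceptance rule of classical stochastic search and recovers it as q → 1. A minimal sketch of such a q-parameterized acceptance probability (illustrative only; the paper's specific distributions and parameterization may differ):

```python
import math

def q_exp(x, q):
    """Tsallis q-exponential: reduces to exp(x) as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    return base ** (1.0 / (1.0 - q)) if base > 0 else 0.0

def accept_prob(delta, T, q):
    # probability of moving to a point whose cost is `delta` higher,
    # at "temperature" T; delta <= 0 (an improvement) is always accepted
    return min(1.0, q_exp(-delta / T, q))

# q > 1 gives a heavier tail than the classical Metropolis rule, so
# uphill moves stay more likely and the search escapes local minima.
print(accept_prob(1.0, 1.0, 1.0))   # classical: exp(-1) ~ 0.368
print(accept_prob(1.0, 1.0, 1.5))   # q = 1.5: heavier tail, ~ 0.444
```

Plugging such a rule into a diffusion-type search or into a particle swarm's stochastic components is the kind of modification the paper evaluates.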


Author(s):  
Carlos D. Paternina-Arboleda ◽  
Jairo R. Montoya-Torres ◽  
Aldo Fabregas-Ariza

2018 ◽  
Vol 176 ◽  
pp. 01034
Author(s):  
Chengxin Li ◽  
Jing Peng ◽  
Lv Zhicheng ◽  
Mengli Wang ◽  
Gang Ou

In GPS positioning, the linear least squares algorithm and the Kalman filter are widely used but still have shortcomings. This paper proposes applying the extreme learning machine (ELM) to this problem, breaking through the limitations of traditional positioning methods based on explicit mathematical models. Two simulation experiments on ELM in the GPS positioning process are presented, the second supplementing the first. Each consists of three carefully designed phases: simulation data generation, network training, and network prediction. The feasibility of the extreme learning machine is verified through experimental simulation, and a more accurate positioning result is obtained.
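The ELM training procedure itself is simple: hidden-layer weights are drawn at random and frozen, and only the output weights are fitted, in a single least-squares solve. A minimal sketch on a toy regression problem (a stand-in for the GPS positioning mapping; the data, network size, and activation here are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy regression target standing in for the positioning mapping
X = rng.uniform(-1, 1, size=(500, 2))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1]

# Extreme learning machine: random, fixed hidden layer; only the
# output weights are trained, via one least-squares solve.
n_hidden = 200
W = rng.normal(size=(2, n_hidden))            # input weights (never trained)
b = rng.normal(size=n_hidden)                 # hidden biases (never trained)
H = np.tanh(X @ W + b)                        # hidden-layer activations
beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # output weights

pred = H @ beta
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print("train RMSE:", round(rmse, 4))
```

Because no iterative backpropagation is involved, training reduces to one matrix factorization, which is what makes ELM attractive when many positioning fixes must be produced quickly.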


Author(s):  
M. Jalali Varnamkhasti

Premature convergence is an essential problem in genetic algorithms, and it is strongly related to the loss of genetic diversity in the population. In this study, a new sexual selection mechanism that uses the mate chromosome during selection is proposed, and a technique is then introduced that selects and controls the genetic operators through a fuzzy logic controller. Computational experiments are conducted on the proposed techniques, and the results are compared with other operators as well as heuristic and local search algorithms commonly used for solving benchmark problems published in the literature.
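The underlying idea — monitor population diversity and let a controller adapt the genetic operators to counter premature convergence — can be sketched with a crude rule-based schedule standing in for the paper's fuzzy logic controller and sexual selection mechanism (the objective, operators, and all parameters below are illustrative assumptions):

```python
import random
random.seed(3)

def fitness(bits):   # toy objective: OneMax (maximize the number of 1s)
    return sum(bits)

def diversity(pop):  # fraction of loci on which the population disagrees
    n = len(pop[0])
    return sum(len({ind[i] for ind in pop}) > 1 for i in range(n)) / n

def mutation_rate(div):
    # crude rule-based schedule: low diversity -> high mutation,
    # high diversity -> low mutation (fights premature convergence)
    if div < 0.2:
        return 0.10
    if div < 0.5:
        return 0.05
    return 0.01

N, L = 30, 40
pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(N)]
for gen in range(150):
    pm = mutation_rate(diversity(pop))          # controller picks the rate
    pop.sort(key=fitness, reverse=True)
    elite = pop[: N // 2]                       # elitist survivor selection
    children = []
    while len(elite) + len(children) < N:
        a, b = random.sample(elite, 2)
        cut = random.randrange(1, L)
        child = a[:cut] + b[cut:]               # one-point crossover
        child = [g ^ (random.random() < pm) for g in child]  # bit-flip mutation
        children.append(child)
    pop = elite + children

print(max(fitness(ind) for ind in pop))  # best fitness found (max possible: 40)
```

A fuzzy controller would replace the three hard thresholds with overlapping membership functions and a rule base, but the feedback loop from diversity to operator settings is the same.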


2021 ◽  
Vol 2113 (1) ◽  
pp. 012030
Author(s):  
Jing Li ◽  
Yanyang Liu ◽  
Xianguo Qing ◽  
Kai Xiao ◽  
Ying Zhang ◽  
...  

Abstract The nuclear reactor control system plays a crucial role in the operation of nuclear power plants. The coordinated control of reactor power and steam generator water level has become one of the most important control problems in these systems. In this paper, we propose a mathematical model of the coordinated control system, transform it into a reinforcement learning problem, and develop a deep reinforcement learning control algorithm, the deep deterministic policy gradient (DDPG) algorithm, to solve it. Simulation experiments show that the proposed algorithm achieves remarkable control performance.
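DDPG couples a deterministic actor, updated along the critic's action gradient, with slowly tracking (Polyak-averaged) target copies of both networks. The sketch below strips this to those two core updates on a one-dimensional toy problem with a known critic; the full algorithm also learns the critic from Bellman targets and draws minibatches from a replay buffer (everything here is an illustrative assumption, not the paper's controller):

```python
# Toy critic: Q(s, a) = -(a - 0.7*s)**2, so the best action at state s
# is a = 0.7*s. In DDPG proper, Q would itself be a learned network.
def critic_grad_a(s, a):
    return -2.0 * (a - 0.7 * s)          # dQ/da of the toy critic

theta, theta_target = 0.0, 0.0           # actor weight and its target copy
for _ in range(500):
    s = 1.0                              # fixed state for the sketch
    a = theta * s                        # deterministic policy a = theta * s
    # deterministic policy gradient ascent: dQ/da * da/dtheta
    theta += 0.01 * critic_grad_a(s, a) * s
    # Polyak-averaged ("soft") target update, as in DDPG
    theta_target = 0.99 * theta_target + 0.01 * theta

print(round(theta, 2))  # the actor converges toward the critic's optimum, 0.7
```

The soft target update is what stabilizes learning when, unlike here, the critic is bootstrapped from its own target network.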


2014 ◽  
Vol 41 (10) ◽  
pp. 4939-4949 ◽  
Author(s):  
João Paulo Queiroz dos Santos ◽  
Jorge Dantas de Melo ◽  
Adrião Dória Duarte Neto ◽  
Daniel Aloise

2022 ◽  
Vol 6 (1) ◽  
pp. 1-25
Author(s):  
Fang-Chieh Chou ◽  
Alben Rome Bagabaldo ◽  
Alexandre M. Bayen

This study focuses on the comprehensive investigation of stop-and-go waves appearing in closed-circuit ring road traffic, evaluating various longitudinal dynamical models for vehicles. It is known that the behavior of human-driven vehicles, with other traffic elements such as density held constant, can induce stop-and-go waves that do not dissipate on the circuit ring road. Stop-and-go waves can be dissipated by adding automated vehicles (AVs) to the ring. Thorough investigations of the performance of AV longitudinal control algorithms were carried out in Flow, an integrated platform for reinforcement learning on traffic control. Ten AV algorithms presented in the literature are evaluated. For each AV algorithm, experiments are carried out by varying the distribution and penetration rate of AVs. Two distributions of AVs are studied. In the first scenario, AVs are placed consecutively, with penetration rates varied from 1 AV (5%) to all AVs (100%). In the second scenario, AVs are spaced evenly, with human-driven vehicles between any two AVs, and penetration rates are varied from 2 AVs (10%) to 11 AVs (50%). Each configuration is simulated over multiple runs (10 runs) to average out randomness in the results. From more than 3,000 simulation experiments, we investigated how the AV algorithms perform under varying distributions and penetration rates, with each algorithm's parameters held fixed across all scenarios. Time to stabilize, maximum headway, vehicle miles traveled, and fuel economy are used to evaluate performance. Using these metrics, we find that the traffic condition improvement is not strongly dependent on the distribution for most of the AV controllers, particularly when no cooperation among AVs is considered. Traffic conditions generally improve with a higher AV penetration rate, with only one of the AV algorithms showing a contrary trend.
Among all AV algorithms in this study, the reinforcement learning controller shows the most consistent improvement under all distributions and penetration rates.
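Two of the evaluation metrics named above, time to stabilize and maximum headway, are straightforward to compute from simulated trajectories. The sketch below assumes simple working definitions (a speed-equalization window for stability, and the largest inter-vehicle gap on the ring), which may differ from the study's exact definitions:

```python
def time_to_stabilize(speeds, dt=0.1, tol=0.5, window=50):
    """First time after which all vehicle speeds stay within `tol` m/s
    of each other for `window` consecutive steps (assumed definition).
    `speeds` is a list of per-timestep lists of vehicle speeds."""
    for t in range(len(speeds) - window):
        if all(max(step) - min(step) <= tol for step in speeds[t:t + window]):
            return t * dt
    return None  # never stabilized within the trace

def max_headway(positions, circumference):
    """Largest gap between consecutive vehicles on the ring (assumed
    definition: positions are arc-length coordinates along the loop)."""
    order = sorted(p % circumference for p in positions)
    gaps = [(order[(i + 1) % len(order)] - order[i]) % circumference
            for i in range(len(order))]
    return max(gaps)

# toy trace: two vehicles whose speeds equalize after step 30
speeds = [[5.0, 2.0]] * 30 + [[4.0, 4.1]] * 100
print(time_to_stabilize(speeds))        # stabilizes 3.0 s into the toy trace
print(max_headway([0, 50, 120], 260))   # largest gap wraps from 120 back to 0: 140
```

Vehicle miles traveled and fuel economy would similarly be accumulated per vehicle over the run; averaging each metric over the 10 runs gives the per-configuration scores compared in the study.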

