Q-learning
Recently Published Documents

Automatica ◽  
2022 ◽  
Vol 136 ◽  
pp. 110076
Jinna Li ◽  
Zhenfei Xiao ◽  
Jialu Fan ◽  
Tianyou Chai ◽  
Frank L. Lewis

Prince Nathan S

Abstract: The Travelling Salesman Problem (TSP) is a very popular problem in the world of computer programming. It deals with the optimization of algorithms in an ever-changing scenario, becoming more and more complex as the number of variables increases. The solutions that exist for this problem are optimal only for a small, fixed number of cases. They cannot take into consideration the various factors involved when the problem is solved for the real world, where things change continuously. There is a need to adapt to these changes and find optimized solutions as the application runs. The ability to adapt to any kind of data, whether static or ever-changing, and to understand and solve it is a quality shown by machine learning algorithms. As advances in machine learning take place, there has been a good amount of research on how to solve NP-hard problems using machine learning. This report is a survey of the types of machine learning algorithms that can be used to solve the TSP. Different approaches, such as Ant Colony Optimization and Q-learning, are explored and compared. Ant Colony Optimization uses the concept of ants following pheromone levels, which lets them know where the most food is; it is widely applied to the TSP, where the path with the highest pheromone level is chosen. Q-learning uses the concept of rewarding an agent for taking the right action in the state it is in and compounding those rewards. It is very much based on exploitation, where the agent keeps learning on its own to maximize its own reward. Applied to the TSP, an agent is rewarded for finding a short path and rewarded most when the chosen path is the shortest. Keywords: LINEAR REGRESSION, LASSO REGRESSION, RIDGE REGRESSION, DECISION TREE REGRESSOR, MACHINE LEARNING, HYPERPARAMETER TUNING, DATA ANALYSIS
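The reward scheme the abstract describes — penalizing long hops so the agent compounds rewards toward the shortest tour — can be sketched with tabular Q-learning on a toy instance. Everything here (the 4-city distance matrix, the hyperparameters, the state encoding as current city plus visited set) is an illustrative assumption, not the survey's implementation:

```python
import random

# Hypothetical symmetric 4-city distance matrix (made up for illustration).
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
N = len(DIST)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2

Q = {}  # Q[(current_city, frozenset_of_visited)] -> list of action values


def q_values(state):
    return Q.setdefault(state, [0.0] * N)


def choose(state, unvisited):
    # Epsilon-greedy over the cities not yet visited.
    if random.random() < EPS:
        return random.choice(sorted(unvisited))
    qs = q_values(state)
    return max(unvisited, key=lambda c: qs[c])


def run_episode():
    city, visited = 0, frozenset([0])
    tour_len = 0.0
    while len(visited) < N:
        state = (city, visited)
        nxt = choose(state, set(range(N)) - visited)
        reward = -DIST[city][nxt]          # shorter hops earn more reward
        visited2 = visited | {nxt}
        remaining = set(range(N)) - visited2
        future = (max(q_values((nxt, visited2))[c] for c in remaining)
                  if remaining else 0.0)
        qs = q_values(state)
        qs[nxt] += ALPHA * (reward + GAMMA * future - qs[nxt])
        tour_len += DIST[city][nxt]
        city, visited = nxt, visited2
    return tour_len + DIST[city][0]        # close the tour back to city 0


random.seed(0)
lengths = [run_episode() for _ in range(2000)]
best = min(lengths)
print(best)
```

The state must include the set of visited cities, not just the current city, or the value function cannot distinguish partial tours; this is also why tabular Q-learning only scales to tiny TSP instances.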

Energies ◽  
2022 ◽  
Vol 15 (2) ◽  
pp. 614
Zhenhuan Ding ◽  
Xiaoge Huang ◽  
Zhao Liu

Voltage regulation in distribution networks faces the challenge of handling uncertainties caused by the high penetration of photovoltaics (PV). This research proposes an active exploration (AE) method based on reinforcement learning (RL) to respond to these uncertainties by regulating the voltage of a distribution network with battery energy storage systems (BESS). The proposed method integrates engineering knowledge, in the form of chance-constrained optimization, to accelerate the training process of RL. We formulate the problem as a chance-constrained optimization with a linear load-flow approximation. The optimization results are used to guide action selection during exploration, improving training efficiency and reducing conservativeness. The comparison of methods focuses on how BESSs are used, training efficiency, and robustness under varying uncertainties and BESS sizes. We implement the proposed algorithm, chance-constrained optimization, and traditional Q-learning on the IEEE 13 Node Test Feeder. Our evaluation shows that the proposed AE method achieves better training efficiency than traditional Q-learning, while having advantages in BESS usage and conservativeness compared to chance-constrained optimization.
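The core idea — using an optimizer's solution to guide which actions the RL agent explores, instead of exploring uniformly at random — can be sketched generically. The action set, the stand-in `optimizer_hint` (a placeholder for the chance-constrained load-flow solution), and all numbers below are illustrative assumptions, not the paper's formulation:

```python
import random

# Hypothetical discrete BESS charge/discharge levels.
ACTIONS = [-2, -1, 0, 1, 2]


def optimizer_hint(state):
    # Stand-in for the chance-constrained optimization result; here it
    # simply pushes the (scalar) voltage deviation back toward zero.
    return max(ACTIONS, key=lambda a: -abs(state + a))


def select_action(Q, state, eps=0.3, guided=0.8):
    """Guided epsilon-greedy: exploration is biased toward the hint."""
    if random.random() < eps:            # explore...
        if random.random() < guided:     # ...mostly near the optimizer hint
            return optimizer_hint(state)
        return random.choice(ACTIONS)    # ...occasionally uniformly
    # Exploit: greedy over learned action values.
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
```

The `guided` probability trades off how strongly exploration trusts the optimizer: high values inherit its (possibly conservative) behavior quickly, low values keep more uniform exploration.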

2022 ◽  
Vol 11 (1) ◽  
pp. 66
Shenghua Xu ◽  
Yang Gu ◽  
Xiaoyan Li ◽  
Cai Chen ◽  
Yingyi Hu ◽  

The internal structure of buildings is becoming increasingly complex. Providing a scientific and reasonable evacuation route for trapped persons in a complex indoor environment is important for reducing casualties and property losses. In emergency and disaster-relief environments, indoor path planning involves great uncertainty and higher safety requirements. Q-learning is a value-based reinforcement learning algorithm that can complete path-planning tasks through autonomous learning, without establishing mathematical models or environment maps. We therefore propose an indoor emergency path planning method based on an optimized Q-learning algorithm. First, a grid environment model is established. The Q-learning algorithm is optimized by applying a discount rate to the exploration factor: the exploration factor in the ε-greedy strategy is dynamically adjusted before random actions are selected, accelerating convergence in a large-scale grid environment. An indoor emergency path planning experiment based on the optimized Q-learning algorithm was carried out using both simulated data and real indoor environment data. The proposed optimized Q-learning algorithm essentially converges after 500 iterative learning rounds, nearly 2000 rounds earlier than the classic Q-learning algorithm, while the SARSA algorithm shows no obvious convergence trend within 5000 iterations. The results show that the proposed algorithm is superior to the SARSA algorithm and the classic Q-learning algorithm in terms of solving time and convergence speed when planning the shortest path in a grid environment; its convergence is approximately five times faster than that of the classic Q-learning algorithm. In the grid environment, the proposed algorithm successfully plans the shortest path around obstacle areas in a short time.
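The decaying-exploration idea behind this kind of grid path planner can be illustrated on a toy grid. The 4×4 layout, reward values, and decay schedule below are assumptions for the sketch, not the paper's environment model:

```python
import random

# Toy grid: '.' free cell, '#' obstacle, 'G' goal (evacuation exit).
GRID = ["....",
        ".#..",
        ".#..",
        "...G"]
ROWS, COLS = len(GRID), len(GRID[0])
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(r, c, a):
    dr, dc = MOVES[a]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < ROWS and 0 <= nc < COLS) or GRID[nr][nc] == "#":
        return (r, c), -1.0                  # blocked: stay put, penalty
    if GRID[nr][nc] == "G":
        return (nr, nc), 10.0                # goal reached
    return (nr, nc), -0.1                    # ordinary step cost


def train(episodes=500, alpha=0.5, gamma=0.9, eps=1.0, decay=0.99):
    Q = {}
    for _ in range(episodes):
        r, c = 0, 0
        while GRID[r][c] != "G":
            if random.random() < eps:
                a = random.randrange(4)
            else:
                a = max(range(4), key=lambda b: Q.get(((r, c), b), 0.0))
            (nr, nc), rew = step(r, c, a)
            best_next = max(Q.get(((nr, nc), b), 0.0) for b in range(4))
            q = Q.get(((r, c), a), 0.0)
            Q[((r, c), a)] = q + alpha * (rew + gamma * best_next - q)
            r, c = nr, nc
        eps *= decay                         # shrink exploration each episode
    return Q


random.seed(1)
Q = train()

# Greedy rollout from the start cell to read off the learned path.
r, c, steps = 0, 0, 0
while GRID[r][c] != "G" and steps < 20:
    a = max(range(4), key=lambda b: Q.get(((r, c), b), 0.0))
    (r, c), _ = step(r, c, a)
    steps += 1
print(steps)
```

Starting with ε = 1 and multiplying by a decay factor each episode gives broad early exploration and near-greedy behavior late in training, which is the mechanism the abstract credits for faster convergence on large grids.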

2022 ◽  
Vol 14 (2) ◽  
pp. 932
Filip Vrbanić ◽  
Mladen Miletić ◽  
Leo Tišljarić ◽  
Edouard Ivanjko

Modern urban mobility needs new solutions to resolve high-complexity demands on urban traffic-control systems, including reducing congestion, fuel and energy consumption, and exhaust gas emissions. One example is urban motorways, key segments of the urban traffic network that do not achieve a satisfactory level of service for the increasing traffic demand. Another complex need arises from the introduction of connected and autonomous vehicles (CAVs) and the additional challenges that modern control systems must cope with. This study addresses the problem of decreasing the negative environmental aspects of traffic, which includes reducing congestion, fuel and energy consumption, and exhaust gas emissions. We applied a variable speed limit (VSL) based on Q-learning that utilizes electric CAVs as speed-limit actuators in the control loop. The Q-learning algorithm was combined with a two-step temporal-difference target to increase its effectiveness in learning the VSL control policy for mixed traffic flows. We analyzed two optimization criteria: the total time spent by all vehicles in the traffic network and the total energy consumption. Various mixed traffic flow scenarios with varying CAV penetration rates were addressed, and the obtained results were compared with a baseline no-control scenario and a rule-based VSL. The data on vehicle-emission class and the share of gasoline and diesel human-driven vehicles were taken from actual data from the Croatian Bureau of Statistics. The results show that Q-learning-based VSL can learn the control policy, improve the macroscopic traffic parameters and total energy consumption, and reduce exhaust gas emissions for different electric CAV penetration rates. The effects are most apparent in cases with low CAV penetration rates. Additionally, the results indicate that for the analyzed traffic demand, increasing the CAV penetration rate alleviates the need to impose VSL control on an urban motorway.
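The two-step temporal-difference target mentioned above replaces the usual one-step bootstrap with one that looks two rewards ahead. A generic tabular sketch (the state names, action count, and hyperparameters are illustrative, not the study's traffic model):

```python
# One-step target:  r_t + gamma * max_a Q(s_{t+1}, a)
# Two-step target:  r_t + gamma * r_{t+1} + gamma^2 * max_a Q(s_{t+2}, a)


def two_step_update(Q, traj, alpha=0.1, gamma=0.95, n_actions=3):
    """Apply one two-step Q-learning update.

    traj = [(s_t, a_t, r_t), (s_{t+1}, a_{t+1}, r_{t+1}), s_{t+2}]
    Only the first (state, action) pair is updated; the intermediate
    reward is folded into the target instead of being bootstrapped.
    """
    (s0, a0, r0), (s1, a1, r1), s2 = traj
    target = r0 + gamma * r1 + gamma ** 2 * max(
        Q.get((s2, a), 0.0) for a in range(n_actions))
    q = Q.get((s0, a0), 0.0)
    Q[(s0, a0)] = q + alpha * (target - q)
    return Q
```

Using real rewards for two steps before bootstrapping propagates reward information backward faster than the one-step update, at the cost of slightly more variance, which is the usual motivation for multi-step targets.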

Jili Tao ◽  
Ridong Zhang ◽  
Zhijun Qiao ◽  
Longhua Ma

A novel fuzzy energy management strategy (EMS) based on an improved Q-learning controller and a genetic algorithm (GA) is proposed for the real-time power split between the fuel cell and supercapacitor of a hybrid electric vehicle (HEV). Unlike driving-pattern-recognition-based methods, the Q-learning controller takes actions by observing the driving states and compensates the fuzzy controller; that is, there is no need to know the driving pattern in advance. To prolong the fuel cell lifetime and decrease its energy consumption, the initial values of the Q-table are optimized by the GA. Moreover, to enhance the environmental adaptation capability, the learning strategy of the Q-learning controller is improved. Two adaptive energy management strategies were compared, and simulation results show that current fluctuation can be reduced by 6.9% and 41.5%, and H2 consumption can be saved by 0.35% and 6.08%, respectively. Meanwhile, the state of charge (SOC) of the supercapacitor is sustained within the desired safe range.
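Optimizing a Q-table's initial values with a GA can be sketched generically: candidate tables are scored by a fitness function and evolved by selection, crossover, and mutation. The fitness here is a stand-in (distance to a hypothetical target table); in the paper's setting it would instead come from simulating a drive cycle and measuring fuel cell degradation and consumption. All names and numbers are illustrative:

```python
import random

# Hypothetical "good" initial Q-values the GA should discover.
TARGET = [0.2, 0.8, 0.5, 0.1]


def fitness(table):
    # Stand-in objective: negative squared distance to the target table.
    return -sum((t - x) ** 2 for t, x in zip(TARGET, table))


def mutate(table, rate=0.3, sigma=0.1):
    return [x + random.gauss(0, sigma) if random.random() < rate else x
            for x in table]


def crossover(a, b):
    cut = random.randrange(1, len(a))      # single-point crossover
    return a[:cut] + b[cut:]


def evolve(pop_size=20, generations=50):
    pop = [[random.random() for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]   # elitist selection: keep top half
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=fitness)


random.seed(2)
best = evolve()
```

Seeding the Q-table this way gives the controller a sensible policy from the first drive instead of starting from all zeros, which is the lifetime and consumption benefit the abstract claims.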

2022 ◽  
Vol 2022 ◽  
pp. 1-8
Le Chen ◽  
JeongYoung Song

This research aims to improve the rationality and intelligence of the Automatically Higher Mathematically Exam System (AHMES) through AI algorithms. AHMES is an intelligent, high-quality higher-mathematics examination solution for the Department of Computer Engineering at Pai Chai University. This research redesigned the difficulty system of AHMES and applied AI algorithms to its initialization and continuous adjustment. This paper describes the multiple linear regression algorithm involved in this research and the AHMES learning (AL) algorithm, an improvement based on the Q-learning algorithm. Simulation test results of the upgraded AHMES show the effectiveness of these algorithms.
