Q-learning algorithm based multi-agent coordinated control method for microgrids

Author(s):  
Yuanyuan Xi ◽  
Liuchen Chang ◽  
Meiqin Mao ◽  
Peng Jin ◽  
Nikos Hatziargyriou ◽  
...


Games ◽
2021 ◽  
Vol 12 (1) ◽  
pp. 8
Author(s):  
Gustavo Chica-Pedraza ◽  
Eduardo Mojica-Nava ◽  
Ernesto Cadena-Muñoz

Multi-Agent Systems (MAS) have been used to solve several optimization problems in control systems. MAS allow modeling the interactions between agents and the complexity of the system, thus generating functional models that are closer to reality. However, these approaches assume that information between agents is always available, i.e., they employ a full-information model. Approaches that tackle scenarios where information constraints are a relevant issue have been growing in importance. In this sense, game theory appears as a useful technique that uses the concept of strategies to analyze the interactions of the agents and maximize agent outcomes. In this paper, we propose a distributed learning-based control method that allows analyzing the effect of exploration in MAS. The dynamics obtained use Q-learning from reinforcement learning to include the concept of exploration in the classic, exploration-less replicator dynamics equation. The Boltzmann distribution is then used to introduce the Boltzmann-Based Distributed Replicator Dynamics as a tool for controlling agents' behaviors. This distributed approach can be used in several engineering applications where communication constraints between agents must be considered. The behavior of the proposed method is analyzed using a smart grid application for validation purposes. Results show that, despite the lack of full information about the system, by controlling some parameters of the method it behaves similarly to traditional centralized approaches.
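
The mechanism the abstract builds on — Q-learning whose Boltzmann (softmax) action distribution injects a tunable amount of exploration, with the temperature playing the role of the controllable parameter — can be illustrated with a minimal single-state sketch. This is not the authors' Boltzmann-Based Distributed Replicator Dynamics; the three strategies, their payoffs, and the temperature are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def boltzmann_policy(q_values, temperature):
    """Softmax over Q-values; the temperature controls exploration.
    High temperature -> near-uniform (more exploration);
    low temperature -> near-greedy (more exploitation)."""
    prefs = q_values / temperature
    prefs -= prefs.max()              # numerical stability
    probs = np.exp(prefs)
    return probs / probs.sum()

# Toy single-state example: an agent repeatedly picks one of 3 strategies.
q = np.zeros(3)
alpha, temperature = 0.1, 0.5
payoffs = np.array([1.0, 0.5, 0.2])   # hypothetical expected payoffs

for _ in range(1000):
    p = boltzmann_policy(q, temperature)
    a = rng.choice(len(q), p=p)
    r = payoffs[a] + rng.normal(0, 0.1)   # noisy reward
    q[a] += alpha * (r - q[a])            # stateless Q-learning update

print("strategy distribution:", boltzmann_policy(q, temperature))
```

Raising the temperature flattens the resulting strategy distribution, which is exactly the exploration effect the paper studies in the replicator-dynamics setting.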


2012 ◽  
Vol 566 ◽  
pp. 572-579
Author(s):  
Abdolkarim Niazi ◽  
Norizah Redzuan ◽  
Raja Ishak Raja Hamzah ◽  
Sara Esfandiari

In this paper, a new algorithm based on case-based reasoning (CBR) and reinforcement learning (RL) is proposed to increase the convergence rate of RL algorithms. RL algorithms are very useful for solving a wide variety of decision problems when models are not available and correct decisions must be made in every state of the system, such as in multi-agent systems, automatic control systems, robotics, and tool condition monitoring. The proposed method investigates how to improve action selection in the RL algorithm: a new combined model using case-based reasoning and a new optimized selection function is proposed to choose actions, which increases the convergence rate of Q-learning-based algorithms. The algorithm was used to solve cooperative Markov games, one of the models of Markov-based multi-agent systems. Experimental results indicate that the proposed algorithm performs better than existing algorithms in terms of speed and accuracy of reaching the optimal policy.
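
The abstract leaves the optimized selection function unspecified, but the core idea — biasing Q-learning's action selection with retrieved cases of previously successful state-action pairs — can be sketched as follows. The exact-match case retrieval and the retention rule here are assumptions standing in for the paper's CBR similarity machinery.

```python
import random

class CBRQLearner:
    """Q-learning whose action selection is biased by a simple case base:
    a state that previously yielded high reward suggests its action again."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.case_base = {}          # state -> (best_action, best_reward)
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.n_actions = n_actions

    def select_action(self, s):
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        if s in self.case_base:      # retrieve a matching case, if any
            return self.case_base[s][0]
        return max(range(self.n_actions), key=lambda a: self.q[s][a])

    def update(self, s, a, r, s_next):
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
        # retain the case if it beats what the case base remembers
        if s not in self.case_base or r > self.case_base[s][1]:
            self.case_base[s] = (a, r)

# Minimal usage example
learner = CBRQLearner(n_states=4, n_actions=2)
a = learner.select_action(0)
learner.update(0, a, 1.0, 1)
```

Reusing retrieved cases early in training is what cuts down the random exploration phase and speeds up convergence, the effect the paper reports.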


2014 ◽  
Vol 494-495 ◽  
pp. 1377-1380
Author(s):  
Yu Lian Jiang ◽  
Jian Chang Liu ◽  
Shu Bin Tan

Because the combined automatic flatness control (AFC) and automatic gauge control (AGC) process is a nonlinear system with multiple dimensions and variables, strong coupling, and time variation, a novel control method called self-tuning PID with a diagonal recurrent neural network (DRNN-PID) based on Q-learning is proposed. It coordinates the coupled flatness control and gauge control agents to satisfy the control requirements without explicit decoupling, and adaptively amends the output control laws through DRNN-PID. Decomposition-coordination is used to establish a novel multi-agent system for coordinated control comprising a flatness agent, a gauge agent, and a Q-learning agent. Simulation results demonstrate the validity of the proposed method.
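
The coordination between the DRNN and the Q-learning agent is not detailed in the abstract, but the general pattern — a Q-learning agent that adaptively nudges PID gains toward smaller tracking error — can be sketched on a toy plant. The first-order plant, the PI (rather than full PID) law, and the discretization below are all simplifying assumptions; the paper's DRNN is replaced here by a plain Q-table.

```python
import random

random.seed(1)

# Toy first-order plant: y' = -y + u. A stand-in, not the rolling-mill model.
def plant_step(y, u, dt=0.05):
    return y + dt * (-y + u)

# Actions: small nudges to (kp, ki)
ACTIONS = [(-0.05, 0.0), (0.05, 0.0), (0.0, -0.005), (0.0, 0.005), (0.0, 0.0)]

def bucket(e):
    """Discretize the tracking error into 5 states."""
    for i, th in enumerate((-0.5, -0.05, 0.05, 0.5)):
        if e < th:
            return i
    return 4

q = {(s, a): 0.0 for s in range(5) for a in range(len(ACTIONS))}
alpha, gamma, eps = 0.1, 0.9, 0.2
kp, ki = 1.0, 0.1
y, integ, setpoint = 0.0, 0.0, 1.0

for step in range(2000):
    e = setpoint - y
    s = bucket(e)
    a = (random.randrange(len(ACTIONS)) if random.random() < eps
         else max(range(len(ACTIONS)), key=lambda k: q[(s, k)]))
    kp = min(5.0, max(0.0, kp + ACTIONS[a][0]))   # Q-agent nudges the gains
    ki = min(1.0, max(0.0, ki + ACTIONS[a][1]))
    integ = max(-10.0, min(10.0, integ + e))      # clipped integrator
    u = kp * e + ki * integ                       # PI law (D term omitted)
    y = plant_step(y, u)
    e2 = setpoint - y
    r = abs(e) - abs(e2)                          # reward: reduction in |error|
    s2 = bucket(e2)
    best = max(q[(s2, k)] for k in range(len(ACTIONS)))
    q[(s, a)] += alpha * (r + gamma * best - q[(s, a)])

print(f"final error {setpoint - y:.4f}, gains kp={kp:.2f}, ki={ki:.2f}")
```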


2020 ◽  
Vol 17 (2) ◽  
pp. 647-664
Author(s):  
Yangyang Ge ◽  
Fei Zhu ◽  
Wei Huang ◽  
Peiyao Zhao ◽  
Quan Liu

Multi-agent systems have broad application in the real world, yet their safety performance is barely considered. Reinforcement learning is one of the most important methods for solving multi-agent problems. At present, progress has been made in applying multi-agent reinforcement learning to robot systems, human-machine games, automation, and other areas. However, in these areas an agent may fall into unsafe states where it finds it difficult to bypass obstacles, to receive information from other agents, and so on. Ensuring the safety of a multi-agent system is of great importance where an agent may fall into dangerous, irreversible states that cause great damage. To solve the safety problem, this paper introduces a multi-agent cooperation Q-learning algorithm based on constrained Markov games. In this method, safety constraints are added to the action set, and each agent, when interacting with the environment to search for optimal values, is restricted by the safety rules so as to obtain an optimal policy that satisfies the security requirements. Since traditional multi-agent reinforcement learning algorithms are no longer suitable for the proposed model, a new solution is introduced for calculating the global optimal state-action function that satisfies the safety constraints. The Lagrange multiplier method is used to determine the optimal action in the current state, on the premise of linearized constraint functions and under the condition that the state-action function and the constraint function are both differentiable. This not only improves the efficiency and accuracy of the algorithm but also guarantees a globally optimal solution. Experiments verify the effectiveness of the algorithm.
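
The Lagrangian construction the abstract describes — fold the safety constraint into the objective via a multiplier, then pick actions that are optimal for the penalized value — can be illustrated in a deliberately small setting. The sketch below is a single-state constrained bandit with dual ascent on the multiplier, an assumption-laden stand-in for the paper's constrained Markov game; the rewards, costs, and budget are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy constrained bandit: maximize reward s.t. expected cost <= budget.
rewards = np.array([1.0, 0.8, 0.3])   # hypothetical mean rewards per action
costs   = np.array([0.9, 0.4, 0.1])   # hypothetical mean costs (safety risk)
budget  = 0.5

q_r = np.zeros(3)     # reward value estimates
q_c = np.zeros(3)     # cost value estimates
lam = 0.0             # Lagrange multiplier (dual variable)
alpha, eta, eps = 0.1, 0.01, 0.1

for t in range(5000):
    # act greedily w.r.t. the Lagrangian value: q_r - lam * q_c
    if rng.random() < eps:
        a = rng.integers(3)
    else:
        a = int(np.argmax(q_r - lam * q_c))
    r = rewards[a] + rng.normal(0, 0.05)
    c = costs[a] + rng.normal(0, 0.05)
    q_r[a] += alpha * (r - q_r[a])
    q_c[a] += alpha * (c - q_c[a])
    # dual ascent: raise lam when the constraint is violated
    lam = max(0.0, lam + eta * (c - budget))

a_star = int(np.argmax(q_r - lam * q_c))
print(f"chosen action {a_star}, lambda {lam:.2f}, expected cost {q_c[a_star]:.2f}")
```

In this toy, the highest-reward action violates the budget, so the multiplier grows until the agent settles on the best action that respects the constraint — the same trade-off the paper resolves at the level of the global state-action function.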


Respuestas ◽  
2018 ◽  
Vol 23 (2) ◽  
pp. 53-61
Author(s):  
David Luviano Cruz ◽  
Francesco José García Luna ◽  
Luis Asunción Pérez Domínguez

This paper presents a hybrid control proposal for multi-agent systems that exploits the advantages of reinforcement learning and nonparametric functions. A modified version of the Q-learning algorithm provides training data for a kernel, and this approach yields a suboptimal set of actions to be used by the agents. The proposed algorithm is experimentally tested on a path generation task for mobile robots in an unknown environment.
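
One plausible reading of "Q-learning provides training data for a kernel" is nonparametric regression over the visited state-action pairs, so that Q-values generalize to unvisited states. The sketch below uses Nadaraya-Watson kernel smoothing over hypothetical training triples; the 1-D state space, the targets, and the bandwidth are assumptions, not the authors' construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel_q(x, a, X, A, Y, bandwidth=0.5):
    """Nadaraya-Watson estimate of Q(x, a) from (state, action, target)
    triples collected by a tabular/discretized Q-learner."""
    mask = A == a
    if not mask.any():
        return 0.0
    d = X[mask] - x
    w = np.exp(-(d ** 2) / (2 * bandwidth ** 2))   # Gaussian kernel weights
    return float(w @ Y[mask] / (w.sum() + 1e-9))

# Hypothetical training triples harvested from a Q-learning run:
# 1-D states, 2 actions, and the learned Q targets.
X = rng.uniform(0, 10, 200)
A = rng.integers(0, 2, 200)
Y = np.where(A == 0, np.sin(X), np.cos(X))     # stand-in Q targets

# Query a state the table never visited; pick the better action.
x_new = 3.3
q0 = kernel_q(x_new, 0, X, A, Y)
q1 = kernel_q(x_new, 1, X, A, Y)
print(f"Q(x,0)={q0:.3f}  Q(x,1)={q1:.3f}  ->  action {int(q1 > q0)}")
```

The kernel smooths over nearby experience, which is what makes the resulting action set usable (if suboptimal) in continuous robot state spaces.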


2012 ◽  
Vol 433-440 ◽  
pp. 6033-6037
Author(s):  
Xiao Ming Liu ◽  
Xiu Ying Wang

The movement characteristics of nearby traffic flow have an important influence on the main line. A control method for expressway off-ramps based on Q-learning and extension control is established by analyzing the parameters of the off-ramp and the auxiliary road. First, a basic description of the Q-learning algorithm and extension control is given and analyzed. Then the reward function is derived through extension control theory to judge the state of the traffic light. Simulation results comparing the queue lengths of the off-ramp and the auxiliary road show that the control method based on Q-learning and extension control greatly reduces the off-ramp queue length, which demonstrates the feasibility of the control strategy.
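
How extension control feeds the Q-learning reward is not spelled out in the abstract. A common simplification is to score each queue with a dependent-degree-style function that is positive in the ideal region and negative near the capacity limit, and to reward the signal controller with the sum of those scores. Everything below — the toy queue dynamics, the linear dependent-degree function, the bucketing — is an assumption for illustration; real extension control uses piecewise correlation functions over classical and extension domains.

```python
import random

random.seed(0)

def dependent_degree(queue, ideal=10.0, limit=40.0):
    """Simplified extension-style score: positive below the ideal queue
    length, falling to -1 as the queue approaches the limit."""
    return (limit - queue) / (limit - ideal) - 1.0

# State: (off-ramp queue bucket, auxiliary-road queue bucket)
# Action: 0 = green to off-ramp, 1 = green to auxiliary road
q = {}
alpha, gamma, eps = 0.1, 0.9, 0.1

def bucket(n):
    return min(int(n // 10), 4)

def qval(s, a):
    return q.get((s, a), 0.0)

ramp, aux = 20.0, 20.0
for t in range(5000):
    s = (bucket(ramp), bucket(aux))
    a = random.randrange(2) if random.random() < eps else int(qval(s, 1) > qval(s, 0))
    # toy queue dynamics: the served queue drains, the other accumulates
    arrivals_r, arrivals_a = random.uniform(0, 4), random.uniform(0, 3)
    if a == 0:
        ramp = max(0.0, ramp + arrivals_r - 6.0); aux += arrivals_a
    else:
        aux = max(0.0, aux + arrivals_a - 6.0); ramp += arrivals_r
    # reward: sum of extension-style scores of both queues
    r = dependent_degree(ramp) + dependent_degree(aux)
    s2 = (bucket(ramp), bucket(aux))
    q[(s, a)] = qval(s, a) + alpha * (r + gamma * max(qval(s2, 0), qval(s2, 1)) - qval(s, a))

print(f"final queues: off-ramp {ramp:.1f}, auxiliary {aux:.1f}")
```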


Author(s):  
Mohamed A. Aref ◽  
Sudharman K. Jayaweera

This article presents a design of a wideband autonomous cognitive radio (WACR) for anti-jamming and interference avoidance. The proposed system model allows multiple WACRs to simultaneously operate over the same spectrum range, producing a multi-agent environment. The objective of each radio is to predict and evade a dynamic jammer signal as well as to avoid the transmissions of other WACRs. The proposed cognitive framework consists of two operations: sensing and transmission. Each operation is aided by its own Q-learning-based algorithm, but both experience the same RF environment. Simulation results indicate that the proposed cognitive anti-jamming technique has low computational complexity, significantly outperforms a non-cognitive sub-band selection policy, and is sufficiently robust against the impact of sensing errors.
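
A stripped-down version of the transmission side of this scheme — a Q-learner that, given where the jammer was last sensed, picks the next sub-band to transmit on — looks like the sketch below. The deterministic sweeping jammer, the single radio, and the eight sub-bands are assumptions; the paper pairs this with a separate Q-learning algorithm for the sensing operation and runs multiple radios together.

```python
import random

random.seed(0)

N_BANDS = 8
# State: the sub-band the jammer was last sensed on
q_tx = [[0.0] * N_BANDS for _ in range(N_BANDS)]
alpha, gamma, eps = 0.2, 0.9, 0.1

jammer, state = 0, 0
for t in range(20000):
    # transmission agent: pick a sub-band given the last jammed band
    if random.random() < eps:
        a = random.randrange(N_BANDS)
    else:
        a = max(range(N_BANDS), key=lambda b: q_tx[state][b])
    jammer = (jammer + 1) % N_BANDS          # sweeping jammer (assumed pattern)
    r = 1.0 if a != jammer else -1.0         # penalty for colliding with it
    s2 = jammer                              # sensing reveals the jammed band
    q_tx[state][a] += alpha * (r + gamma * max(q_tx[s2]) - q_tx[state][a])
    state = s2

best = max(range(N_BANDS), key=lambda b: q_tx[0][b])
print(f"if band 0 was just jammed, transmit on band {best}")
```

With a sweeping jammer the learner discovers it must avoid the band adjacent to the last jammed one, which is the predict-and-evade behavior the article targets.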


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Qiangang Zheng ◽  
Zhihua Xi ◽  
Chunping Hu ◽  
Haibo ZHANG ◽  
Zhongzhi Hu

To improve the response performance of the engine, a novel aero-engine control method based on Deep Q-Learning (DQL) is proposed, and an engine controller based on DQL has been designed. The model-free Q-learning algorithm, which can be performed online, is adopted to calculate the action value function. To improve the learning capacity of DQL, a deep learning algorithm, the Online Sliding Window Deep Neural Network (OL-SW-DNN), is adopted to estimate the action value function. To reduce sensitivity to noise in the training data, OL-SW-DNN selects the nearest point data of a certain length as training data. Finally, engine acceleration simulations of DQL and of Proportional-Integral-Derivative (PID) control, the algorithm most commonly used for engine controllers in industry, are both conducted to verify the validity of the proposed method. The results show that the acceleration time of the proposed method decreased by 1.475 seconds compared with the traditional controller while satisfying all engine limits.
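
The sliding-window idea — train the value approximator only on the most recent transitions so that old or noisy data stop influencing updates — can be shown without the deep network. In the sketch below a linear Q-approximator stands in for OL-SW-DNN, the window simply keeps the latest transitions rather than the paper's nearest-point selection, and the toy environment is invented.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

WINDOW, GAMMA, LR = 50, 0.9, 0.01
N_FEATURES, N_ACTIONS = 4, 3

w = np.zeros((N_ACTIONS, N_FEATURES))   # linear approximator: Q(s,a) = w[a]@s
buffer = deque(maxlen=WINDOW)           # the sliding window of transitions

def q_values(s):
    return w @ s

def train_on_window():
    """One pass of TD updates over only the windowed (recent) data."""
    for s, a, r, s2 in buffer:
        target = r + GAMMA * np.max(q_values(s2))
        td_err = target - q_values(s)[a]
        w[a] += LR * td_err * s          # gradient step on this sample

# Toy environment: reward for matching the action to the largest feature.
for t in range(3000):
    s = rng.normal(size=N_FEATURES)
    a = int(np.argmax(q_values(s))) if rng.random() > 0.1 else rng.integers(N_ACTIONS)
    r = 1.0 if a == int(np.argmax(s[:N_ACTIONS])) else -0.1
    s2 = rng.normal(size=N_FEATURES)
    buffer.append((s, a, r, s2))
    train_on_window()

print("learned weights per action:\n", np.round(w, 2))
```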

