A solution for the Elevators Group Dispatch by Multiagent Reinforcement Learning

2019 ◽  
Author(s):  
Jordão Memória ◽  
José Maia

In this work, a model and an algorithm based on multi-agent reinforcement learning are developed for the problem of elevator group dispatch. The main advantage is that, together with function approximation, this multi-agent solution reduces the state space, allowing complex states to be handled by a synthesizing evaluation function. Each elevator is considered an agent that has to decide between two actions: answering or ignoring a new call. Over a number of iterations, the agents learn the weights of an evaluation function that approximates the state-action value function. The performance of the solution, measured by the average waiting time (AWT) and shown while varying the traffic pattern, flow of people, number of elevators, and number of floors, is comparable to other current proposals reported in the literature.
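
A minimal sketch of the setting described above: one elevator agent choosing between answering or ignoring a new call, with a linear evaluation function whose weights approximate the state-action value. The feature vector, learning rate, discount, and exploration rate are illustrative assumptions, not the parameters used in the paper.

```python
import numpy as np

class ElevatorAgent:
    def __init__(self, n_features, alpha=0.01, gamma=0.95, epsilon=0.1):
        # One weight vector per action: 0 = ignore the call, 1 = answer the call.
        self.w = np.zeros((2, n_features))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def q(self, features, action):
        # Linear evaluation function approximating the state-action value.
        return float(self.w[action] @ features)

    def act(self, features):
        # Epsilon-greedy choice between ignoring (0) and answering (1) the call.
        if np.random.rand() < self.epsilon:
            return np.random.randint(2)
        return int(np.argmax([self.q(features, a) for a in (0, 1)]))

    def update(self, features, action, reward, next_features):
        # One-step Q-learning update of the evaluation-function weights.
        target = reward + self.gamma * max(self.q(next_features, a) for a in (0, 1))
        td_error = target - self.q(features, action)
        self.w[action] += self.alpha * td_error * features
```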

Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1929
Author(s):  
Huan Shen ◽  
Yao Zhang ◽  
Jianguo Mao ◽  
Zhiwei Yan ◽  
Linwei Wu

In order to address the flight-time problem of Unmanned Aerial Vehicles (UAVs), this paper proposes a set of energy management strategies based on reinforcement learning for a hybrid agricultural UAV. The battery is used to optimize the operating point of the internal combustion engine as far as possible while addressing the UAV's high power demand and the engine's slow response. Firstly, the decision-oriented hybrid powertrain model and the UAV dynamic model are established. Because the energy management strategy (EMS) is based on reinforcement learning (RL), an intelligent optimization approach that has emerged in recent years, complex theoretical derivations are avoided in the modeling process. For the EMS, a double Q-learning algorithm with strong convergence properties is adopted. The algorithm separates the state-action value function used to derive decisions from the state-action value function updated as a result of those decisions, so as to avoid the delay and oscillation in the convergence process caused by maximization bias. After this improvement, off-line training is carried out with a large amount of previously generated flight data. The simulation results demonstrate that, by virtue of the search strategy proposed in this paper, the improved algorithm performs better with less learning cost than before. In the state space, time-based and residual-fuel-based selection are carried out successively, and the convergence rate and application effect are compared and analyzed. The results show that the learning algorithm achieves stronger robustness and a faster convergence speed when the state space is chosen appropriately for different types of operating cycles. After 120,000 training cycles, the fuel economy of the improved algorithm reaches more than 90% of that of the optimal solution and performs stably in actual flight.
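
The core mechanism named above, separating the value estimate used to select the maximizing action from the one used to evaluate it, is the standard double Q-learning update. A tabular sketch under generic assumptions (state/action encodings and hyperparameters are illustrative, not the paper's):

```python
import random
from collections import defaultdict

def double_q_update(qa, qb, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.99):
    # With probability 0.5 update table A using table B for evaluation, else the reverse.
    if random.random() < 0.5:
        best = max(actions, key=lambda a: qa[(next_state, a)])   # select with A
        target = reward + gamma * qb[(next_state, best)]          # evaluate with B
        qa[(state, action)] += alpha * (target - qa[(state, action)])
    else:
        best = max(actions, key=lambda a: qb[(next_state, a)])   # select with B
        target = reward + gamma * qa[(next_state, best)]          # evaluate with A
        qb[(state, action)] += alpha * (target - qb[(state, action)])

qa, qb = defaultdict(float), defaultdict(float)
```

Decoupling selection from evaluation in this way is what removes the maximization bias that a single-table max update would introduce.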


2020 ◽  
Vol 12 (21) ◽  
pp. 8883
Author(s):  
Kun Jin ◽  
Wei Wang ◽  
Xuedong Hua ◽  
Wei Zhou

As a key element of urban transportation, taxi services provide considerable convenience and comfort for residents' travel. In practice, however, they are not operated very efficiently. Previous research has mainly optimized policies through order dispatch for ride-hailing services, which cannot be applied to cruising taxi services. This paper develops a reinforcement learning (RL) framework to optimize driving policies for cruising taxi services. Firstly, we formulate drivers' behaviours as a Markov decision process (MDP), considering the long-run influence of each action. An RL framework using dynamic programming and data expansion is employed to calculate the state-action value function. Following the value function, drivers can determine the best choice and quantify the expected future reward at a particular state. Using historic order data from Chengdu, we analyse the spatial distribution of the value function and demonstrate how the model can optimize driving policies. Finally, a realistic simulation of the on-demand platform is built. Compared with other benchmark methods, the results verify that the new model performs better in increasing total revenue and answer rate and in decreasing waiting time, with relative percentages of at most 4.8%, 6.2% and −27.27%, respectively.
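
A minimal tabular sketch of the general idea of estimating a spatio-temporal value function from historical trip records by backward dynamic programming, in the spirit of the framework described above. The trip-record fields, discretization into zones and time slots, and the discount factor are illustrative assumptions, not the authors' data schema.

```python
from collections import defaultdict

def estimate_values(trips, n_time_slots, gamma=0.9):
    # trips: iterable of (origin_zone, time_slot, dest_zone, arrival_slot, fare)
    value = defaultdict(float)
    counts = defaultdict(int)
    # Sweep backward in time so V(destination, arrival) is available when needed.
    for t in reversed(range(n_time_slots)):
        for origin, slot, dest, arrival, fare in trips:
            if slot != t:
                continue
            target = fare + gamma * value[(dest, arrival)]
            counts[(origin, slot)] += 1
            # Incremental average over all trips leaving this (zone, time) state.
            value[(origin, slot)] += (target - value[(origin, slot)]) / counts[(origin, slot)]
    return value
```

Given such a value table, a driver at a particular zone and time can compare the expected future reward of candidate destinations and choose accordingly.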


Author(s):  
Takayuki Osogami ◽  
Rudy Raymond

We study reinforcement learning for controlling multiple agents in a collaborative manner. In some of these tasks, it is not sufficient for the individual agents to take relevant actions; those actions should also be diverse. We propose using the determinant of a positive semidefinite matrix to approximate the action-value function in reinforcement learning, where the matrix is learned so that it represents the relevance and diversity of the actions. Experimental results show that the proposed approach allows the agents to learn a nearly optimal policy approximately ten times faster than baseline approaches in benchmark tasks of multi-agent reinforcement learning. The proposed approach is also shown to achieve performance that cannot be achieved with conventional approaches in a partially observable environment with an exponentially large action space.
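
A minimal sketch of why a determinant can capture both relevance and diversity: score a joint action with the log-determinant of a positive semidefinite matrix built from per-action feature vectors and quality weights. The feature map and quality scores below are illustrative assumptions, not the authors' learned parameters.

```python
import numpy as np

def joint_action_score(features, quality):
    # features: (k, d) array, one row per selected action; quality: (k,) positive scores.
    # L = diag(q) (F F^T) diag(q) is positive semidefinite; redundant (similar) actions
    # make L nearly singular, which lowers the determinant.
    gram = features @ features.T
    L = np.diag(quality) @ gram @ np.diag(quality)
    sign, logdet = np.linalg.slogdet(L + 1e-6 * np.eye(len(quality)))
    return logdet

# Two diverse unit vectors score higher than two nearly identical ones.
diverse = np.array([[1.0, 0.0], [0.0, 1.0]])
similar = np.array([[1.0, 0.0], [0.99, 0.14]])
q = np.array([1.0, 1.0])
assert joint_action_score(diverse, q) > joint_action_score(similar, q)
```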


Author(s):  
Hua Wei ◽  
Deheng Ye ◽  
Zhao Liu ◽  
Hao Wu ◽  
Bo Yuan ◽  
...  

Offline reinforcement learning (RL) tries to learn a near-optimal policy from recorded offline experience without online exploration. Current offline RL research includes: 1) generative modeling, i.e., approximating a policy using fixed data; and 2) learning the state-action value function. While most research focuses on the value-function part by reducing the bootstrapping error in value function approximation induced by the distribution shift of the training data, the effects of error propagation in generative modeling have been neglected. In this paper, we analyze the error in generative modeling. We propose AQL (action-conditioned Q-learning), a residual generative model that reduces the policy approximation error for offline RL. We show that our method learns more accurate policy approximations on different benchmark datasets. In addition, we show that the proposed offline RL method can learn more competitive AI agents in complex control tasks in the multiplayer online battle arena (MOBA) game Honor of Kings.
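
To make the bootstrapping-error issue concrete, here is a generic tabular sketch of offline Q-learning on a fixed dataset in which the bootstrapped maximum is taken only over actions actually observed for the next state, a simple way to limit the distribution-shift error discussed above. This is not the authors' AQL method; dataset format and hyperparameters are illustrative assumptions.

```python
from collections import defaultdict

def offline_q_iteration(dataset, n_sweeps=50, alpha=0.5, gamma=0.99):
    # dataset: list of (state, action, reward, next_state) tuples recorded offline.
    q = defaultdict(float)
    seen_actions = defaultdict(set)
    for s, a, _, _ in dataset:
        seen_actions[s].add(a)
    for _ in range(n_sweeps):
        for s, a, r, s2 in dataset:
            # Bootstrap only from in-distribution actions to curb out-of-distribution error.
            nxt = max((q[(s2, a2)] for a2 in seen_actions[s2]), default=0.0)
            q[(s, a)] += alpha * (r + gamma * nxt - q[(s, a)])
    return q
```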


2020 ◽  
Vol 34 (10) ◽  
pp. 13965-13966
Author(s):  
Yuchen Xiao ◽  
Joshua Hoffman ◽  
Tian Xia ◽  
Christopher Amato

We consider the challenges of learning multi-agent/robot macro-action-based deep Q-nets, including how to properly update each macro-action value and how to accurately maintain macro-action-observation trajectories. We address these challenges by first proposing two fundamental frameworks for learning the macro-action-value function and the joint macro-action-value function. Furthermore, we present two new approaches for learning decentralized macro-action-based policies, which involve a new double Q-update rule that facilitates the learning of decentralized Q-nets by using a centralized Q-net for action selection. Our approaches are evaluated both in simulation and on real robots.
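
A tabular toy sketch of the selection/evaluation split described above: a centralized table (indexed by the joint observation) picks the greedy next action for an agent, while that agent's decentralized table (indexed by its own observation) supplies the bootstrapped value and receives the update. This is a simplification of the paper's double Q-update rule for deep Q-nets; indexing and hyperparameters are illustrative assumptions.

```python
from collections import defaultdict

def update_agent(q_dec, q_cen, obs, joint_obs, action, reward,
                 next_obs, next_joint_obs, actions, alpha=0.1, gamma=0.95):
    # Centralized table selects the next action for this agent...
    best = max(actions, key=lambda a: q_cen[(next_joint_obs, a)])
    # ...while the decentralized table is evaluated and updated.
    target = reward + gamma * q_dec[(next_obs, best)]
    q_dec[(obs, action)] += alpha * (target - q_dec[(obs, action)])

q_dec, q_cen = defaultdict(float), defaultdict(float)
```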


Electronics ◽  
2019 ◽  
Vol 8 (2) ◽  
pp. 231 ◽  
Author(s):  
Panagiotis Kofinas ◽  
Anastasios I. Dounis

This paper proposes a hybrid Ziegler-Nichols (Z-N) fuzzy reinforcement learning MAS (Multi-Agent System) approach for online tuning of a Proportional Integral Derivative (PID) controller in order to control the flow rate of a desalination unit. The PID gains are set by the Z-N method and then adapted online through the fuzzy Q-learning MAS. Fuzzy Q-learning is introduced in each agent in order to cope with the continuous state-action space. The global state of the MAS is defined by the value of the error and the derivative of the error. The MAS consists of three agents, and the output signal of each agent defines the percentage change of each gain. The increase or reduction of each gain can be in the range of 0% to 100% of its initial value. The simulation results highlight the performance of the suggested hybrid control strategy through comparison with a conventional PID controller tuned by Z-N.
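
A minimal sketch of the two stages described above: classic Ziegler-Nichols PID gains computed from the ultimate gain and period, followed by online adjustment of each gain by a fraction of its initial value (here the fractions are supplied directly rather than produced by the paper's fuzzy Q-learning agents). The ultimate gain/period values are illustrative assumptions.

```python
def ziegler_nichols_pid(ku, tu):
    # Classic Z-N rules for a PID controller (ku: ultimate gain, tu: ultimate period).
    kp = 0.6 * ku
    ki = 2.0 * kp / tu
    kd = kp * tu / 8.0
    return kp, ki, kd

def adapt_gains(initial_gains, changes):
    # changes: per-gain fractional adjustments in [-1.0, 1.0], i.e. up to +/-100%
    # of the initial value, as each agent's output signal would dictate.
    return tuple(g * (1.0 + c) for g, c in zip(initial_gains, changes))

kp0, ki0, kd0 = ziegler_nichols_pid(ku=2.0, tu=1.5)
kp, ki, kd = adapt_gains((kp0, ki0, kd0), (0.10, -0.25, 0.0))
```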

