1P2-Q05 Analyses of Relationship between Energy Transfer in Robot Links and Motor Commands Using State-Action Value Function (New Control Theory and Motion Control (2))

As the key element of urban transportation, taxis services significantly provide convenience and comfort for residents’ travel. However, the reality has not shown much efficiency. Previous researchers mainly aimed to optimize policies by order dispatch on ride-hailing services, which cannot be applied in cruising taxis services. This paper developed the reinforcement learning (RL) framework to optimize driving policies on cruising taxis services. Firstly, we formulated the drivers’ behaviours as the Markov decision process (MDP) progress, considering the influences after taking action in the long run. The RL framework using dynamic programming and data expansion was employed to calculate the state-action value function. Following the value function, drivers can determine the best choice and then quantify the expected future reward at a particular state. By utilizing historic orders data in Chengdu, we analysed the function value’s spatial distribution and demonstrated how the model could optimize the driving policies. Finally, the realistic simulation of the on-demand platform was built. Compared with other benchmark methods, the results verified that the new model performs better in increasing total revenue, answer rate and decreasing waiting time, with the relative percentages of 4.8%, 6.2% and −27.27% at most.

Download Full-text

A solution for the Elevators Group Dispatch by Multiagent Reinforcement Learning

10.5753/eniac.2019.9322 ◽

2019 ◽

Author(s):

Jordão Memória ◽

José Maia

Keyword(s):

Reinforcement Learning ◽

Function Approximation ◽

Value Function ◽

The State ◽

Evaluation Function ◽

State Action ◽

Traffic Pattern ◽

Multiagent Reinforcement Learning ◽

Multi Agent ◽

Action Value

In this work, a modeling and algorithm based on multiagent reinforcement learning is developed for the problem of elevator group dispatch. The main advantage is that, along with the function approximation, this multi-agent solution leads to reduction of the state space, allowing complex states to be addressed with a synthesizing evaluation function. Each elevator is considered an agent that have to decide about two actions: answer or ignore the new call. With some iterations, the agents learn the weights of an evaluation function which approximate the state-action value function. The performance of solution (average waiting time - AWT), shown varying the traffic pattern, flow of people, number of elevators and number of floors, is comparable to other current proposals reported in the literature.

Download Full-text

Low-rank State-action Value-function Approximation

10.23919/eusipco54536.2021.9616008 ◽

2021 ◽

Author(s):

Sergio Rozada ◽

Victor Tenorio ◽

Antonio G. Marques

Keyword(s):

Function Approximation ◽

Value Function ◽

Low Rank ◽

Value Function Approximation ◽

State Action ◽

Action Value

Download Full-text

Value-Based Continuous Control Without Concrete State-Action Value Function

Lecture Notes in Computer Science - Advances in Swarm Intelligence ◽

10.1007/978-3-030-78811-7_34 ◽

2021 ◽

pp. 352-364

Author(s):

Jin Zhu ◽

Haixian Zhang ◽

Zhen Pan

Keyword(s):

Value Function ◽

Continuous Control ◽

State Action ◽

Action Value

Download Full-text

Energy Management of Hybrid UAV Based on Reinforcement Learning

Electronics ◽

10.3390/electronics10161929 ◽

2021 ◽

Vol 10 (16) ◽

pp. 1929

Author(s):

Huan Shen ◽

Yao Zhang ◽

Jianguo Mao ◽

Zhiwei Yan ◽

Linwei Wu

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Energy Management ◽

Internal Combustion Engines ◽

Value Function ◽

Learning Algorithm ◽

The State ◽

Combustion Engines ◽

State Action ◽

Action Value

In order to solve the flight time problem of Unmanned Aerial Vehicles (UAV), this paper proposes a set of energy management strategies based on reinforcement learning for hybrid agricultural UAV. The battery is used to optimize the working point of internal combustion engines to the greatest extent while solving the high power demand issues of UAV and the response problem of internal combustion engines. Firstly, the decision-making oriented hybrid model and UAV dynamic model are established. Owing to the characteristics of the energy management strategy (EMS) based on reinforcement learning (RL), which is an intelligent optimization algorithm that has emerged in recent years, the complex theoretical formula derivation is avoided in the modeling process. In terms of the EMS, a double Q learning algorithm with strong convergence is adopted. The algorithm separates the state action value function database used in derivation decisions and the state action value function-updated database brought by the decision, so as to avoid delay and shock within the convergence process caused by maximum deviation. After the improvement, the off-line training is carried out with a large number of flight data generated in the past. The simulation results demonstrate that the improved algorithm can show better performance with less learning cost than before by virtue of the search function strategy proposed in this paper. In the state space, time-based and residual fuel-based selection are carried out successively, and the convergence rate and application effect are compared and analyzed. The results show that the learning algorithm has stronger robustness and convergence speed due to the appropriate selection of state space under different types of operating cycles. After 120,000 cycles of training, the fuel economy of the improved algorithm in this paper can reach more than 90% of that of the optimal solution, and can perform stably in actual flight.

Download Full-text

Planar Motion Control, Coordination and Dynamic Entrainment in Chaplygin Beanies

Volume 3: Modeling and Validation; Multi-Agent and Networked Systems; Path Planning and Motion Control; Tracking Control Systems; Unmanned Aerial Vehicles (UAVs) and Application; Unmanned Ground and Aerial Vehicles; Vibration in Mechanical Systems; Vibrations and Control of Systems; Vibrations: Modeling, Analysis, and Control ◽

10.1115/dscc2018-9037 ◽

2018 ◽

Author(s):

Scott Kelly ◽

Rodrigo Abrajan-Guerrero ◽

Jaskaran Grover ◽

Matthew Travers ◽

Howie Choset

Keyword(s):

Energy Transfer ◽

Motion Control ◽

Nonholonomic Constraint ◽

The Other ◽

Planar Motion ◽

Single Input ◽

Robotic Vehicle

The Chaplygin beanie is a single-input robotic vehicle for which partial planar motion control can be achieved by exploiting a simple nonholonomic constraint. A previous paper suggested a strategy for such motion control. In the present paper, this strategy is validated experimentally and extended to the context of multi-vehicle coordination. It is then shown that when the plane on which two such vehicles operate is translationally compliant, energy transfer between the two can enable a mechanism whereby one (operating under control) may entrain the other (operating passively), partly coordinating their motion. As an extension to this result, it is further demonstrated that a pair of passive vehicles operating on a translationally compliant platform can eventually attain the same heading when released from their deformed configurations.

Download Full-text