The control of two-wheeled self-balancing vehicle based on reinforcement learning in a continuous domain

Author(s):  
Penghui Xia ◽  
Yanjie Li
Author(s):  
Philip Odonkor ◽  
Kemper Lewis

This work applies the Deep Deterministic Policy Gradient (DDPG) algorithm, the current state of the art in reinforcement learning for continuous control, to the optimal 24-hour dispatch of shared energy assets within building clusters. The modeled DDPG agent interacts with a battery environment designed to emulate a shared battery system. The aim is not only to learn an efficient charge/discharge policy, but also to address the continuous-domain question of how much energy should be charged or discharged. Experimentally, we examine the impact of the learned dispatch strategy on minimizing demand peaks within the building cluster. Our results show that, across the variety of building cluster combinations studied, the algorithm is able to learn and exploit energy arbitrage, tailoring it into battery dispatch strategies for peak demand shifting.
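As a rough illustration of the continuous-domain question raised in this abstract, a battery environment might map a single real-valued action to an amount of energy charged or discharged. The sketch below is not the authors' environment: the class name `BatteryEnv`, the capacity and rate values, and the reward definition are all hypothetical placeholders chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class BatteryEnv:
    """Toy shared-battery environment (all parameters are assumed values)."""

    capacity_kwh: float = 100.0  # hypothetical usable capacity
    max_rate_kw: float = 25.0    # hypothetical charge/discharge power limit
    soc_kwh: float = 50.0        # current state of charge

    def step(self, action: float, demand_kw: float, dt_h: float = 1.0):
        """Apply a continuous action in [-1, 1].

        Positive values charge the battery, negative values discharge it.
        Returns (state of charge, net grid demand, reward).
        """
        action = max(-1.0, min(1.0, action))
        requested_kwh = action * self.max_rate_kw * dt_h
        # Clip to what the battery can physically absorb or deliver.
        headroom = self.capacity_kwh - self.soc_kwh
        delta_kwh = max(-self.soc_kwh, min(headroom, requested_kwh))
        self.soc_kwh += delta_kwh
        # Charging adds to the cluster's grid demand; discharging offsets it.
        net_demand_kw = demand_kw + delta_kwh / dt_h
        # Placeholder reward: the abstract does not state the reward used.
        reward = -net_demand_kw
        return self.soc_kwh, net_demand_kw, reward
```

An agent such as DDPG would output `action` directly as a real number, so the policy decides both the direction and the magnitude of the energy transfer in one step.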


Author(s):  
Wenhui Zhang ◽  
Chenyu Wang ◽  
Wenjie Lin ◽  
Jiming Lin

Improved ant colony optimization (ACO) algorithms for continuous-domain optimization have been widely applied in recent years, but these improved methods have a weak perception of changes in environmental information and rely only on the pheromone residues along the path to guide colony evolution. In this paper, we propose an ant colony algorithm based on a reinforcement learning model (RLACO). RLACO acquires more environmental information by calculating the diversity of the ant colony, and uses this diversity together with other basic information about the colony to establish a reinforcement learning model. At different stages of evolution, the algorithm chooses the strategy that maximizes the reward in order to improve the global search ability and convergence speed of the colony. Experimental results on the CEC 2017 test functions show that the proposed algorithm is superior to other algorithms for continuous-domain optimization in convergence speed, accuracy, and global search ability.
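The abstract does not say how colony diversity is measured. One common choice, assumed here purely for illustration, is the mean Euclidean distance of the ants' candidate solutions from their centroid; the function name `colony_diversity` is hypothetical and not taken from the paper.

```python
import math


def colony_diversity(positions):
    """Mean Euclidean distance of each ant's position from the colony centroid.

    `positions` is a non-empty list of equal-length coordinate lists.
    This measure is an assumption; the paper may define diversity differently.
    """
    n = len(positions)
    dim = len(positions[0])
    centroid = [sum(p[d] for p in positions) / n for d in range(dim)]
    return sum(math.dist(p, centroid) for p in positions) / n
```

A value near zero suggests the colony has collapsed onto one region of the search space, which is the kind of signal a reinforcement learning model could use as part of its state when choosing a search strategy.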


Decision ◽  
2016 ◽  
Vol 3 (2) ◽  
pp. 115-131 ◽  
Author(s):  
Helen Steingroever ◽  
Ruud Wetzels ◽  
Eric-Jan Wagenmakers
