Experimental Study on Behavior Acquisition of Mobile Robot by Deep Q-Network

Author(s):
Hikaru Sasaki
Tadashi Horiuchi
Satoru Kato
...

Deep Q-Network (DQN) is one of the best-known deep reinforcement learning methods. DQN approximates the action-value function with a convolutional neural network (CNN) and updates it by Q-learning. In this study, we applied DQN to robot behavior learning in a simulation environment. We constructed a simulation environment for a two-wheeled mobile robot using the robot simulation software Webots. The mobile robot acquired useful behaviors, such as avoiding walls and following a center line, by learning from high-dimensional visual information supplied as input. We propose a method that reuses the best target network obtained so far whenever learning performance suddenly drops. Moreover, we incorporate the Profit Sharing method into DQN to accelerate learning. Simulation experiments confirmed that our method is effective.
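The "reuse the best target network so far" idea can be sketched in a few lines: snapshot the target parameters whenever evaluation improves, and roll back to that snapshot when performance suddenly drops. This is a minimal illustrative sketch with a toy quadratic objective standing in for episode returns; the thresholds, sync interval, and `evaluate` function are assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(params):
    # Stand-in for an episode rollout; here just a noisy score.
    return -np.sum(params ** 2) + rng.normal(scale=0.1)

params = rng.normal(size=4)          # online network weights
target = params.copy()               # target network weights
best_target = target.copy()
best_score = evaluate(params)

for step in range(200):
    params -= 0.05 * 2 * params      # gradient step on the toy objective
    if step % 10 == 0:
        target = params.copy()       # periodic target sync, as in DQN
        score = evaluate(params)
        if score > best_score:
            best_score = score
            best_target = target.copy()      # remember best target so far
        elif score < best_score - 1.0:
            target = best_target.copy()      # sudden drop: reuse best target

print(round(float(np.sum(params ** 2)), 3))  # 0.0
```

The rollback threshold (here a drop of 1.0 below the best score) is the tunable part: too tight and normal learning noise triggers rollbacks, too loose and a collapse goes uncorrected.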

2015
Vol 789-790
pp. 717-722
Author(s):
Ebrahim Mattar
K. Al Mutib
M. AlSulaiman
Hedjar Ramdane

Learning the navigation environment is essential for a mobile robot. We describe research outcomes on mapping and intelligence for the KSU-IMR mobile robot, aimed at navigation and robot behavior learning. Map learning and intelligence were based on hybrid paradigms and AI functionalities: an ANN-PCA scheme for dimensionality reduction and a Neuro-Fuzzy architecture.
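The PCA half of the ANN-PCA pipeline reduces high-dimensional sensor data to a few principal directions before further learning. A minimal NumPy sketch of that step (the data here is random placeholder input; the ANN and Neuro-Fuzzy stages are not shown):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))            # 200 sensor readings, 10-D each
Xc = X - X.mean(axis=0)                   # center the data
cov = Xc.T @ Xc / (len(Xc) - 1)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]         # sort descending by variance
W = eigvecs[:, order[:3]]                 # top-3 principal directions
Z = Xc @ W                                # reduced 3-D representation

print(Z.shape)  # (200, 3)
```

Keeping the top-k eigenvectors retains the directions of greatest variance, which is what makes the reduced representation a reasonable input for the downstream network.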


2012
Vol 51 (9)
pp. 40-46
Author(s):
Pradipta K. Das
S. C. Mandhata
H. S. Behera
S. N. Patro

2013
Vol 2013
pp. 1-9
Author(s):
Yan Li
Lijie Yu
Siran Tao
Kuanmin Chen

To improve the efficiency of traffic signal control at an isolated intersection under oversaturated conditions, a multi-objective optimization algorithm for traffic signal control is proposed. Maximum throughput and minimum average queue ratio are selected as the optimization objectives under oversaturated conditions. A simulation environment built on VISSIM SCAPI was used to evaluate convergence and optimization results under various settings and traffic conditions; it is written in C++/CLI to connect the VISSIM simulation software with the proposed algorithm. The simulation results indicate that the signal timing plan generated by the proposed algorithm manages traffic flow at an oversaturated intersection more efficiently than the commonly used signal timing optimization software Synchro. The update frequency applied in the simulation environment was 120 s, which meets the requirements for updating signal timing plans in the field. Thus, the proposed algorithm is capable of searching the Pareto front of the multi-objective problem under both normal and oversaturated conditions.
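The Pareto-front search the abstract describes reduces to a dominance test over candidate timing plans scored on the two objectives (maximize throughput, minimize average queue ratio). A hedged sketch of that filter; the candidate plan values are made up for illustration:

```python
def dominates(a, b):
    # a dominates b if it is at least as good on both objectives
    # (higher throughput, lower queue ratio) and strictly better on one.
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(plans):
    # Keep only plans that no other plan dominates.
    return [p for p in plans
            if not any(dominates(q, p) for q in plans if q is not p)]

plans = [  # (throughput in veh/h, average queue ratio)
    (1800, 0.90), (1750, 0.70), (1600, 0.55),
    (1500, 0.60), (1400, 0.50),
]
print(pareto_front(plans))
# (1500, 0.60) drops out: (1600, 0.55) beats it on both objectives.
```

A full optimizer would generate and mutate candidate plans between filtering passes; this shows only the selection criterion that defines the front.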


Author(s):
Huashuai Zhang
Tingmei Wang
Haiwei Shen

The resource optimization of ultra-dense networks (UDNs) is critical to meeting users' huge demand for wireless data traffic, but mainstream optimization algorithms suffer from problems such as poor optimization results and high computing load. This paper puts forward a wireless resource allocation algorithm based on deep reinforcement learning (DRL), which aims to maximize the total throughput of the entire network and casts the resource allocation problem as a deep Q-learning process. To allocate resources in UDNs effectively, the DRL algorithm was introduced to improve the allocation efficiency of wireless resources; the authors adopted the resource allocation strategy of the deep Q-network (DQN), employing experience replay and a target network to overcome the instability and divergence caused by correlation with previous network states and to curb overestimation of the Q value. Simulation results show that the proposed algorithm maximizes the total network throughput while making the network more energy-efficient and stable. It is therefore meaningful to introduce DRL into research on UDN resource allocation.
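Experience replay, one of the two stabilizers the abstract cites alongside the target network, amounts to a bounded buffer of past transitions sampled uniformly at training time. A minimal sketch (the transition contents are placeholders, not the paper's actual state encoding):

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        # deque with maxlen silently evicts the oldest transition
        # once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between
        # consecutive transitions that destabilizes Q-learning updates.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(50):
    buf.push(t, t % 4, 1.0, t + 1, False)   # toy transitions
batch = buf.sample(8)
print(len(buf), len(batch))  # 50 8
```

In a full DQN training loop, each environment step pushes one transition and each learning step samples a batch to compute targets against the frozen target network.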


2016
Vol 16 (4)
pp. 113-125
Author(s):
Jianxian Cai
Xiaogang Ruan
Pengxuan Li

Abstract An autonomous path-planning strategy based on the Skinner operant conditioning principle and the reinforcement learning principle is developed in this paper. Its core components are a tendency cell and a cognitive learning cell, which simulate bionic orientation and asymptotic learning ability. The cognitive learning cell is designed on the basis of a Boltzmann machine and an improved Q-learning algorithm, and executes the operant action learning function to approximate the operative part of the robot system. The tendency cell adjusts network weights using information entropy to evaluate the effect of operant actions. Simulation experiments on a mobile robot showed that the designed strategy enables the robot to perform autonomous navigation path planning: the robot learns to select actions autonomously according to the bionic orientation, with a fast convergence rate and high adaptability.
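The Boltzmann (softmax) action selection underlying the cognitive learning cell can be illustrated with tabular Q-learning on a toy chain task; the grid size, reward, and temperature schedule below are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(2)

def boltzmann_select(q_values, temperature):
    # Softmax over Q-values: high temperature explores,
    # low temperature exploits.
    prefs = np.asarray(q_values) / temperature
    prefs -= prefs.max()                      # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return rng.choice(len(q_values), p=probs)

Q = np.zeros((5, 2))                          # 5 states, 2 actions (left/right)
alpha, gamma = 0.5, 0.9
for episode in range(200):
    T = max(0.1, 0.98 ** episode)             # anneal the temperature
    s = 0
    while s < 4:                              # state 4 is the goal
        a = boltzmann_select(Q[s], T)
        s2 = s + 1 if a == 1 else max(0, s - 1)
        r = 1.0 if s2 == 4 else 0.0
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(int(np.argmax(Q[0])))                   # greedy action at the start state
```

Annealing the temperature mirrors the asymptotic learning behavior described above: early episodes sample actions broadly, while later episodes settle on the learned preference.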

