Computational Design of Modular Robots Based on Genetic Algorithm and Reinforcement Learning

Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 471
Author(s):  
Jai Hoon Park ◽  
Kang Hoon Lee

Designing novel robots that can cope with a specific task is a challenging problem because of the enormous design space spanning both morphological structures and control mechanisms. To this end, we present a computational method for automating the design of modular robots. Our method employs a genetic algorithm to evolve robotic structures as an outer optimization, and it applies a reinforcement learning algorithm to each candidate structure to train its behavior and evaluate its potential learning ability as an inner optimization. Compared to evolving structure and behavior simultaneously, evolving only the robotic structure and optimizing behavior with a separate training algorithm reduces the size of the design space significantly. Mutual dependence between evolution and learning is achieved by treating the mean cumulative reward a candidate structure attains during reinforcement learning as its fitness in the genetic algorithm. Our method therefore searches for prospective robotic structures that can potentially lead to near-optimal behaviors if trained sufficiently. We demonstrate the usefulness of our method through several effective design results that were automatically generated in experiments with an actual modular robotics kit.
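A minimal sketch of the evolve-then-train loop this abstract describes: a GA evolves structures in the outer loop while an inner training routine supplies the fitness. The module vocabulary, `train_and_evaluate`, and the placeholder reward are illustrative assumptions, not the authors' implementation.

```python
import random

MODULES = ["joint", "link", "wheel"]   # assumed module vocabulary

def random_structure(max_len=6):
    return [random.choice(MODULES) for _ in range(random.randint(2, max_len))]

def mutate(structure):
    s = structure[:]
    s[random.randrange(len(s))] = random.choice(MODULES)
    return s

def train_and_evaluate(structure):
    """Inner optimization: stand-in for RL training; returns the mean
    cumulative reward, which the GA uses as the structure's fitness."""
    # Placeholder reward favoring module variety (a real run would train a policy).
    return len(set(structure)) + 0.1 * len(structure) + random.random()

population = [random_structure() for _ in range(20)]
for generation in range(10):
    scored = sorted(population, key=train_and_evaluate, reverse=True)
    parents = scored[:5]                       # truncation selection
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]

print("best structure:", max(population, key=train_and_evaluate))
```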

Author(s):  
V. Ram Mohan Parimi ◽  
Devendra P. Garg

This paper deals with the design and optimization of a fuzzy logic controller used in the obstacle avoidance and path tracking problems of mobile robot navigation. The fuzzy logic controller is tuned using a genetic algorithm whose operator probabilities are adapted through reinforcement learning, specifically Q-learning. The performance of the fuzzy logic controller tuned with the reinforcement-learning-controlled genetic algorithm is then compared with that of one tuned with an uncontrolled genetic algorithm. The theory is applied to a two-wheeled mobile robot's path tracking problem. It is shown that the controller tuned by the genetic algorithm controlled via reinforcement learning outperforms the controller tuned via the uncontrolled genetic algorithm.
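A sketch of the core mechanism, assuming a toy GA: a Q-learning agent picks the GA's (crossover, mutation) rates each generation, with the fitness gain as reward. The state encoding, action set, and toy fitness are illustrative, not the paper's formulation.

```python
import random
from collections import defaultdict

TARGET = 42
ACTIONS = [(0.9, 0.01), (0.7, 0.05), (0.5, 0.20)]   # (crossover, mutation) rates
alpha, gamma, eps = 0.5, 0.9, 0.2
q = defaultdict(float)

def fitness(x):
    return -abs(x - TARGET)   # toy objective on integers

def ga_generation(pop, pc, pm):
    pop = sorted(pop, key=fitness, reverse=True)[:len(pop) // 2]
    children = []
    while len(children) < len(pop):
        a, b = random.sample(pop, 2)
        child = (a + b) // 2 if random.random() < pc else a   # crossover
        if random.random() < pm:
            child += random.randint(-5, 5)                    # mutation
        children.append(child)
    return pop + children

def state_of(pop):
    return min(3, len(set(pop)) // 3)   # coarse population-diversity bucket

pop = [random.randint(0, 100) for _ in range(16)]
for gen in range(30):
    s = state_of(pop)
    a = (random.randrange(len(ACTIONS)) if random.random() < eps
         else max(range(len(ACTIONS)), key=lambda i: q[(s, i)]))
    best_before = max(map(fitness, pop))
    pop = ga_generation(pop, *ACTIONS[a])
    r = max(map(fitness, pop)) - best_before        # reward = fitness improvement
    s2 = state_of(pop)
    q[(s, a)] += alpha * (r + gamma * max(q[(s2, i)] for i in range(len(ACTIONS)))
                          - q[(s, a)])
```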


2019 ◽  
Vol 2019 ◽  
pp. 1-9
Author(s):  
Wei Li ◽  
Kai Xian ◽  
Jiateng Yin ◽  
Dewang Chen

Train station parking (TSP) accuracy is important for enhancing the efficiency of train operation and the safety of passengers in urban rail transit. However, TSP is always subject to uncertain factors such as extreme weather and varying rail track resistance. To increase parking accuracy, robustness, and self-learning ability, we propose new train station parking frameworks that combine reinforcement learning (RL) with information from balises. Three algorithms were developed to reduce the parking error in urban rail transit: a stochastic optimal selection algorithm (SOSA), a Q-learning algorithm (QLA), and a fuzzy-function-based Q-learning algorithm (FQLA). Five braking rates are adopted as the action vector of the three algorithms, and several statistical indices are defined to evaluate parking errors. Simulation results based on real-world data show that the parking errors of all three algorithms fall within ±30 cm, meeting the requirements of urban rail transit.
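An illustrative sketch of a QLA-style parking controller with five braking rates as the action set, as the abstract describes. The train dynamics, state discretization, and reward shaping here are simplified assumptions.

```python
import random
from collections import defaultdict

BRAKING_RATES = [0.4, 0.6, 0.8, 1.0, 1.2]   # m/s^2, assumed action vector
DT = 0.2                                     # control interval, s
alpha, gamma, eps = 0.3, 0.95, 0.1
q = defaultdict(float)

def discretize(dist, speed):
    return (int(dist // 5), int(speed // 2))   # coarse state grid

for episode in range(2000):
    dist, speed = 100.0, 15.0    # distance to stop point (m), speed (m/s)
    s = discretize(dist, speed)
    while speed > 0:
        a = (random.randrange(5) if random.random() < eps
             else max(range(5), key=lambda i: q[(s, i)]))
        speed = max(0.0, speed - BRAKING_RATES[a] * DT)
        dist -= speed * DT
        s2 = discretize(dist, speed)
        # Terminal reward: negative absolute parking error at standstill.
        r = -abs(dist) if speed == 0 else 0.0
        q[(s, a)] += alpha * (r + gamma * max(q[(s2, i)] for i in range(5))
                              - q[(s, a)])
        s = s2
```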


2020 ◽  
pp. 377-386
Author(s):  
Samuel Obadan ◽  
Zenghui Wang

This paper introduces novel concepts for accelerating learning in an off-policy reinforcement learning algorithm for Partially Observable Markov Decision Processes (POMDPs) by leveraging a multi-agent framework. Reinforcement learning (RL) is a comparatively slow but elegant approach to learning in an unknown environment. Although action-value learning (Q-learning) converges faster than state-value learning, the rate of convergence to an optimal policy or maximum cumulative reward remains a constraint. Consequently, to optimize the learning phase of an RL problem in a POMDP environment, we present two multi-agent learning paradigms: multi-agent off-policy reinforcement learning, and a genetic algorithm (GA) approach for multi-agent offline learning using feedforward neural networks. At the end of training (episodes for reinforcement learning, epochs for the genetic algorithm), we compare the convergence rates of both algorithms with respect to constructing the underlying MDPs for POMDP problems. Finally, we demonstrate the impact of layered resampling in a Monte Carlo particle filter for improving belief-state estimation accuracy relative to ground truth within POMDP domains. Initial empirical results suggest practicable solutions.
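A minimal sketch of particle-filter belief updating with resampling, the mechanism the paper layers to sharpen belief-state estimates in POMDPs. The transition and observation models below are toy assumptions, not the paper's domain.

```python
import random

N_STATES, N_PARTICLES = 5, 200

def transition(s):
    # Toy dynamics: random walk on a small discrete state space.
    return max(0, min(N_STATES - 1, s + random.choice([-1, 0, 1])))

def obs_likelihood(obs, s):
    # Noisy observation: correct state observed with probability 0.7.
    return 0.7 if obs == s else 0.3 / (N_STATES - 1)

particles = [random.randrange(N_STATES) for _ in range(N_PARTICLES)]
for obs in [2, 3, 3, 4]:                       # example observation stream
    particles = [transition(p) for p in particles]
    weights = [obs_likelihood(obs, p) for p in particles]
    # Resample in proportion to the weights (belief contraction step).
    particles = random.choices(particles, weights=weights, k=N_PARTICLES)

belief = {s: particles.count(s) / N_PARTICLES for s in range(N_STATES)}
print("belief over states:", belief)
```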


2018 ◽  
Vol 2018 ◽  
pp. 1-11
Author(s):  
Zhicong Zhang ◽  
Shuai Li ◽  
Xiaohui Yan

We study an online multisource multisink queueing network control problem characterized by a self-organizing network structure and self-organizing job routing. We decompose the self-organizing queueing network control problem into a series of interrelated Markov Decision Processes and construct a control decision model for them based on a coupled reinforcement learning (RL) architecture. To maximize the mean time-averaged weighted throughput of jobs through the network, we propose a reinforcement learning algorithm with a time-averaged reward to solve the control decision model, yielding a control policy that integrates the job routing strategy and the job sequencing strategy. Computational experiments verify the learning ability and effectiveness of the proposed reinforcement learning algorithm in the investigated self-organizing network control problem.
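A sketch of the kind of time-averaged-reward update the abstract invokes, here in the style of R-learning (a standard average-reward RL rule, not necessarily the authors' exact algorithm); the two-action environment is a toy stand-in for the routing/sequencing decisions.

```python
import random
from collections import defaultdict

q = defaultdict(float)
rho = 0.0                  # running estimate of the time-averaged reward
alpha, beta, eps = 0.1, 0.01, 0.1
ACTIONS = [0, 1]           # e.g., two candidate routes for the next job

def step(state, action):
    # Toy dynamics: route 1 earns more on odd states, route 0 on even ones.
    reward = 1.0 if (state % 2 == action) else 0.2
    return (state + 1) % 4, reward

s = 0
for t in range(10000):
    greedy = max(ACTIONS, key=lambda a: q[(s, a)])
    a = random.choice(ACTIONS) if random.random() < eps else greedy
    s2, r = step(s, a)
    best_next = max(q[(s2, b)] for b in ACTIONS)
    # Average-reward TD update: reward is measured relative to rho.
    q[(s, a)] += alpha * (r - rho + best_next - q[(s, a)])
    if a == greedy:        # update the average-reward estimate on greedy moves
        rho += beta * (r + best_next - max(q[(s, b)] for b in ACTIONS) - rho)
    s = s2
```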


2016 ◽  
Vol 2016 ◽  
pp. 1-6
Author(s):  
Zhongwei Li ◽  
Beibei Sun ◽  
Yuezhen Xin ◽  
Xun Wang ◽  
Hu Zhu

Flavones, secondary metabolites of the Phellinus igniarius fungus, have antioxidant and anticancer properties. Because of their great medicinal value, there is a large demand for flavones for medical use and research. Flavones extracted from natural Phellinus cannot meet this demand, since Phellinus is very rare in the natural environment and hard to cultivate artificially. Flavone production relies mainly on the fermentation culture of Phellinus, which makes optimizing the culture conditions an important problem. Previous studies have optimized the fermentation culture conditions, for example via response surface methodology, which reported an optimal flavone production of 1532.83 μg/mL. To further optimize the fermentation culture conditions for flavones, in this work we propose a hybrid intelligent algorithm combining a genetic algorithm with a BP neural network. Our method has intelligent learning ability and can overcome the limitation of large-scale biotic experiments. Through simulations, the optimal culture conditions are obtained and flavone production is increased to 2200 μg/mL.
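A compact sketch of the GA + BP-network hybrid described above: a small feedforward network is fit by backpropagation to (conditions → yield) samples, then a GA searches the condition space against the network as a surrogate. The data, network size, and variable ranges are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: 3 culture variables -> flavone yield (toy).
X = rng.uniform(0, 1, (200, 3))
y = 1.0 - ((X - 0.6) ** 2).sum(axis=1, keepdims=True)   # peak near 0.6

# One-hidden-layer BP network trained with plain gradient descent on MSE.
W1, b1 = rng.normal(0, 0.5, (3, 16)), np.zeros(16)
W2, b2 = rng.normal(0, 0.5, (16, 1)), np.zeros(1)
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    g = 2 * (pred - y) / len(X)              # dLoss/dpred
    gh = (g @ W2.T) * (1 - h ** 2)           # backprop through tanh
    W2 -= 0.1 * (h.T @ g); b2 -= 0.1 * g.sum(0)
    W1 -= 0.1 * (X.T @ gh); b1 -= 0.1 * gh.sum(0)

def surrogate(c):                            # network-predicted yield
    return float(np.tanh(c @ W1 + b1) @ W2 + b2)

# GA over culture conditions, with the trained network as fitness.
pop = rng.uniform(0, 1, (30, 3))
for _ in range(60):
    scores = np.array([surrogate(c) for c in pop])
    parents = pop[np.argsort(scores)[-10:]]
    children = parents[rng.integers(0, 10, 20)] + rng.normal(0, 0.05, (20, 3))
    pop = np.vstack([parents, np.clip(children, 0, 1)])

print("predicted-optimal conditions:", max(pop, key=surrogate))
```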


2018 ◽  
Vol 173 ◽  
pp. 03014 ◽  
Author(s):  
Chunxi Wang ◽  
Guofeng Wu ◽  
Zhiyong Du ◽  
Bin Jiang

For a hybrid indoor network scenario with LTE, WLAN, and Visible Light Communication (VLC), intelligently selecting a network based on user service requirements is essential for ensuring a high user quality of experience. To tackle the challenges posed by the dynamic environment and complicated service requirements, we propose a reinforcement learning solution for indoor network selection. In particular, we propose a transfer-learning-based network selection algorithm, i.e., reinforcement learning with knowledge transfer, which reveals and exploits context information about traffic features, networks, and network load distribution. Simulations show that the proposed algorithm has an efficient online learning ability and achieves much better performance with faster convergence than a traditional reinforcement learning algorithm.
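One simple way to realize "reinforcement learning with knowledge transfer" is to warm-start the learner for a new context from values learned in a related context; the sketch below assumes this mechanism (the paper's transfer scheme may differ), with a stateless bandit-style Q-update and invented reward profiles.

```python
import random
from collections import defaultdict

NETWORKS = ["LTE", "WLAN", "VLC"]

def run_q_learning(env_reward, q, episodes=500, alpha=0.2, eps=0.1):
    """Bandit-style Q-learning over the three candidate networks."""
    for _ in range(episodes):
        a = (random.choice(NETWORKS) if random.random() < eps
             else max(NETWORKS, key=lambda n: q[n]))
        q[a] += alpha * (env_reward(a) - q[a])
    return q

def source_reward(n):   # previously learned context, e.g., light load
    return {"LTE": 0.5, "WLAN": 0.8, "VLC": 0.9}[n] + random.gauss(0, 0.1)

def target_reward(n):   # new context: WLAN degraded under heavy load
    return {"LTE": 0.5, "WLAN": 0.4, "VLC": 0.9}[n] + random.gauss(0, 0.1)

q_source = run_q_learning(source_reward, defaultdict(float))
# Knowledge transfer: seed the target task from the source Q-values,
# so fewer episodes are needed to converge in the new context.
q_target = run_q_learning(target_reward, defaultdict(float, q_source), episodes=100)
print("selected network:", max(NETWORKS, key=lambda n: q_target[n]))
```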


2021 ◽  
Vol 9 (11) ◽  
pp. 1166
Author(s):  
Jianya Yuan ◽  
Hongjian Wang ◽  
Honghan Zhang ◽  
Changjian Lin ◽  
Dan Yu ◽  
...  

In a complex underwater environment, finding a viable, collision-free path for an autonomous underwater vehicle (AUV) is a challenging task. The purpose of this paper is to establish a safe, real-time, and robust collision avoidance method that improves the autonomy of AUVs. We propose a method based on active sonar that uses a deep reinforcement learning algorithm to learn from processed sonar information and navigate the AUV in an uncertain environment, and we compare the performance of the double deep Q-network (DDQN) algorithm with that of a genetic algorithm and deep learning. We also propose a line-of-sight guidance method to mitigate abrupt changes in the yaw direction and smooth the heading changes when the AUV switches trajectory. The effectiveness of the proposed algorithm was verified in three environments: random static, mixed static, and complex dynamic. The results show that it has significant advantages over the other algorithms in success rate, collision avoidance performance, and generalization ability, and that it outperforms the genetic algorithm and deep learning in running time, total path length, avoidance of moving obstacles, and per-step planning time. After training in a simulated environment, the algorithm can still learn online from environmental information after deployment and adjust network weights in real time. These results demonstrate that the proposed approach has significant potential for practical applications.
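A minimal sketch of the double deep Q-network target, the technique named above: the online network selects the next action and the target network evaluates it, which curbs the overestimation bias of vanilla DQN. The Q-value arrays stand in for network outputs over a sonar-derived state; the batch data is invented.

```python
import numpy as np

def double_dqn_targets(rewards, q_online_next, q_target_next, dones, gamma=0.99):
    """rewards, dones: shape (batch,); q_*_next: shape (batch, n_actions)."""
    best_actions = q_online_next.argmax(axis=1)     # selection: online network
    evaluated = q_target_next[np.arange(len(rewards)), best_actions]  # evaluation: target network
    return rewards + gamma * (1.0 - dones) * evaluated

# Toy batch: 3 transitions, 4 heading-change actions.
r = np.array([1.0, -0.5, 0.0])
qo = np.array([[0.2, 0.9, 0.1, 0.0], [0.5, 0.4, 0.6, 0.1], [0.3, 0.3, 0.2, 0.8]])
qt = np.array([[0.1, 0.7, 0.2, 0.0], [0.4, 0.5, 0.5, 0.2], [0.2, 0.4, 0.1, 0.6]])
done = np.array([0.0, 0.0, 1.0])
print(double_dqn_targets(r, qo, qt, done))
```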

