Comparing the Explainability and Performance of Reinforcement Learning and Genetic Fuzzy Systems for Safe Satellite Docking

Training an agent in a real-time environment, e.g., one built on the Internet of Things (IoT), is complex. For this reason, reinforcement learning (RL) with a deep neural network, i.e., deep reinforcement learning (DRL), has been widely adopted for online use without prior knowledge or complicated reward functions. DRL can handle a symmetrical balance between bias and variance, which indicates that RL agents can be competently trained in real-world applications. The proposed model combines basic RL algorithms for online and offline use based on the empirical bias–variance balance. We therefore exploited the balance between the offline Monte Carlo (MC) technique and online temporal difference (TD) learning, with an on-policy method (state-action-reward-state-action, Sarsa) and an off-policy method (Q-learning), in the context of DRL. The proposed balance of MC (offline) and TD (online) use, which is simple and applicable without a well-designed reward, is suitable for real-time online learning. We demonstrated that, for a simple control task, balancing online and offline use without combining on- and off-policy methods yields satisfactory results. In complex tasks, however, the results clearly indicate that the combined method improves convergence speed and performance in a deep Q-network.
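The targets named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names and the mixing weight `beta` are hypothetical, and a plain weighted average is only one assumed way to balance the offline MC return against the online TD target.

```python
def mc_return(rewards, gamma):
    """Offline Monte Carlo return: discounted sum over a finished episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td_target(reward, q_next, gamma, next_action=None):
    """Online one-step TD target.

    Sarsa (on-policy) bootstraps from the action actually taken next;
    Q-learning (off-policy) bootstraps from the greedy next action.
    """
    if next_action is None:
        bootstrap = max(q_next)          # Q-learning
    else:
        bootstrap = q_next[next_action]  # Sarsa
    return reward + gamma * bootstrap


def blended_target(mc, td, beta):
    """Assumed blend: beta = 1 is pure MC, beta = 0 is pure TD."""
    return beta * mc + (1.0 - beta) * td


episode_rewards = [1.0, 0.0, 2.0]
q_next = [0.0, 4.0]  # hypothetical Q-values of the next state
g = mc_return(episode_rewards, gamma=0.5)
y_q = td_target(1.0, q_next, gamma=0.5)
y_sarsa = td_target(1.0, q_next, gamma=0.5, next_action=0)
y_mix = blended_target(g, y_q, beta=0.5)
```

The MC return uses the whole episode and has no bootstrap bias but higher variance. The TD targets are available after one step and have lower variance, but they inherit bias from the current value estimates. That trade-off is the bias–variance balance the abstract refers to.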

Video Game Design and Performance Analysis Based on Reinforcement Learning

2019 International Joint Conference on Information, Media and Engineering (IJCIME), 2019
DOI: 10.1109/ijcime49369.2019.00053

Author(s): Ling Lei, Mingliang Han, Ying Sun, Hongtao Yang

Keyword(s): Reinforcement Learning, Performance Analysis, Video Game, Game Design, Video Game Design, And Performance

Hierarchical Neuro-Fuzzy Systems Part II

Encyclopedia of Artificial Intelligence, 2011, pp. 817-824
DOI: 10.4018/978-1-59904-849-9.ch121

Author(s): Marley Vellasco, Marco Pacheco, Karla Figueiredo, Flavio Souza

Keyword(s): Reinforcement Learning, State Space, Learning Process, Fuzzy Systems, Space Partitioning, Learning Methods, New Class, Neuro Fuzzy, Binary Space Partitioning, Priori Information

This paper describes a new class of neuro-fuzzy models, called Reinforcement Learning Hierarchical Neuro-Fuzzy Systems (RL-HNF). These models employ BSP (Binary Space Partitioning) and Politree partitioning of the input space [Chrysanthou, 1992] and were developed to bypass the traditional drawbacks of neuro-fuzzy systems such as ANFIS [Jang, 1997], NEFCLASS [Kruse, 1995] and FSOM [Vuorimaa, 1994]: the reduced number of allowed inputs and the poor capacity to create their own structure and rules. The new models, named Reinforcement Learning Hierarchical Neuro-Fuzzy BSP (RL-HNFB) and Reinforcement Learning Hierarchical Neuro-Fuzzy Politree (RL-HNFP), descend from the original HNFB, which uses Binary Space Partitioning (see Hierarchical Neuro-Fuzzy Systems Part I). Combining hierarchical partitioning with the Reinforcement Learning (RL) methodology yields a new class of Neuro-Fuzzy Systems (SNF) that, in addition to automatically learning its structure, autonomously learns the actions to be taken by an agent, dispensing with a priori information (number of rules, fuzzy rules and sets) about the learning process. These characteristics are an important differentiator with respect to existing learning systems for intelligent agents. In applications involving continuous and/or high-dimensional environments, traditional Reinforcement Learning methods based on lookup tables (tables that store value functions for a small or discrete state space) are no longer feasible, because the state space becomes too large. This second part of Hierarchical Neuro-Fuzzy Systems focuses on the reinforcement learning process; the first part presented HNFB models based on supervised learning methods. The RL-HNFB and RL-HNFP models were evaluated in a benchmark control application and in a simulated Khepera robot environment with multiple obstacles.
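To make the partitioning idea concrete, here is a rough sketch of a crisp binary space partition that stores action values only in its leaf cells. It is an illustration under stated assumptions, not the RL-HNFB model: the paper's cells are fuzzy and its growth criterion is not given here, so the class `BSPNode`, the alternating split axis, and the manual `split()` calls are all hypothetical.

```python
class BSPNode:
    """One cell of a crisp binary space partition over a box [lo, hi)."""

    def __init__(self, lo, hi, n_actions, depth=0):
        self.lo, self.hi = list(lo), list(hi)
        self.q = [0.0] * n_actions  # action values stored per cell
        self.depth = depth
        self.children = None        # (low_half, high_half) once split

    def split(self):
        """Halve this cell along one input; axes alternate with depth (assumed)."""
        self.axis = self.depth % len(self.lo)
        self.mid = 0.5 * (self.lo[self.axis] + self.hi[self.axis])
        hi_low = list(self.hi)
        hi_low[self.axis] = self.mid
        lo_high = list(self.lo)
        lo_high[self.axis] = self.mid
        low = BSPNode(self.lo, hi_low, len(self.q), self.depth + 1)
        high = BSPNode(lo_high, self.hi, len(self.q), self.depth + 1)
        low.q, high.q = list(self.q), list(self.q)  # children inherit values
        self.children = (low, high)

    def leaf(self, x):
        """Return the leaf cell that contains state x."""
        node = self
        while node.children is not None:
            node = node.children[0] if x[node.axis] < node.mid else node.children[1]
        return node


def count_leaves(node):
    if node.children is None:
        return 1
    return sum(count_leaves(child) for child in node.children)


root = BSPNode([0.0, 0.0], [1.0, 1.0], n_actions=2)
root.split()                   # refine the whole space once
root.leaf([0.2, 0.9]).split()  # refine only the region being visited
cell = root.leaf([0.2, 0.9])
```

The tree grows only where it is refined, so it holds three cells here, whereas a uniform lookup table with 10 bins per input would already need 10**6 cells for six inputs.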
