Comparison of learning performance of character controller based on deep reinforcement learning according to state representation

Chaejun Sohn; Taesoo Kwon; Yoonsang Lee

doi:10.15701/kcgs.2021.27.5.55

Decoupling State Representation Methods from Reinforcement Learning in Car Racing

Proceedings of the 13th International Conference on Agents and Artificial Intelligence, 2021

doi:10.5220/0010237507520759

Author(s): Juan Montoya; Imant Daunhawer; Julia Vogt; Marco Wiering

Keyword(s): Reinforcement Learning; State Representation; Car Racing

Emergence of Discrete and Abstract State Representation through Reinforcement Learning in a Continuous Input Task

Advances in Intelligent Systems and Computing - Robot Intelligence Technology and Applications 2012, 2013, pp. 13-21

doi:10.1007/978-3-642-37374-9_2

Cited By ~ 2

Author(s): Yoshito Sawatsubashi; Mohamad Faizal bin Samusudin; Katsunari Shibata

Keyword(s): Reinforcement Learning; State Representation; Continuous Input

Consideration of State Representation for Semi-autonomous Reinforcement Learning of Sailing Within a Navigable Area

Robotic Sailing 2015, 2015, pp. 89-102

doi:10.1007/978-3-319-23335-2_7

Author(s): Hideaki Manabe; Kanta Tachibana

Keyword(s): Reinforcement Learning; State Representation

State Representation Learning For Effective Deep Reinforcement Learning

2020 IEEE International Conference on Multimedia and Expo (ICME), 2020

doi:10.1109/icme46284.2020.9102924

Author(s): Jian Zhao; Wengang Zhou; Tianyu Zhao; Yun Zhou; Houqiang Li

Keyword(s): Reinforcement Learning; Representation Learning; State Representation

Vision-Based Robot Navigation through Combining Unsupervised Learning and Hierarchical Reinforcement Learning

Sensors, 2019, Vol 19 (7), pp. 1576

doi:10.3390/s19071576

Cited By ~ 1

Author(s): Xiaomao Zhou; Tao Bai; Yanbin Gao; Yuntao Han

Keyword(s): Reinforcement Learning; Unsupervised Learning; Learning Algorithm; Spatial Scales; Learning Performance; Head Direction; Topological Map; Topological Maps; Hierarchical Reinforcement Learning; Continuous State

Extensive studies have shown that many animals’ capability of forming spatial representations for self-localization, path planning, and navigation relies on the functionalities of place and head-direction (HD) cells in the hippocampus. Although there are numerous hippocampal modeling approaches, only a few span the wide functionalities ranging from processing raw sensory signals to planning and action generation. This paper presents a vision-based navigation system that involves generating place and HD cells through learning from visual images, building topological maps based on learned cell representations and performing navigation using hierarchical reinforcement learning. First, place and HD cells are trained from sequences of visual stimuli in an unsupervised learning fashion. A modified Slow Feature Analysis (SFA) algorithm is proposed to learn different cell types in an intentional way by restricting their learning to separate phases of the spatial exploration. Then, to extract the encoded metric information from these unsupervised learning representations, a self-organized learning algorithm is adopted to learn over the emerged cell activities and to generate topological maps that reveal the topology of the environment and information about a robot’s head direction, respectively. This enables the robot to perform self-localization and orientation detection based on the generated maps. Finally, goal-directed navigation is performed using reinforcement learning in continuous state spaces which are represented by the population activities of place cells. In particular, considering that the topological map provides a natural hierarchical representation of the environment, hierarchical reinforcement learning (HRL) is used to exploit this hierarchy to accelerate learning. The HRL works on different spatial scales, where a high-level policy learns to select subgoals and a low-level policy learns over primitive actions to specialize on the selected subgoals. Experimental results demonstrate that our system is able to navigate a robot to the desired position effectively, and the HRL shows a much better learning performance than the standard RL in solving our navigation tasks.

Sharing Experience for Behavior Generation of Real Swarm Robot Systems Using Deep Reinforcement Learning

Journal of Robotics and Mechatronics, 2019, Vol 31 (4), pp. 520-525

doi:10.20965/jrm.2019.p0520

Cited By ~ 1

Author(s): Toshiyuki Yasuda; Kazuhiro Ohkura

Keyword(s): Reinforcement Learning; Collective Behavior; Design Methodology; Learning Performance; Centralized Control; Robot System; Swarm Robots; Robot Systems; Swarm Robot; Typical Design

Swarm robotic systems (SRSs) are a type of multi-robot system in which robots operate without any form of centralized control. The typical design methodology for SRSs comprises a behavior-based approach, where the desired collective behavior is obtained manually by designing the behavior of individual robots in advance. In contrast, in an automatic design approach, a certain general methodology is adopted. This paper presents a deep reinforcement learning approach for collective behavior acquisition of SRSs. The swarm robots are expected to collect information in parallel and share their experience for accelerating their learning. We conducted real swarm robot experiments and evaluated the learning performance of the swarm in a scenario where the robots consecutively traveled between two landmarks.
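The abstract above does not say how experience is shared, so the following is only a minimal sketch of one common arrangement: every robot appends its transitions to a single replay buffer, and one learner samples from that pool. The class name `SharedReplayBuffer`, the transition layout, and the toy data are all hypothetical and are not taken from the paper.

```python
import random
from collections import deque


class SharedReplayBuffer:
    """One transition pool that every robot writes to (hypothetical design)."""

    def __init__(self, capacity):
        # Oldest transitions are dropped once capacity is reached.
        self.transitions = deque(maxlen=capacity)

    def add(self, robot_id, state, action, reward, next_state):
        self.transitions.append((robot_id, state, action, reward, next_state))

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement over all robots' experience.
        pool = list(self.transitions)
        return rng.sample(pool, min(batch_size, len(pool)))


rng = random.Random(0)
buffer = SharedReplayBuffer(capacity=1000)

# Three robots collect toy transitions in parallel and share them.
for step in range(10):
    for robot_id in range(3):
        action = rng.choice([0, 1])
        buffer.add(robot_id, step, action, 0.0, step + 1)

batch = buffer.sample(8, rng)
contributors = {t[0] for t in buffer.transitions}
print(len(buffer.transitions), len(batch), sorted(contributors))
```

With three robots the pool fills three times as fast as it would for a single robot, which is the intuition behind sharing experience to accelerate learning.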

Potential-Based Shaping and Q-Value Initialization are Equivalent

Journal of Artificial Intelligence Research, 2003, Vol 19, pp. 205-208

doi:10.1613/jair.1190

Cited By ~ 39

Author(s): E. Wiewiora

Keyword(s): Reinforcement Learning; Potential Function; Learning Algorithms; Learning Performance; Q Value; Broad Category; Optimal Behavior; Q Values; Simpler Method

Shaping has proven to be a powerful but precarious means of improving reinforcement learning performance. Ng, Harada, and Russell (1999) proposed the potential-based shaping algorithm for adding shaping rewards in a way that guarantees the learner will learn optimal behavior. In this note, we prove certain similarities between this shaping algorithm and the initialization step required for several reinforcement learning algorithms. More specifically, we prove that a reinforcement learner with initial Q-values based on the shaping algorithm's potential function makes the same updates throughout learning as a learner receiving potential-based shaping rewards. We further prove that under a broad category of policies, the behavior of these two learners is indistinguishable. The comparison provides intuition on the theoretical properties of the shaping algorithm as well as a suggestion for a simpler method for capturing the algorithm's benefit. In addition, the equivalence raises previously unaddressed issues concerning the efficiency of learning with potential-based shaping.
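The equivalence described in this abstract can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's proof: it uses tabular Q-learning, a hypothetical 5-state chain, a made-up potential `phi(s) = s`, and a fixed experience sequence replayed to both learners. One learner starts at zero and receives the shaping reward `gamma * phi(s') - phi(s)`; the other receives no shaping and starts at `Q0(s, a) = phi(s)`.

```python
import random


def run_q_learning(experience, n_states, n_actions, q_init, shaping,
                   alpha=0.5, gamma=0.9):
    """Tabular Q-learning over a fixed list of (s, a, r, s_next) transitions."""
    q = [[q_init(s) for _ in range(n_actions)] for s in range(n_states)]
    for s, a, r, s_next in experience:
        target = r + shaping(s, s_next) + gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])
    return q


GAMMA = 0.9
N_STATES, N_ACTIONS = 5, 2


def phi(s):
    # Hypothetical potential: states further right look more promising.
    return float(s)


# Fixed experience on a non-terminating chain: action 0 moves left,
# action 1 moves right, and arriving at the last state pays reward 1.
rng = random.Random(0)
experience = []
state = 0
for _ in range(200):
    action = rng.randrange(N_ACTIONS)
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    experience.append((state, action, reward, nxt))
    state = nxt

# Learner A: zero initial Q-values plus potential-based shaping rewards.
q_shaped = run_q_learning(
    experience, N_STATES, N_ACTIONS,
    q_init=lambda s: 0.0,
    shaping=lambda s, s_next: GAMMA * phi(s_next) - phi(s),
    gamma=GAMMA)

# Learner B: no shaping, initial Q-values set to the potential.
q_initialized = run_q_learning(
    experience, N_STATES, N_ACTIONS,
    q_init=phi,
    shaping=lambda s, s_next: 0.0,
    gamma=GAMMA)

# If both learners make the same updates, their Q-values always differ
# by exactly phi(s), up to floating-point rounding.
max_gap = max(abs(q_initialized[s][a] - q_shaped[s][a] - phi(s))
              for s in range(N_STATES) for a in range(N_ACTIONS))
print(max_gap)
```

Because the two tables differ only by a per-state offset, the ranking of actions within each state is the same for both learners, which is why policies that depend only on relative action values cannot tell them apart.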

Analysis of the possibilities for using machine learning algorithms in the Unity environment

Journal of Computer Sciences Institute, 2021, Vol 20, pp. 197-204

doi:10.35784/jcsi.2680

Author(s): Karina Litwynenko; Małgorzata Plechawska-Wójcik

Keyword(s): Machine Learning; Reinforcement Learning; Learning Algorithms; Machine Learning Algorithms; Learning Performance; Imitation Learning; Policy Optimization

Reinforcement learning algorithms are gaining popularity, and their advancement is made possible by the presence of tools to evaluate them. This paper concerns the applicability of machine learning algorithms on the Unity platform using the Unity ML-Agents Toolkit library. The purpose of the study was to compare two algorithms: Proximal Policy Optimization and Soft Actor-Critic. The possibility of improving the learning results by combining these algorithms with Generative Adversarial Imitation Learning was also verified. The results of the study showed that the PPO algorithm can perform better in uncomplicated environments with non-immediate rewards, while the additional use of GAIL can improve learning performance.

Increased Reinforcement Learning Performance through Transfer of Representation Learned by State Prediction Model

2021

doi:10.1109/ijcnn52387.2021.9533751

Author(s): Alperen Tercan; Charles W. Anderson

Keyword(s): Reinforcement Learning; Prediction Model; Learning Performance; State Prediction

Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping by State Representation Learning Based on a Preprocessed Input Image

2021

doi:10.1109/iros51168.2021.9635931

Author(s): Taewon Kim; Yeseong Park; Youngbin Park; Sang Hyoung Lee; Il Hong Suh

Keyword(s): Reinforcement Learning; Representation Learning; Input Image; State Representation
