Reinforcement learning for motion control of humanoid robots

Author(s): S. Iida, M. Kanoh, S. Kato, H. Itoh

Robotica, 2005, Vol 24 (2), pp. 257-268
Author(s): Hun-ok Lim, Sang-ho Hyon, Samuel A. Setiawan, Atsuo Takanishi

Our goal is to develop biped humanoid robots capable of working stably in human living and working spaces, with a focus on their physical construction and motion control. At the first stage, we developed a human-like biped robot, WABIAN (WAseda BIped humANoid), which has thirty-five mechanical degrees of freedom; its height is 1.66 [m] and its weight 107.4 [kg]. In this paper, a moment compensation method for stability is described, based on the motion of the robot's head, legs, and arms. A follow-walking method based on a pattern-switching technique is also proposed. By combining both methods, the biped robot can perform dynamic stepping and walk forward and backward continuously while someone pushes or pulls its hand. Human-follow walking experiments are conducted using WABIAN, and the effectiveness of the methods is verified.
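The moment-compensation idea can be sketched with a simple moment balance: the trunk (head/arms) is accelerated so that its inertial moment about the desired zero-moment point (ZMP) cancels the moment produced by the planned leg motion. The function name, the single point-mass trunk model, and all numbers below are illustrative assumptions, not WABIAN's actual controller.

```python
def compensating_trunk_accel(leg_moment, trunk_mass, trunk_height, g=9.81,
                             trunk_x=0.0, zmp_x=0.0):
    """Horizontal trunk acceleration (m/s^2) that cancels `leg_moment` (N*m)
    about the desired ZMP, using a single point-mass trunk model.

    Sagittal-plane moment balance about the ZMP (hypothetical model):
        leg_moment + m*g*(trunk_x - zmp_x) - m*accel*trunk_height = 0
    """
    gravity_moment = trunk_mass * g * (trunk_x - zmp_x)
    return (leg_moment + gravity_moment) / (trunk_mass * trunk_height)

# Example: a 30 kg trunk at 1.0 m height cancelling a 15 N*m leg moment
accel = compensating_trunk_accel(15.0, trunk_mass=30.0, trunk_height=1.0)
print(round(accel, 3))  # prints 0.5
```

With the trunk positioned directly above the ZMP, the gravity term vanishes and the required acceleration is simply the leg moment divided by the trunk's mass-height product.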


2021, Vol 33 (1), pp. 129-156
Author(s): Masami Iwamoto, Daichi Kato

This letter proposes a new idea to improve learning efficiency in reinforcement learning (RL) with the actor-critic method used as a muscle controller for posture stabilization of the human arm. Actor-critic RL (ACRL) is used in simulations to realize posture control in humans or robots through muscle tension control. However, it incurs very high computational costs to acquire a good muscle control policy for desirable postures. For efficient ACRL, we focused on embodiment, which is thought to enable efficient control in the research fields of artificial intelligence and robotics. According to the neurophysiology of motion control derived from experimental studies on animals and humans, the pedunculopontine tegmental nucleus (PPTn) induces muscle tone suppression, and the midbrain locomotor region (MLR) induces muscle tone promotion. The PPTn and MLR modulate the activation levels of mutually antagonizing muscles, such as flexors and extensors, in the process through which control signals are transmitted from the substantia nigra pars reticulata to the brain stem. We therefore hypothesized that the PPTn and MLR control muscle tone, that is, the maximum activation levels of mutually antagonizing muscles, using a different sigmoidal function for each muscle. We then introduced antagonism function models (AFMs) of the PPTn and MLR for individual muscles, incorporating this hypothesis into the process of determining each muscle's activation level from the output of the actor in ACRL. ACRL with AFMs, representing the embodiment of muscle tone, achieved posture stabilization in five joint motions of the right arm of an adult male under gravity at predetermined target angles at an earlier stage of learning than the methods without AFMs. These results suggest that introducing the embodiment of muscle tone can enhance learning efficiency in posture stabilization for humans or humanoid robots.
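The AFM idea described above can be sketched as follows: two opposing sigmoids of a modulating signal set the maximum activation (muscle tone) of a flexor-extensor pair, and the raw actor drives are scaled by these caps. The gain, the use of the actor output itself as the modulator, and the function names are illustrative assumptions, not the authors' exact parameterization.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def afm_activations(actor_output, flexor_drive, extensor_drive, gain=4.0):
    """Scale raw actor drives by sigmoidal muscle-tone caps.

    actor_output in [-1, 1]: positive favors the flexor, negative the extensor.
    Each muscle's cap (its maximum activation level) rises as the modulating
    signal favors it, mimicking MLR-like promotion and PPTn-like suppression.
    """
    flexor_cap = sigmoid(gain * actor_output)     # promotion of the flexor
    extensor_cap = sigmoid(-gain * actor_output)  # suppression of the extensor
    return flexor_cap * flexor_drive, extensor_cap * extensor_drive

# Example: an actor output of 0.5 favors the flexor over the extensor
flexor_act, extensor_act = afm_activations(0.5, flexor_drive=1.0, extensor_drive=1.0)
```

Because the two caps are mirrored sigmoids, raising one antagonist's tone necessarily lowers the other's, which is the antagonism the models are meant to capture.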


Sensors, 2019, Vol 19 (21), pp. 4794
Author(s): Alejandro Rodriguez-Ramos, Adrian Alvarez-Fernandez, Hriday Bavle, Pascual Campoy, Jonathan P. How

Deep learning and reinforcement learning techniques increasingly require large sets of real data to achieve stable convergence and generalization, in contexts such as image recognition, object detection, and motion control. The research community still lacks robust approaches for overcoming the scarcity of extensive real-world data by means of realistic synthetic information and domain-adaptation techniques. In this work, synthetic-learning strategies are used for the vision-based autonomous following of a noncooperative multirotor. The complete maneuver was learned from synthetic images and high-dimensional low-level continuous robot states, using deep learning for object detection and reinforcement learning for motion control. A novel motion-control strategy for object following is introduced in which the camera gimbal movement is coupled with the multirotor motion during following. The framework was extensively validated in both simulated and real-flight scenarios, confirming that a vision-based task trained on synthetic data can be deployed in real flight, with the system following a multirotor at up to 1.3 m/s in simulation and 0.3 m/s in real flights.
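The coupling between gimbal and body motion can be sketched with a simple proportional scheme: the gimbal keeps the detected target centered in the image, while the body yaws to unwind the gimbal angle and moves forward to close the range gap. The gains, signal names, and proportional structure are illustrative assumptions, not the learned policy from the paper.

```python
def following_commands(pixel_err_x, gimbal_angle, depth_err,
                       k_gimbal=0.002, k_yaw=1.0, k_fwd=0.5):
    """Return (gimbal_rate, yaw_rate, forward_vel) commands for target following.

    pixel_err_x : horizontal offset of the target from image center (px)
    gimbal_angle: current gimbal pan angle (rad); the body yaws to unwind it
    depth_err   : distance to target minus the desired standoff distance (m)
    """
    gimbal_rate = -k_gimbal * pixel_err_x  # re-center the target in the image
    yaw_rate = k_yaw * gimbal_angle        # rotate the body toward the gimbal
    forward_vel = k_fwd * depth_err        # close the range gap
    return gimbal_rate, yaw_rate, forward_vel

# Example: target 100 px right of center, gimbal panned 0.2 rad, 1 m too far
g_rate, y_rate, fwd = following_commands(100.0, 0.2, 1.0)
```

The key design point is that the gimbal reacts to fast image-space errors while the slower body motion follows the gimbal, so the target stays in view even during aggressive maneuvers.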

