Learning from Demonstration
Recently Published Documents

TOTAL DOCUMENTS: 236 (five years: 82)
H-INDEX: 19 (five years: 2)

2021, Vol. 33 (5), pp. 1063-1074
Author(s): Kei Kase, Noboru Matsumoto, Tetsuya Ogata, ...

Deep robotic learning from demonstration allows robots to mimic a given demonstration and generalize their performance to unknown task setups. However, this generalization ability depends heavily on the number of demonstrations, which are costly to generate manually. Without sufficient demonstrations, robots tend to overfit to the available demonstrations and lose the robustness offered by deep learning. Applying the concept of motor babbling – a process similar to that by which human infants move their bodies randomly to obtain proprioception – is also effective for enhancing robots' generalization ability, and babbling data are simpler to generate than task-oriented demonstrations. Previous studies have used motor babbling for pre-training followed by fine-tuning, but the babbling data are then overwritten by the task data. In this work, we propose an RNN-based robot-control framework that leverages targetless babbling data to help the robot acquire proprioception and that increases the generalization ability of the learned task data by learning babbling and task data simultaneously. Through simultaneous learning, our framework can use the dynamics obtained from the babbling data to learn the target task efficiently. In our experiments, we prepare demonstrations of a block-picking task together with aimless babbling data. With our framework, the robot learns tasks faster and shows greater generalization ability when blocks are at unknown positions or move during execution.
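The core idea of simultaneous learning (as opposed to pre-train/fine-tune) can be illustrated with a minimal sketch. All names here are illustrative stand-ins, not the paper's implementation; the point is that each training batch mixes task demonstrations with targetless babbling sequences, so one shared model is updated by both at every step and the babbling dynamics are never overwritten:

```python
import random

def make_mixed_batch(task_seqs, babbling_seqs, batch_size, task_ratio=0.5):
    """Sample one training batch that mixes task demonstrations with
    targetless babbling sequences, so both kinds of data update the
    shared model at every step (no pre-train / fine-tune split)."""
    n_task = int(batch_size * task_ratio)
    n_babble = batch_size - n_task
    batch = (random.sample(task_seqs, min(n_task, len(task_seqs)))
             + random.sample(babbling_seqs, min(n_babble, len(babbling_seqs))))
    random.shuffle(batch)
    return batch

# Toy data: each "sequence" is just a labeled placeholder here.
task = [("task", i) for i in range(10)]
babble = [("babble", i) for i in range(50)]
batch = make_mixed_batch(task, babble, batch_size=8)
```

In a real training loop, each element of `batch` would be a sensorimotor sequence fed to the RNN under a single prediction loss.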


2021, pp. 027836492110462
Author(s): Lin Shao, Toki Migimatsu, Qiang Zhang, Karen Yang, Jeannette Bohg

We aim to endow a robot with the ability to learn manipulation concepts that link natural language instructions to motor skills. Our goal is to learn a single multi-task policy that takes as input a natural language instruction and an image of the initial scene and outputs a robot motion trajectory to achieve the specified task. This policy has to generalize over different instructions and environments. Our insight is that we can approach this problem through learning from demonstration by leveraging large-scale video datasets of humans performing manipulation actions. Thereby, we avoid more time-consuming processes such as teleoperation or kinesthetic teaching. We also avoid having to manually design task-specific rewards. We propose a two-stage learning process where we first learn single-task policies through reinforcement learning. The reward is provided by scoring how well the robot visually appears to perform the task. This score is given by a video-based action classifier trained on a large-scale human activity dataset. In the second stage, we train a multi-task policy through imitation learning to imitate all the single-task policies. In extensive simulation experiments, we show that the multi-task policy learns to perform a large percentage of the 78 different manipulation tasks on which it was trained. The tasks are of greater variety and complexity than previously considered robot manipulation tasks. We show that the policy generalizes over variations of the environment. We also show examples of successful generalization over novel but similar instructions.
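The first-stage reward scheme can be sketched as follows. This is a hypothetical sketch, not the paper's code: `classifier_reward`, `stub_classify`, and the task names are stand-ins, with a trivial stub playing the role of the video-based action classifier trained on human activity data:

```python
def classifier_reward(frames, target_task, classify):
    """Score how well a rollout visually matches the target task.
    `classify` maps a frame sequence to {task_name: probability};
    the probability assigned to the target task is the reward."""
    scores = classify(frames)
    return scores.get(target_task, 0.0)

# Stub standing in for a video-based action classifier.
def stub_classify(frames):
    # Pretend classifier confidence grows with the fraction of
    # frames in which the gripper is in contact with an object.
    contact = sum(1 for f in frames if f.get("contact"))
    p = contact / max(len(frames), 1)
    return {"pick_up": p, "push": 1.0 - p}

rollout = [{"contact": t >= 2} for t in range(5)]  # contact from step 2 on
r = classifier_reward(rollout, "pick_up", stub_classify)  # → 0.6
```

Each single-task RL policy would be trained to maximize this score; the second stage then distills all single-task policies into one multi-task policy via imitation.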


Author(s): Mingfei Sun, Zhenhui Peng, Meng Xia, Xiaojuan Ma

Robot learning from demonstration (RLfD) is a technique for robots to derive policies from instructors' examples. Although the reciprocal effects of student engagement on teacher behavior are widely recognized in the educational community, it is unclear whether the same phenomenon holds for RLfD. To fill this gap, we first design three types of robot engagement behavior (gaze, imitation, and a hybrid of the two) based on the learning literature. We then conduct, in a simulation environment, a within-subject user study to investigate the impact of different robot engagement cues on humans compared with a "without-engagement" condition. Results suggest that engagement communication has a significantly negative influence on humans' estimation of the simulated robots' capability and significantly raises their expectations of the learning outcomes, even though we do not run actual imitation-learning algorithms in the experiments. Moreover, imitation behavior affects humans more than gaze does on all metrics, while their combination has the most profound influence. We also find that communicating engagement via imitation or the combined behavior significantly improves humans' perception of the quality of the simulated demonstrations, even though all demonstrations are of the same quality.


2021
Author(s): Yunlei Shi, Zhaopeng Chen, Yansong Wu, Dimitri Henkel, Sebastian Riedel, ...

2021, Vol. 2021, pp. 1-16
Author(s): Yichuan Zhang, Yixing Lan, Qiang Fang, Xin Xu, Junxiang Li, ...

Reinforcement learning from demonstration (RLfD) is considered a promising approach for improving reinforcement learning (RL) by leveraging expert demonstrations as additional decision-making guidance. However, most existing RLfD methods regard demonstrations only as low-level knowledge instances for a particular task. Demonstrations are generally used either to provide additional rewards or to pretrain the neural network-based RL policy in a supervised manner, usually resulting in poor generalization and weak robustness. Considering that human knowledge is not only interpretable but also well suited to generalization, we propose to exploit the potential of demonstrations by extracting knowledge from them via Bayesian networks, and we develop a novel RLfD method called Reinforcement Learning from demonstration via Bayesian Network-based Knowledge (RLBNK). The proposed RLBNK method uses the node influence with Wasserstein distance (NIW) algorithm to obtain abstract concepts from demonstrations, after which a Bayesian network performs knowledge learning and inference on the abstracted dataset, yielding a coarse policy with a corresponding confidence. When the coarse policy's confidence is low, an RL-based refinement module further optimizes and fine-tunes the policy to form a (near-)optimal hybrid policy. Experimental results show that the proposed RLBNK method improves the learning efficiency of the corresponding baseline RL algorithms under both normal and sparse reward settings. Furthermore, we demonstrate that RLBNK delivers better generalization and robustness than the baseline methods.
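The confidence-gated hand-off between the coarse Bayesian-network policy and the RL refinement module can be sketched in a few lines. The names and toy policies below are illustrative assumptions, not RLBNK's actual components:

```python
def hybrid_policy(state, bn_policy, rl_policy, threshold=0.8):
    """Confidence-gated action selection (illustrative sketch):
    use the Bayesian-network coarse policy when its confidence is
    high enough; otherwise defer to the RL-refined policy."""
    action, confidence = bn_policy(state)
    if confidence >= threshold:
        return action, "bn"
    return rl_policy(state), "rl"

# Toy policies standing in for the learned components.
def toy_bn(state):
    # High confidence on states resembling the demonstrations.
    return ("left", 0.9) if state == "seen" else ("left", 0.3)

def toy_rl(state):
    return "right"

a1 = hybrid_policy("seen", toy_bn, toy_rl)   # → ("left", "bn")
a2 = hybrid_policy("novel", toy_bn, toy_rl)  # → ("right", "rl")
```

The design choice is that interpretable, abstracted knowledge covers familiar situations, while the RL module handles the states the demonstrations explain poorly.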


2021, pp. 027836492110405
Author(s): Emmanuel Pignat, João Silvério, Sylvain Calinon

Probability distributions are key components of many learning from demonstration (LfD) approaches, with the spaces chosen to represent tasks playing a central role. Although the robot configuration is defined by its joint angles, end-effector poses are often best explained within several task spaces. In many approaches, distributions within relevant task spaces are learned independently and only combined at the control level. This simplification implies several problems that are addressed in this work. We show that the fusion of models in different task spaces can be expressed as a product of experts (PoE), where the probabilities of the models are multiplied and renormalized so that the result becomes a proper distribution over joint angles. Multiple experiments show that learning the different models jointly in the PoE framework significantly improves the quality of the final model. The proposed approach particularly stands out when the robot has to learn hierarchical objectives that arise when a task requires the prioritization of several sub-tasks (e.g. in a humanoid robot, keeping balance has a higher priority than reaching for an object). Since training the model jointly usually relies on contrastive divergence, which requires costly approximations that can affect performance, we propose an alternative strategy using variational inference and mixture model approximations. In particular, we show that the proposed approach can be extended to a PoE with a nullspace structure (PoENS), where the model is able to recover secondary tasks that are masked by the resolution of tasks of higher importance.
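The product-of-experts fusion has a simple closed form in the Gaussian case, which gives the flavor of the approach. This is only the 1-D Gaussian identity (precisions add; the fused mean is the precision-weighted average), not the paper's full joint-angle model with renormalization:

```python
def poe_gaussian(experts):
    """Fuse 1-D Gaussian experts (mean, variance) as a product of
    experts: precisions add, and the fused mean is the
    precision-weighted average of the experts' means."""
    precision = sum(1.0 / var for _, var in experts)
    mean = sum(mu / var for mu, var in experts) / precision
    return mean, 1.0 / precision

# Two equally confident task-space "experts" over the same variable:
fused_mean, fused_var = poe_gaussian([(0.0, 1.0), (2.0, 1.0)])
# → fused_mean = 1.0, fused_var = 0.5
```

Note that the fused variance is smaller than either expert's: agreement between task-space models sharpens the joint-angle distribution, which is exactly why fusing before, rather than after, learning pays off.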

