Robot arm reaching through neural inversions and reinforcement learning

2000 ◽  
Vol 31 (4) ◽  
pp. 227-246 ◽  
Author(s):  
Pedro Martı́n ◽  
José del R. Millán

2021 ◽  
pp. 1-1 ◽  
Author(s):  
Reshma Kar ◽  
Lidia Ghosh ◽  
Amit Konar ◽  
Aruna Chakraborty ◽  
Atulya K. Nagar

2013 ◽  
Vol 25 (12) ◽  
pp. 3263-3293 ◽  
Author(s):  
Samuel A. Neymotin ◽  
George L. Chadderdon ◽  
Cliff C. Kerr ◽  
Joseph T. Francis ◽  
William W. Lytton

Neocortical mechanisms of learning sensorimotor control involve a complex series of interactions at multiple levels, from synaptic mechanisms to cellular dynamics to network connectomics. We developed a model of sensory and motor neocortex consisting of 704 spiking model neurons. Sensory and motor populations included excitatory cells and two types of interneurons. Neurons were interconnected with AMPA/NMDA and GABA-A synapses. We trained our model using spike-timing-dependent reinforcement learning to control a two-joint virtual arm to reach to a fixed target. For each of 125 trained networks, we used 200 training sessions, each involving 15 s reaches to the target from 16 starting positions. Learning altered network dynamics, with enhancements to neuronal synchrony and behaviorally relevant information flow between neurons. After learning, networks demonstrated retention of behaviorally relevant memories by using proprioceptive information to perform reach-to-target from multiple starting positions. Networks dynamically controlled which joint rotations to use to reach a target, depending on current arm position. Learning-dependent network reorganization was evident in both sensory and motor populations: learned synaptic weights showed target-specific patterning optimized for particular reach movements. Our model embodies an integrative hypothesis of sensorimotor cortical learning that could be used to interpret future electrophysiological data recorded in vivo from sensorimotor learning experiments. We used our model to make the following predictions: learning enhances synchrony within neuronal populations and behaviorally relevant information flow across them; enhanced sensory processing aids task-relevant motor performance; and the relative ease of a particular movement in vivo depends on the amount of sensory information required to complete it.
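
For readers who want a concrete picture of the learning rule named in this abstract, the following is a minimal sketch of reward-modulated spike-timing-dependent plasticity (an eligibility trace built from STDP pairings, gated by a global reward signal). The population sizes, time constants, and the stand-in arm/reward abstraction are illustrative assumptions, not the authors' published parameters or network architecture.

import numpy as np

# Sketch of spike-timing-dependent reinforcement learning: STDP pairings
# accumulate in an eligibility trace; a global reward signal converts the
# trace into actual weight changes. All constants below are assumptions.

rng = np.random.default_rng(0)

N_PRE, N_POST = 64, 16           # pre/postsynaptic population sizes (assumed)
TAU = 50.0                       # trace decay constant, ms (assumed)
A_PLUS, A_MINUS = 1.0, 1.0       # STDP pairing amplitudes (assumed)
LR = 0.01                        # learning rate (assumed)

w = rng.uniform(0.0, 0.5, size=(N_PRE, N_POST))   # synaptic weights
elig = np.zeros_like(w)                            # eligibility traces
trace_pre = np.zeros(N_PRE)                        # presynaptic spike traces
trace_post = np.zeros(N_POST)                      # postsynaptic spike traces

def step(spikes_pre, spikes_post, reward, dt=1.0):
    """One millisecond of reward-modulated STDP."""
    global elig, trace_pre, trace_post, w
    # Exponentially decaying spike traces implement the STDP pairing window.
    trace_pre = trace_pre * np.exp(-dt / TAU) + spikes_pre
    trace_post = trace_post * np.exp(-dt / TAU) + spikes_post
    # Pre-before-post pairings potentiate, post-before-pre pairings depress;
    # the result is stored in an eligibility trace, not applied immediately.
    elig = elig * np.exp(-dt / TAU)
    elig += A_PLUS * np.outer(trace_pre, spikes_post)
    elig -= A_MINUS * np.outer(spikes_pre, trace_post)
    # A global reward (e.g., reduction in hand-to-target distance) gates the
    # eligibility trace into a weight change.
    w = np.clip(w + LR * reward * elig, 0.0, 1.0)

# Toy usage: random spiking, reward = progress toward the target.
dist = 1.0
for t in range(1000):
    pre = (rng.random(N_PRE) < 0.05).astype(float)
    post = (rng.random(N_POST) < 0.05).astype(float)
    new_dist = max(0.0, dist - 0.001 * post.sum())  # stand-in arm kinematics
    step(pre, post, reward=dist - new_dist)
    dist = new_dist

The design point the sketch illustrates is the separation of timescales: STDP pairings mark which synapses were recently causally active, while the slower reward signal decides whether those markings become lasting changes.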


2012 ◽  
Vol 13 (S1) ◽  
Author(s):  
Samuel A. Neymotin ◽  
George L. Chadderdon ◽  
Cliff C. Kerr ◽  
Joseph T. Francis ◽  
William W. Lytton

2020 ◽  
Vol 34 (04) ◽  
pp. 5717-5725 ◽  
Author(s):  
Craig Sherstan ◽  
Shibhansh Dohare ◽  
James MacGlashan ◽  
Johannes Günther ◽  
Patrick M. Pilarski

Temporal abstraction is a key requirement for agents making decisions over long time horizons, a fundamental challenge in reinforcement learning. There are many reasons why value estimates at multiple timescales might be useful: recent work has shown that value estimates at different timescales can serve as the basis for more advanced discounting functions and for driving representation learning. Further, predictions at many different timescales serve to broaden an agent's model of its environment. One predictive approach of interest in the online learning setting is general value functions (GVFs), which model an agent's world as a collection of predictive questions, each defined by a policy, a signal to be predicted, and a prediction timescale. In this paper we present Γ-nets, a method for generalizing value function estimation over timescale, allowing a given GVF to be trained and queried at arbitrary timescales and thereby greatly increasing the predictive ability and scalability of a GVF-based model. The key to our approach is to use timescale as one of the value estimator's inputs. As a result, the prediction target for any timescale is available at every timestep, and we are free to train on any number of timescales. We first provide two demonstrations: 1) predicting a square wave and 2) predicting sensorimotor signals on a robot arm using a linear function approximator. Next, we empirically evaluate Γ-nets in the deep reinforcement learning setting, using policy evaluation on a set of Atari video games. Our results show that Γ-nets can effectively predict arbitrary timescales, with only a small cost in accuracy compared to learning estimators for fixed timescales. Γ-nets provide a method for accurately and compactly making predictions at many timescales without requiring a priori knowledge of the task, making them a valuable contribution to ongoing work on model-based planning, representation learning, and lifelong learning algorithms.
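
The central mechanism in this abstract, feeding the timescale γ into the value estimator so that a TD target exists for every γ at every timestep, is easy to sketch. Below is a minimal illustration with a linear function approximator predicting the abstract's square-wave signal; the feature encoding, step size, and set of training timescales are illustrative assumptions, not the paper's exact setup.

import numpy as np

# Sketch of the Γ-net idea: one estimator takes gamma as an extra input, so
# TD(0) targets for any number of timescales can be formed at every step.

rng = np.random.default_rng(0)

def features(phase, gamma):
    """Joint encoding of state (phase of a square wave) and timescale."""
    s = np.array([np.sin(phase), np.cos(phase), 1.0])
    g = np.array([gamma, gamma**2, 1.0])
    return np.outer(s, g).ravel()   # simple bilinear state-times-gamma code

w = np.zeros(9)                           # linear value-estimator weights
ALPHA = 0.05                              # step size (assumed)
GAMMAS = [0.5, 0.8, 0.9, 0.95, 0.99]      # timescales trained simultaneously

phase = 0.0
for t in range(20000):
    next_phase = phase + 0.1
    cumulant = 1.0 if np.sin(next_phase) > 0 else 0.0   # square-wave signal
    # One TD(0) update per timescale, all sharing the same estimator: the
    # target cumulant + gamma * V(s', gamma) is available for every gamma.
    for gamma in GAMMAS:
        x, x_next = features(phase, gamma), features(next_phase, gamma)
        target = cumulant + gamma * np.dot(w, x_next)
        w += ALPHA * (target - np.dot(w, x)) * x
    phase = next_phase

# The estimator can then be queried at timescales it was never trained on:
print(np.dot(w, features(0.3, 0.85)))

Note that a single weight vector serves all timescales; whether querying an untrained γ (as in the final line) yields an accurate prediction depends on how well the chosen features generalize over γ, which is the generalization question the paper evaluates.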

