Reinforcement meta-learning optimizes visuomotor learning

Author(s):  
Taisei Sugiyama ◽  
Nicolas Schweighofer ◽  
Jun Izawa

Abstract
Reinforcement learning enables the brain to learn optimal action selection, such as go or no-go, by forming state-action and action-outcome associations. Does this mechanism also optimize the brain's willingness to learn, such as whether to learn or not to learn? Learning to learn from rewards, i.e., reinforcement meta-learning, is a crucial mechanism by which machines develop flexibility in learning; it has also been proposed to operate in the brain, but without empirical examination. Here, we show that humans learn to learn, or not to learn, so as to maximize rewards in visuomotor learning tasks. We also show that this regulation of learning is not a motivational bias but the result of an instrumental, active process that takes the learning-outcome structure into account. Our results thus demonstrate the existence of reinforcement meta-learning in the human brain. Because motor learning is a process of minimizing sensory errors, our findings uncover an essential mechanism of the interaction between reward and error.
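The mechanism proposed here, adjusting one's own learning in proportion to how well learning pays off, can be made concrete with a toy simulation. The sketch below is purely illustrative and is not the authors' model: a motor learner corrects errors with a gain (its learning rate) that is itself tuned by a REINFORCE-style meta-update, so the gain is maintained only when reward is contingent on performance.

```python
import numpy as np

def visuomotor_meta_learning(n_trials=200, reward_tracks_error=True, seed=0):
    """Toy reinforcement meta-learner: the learning rate of an
    error-based motor update is itself adapted to maximize reward."""
    rng = np.random.default_rng(seed)
    perturbation = 1.0   # visuomotor rotation the learner must compensate
    motor_output = 0.0   # current compensation
    lr, meta_lr = 0.1, 0.5
    baseline = 0.0       # running reward baseline for the meta-update
    for _ in range(n_trials):
        eps = rng.normal(0.0, 0.02)                # explore learning-rate space
        lr_t = float(np.clip(lr + eps, 0.0, 1.0))  # perturbed gain this trial
        error = perturbation - motor_output + rng.normal(0.0, 0.05)
        motor_output += lr_t * error               # ordinary error-based update
        post_error = perturbation - motor_output   # outcome after the update
        # Reward either tracks performance or is random (a control condition).
        reward = -abs(post_error) if reward_tracks_error else rng.normal()
        # Meta-update: reinforce gain perturbations that raised reward.
        lr = float(np.clip(lr + meta_lr * (reward - baseline) * eps, 0.0, 1.0))
        baseline += 0.1 * (reward - baseline)
    return lr

# With performance-contingent reward the gain stays high; with random
# reward there is no incentive to keep learning and the gain drifts.
```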

Author(s):  
Hadi S. Jomaa ◽  
Lars Schmidt-Thieme ◽  
Josif Grabocka

Abstract
Meta-learning, or learning to learn, is a machine learning approach that utilizes prior learning experiences to expedite the learning process on unseen tasks. As a data-driven approach, meta-learning requires meta-features that represent the primary learning tasks or datasets; traditionally, these are estimated as engineered dataset statistics that require expert domain knowledge tailored to every meta-task. In this paper, we first propose a meta-feature extractor called Dataset2Vec that combines the versatility of engineered dataset meta-features with the expressivity of meta-features learned by deep neural networks. Primary learning tasks or datasets are represented as hierarchical sets, i.e., as a set of sets, specifically as a set of predictor/target pairs, and a DeepSet architecture is then employed to regress meta-features on them. Second, we propose a novel auxiliary meta-learning task with abundant data, called dataset similarity learning, that aims to predict whether two batches stem from the same dataset or from different ones. In an experiment on a large-scale hyperparameter optimization task over 120 UCI datasets with varying schemas, used as a meta-learning task, we show that the meta-features of Dataset2Vec outperform expert-engineered meta-features, demonstrating the usefulness of learned meta-features for datasets with varying schemas for the first time.
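The abstract's two ingredients can be sketched compactly. The PyTorch code below is a schematic assumption, not the published architecture: a DeepSet-style encoder that pools a batch of scalar (predictor value, target value) pairs into a single meta-feature vector regardless of schema, plus the batch-similarity auxiliary loss. The real Dataset2Vec uses a richer set-of-sets hierarchy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaFeatureExtractor(nn.Module):
    """DeepSet-style sketch: pool a batch of scalar (predictor value,
    target value) pairs into one permutation-invariant meta-feature
    vector, independent of the dataset's schema."""
    def __init__(self, hidden=64, out_dim=32):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, pairs):                 # pairs: (n_pairs, 2)
        return self.rho(self.phi(pairs).mean(dim=0))

def similarity_loss(extractor, batch_a, batch_b, same: bool):
    """Auxiliary task from the abstract: predict whether two batches
    stem from the same dataset, from the distance of their embeddings."""
    za, zb = extractor(batch_a), extractor(batch_b)
    p_same = torch.exp(-torch.norm(za - zb))  # similarity score in (0, 1]
    target = torch.tensor(1.0 if same else 0.0)
    return F.binary_cross_entropy(p_same, target)

# e.g. similarity_loss(MetaFeatureExtractor(), torch.randn(32, 2),
#                      torch.randn(32, 2), same=False)
```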


2020 ◽  
Author(s):  
Aman Gupta ◽  
Yadul Raghav

Meta-learning, the ability to learn to learn, helps train a model to learn very quickly on a variety of learning tasks; adapting to a new environment with a minimal number of examples speeds up both the performance and the training of the model. It addresses a limitation of the traditional machine learning paradigm, in which a vast dataset is needed to train a model from scratch on any single task. Much work has already been done on meta-learning in various learning settings, including reinforcement learning, regression, and image classification, but it has yet to be explored in the time-series domain. In this work, we aim to understand the effectiveness of meta-learning algorithms for time-series classification on multivariate time-series datasets. We present the algorithms' performance on the time-series archive, where the results show that meta-learning algorithms lead to faster convergence with fewer iterations than their non-meta-learning equivalents.
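The abstract does not name the specific algorithms used; MAML is a common choice in this setting. As a hedged illustration only, here is a minimal first-order MAML (FOMAML) meta-update, which adapts a copy of the model on each task's support set and averages the post-adaptation query gradients; the model, loss, and task data are left abstract.

```python
import copy
import torch

def fomaml_step(model, loss_fn, tasks, inner_lr=0.01, outer_lr=0.001,
                inner_steps=1):
    """One first-order MAML (FOMAML) meta-update. `tasks` is a list of
    (support_x, support_y, query_x, query_y) tensor tuples."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for support_x, support_y, query_x, query_y in tasks:
        fast = copy.deepcopy(model)                     # per-task fast weights
        opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                    # inner-loop adaptation
            opt.zero_grad()
            loss_fn(fast(support_x), support_y).backward()
            opt.step()
        fast.zero_grad()                                # query loss after adapting
        loss_fn(fast(query_x), query_y).backward()
        for g, p in zip(meta_grads, fast.parameters()):
            g += p.grad / len(tasks)                    # first-order approximation
    with torch.no_grad():                               # apply the meta-gradient
        for p, g in zip(model.parameters(), meta_grads):
            p -= outer_lr * g
```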


Entropy ◽  
2021 ◽  
Vol 23 (1) ◽  
pp. 126
Author(s):  
Sharu Theresa Jose ◽  
Osvaldo Simeone

Meta-learning, or “learning to learn”, refers to techniques that infer an inductive bias from data corresponding to multiple related tasks, with the goal of improving sample efficiency on new, previously unobserved tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average loss measured on the meta-training data and that on a new, randomly selected task. This paper presents novel information-theoretic upper bounds on the meta-generalization gap. Two broad classes of meta-learning algorithms are considered, which use either separate within-task training and test sets, like model-agnostic meta-learning (MAML), or joint within-task training and test sets, like Reptile. Extending existing work on conventional learning, an upper bound on the meta-generalization gap is derived for the former class that depends on the mutual information (MI) between the output of the meta-learning algorithm and its input meta-training data. For the latter, the derived bound includes an additional MI between the output of the per-task learning procedure and the corresponding dataset, to capture within-task uncertainty. Tighter bounds are then developed for the two classes via novel individual-task MI (ITMI) bounds. Applications of the derived bounds are finally discussed, including to a broad class of noisy iterative algorithms for meta-learning.
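To convey the flavor of such MI-based bounds (the paper's precise statements, assumptions, and constants differ), the generic Xu-Raginsky-style bound that this line of analysis extends can be written as follows.

```latex
% Illustrative only: the conventional-learning MI bound that the paper's
% meta-level analysis generalizes; exact conditions differ in the paper.
% For a sigma-sub-Gaussian loss, a meta-learner with output U (the
% inferred inductive bias) trained on data Z_{1:N} from N tasks satisfies
\[
  \bigl|\mathbb{E}[\text{meta-generalization gap}]\bigr|
  \;\le\;
  \sqrt{\frac{2\sigma^{2}}{N}\, I\!\bigl(U;\, Z_{1:N}\bigr)},
\]
% so the less the algorithm's output depends on its meta-training data
% (smaller mutual information), the smaller the gap.
```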


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Yihui Quek ◽  
Stanislav Fort ◽  
Hui Khoon Ng

Abstract
Current algorithms for quantum state tomography (QST) are costly both on the experimental front, requiring measurement of many copies of the state, and on the classical computational front, needing a long time to analyze the gathered data. Here, we introduce neural adaptive quantum state tomography (NAQT), a fast, flexible machine-learning-based algorithm for QST that adapts measurements and provides orders-of-magnitude faster processing while retaining state-of-the-art reconstruction accuracy. As in other adaptive QST schemes, measurement adaptation uses the information gathered from previously measured copies of the state to perform a targeted sensing of the next copy, maximizing the information gathered from that copy. Our NAQT approach allows for a rapid and seamless integration of measurement adaptation and statistical inference, using a neural-network replacement of the standard Bayes' update to obtain the best estimate of the state. Our algorithm, which falls into the machine learning subfield of “meta-learning” (in effect, “learning to learn” about quantum states), does not require any ansatz about the form of the state to be estimated. Despite this generality, it can be retrained within hours on a single laptop for a two-qubit situation, which suggests a feasible time cost when extended to larger systems, and potential speed-ups if provided with additional structure, such as a state ansatz.
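The adaptive measure-then-update loop at the heart of such schemes can be sketched in a single-qubit toy. Everything below is an illustrative assumption, not NAQT itself: the measurement axes are fixed to x/y/z, and a running frequency estimate stands in for the trained neural network that NAQT substitutes for the Bayes' update.

```python
import numpy as np

def measure(bloch, axis, rng):
    """Simulate a projective measurement of a qubit with Bloch vector
    `bloch` along unit vector `axis`; returns an outcome of +1 or -1."""
    p_up = 0.5 * (1.0 + float(bloch @ axis))
    return 1 if rng.random() < p_up else -1

def adaptive_tomography(true_bloch, n_copies=3000, seed=0):
    """Toy single-qubit adaptive QST loop: each copy is measured along
    the axis whose estimate is currently least certain, and the estimate
    is refreshed after every outcome."""
    rng = np.random.default_rng(seed)
    axes = np.eye(3)                    # measure along x, y, or z
    sums = np.zeros(3)                  # signed outcome totals per axis
    trials = np.zeros(3)
    est = np.zeros(3)                   # current Bloch-vector estimate
    for _ in range(n_copies):
        # Adaptation: probe the axis with the largest estimator variance.
        uncertainty = (1.0 - est**2) / (trials + 1.0)
        k = int(np.argmax(uncertainty))
        sums[k] += measure(true_bloch, axes[k], rng)
        trials[k] += 1
        est = sums / np.maximum(trials, 1.0)  # stand-in for the neural update
    return est

truth = 0.9 * np.array([0.6, 0.0, 0.8])
print(adaptive_tomography(truth))  # approaches `truth` as copies accumulate
```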


Author(s):  
Lin Lan ◽  
Zhenguo Li ◽  
Xiaohong Guan ◽  
Pinghui Wang

Despite significant progress, deep reinforcement learning (RL) suffers from data inefficiency and limited generalization. Recent efforts apply meta-learning to learn a meta-learner from a set of RL tasks such that a novel but related task can be solved quickly. Though different meta-RL tasks are specific in some ways, they are generally similar at a high level. However, most meta-RL methods do not explicitly and adequately model the specific and shared information among different tasks, which limits their ability both to learn the training tasks and to generalize to novel tasks. In this paper, we propose to capture the shared information on the one hand and to meta-learn how to quickly abstract the task-specific information on the other. Methodologically, we train an SGD meta-learner to quickly optimize a task encoder for each task, which generates a task embedding based on past experience. Meanwhile, we learn a policy that is shared across all tasks and conditioned on the task embeddings. Empirical results on four simulated tasks demonstrate that our method has better learning capacity on both training and novel tasks and attains up to 3 to 4 times higher returns than the baselines.
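A minimal sketch of this decomposition, shared policy weights plus a small, quickly optimized per-task representation, might look as follows in PyTorch. The sizes, the loss interface, and direct SGD on the embedding (collapsing the paper's meta-learned task encoder into its simplest gradient-based variant) are all simplifying assumptions.

```python
import torch
import torch.nn as nn

class TaskConditionedPolicy(nn.Module):
    """Policy shared across tasks, conditioned on a per-task embedding."""
    def __init__(self, obs_dim=8, act_dim=2, emb_dim=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + emb_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, act_dim))

    def forward(self, obs, task_emb):
        emb = task_emb.expand(obs.shape[0], -1)  # broadcast over the batch
        return self.net(torch.cat([obs, emb], dim=-1))

def adapt_task_embedding(policy, task_loss_fn, emb_dim=4, steps=5, lr=0.1):
    """Fast per-task adaptation: only the small embedding is optimized by
    SGD from the task's past experience; the policy weights stay shared."""
    emb = torch.zeros(1, emb_dim, requires_grad=True)
    opt = torch.optim.SGD([emb], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        task_loss_fn(policy, emb).backward()  # loss built from past experience
        opt.step()
    return emb.detach()
```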


2021 ◽  
Author(s):  
Francesco Poli ◽  
Tommaso Ghilardi ◽  
Rogier B. Mars ◽  
Max Hinne ◽  
Sabine Hunnius

Infants learn to navigate the complexity of the physical and social world at an outstanding pace, but how they accomplish this learning is still unknown. Recent advances in human and artificial intelligence research propose that a key feature to achieve quick and efficient learning is meta-learning, the ability to make use of prior experiences to optimize how future information is acquired. Here we show that 8-month-old infants successfully engage in meta-learning within very short timespans. We developed a Bayesian model that captures how infants attribute informativity to incoming events, and how this process is optimized by the meta-parameters of their hierarchical models over the task structure. We fitted the model using infants’ gaze behaviour during a learning task. Our results reveal that infants do not simply accumulate experiences, but actively use them to generate new inductive biases that allow learning to proceed faster in the future.
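The core idea, that experience in earlier learning episodes tunes how fast later episodes are learned, can be illustrated with a toy two-level learner. This sketch is purely illustrative and is not the authors' hierarchical Bayesian model: within each sequence the agent updates event predictions, and across sequences a meta-parameter (its learning rate) is adjusted according to how predictable the world turned out to be.

```python
import numpy as np

def surprise_driven_meta_learning(sequences, base_lr=0.3, meta_lr=0.05):
    """Toy meta-learning across learning episodes: low average surprise
    in one episode raises the learning rate used in the next, so learning
    proceeds faster in structured environments."""
    lr = base_lr
    for seq in sequences:                # each seq is a list of 0/1 events
        p = 0.5                          # predicted probability of event "1"
        total_surprise = 0.0
        for event in seq:
            total_surprise += -np.log(p if event == 1 else 1.0 - p)
            p += lr * (event - p)        # within-episode (object-level) update
        # Meta-level update: log(2) is the surprise of a 50/50 guess, so a
        # lower mean surprise means the environment rewards fast learning.
        mean_surprise = total_surprise / len(seq)
        lr = float(np.clip(lr + meta_lr * (np.log(2) - mean_surprise),
                           0.01, 0.95))
    return lr

# A predictable world (mostly 1s) drives the learning rate up over episodes.
print(surprise_driven_meta_learning([[1] * 20] * 5))
```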


2018 ◽  
Author(s):  
Anna C Sales ◽  
Karl J. Friston ◽  
Matthew W. Jones ◽  
Anthony E. Pickering ◽  
Rosalyn J. Moran

Abstract
The locus coeruleus (LC) in the pons is the major source of noradrenaline (NA) in the brain. Two modes of LC firing have been associated with distinct cognitive states: changes in tonic rates of firing are correlated with global levels of arousal and behavioural flexibility, whilst phasic LC responses are evoked by salient stimuli. Here, we unify these two modes of firing by modelling the response of the LC as a correlate of a prediction error when inferring states for action planning under Active Inference (AI).

We simulate a classic Go/No-go reward learning task and a three-arm foraging task and show that, if LC activity is considered to reflect the magnitude of high-level 'state-action' prediction errors, then both tonic and phasic modes of firing are emergent features of belief updating. We also demonstrate that when contingencies change, AI agents can update their internal models more quickly by feeding back this state-action prediction error – reflected in LC firing and noradrenaline release – to optimise learning rate, enabling large adjustments over short timescales. We propose that such prediction errors are mediated by cortico-LC connections, whilst ascending input from LC to cortex modulates belief updating in the anterior cingulate cortex (ACC).

In short, we characterise the LC/NA system within a general theory of brain function. In doing so, we show that contrasting, behaviour-dependent firing patterns are an emergent property of the LC's crucial role in translating prediction errors into an optimal mediation between plasticity and stability.

Author Summary
The brain uses sensory information to build internal models and make predictions about the world. When errors of prediction occur, models must be updated to ensure desired outcomes are still achieved. Neuromodulator chemicals provide a possible pathway for triggering such changes in brain state. One such neuromodulator, noradrenaline, originates predominantly from a cluster of neurons in the brainstem – the locus coeruleus (LC) – and plays a key role in behaviour, for instance in determining the balance between exploiting and exploring the environment.

Here we use Active Inference (AI), a mathematical model of perception and action, to formally describe LC function. We propose that LC activity is triggered by errors in prediction and that the subsequent release of noradrenaline alters the rate of learning about the environment. Biologically, this describes an LC-cortex feedback loop promoting behavioural flexibility in times of uncertainty. We model LC output as a simulated animal performs two tasks known to elicit archetypal responses. We find that experimentally observed 'phasic' and 'tonic' patterns of LC activity emerge naturally, and that modulation of learning rates improves task performance. This provides a simple, unified computational account of noradrenergic function within a general model of behaviour.
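The proposed feedback loop, a prediction error whose magnitude transiently boosts the learning rate, can be shown in a toy reward-learning simulation. This is an illustrative reduction, not the paper's Active Inference model: the absolute prediction error stands in for phasic LC firing, and it additively raises the gain on belief updating.

```python
import numpy as np

def lc_modulated_learning(outcomes, base_lr=0.2, gain=0.5):
    """Toy LC/NA loop: the magnitude of each reward prediction error
    (a stand-in for simulated LC firing) transiently raises the learning
    rate, so beliefs update fastest right after surprises such as
    contingency reversals."""
    belief = 0.5                      # estimated probability of reward
    lc_trace, beliefs = [], []
    for r in outcomes:                # r in {0.0, 1.0}
        pe = r - belief               # reward prediction error
        lc = abs(pe)                  # 'phasic' LC proxy: big error, big burst
        lr = base_lr + gain * lc      # noradrenaline release raises plasticity
        belief += lr * pe
        lc_trace.append(lc)
        beliefs.append(belief)
    return np.array(beliefs), np.array(lc_trace)

# Example: a contingency reversal at trial 50 produces a burst of simulated
# LC activity followed by rapid re-learning of the reward estimate.
rewards = np.r_[np.ones(50), np.zeros(50)]
beliefs, lc = lc_modulated_learning(rewards.astype(float))
```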

