Synthesising Reinforcement Learning Policies Through Set-Valued Inductive Rule Learning

Language statistical learning responds to reinforcement learning principles rooted in the striatum

PLoS Biology, 2021, Vol 19 (9), pp. e3001119. DOI: 10.1371/journal.pbio.3001119

Author(s): Joan Orpella, Ernest Mas-Herrero, Pablo Ripollés, Josep Marco-Pallarés, Ruth de Diego-Balaguer

Keyword(s): Reinforcement Learning, Language Learning, Statistical Learning, Dorsal Striatum, Rule Learning, Prediction Errors, Neural Basis, Structural Rules, Learning Principles, Striatal Function

Statistical learning (SL) is the ability to extract regularities from the environment. In the domain of language, this ability is fundamental to learning words and structural rules. In the absence of reliable online measures, statistical word and rule learning have primarily been investigated using offline (post-familiarization) tests, which give limited insight into the dynamics of SL and its neural basis. Here, we capitalize on a novel task that tracks the online SL of simple syntactic structures, combined with computational modeling, to show that online SL responds to reinforcement learning principles rooted in striatal function. Specifically, we demonstrate, in 2 different cohorts, that a temporal difference model, which relies on prediction errors, accounts for participants’ online learning behavior. We then show that the trial-by-trial development of predictions through learning strongly correlates with activity in both the ventral and dorsal striatum. Our results thus provide a detailed mechanistic account of language-related SL and an explanation for the oft-cited involvement of the striatum in SL tasks. This work therefore bridges the long-standing gap between language learning and reinforcement learning phenomena.
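The abstract attributes online learning to a temporal difference model driven by prediction errors. As a rough, hypothetical illustration of that family of models (not the authors' fitted model), the sketch below implements tabular TD(0) prediction on a toy sequence in which a "cue" state precedes a rewarded "target" state. The state names, the learning rate `alpha`, and the discount `gamma` are assumptions chosen only for illustration.

```python
from typing import Dict, List, Tuple

# One episode: a list of (state, reward received on leaving that state).
Episode = List[Tuple[str, float]]


def td0_learn(episodes: List[Episode], alpha: float = 0.1,
              gamma: float = 1.0) -> Tuple[Dict[str, float], List[float]]:
    """Tabular TD(0) prediction (illustrative sketch).

    Returns the learned state values and every prediction error (delta),
    in the order in which they were computed.
    """
    values: Dict[str, float] = {}
    deltas: List[float] = []
    for episode in episodes:
        for t, (state, reward) in enumerate(episode):
            v = values.get(state, 0.0)
            # Value of the successor state; 0 after the last state.
            if t + 1 < len(episode):
                v_next = values.get(episode[t + 1][0], 0.0)
            else:
                v_next = 0.0
            delta = reward + gamma * v_next - v  # prediction error
            values[state] = v + alpha * delta     # move prediction toward target
            deltas.append(delta)
    return values, deltas


if __name__ == "__main__":
    vals, errs = td0_learn([[("cue", 0.0), ("target", 1.0)]] * 200)
    print(vals, errs[1], errs[-1])
```

Over repeated trials the prediction error at the target shrinks toward zero while the value of the preceding cue grows toward the reward, which is the kind of trial-by-trial prediction signal the abstract relates to striatal activity.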

Using Inductive Rule Learning Techniques to Learn Planning Domains

Communications in Computer and Information Science - Information Processing and Management of Uncertainty in Knowledge-Based Systems. Applications, 2018, pp. 642-656. DOI: 10.1007/978-3-319-91479-4_53

Author(s): José Á. Segura-Muros, Raúl Pérez, Juan Fernández-Olivares

Keyword(s): Rule Learning, Learning Techniques, Inductive Rule Learning

On-The-Fly Syntheziser Programming with Fuzzy Rule Learning

Entropy, 2020, Vol 22 (9), pp. 969. DOI: 10.3390/e22090969

Author(s): Iván Paz, Àngela Nebot, Francisco Mugica, Enrique Romero

Keyword(s): Real Time, Cross Validation, State Of The Art, Rule Learning, Fuzzy Rule, Feature Space, Maximum Volume, Time Variations, Fuzzy Rule Learning, Inductive Rule Learning

This manuscript explores fuzzy rule learning for sound synthesizer programming within the performative practice known as live coding. In this practice, sound synthesis algorithms are programmed in real time by means of source code. To facilitate this, one possibility is to automatically create variations from a few synthesizer presets. However, the need for real-time feedback makes existing synthesizer programmers infeasible to use. In addition, presets are sometimes created mid-performance, so no benchmarks exist for them. Inductive rule learning has been shown to be effective for creating real-time variations in such a scenario. However, logical IF-THEN rules do not cover the whole feature space. Here, we present an algorithm that extends IF-THEN rules to hyperrectangles, which are used as the cores of membership functions to create a map of the input space. To generalize the rules, contradictions are resolved by a maximum-volume heuristic. The user controls the balance between novelty and consistency with respect to the input data through the algorithm's parameters. The algorithm was evaluated in live performances and by cross-validation using extrinsic benchmarks and a dataset collected during user tests. The model achieves state-of-the-art accuracy. This, together with the positive feedback received from live coders who tested our methodology, suggests that this is a promising approach.
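The abstract describes IF-THEN rules extended to hyperrectangles that act as the cores of membership functions. The following is a minimal sketch under stated assumptions, not the paper's algorithm: each rule's core is an axis-aligned box, membership decays linearly outside the box over a hypothetical `margin`, per-feature degrees are combined with the minimum t-norm, and `volume()` returns the core volume that a maximum-volume heuristic would compare. `HyperRule` and `classify` are illustrative names.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class HyperRule:
    """A rule whose antecedent core is an axis-aligned hyperrectangle.

    bounds[i] = (low, high) for feature i. `margin` (an assumption) is the
    distance over which membership falls linearly from 1 to 0 outside the core.
    """
    bounds: Sequence[Tuple[float, float]]
    label: str
    margin: float = 1.0

    def membership(self, x: Sequence[float]) -> float:
        degrees = []
        for xi, (lo, hi) in zip(x, self.bounds):
            if lo <= xi <= hi:
                degrees.append(1.0)  # inside the core: full membership
            else:
                dist = lo - xi if xi < lo else xi - hi
                degrees.append(max(0.0, 1.0 - dist / self.margin))
        return min(degrees)  # minimum t-norm across features

    def volume(self) -> float:
        """Volume of the core, the quantity a maximum-volume heuristic compares."""
        v = 1.0
        for lo, hi in self.bounds:
            v *= hi - lo
        return v


def classify(rules: Sequence[HyperRule], x: Sequence[float]) -> str:
    # Return the label of the rule with the highest membership degree for x
    # (ties, including all-zero memberships, go to the first such rule).
    return max(rules, key=lambda r: r.membership(x)).label
```

Points inside a core get membership 1, so the rule base behaves like the original crisp rules there, while points within `margin` of a core receive a graded degree. Points farther than `margin` from every core get 0 from all rules and `classify` falls back to the first rule, a case a real system would have to handle explicitly.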

Minimalistic Attacks: How Little it Takes to Fool Deep Reinforcement Learning Policies

IEEE Transactions on Cognitive and Developmental Systems, 2020, pp. 1-1. DOI: 10.1109/tcds.2020.2974509

Author(s): Xinghua Qu, Zhu Sun, Yew Soon Ong, Abhishek Gupta, Pengfei Wei

Keyword(s): Reinforcement Learning, Learning Policies

ERLP: Ensembles of Reinforcement Learning Policies (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence, 2020, Vol 34 (10), pp. 13905-13906. DOI: 10.1609/aaai.v34i10.7225

Author(s): Rohan Saphal, Balaraman Ravindran, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul

Keyword(s): Reinforcement Learning, State Of The Art, Multiple Models, Model Parameters, Continuous Control, Sample Complexity, Local Minima, Single Model, Learning Policies, Reinforcement Learning Models

Reinforcement learning algorithms are sensitive to hyper-parameters and require tuning for specific environments to perform well. Ensembles of reinforcement learning models, on the other hand, are known to be much more robust and stable. However, training multiple models independently on an environment incurs high sample complexity. We present a methodology that creates, from a single training instance, multiple models that can be used in an ensemble, by applying directed perturbations to the model parameters at regular intervals. As a result of these perturbations, a single training run converges to several local minima during optimization. By saving the model parameters at each such point, we obtain multiple policies during training that are ensembled at evaluation time. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is highly sample efficient and computationally inexpensive, and is seen to outperform state-of-the-art (SOTA) approaches.
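The method saves parameter snapshots from one perturbed training run and ensembles them at evaluation time. Below is a minimal sketch of that pipeline under loud assumptions: `perturb` adds isotropic Gaussian noise, an undirected stand-in for the paper's directed perturbation; `train_step` is a hypothetical optimizer step supplied by the caller; and two common ensembling strategies are shown, majority voting for discrete actions and averaging for continuous actions. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Params = List[float]


def perturb(params: Sequence[float], scale: float, rng: random.Random) -> Params:
    """Copy of params with Gaussian noise (std = scale) added to each entry.
    Undirected stand-in for the paper's directed perturbation."""
    return [p + rng.gauss(0.0, scale) for p in params]


def collect_snapshots(params: Params, steps_per_phase: int, n_phases: int,
                      train_step: Callable[[Params], Params],
                      scale: float, rng: random.Random) -> List[Params]:
    """Train, save the (presumed) local minimum, perturb, and repeat."""
    snapshots: List[Params] = []
    for _ in range(n_phases):
        for _ in range(steps_per_phase):
            params = train_step(params)       # hypothetical optimizer step
        snapshots.append(list(params))        # save this phase's solution
        params = perturb(params, scale, rng)  # move away to seek another minimum
    return snapshots


def majority_vote(actions: Sequence[int]) -> int:
    """Discrete control: the action most snapshot policies agree on.
    Ties are broken in favour of the lowest action index."""
    counts = Counter(actions)
    best = max(counts.values())
    return min(a for a, c in counts.items() if c == best)


def mean_action(actions: Sequence[Sequence[float]]) -> List[float]:
    """Continuous control: element-wise mean of the snapshots' action vectors."""
    n = len(actions)
    return [sum(dim) / n for dim in zip(*actions)]


def ensemble_discrete(policies: Sequence[Callable[[object], int]], state: object) -> int:
    return majority_vote([pi(state) for pi in policies])
```

Voting and averaging are only two of the possible strategies; weighting snapshots by their validation return is another option the same structure would accommodate.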

RL-Based Method for Benchmarking the Adversarial Resilience and Robustness of Deep Reinforcement Learning Policies

Lecture Notes in Computer Science - Computer Safety, Reliability, and Security, 2019, pp. 314-325. DOI: 10.1007/978-3-030-26250-1_25

Author(s): Vahid Behzadan, William Hsu

Keyword(s): Reinforcement Learning, Learning Policies

Mesh Based Analysis of Low Fractal Dimension Reinforcement Learning Policies

2021. DOI: 10.1109/icra48506.2021.9561874

Author(s): Sean Gillen, Katie Byl

Keyword(s): Fractal Dimension, Reinforcement Learning, Learning Policies

An Empirical Investigation Into Deep and Shallow Rule Learning

Frontiers in Artificial Intelligence, 2021, Vol 4. DOI: 10.3389/frai.2021.689398

Author(s): Florian Beck, Johannes Fürnkranz

Keyword(s): Learning Algorithm, State Of The Art, Rule Learning, Learning Rule, Disjunctive Normal Form, Universal Function, Point Of View, Positive Class, Rule Sets, Inductive Rule Learning

Inductive rule learning is arguably among the most traditional paradigms in machine learning. Although we have seen considerable progress over the years in learning rule-based theories, all state-of-the-art learners still learn descriptions that directly relate the input features to the target concept. In the simplest case, concept learning, this is a disjunctive normal form (DNF) description of the positive class. While this is clearly sufficient from a logical point of view, because every logical expression can be reduced to an equivalent DNF expression, it could nevertheless be the case that more structured representations, which form deep theories by introducing intermediate concepts, are easier to learn, in much the same way that deep neural networks are able to outperform shallow networks even though the latter are also universal function approximators. However, several non-trivial obstacles must be overcome before a sufficiently powerful deep rule learning algorithm can be developed and compared to the state of the art in inductive rule learning. In this paper, we therefore take a different approach: we empirically compare deep and shallow rule sets that have been optimized with a uniform, general mini-batch-based optimization algorithm. In our experiments on both artificial and real-world benchmark data, deep rule networks outperformed their shallow counterparts, which we take as an indication that it is worthwhile to devote more effort to learning deep rule structures from data.
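To make the shallow/deep distinction concrete, the sketch below (an illustration, not the paper's mini-batch optimization algorithm) encodes 3-bit parity in two ways over boolean features: as a single DNF over the raw inputs, and as a two-layer rule network whose first layer defines an intermediate concept `h` (the XOR of `a` and `b`). Both compute the same function, but the deep version needs fewer literals. Representing a rule as a list of `(feature, value)` literals, and all names below, are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Term = Sequence[Tuple[str, bool]]  # conjunction of (feature, required value)
DNF = Sequence[Term]               # disjunction of conjunctions


def eval_dnf(dnf: DNF, env: Dict[str, bool]) -> bool:
    """Shallow theory: true if any conjunction has all its literals satisfied."""
    return any(all(env[f] == v for f, v in term) for term in dnf)


def eval_deep(layers: Sequence[Dict[str, DNF]], x: Dict[str, bool], output: str) -> bool:
    """Deep theory: each layer defines new concepts as DNFs over the inputs
    and over concepts defined by earlier layers."""
    env = dict(x)
    for layer in layers:
        env.update({name: eval_dnf(dnf, env) for name, dnf in layer.items()})
    return env[output]


def xor_dnf(p: str, q: str) -> List[List[Tuple[str, bool]]]:
    """DNF for p XOR q: (p AND NOT q) OR (NOT p AND q)."""
    return [[(p, True), (q, False)], [(p, False), (q, True)]]


def literal_count(dnfs: Sequence[DNF]) -> int:
    """Total number of literals across a collection of DNFs."""
    return sum(len(term) for dnf in dnfs for term in dnf)


# Shallow: parity of a, b, c as one DNF over the raw inputs (4 terms, 12 literals).
SHALLOW_PARITY = [
    [("a", True), ("b", False), ("c", False)],
    [("a", False), ("b", True), ("c", False)],
    [("a", False), ("b", False), ("c", True)],
    [("a", True), ("b", True), ("c", True)],
]

# Deep: intermediate concept h = a XOR b, then output y = h XOR c (8 literals).
DEEP_PARITY = [{"h": xor_dnf("a", "b")}, {"y": xor_dnf("h", "c")}]
```

The gap widens with more inputs: a flat DNF for n-bit parity needs 2^(n-1) terms, whereas a chain of XOR concepts grows only linearly in n.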

Resource Management in a Multi-agent System by Means of Reinforcement Learning and Supervised Rule Learning

Computational Science – ICCS 2007 - Lecture Notes in Computer Science, 2007, pp. 864-871. DOI: 10.1007/978-3-540-72586-2_121. Cited by ~4

Author(s): Bartłomiej Śnieżyński

Keyword(s): Reinforcement Learning, Resource Management, Rule Learning, Multi Agent System, Agent System, Multi Agent

On the use of the policy gradient and Hessian in inverse reinforcement learning

Intelligenza Artificiale, 2020, Vol 14 (1), pp. 117-150. DOI: 10.3233/ia-180011

Author(s): Alberto Maria Metelli, Matteo Pirotta, Marcello Restelli

Keyword(s): Reinforcement Learning, Sequential Decision, Inverse Reinforcement Learning, Reward Function, Model Free, Learning Speed, Policy Gradient, Continuous Domains, Learning Policies, Finite Domains

Reinforcement Learning (RL) is an effective approach to solving sequential decision-making problems when the environment is equipped with a reward function to evaluate the agent’s actions. However, in several domains a reward function is not available and is difficult to estimate. When samples from expert agents are available, Inverse Reinforcement Learning (IRL) allows recovering a reward function that explains the demonstrated behavior. Most classic IRL methods, in addition to the expert’s demonstrations, require sampling the environment to evaluate each reward function, which, in turn, is built from a set of engineered features. This paper presents a novel model-free IRL approach that does not require specifying a function space in which to search for the expert’s reward function. Leveraging the fact that the policy gradient must be zero for an optimal policy, the algorithm generates an approximation space for the reward function, within which a reward is singled out using a second-order criterion. After introducing our approach for finite domains, we extend it to continuous ones. The empirical results, on both finite and continuous domains, show that the reward function recovered by our algorithm allows learning policies that outperform those obtained with the true reward function in terms of learning speed.
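The key step is that an optimal expert policy has zero policy gradient. A minimal sketch of how that condition yields an approximation space, under assumptions that are ours rather than the paper's exact formulation: in a finite domain the policy gradient can be written as D q, where q is the vector of state-action values and column (s, a) of a hypothetical, already-estimated matrix D is the state-action visitation weight times the score vector ∇θ log π(a|s). Requiring the expert's gradient to vanish confines q to the null space of D. The code computes that null space exactly with rational arithmetic; estimating D from demonstrations and the second-order (Hessian-based) selection of a single reward are not shown.

```python
from fractions import Fraction
from typing import List, Sequence


def null_space(matrix: Sequence[Sequence[float]]) -> List[List[Fraction]]:
    """Basis of {q : D q = 0}, via reduced row echelon form (exact arithmetic)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0])
    pivots: List[int] = []
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot here: column c is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [v / lead for v in rows[r]]  # scale pivot to 1
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:     # eliminate column c elsewhere
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    # One basis vector per free column: set it to 1, solve for pivot columns.
    basis: List[List[Fraction]] = []
    for free in (c for c in range(n_cols) if c not in pivots):
        q = [Fraction(0)] * n_cols
        q[free] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            q[pc] = -rows[row_idx][free]
        basis.append(q)
    return basis
```

For D = [[1, -1], [2, -2]] the zero-gradient condition only forces the two entries of q to be equal, so the basis is [[1, 1]]; when the null space has more than one dimension, a further criterion (second-order, in the paper) is needed to pick one element of it.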