Reinforcement learning for robotic manipulation using simulated locomotion demonstrations

2021 ◽  
Author(s):  
Ozsel Kilinc ◽  
Giovanni Montana

Abstract Mastering robotic manipulation skills through reinforcement learning (RL) typically requires the design of shaped reward functions. Recent developments in this area have demonstrated that using sparse rewards, i.e. rewarding the agent only when the task has been successfully completed, can lead to better policies. However, state-action space exploration is more difficult in this case. Recent RL approaches to learning with sparse rewards have leveraged high-quality human demonstrations for the task, but these can be costly, time consuming or even impossible to obtain. In this paper, we propose a novel and effective approach that does not require human demonstrations. We observe that every robotic manipulation task can be seen as involving a locomotion task from the perspective of the object being manipulated, i.e. the object could learn how to reach a target state on its own. To exploit this idea, we introduce a framework whereby an object locomotion policy is initially obtained using a realistic physics simulator. This policy is then used to generate auxiliary rewards, called simulated locomotion demonstration rewards (SLDRs), which enable us to learn the robot manipulation policy. The proposed approach has been evaluated on 13 tasks of increasing complexity, and achieves a higher success rate and faster learning compared to alternative algorithms. SLDRs are especially beneficial for tasks like multi-object stacking and non-rigid object manipulation.
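As a hedged sketch of the two-stage idea, the snippet below assumes the pre-trained object locomotion policy returns a desired object displacement, and defines an SLDR-style auxiliary reward as the negative mismatch between that displacement and the one actually produced by the robot's action; the function names, the reward form and the weighting are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sldr_reward(object_state, next_object_state, locomotion_policy):
    """SLDR-style auxiliary reward (illustrative): how closely the object's
    observed motion under the robot policy matches the motion the pre-trained
    object locomotion policy would produce on its own."""
    desired_delta = locomotion_policy(object_state)          # displacement the object "wants"
    observed_delta = np.asarray(next_object_state) - np.asarray(object_state)
    return -float(np.linalg.norm(observed_delta - desired_delta))

def shaped_reward(sparse_reward, object_state, next_object_state,
                  locomotion_policy, weight=0.1):
    """Combine the sparse task reward with the auxiliary SLDR term
    (the weight is an assumed hyperparameter)."""
    return sparse_reward + weight * sldr_reward(
        object_state, next_object_state, locomotion_policy)
```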

Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1685 ◽  
Author(s):  
Chayoung Kim

Owing to the complexity of training an agent in a real-time environment, e.g. one using the Internet of Things (IoT), reinforcement learning (RL) with a deep neural network, i.e. deep reinforcement learning (DRL), has been widely adopted on an online basis without prior knowledge or complicated reward functions. DRL can handle a symmetrical balance between bias and variance, which indicates that RL agents can be competently trained in real-world applications. The proposed model combines basic RL algorithms, used online and offline, based on the empirical balance of bias and variance. Therefore, we exploit the balance between the offline Monte Carlo (MC) technique and online temporal-difference (TD) learning with an on-policy method (state-action-reward-state-action, SARSA) and an off-policy method (Q-learning) in a DRL setting. The proposed balance of offline MC and online TD use, which is simple and applicable without a well-designed reward, is suitable for real-time online learning. We demonstrate that, for a simple control task, the balance between online and offline use without combining on- and off-policy methods yields satisfactory results. However, in complex tasks, the results clearly indicate the effectiveness of the combined method in improving the convergence speed and performance of a deep Q-network.
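A minimal sketch of one way such an MC/TD balance could be realised for a tabular action-value function is given below; the convex mixing coefficient and the Q-learning-style TD target are illustrative assumptions rather than the exact formulation of the proposed model.

```python
from collections import defaultdict

def blended_mc_td_update(Q, episode, alpha=0.1, gamma=0.99, beta=0.5):
    """Update Q towards a mix of the offline Monte Carlo return and the online
    one-step TD target. beta=1.0 is pure MC, beta=0.0 is pure TD; 0.5 is an
    illustrative balance."""
    G = 0.0
    # Walk the episode backwards so the Monte Carlo return G is known at each step.
    for state, action, reward, next_state in reversed(episode):
        G = reward + gamma * G                                                  # offline MC return
        td_target = reward + gamma * max(Q[next_state].values(), default=0.0)   # online TD target
        target = beta * G + (1.0 - beta) * td_target                            # empirical MC/TD balance
        Q[state][action] += alpha * (target - Q[state][action])
    return Q

# Usage sketch: Q = defaultdict(lambda: defaultdict(float)), and episode is a list
# of (state, action, reward, next_state) tuples collected from the environment.
```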


2021 ◽  
Author(s):  
Kenway Louie

Learning is widely modeled in psychology, neuroscience, and computer science by prediction error-guided reinforcement learning (RL) algorithms. While standard RL assumes linear reward functions, reward-related neural activity is a saturating, nonlinear function of reward; however, the computational and behavioral implications of nonlinear RL are unknown. Here, we show that nonlinear RL incorporating the canonical divisive normalization computation introduces an intrinsic and tunable asymmetry in prediction error coding. At the behavioral level, this asymmetry explains empirical variability in risk preferences typically attributed to asymmetric learning rates. At the neural level, diversity in asymmetries provides a computational mechanism for recently proposed theories of distributional RL, allowing the brain to learn the full probability distribution of future rewards. This behavioral and computational flexibility argues for an incorporation of biologically valid value functions in computational models of learning and decision-making.
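As a worked illustration of the asymmetry described above, suppose rewards are first passed through a canonical divisive-normalization (saturating) value function before the prediction error is computed; the functional form below is the standard normalization model and is shown only as a sketch, not as the paper's exact parameterisation.

```latex
% Divisively normalized (saturating) subjective value of reward R, with sigma > 0,
% and a value update computed on the normalized scale:
\[
  u(R) = \frac{R}{\sigma + R}, \qquad
  \delta_t = u(R_t) - V_t, \qquad
  V_{t+1} = V_t + \alpha\,\delta_t .
\]
% Because u is concave, a reward d above expectation changes u less than a reward
% d below expectation, so positive and negative prediction errors are effectively
% scaled asymmetrically, mimicking asymmetric learning rates.
```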


2021 ◽  
Vol 21 (4) ◽  
pp. 1-22
Author(s):  
Safa Otoum ◽  
Burak Kantarci ◽  
Hussein Mouftah

Volunteer computing, in which owners of Internet-connected devices (laptops, PCs, smart devices, etc.) volunteer them as storage and computing resources, has become an essential mechanism for resource management in numerous applications. The growth of the volume and variety of data traffic on the Internet raises concerns about the robustness of cyber-physical systems, especially for critical infrastructures. Therefore, implementing an efficient Intrusion Detection System (IDS) for gathering such sensory data has gained vital importance. In this article, we present a comparative study of Artificial Intelligence (AI)-driven intrusion detection systems for wirelessly connected sensors that track crucial applications. Specifically, we present an in-depth analysis of the use of machine learning, deep learning and reinforcement learning solutions to recognise intrusive behaviour in the collected traffic. We evaluate the proposed mechanisms using KDD’99 as a real attack dataset in our simulations. Results present the performance metrics for three different IDSs, namely the Adaptively Supervised and Clustered Hybrid IDS (ASCH-IDS), the Restricted Boltzmann Machine-based Clustered IDS (RBC-IDS), and the Q-learning based IDS (Q-IDS), in detecting malicious behaviour. We also present the performance of different reinforcement learning techniques, such as State-Action-Reward-State-Action learning (SARSA) and Temporal Difference learning (TD). Through simulations, we compare the detection rates achieved by Q-IDS, SARSA-IDS, and TD-IDS.
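For concreteness, a hedged sketch of a Q-learning-based detector in the spirit of Q-IDS is shown below: discretised traffic features serve as states, the binary normal/attack decision is the action, and correct decisions are rewarded; the discretisation and reward design are illustrative assumptions, not the evaluated system.

```python
import random
from collections import defaultdict

def train_q_ids(labelled_flows, epochs=10, alpha=0.1, epsilon=0.1):
    """Q-learning sketch for intrusion detection (illustrative): states are
    coarsely discretised flow features, actions are {0: normal, 1: attack},
    and the reward is +1 for a correct decision, -1 otherwise."""
    Q = defaultdict(lambda: [0.0, 0.0])
    for _ in range(epochs):
        for features, label in labelled_flows:
            state = tuple(int(f * 10) for f in features)    # coarse discretisation
            if random.random() < epsilon:                    # epsilon-greedy exploration
                action = random.randint(0, 1)
            else:
                action = max((0, 1), key=lambda a: Q[state][a])
            reward = 1.0 if action == label else -1.0
            # Each flow is treated as a one-step (terminal) decision here.
            Q[state][action] += alpha * (reward - Q[state][action])
    return Q
```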


Author(s):  
Kevork Oskanian

Abstract This article contributes a securitisation-based, interpretive approach to state weakness. The long-dominant positivist approaches to the phenomenon have been extensively criticised for a wide range of deficiencies. Responding to Lemay-Hébert's suggestion of a ‘Durkheimian’, ideational-interpretive approach as a possible alternative, I base my conceptualisation on Migdal's view of state weakness as emerging from a ‘state-in-society's’ contested ‘strategies of survival’. I argue that several recent developments in Securitisation Theory enable it to capture this contested ‘collective knowledge’ on the state: a move away from state-centrism, the development of a contextualised ‘sociological’ version, linkages made between securitisation and legitimacy, and the acknowledgment of ‘securitisations’ as a contested Bourdieusian field. I introduce ‘securitisation gaps’ – divergences between the security discourses and practices of state and society – as a concept aimed at capturing this contested role of the state, operationalised along two logics (reactive/substitutive), depending on whether they emerge from securitisations of state action or inaction, and three intensities (latent, manifest, and violent), depending on the extent to which they involve challenges to state authority. The approach is briefly illustrated through the changing securitisation gaps in the Republic of Lebanon during the 2019–20 ‘October Uprising’.


Author(s):  
Ziyao Zhang ◽  
Liang Ma ◽  
Kin K. Leung ◽  
Konstantinos Poularakis ◽  
Mudhakar Srivatsa

2019 ◽  
Author(s):  
Erdem Pulcu

Abstract We are living in a dynamic world in which stochastic relationships between cues and outcome events create different sources of uncertainty [1] (e.g. the fact that not all grey clouds bring rain). Living in an uncertain world continuously probes learning systems in the brain, guiding agents to make better decisions. This is a type of value-based decision-making which is very important for survival in the wild and long-term evolutionary fitness. Consequently, reinforcement learning (RL) models describing cognitive/computational processes underlying learning-based adaptations have been pivotal in behavioural [2,3] and neural sciences [4–6], as well as machine learning [7,8]. This paper demonstrates the suitability of novel update rules for RL, based on a nonlinear relationship between prediction errors (i.e. difference between the agent’s expectation and the actual outcome) and learning rates (i.e. a coefficient with which agents update their beliefs about the environment), that can account for learning-based adaptations in the face of environmental uncertainty. These models illustrate how learners can flexibly adapt to dynamically changing environments.
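A minimal sketch of the general idea of tying the learning rate nonlinearly to the prediction error is shown below; the sigmoidal mapping and its constants are illustrative assumptions and not the specific update rules evaluated in the paper.

```python
import math

def nonlinear_rl_update(value, outcome, alpha_max=0.8, k=4.0):
    """Rescorla-Wagner-style update in which the learning rate grows
    nonlinearly (here, sigmoidally) with the magnitude of the prediction
    error, so larger surprises drive faster belief updates (illustrative)."""
    delta = outcome - value                                          # prediction error
    alpha = alpha_max / (1.0 + math.exp(-k * (abs(delta) - 0.5)))    # nonlinear learning rate
    return value + alpha * delta

# Usage sketch: beliefs adapt faster after large, surprising outcomes.
belief = 0.5
for outcome in [1.0, 1.0, 0.0, 1.0]:
    belief = nonlinear_rl_update(belief, outcome)
```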


Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial Intelligence makes it possible to create engines that can explore and learn environments and therefore create policies that control them in real time with no human intervention. Through its Reinforcement Learning component, using frameworks such as temporal differences, State-Action-Reward-State-Action (SARSA) and Q-learning, to name a few, it can be applied to systems that can be perceived as a Markov Decision Process. This opens the door to applying Reinforcement Learning to Cloud load balancing, making it possible to dispatch load dynamically to a given Cloud system. The authors describe different techniques that can be used to implement a Reinforcement Learning based engine in a cloud system.
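A hedged sketch of how such an engine could be framed as an MDP for load dispatching follows: states are coarse per-server load levels, the action picks the server that receives the next job, and the reward penalises imbalance; the environment interface and all modelling choices here are illustrative assumptions.

```python
import random
from collections import defaultdict

def choose_server(Q, state, n_servers, epsilon=0.1):
    """Epsilon-greedy selection of the server that receives the next job."""
    if random.random() < epsilon:
        return random.randrange(n_servers)
    return max(range(n_servers), key=lambda a: Q[state][a])

def sarsa_dispatch(env_step, n_servers, steps=1000, alpha=0.1, gamma=0.9):
    """On-policy SARSA sketch for cloud load balancing (illustrative).
    env_step(state, action) must return (reward, next_state), e.g. a reward
    that penalises the spread of server utilisations."""
    Q = defaultdict(lambda: [0.0] * n_servers)
    state = (0,) * n_servers                        # assumed initial coarse load levels
    action = choose_server(Q, state, n_servers)
    for _ in range(steps):
        reward, next_state = env_step(state, action)
        next_action = choose_server(Q, next_state, n_servers)
        # SARSA update uses the action actually selected for the next state.
        Q[state][action] += alpha * (reward + gamma * Q[next_state][next_action]
                                     - Q[state][action])
        state, action = next_state, next_action
    return Q
```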

