ANALYSIS OF HIERARCHICAL LEARNING WITH REINFORCEMENT FOR THE IMPLEMENTATION OF BEHAVIORAL STRATEGIES OF INTELLIGENT AGENTS

Vestnik komp iuternykh i informatsionnykh tekhnologii ◽

10.14489/vkit.2020.09.pp.035-045 ◽

2020 ◽

pp. 35-45

Author(s):

Yu. V. Dubenko ◽

Ye. Ye. Dyshkant ◽

D. A. Gura

Keyword(s):

Reinforcement Learning ◽

Mobile Robots ◽

Intelligent Agents ◽

Visual Information ◽

Transport Infrastructure ◽

Behavioral Strategies ◽

Clustering Methods ◽

Reward Function ◽

Lack Of Information ◽

Technical Vision

The paper discusses the task of evaluating the possibility of using robotic systems (intelligent agents) as a way to solve a problem of monitoring complex infrastructure objects, such as buildings, structures, bridges, roads and other transport infrastructure objects. Methods and algorithms for implementing behavioral strategies of robots, in particular, search algorithms based on decision trees, are examined. The emphasis is placed on the importance of forming the ability of robots to self-learn through reinforcement learning associated with modeling the behavior of living creatures when interacting with unknown elements of the environment. The Q-learning method is considered as one of the types of reinforcement learning that introduces the concept of action value, as well as the approach of “hierarchical reinforcement learning” and its varieties “Options Framework”, “Feudal”, “MaxQ”. The problems of determining such parameters as the value and reward function of agents (mobile robots), as well as the mandatory presence of a subsystem of technical vision, are identified in the segmentation of macro actions. Thus, the implementation of the task of segmentation of macro-actions requires improving the methodological base by applying intelligent algorithms and methods, including deep clustering methods. Improving the effectiveness of hierarchical training with reinforcement when mobile robots operate in conditions of lack of information about the monitoring object is possible by transmitting visual information in a variety of states, which will also increase the portability of experience between them in the future when performing tasks on various objects.

Download Full-text

StackDRL: Stacked Deep Reinforcement Learning for Fine-grained Visual Categorization

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/103 ◽

2018 ◽

Cited By ~ 6

Author(s):

Xiangteng He ◽

Yuxin Peng ◽

Junjie Zhao

Keyword(s):

Reinforcement Learning ◽

Visual Information ◽

Experimental Validation ◽

State Of The Art ◽

Two Stage ◽

Visual Categorization ◽

Fine Grained ◽

Reward Function ◽

Main Challenge ◽

Labor Consumption

Fine-grained visual categorization (FGVC) is the discrimination of similar subcategories, whose main challenge is to localize the quite subtle visual distinctions between similar subcategories. There are two pivotal problems: discovering which region is discriminative and representative, and determining how many discriminative regions are necessary to achieve the best performance. Existing methods generally solve these two problems relying on the prior knowledge or experimental validation, which extremely restricts the usability and scalability of FGVC. To address the "which" and "how many" problems adaptively and intelligently, this paper proposes a stacked deep reinforcement learning approach (StackDRL). It adopts a two-stage learning architecture, which is driven by the semantic reward function. Two-stage learning localizes the object and its parts in sequence ("which"), and determines the number of discriminative regions adaptively ("how many"), which is quite appealing in FGVC. Semantic reward function drives StackDRL to fully learn the discriminative and conceptual visual information, via jointly combining the attention-based reward and category-based reward. Furthermore, unsupervised discriminative localization avoids the heavy labor consumption of labeling, and extremely strengthens the usability and scalability of our StackDRL approach. Comparing with ten state-of-the-art methods on CUB-200-2011 dataset, our StackDRL approach achieves the best categorization accuracy.

Download Full-text

Application of Deep Reinforcement Learning for Tracking Control of 3WD Omnidirectional Mobile Robot

Information Technology And Control ◽

10.5755/j01.itc.50.3.25979 ◽

2021 ◽

Vol 50 (3) ◽

pp. 507-521

Author(s):

Atif Mehmood ◽

Inam ul Hasan Shaikh ◽

Ahsan Ali

Keyword(s):

Reinforcement Learning ◽

Mobile Robots ◽

Tracking Control ◽

Mathematical Framework ◽

Continuous Action ◽

Reward Function ◽

Omnidirectional Mobile Robot ◽

Policy Gradient ◽

The Neural Networks ◽

Optimal Action

Deep reinforcement learning, the fastest growing technique, to solve real-world complex problems by creatinga simple mathematical framework. It includes an agent, action, environment, and a reward. An agent will interactwith the environment, takes an optimal action aiming to maximize the total reward. This paper proposesthe compelling technique of deep deterministic policy gradient for solving the complex continuous actionspace of 3-wheeled omnidirectional mobile robots. Three-wheeled Omnidirectional mobile robots tracking isa difficult task because of the orientation of the wheels which makes it rotate around its own axis rather tofollow the trajectory. A deep deterministic policy gradient (DDPG) algorithm has been designed to train in environmentswith continuous action space to follow the trajectory by training the neural networks defined forthe policy and value function to maximize the reward function defined for the tracking of the trajectory. DDPGagent environment is created in the Reinforcement learning toolbox in MATLAB 2019 while for Actor and criticnetwork design deep neural network designer is used. Results are shown to illustrate the effectiveness of thetechnique with a convergence of error approximately to zero.

Download Full-text

Inverse reinforcement learning in contextual MDPs

Machine Learning ◽

10.1007/s10994-021-05984-x ◽

2021 ◽

Author(s):

Stav Belogolovsky ◽

Philip Korsunsky ◽

Shie Mannor ◽

Chen Tessler ◽

Tom Zahavy

Keyword(s):

Reinforcement Learning ◽

Optimization Problem ◽

Decision Processes ◽

Inverse Reinforcement Learning ◽

Convex Optimization Problem ◽

Reward Function ◽

Dynamic Treatment Regime ◽

Markov Decision ◽

Dynamic Treatment ◽

Recorded Data

AbstractWe consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differential convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.

Download Full-text

Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems

Machine Learning ◽

10.1007/s10994-020-05939-8 ◽

2021 ◽

Author(s):

Amarildo Likmeta ◽

Alberto Maria Metelli ◽

Giorgia Ramponi ◽

Andrea Tirinzoni ◽

Matteo Giuliani ◽

...

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Real Life ◽

User Preferences ◽

Inverse Reinforcement Learning ◽

Water Release ◽

Reward Function ◽

Model Free ◽

Conflicting Objectives ◽

Multiple Experts

AbstractIn real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understand how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and we present three application scenarios: (1) the high-level decision-making problem in the highway driving scenario, and (2) inferring the user preferences in a social network (Twitter), and (3) the management of the water release in the Como Lake. For each of these scenarios, we provide formalization, experiments and a discussion to interpret the obtained results.

Download Full-text

Integrating Production Planning with Truck-Dispatching Decisions through Reinforcement Learning While Managing Uncertainty

Minerals ◽

10.3390/min11060587 ◽

2021 ◽

Vol 11 (6) ◽

pp. 587

Author(s):

Joao Pedro de Carvalho ◽

Roussos Dimitrakopoulos

Keyword(s):

Reinforcement Learning ◽

Discrete Event ◽

Mining Operations ◽

Fixed Sequence ◽

Q Learning ◽

Reward Function ◽

Copper Gold ◽

Mining Complex ◽

Learning Reinforcement ◽

Operational Plan

This paper presents a new truck dispatching policy approach that is adaptive given different mining complex configurations in order to deliver supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator model emulates the interaction arising from these mining operations. The continuous repetition of this simulator and a reward function, associating a score value to each dispatching decision, generate sample experiences to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, such that when a new task is required, a well-informed decision can be quickly taken. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.

Download Full-text

Deep Inverse Reinforcement Learning for Reward Function Identification in Bidding Models

IEEE Transactions on Power Systems ◽

10.1109/tpwrs.2021.3076296 ◽

2021 ◽

pp. 1-1

Author(s):

Hongye Guo ◽

Qixin Chen ◽

Qing Xia ◽

Chongqing Kang

Keyword(s):

Reinforcement Learning ◽

Inverse Reinforcement Learning ◽

Reward Function ◽

Function Identification ◽

Bidding Models

Download Full-text

Reinforcement Learning Approaches in Social Robotics

Sensors ◽

10.3390/s21041292 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1292

Author(s):

Neziha Akalin ◽

Amy Loutfi

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Social Robotics ◽

Research Field ◽

Social Robots ◽

Learning Approaches ◽

Reward Function ◽

Optimal Behavior ◽

Learning Challenges ◽

Starting Point

This article surveys reinforcement learning approaches in social robotics. Reinforcement learning is a framework for decision-making problems in which an agent interacts through trial-and-error with its environment to discover an optimal behavior. Since interaction is a key component in both reinforcement learning and social robotics, it can be a well-suited approach for real-world interactions with physically embodied social robots. The scope of the paper is focused particularly on studies that include social physical robots and real-world human-robot interactions with users. We present a thorough analysis of reinforcement learning approaches in social robotics. In addition to a survey, we categorize existent reinforcement learning approaches based on the used method and the design of the reward mechanisms. Moreover, since communication capability is a prominent feature of social robots, we discuss and group the papers based on the communication medium used for reward formulation. Considering the importance of designing the reward function, we also provide a categorization of the papers based on the nature of the reward. This categorization includes three major themes: interactive reinforcement learning, intrinsically motivated methods, and task performance-driven methods. The benefits and challenges of reinforcement learning in social robotics, evaluation methods of the papers regarding whether or not they use subjective and algorithmic measures, a discussion in the view of real-world reinforcement learning challenges and proposed solutions, the points that remain to be explored, including the approaches that have thus far received less attention is also given in the paper. Thus, this paper aims to become a starting point for researchers interested in using and applying reinforcement learning methods in this particular research field.

Download Full-text

State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots

Journal of Zhejiang University SCIENCE C ◽

10.1631/jzus.c1200226 ◽

2013 ◽

Vol 14 (3) ◽

pp. 167-178 ◽

Cited By ~ 8

Author(s):

Xin Ma ◽

Ya Xu ◽

Guo-qiang Sun ◽

Li-xia Deng ◽

Yi-bin Li

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Mobile Robots ◽

Autonomous Mobile Robots

Download Full-text

Intelligent Agents with Reinforcement Learning and Fuzzy logic for Intention commitment Modeling

Sixth International Conference on Intelligent Systems Design and Applications ◽

10.1109/isda.2006.253731 ◽

2006 ◽

Cited By ~ 1

Author(s):

Prasanna Lokuge ◽

Damminda Alahakoon

Keyword(s):

Fuzzy Logic ◽

Reinforcement Learning ◽

Intelligent Agents

Download Full-text

Reinforcement Learning with Converging Goal Space and Binary Reward Function

2020 IEEE 16th International Conference on Automation Science and Engineering (CASE) ◽

10.1109/case48305.2020.9249227 ◽

2020 ◽

Author(s):

Wooseok Ro ◽

Wonseok Jeon ◽

Hamid Bamshad ◽

Hyunseok Yang

Keyword(s):

Reinforcement Learning ◽

Reward Function

Download Full-text