Episodic Self-Imitation Learning with Hindsight

Electronics ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1742
Author(s):  
Tianhong Dai ◽  
Hengyan Liu ◽  
Anil Anthony Bharath

Episodic self-imitation learning, a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function, is proposed to speed up reinforcement learning. Compared to the original self-imitation learning algorithm, which samples good state–action pairs from the experience replay buffer, our agent leverages entire episodes with hindsight to aid self-imitation learning. A selection module is introduced to filter uninformative samples from each episode used in the update. The proposed method overcomes the limitations of the standard self-imitation learning algorithm, a transition-based method that performs poorly in continuous control environments with sparse rewards. In the experiments, episodic self-imitation learning is shown to perform better than baseline on-policy algorithms, achieving performance comparable to state-of-the-art off-policy algorithms in several simulated robot control tasks. The trajectory selection module is shown to prevent the agent from learning undesirable hindsight experiences. With the capability of solving sparse-reward problems in continuous control settings, episodic self-imitation learning has the potential to be applied to real-world problems that have continuous action spaces, such as robot guidance and manipulation.
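As a rough illustration of the idea, the sketch below relabels a completed episode with the goal it actually achieved (hindsight) and then keeps only the state–action pairs whose empirical return exceeds the current value estimate, standing in for the trajectory selection module. The episode layout, reward definition, discount factor, and selection criterion are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def hindsight_relabel(episode, goal_tol=0.05):
    """Replace the original goal with the goal actually achieved at the episode's end."""
    achieved_goal = episode["achieved_goals"][-1]
    relabelled = dict(episode)
    relabelled["goals"] = [achieved_goal] * len(episode["states"])
    # Recompute the sparse reward against the substituted goal.
    relabelled["rewards"] = [
        0.0 if np.allclose(g, achieved_goal, atol=goal_tol) else -1.0
        for g in episode["achieved_goals"]
    ]
    return relabelled

def select_informative(episode, value_fn, gamma=0.98, margin=0.0):
    """Selection module: keep state-action pairs whose return beats the value estimate."""
    returns, g = [], 0.0
    for r in reversed(episode["rewards"]):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    keep = [i for i, ret in enumerate(returns)
            if ret > value_fn(episode["states"][i]) + margin]
    return [(episode["states"][i], episode["actions"][i]) for i in keep]
```

The retained pairs would then serve as imitation targets (e.g. via a behavior-cloning loss) alongside the on-policy update.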

2021 ◽  
Vol 11 (7) ◽  
pp. 3068
Author(s):  
Neda Navidi ◽  
Rene Landry

Reinforcement Learning (RL) provides effective results with an agent learning from a stand-alone reward function. However, it presents unique challenges with large state and action spaces, as well as in the determination of rewards. Imitation Learning (IL) offers a promising solution to those challenges by using a teacher. In IL, the learning process can take advantage of human-sourced assistance and/or control over the agent and environment. A human teacher and an agent learner are considered in this study. The teacher takes part in the agent's training towards dealing with the environment, tackling a specific objective, and achieving a predefined goal. This paper proposes a novel approach combining IL with different types of RL methods, namely state-action-reward-state-action (SARSA) and Asynchronous Advantage Actor-Critic (A3C), to overcome the problems of both stand-alone systems. It addresses how to effectively leverage the teacher's feedback, whether direct binary or indirect detailed feedback, so that the agent learner can learn sequential decision-making policies. The results of this study on various OpenAI Gym environments show that this algorithmic method can be incorporated in different combinations and significantly decreases both human effort and the tedious exploration process.
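One simple way to combine a teacher's direct binary feedback with an RL update, in the spirit of the approach described above, is to fold the feedback into the reward used by tabular SARSA. The feedback weight, its sign convention, and the tabular setting are assumptions for illustration rather than the paper's exact formulation.

```python
import numpy as np

def sarsa_update_with_teacher(Q, s, a, r, s_next, a_next, feedback,
                              alpha=0.1, gamma=0.99, beta=0.5):
    """feedback: +1 (teacher approves the action), -1 (disapproves), 0 (no feedback)."""
    shaped_r = r + beta * feedback                    # teacher feedback as a shaping term
    td_target = shaped_r + gamma * Q[s_next, a_next]  # on-policy SARSA target
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

An analogous shaping term could be added to the advantage estimate in an actor-critic method such as A3C.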


2000 ◽  
Vol 14 (2) ◽  
pp. 243-258 ◽  
Author(s):  
V. S. Borkar

A simulation-based algorithm for learning good policies for a discrete-time stochastic control process with unknown transition law is analyzed when the state and action spaces are compact subsets of Euclidean spaces. This extends the Q-learning scheme of discrete state/action problems along the lines of Baker [4]. Almost sure convergence is proved under suitable conditions.
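For reference, the discrete-state/action Q-learning rule that the paper generalizes to compact spaces is the familiar one-step update sketched below (a textbook version, not the simulation-based algorithm analyzed in the paper).

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One step of tabular Q-learning on a finite state/action space."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    return Q
```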


Author(s):  
Mohd Rashdan Abdul Kadir ◽  
Ali Selamat ◽  
Ondrej Krejcar

Normative multi-agent research is an alternative viewpoint in the design of adaptive autonomous agent architectures. Norms specify standards of behavior, such as which actions or states should be achieved or avoided. Norm synthesis is the process of generating useful normative rules. This study proposes a model for extracting normative rules from implicit learning, namely the Q-learning algorithm, into an explicit norm representation by implementing Dynamic Deontics and a Hierarchical Knowledge Base (HKB) to synthesize useful normative rules in the form of weighted state-action pairs with deontic modality. OpenAI Gym is used to simulate the agent environment. Our proposed model is able to generate both obligation and prohibition norms, as well as to deliberate on and execute those norms. Results show that the generated norms are best used as prior knowledge to guide agent behavior and perform poorly if not complemented by another agent coordination mechanism. Performance increases when using both obligation and prohibition norms, and, in general, norms speed up reaching the optimal policy.
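A minimal sketch of the norm-synthesis idea is shown below: after Q-learning, high-valued state-action pairs are promoted to obligation norms and low-valued ones to prohibition norms, each carried as a weighted state-action pair. The quantile thresholds and the rule tuple format are assumptions for illustration, not the paper's Dynamic Deontics or HKB implementation.

```python
import numpy as np

def synthesise_norms(Q, obligate_quantile=0.8, prohibit_quantile=0.2):
    """Turn a learned Q-table into weighted state-action norms with deontic modality."""
    lo, hi = np.quantile(Q, prohibit_quantile), np.quantile(Q, obligate_quantile)
    norms = []
    for s in range(Q.shape[0]):
        for a in range(Q.shape[1]):
            if Q[s, a] >= hi:
                norms.append(("OBLIGED", s, a, float(Q[s, a])))      # do a in s
            elif Q[s, a] <= lo:
                norms.append(("PROHIBITED", s, a, float(Q[s, a])))   # avoid a in s
    return norms
```

Such norms could then serve as prior knowledge for a fresh agent, consistent with the finding that they work best alongside another coordination mechanism.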


Author(s):  
Bo Yang ◽  
Min Liu

Effective collaboration among autonomous unmanned aerial vehicles (UAVs) relies on timely information sharing. However, the time-varying flight environment and intermittent link connectivity pose great challenges to message delivery. In this paper, we leverage deep reinforcement learning (DRL) to address the UAVs' optimal link discovery and selection problem in uncertain environments. As multi-agent learning efficiency is constrained by high-dimensional and continuous action spaces, we slice the whole action space into a number of tractable fractions to achieve efficient convergence to optimal policies in continuous domains. Moreover, for the nonstationarity issue that particularly challenges multi-agent DRL with local perceptions, we present a multi-agent mutual sampling method that jointly exploits intra-agent and inter-agent state-action information to stabilize and expedite the training procedure. We evaluate the proposed algorithm on the UAVs' continuous network connection task. Results show that the associated UAVs can quickly select the optimal connected links, which facilitates the UAVs' teamwork significantly.
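The action-space slicing can be pictured as a two-stage choice: a discrete head first picks one of several tractable sub-intervals of the continuous action range, and a continuous offset then places the action within that slice. The decomposition below is an illustration of this idea under assumed interfaces, not the authors' exact architecture.

```python
import numpy as np

def slice_action_space(low, high, n_slices):
    """Partition the continuous range [low, high] into n_slices equal sub-intervals."""
    edges = np.linspace(low, high, n_slices + 1)
    return list(zip(edges[:-1], edges[1:]))

def select_action(slice_preferences, offset, slices):
    """Pick the preferred slice, then place a continuous action inside it."""
    k = int(np.argmax(slice_preferences))
    lo, hi = slices[k]
    return lo + np.clip(offset, 0.0, 1.0) * (hi - lo)

slices = slice_action_space(-1.0, 1.0, n_slices=8)       # e.g. a normalized link-selection action
action = select_action(np.random.randn(8), offset=0.3, slices=slices)
```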


2021 ◽  
Vol 54 (3-4) ◽  
pp. 417-428
Author(s):  
Yanyan Dai ◽  
KiDong Lee ◽  
SukGyu Lee

For real applications, rotary inverted pendulum systems are well known as a basic model in nonlinear control. Without a deep understanding of control, it is difficult to control a rotary inverted pendulum platform using classic control engineering models, as shown in Section 2.1. Therefore, without classic control theory, this paper controls the platform by training and testing a reinforcement learning algorithm. Many recent achievements in reinforcement learning (RL) have become possible, but there is a lack of research on quickly testing high-frequency RL algorithms in a real hardware environment. In this paper, we propose a real-time hardware-in-the-loop (HIL) control system to train and test the deep reinforcement learning algorithm from simulation to real hardware implementation. The agent is implemented with the Double Deep Q-Network (DDQN) with prioritized experience replay, without requiring a deep understanding of classical control engineering. For the real experiment, we define 21 actions to swing up the rotary inverted pendulum and balance it smoothly. Compared with the Deep Q-Network (DQN), DDQN with prioritized experience replay reduces the overestimation of Q-values and decreases the training time. Finally, this paper presents experimental results comparing classic control theory and different reinforcement learning algorithms.
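The DDQN target mentioned above decouples action selection from action evaluation, which is what reduces the overestimation of Q-values relative to vanilla DQN. The sketch below shows only the target computation, with assumed array-valued network outputs; the prioritized replay weighting is omitted.

```python
import numpy as np

def dqn_target(r, q_target_next, gamma=0.99):
    """Vanilla DQN: the target network both selects and evaluates the next action."""
    return r + gamma * np.max(q_target_next)

def ddqn_target(r, q_online_next, q_target_next, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    a_star = int(np.argmax(q_online_next))
    return r + gamma * q_target_next[a_star]
```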


2014 ◽  
Vol 665 ◽  
pp. 643-646
Author(s):  
Ying Liu ◽  
Yan Ye ◽  
Chun Guang Li

A metalearning algorithm learns to tune the base learning algorithm, with the aim of improving the performance of the learning system. The incremental delta-bar-delta (IDBD) algorithm is such a metalearning algorithm. On the other hand, sparse algorithms are gaining popularity due to their good performance and wide applications. In this paper, we propose a sparse IDBD algorithm by taking the sparsity of the systems into account. A sparsity-inducing ℓ1-norm penalty is included in the cost function of the standard IDBD, which is equivalent to adding a zero attractor to the iterations and can thus speed up convergence if the system of interest is indeed sparse. Simulations demonstrate that the proposed algorithm is superior to competing algorithms in sparse system identification.
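A minimal sketch of an IDBD-style update with a zero attractor is given below, assuming the standard IDBD recursions for a linear predictor (per-weight log step-sizes adapted by a meta step-size); the attractor strength rho and its placement illustrate the sparsity penalty rather than reproduce the paper's exact derivation.

```python
import numpy as np

def sparse_idbd_step(w, beta, h, x, target, theta=0.01, rho=1e-4):
    """One sparse-IDBD update for a linear model y = w.x (all arrays share one shape)."""
    err = target - w @ x                        # prediction error
    beta += theta * err * x * h                 # meta-learn per-weight log step-sizes
    alpha = np.exp(beta)                        # per-weight step-sizes
    w += alpha * err * x - rho * np.sign(w)     # LMS step plus zero attractor toward sparsity
    h = h * np.maximum(0.0, 1.0 - alpha * x ** 2) + alpha * err * x
    return w, beta, h
```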


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Liding Yao ◽  
Xiaojun Guan ◽  
Xiaowei Song ◽  
Yanbin Tan ◽  
Chun Wang ◽  
...  

Rib fracture detection is time-consuming and demanding work for radiologists. This study aimed to introduce a novel rib fracture detection system based on deep learning which can help radiologists diagnose rib fractures in chest computed tomography (CT) images conveniently and accurately. A total of 1707 patients were included in this study from a single center. We developed a novel rib fracture detection system on chest CT using a three-step algorithm. According to the examination time, 1507, 100 and 100 patients were allocated to the training set, the validation set and the testing set, respectively. Free-response ROC analysis was performed to evaluate the sensitivity and false positive rate of the deep learning algorithm. Precision, recall, F1-score, negative predictive value (NPV) and detection and diagnosis time were selected as evaluation metrics to compare the diagnostic efficiency of this system with radiologists. The radiologist-only study was used as a benchmark and the radiologist-model collaboration study was evaluated to assess the model's clinical applicability. A total of 50,170,399 blocks (fracture blocks, 91,574; normal blocks, 50,078,825) were labelled for training. The F1-score of the Rib Fracture Detection System was 0.890 and the precision, recall and NPV values were 0.869, 0.913 and 0.969, respectively. By interacting with this detection system, the F1-scores of the junior and the experienced radiologists improved from 0.796 to 0.925 and 0.889 to 0.970, respectively; the recall scores increased from 0.693 to 0.920 and 0.853 to 0.972, respectively. On average, the diagnosis time of radiologists assisted by this detection system was reduced by 65.3 s. The constructed Rib Fracture Detection System performs comparably to the experienced radiologist and is readily available to automatically detect rib fractures in the clinical setting with high efficacy, which could reduce diagnosis time and radiologists' workload in clinical practice.
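As a quick consistency check, the reported F1-score follows from the reported precision and recall as their harmonic mean:

```python
precision, recall = 0.869, 0.913
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))   # ~0.890, matching the reported F1-score
```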


Author(s):  
Buvanesh Pandian V

Reinforcement learning is a mathematical framework for agents to interact intelligently with their environment. Unlike supervised learning, where a system learns with the help of labeled data, reinforcement learning agents learn how to act by trial and error, receiving only a reward signal from their environments. A field where reinforcement learning has been prominently successful is robotics [3]. However, real-world control problems are also particularly challenging because of the noise and high dimensionality of input data (e.g., visual input). In recent years, in the field of supervised learning, deep neural networks have been successfully used to extract meaning from this kind of data. Building on these advances, deep reinforcement learning was used to solve complex problems like Atari games and Go. Mnih et al. [1] built a system with fixed hyperparameters able to learn to play 49 different Atari games from raw pixel inputs only. However, in order to apply the same methods to real-world control problems, deep reinforcement learning has to be able to deal with continuous action spaces. Discretizing continuous action spaces would scale poorly, since the number of discrete actions grows exponentially with the dimensionality of the action. Furthermore, having a parametrized policy can be advantageous because it can generalize in the action space. Therefore, in this thesis we study the state-of-the-art deep reinforcement learning algorithm Deep Deterministic Policy Gradients (DDPG). We provide a theoretical comparison to other popular methods, an evaluation of its performance, identify its limitations and investigate future directions of research. The remainder of the thesis is organized as follows. We start by introducing the field of interest, machine learning, focusing our attention on deep learning and reinforcement learning. We continue by describing in detail the two main algorithms at the core of this study, namely Deep Q-Network (DQN) and Deep Deterministic Policy Gradients (DDPG). We then provide implementation details of DDPG and our test environment, followed by a description of benchmark test cases. Finally, we discuss the results of our evaluation, identifying limitations of the current approach and proposing future avenues of research.
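The scaling argument against discretization can be made concrete: with k bins per action dimension, the number of joint discrete actions is k raised to the action dimensionality, as the short check below illustrates (bin count and dimensionalities chosen arbitrarily).

```python
k = 10                      # bins per action dimension
for d in (1, 3, 7):         # e.g. up to a 7-DoF manipulator
    print(d, k ** d)        # 10, 1000, 10000000 joint actions
```

This exponential growth is what motivates a parametrized continuous policy such as DDPG's deterministic actor.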


2017 ◽  
Vol 7 (1.5) ◽  
pp. 274
Author(s):  
D. Ganesha ◽  
Vijayakumar Maragal Venkatamuni

This research work presents an analysis of a modified SARSA learning algorithm. State-Action-Reward-State-Action (SARSA) is a technique for learning a Markov decision process (MDP) policy, used in reinforcement learning in the fields of artificial intelligence (AI) and machine learning (ML). The modified SARSA algorithm selects better actions to obtain better rewards. Experiments are conducted to evaluate the performance of each agent individually, and the same statistics were collected to compare results among different agents. This work considers varied kinds of agents at different levels of the architecture for the experimental analysis. The Fungus world testbed, implemented in SWI-Prolog 5.4.6, is used for the experiments; fixed obstacles are placed to create locations specific to the Fungus world environment, and various parameters are introduced into the environment to test an agent's performance. This modified SARSA learning algorithm can be more suitable for the EMCAP architecture. The experiments show that the modified SARSA learning system obtains more reward than the existing SARSA algorithm.
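For context, a reference sketch of the standard on-policy SARSA loop that the modified algorithm builds on is given below, assuming a Gym-style environment interface with integer states and actions; the paper's specific modification for choosing better actions is not reproduced here.

```python
import numpy as np

def eps_greedy(Q, s, eps):
    """Epsilon-greedy action selection over a tabular Q function."""
    return np.random.randint(Q.shape[1]) if np.random.rand() < eps else int(np.argmax(Q[s]))

def run_sarsa_episode(env, Q, alpha=0.1, gamma=0.9, eps=0.1):
    s, _ = env.reset()
    a = eps_greedy(Q, s, eps)
    done = False
    while not done:
        s_next, r, terminated, truncated, _ = env.step(a)
        done = terminated or truncated
        a_next = eps_greedy(Q, s_next, eps)
        # On-policy target: uses the action actually taken next, unlike Q-learning's max.
        Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] * (not done) - Q[s, a])
        s, a = s_next, a_next
    return Q
```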


1994 ◽  
Vol 05 (01) ◽  
pp. 67-75 ◽  
Author(s):  
BYOUNG-TAK ZHANG

Much previous work on training multilayer neural networks has attempted to speed up the backpropagation algorithm using more sophisticated weight modification rules, whereby all the given training examples are used in a random or predetermined sequence. In this paper we investigate an alternative approach in which the learning proceeds on an increasing number of selected training examples, starting with a small training set. We derive a measure of criticality of examples and present an incremental learning algorithm that uses this measure to select a critical subset of given examples for solving the particular task. Our experimental results suggest that the method can significantly improve training speed and generalization performance in many real applications of neural networks. This method can be used in conjunction with other variations of gradient descent algorithms.
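A hedged sketch of the incremental selection idea follows: training starts from a small subset and, at each round, the examples the current network gets most wrong are added. Using the prediction error as the criticality measure is an assumption standing in for the measure derived in the paper.

```python
import numpy as np

def grow_training_set(model_predict, X, y, selected_idx, batch=16):
    """Add the currently most 'critical' (worst-predicted) examples to the training subset."""
    remaining = np.setdiff1d(np.arange(len(X)), selected_idx)
    errors = np.abs(model_predict(X[remaining]) - y[remaining])   # 1-D targets assumed
    newly_critical = remaining[np.argsort(errors)[-batch:]]
    return np.concatenate([selected_idx, newly_critical])
```

After each growth step the network would be retrained (or fine-tuned) on the enlarged subset before the next selection round.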

