Averaged Soft Actor-Critic for Deep Reinforcement Learning

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Feng Ding ◽  
Guanfeng Ma ◽  
Zhikui Chen ◽  
Jing Gao ◽  
Peng Li

With the advent of the era of artificial intelligence, deep reinforcement learning (DRL) has achieved unprecedented success in high-dimensional and large-scale artificial intelligence tasks. However, the instability and unreliability of DRL algorithms have a significant impact on their performance. The Soft Actor-Critic (SAC) algorithm uses advanced update rules for the policy and value networks to alleviate some of these problems, but SAC still suffers from overestimation. To reduce the error caused by overestimation in SAC, we propose a new SAC algorithm called Averaged-SAC. By averaging the previously learned state-action value estimates, it reduces the overestimation problem of soft Q-learning, thereby contributing to a more stable training process and improved performance. We evaluate the performance of Averaged-SAC on several tasks in the MuJoCo environment. The experimental results show that the Averaged-SAC algorithm effectively improves both the performance of the SAC algorithm and the stability of the training process.
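The core idea is to replace the single most recent soft Q estimate in the Bellman target with an average over the last K learned estimates. Below is a minimal sketch of that averaging step, assuming the average is taken over next-state soft Q values from K stored critic snapshots; the names, shapes, and constants are illustrative, not the paper's code.

```python
import numpy as np

# Hypothetical illustration of the averaging idea behind Averaged-SAC: the soft
# Bellman target is built from the mean of the last K critic snapshots rather
# than from the single most recent critic (names and shapes are assumptions).
K = 5                      # number of stored critic snapshots
GAMMA, ALPHA = 0.99, 0.2   # discount factor and entropy temperature

def averaged_soft_target(q_snapshots, reward, next_features, next_log_pi, done):
    """Average the next-state Q estimates of the last K critics, then form the
    usual soft Bellman target used by SAC."""
    q_values = np.array([q(next_features) for q in q_snapshots])  # shape (K,)
    avg_next_q = q_values.mean()
    soft_value = avg_next_q - ALPHA * next_log_pi
    return reward + GAMMA * (1.0 - done) * soft_value

# toy usage with random linear critics standing in for neural networks
rng = np.random.default_rng(0)
snapshots = [lambda x, w=rng.normal(size=4): float(w @ x) for _ in range(K)]
print(averaged_soft_target(snapshots, reward=1.0,
                           next_features=rng.normal(size=4),
                           next_log_pi=-0.3, done=0.0))
```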

2021 ◽  
Author(s):  
Annapurna P Patil ◽  
SANJAY RAGHAVENDRA ◽  
Shruthi Srinarasi ◽  
Reshma Ram

Reinforcement Learning (RL) is the study of how Artificial Intelligence (AI) agents learn to make their own decisions in an environment to maximize the cumulative reward received. Although there has been notable progress in the application of RL to games, the category of ancient Indian games has remained almost untouched. Chowka Bhara is one such ancient Indian board game. This work aims at developing a Q-Learning-based RL Chowka Bhara player whose strategies and methodologies are obtained from three Strategic Players, viz. the Fast Player, Random Player, and Balanced Player. The experimental results show that the Q-Learning Player outperforms all three Strategic Players.
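For context, a tabular Q-learning player of this general kind maintains a table of state-action values and updates it from play. The sketch below shows only that generic update; the Chowka Bhara rules, state encoding, and the strategic opponents are omitted, and the names here are assumptions rather than the authors' implementation.

```python
import random
from collections import defaultdict

# Generic tabular Q-learning skeleton of the kind a board-game player could use.
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
Q = defaultdict(float)  # (state, action) -> estimated value

def choose_action(state, legal_moves):
    """Epsilon-greedy selection over the currently legal moves."""
    if random.random() < EPSILON:
        return random.choice(legal_moves)
    return max(legal_moves, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_legal_moves, done):
    """Standard Q-learning backup: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in next_legal_moves)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```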


2014 ◽  
Vol 571-572 ◽  
pp. 105-108
Author(s):  
Lin Xu

This paper proposes a new framework that combines reinforcement learning with a cloud computing digital library. Unified self-learning algorithms, which include reinforcement learning and other artificial intelligence methods, have led to many essential advances. Given the current status of highly-available models, analysts urgently desire the deployment of write-ahead logging. In this paper we examine how DNS can be applied to the investigation of superblocks, and introduce reinforcement learning to improve the quality of the current cloud computing digital library. The experimental results show that the method works more efficiently.


Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial Intelligence makes it possible to create engines that are able to explore and learn environments and therefore create policies that permit controlling them in real time with no human intervention. Through its Reinforcement Learning component, using frameworks such as temporal differences, State-Action-Reward-State-Action (SARSA), and Q-Learning to name a few, it can be applied to systems that can be perceived as a Markov Decision Process; this opens the door to applying Reinforcement Learning to Cloud Load Balancing, so that load can be dispatched dynamically to a given Cloud System. The authors describe different techniques that can be used to implement a Reinforcement Learning based engine in a cloud system.
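As a concrete illustration of the idea, the sketch below shows how a simple Q-learning dispatcher might pick a target server from a coarse view of server loads. The state discretisation, the reward, and all constants are assumptions made here for illustration, not a technique prescribed by the chapter.

```python
import random
from collections import defaultdict

# Hedged sketch of Q-learning-driven load dispatching: the state is a coarse view
# of server loads, the action picks the target server, and the reward penalises
# load imbalance. All choices below are illustrative assumptions.
N_SERVERS = 3
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = defaultdict(float)  # (state, server) -> value

def discretise(loads):
    """Bucket each server load (assumed in [0, 1]) into low / medium / high."""
    return tuple(min(int(l * 3), 2) for l in loads)

def dispatch(loads):
    """Pick the server that will receive the next request (epsilon-greedy)."""
    state = discretise(loads)
    if random.random() < EPSILON:
        return random.randrange(N_SERVERS), state
    return max(range(N_SERVERS), key=lambda a: Q[(state, a)]), state

def learn(state, action, loads_after):
    """Reward a flat load distribution; standard Q-learning backup."""
    reward = -(max(loads_after) - min(loads_after))
    next_state = discretise(loads_after)
    best_next = max(Q[(next_state, a)] for a in range(N_SERVERS))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```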


Author(s):  
Taichi Chujo ◽  
Kosei Nishida ◽  
Tatsushi Nishi

In modern large-scale fabrication, hundreds of vehicles are used for transportation. Since traffic conditions change rapidly, the routing of automated guided vehicles (AGVs) needs to be adapted accordingly. We propose a conflict-free routing method for AGVs that uses reinforcement learning in dynamic transportation. An advantage of the proposed method is that a change in the state can be obtained as an evaluation function, so the action can be selected according to the state. A deadlock avoidance method for bidirectional transport systems is developed using reinforcement learning. The effectiveness of the proposed method is demonstrated by comparing its performance with that of the conventional Q-learning algorithm in computational experiments.
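A minimal sketch of a Q-learning routing rule in this spirit is given below: the next edge is chosen epsilon-greedily from a Q-table, and a negative reward is assigned when the chosen edge would conflict with another vehicle. The graph representation, the conflict test, and the reward values are assumptions for illustration only, not the paper's method.

```python
import random
from collections import defaultdict

# Illustrative Q-learning rule for choosing an AGV's next edge while avoiding
# edges already occupied by other vehicles (all details are assumptions).
ALPHA, GAMMA, EPSILON = 0.2, 0.9, 0.1
Q = defaultdict(float)  # (state, edge) -> value

def select_edge(node, candidate_edges, occupied_edges):
    """Epsilon-greedy choice over outgoing edges, given which edges are occupied."""
    state = (node, frozenset(occupied_edges))
    if random.random() < EPSILON:
        return random.choice(candidate_edges), state
    return max(candidate_edges, key=lambda e: Q[(state, e)]), state

def reward_for(edge, occupied_edges, reached_goal):
    """Penalise conflicts, reward goal arrival, charge a small step cost otherwise."""
    if edge in occupied_edges:
        return -10.0
    return 10.0 if reached_goal else -1.0

def update(state, edge, reward, next_state, next_edges):
    best_next = max((Q[(next_state, e)] for e in next_edges), default=0.0)
    Q[(state, edge)] += ALPHA * (reward + GAMMA * best_next - Q[(state, edge)])
```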


Author(s):  
HIROAKI UEDA ◽  
HIDEAKI KIMOTO ◽  
TAKESHI NARAKI ◽  
KENICHI TAKAHASHI ◽  
TETSUHIRO MIYAHARA

We propose a new method to categorize continuous numeric percepts for Q-learning, in which percept vectors are classified into categories on the basis of fuzzy ART and Q-learning uses the categories as states to acquire rules for agent behavior. For efficient learning, we modify fuzzy ART to reduce the number of categories without deteriorating the efficiency of reinforcement learning. In our modification, a vigilance parameter is defined for each category in order to control the size of that category, and it is updated during learning. The update of a vigilance parameter is based on category integration, which contributes to reducing the number of categories. We define a similarity measure for any pair of categories to judge whether they should be integrated. When two categories are integrated into a new category, a vigilance parameter for the new category is calculated and the categories used for integration are discarded, so that the number of categories is reduced without imposing a fixed limit on it. Experimental results show that Q-learning with modified fuzzy ART acquires good rules for agent behavior more efficiently than Q-learning with ordinary fuzzy ART, even though the number of categories generated by the modified fuzzy ART is much smaller than that generated by the ordinary one.
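The sketch below illustrates fuzzy ART category selection with a per-category vigilance parameter, in the spirit of the modification described above; the category-integration step and the vigilance update themselves are not shown, and the constants and array shapes are assumptions.

```python
import numpy as np

# Fuzzy ART category selection with a per-category vigilance test. The winning
# category index would then serve as the discrete state for Q-learning.
ALPHA_CHOICE, BETA = 0.001, 1.0  # choice parameter and (fast) learning rate

def fuzzy_and(a, b):
    return np.minimum(a, b)

def select_category(percept, weights, vigilances):
    """Return the index of the winning category, or None if every category fails
    its match test. `percept` is assumed to be complement-coded; `weights` is a
    list of prototype vectors and `vigilances` the per-category vigilance values."""
    order = sorted(range(len(weights)),
                   key=lambda j: -np.sum(fuzzy_and(percept, weights[j]))
                                 / (ALPHA_CHOICE + np.sum(weights[j])))
    for j in order:
        match = np.sum(fuzzy_and(percept, weights[j])) / np.sum(percept)
        if match >= vigilances[j]:  # per-category vigilance test
            weights[j] = BETA * fuzzy_and(percept, weights[j]) + (1 - BETA) * weights[j]
            return j
    return None  # caller would create a new category here
```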


2021 ◽  
Author(s):  
Tiantian Zhang ◽  
Xueqian Wang ◽  
Bin Liang ◽  
Bo Yuan

The powerful learning ability of deep neural networks enables reinforcement learning (RL) agents to learn competent control policies directly from high-dimensional and continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, which unfortunately does not hold in the general RL paradigm, where the training data is temporally correlated and non-stationary. This issue may lead to the phenomenon of "catastrophic interference" (a.k.a. "catastrophic forgetting") and a collapse in performance, as later training is likely to overwrite and interfere with previously learned good policies. In this paper, we introduce the concept of "context" into single-task RL and develop a novel scheme, termed Context Division and Knowledge Distillation (CDaKD) driven RL, to divide all states experienced during training into a series of contexts. Its motivation is to mitigate the aforementioned catastrophic interference in deep RL, thereby improving the stability and plasticity of RL models. At the heart of CDaKD is a value function, parameterized by a neural network feature extractor shared across all contexts, and a set of output heads, each specializing in an individual context. In CDaKD, we exploit online clustering to achieve context division, and interference is further alleviated by a knowledge distillation regularization term on the output layers for learned contexts. In addition, to effectively obtain the context division in high-dimensional state spaces (e.g., image inputs), we perform clustering in the lower-dimensional representation space of a randomly initialized convolutional encoder, which is fixed throughout training. Our results show that, with various replay memory capacities, CDaKD can consistently improve the performance of existing RL algorithms on classic OpenAI Gym tasks and the more complex high-dimensional Atari tasks, incurring only moderate computational overhead.
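The following PyTorch sketch illustrates the multi-head structure and the distillation term described above: a shared feature extractor, one output head per context, and a penalty that keeps the heads of previously learned contexts close to recorded outputs. Layer sizes, the clustering step (omitted here), and the loss weighting are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadQ(nn.Module):
    """Shared feature extractor with one Q-value head per context."""
    def __init__(self, obs_dim, n_actions, n_contexts, hidden=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_contexts)])

    def forward(self, obs, context_id):
        return self.heads[context_id](self.shared(obs))

def cdakd_style_loss(model, obs, actions, td_targets, context_id, old_outputs, beta=1.0):
    """TD loss on the current context plus a distillation penalty that discourages
    the heads of other (already learned) contexts from drifting away from their
    previously recorded outputs. `old_outputs` maps context id -> saved tensor."""
    q = model(obs, context_id).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, td_targets)
    for cid, old in old_outputs.items():
        if cid != context_id:
            loss = loss + beta * F.mse_loss(model(obs, cid), old)
    return loss
```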


Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 773 ◽  
Author(s):  
Xueting Wang ◽  
Jun Cheng ◽  
Lei Wang

Understanding or estimating co-evolution processes is critical in ecology, but very challenging. Traditional methods struggle to deal with the complex processes of evolution and to predict their consequences in nature. In this paper, we use deep reinforcement learning algorithms to endow organisms with learning ability, and simulate their evolution process using a Monte Carlo simulation algorithm in a large-scale ecosystem. The combination of the two algorithms allows organisms to use experience to determine their behavior through interaction with the environment, and to pass on that experience to their offspring. Our research shows that the predators' reinforcement learning ability contributes to the stability of the ecosystem and helps predators acquire a more reasonable behavior pattern of coexistence with their prey. The reinforcement learning effect of prey on its own population is not as good as that of predators and increases the risk of extinction of predators. Inconsistent learning periods and speeds of prey and predators aggravate that risk. The co-evolution of the two species results in smaller populations due to their potentially antagonistic evolutionary networks. If learnable predators and prey invade an ecosystem at the same time, the prey has an advantage. Thus, the proposed model illustrates the influence of a learning mechanism on a predator–prey ecosystem and demonstrates the feasibility of predicting behavior evolution in a predator–prey ecosystem using AI approaches.
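Schematically, the "pass on experience to offspring" mechanism could look like the sketch below: each organism carries its own learning agent and reproduction hands a copy of the learned parameters to the child. The simple tabular learner here is a stand-in for the paper's deep RL agents, and every name is a placeholder rather than the authors' implementation.

```python
import copy

class TabularAgent:
    """Stand-in learner (the paper uses deep RL, which this does not reproduce)."""
    def __init__(self):
        self.values = {}  # (observation, action) -> estimated value

    def act(self, obs):
        return max(("hunt", "flee", "graze"),
                   key=lambda a: self.values.get((obs, a), 0.0))

    def learn(self, obs, action, reward):
        key = (obs, action)
        self.values[key] = self.values.get(key, 0.0) + 0.1 * (reward - self.values.get(key, 0.0))

class Organism:
    def __init__(self, agent=None):
        self.agent = agent or TabularAgent()

    def reproduce(self):
        # offspring inherit a copy of the parent's learned parameters ("experience")
        return Organism(copy.deepcopy(self.agent))
```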


2020 ◽  
Vol 34 (05) ◽  
pp. 7219-7226
Author(s):  
Hangyu Mao ◽  
Wulong Liu ◽  
Jianye Hao ◽  
Jun Luo ◽  
Dong Li ◽  
...  

Social psychology and real experiences show that cognitive consistency plays an important role in keeping human society in order: if people have a more consistent cognition about their environments, they are more likely to achieve better cooperation. Meanwhile, only cognitive consistency within a neighborhood matters, because humans only interact directly with their neighbors. Inspired by these observations, we take the first step to introduce neighborhood cognitive consistency (NCC) into multi-agent reinforcement learning (MARL). Our NCC design is quite general and can easily be combined with existing MARL methods. As examples, we propose neighborhood cognition consistent deep Q-learning and Actor-Critic to facilitate large-scale multi-agent cooperation. Extensive experiments on several challenging tasks (i.e., packet routing, Wi-Fi configuration, and Google football player control) justify the superior performance of our methods compared with state-of-the-art MARL approaches.
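One way to picture the neighborhood cognitive consistency idea is as a regulariser, added to the usual RL loss, that pulls the latent "cognition" vectors of neighbouring agents towards each other. The sketch below assumes each agent already produces such a vector; how that vector is computed and how the term is weighted are assumptions here, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def ncc_regulariser(cognitions, neighbours):
    """Mean squared distance between the cognition vectors of neighbouring agents.
    cognitions: tensor of shape (n_agents, d); neighbours: dict agent -> neighbour ids."""
    loss = cognitions.new_zeros(())
    count = 0
    for i, nbrs in neighbours.items():
        for j in nbrs:
            loss = loss + F.mse_loss(cognitions[i], cognitions[j])
            count += 1
    return loss / max(count, 1)

# hypothetical usage: total_loss = td_loss + lambda_ncc * ncc_regulariser(cognitions, neighbours)
```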


2011 ◽  
Vol 216 ◽  
pp. 75-80 ◽  
Author(s):  
Chang An Liu ◽  
Fei Liu ◽  
Chun Yang Liu ◽  
Hua Wu

To solve the curse-of-dimensionality problem in multi-agent reinforcement learning, a learning method based on k-means is presented in this paper. In this method, the environmental state is represented by key state factors, and the state-space explosion is avoided by classifying states into different clusters using k-means. The learning rate is improved by assigning new states to existing clusters, together with the corresponding strategies. Compared with traditional Q-learning, our experimental results on multi-robot cooperation show that the scheme improves the team's learning ability efficiently, while the cooperation efficiency is also successfully enhanced.
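As an illustration of this state-abstraction step, the sketch below clusters continuous joint states with k-means and uses the cluster index as the discrete state of an ordinary Q-table. The cluster count, feature layout, and Q-table handling are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

# Cluster continuous joint states, then index a Q-table by cluster id.
N_CLUSTERS, N_ACTIONS = 20, 4

states = np.random.rand(500, 6)                # e.g. positions of several robots
kmeans = KMeans(n_clusters=N_CLUSTERS, n_init=10).fit(states)
Q = np.zeros((N_CLUSTERS, N_ACTIONS))          # Q-table indexed by cluster id

def q_state(raw_state):
    """Map a continuous joint state to its cluster index."""
    return int(kmeans.predict(raw_state.reshape(1, -1))[0])
```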

