Automatic abstraction controller in reinforcement learning agent via automata

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice can significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. Requiring human interaction every time an experiment is restarted is therefore undesirable, particularly when the expense of doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.
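
As a rough illustration of what a simulated user might look like, the sketch below models an advisor with two tunable characteristics. The class name `SimulatedUser` and the `availability` and `accuracy` parameters are assumptions made for this example; the abstract does not specify which characteristics the authors vary or how.

```python
import random


class SimulatedUser:
    """Hypothetical simulated advisor (illustrative, not the authors' model).

    availability: probability that the user offers advice on a given step.
    accuracy: probability that offered advice matches the reference policy.
    """

    def __init__(self, reference_policy, n_actions,
                 availability=0.5, accuracy=0.9, seed=0):
        self.reference_policy = reference_policy  # dict: state -> best action
        self.n_actions = n_actions
        self.availability = availability
        self.accuracy = accuracy
        self.rng = random.Random(seed)  # seeded so experiments are repeatable

    def advise(self, state):
        """Return an advised action, or None if the user stays silent."""
        if self.rng.random() >= self.availability:
            return None
        best = self.reference_policy[state]
        if self.rng.random() < self.accuracy:
            return best
        # Inaccurate advice: pick any action other than the reference one.
        wrong = [a for a in range(self.n_actions) if a != best]
        return self.rng.choice(wrong) if wrong else best


# Two user types sharing the same knowledge but differing in reliability.
expert = SimulatedUser({0: 1}, n_actions=3, availability=1.0, accuracy=1.0)
novice = SimulatedUser({0: 1}, n_actions=3, availability=1.0, accuracy=0.0)
```

Because the advisor is seeded, the same experiment can be rerun with identical advice while agent parameters are changed, which is the repeatability argument the abstract makes.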

Location- and Person-Independent Activity Recognition with WiFi, Deep Neural Networks, and Reinforcement Learning

ACM Transactions on Internet of Things ◽

10.1145/3424739 ◽

2021 ◽

Vol 2 (1) ◽

pp. 1-25

Author(s):

Yongsen Ma ◽

Sheheryar Arshad ◽

Swetha Muniraju ◽

Eric Torkildson ◽

Enrico Rantala ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Reinforcement Learning ◽

Activity Recognition ◽

Deep Neural Networks ◽

State Machine ◽

Recognition Algorithm ◽

The State ◽

Neural Architecture ◽

Learning Agent

In recent years, Channel State Information (CSI) measured by WiFi has been widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of CSI data. The state machine learns temporal dependency information from the history of classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations/orientations, and multiple persons. The proposed design has 97% average accuracy when the testing devices and persons are not seen during training. The proposed design is also evaluated on two public datasets, with accuracies of 80% and 83%. The proposed design needs very little human effort for ground truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.

Crowd Evacuation Guidance Based on Combined Action Reinforcement Learning

Algorithms ◽

10.3390/a14010026 ◽

2021 ◽

Vol 14 (1) ◽

pp. 26

Author(s):

Yiran Xue ◽

Rui Wu ◽

Jiafeng Liu ◽

Xianglong Tang

Keyword(s):

Reinforcement Learning ◽

Guidance System ◽

Force Model ◽

Interactive Simulation ◽

Social Force ◽

Novel Approach ◽

Learning Agent ◽

Network Output ◽

Combined Action ◽

Crowd Evacuation

Existing crowd evacuation guidance systems require the manual design of models and input parameters, incurring a significant workload and a potential for errors. This paper proposes an end-to-end intelligent evacuation guidance method based on deep reinforcement learning and designs an interactive simulation environment based on the social force model. The agent can automatically learn a scene model and path planning strategy with only scene images as input, and directly output dynamic signage information. To address the “dimension disaster” (curse of dimensionality) of the deep Q network (DQN) algorithm in crowd evacuation, the paper proposes a combined action-space DQN (CA-DQN) algorithm that groups Q network output layer nodes according to action dimensions, which significantly reduces the network complexity and improves system practicality in complex scenes. The evacuation guidance system is defined as a reinforcement learning agent and implemented by the CA-DQN method, which provides a novel approach to the evacuation guidance problem. The experiments demonstrate that the proposed method is superior to the static guidance method and on par with the manually designed model method.
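
The effect of grouping output nodes by action dimension can be sketched with simple arithmetic. The code below is a minimal illustration, assuming each action dimension (for example, one guidance sign) has its own small set of choices and that the best choice is taken independently per group; the function names are hypothetical and the abstract does not give the paper's exact layer layout.

```python
def flat_output_size(choices_per_dim):
    """Output nodes needed when every joint action gets its own Q-value."""
    size = 1
    for k in choices_per_dim:
        size *= k
    return size


def grouped_output_size(choices_per_dim):
    """Output nodes needed when nodes are grouped by action dimension."""
    return sum(choices_per_dim)


def select_combined_action(q_values, choices_per_dim):
    """Pick the highest-valued choice inside each group of output nodes."""
    action = []
    start = 0
    for k in choices_per_dim:
        group = q_values[start:start + k]
        action.append(max(range(k), key=lambda i: group[i]))
        start += k
    return action


# Four signs with three possible directions each (illustrative numbers only).
dims = [3, 3, 3, 3]
flat_nodes = flat_output_size(dims)        # grows multiplicatively
grouped_nodes = grouped_output_size(dims)  # grows additively
```

The flat layout grows as the product of the per-dimension choices, while the grouped layout grows as their sum, which is where the reduction in network complexity comes from.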

Autonomous reinforcement learning agent for chemical vapor deposition synthesis of quantum materials

npj Computational Materials ◽

10.1038/s41524-021-00535-3 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Pankaj Rajak ◽

Aravind Krishnamoorthy ◽

Ankit Mishra ◽

Rajiv Kalia ◽

Aiichiro Nakano ◽

...

Keyword(s):

Chemical Vapor Deposition ◽

Reinforcement Learning ◽

Vapor Deposition ◽

Chemical Vapor ◽

Time Behavior ◽

Materials Synthesis ◽

Design Synthesis ◽

Learning Agent ◽

Threshold Temperatures ◽

Quantum Materials

Predictive materials synthesis is the primary bottleneck in realizing functional and quantum materials. Strategies for synthesis of promising materials are currently identified by time-consuming trial and error and there are no known predictive schemes to design synthesis parameters for materials. We use offline reinforcement learning (RL) to predict optimal synthesis schedules, i.e., a time-sequence of reaction conditions like temperatures and concentrations, for the synthesis of semiconducting monolayer MoS2 using chemical vapor deposition. The RL agent, trained on 10,000 computational synthesis simulations, learned threshold temperatures and chemical potentials for onset of chemical reactions and predicted previously unknown synthesis schedules that produce well-sulfidized crystalline, phase-pure MoS2. The model can be extended to multi-task objectives such as predicting profiles for synthesis of complex structures including multi-phase heterostructures and can predict long-time behavior of reacting systems, far beyond the domain of molecular dynamics simulations, making these predictions directly relevant to experimental synthesis.

A Confrontation Decision-Making Method with Deep Reinforcement Learning and Knowledge Transfer for Multi-Agent System

Symmetry ◽

10.3390/sym12040631 ◽

2020 ◽

Vol 12 (4) ◽

pp. 631

Author(s):

Chunyang Hu

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Knowledge Transfer ◽

Large Scale ◽

Effective Control ◽

Small Scale ◽

Learning Agent ◽

Multi Agent ◽

Transfer Method ◽

Parameter Sharing

In this paper, deep reinforcement learning (DRL) and knowledge transfer are used to achieve effective control of the learning agent for confrontation in multi-agent systems. Firstly, a multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm with parameter sharing is proposed to achieve confrontation decision-making for multiple agents. During training, the information of other agents is introduced to the critic network to improve the confrontation strategy. The parameter sharing mechanism can reduce the loss of experience storage. In the DDPG algorithm, we use four neural networks to generate real-time actions and the Q-value function respectively, and use a momentum mechanism to optimize the training process and accelerate the convergence of the neural networks. Secondly, this paper introduces an auxiliary controller using a policy-based reinforcement learning (RL) method to assist the decision-making of the game agent. In addition, an effective reward function is used to help agents balance enemy losses against friendly losses. Furthermore, this paper also uses the knowledge transfer method to extend the learning model to more complex scenes and improve the generalization of the proposed confrontation model. Two confrontation decision-making experiments are designed to verify the effectiveness of the proposed method. In a small-scale task scenario, the trained agent can successfully learn to fight with the competitors and achieve a good winning rate. For large-scale confrontation scenarios, the knowledge transfer method can gradually improve the decision-making level of the learning agent.

What Can You Do with a Rock? Affordance Extraction via Word Embeddings

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/144 ◽

2017 ◽

Cited By ~ 9

Author(s):

Nancy Fulda ◽

Daniel Ricks ◽

Ben Murdoch ◽

David Wingate

Keyword(s):

Reinforcement Learning ◽

Computational Complexity ◽

Linear Algebra ◽

Autonomous Agents ◽

Common Knowledge ◽

Search Space ◽

Word Embeddings ◽

Knowledge Database ◽

Learning Agent ◽

Action Spaces

Autonomous agents must often detect affordances: the set of behaviors enabled by a situation. Affordance extraction is particularly helpful in domains with large action spaces, allowing the agent to prune its search space by avoiding futile behaviors. This paper presents a method for affordance extraction via word embeddings trained on a tagged Wikipedia corpus. The resulting word vectors are treated as a common knowledge database which can be queried using linear algebra. We apply this method to a reinforcement learning agent in a text-only environment and show that affordance-based action selection improves performance in most cases. Our method increases the computational complexity of each learning step but significantly reduces the total number of steps needed. In addition, the agent's action selections begin to resemble those a human would choose.
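
One common way to query word vectors with linear algebra is a vector-offset analogy followed by cosine ranking. The sketch below illustrates that general idea only; whether it matches the authors' exact query is an assumption, the function names are hypothetical, and the three-dimensional vectors are hand-made toy values rather than embeddings trained on any corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_affordances(noun, example_verb, example_noun, candidate_verbs, vectors):
    """Rank candidate verbs for `noun` using one verb:noun example pair."""
    # Offset that maps the example noun onto a verb applicable to it.
    offset = [v - n for v, n in zip(vectors[example_verb], vectors[example_noun])]
    # Apply the same offset to the target noun to form the query vector.
    query = [x + o for x, o in zip(vectors[noun], offset)]
    return sorted(candidate_verbs,
                  key=lambda verb: cosine(query, vectors[verb]),
                  reverse=True)


# Toy vectors, invented purely for illustration.
toy_vectors = {
    "song": [1.0, 0.0, 0.0],
    "sing": [1.0, 1.0, 0.0],
    "rock": [0.0, 0.0, 1.0],
    "throw": [0.0, 1.0, 1.0],
    "hum": [1.0, 1.0, 0.2],
}

ranked = rank_affordances("rock", "sing", "song", ["hum", "throw"], toy_vectors)
```

An agent could then restrict its action search to the top-ranked verbs, which is the pruning effect the abstract describes.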

Methods and Algorithms for Knowledge Reuse in Multiagent Reinforcement Learning

10.5753/ctd.2020.11360 ◽

2020 ◽

Author(s):

Felipe Leno Da Silva ◽

Anna Helena Reali Costa

Keyword(s):

Reinforcement Learning ◽

Transfer Learning ◽

Learning Process ◽

Trial And Error ◽

Knowledge Reuse ◽

Previous Knowledge ◽

Learning Methods ◽

Types Of Knowledge ◽

Learning Agent ◽

Multiagent Reinforcement Learning

Reinforcement Learning (RL) is a powerful tool that has been used to solve increasingly complex tasks. RL operates through repeated interactions of the learning agent with the environment, via trial and error. However, this learning process is extremely slow, requiring many interactions. In this thesis, we leverage previous knowledge so as to accelerate learning in multiagent RL problems. We propose knowledge reuse both from previous tasks and from other agents. Several flexible methods are introduced so that each of these two types of knowledge reuse is possible. This thesis adds important steps towards more flexible and broadly applicable multiagent transfer learning methods.

Label Enhancement for Label Distribution Learning via Prior Knowledge

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/446 ◽

2020 ◽

Author(s):

Yongbiao Gao ◽

Yu Zhang ◽

Xin Geng

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Emotion Recognition ◽

Prior Knowledge ◽

Decision Process ◽

Age Estimation ◽

State Of The Art ◽

Learning Agent ◽

Label Distribution Learning ◽

Label Distribution

Label distribution learning (LDL) is a novel machine learning paradigm that gives a description degree of each label to an instance. However, most training datasets contain only simple logical labels rather than label distributions, due to the difficulty of obtaining label distributions directly. We propose to use prior knowledge to recover the label distributions. The process of recovering the label distributions from the logical labels is called label enhancement. In this paper, we formulate label enhancement as a dynamic decision process. Thus, the label distribution is adjusted by a series of actions conducted by a reinforcement learning agent according to sequential state representations. The target state is defined by the prior knowledge. Experimental results show that the proposed approach outperforms the state-of-the-art methods in both age estimation and image emotion recognition.

Deep Reinforcement Learning Agent for Playing 2D Shooting Games

International Journal of Control and Automation ◽

10.14257/ijca.2018.11.3.17 ◽

2018 ◽

Vol 11 (3) ◽

pp. 193-200 ◽

Cited By ~ 1

Author(s):

Dongcheul Lee ◽

Janise McNair

Keyword(s):

Reinforcement Learning ◽

Learning Agent

AUTOMATED VULNERABILITY SEARCH IN A WEB APPLICATION BASED ON REINFORCEMENT LEARNING

CASPIAN JOURNAL Control and High Technologies ◽

10.21672/2074-1707.2021.53.1.091-097 ◽

2021 ◽

Vol 53 (1) ◽

pp. 91-97

Author(s):

OLGA N. VYBORNOVA ◽

ALEKSANDER N. RYZHIKOV

Keyword(s):

Reinforcement Learning ◽

Web Application ◽

Web Applications ◽

Subject Area ◽

Learning Technology ◽

Web Application Security ◽

Vulnerability Scanner ◽

Learning Agent ◽

Markov Decision ◽

Python Programming

We analyzed the urgency of creating a more efficient (compared to analogues) means of automated vulnerability search based on modern technologies. We showed the similarity of the vulnerability identification process to a Markov decision process and justified the feasibility of using reinforcement learning to solve this problem. Since analysis of web application security is currently the highest-priority and most in-demand task, this work considers the application of the mathematical apparatus of reinforcement learning to this subject area. The mathematical model is presented, and the specifics of the training and testing processes for the problem of automated vulnerability search in web applications are described. Based on an analysis of the OWASP Testing Guide, an action space and a set of environment states are identified. The characteristics of the software implementation of the proposed model are described: Q-learning is implemented in the Python programming language, and a neural network was created to implement the learning policy using the TensorFlow library. We demonstrated the results of the reinforcement learning agent on a real web application and compared them with the report of the Acunetix Vulnerability Scanner. The findings indicate that the proposed solution is promising.
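
For readers unfamiliar with Q-learning, the standard update rule can be sketched in a few lines. This is a tabular simplification, not the authors' implementation (which, per the abstract, uses a neural network built with TensorFlow); the state and action names below are hypothetical placeholders and are not taken from the paper or from the OWASP Testing Guide.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update and return the new Q-value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical action and state names, for illustration only.
actions = ["probe_login", "probe_search"]
q = defaultdict(float)  # unseen (state, action) pairs default to 0.0

first = q_learning_update(q, "start", "probe_search", 1.0, "found_form",
                          actions, alpha=0.5)
second = q_learning_update(q, "start", "probe_search", 1.0, "found_form",
                           actions, alpha=0.5)
```

Repeating the same rewarded transition moves the estimate toward the reward step by step, which is how the agent gradually learns which test actions pay off in a given state.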
