Normative Rule Extraction from Implicit Learning into Explicit Representation

Author(s):  
Mohd Rashdan Abdul Kadir ◽  
Ali Selamat ◽  
Ondrej Krejcar

Normative multi-agent research offers an alternative viewpoint on the design of adaptive autonomous agent architectures. Norms specify standards of behavior, such as which actions or states should be achieved or avoided. Norm synthesis is the process of generating useful normative rules. This study proposes a model for extracting normative rules from implicit learning, namely the Q-learning algorithm, into an explicit norm representation, implementing Dynamic Deontics and a Hierarchical Knowledge Base (HKB) to synthesize useful normative rules in the form of weighted state-action pairs with deontic modality. OpenAI Gym is used to simulate the agent environment. The proposed model is able to generate both obligative and prohibitive norms, as well as to deliberate over and execute said norms. Results show that the generated norms are best used as prior knowledge to guide agent behavior and perform poorly if not complemented by another agent coordination mechanism. Performance increases when both obligation and prohibition norms are used, and in general, norms speed up reachability of the optimum policy.
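The core idea of turning a learned Q-table into weighted deontic rules can be sketched as follows. This is an illustrative reading of the approach, not the paper's actual Dynamic Deontics/HKB implementation; the names (`extract_norms`, `Norm`) and the normalization thresholds are assumptions.

```python
# Hypothetical sketch: extract obligation/prohibition norms from a tabular
# Q-function by normalizing Q-values per state and thresholding them.
from collections import namedtuple

Norm = namedtuple("Norm", ["modality", "state", "action", "weight"])

def extract_norms(q_table, hi=0.8, lo=0.2):
    """Turn weighted state-action pairs into norms with deontic modality."""
    norms = []
    for state, action_values in q_table.items():
        vmax, vmin = max(action_values.values()), min(action_values.values())
        span = (vmax - vmin) or 1.0
        for action, q in action_values.items():
            w = (q - vmin) / span            # normalize Q to [0, 1]
            if w >= hi:                      # strongly preferred -> obligation
                norms.append(Norm("obligation", state, action, w))
            elif w <= lo:                    # strongly dispreferred -> prohibition
                norms.append(Norm("prohibition", state, action, w))
    return norms

q = {"s0": {"left": 0.9, "right": 0.1, "stay": 0.5}}
norms = extract_norms(q)
```

Middling actions generate no norm at all, which matches the intuition that norms should encode only clearly good or clearly bad behavior.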

2020 ◽  
Vol 17 (2) ◽  
pp. 647-664
Author(s):  
Yangyang Ge ◽  
Fei Zhu ◽  
Wei Huang ◽  
Peiyao Zhao ◽  
Quan Liu

Multi-agent systems have broad real-world applications, yet their security performance is rarely considered. Reinforcement learning is one of the most important methods for solving multi-agent problems, and progress has been made in applying multi-agent reinforcement learning to robot systems, man-machine games, automation, and related areas. In these areas, however, an agent may fall into unsafe states, where it may find it difficult to bypass obstacles, receive information from other agents, and so on. Ensuring the safety of a multi-agent system is of great importance where such dangerous states are irreversible and cause great damage. To solve this safety problem, this paper introduces a multi-agent cooperation Q-learning algorithm based on a constrained Markov game. In this method, safety constraints are added to the action set, and each agent, when interacting with the environment in search of optimal values, is restricted by the safety rules, so as to obtain an optimal policy that satisfies the security requirements. Since traditional multi-agent reinforcement learning algorithms are no longer suitable for the proposed model, a new solution is introduced for calculating the globally optimal state-action function that satisfies the safety constraints. The Lagrange multiplier method is used to determine the optimal action in the current state, on the premise of linearizing the constraint functions and under the condition that the state-action function and the constraint function are both differentiable. This not only improves the efficiency and accuracy of the algorithm but also guarantees a globally optimal solution. Experiments verify the effectiveness of the algorithm.
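A minimal sketch of the Lagrangian idea behind such constrained action selection, under assumed conventions (not the authors' exact formulation): the agent maximizes the value minus a multiplier-weighted safety cost, and the multiplier grows whenever the observed cost exceeds the safety budget.

```python
# Illustrative Lagrangian-relaxed safe action selection for one agent.
# q_row and c_row are dicts mapping action -> Q-value / safety cost;
# lam is the Lagrange multiplier, budget_d the allowed expected cost.
def safe_action(q_row, c_row, lam):
    """Pick argmax_a Q(s,a) - lam * C(s,a)."""
    return max(q_row, key=lambda a: q_row[a] - lam * c_row[a])

def update_multiplier(lam, observed_cost, budget_d, lr=0.1):
    # Gradient ascent on the dual: raise lam when the budget is exceeded,
    # lower it (toward zero) when the constraint is slack.
    return max(0.0, lam + lr * (observed_cost - budget_d))

q = {"fast": 1.0, "slow": 0.6}
c = {"fast": 0.9, "slow": 0.1}   # hypothetical per-action safety costs
a_unconstrained = safe_action(q, c, lam=0.0)   # greedy on Q alone
a_constrained = safe_action(q, c, lam=1.0)     # safety cost dominates
```

With the multiplier at zero the agent behaves like plain Q-learning; as unsafe outcomes accumulate, the multiplier pushes it toward the low-cost action.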


2012 ◽  
Vol 566 ◽  
pp. 572-579
Author(s):  
Abdolkarim Niazi ◽  
Norizah Redzuan ◽  
Raja Ishak Raja Hamzah ◽  
Sara Esfandiari

In this paper, a new algorithm based on case-based reasoning and reinforcement learning (RL) is proposed to increase the convergence rate of RL algorithms. RL algorithms are useful for solving a wide variety of decision problems whose models are not available and where a correct decision must be made in every state of the system, such as multi-agent systems, artificial control systems, robotics, and tool condition monitoring. In the proposed method, we investigate how to improve action selection in RL: a combined model using a case-based reasoning system and a new optimized function is proposed to select the action, which increases the convergence rate of Q-learning-based algorithms. The algorithm was applied to the problem of cooperative Markov games, one of the models of Markov-based multi-agent systems. Experimental results indicated that the proposed algorithm performs better than existing algorithms in terms of speed and accuracy in reaching the optimal policy.
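One way such case-based action selection can be layered on Q-learning is sketched below; the distance metric, the similarity threshold `tau`, and the function names are illustrative assumptions, not the paper's actual formulation.

```python
# Hedged sketch: reuse the action of the nearest stored case when it is
# similar enough to the current state, otherwise fall back to the greedy
# Q-table action. States are tuples of floats.
def cbr_select(state, case_base, q_row, tau=1.0):
    """case_base: list of (case_state, action) pairs."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = min(case_base, key=lambda c: dist(state, c[0]), default=None)
    if nearest and dist(state, nearest[0]) <= tau:
        return nearest[1]                    # reuse the similar case's action
    return max(q_row, key=q_row.get)         # fall back to greedy Q action

cases = [((0.0, 0.0), "up")]
q = {"up": 0.1, "down": 0.9}
near = cbr_select((0.1, 0.1), cases, q)      # close to a stored case
far = cbr_select((5.0, 5.0), cases, q)       # no similar case available
```

Seeding action selection with retrieved cases is what lets the agent skip much of the early random exploration, which is where the claimed convergence speed-up would come from.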


Electronics ◽  
2019 ◽  
Vol 8 (2) ◽  
pp. 231 ◽  
Author(s):  
Panagiotis Kofinas ◽  
Anastasios I. Dounis

This paper proposes a hybrid Ziegler-Nichols (Z-N) fuzzy reinforcement learning MAS (Multi-Agent System) approach for online tuning of a Proportional-Integral-Derivative (PID) controller in order to control the flow rate of a desalination unit. The PID gains are first set by the Z-N method and then adapted online through a fuzzy Q-learning MAS. Fuzzy Q-learning is introduced in each agent in order to cope with the continuous state-action space. The global state of the MAS is defined by the value of the error and the derivative of the error. The MAS consists of three agents, and the output signal of each agent defines the percentage change of one gain; each gain can be increased or decreased by up to 100% of its initial value. The simulation results highlight the performance of the suggested hybrid control strategy through comparison with a conventional PID controller tuned by Z-N.
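The two stages combine as sketched below: classic Ziegler-Nichols tuning from the ultimate gain Ku and oscillation period Tu yields the initial gains, and each agent then shifts one gain within the ±100% bound described above. The function names are illustrative, and the learning loop that produces the agent outputs is omitted.

```python
# Stage 1: classic closed-loop Ziegler-Nichols PID rules
#   Kp = 0.6 Ku,  Ti = Tu/2,  Td = Tu/8  ->  Ki = Kp/Ti,  Kd = Kp*Td.
def ziegler_nichols_pid(Ku, Tu):
    Kp = 0.6 * Ku
    Ki = 1.2 * Ku / Tu
    Kd = 0.075 * Ku * Tu
    return Kp, Ki, Kd

# Stage 2: bounded online adjustment by one agent's output signal.
def adapt_gain(initial_gain, agent_output):
    """agent_output in [-1, 1]: fraction of the initial gain to add/remove."""
    delta = max(-1.0, min(1.0, agent_output))
    return initial_gain * (1.0 + delta)

Kp, Ki, Kd = ziegler_nichols_pid(Ku=2.0, Tu=1.0)
Kp_tuned = adapt_gain(Kp, 0.5)       # agent asks for +50% of the Z-N value
```

Clamping the agent output keeps every tuned gain non-negative and within twice its Z-N value, so the learner can never destabilize the loop by an unbounded gain change.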


Sensors ◽  
2018 ◽  
Vol 18 (11) ◽  
pp. 3606 ◽  
Author(s):  
Wanli Xue ◽  
Zhiyong Feng ◽  
Chao Xu ◽  
Zhaopeng Meng ◽  
Chengwei Zhang

Although tracking research has achieved excellent performance from a mathematical perspective, it remains worthwhile to analyze tracking problems from multiple angles. This motivation not only promotes the independence of tracking research but also increases the flexibility of practical applications. This paper presents a tracking framework based on reinforcement learning over a multi-dimensional state-action space, termed multi-angle analysis collaboration tracking (MACT). MACT comprises a basic tracking framework and a strategic framework that assists it. The strategic framework is extensible and currently includes a feature selection strategy (FSS) and a movement trend strategy (MTS). These strategies are abstracted from multi-angle analysis of tracking problems (the observer's attention and the object's motion), and the content of the analysis corresponds to specific actions in the multi-dimensional action space. Concretely, the tracker, regarded as an agent, is trained with the Q-learning algorithm and an ϵ-greedy exploration strategy, where a customized rewarding function encourages robust object tracking. Extensive comparative evaluations on the OTB50 benchmark demonstrate the effectiveness of the strategies and the improvement in speed and accuracy of the MACT tracker.
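The tabular Q-learning update with ϵ-greedy exploration that such an agent builds on can be sketched generically; the action names and reward here are stand-ins, not the paper's customized rewarding function.

```python
# Generic tabular Q-learning with epsilon-greedy exploration.
import random

def epsilon_greedy(q_row, eps, rng=random):
    if rng.random() < eps:                    # explore: random action
        return rng.choice(list(q_row))
    return max(q_row, key=q_row.get)          # exploit: greedy action

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[s_next].values()) if q.get(s_next) else 0.0
    q[s][a] += alpha * (r + gamma * best_next - q[s][a])

q = {"s": {"shift_left": 0.0, "shift_right": 0.0}, "s'": {"stay": 1.0}}
q_update(q, "s", "shift_right", r=1.0, s_next="s'")   # reward for good tracking
```

In the tracking setting, the state would encode the observed appearance and motion cues, and actions such as feature switches or bounding-box moves populate the multi-dimensional action space.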


2000 ◽  
Vol 14 (2) ◽  
pp. 243-258 ◽  
Author(s):  
V. S. Borkar

A simulation-based algorithm for learning good policies for a discrete-time stochastic control process with unknown transition law is analyzed when the state and action spaces are compact subsets of Euclidean spaces. This extends the Q-learning scheme of discrete state/action problems along the lines of Baker [4]. Almost sure convergence is proved under suitable conditions.
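The analysis above is theoretical, but a common practical route to Q-learning over a compact (continuous) state space is uniform discretization, mapping each coordinate of the compact set into a finite grid of bins before applying a tabular update. The sketch below is illustrative only and is not the scheme analyzed in the paper.

```python
# Map a scalar x lying in the compact interval [low, high] to one of
# n_bins uniform bins; a tabular Q-learning update then operates on the
# resulting discrete index.
def discretize(x, low, high, n_bins):
    """Return a bin index in {0, ..., n_bins - 1}."""
    x = min(max(x, low), high)                # clamp to the compact set
    idx = int((x - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)               # right endpoint joins last bin

mid_bin = discretize(0.0, -1.0, 1.0, 10)      # center of [-1, 1]
edge_bin = discretize(1.0, -1.0, 1.0, 10)     # right endpoint
```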


Respuestas ◽  
2018 ◽  
Vol 23 (2) ◽  
pp. 53-61
Author(s):  
David Luviano Cruz ◽  
Francesco José García Luna ◽  
Luis Asunción Pérez Domínguez

This paper presents a hybrid control proposal for multi-agent systems that exploits the advantages of reinforcement learning and nonparametric functions. A modified version of the Q-learning algorithm is used to provide training data for a kernel; this approach yields a suboptimal set of actions to be used by the agents. The proposed algorithm is experimentally tested in a path-generation task for mobile robots in an unknown environment.


Electronics ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1742
Author(s):  
Tianhong Dai ◽  
Hengyan Liu ◽  
Anil Anthony Bharath

Episodic self-imitation learning, a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function, is proposed to speed up reinforcement learning. Compared to the original self-imitation learning algorithm, which samples good state-action pairs from the experience replay buffer, our agent leverages entire episodes with hindsight to aid self-imitation learning. A selection module is introduced to filter uninformative samples from each episode of the update. The proposed method overcomes the limitations of the standard self-imitation learning algorithm, a transition-based method that performs poorly in continuous control environments with sparse rewards. In the experiments, episodic self-imitation learning is shown to perform better than baseline on-policy algorithms, achieving performance comparable to state-of-the-art off-policy algorithms in several simulated robot control tasks. The trajectory selection module is shown to prevent the agent from learning undesirable hindsight experiences. With the capability of solving sparse-reward problems in continuous control settings, episodic self-imitation learning has the potential to be applied to real-world problems that have continuous action spaces, such as robot guidance and manipulation.
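The role of the trajectory selection module can be illustrated with a minimal filter that keeps only the best-returning episodes for imitation; the ranking criterion and the `keep_top` fraction here are illustrative assumptions, not the paper's adaptive mechanism.

```python
# Illustrative episode filter: rank whole episodes by (hindsight) return
# and pass only the top fraction on to the self-imitation loss, so
# uninformative trajectories never become imitation targets.
def select_episodes(episodes, keep_top=0.5):
    """episodes: list of (episode_return, transitions) pairs."""
    ranked = sorted(episodes, key=lambda e: e[0], reverse=True)
    k = max(1, int(len(ranked) * keep_top))
    return ranked[:k]

eps = [(1.0, ["t1"]), (0.0, ["t2"]), (5.0, ["t3"]), (2.0, ["t4"])]
best = select_episodes(eps)
```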


Author(s):  
Mohamed A. Aref ◽  
Sudharman K. Jayaweera

This article presents the design of a wideband autonomous cognitive radio (WACR) for anti-jamming and interference avoidance. The proposed system model allows multiple WACRs to operate simultaneously over the same spectrum range, producing a multi-agent environment. The objective of each radio is to predict and evade a dynamic jammer signal while also avoiding the transmissions of other WACRs. The proposed cognitive framework consists of two operations, sensing and transmission, each supported by its own Q-learning-based algorithm, with both experiencing the same RF environment. The simulation results indicate that the proposed cognitive anti-jamming technique has low computational complexity and significantly outperforms a non-cognitive sub-band selection policy, while being sufficiently robust against the impact of sensing errors.
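The transmission-side decision can be sketched as Q-learned sub-band selection that avoids bands sensed as jammed or occupied. This is a simplified stand-in for the WACR design described above; the two separate learners and the reward shaping are omitted.

```python
# Hedged sketch of cognitive sub-band selection against a jammer.
import random

def choose_subband(q_row, occupied, eps, rng):
    """Pick a sub-band, preferring those not sensed as jammed/occupied."""
    free = {b: v for b, v in q_row.items() if b not in occupied}
    candidates = free or q_row                # fall back if all look busy
    if rng.random() < eps:                    # occasional exploration
        return rng.choice(sorted(candidates))
    return max(candidates, key=candidates.get)

rng = random.Random(0)
q = {0: 0.2, 1: 0.9, 2: 0.5}                  # learned sub-band values
band = choose_subband(q, occupied={1}, eps=0.0, rng=rng)
```

Even though sub-band 1 has the highest learned value, the sensing result vetoes it, which is how sensing errors degrade gracefully rather than catastrophically: a wrongly vetoed band just costs one slot of throughput.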


Author(s):  
G. CICIRELLI ◽  
T. D'ORAZIO ◽  
A. DISTANTE

In this work, the complex behavior of localizing a mobile vehicle with respect to a door in the environment and then reaching that door has been developed. The robot uses visual information to detect and recognize the door and to determine its own state relative to it. This complex task has been divided into two separate behaviors: door recognition and door reaching. A supervised methodology based on learning by components has been applied for recognizing the door. Learning by components allows the door to be recognized even in difficult situations, such as partial occlusions, and makes recognition independent of viewpoint variations and scale changes. An unsupervised methodology based on reinforcement learning has been used for the door-reaching behavior instead. The image of the door gives information about the relative position of the vehicle with respect to the door, and the Q-learning algorithm is then used to generate the optimal state-action associations. The problem of defining the state and action sets has been addressed with the aim of producing smooth paths, reducing the effects of visual errors during real navigation, and keeping the computational cost low during the learning phase. A novel way to obtain a continuous action set has been introduced: it uses a fuzzy model to evaluate the system state. Experimental results in a real environment show both the robustness of the door-recognition behavior and the generality of the door-reaching behavior.
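The fuzzy route from a discrete action set to a continuous one can be sketched as follows, in the spirit of the approach above: each discrete steering action is weighted by how strongly the current state belongs to its fuzzy region, and the weighted average yields a continuous command. The membership shapes, rule table, and steering values are illustrative assumptions.

```python
# Minimal fuzzy defuzzification sketch: triangular memberships over a
# normalized door-offset state x in [-1, 1], blended into one continuous
# steering angle (degrees).
def triangular(x, left, peak, right):
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzy_action(x, rules):
    """rules: list of ((left, peak, right), action_value) pairs."""
    weights = [(triangular(x, *mf), a) for mf, a in rules]
    total = sum(w for w, _ in weights)
    if total == 0.0:
        return 0.0
    return sum(w * a for w, a in weights) / total   # weighted-average defuzzify

rules = [((-1.0, -0.5, 0.0), -30.0),   # door far left  -> steer left
         ((-0.5,  0.0, 0.5),   0.0),   # door centered  -> go straight
         (( 0.0,  0.5, 1.0),  30.0)]   # door far right -> steer right
```

Because neighboring memberships overlap, the commanded angle varies smoothly with the door offset, which is exactly the property needed for the smooth paths and visual-error robustness discussed above.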

