A Novel Approach to Multiagent Reinforcement Learning: Utilizing OLAP Mining in the Learning Process

Reinforcement Learning (RL) is a powerful tool that has been used to solve increasingly complex tasks. RL operates through repeated interactions of the learning agent with the environment, via trial and error. However, this learning process is extremely slow, requiring many interactions. In this thesis, we leverage previous knowledge so as to accelerate learning in multiagent RL problems. We propose knowledge reuse both from previous tasks and from other agents. Several flexible methods are introduced so that each of these two types of knowledge reuse is possible. This thesis adds important steps towards more flexible and broadly applicable multiagent transfer learning methods.

Download Full-text

Coordinated Learning by Model Difference Identification in Multiagent Systems with Sparse Interactions

Discrete Dynamics in Nature and Society ◽

10.1155/2016/3207460 ◽

2016 ◽

Vol 2016 ◽

pp. 1-17

Author(s):

Qi Zhang ◽

Peng Jiao ◽

Quanjun Yin ◽

Lin Sun

Keyword(s):

Reinforcement Learning ◽

Multiagent Systems ◽

Learning Process ◽

Independent Learning ◽

Promising Technique ◽

Joint Learning ◽

State Action ◽

Multiagent Reinforcement Learning ◽

General Mass ◽

Coordinated Learning

Multiagent Reinforcement Learning (MARL) is a promising technique for agents learning effective coordinated policy in Multiagent Systems (MASs). In many MASs, interactions between agents are usually sparse, and then a lot of MARL methods were devised for them. These methods divide learning process into independent learning and joint learning in coordinated states to improve traditional joint state-action space learning. However, most of those methods identify coordinated states based on assumptions about domain structure (e.g., dependencies) or agent (e.g., prior individual optimal policy and agent homogeneity). Moreover, situations that current methods cannot deal with still exist. In this paper, a modified approach is proposed to learn where and how to coordinate agents’ behaviors in more general MASs with sparse interactions. Our approach introduces sample grouping and a more accurate metric of model difference degree to identify which states of other agents should be considered in coordinated states, without strong additional assumptions. Experimental results show that the proposed approach outperforms its competitors by improving the average agent reward per step and works well in some broader scenarios.

Download Full-text

Teaching AI Agents Ethical Values Using Reinforcement Learning and Policy Orchestration

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/891 ◽

2019 ◽

Author(s):

Ritesh Noothigattu ◽

Djallel Bouneffouf ◽

Nicholas Mattei ◽

Rachita Chandra ◽

Piyush Madan ◽

...

Keyword(s):

Reinforcement Learning ◽

Ethical Values ◽

Large Role ◽

Learning To Learn ◽

Inverse Reinforcement Learning ◽

Time Step ◽

Novel Approach

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations and reinforcement learning to learn to maximize environmental rewards. A contextual bandit-based orchestrator then picks between the two policies: constraint-based and environment reward-based. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward-maximizing or constrained policy. In addition, the orchestrator is transparent on which policy is being employed at each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.

Download Full-text

Novel Approach for Enhanced Teaching Learning Process

SSRN Electronic Journal ◽

10.2139/ssrn.3514137 ◽

2020 ◽

Author(s):

Safad Ismail

Keyword(s):

Learning Process ◽

Novel Approach ◽

Teaching Learning

Download Full-text

A Novel Approach to Feedback Control with Deep Reinforcement Learning

IFAC-PapersOnLine ◽

10.1016/j.ifacol.2018.09.241 ◽

2018 ◽

Vol 51 (18) ◽

pp. 31-36 ◽

Cited By ~ 3

Author(s):

Yuan Wang ◽

Kirubakaran Velswamy ◽

Biao Huang

Keyword(s):

Reinforcement Learning ◽

Feedback Control ◽

Novel Approach

Download Full-text

Opponent portrait for multiagent reinforcement learning in competitive environment

International Journal of Intelligent Systems ◽

10.1002/int.22594 ◽

2021 ◽

Author(s):

Yuxi Ma ◽

Meng Shen ◽

Yuhang Zhao ◽

Zhao Li ◽

Xiaoyao Tong ◽

...

Keyword(s):

Reinforcement Learning ◽

Competitive Environment ◽

Multiagent Reinforcement Learning

Download Full-text

Crowd Evacuation Guidance Based on Combined Action Reinforcement Learning

Algorithms ◽

10.3390/a14010026 ◽

2021 ◽

Vol 14 (1) ◽

pp. 26

Author(s):

Yiran Xue ◽

Rui Wu ◽

Jiafeng Liu ◽

Xianglong Tang

Keyword(s):

Reinforcement Learning ◽

Guidance System ◽

Force Model ◽

Interactive Simulation ◽

Social Force ◽

Novel Approach ◽

Learning Agent ◽

Network Output ◽

Combined Action ◽

Crowd Evacuation

Existing crowd evacuation guidance systems require the manual design of models and input parameters, incurring a significant workload and a potential for errors. This paper proposed an end-to-end intelligent evacuation guidance method based on deep reinforcement learning, and designed an interactive simulation environment based on the social force model. The agent could automatically learn a scene model and path planning strategy with only scene images as input, and directly output dynamic signage information. Aiming to solve the “dimension disaster” phenomenon of the deep Q network (DQN) algorithm in crowd evacuation, this paper proposed a combined action-space DQN (CA-DQN) algorithm that grouped Q network output layer nodes according to action dimensions, which significantly reduced the network complexity and improved system practicality in complex scenes. In this paper, the evacuation guidance system is defined as a reinforcement learning agent and implemented by the CA-DQN method, which provides a novel approach for the evacuation guidance problem. The experiments demonstrate that the proposed method is superior to the static guidance method, and on par with the manually designed model method.

Download Full-text

Experience Sharing Based Memetic Transfer Learning for Multiagent Reinforcement Learning

Memetic Computing ◽

10.1007/s12293-021-00339-4 ◽

2021 ◽

Author(s):

Tonghao Wang ◽

Xingguang Peng ◽

Yaochu Jin ◽

Demin Xu

Keyword(s):

Reinforcement Learning ◽

Transfer Learning ◽

Multiagent Reinforcement Learning

Download Full-text

Goal-driven active learning

Autonomous Agents and Multi-Agent Systems ◽

10.1007/s10458-021-09527-5 ◽

2021 ◽

Vol 35 (2) ◽

Author(s):

Nicolas Bougie ◽

Ryutaro Ichise

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Learning Process ◽

Real World ◽

Imitation Learning ◽

Learning Approaches ◽

Wide Range ◽

Fixed Set ◽

Complex Decision Making ◽

Complex Decision

AbstractDeep reinforcement learning methods have achieved significant successes in complex decision-making problems. In fact, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impracticable due to lack of state coverage or distribution mismatch—when the learner’s goal deviates from the demonstrated behaviors. Besides, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations to query the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the Mujoco domain. Experimental results show that our method outperforms prior imitation learning approaches in most of the tasks in terms of exploration efficiency and average scores.

Download Full-text

Application of Deep Reinforcement Learning to Predict Shaft Deformation Considering Hull Deformation of Medium-Sized Oil/Chemical Tanker

Journal of Marine Science and Engineering ◽

10.3390/jmse9070767 ◽

2021 ◽

Vol 9 (7) ◽

pp. 767

Author(s):

Shin-Pyo Choi ◽

Jae-Ung Lee ◽

Jun-Bum Park

Keyword(s):

Reinforcement Learning ◽

Reaction Force ◽

Measurement Data ◽

System Stability ◽

Main Bearing ◽

Shaft System ◽

Novel Approach ◽

Design Changes ◽

Prediction Techniques ◽

Shaft Deformation

The enlargement of ships has increased the relative hull deformation owing to draft changes. Moreover, design changes such as an increased propeller diameter and pitch changes have occurred to compensate for the reduction in the engine revolution and consequent ship speed. In terms of propulsion shaft alignment, as the load of the stern tube support bearing increases, an uneven load distribution occurs between the shaft support bearings, leading to stern accidents. To prevent such accidents and to ensure shaft system stability, a shaft system design technique is required in which the shaft deformation resulting from the hull deformation is considered. Based on the measurement data of a medium-sized oil/chemical tanker, this study presents a novel approach to predicting the shaft deformation following stern hull deformation through inverse analysis using deep reinforcement learning, as opposed to traditional prediction techniques. The main bearing reaction force, which was difficult to reflect in previous studies, was predicted with high accuracy by comparing it with the measured value, and reasonable shaft deformation could be derived according to the hull deformation. The deep reinforcement learning technique in this study is expected to be expandable for predicting the dynamic behavior of the shaft of an operating vessel.

Download Full-text