Reinforcement learning and neural networks for multi-agent nonzero-sum games of nonlinear constrained-input systems

2014 ◽  
Vol 7 (6) ◽  
pp. 967-980 ◽  
Author(s):  
Sholeh Yasini ◽  
Mohammad Bagher Naghibi Sitani ◽  
Ali Kirampor
ACTA IMEKO ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 28
Author(s):  
Gabor Paczolay ◽  
Istvan Harmati

Reinforcement learning is currently one of the most researched fields of artificial intelligence. New algorithms, particularly in deep reinforcement learning, use neural networks to compute the selected action. One subcategory of reinforcement learning is multi-agent reinforcement learning, in which multiple agents are present in the world. As it involves the simulation of an environment, it can be applied to robotics as well. In our paper, we use our modified version of the advantage actor–critic (A2C) algorithm, which is suitable for multi-agent scenarios. We test this modified algorithm on our testbed, a cooperative–competitive pursuit–evasion environment, and later we address the problem of collision avoidance.
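For readers unfamiliar with A2C, the following is a minimal sketch of the standard advantage estimate that the algorithm is named after: the discounted return minus the critic's value estimate. It shows only the textbook single-agent form; the authors' multi-agent modification is not described in the abstract and is not reproduced here. The function name `advantages`, its parameters, and the default `gamma` are illustrative assumptions.

```python
def advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Standard A2C-style advantages for one rollout (illustrative sketch).

    rewards:         rewards r_0 .. r_{T-1} collected by one agent
    values:          critic estimates V(s_0) .. V(s_{T-1})
    bootstrap_value: critic estimate V(s_T) for the state after the rollout
                     (use 0.0 if the episode terminated)
    """
    returns = []
    running = bootstrap_value
    # Accumulate discounted returns backwards from the end of the rollout.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage = how much better the observed return was than the critic expected.
    return [g - v for g, v in zip(returns, values)]
```

In a multi-agent setting, one simple option (again an assumption, not the paper's method) would be to call this once per agent on that agent's own rewards and value estimates.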


2021 ◽  
Vol 6 (1) ◽  
pp. 48-54
Author(s):  
Jezuina Koroveshi ◽  
Ana Ktona

Target tracking has applications in domains such as video surveillance, robot navigation and human-computer interaction. In this work we consider the problem of tracking a moving object in a multi-agent environment. The environment is a rectangular space bounded by walls. The first agent is the target, which moves randomly in the space. The second agent, the tracker, should follow the target, keeping as close as possible without colliding with it. It uses sensors to detect the position of the target; the sensor readings give the distance and the angle to the target. We use reinforcement learning to train the tracker to detect any change in the movement of the target and stay within a certain range of it. Reinforcement learning is a form of machine learning in which the agent learns by interacting with the environment. For each action taken, the agent receives a reward from the environment, which indicates whether the behaviour was good or bad. The goal of the agent is to maximise the total reward received during the interaction. This form of machine learning has applications in different areas, such as game playing, with AlphaGo the best-known example; robotics, for the design of hard-to-engineer behaviours; traffic light control; and personalized recommendations. The sensor readings may take continuous values, which makes the state space very large. We therefore approximate the value function using neural networks and use different reward functions for learning the best policy.
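To make the reward idea concrete, here is a minimal sketch of one possible reward function for the tracker, computed from the two sensor readings (distance and angle), together with the discounted total reward the agent tries to maximise. The abstract does not give the authors' reward functions, so the thresholds `min_dist` and `max_dist`, the penalty values, and the function names are all hypothetical.

```python
import math


def tracking_reward(distance, angle, min_dist=0.5, max_dist=2.0):
    """Hypothetical reward for staying within a range band of the target.

    distance: sensed distance to the target
    angle:    sensed angle to the target in radians (0 = straight ahead)
    """
    if distance < min_dist:
        return -1.0  # too close: risk of colliding with the target
    if distance > max_dist:
        return -0.5  # too far: target has left the desired range
    # Inside the band: reward facing the target, scaled from 0 to 1.
    return 0.5 * (1.0 + math.cos(angle))


def discounted_return(rewards, gamma=0.9):
    """Total discounted reward over one episode."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

Because distance and angle are continuous, a table of values per state is impractical, which is why a neural network would stand in for the value function; that part is left out of this sketch.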

