Exploitation-Oriented Learning (XoL)

Author(s):  
Kazuteru Miyazaki

Exploitation-oriented Learning (XoL) is a new framework of reinforcement learning. XoL aims to learn a rational policy, whose expected reward per action is larger than zero, and does not require a sophisticated design of reward signal values. In this chapter, as examples of learning systems that belong to XoL, we introduce the rationality theorem of Profit Sharing (PS), the rationality theorem of reward sharing in multi-agent PS, and PS-r*. XoL has several features. (1) While traditional RL systems require appropriate reward and penalty values, XoL requires only an order of importance among them. (2) XoL can learn more quickly, since it traces successful experiences very strongly. (3) XoL may be unsuitable for pursuing an optimal policy; an optimal policy can be acquired by the multi-start method, which resets all memories in order to obtain a better policy. (4) XoL is effective on classes beyond MDPs, since it is a Bellman-free method that does not depend on dynamic programming (DP). We show several numerical examples to confirm these features.
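To make the contrast with value-based methods concrete, the sketch below distributes the reward of a successful episode backwards along the stored rule sequence with a geometrically decreasing reinforcement function, which is the basic Profit Sharing mechanism. The class name, episode representation, and decay value are assumptions for illustration; the rationality theorem constrains the reinforcement function (a geometric decay bounded by the number of effective rules is one common way to satisfy it), so this is a minimal sketch rather than the chapter's exact formulation.

```python
# Minimal Profit Sharing sketch (illustrative; names and the decay
# choice are assumptions, not the chapter's exact formulation).
from collections import defaultdict

class ProfitSharingAgent:
    def __init__(self, n_actions, decay=0.2):
        # A decay no larger than 1/L (L = number of effective rules) is
        # one way to satisfy the PS rationality condition; 0.2 is an
        # assumed value for illustration.
        self.weights = defaultdict(lambda: [0.0] * n_actions)
        self.decay = decay
        self.episode = []  # sequence of (state, action) pairs

    def act(self, state):
        # Greedy over accumulated credits; a real agent would also explore.
        w = self.weights[state]
        return max(range(len(w)), key=lambda a: w[a])

    def record(self, state, action):
        self.episode.append((state, action))

    def reinforce(self, reward):
        # Trace the successful episode backwards, crediting each rule
        # with a geometrically decreasing share of the reward. This is
        # the "tracing successful experiences very strongly" step.
        credit = reward
        for state, action in reversed(self.episode):
            self.weights[state][action] += credit
            credit *= self.decay
        self.episode.clear()
```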

2021, Vol 11 (1), pp. 6637-6644
Author(s):  
H. El Fazazi ◽  
M. Elgarej ◽  
M. Qbadou ◽  
K. Mansouri

Adaptive e-learning systems are created to facilitate the learning process. These systems are able to suggest the student the most suitable pedagogical strategy and to extract the information and characteristics of the learners. A multi-agent system is a collection of organized and independent agents that communicate with each other to resolve a problem or complete a well-defined objective. These agents are always in communication and they can be homogeneous or heterogeneous and may or may not have common objectives. The application of the multi-agent approach in adaptive e-learning systems can enhance the learning process quality by customizing the contents to students’ needs. The agents in these systems collaborate to provide a personalized learning experience. In this paper, a design of an adaptative e-learning system based on a multi-agent approach and reinforcement learning is presented. The main objective of this system is the recommendation to the students of a learning path that meets their characteristics and preferences using the Q-learning algorithm. The proposed system is focused on three principal characteristics, the learning style according to the Felder-Silverman learning style model, the knowledge level, and the student's possible disabilities. Three types of disabilities were taken into account, namely hearing impairments, visual impairments, and dyslexia. The system will be able to provide the students with a sequence of learning objects that matches their profiles for a personalized learning experience.
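A minimal sketch of the tabular Q-learning update such a recommender could use, where a state encodes a learner profile plus progress and an action is a candidate learning object. The state/action encoding, reward definition, and hyperparameters below are assumptions for illustration, not the paper's exact design.

```python
import random
from collections import defaultdict

# Tabular Q-learning for recommending the next learning object.
# Assumed setup: a state could be (learning_style, knowledge_level,
# last_object), and the reward could come from quiz scores or
# engagement signals; both are illustrative choices.

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = defaultdict(float)  # Q[(state, action)] -> estimated return

def choose_object(state, candidate_objects):
    """Epsilon-greedy selection of the next learning object."""
    if random.random() < EPSILON:
        return random.choice(candidate_objects)
    return max(candidate_objects, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_candidates):
    """Standard Q-learning update after observing the learner's outcome."""
    best_next = max((Q[(next_state, a)] for a in next_candidates), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```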


Entropy, 2021, Vol 23 (9), pp. 1133
Author(s):  
Shanzhi Gu ◽  
Mingyang Geng ◽  
Long Lan

The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents. Typically, an agent receives private observations that provide a partial view of the true state of the environment. However, in realistic settings, a harsh environment might cause one or more agents to show arbitrarily faulty or malicious behavior, which may suffice to cause the current coordination mechanisms to fail. In this paper, we study a practical scenario of multi-agent reinforcement learning systems, considering the security issues that arise in the presence of agents with arbitrarily faulty or malicious behavior. The previous state-of-the-art work that coped with extremely noisy environments was designed on the assumption that the noise intensity in the environment was known in advance. However, when the noise intensity changes, the existing method has to adjust the configuration of the model to learn in new environments, which limits its practical applications. To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) model, which can select not only correct but also relevant information for each agent at every time step in noisy environments. The multi-head attention mechanism enables the agents to learn effective communication policies through experience, concurrently with the action policies. Empirical results showed that FT-Attn beats previous state-of-the-art methods in some extremely noisy environments, in both cooperative and competitive scenarios, coming much closer to the upper-bound performance. Furthermore, FT-Attn maintains a more general fault-tolerance ability and does not rely on prior knowledge about the noise intensity of the environment.
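To make the attention-based filtering idea concrete, here is a minimal sketch, assuming an FT-Attn-style setup in which each agent attends over its teammates' message encodings with multi-head attention, so that faulty or irrelevant messages can be down-weighted without knowing the noise intensity in advance. The module layout, dimensions, and names are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AttentionMessageFilter(nn.Module):
    """Multi-head attention over teammate messages (illustrative sketch).

    The agent's own encoding is the query; teammates' messages are the
    keys/values. Learned attention weights let the agent down-weight
    faulty or irrelevant messages at every time step.
    """

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, own_enc, messages):
        # own_enc:  (batch, dim)          -- the agent's own observation encoding
        # messages: (batch, n_agents, dim) -- encodings received from other agents
        query = own_enc.unsqueeze(1)  # (batch, 1, dim)
        aggregated, weights = self.attn(query, messages, messages)
        # Concatenate the agent's own encoding with the attention-filtered
        # message summary for a downstream policy/critic head.
        return torch.cat([own_enc, aggregated.squeeze(1)], dim=-1), weights
```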


IEEE Access, 2021, Vol 9, pp. 45812-45821
Author(s):  
Xiaoyan Wang ◽  
Jun Peng ◽  
Shuqiu Li ◽  
Bing Li

Author(s):  
Wataru Uemura

In reinforcement learning systems based on trial and error, the agent, that is, the subject or system that perceives its environment and takes actions to maximize its chances of success, is rewarded when it attains the target level of the learning exercise. In Profit Sharing, the reinforcement learning process is pursued through the accumulation of such rewards. In order to continue accumulating rewards, the agent insists upon repeating the particular actions that are being learned and avoids selecting other actions, which makes the agent less adaptable to changes in the environment. In view of the above, this paper proposes introducing the concept of infatuation to eliminate the agent's reluctance to adapt to new environments. If the agent is a living being, then when a single particular reinforcement learning process is repeated, the stimulus the agent perceives in each repetition gradually loses its intensity due to familiarization. However, if the agent encounters a set of rules different from those of the repeated learning process and then reverts to the previous learning process, the stimulus it receives after the reversion recovers its intensity. The intention here is to apply this concept of infatuation to Profit Sharing, and to confirm its effects through experiments.
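A minimal sketch of how such familiarization could be layered on top of a raw Profit Sharing reward: the perceived intensity decays while the same rule sequence is repeated and recovers once the agent has experienced a different one. The decay/recovery rule, constants, and names here are all assumptions for illustration; the paper's exact infatuation mechanism may differ.

```python
# Illustrative familiarization ("infatuation") wrapper around a raw
# Profit Sharing reward. All constants and the decay/recovery rule are
# assumptions; the paper's exact mechanism may differ.

class FamiliarizationReward:
    def __init__(self, decay=0.8, floor=0.1):
        self.decay = decay      # fraction of intensity kept per repetition
        self.floor = floor      # intensity never drops below this
        self.intensity = 1.0
        self.last_episode = None

    def perceive(self, episode_signature, raw_reward):
        """Scale the raw reward by the current stimulus intensity."""
        if episode_signature == self.last_episode:
            # Repeating the same rule sequence: the stimulus weakens,
            # so the agent's insistence on it gradually fades.
            self.intensity = max(self.floor, self.intensity * self.decay)
        else:
            # A different rule sequence was encountered: on reverting,
            # the stimulus recovers its full intensity.
            self.intensity = 1.0
        self.last_episode = episode_signature
        return raw_reward * self.intensity
```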

