An Autonomous Emotional Virtual Character: An Approach with Deep and Goal-Parameterized Reinforcement Learning

2020 ◽  
Vol 11 (1) ◽  
pp. 27-44
Author(s):  
Gilzamir Ferreira Gomes ◽  
Creto Augusto Vidal ◽  
Joaquim Bento Cavalcante Neto ◽  
Yuri Lenon Barbosa Nogueira

We have developed an autonomous virtual character guided by emotions. The agent is a virtual character that lives in a three-dimensional maze world. We found that emotion drivers can induce the behavior of a trained agent. Our approach is a case of goal-parameterized reinforcement learning: we create a conditioning between emotion drivers and a set of goals that determine the behavioral profile of the virtual character. We train agents that can randomly assume these goals while trying to maximize a reward function based on intrinsic and extrinsic motivations. A mapping between motivation and emotion was carried out, so the agent learned a behavior profile as a training goal. The developed approach was integrated with the Asynchronous Advantage Actor-Critic (A3C) algorithm. Experiments showed that this approach produces behaviors consistent with the objectives given to the agents and has potential for the development of believable virtual characters.
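As a rough illustration of the goal-parameterized idea, the sketch below shows how an emotion-driven goal vector could both weight intrinsic against extrinsic motivation in the reward and be appended to the observation fed to the actor-critic network. The function names, dimensions, and weighting scheme are hypothetical, not taken from the paper.

import numpy as np

def goal_conditioned_reward(extrinsic, intrinsic, goal_weights):
    """Blend extrinsic and intrinsic motivation terms according to the
    active behavioral goal (hypothetical weighting scheme)."""
    motivations = np.array([extrinsic, intrinsic])
    return float(np.dot(goal_weights, motivations))

def policy_input(observation, goal):
    """Goal-parameterized RL: the sampled goal (behavioral profile) is
    concatenated to the observation given to the policy network."""
    return np.concatenate([observation, goal])

# Example: a "curious" profile weights intrinsic motivation more heavily.
obs = np.random.rand(8)              # placeholder maze observation
goal = np.array([0.2, 0.8])          # [extrinsic weight, intrinsic weight]
r = goal_conditioned_reward(extrinsic=1.0, intrinsic=0.3, goal_weights=goal)
net_in = policy_input(obs, goal)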

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Luo Zhe ◽  
Li Xinsan ◽  
Wang Lixin ◽  
Shen Qiang

In order to improve the autonomy of gliding guidance for complex flight missions, this paper proposes a multiconstrained intelligent gliding guidance strategy based on optimal guidance and reinforcement learning (RL). Three-dimensional optimal guidance is introduced to meet the terminal latitude, longitude, altitude, and flight-path-angle constraints. A velocity control strategy based on lateral sinusoidal maneuvers is proposed, and an analytical terminal velocity prediction method that accounts for maneuvering flight is studied. To address the problem that the maneuvering amplitude in velocity control cannot be determined offline, an intelligent parameter adjustment method based on RL is studied. This method formulates parameter determination as a Markov Decision Process (MDP), designs a state space based on terminal speed and an action space based on maneuvering amplitude, constructs a reward function that integrates terminal velocity error and the gliding guidance tasks, and uses Q-Learning to achieve online intelligent adjustment of the maneuvering amplitude. The simulation results show that the intelligent gliding guidance method can meet various terminal constraints with high accuracy and can effectively improve autonomous decision-making ability under complex tasks.
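A minimal tabular Q-learning sketch of the online amplitude adjustment is given below, assuming the predicted terminal-velocity error is discretized into states and candidate maneuvering amplitudes into actions. The discretization sizes, learning parameters, and reward are placeholders, not the authors' values.

import numpy as np

# Hypothetical discretizations: terminal-velocity-error bins (states)
# and candidate maneuvering amplitudes (actions).
N_STATES, N_ACTIONS = 10, 5
alpha, gamma, eps = 0.1, 0.95, 0.1
Q = np.zeros((N_STATES, N_ACTIONS))

def select_action(s):
    """Epsilon-greedy choice of the maneuvering-amplitude index."""
    if np.random.rand() < eps:
        return np.random.randint(N_ACTIONS)
    return int(Q[s].argmax())

def q_learning_step(s, a, r, s_next):
    """Standard Q-learning update used for online amplitude adjustment."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

# One illustrative interaction with a placeholder reward that penalizes the
# predicted terminal velocity error (guidance-task terms omitted).
s = 3
a = select_action(s)
r = -abs(50.0)          # e.g. 50 m/s predicted terminal-velocity error
q_learning_step(s, a, r, s_next=2)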


2019 ◽  
Vol 7 (12) ◽  
pp. 443 ◽  
Author(s):  
Yushan Sun ◽  
Chenming Zhang ◽  
Guocheng Zhang ◽  
Hao Xu ◽  
Xiangrui Ran

In this paper, the three-dimensional (3D) path tracking control of an autonomous underwater vehicle (AUV) under the action of sea currents was investigated. A novel reward function was proposed to improve learning ability, and a disturbance observer was developed to estimate the disturbance caused by currents. Based on existing models, the dynamic and kinematic models of the AUV were established. Deep Deterministic Policy Gradient, a deep reinforcement learning algorithm, was employed to design the path tracking controller. Compared with the backstepping sliding mode controller, the controller proposed in this article showed excellent performance, at least in the particular study developed here. The improved reward function and the disturbance observer were also found to improve path tracking performance.
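The sketch below illustrates one plausible shape for a path-tracking reward and how a disturbance-observer estimate could be appended to the state seen by the DDPG actor. The terms, weights, and dimensions are assumptions for illustration only, not the reward or observer defined in the paper.

import numpy as np

def tracking_reward(cross_track_err, heading_err, progress,
                    w_e=1.0, w_psi=0.5, w_p=0.2):
    """Illustrative shaped reward for 3D path tracking: penalize deviation
    from the path and heading misalignment, reward forward progress along it.
    Weights and form are placeholders, not the paper's reward function."""
    return -w_e * abs(cross_track_err) - w_psi * abs(heading_err) + w_p * progress

# The DDPG actor maps the AUV state (plus the observed current disturbance)
# to continuous control commands; one simple choice is to append the
# observer's estimate to the state vector.
state = np.random.rand(6)               # placeholder AUV state
disturbance_estimate = np.random.rand(3)
actor_input = np.concatenate([state, disturbance_estimate])
r = tracking_reward(cross_track_err=0.4, heading_err=0.1, progress=0.05)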


2021 ◽  
Author(s):  
Stav Belogolovsky ◽  
Philip Korsunsky ◽  
Shie Mannor ◽  
Chen Tessler ◽  
Tom Zahavy

We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping so that the agent acts optimally even when encountering previously unseen contexts, a setting also known as zero-shot transfer. We formulate this problem as a non-differentiable convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, comparing sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function that explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.
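As a loose illustration of subgradient-based learning of a context-to-reward mapping, the sketch below performs one projected subgradient step on a margin-style IRL loss for a linear mapping r_c(s) = (W c) . phi(s). The feature expectations, dimensions, and projection radius are hypothetical inputs; the paper's actual subgradient computation (which involves solving the contextual MDP) is not reproduced here.

import numpy as np

def subgradient_step(W, context, mu_expert, mu_agent, lr=0.01):
    """One projected subgradient step on a max-margin-style IRL objective for
    a linear context-to-reward mapping. mu_expert / mu_agent are the feature
    expectations of the expert and of the agent's current optimal policy for
    this context (assumed given)."""
    # Subgradient w.r.t. W is the outer product of the feature-expectation
    # gap with the context vector.
    g = np.outer(mu_agent - mu_expert, context)
    W_new = W - lr * g
    # Project back onto the unit Frobenius-norm ball to keep W bounded.
    norm = np.linalg.norm(W_new)
    return W_new / max(1.0, norm)

d_phi, d_c = 4, 3                      # feature and context dimensions
W = np.zeros((d_phi, d_c))
c = np.array([1.0, 0.0, 0.5])          # sampled context
W = subgradient_step(W, c, mu_expert=np.ones(d_phi), mu_agent=np.zeros(d_phi))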


2021 ◽  
Author(s):  
Amarildo Likmeta ◽  
Alberto Maria Metelli ◽  
Giorgia Ramponi ◽  
Andrea Tirinzoni ◽  
Matteo Giuliani ◽  
...  

In real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understanding how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging because typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and present three application scenarios: (1) the high-level decision-making problem in a highway driving scenario, (2) inferring user preferences in a social network (Twitter), and (3) the management of water release in Lake Como. For each of these scenarios, we provide a formalization, experiments, and a discussion to interpret the obtained results.
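A basic ingredient shared by many batch IRL pipelines is estimating the expert's discounted feature expectations directly from the fixed demonstration set; the sketch below shows only that step, under an assumed linear reward feature map. It is an illustrative component, not the authors' specific algorithm.

import numpy as np

def batch_feature_expectations(trajectories, phi, gamma=0.99):
    """Estimate the expert's discounted feature expectations from a fixed
    dataset of demonstrations (each trajectory is a list of (state, action)
    pairs). No further interaction with the environment is required."""
    mu = None
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            f = (gamma ** t) * phi(s, a)
            mu = f if mu is None else mu + f
    return mu / len(trajectories)

# Hypothetical 2-dimensional reward features, e.g. [speed term, lane-keeping term].
phi = lambda s, a: np.array([s[0], float(a == 0)])
demos = [[(np.array([1.0]), 0), (np.array([0.8]), 1)] for _ in range(5)]
mu_expert = batch_feature_expectations(demos, phi)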


Minerals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 587
Author(s):  
Joao Pedro de Carvalho ◽  
Roussos Dimitrakopoulos

This paper presents a new truck dispatching policy approach that adapts to different mining complex configurations in order to deliver the supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete-event simulator emulates the interactions arising from these mining operations. Repeatedly running this simulator, together with a reward function that assigns a score to each dispatching decision, generates sample experiences used to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, so that when a new task is required, a well-informed decision can be taken quickly. The approach is tested at a copper-gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.
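For orientation, a minimal deep Q-learning update over dispatching decisions might look like the sketch below, where experiences (state, action, reward, next state) are assumed to come from the discrete-event simulator. The state encoding, network size, and action set are placeholders, not the paper's design.

import random
from collections import deque
import torch
import torch.nn as nn

# Hypothetical encoding: the state summarizes queue lengths and shovel/crusher
# status; each action is a candidate destination for the idle truck.
STATE_DIM, N_ACTIONS = 16, 6

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)
gamma = 0.99

def train_step(batch_size=32):
    """One deep Q-learning update on experiences sampled from the replay
    buffer filled by the discrete-event simulator."""
    if len(replay) < batch_size:
        return
    s, a, r, s2 = zip(*random.sample(replay, batch_size))
    s = torch.tensor(s, dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64).unsqueeze(1)
    r = torch.tensor(r, dtype=torch.float32)
    s2 = torch.tensor(s2, dtype=torch.float32)
    q = q_net(s).gather(1, a).squeeze(1)                 # Q(s, a)
    with torch.no_grad():
        target = r + gamma * q_net(s2).max(dim=1).values  # bootstrap target
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Experiences would come from the simulator; random placeholders shown here.
for _ in range(64):
    replay.append(([0.0] * STATE_DIM, random.randrange(N_ACTIONS), 1.0, [0.0] * STATE_DIM))
train_step()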


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1292
Author(s):  
Neziha Akalin ◽  
Amy Loutfi

This article surveys reinforcement learning approaches in social robotics. Reinforcement learning is a framework for decision-making problems in which an agent interacts through trial and error with its environment to discover an optimal behavior. Since interaction is a key component in both reinforcement learning and social robotics, it can be a well-suited approach for real-world interactions with physically embodied social robots. The scope of the paper is focused particularly on studies that include physical social robots and real-world human-robot interactions with users. We present a thorough analysis of reinforcement learning approaches in social robotics. In addition to a survey, we categorize existing reinforcement learning approaches based on the method used and the design of the reward mechanisms. Moreover, since communication capability is a prominent feature of social robots, we discuss and group the papers based on the communication medium used for reward formulation. Considering the importance of designing the reward function, we also provide a categorization of the papers based on the nature of the reward. This categorization includes three major themes: interactive reinforcement learning, intrinsically motivated methods, and task performance-driven methods. The paper also discusses the benefits and challenges of reinforcement learning in social robotics, the evaluation methods of the surveyed papers with respect to whether they use subjective or algorithmic measures, real-world reinforcement learning challenges and proposed solutions, and the points that remain to be explored, including approaches that have thus far received less attention. Thus, this paper aims to become a starting point for researchers interested in using and applying reinforcement learning methods in this particular research field.


Author(s):  
Feng Pan ◽  
Hong Bao

This paper proposes a new approach that uses reinforcement learning (RL) to train an agent to perform the task of vehicle following with human driving characteristics. We draw on the idea of inverse reinforcement learning to design the reward function of the RL model. The factors that need to be weighed in vehicle following were vectorized into a reward vector, and the reward function was defined as the inner product of the reward vector and a weight vector. Driving data from human drivers were collected and analyzed to obtain the true reward function. The RL model was trained with the deterministic policy gradient algorithm because the state and action spaces are continuous. We adjusted the weight vector of the reward function so that the value vector of the RL model could continuously approach that of a human driver. After dozens of rounds of training, we selected the policy whose value vector was nearest to that of a human driver and tested it in the PanoSim simulation environment. The results showed the desired performance for the task of an agent following the preceding vehicle safely and smoothly.
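The sketch below shows the linear reward structure described in the abstract, i.e. the inner product of a vector of car-following factors with a weight vector. The particular features and weight values are hypothetical choices for illustration, not the ones extracted from the driving data.

import numpy as np

def reward_features(gap_err, rel_speed, accel, jerk):
    """Vectorize factors commonly weighed in car following (illustrative set):
    spacing error, relative speed, and comfort-related acceleration and jerk."""
    return np.array([-abs(gap_err), -abs(rel_speed), -accel**2, -jerk**2])

def following_reward(features, weights):
    """Reward defined as the inner product of the reward vector and weights."""
    return float(np.dot(weights, features))

# Weights would be adjusted so that the learned value vector approaches the
# human driver's; the numbers below are placeholders.
w = np.array([1.0, 0.5, 0.1, 0.05])
r = following_reward(reward_features(gap_err=2.0, rel_speed=0.3,
                                     accel=0.4, jerk=0.1), w)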


Author(s):  
Qingyuan Zheng ◽  
Duo Wang ◽  
Zhang Chen ◽  
Yiyong Sun ◽  
Bin Liang

Single-track two-wheeled robots have become an important research topic in recent years, owing to their simple structure, energy savings, and ability to run on narrow roads. However, the ramp jump remains a challenging task. In this study, we propose a method to realize the ramp jump of a single-track two-wheeled robot. We present a control method that employs continuous-action reinforcement learning techniques for single-track two-wheeled robot control. We design a novel reward function for reinforcement learning, optimize the dimensions of the action space, and train the controller with the deep deterministic policy gradient algorithm. Finally, we validate the control method through simulation experiments and successfully realize the single-track two-wheeled robot ramp jump task. The simulation results show that the control method is effective and has several advantages over high-dimensional action space control, reinforcement learning control with a sparse reward function, and discrete-action reinforcement learning control.
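To make the contrast with a sparse reward concrete, the sketch below shows one possible dense, shaped reward for a ramp-jump task and a reduced one-dimensional continuous action. All terms, weights, and the choice of action are assumptions for illustration, not the reward or action space designed in the paper.

import numpy as np

def clip_action(a):
    """Illustrative reduced continuous action: a single normalized drive-wheel
    torque command in [-1, 1] (an assumed, low-dimensional action space)."""
    return float(np.clip(a, -1.0, 1.0))

def ramp_jump_reward(takeoff_speed_err, pitch_err, landed_upright, crashed):
    """Dense shaped reward sketch: track a target take-off speed and body
    pitch during the run-up and flight, with a terminal bonus for an upright
    landing and a penalty for a crash. Weights are placeholders."""
    r = -0.5 * abs(takeoff_speed_err) - 0.2 * abs(pitch_err)
    if landed_upright:
        r += 10.0
    if crashed:
        r -= 10.0
    return r

a = clip_action(1.3)
r = ramp_jump_reward(takeoff_speed_err=0.3, pitch_err=0.05,
                     landed_upright=True, crashed=False)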

