An Extension of the Rational Policy Making algorithm to Continuous State Spaces

Author(s):  
Kazuteru Miyazaki ◽  
Hajime Kimura ◽  
Shigenobu Kobayashi
Author(s):  
Takuji Watanabe ◽  
Kazuteru Miyazaki ◽  
Hiroaki Kobayashi ◽  
...  

The penalty avoiding rational policy making algorithm (PARP) [1], previously improved to save memory and cope with uncertainty as IPARP [2], requires that states be discretized in real environments with continuous state spaces, using function approximation or some other method. For PARP in particular, a method that discretizes the state using basis functions is known [3]. However, because this method creates a new basis function from the current input and the next observation, an unsuitable basis function may be generated in some asynchronous multiagent environments. We therefore propose a uniform basis function whose range is estimated before learning. We show the effectiveness of our proposal using a soccer game task called “Keepaway.”
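One way to read "uniform basis function" is as a fixed grid of equal-width cells whose width is chosen before learning starts. The following is a minimal sketch under that assumption, not the authors' implementation; the function name `uniform_cell`, the lower bounds, and the cell widths are all hypothetical values chosen for illustration.

```python
import math
from typing import Sequence, Tuple


def uniform_cell(state: Sequence[float],
                 lows: Sequence[float],
                 widths: Sequence[float]) -> Tuple[int, ...]:
    """Map a continuous state to the index of a fixed, equal-width cell.

    Assumption: every dimension is covered by cells of one constant width
    that was estimated before learning, so no cell is created on the fly.
    """
    return tuple(
        math.floor((s - lo) / w) for s, lo, w in zip(state, lows, widths)
    )


# Hypothetical 2-D state space: lower bounds and cell widths fixed in advance.
lows = (0.0, -1.0)
widths = (0.5, 0.25)

# Two nearby observations fall into the same cell and share one discrete state.
print(uniform_cell((1.2, -0.3), lows, widths))
print(uniform_cell((1.4, -0.26), lows, widths))
```

Because the grid does not depend on the order in which observations arrive, the discretization is the same however the agents' updates interleave, which is the property the abstract appeals to for asynchronous environments.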


Author(s):  
Kazuteru Miyazaki ◽  
Shigenobu Kobayashi ◽  

Reinforcement learning involves learning to adapt to environments through the presentation of rewards, special inputs that serve as clues. To obtain rational policies quickly, profit sharing (PS) [6], the rational policy making algorithm (RPM) [7], the penalty avoiding rational policy making algorithm (PARP) [8], and PS-r* [9] are used. They are called PS-based methods. When applying reinforcement learning to actual problems, treatment of continuous-valued input is sometimes required. A method [10] based on RPM has been proposed as a PS-based method for continuous-valued input, but it assumes that only rewards exist and cannot suitably handle penalties. We studied the treatment of continuous-valued input suitable for a PS-based method in which the environment includes both rewards and penalties. Specifically, we propose extending PARP to continuous-valued input while simultaneously targeting the attainment of rewards and the avoidance of penalties. We applied our proposal to the pole-cart balancing problem and confirmed its validity.


Robotica ◽  
2010 ◽  
Vol 29 (5) ◽  
pp. 657-665 ◽  
Author(s):  
Yong Hu ◽  
Gangfeng Yan ◽  
Zhiyun Lin

SUMMARY: This paper investigates the stable-running problem of a planar underactuated biped robot, which has two springy telescopic legs and one actuated joint in the hip. After modeling the robot as a hybrid system with multiple continuous state spaces, a natural passive limit cycle, which preserves the system energy at touchdown, is found using the method of Poincaré shooting. It is then checked that the passive limit cycle is not stable. To stabilize the passive limit cycle, an event-based feedback control law is proposed, and also to enlarge the basin of attraction, an additive passivity-based control term is introduced only in the stance phase. The validity of our control strategies is illustrated by a series of numerical simulations.


2020 ◽  
Author(s):  
Hamidou Tembine

In this article, a class of mean-field-type games with discrete-continuous state spaces is considered. We establish Bellman systems which provide sufficiency conditions for mean-field-type equilibria in state-and-mean-field-type feedback form. We then derive unnormalized master adjoint systems (MASS). The methodology is shown to be flexible enough to capture multi-class interaction in epidemic propagation in which multiple authorities are risk-aware atomic decision-makers and individuals are risk-aware non-atomic decision-makers. Based on MASS, we present data-driven modelling and analytics for mitigating Coronavirus Disease 2019 (COVID-19). The model integrates untested cases, age-structure, decision-making, gender, pre-existing health conditions, location, testing capacity, hospital capacity, mobility map on local areas, in-city, inter-cities, and international. It is shown that the data-driven model can capture most of the reported data on COVID-19 on confirmed cases, deaths, recovered, number of tests and number of active cases in 66+ countries. The model also reports non-Gaussianity and non-exponential properties in 15+ countries.


2020 ◽  
Author(s):  
Daniel Zuckerman

The non-equilibrium principles and methods which are fundamental to understanding modern physics, biology, and chemistry can be intimidating. The key to learning them is working through simple, concrete examples. The beginner must do calculations by hand to truly grasp how the equations and ideas work together. Hence, seven sets of homework problems are given, using two- and three-state model chemical systems, along with introductory discussion, complete solutions, and links to further discussion. The problems use simple math and graphics to introduce equilibrium, relaxation, non-equilibrium steady states and the associated timescales, using discrete and continuous state spaces as well as continuous and discrete time (“Markov state models”). The relationship between these descriptions is explained. The overall aim is to provide a set of easily approachable material that will significantly increase the student's confidence in approaching more advanced material.
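As an illustration of the kind of two-state calculation described above, the sketch below treats a system A ⇌ B with first-order rate constants and compares the continuous-time relaxation formula with a discrete-time Markov update. It is not taken from the problem sets; the function names, the rate constants `k_ab` and `k_ba`, and the time step `dt` are assumptions chosen for the example.

```python
import math


def equilibrium_p_a(k_ab: float, k_ba: float) -> float:
    """Equilibrium probability of state A when A->B has rate k_ab and B->A has rate k_ba."""
    return k_ba / (k_ab + k_ba)


def p_a_continuous(t: float, p0: float, k_ab: float, k_ba: float) -> float:
    """Exact continuous-time relaxation: exponential decay toward equilibrium.

    The relaxation time is 1 / (k_ab + k_ba).
    """
    p_eq = equilibrium_p_a(k_ab, k_ba)
    return p_eq + (p0 - p_eq) * math.exp(-(k_ab + k_ba) * t)


def p_a_discrete_step(p: float, k_ab: float, k_ba: float, dt: float) -> float:
    """One discrete-time Markov step, using first-order transition
    probabilities k * dt (a valid approximation only for small dt)."""
    return p * (1.0 - k_ab * dt) + (1.0 - p) * k_ba * dt


# Hypothetical rates: start entirely in A and relax toward equilibrium.
k_ab, k_ba, dt = 2.0, 1.0, 0.01
p = 1.0
for _ in range(100):  # 100 steps of dt = 0.01, i.e. total time t = 1.0
    p = p_a_discrete_step(p, k_ab, k_ba, dt)

print("discrete-time estimate:", p)
print("continuous-time value: ", p_a_continuous(1.0, 1.0, k_ab, k_ba))
print("equilibrium:           ", equilibrium_p_a(k_ab, k_ba))
```

Both descriptions share the same fixed point, k_ba / (k_ab + k_ba), and the discrete-time trajectory approaches the continuous-time curve as `dt` shrinks.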


Games ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 51
Author(s):  
Hamidou Tembine

In this article, a class of mean-field-type games with discrete-continuous state spaces is considered. We establish Bellman systems which provide sufficiency conditions for mean-field-type equilibria in state-and-mean-field-type feedback form. We then derive unnormalized master adjoint systems (MASS). The methodology is shown to be flexible enough to capture multi-class interaction in epidemic propagation in which multiple authorities are risk-aware atomic decision-makers and individuals are risk-aware non-atomic decision-makers. Based on MASS, we present data-driven modelling and analytics for mitigating Coronavirus Disease 2019 (COVID-19). The model integrates untested cases, age-structure, decision-making, gender, pre-existing health conditions, location, testing capacity, hospital capacity, and a mobility map of local areas, including in-cities, inter-cities, and internationally. It is shown that the data-driven model can capture most of the reported data on COVID-19 on confirmed cases, deaths, recovered, number of tests and number of active cases in 66+ countries. The model also reports non-Gaussian and non-exponential properties in 15+ countries.

