continuous reward
Recently Published Documents


TOTAL DOCUMENTS: 17 (FIVE YEARS: 3)
H-INDEX: 5 (FIVE YEARS: 1)

Author(s):  
Xiong Wang ◽  
Riheng Jia

Mean field games facilitate analyzing the multi-armed bandit (MAB) problem for a large number of agents by approximating their interactions with an average effect. Existing mean field models for multi-agent MAB mostly assume a binary reward function, which makes the analysis tractable but is usually not applicable in practical scenarios. In this paper, we study the mean field bandit game with a continuous reward function. Specifically, we focus on establishing the existence and uniqueness of the mean field equilibrium (MFE), thereby guaranteeing the asymptotic stability of the multi-agent system. To accommodate the continuous reward function, we encode the learned reward into an agent state, which is in turn mapped to the agent's stochastic arm-playing policy and updated using realized observations. We show that the state evolution is upper semi-continuous, from which the existence of an MFE follows. Since Markov analysis mainly applies to the discrete-state case, we transform the stochastic continuous-state evolution into a deterministic ordinary differential equation (ODE). On this basis, we characterize a contraction mapping for the ODE that ensures a unique MFE for the bandit game. Extensive evaluations validate our MFE characterization and exhibit tight empirical regret for the MAB problem.
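The abstract describes a loop in which each agent's state encodes its learned reward, the state induces a stochastic arm-playing policy, and realized observations update the state, with the population's average play entering each agent's reward. Below is a minimal sketch of that loop under assumptions: the softmax policy mapping, the congestion-style continuous reward, and all names and parameters are illustrative choices, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, temp=1.0):
    z = (x - x.max()) / temp
    e = np.exp(z)
    return e / e.sum()

class MeanFieldAgent:
    """One agent in a mean-field bandit: its state encodes learned
    (continuous) per-arm rewards and induces a stochastic policy."""
    def __init__(self, n_arms, lr=0.1):
        self.state = np.zeros(n_arms)   # encoded reward estimates
        self.lr = lr

    def policy(self):
        # state -> stochastic arm-playing policy
        return softmax(self.state)

    def step(self, reward_fn, mean_field):
        p = self.policy()
        arm = rng.choice(len(p), p=p)
        # Continuous reward depends on the arm and the population's
        # average play (the mean-field term), not a binary outcome.
        r = reward_fn(arm, mean_field)
        # Stochastic state update from the realized observation.
        self.state[arm] += self.lr * (r - self.state[arm])

# Hypothetical congestion-style continuous reward: an arm pays less
# the more of the population plays it.
def reward_fn(arm, mean_field):
    base = np.array([1.0, 0.8, 0.6])
    return base[arm] - 0.5 * mean_field[arm] + 0.05 * rng.standard_normal()

agents = [MeanFieldAgent(3) for _ in range(500)]
for t in range(200):
    # Mean field = average arm-playing distribution across all agents.
    mean_field = np.mean([a.policy() for a in agents], axis=0)
    for a in agents:
        a.step(reward_fn, mean_field)

print("limiting mean field:", np.round(np.mean([a.policy() for a in agents], axis=0), 3))
```

In this toy setting the average play distribution settles toward a fixed point, which is the empirical analogue of the MFE whose existence and uniqueness the paper establishes.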


2019 ◽  
Vol 51 (01) ◽  
pp. 87-115
Author(s):  
Yi-Shen Lin ◽  
Yi-Ching Yao

In the literature on optimal stopping, the problem of maximizing the expected discounted reward over all stopping times has been explicitly solved for some special reward functions (including (x⁺)^ν, (eˣ − K)⁺, and (K − e⁻ˣ)⁺, for x ∈ ℝ, ν ∈ (0, ∞), and K > 0) under general random walks in discrete time and Lévy processes in continuous time (subject to mild integrability conditions). All such reward functions are continuous, increasing, and logconcave, while the corresponding optimal stopping times are of threshold type (i.e. the solutions are one-sided). In this paper we show that all optimal stopping problems with increasing, logconcave, and right-continuous reward functions admit one-sided solutions for general random walks and Lévy processes, thereby generalizing the aforementioned results. We also investigate in detail the principle of smooth fit for Lévy processes when the reward function is increasing and logconcave.
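Concretely, a one-sided (threshold-type) solution means the optimum is a first-passage time over a fixed level. In the standard notation below, g is the reward function, q > 0 the discount rate, and b* the optimal threshold; the symbols are illustrative and the paper's own notation may differ:

```latex
V(x) \;=\; \sup_{\tau}\, \mathbb{E}_x\!\left[ e^{-q\tau}\, g(X_\tau) \right],
\qquad
\tau^{*} \;=\; \inf\{\, t \ge 0 : X_t \ge b^{*} \,\}
```

Here X is a random walk (t ranging over ℕ) or a Lévy process (t over [0, ∞)). The principle of smooth fit then concerns whether V′(b*) = g′(b*) holds at the threshold.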


2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Yaofei Ma ◽  
Xiaole Ma ◽  
Xiao Song

As a continuous state space problem, air combat is difficult to solve with traditional dynamic programming (DP) over a discretized state space. This paper studies an approximate dynamic programming (ADP) approach to build a high-performance decision model for air combat in a one-versus-one scenario, in which the iterative process for policy improvement is replaced by mass sampling from historical trajectories and utility-function approximation, ultimately making policy improvement highly efficient. A continuous reward function is also constructed to better guide the plane from any initial situation toward the "winner" state. According to our experiments, the plane is more aggressive when following the policy derived from the ADP approach than when following the baseline Min-Max policy: the "time to win" is greatly reduced, but the cumulative probability of being killed by the enemy is higher. The reasons for this trade-off are analyzed in the paper.
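The abstract does not detail the paper's algorithm, so the following is a minimal fitted-value-iteration sketch on a hypothetical 1-D toy problem. It illustrates the two ingredients the abstract names: regression on mass-sampled states in place of a discretized DP sweep, and a continuous shaped reward that provides a gradient toward the "winner" state. The state, dynamics, and feature map are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D stand-in for the air-combat state: x in [0, 1] measures how
# close the plane is to the "winner" state at x = 1. The paper's actual
# state is the full 1-v-1 combat geometry.
ACTIONS = np.array([-0.05, 0.0, 0.05])

def step(x, a):
    return np.clip(x + a + 0.01 * rng.standard_normal(), 0.0, 1.0)

def reward(x):
    # Continuous reward: a smooth gradient toward the winner state,
    # instead of a sparse win/lose signal at the end.
    return -(1.0 - x) ** 2

def features(x):
    # Simple polynomial features for the utility-function approximator.
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)

# ADP via fitted value iteration on mass-sampled states: regress
# V(x) ~ max_a [ r(x) + gamma * V(x') ] rather than sweeping a
# discretized state space as classical DP would.
gamma, w = 0.95, np.zeros(3)
xs = rng.uniform(0.0, 1.0, size=2000)            # sampled "history" states
for it in range(50):
    targets = np.full_like(xs, -np.inf)
    for a in ACTIONS:
        x_next = np.clip(xs + a, 0.0, 1.0)       # deterministic lookahead
        targets = np.maximum(targets, reward(xs) + gamma * features(x_next) @ w)
    w, *_ = np.linalg.lstsq(features(xs), targets, rcond=None)

# Greedy policy derived from the fitted utility function.
def act(x):
    q = [reward(x) + gamma * features(np.array([np.clip(x + a, 0.0, 1.0)])) @ w
         for a in ACTIONS]
    return ACTIONS[int(np.argmax(q))]

x = 0.1
for _ in range(30):
    x = step(x, act(x))
print("final state:", round(float(x), 3))
```

Because the shaped reward is dense, the fitted policy pushes toward x = 1 from any initial state, which is the role the paper's continuous reward plays for the plane seeking the "winner" state.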


1976 ◽  
Vol 28 (4) ◽  
pp. 633-642 ◽  
Author(s):  
R. G. M. Morris ◽  
D. F. Einon ◽  
M. J. Morgan

Four groups of rats were trained to run an alleyway with one trial per day. Two groups were always deprived when trained, while the other two received a partial deprivation schedule. One group of each pair received continuous reward in the goal box, while the other received partial reward. A partial reinforcement effect was found during extinction, and the partially deprived groups also showed persistence in extinction. This result extends the parallels between the effects of satiation and nonreward upon behaviour.


1975 ◽  
Vol 36 (2) ◽  
pp. 659-669
Author(s):  
Jeffrey A. Goldman

The study was designed to test three hypotheses of the small-trials partial-reinforcement effect. Seventy-two hooded rats received either continuous reward or partial reward over 6 acquisition trials in a runway. Prior to acquisition, an equal number of subjects in each schedule condition were given 0, 5, or 10 rewarded goal-box placements. A strong small-trials partial-reinforcement effect was evident. There was an effect of placements on both initial and terminal acquisition performance, indicating that r_g–s_g (the anticipatory goal response) was preconditioned to goal-box cues. However, this preconditioned r_g–s_g failed to influence persistence of responding during extinction. The extinction results are contrary to two frustration analyses but do not contradict an analysis of aftereffects.


1974 ◽  
Vol 34 (3) ◽  
pp. 799-809 ◽  
Author(s):  
Charles L. Goodrick

Rats were trained on a continuous reward (CRF) schedule in a 2-bar test box, with or without a light contingency associated with food reward. During extinction trials with the light contingency presented on CRF, the light contingency was shown to facilitate extinction on the basis of response totals (an energizing effect) and the percentage of responses on the reward bar (a directional effect). For these extinction trials, no evidence was obtained for secondary reinforcing (Sr) properties of the light contingency. However, a second experiment found a strong Sr energizing effect when the light contingency was presented on an FR-5 or FR-10 schedule during extinction trials. These energizing effects were also obtained for rats trained without the light contingency, but to a lesser degree than for rats trained with it. Presentation of the light contingency on an FR schedule during extinction trials also resulted in a strong directional effect for all groups, regardless of training condition. A moderately durable energizing effect of Sr was obtained, which was an increasing function of partial periodic Sr during extinction trials.


1973 ◽  
Vol 37 (2) ◽  
pp. 669-670
Author(s):  
Charles S. Hayes ◽  
Cathie Siders ◽  
Bill Snider

Thirty youngsters from classes for the moderately retarded received either partial or continuous reward across 20 lever-pulling trials. Contrary to predictions based on frustrative nonreward theory, neither the speed of lever pulls nor free-field activity following each trial differentiated the reward groups.


1973 ◽  
Vol 2 (2) ◽  
pp. 103-104 ◽  
Author(s):  
Richard S. Calef ◽  
David C. Hopkins ◽  
Earl R. McHewitt ◽  
Frederick R. Maxwell
