Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach

We consider an investment problem where observing and trading are only possible at random times. In addition, we introduce drawdown constraints which require that the investor's wealth does not fall under a prior fixed percentage of its running maximum. The financial market consists of a riskless bond and a stock which is driven by a Lévy process. Moreover, a general utility function is assumed. In this setting we solve the investment problem using a related limsup Markov decision process. We show that the value function can be characterized as the unique fixed point of the Bellman equation and verify the existence of an optimal stationary policy. Under some mild assumptions the value function can be approximated by the value function of a contracting Markov decision process. We are able to use Howard's policy improvement algorithm for computing the value function as well as an optimal policy. These results are illustrated in a numerical example.

Download Full-text

On Optimal Terminal Wealth Problems with Random Trading Times and Drawdown Constraints

Advances in Applied Probability ◽

10.1239/aap/1396360106 ◽

2014 ◽

Vol 46 (1) ◽

pp. 121-138 ◽

Cited By ~ 2

Author(s):

Ulrich Rieder ◽

Marc Wittlinger

Keyword(s):

Markov Decision Process ◽

Decision Process ◽

Value Function ◽

Stationary Policy ◽

General Utility ◽

Investment Problem ◽

Markov Decision ◽

Optimal Stationary Policy ◽

Running Maximum ◽

The Value Function

We consider an investment problem where observing and trading are only possible at random times. In addition, we introduce drawdown constraints which require that the investor's wealth does not fall under a prior fixed percentage of its running maximum. The financial market consists of a riskless bond and a stock which is driven by a Lévy process. Moreover, a general utility function is assumed. In this setting we solve the investment problem using a related limsup Markov decision process. We show that the value function can be characterized as the unique fixed point of the Bellman equation and verify the existence of an optimal stationary policy. Under some mild assumptions the value function can be approximated by the value function of a contracting Markov decision process. We are able to use Howard's policy improvement algorithm for computing the value function as well as an optimal policy. These results are illustrated in a numerical example.

Download Full-text

Optimal Occupation in the Complete Graph

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964800002989 ◽

1993 ◽

Vol 7 (3) ◽

pp. 369-385 ◽

Cited By ~ 1

Author(s):

Kyle Siegrist

Keyword(s):

Markov Decision Process ◽

Complete Graph ◽

Decision Process ◽

Value Function ◽

Comparison Result ◽

State Action ◽

Optimal Policies ◽

Markov Decision ◽

The Cost ◽

The Value Function

We consider N sites (N ≤ ∞), each of which may be either occupied or unoccupied. Time is discrete, and at each time unit a set of occupied sites may attempt to capture a previously unoccupied site. The attempt will be successful with a probability that depends on the number of sites making the attempt, in which case the new site will also be occupied. A benefit is gained when new sites are occupied, but capture attempts are costly. The problem of optimal occupation is formulated as a Markov decision process in which the admissible actions are occupation strategies and the cost is a function of the strategy and the number of occupied sites. A partial order on the state-action pairs is used to obtain a comparison result for stationary policies and qualitative results concerning monotonicity of the value function for the n-stage problem (n ≤ ∞). The optimal policies are partially characterized when the cost depends on the action only through the total number of occupation attempts made.

Download Full-text

An Offloading Algorithm based on Markov Decision Process in Mobile Edge Computing System

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2022.16.15 ◽

2022 ◽

Vol 16 ◽

pp. 115-121

Author(s):

Bingxin Yao ◽

Bin Wu ◽

Siyun Wu ◽

Yin Ji ◽

Danggui Chen ◽

...

Keyword(s):

Energy Consumption ◽

Markov Decision Process ◽

Decision Process ◽

Value Function ◽

Wireless Channel ◽

Edge Computing ◽

Iteration Algorithm ◽

Mobile Edge Computing ◽

Markov Decision ◽

The Value Function

In this paper, an offloading algorithm based on Markov Decision Process (MDP) is proposed to solve the multi-objective offloading decision problem in Mobile Edge Computing (MEC) system. The feature of the algorithm is that MDP is used to make offloading decision. The number of tasks in the task queue, the number of accessible edge clouds and Signal-Noise-Ratio (SNR) of the wireless channel are taken into account in the state space of the MDP model. The offloading delay and energy consumption are considered to define the value function of the MDP model, i.e. the objective function. To maximize the value function, Value Iteration Algorithm is used to obtain the optimal offloading policy. According to the policy, tasks of mobile terminals (MTs) are offloaded to the edge cloud or central cloud, or executed locally. The simulation results show that the proposed algorithm can effectively reduce the offloading delay and energy consumption.

Download Full-text

An IoT based Smart Irrigation Management System using Reinforcement Learning modeled through a Markov Decision Process

10.1109/ds-rt52167.2021.9576130 ◽

2021 ◽

Author(s):

Luis Miguel Samaniego Campoverde ◽

Mauro Tropea ◽

Floriano De Rango

Keyword(s):

Reinforcement Learning ◽

Markov Decision Process ◽

Decision Process ◽

Management System ◽

Irrigation Management ◽

Markov Decision

Download Full-text

A Multi-Step Reinforcement Learning Algorithm

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.44-47.3611 ◽

2010 ◽

Vol 44-47 ◽

pp. 3611-3615 ◽

Cited By ~ 1

Author(s):

Zhi Cong Zhang ◽

Kai Shun Hu ◽

Hui Yu Huang ◽

Shuai Li ◽

Shao Yong Zhao

Keyword(s):

Reinforcement Learning ◽

Markov Decision Process ◽

Decision Process ◽

Large Scale ◽

Learning Algorithm ◽

Machine Learning Method ◽

Learning Method ◽

K Value ◽

Markov Decision ◽

Action Value

Reinforcement learning (RL) is a state or action value based machine learning method which approximately solves large-scale Markov Decision Process (MDP) or Semi-Markov Decision Process (SMDP). A multi-step RL algorithm called Sarsa(,k) is proposed, which is a compromised variation of Sarsa and Sarsa(). It is equivalent to Sarsa if k is 1 and is equivalent to Sarsa() if k is infinite. Sarsa(,k) adjust its performance by setting k value. Two forms of Sarsa(,k), forward view Sarsa(,k) and backward view Sarsa(,k), are constructed and proved equivalent in off-line updating.

Download Full-text

Cooperative retransmissions using Markov decision process with reinforcement learning

2009 IEEE 20th International Symposium on Personal, Indoor and Mobile Radio Communications ◽

10.1109/pimrc.2009.5450098 ◽

2009 ◽

Cited By ~ 1

Author(s):

Ghasem Naddafzadeh Shirazi ◽

Peng-Yong Kong ◽

Chen-Khong Tham

Keyword(s):

Reinforcement Learning ◽

Markov Decision Process ◽

Decision Process ◽

Markov Decision

Download Full-text

Continuous-time Markov decision process with average reward: Using reinforcement learning method

2015 34th Chinese Control Conference (CCC) ◽

10.1109/chicc.2015.7260117 ◽

2015 ◽

Author(s):

Shengde Jia ◽

Lincheng Shen ◽

Hongtao Xue

Keyword(s):

Reinforcement Learning ◽

Markov Decision Process ◽

Continuous Time ◽

Decision Process ◽

Learning Method ◽

Average Reward ◽

Markov Decision

Download Full-text

Universal Reinforcement Learning Algorithms: Survey and Experiments

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/194 ◽

2017 ◽

Author(s):

John Aslanides ◽

Jan Leike ◽

Marcus Hutter

Keyword(s):

Reinforcement Learning ◽

Open Source ◽

Markov Decision Process ◽

Decision Process ◽

Empirical Investigation ◽

State Of The Art ◽

Learning Algorithms ◽

Markov Decision ◽

Reference Implementation ◽

Partially Observable

Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open- source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.

Download Full-text