stationary policy
Recently Published Documents


TOTAL DOCUMENTS: 47 (FIVE YEARS: 6)
H-INDEX: 10 (FIVE YEARS: 0)

Author(s): Irina A. Kochetkova, Anastasia S. Vlaskina, Dmitriy V. Efrosinin, Abdukodir A. Khakimov, Sofiya A. Burtseva

The concept of cloud computing was created to better preserve user privacy and data storage security. However, the resources allocated for processing these data must themselves be distributed optimally. The problem of optimal resource management in the cloud computing environment is described in many scientific publications. Problems of optimal resource distribution can be addressed by constructing and analyzing queuing systems (QS). Based on the literature reviewed in the article, we analyze a two-buffer queuing system with cross-type service and additional penalties. This allows us to assess how suitable the model presented in the article is for application to cloud computing. For a given system, different rules for selecting requests from the queues (queue numbers) are possible, and the choice changes the transition intensities between the states of the system. The system therefore has a selection policy that lets it decide how to behave depending on its state. Such selection-management models have four components, one of which is a stationary policy for choosing the queue number from which a request is served on a vacated virtual machine each time, immediately before a service completion. A simulation model was built for numerical analysis. The results obtained indicate that requests are practically not delayed in the queues of the presented QS, and therefore the policy for the given model can be considered optimal. Although the Poisson flow is the simplest to simulate, it is quite acceptable for performance evaluation. In the future, we plan to conduct several more experiments for different request intensities and various types of incoming flows.
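
As a rough illustration of the kind of model described above, the sketch below simulates a single virtual machine serving two queues with Poisson arrivals, applying a stationary queue-selection policy at every service completion. All rates, the specific policy, and the reported metric are illustrative assumptions, not parameters taken from the article.

```python
# Minimal sketch (assumed parameters, not the article's model): two queues with
# Poisson arrivals, one virtual machine, and a stationary policy that picks the
# queue to serve each time the machine becomes free.
import heapq
import random

LAM = (0.6, 0.4)   # assumed arrival rates for queues 0 and 1
MU = 1.5           # assumed service rate of the virtual machine
T_END = 10_000.0   # simulated time horizon


def select_queue(queues):
    """Stationary selection policy (illustrative): serve the longer queue."""
    return 0 if len(queues[0]) >= len(queues[1]) else 1


def simulate(seed=0):
    rng = random.Random(seed)
    events = [(rng.expovariate(LAM[q]), "arrival", q) for q in (0, 1)]
    heapq.heapify(events)
    queues = ([], [])      # arrival timestamps of waiting requests
    busy = False
    waits = []             # waiting times of served requests
    while events:
        t, kind, q = heapq.heappop(events)
        if t > T_END:
            break
        if kind == "arrival":
            queues[q].append(t)
            heapq.heappush(events, (t + rng.expovariate(LAM[q]), "arrival", q))
        else:              # a service completion frees the virtual machine
            busy = False
        if not busy and (queues[0] or queues[1]):
            k = select_queue(queues)
            if not queues[k]:          # chosen queue may be empty; take the other
                k = 1 - k
            waits.append(t - queues[k].pop(0))
            heapq.heappush(events, (t + rng.expovariate(MU), "departure", k))
            busy = True
    return sum(waits) / len(waits) if waits else 0.0


if __name__ == "__main__":
    print("mean waiting time:", simulate())
```

The rule used here (serve the longer queue) is only one possible stationary policy; the article's four-component selection model would take the place of select_queue.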


2021, Vol 58 (2), pp. 523-550
Author(s): Xin Guo, Yonghui Huang

This paper considers risk-sensitive average optimization for denumerable continuous-time Markov decision processes (CTMDPs), in which the transition and cost rates are allowed to be unbounded, and the policies are allowed to be randomized and history-dependent. We first derive the multiplicative dynamic programming principle and some new facts for risk-sensitive finite-horizon CTMDPs. Then, we establish the existence and uniqueness of a solution to the risk-sensitive average optimality equation (RS-AOE) through the results for risk-sensitive finite-horizon CTMDPs developed here, and also prove the existence of an optimal stationary policy via the RS-AOE. Furthermore, for the case of finite actions available at each state, we construct a sequence of models of finite-state CTMDPs with optimal stationary policies which can be obtained by a policy iteration algorithm in a finite number of iterations, and prove that an average optimal policy for the case of infinitely countable states can be approximated by those of the finite-state models. Finally, we illustrate the conditions and the iteration algorithm with an example.
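
For orientation, a risk-sensitive average optimality equation for CTMDPs is typically written in the following multiplicative form; the notation here is generic and simplified, and the paper's precise statement and conditions may differ:

```latex
% Generic form of a risk-sensitive average optimality equation for CTMDPs
% (illustrative notation; see the paper for the exact statement and assumptions).
g^{*}\, h(i) \;=\; \inf_{a \in A(i)} \Big\{ c(i,a)\, h(i) \;+\; \sum_{j \in S} q(j \mid i,a)\, h(j) \Big\},
\qquad i \in S,\ h > 0,
```

where c is the cost rate, q the transition rate, g* encodes the optimal risk-sensitive average cost, and a stationary policy choosing minimizing actions is average optimal.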


2021, Vol 229, pp. 01047
Author(s): Abdellatif Semmouri, Mostafa Jourhmane, Bahaa Eddine Elbaghazaoui

In this paper we consider constrained optimization of discrete-time Markov decision processes (MDPs) with finite state and action spaces, which accumulate both a reward and costs at each decision epoch. We study the problem of finding a policy that maximizes the expected total discounted reward subject to the constraints that the expected total discounted costs do not exceed given values. To compute an optimal or a nearly optimal stationary policy, we investigate a decomposition of the state space into strongly communicating classes. The discounted criterion has many applications in areas such as forest management, energy consumption management, finance, communication systems (mobile networks), and artificial intelligence.
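
In generic notation (the symbols here are standard textbook notation, not taken verbatim from the paper), the constrained problem described above reads:

```latex
% Constrained discounted MDP (generic textbook notation).
\max_{\pi} \; \mathbb{E}^{\pi}_{x}\!\Big[\sum_{t=0}^{\infty} \beta^{t}\, r(x_t,a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}^{\pi}_{x}\!\Big[\sum_{t=0}^{\infty} \beta^{t}\, c_k(x_t,a_t)\Big] \le d_k,
\qquad k = 1,\dots,K,
```

where β ∈ (0,1) is the discount factor, r the reward, c_k the costs accumulated at each decision epoch, and d_k the given bounds.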


Author(s): Nicole Bäuerle, Anna Jaśkiewicz, Andrzej S. Nowak

In this paper, we study a Markov decision process with a non-linear discount function and with a Borel state space. We define a recursive discounted utility, which resembles non-additive utility functions considered in a number of models in economics. Non-additivity here follows from non-linearity of the discount function. Our study is complementary to the work of Jaśkiewicz et al. (Math Oper Res 38:108–121, 2013), where non-linear discounting is also used in the stochastic setting, but the expectation of utilities aggregated on the space of all histories of the process is applied, leading to a non-stationary dynamic programming model. Our aim is to prove that in the recursive discounted utility case the Bellman equation has a solution and there exists an optimal stationary policy for the problem in the infinite time horizon. Our approach includes two cases: (a) when the one-stage utility is bounded on both sides by a weight function multiplied by some positive and negative constants, and (b) when the one-stage utility is unbounded from below.
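
In generic notation, the Bellman equation studied here replaces the usual linear factor βv by a non-linear discount function δ applied to the expected continuation value (conditions on δ, the one-stage utility u, and the transition kernel q are those of the paper):

```latex
% Bellman equation with a non-linear discount function (generic notation).
v(x) \;=\; \sup_{a \in A(x)} \Big\{ u(x,a) \;+\; \delta\!\Big( \int_{X} v(y)\, q(dy \mid x,a) \Big) \Big\},
\qquad x \in X,
```

with linear discounting recovered for δ(t) = βt.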


2020, Vol 34 (10), pp. 13771-13772
Author(s): Ian Davies, Zheng Tian, Jun Wang

Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment. The adaptation and learning of other agents induces non-stationarity in the environment dynamics. This poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment. Policy search algorithms also struggle in multi-agent settings as the partial observability resulting from an opponent's actions not being known introduces high variance to policy training. Modelling an agent's opponent(s) is often pursued as a means of resolving the issues arising from the coexistence of learning opponents. An opponent model provides an agent with some ability to reason about other agents to aid its own decision making. Most prior works learn an opponent model by assuming the opponent is employing a stationary policy or switching between a set of stationary policies. Such an approach can reduce the variance of training signals for policy search algorithms. However, in the multi-agent setting, agents have an incentive to continually adapt and learn. This means that the assumptions concerning opponent stationarity are unrealistic. In this work, we develop a novel approach to modelling an opponent's learning dynamics which we term Learning to Model Opponent Learning (LeMOL). We show our structured opponent model is more accurate and stable than naive behaviour cloning baselines. We further show that opponent modelling can improve the performance of algorithmic agents in multi-agent settings.
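
The sketch below is not LeMOL; it is a toy version of the naive behaviour-cloning baseline the abstract mentions, which assumes the opponent follows a stationary policy and simply estimates action frequencies per observed state. The class name and example data are hypothetical.

```python
# Minimal sketch of a naive behaviour-cloning opponent model (the kind of
# baseline the paper compares against). It assumes a *stationary* opponent
# policy and estimates the opponent's action distribution per observed state.
from collections import Counter, defaultdict


class BehaviourCloningOpponentModel:
    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.counts = defaultdict(Counter)   # state -> Counter over opponent actions

    def update(self, state, opponent_action):
        """Record one observed (state, opponent action) pair."""
        self.counts[state][opponent_action] += 1

    def predict(self, state):
        """Estimate the probability of each opponent action in this state.

        With no data, fall back to a uniform distribution. A learning opponent
        violates the stationarity assumption, which is the issue the paper targets.
        """
        c = self.counts[state]
        total = sum(c.values())
        if total == 0:
            return [1.0 / self.n_actions] * self.n_actions
        return [c[a] / total for a in range(self.n_actions)]


# Example usage with hypothetical observations:
model = BehaviourCloningOpponentModel(n_actions=3)
for s, a in [("s0", 1), ("s0", 1), ("s0", 2)]:
    model.update(s, a)
print(model.predict("s0"))   # -> [0.0, 0.666..., 0.333...]
```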


2015, Vol 52 (2), pp. 419-440
Author(s): Rolando Cavazos-Cadena, Raúl Montes-De-Oca, Karel Sladký

This paper concerns discrete-time Markov decision chains with denumerable state and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated with ℓ² is finite under any policy, then a stationary policy obtained from the optimality equation in the standard way is sample-path average optimal in a strong sense.
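
In generic notation, the (risk-neutral) average reward optimality equation referred to above reads:

```latex
% Average reward optimality equation (generic notation).
g + h(x) \;=\; \sup_{a \in A(x)} \Big\{ r(x,a) + \sum_{y \in S} p(y \mid x,a)\, h(y) \Big\},
\qquad x \in S,
```

and the stationary policy selecting a maximizing action at each state is, under the Lyapunov condition on ℓ, the one shown to be sample-path average optimal in a strong sense.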


2014, Vol 01 (01), pp. 1450008
Author(s): William Solecki, Cynthia Rosenzweig

This paper illustrates and examines the development of a flexible climate adaptation approach and non-stationary climate policy in New York City in the post-Hurricane Sandy context. Extreme events such as Hurricane Sandy are presented as learning opportunities that create a policy window for outside-of-the-box solutions and experimentation. The research investigates the institutionalization of laws, standards, and codes that are required to reflect an increasingly dynamic set of local environmental stresses associated with climate change. The City of New York responded to Hurricane Sandy with a set of targeted adjustments to the existing infrastructure and building stock that make it both more resistant (i.e., strengthened) and more resilient (i.e., responsive to stress) in the face of future extreme events. Post-Sandy experience in New York shows that the conditions for a post-disaster flexible adaptation response exist, and that the beginnings of a non-stationary policy generation process have been put into place. More broadly, post-disaster policy processes have been configured in New York to enable continuous co-production of knowledge by scientists and the community of decision-makers and stakeholders.


2014, Vol 46 (1), pp. 121-138
Author(s): Ulrich Rieder, Marc Wittlinger

We consider an investment problem where observing and trading are only possible at random times. In addition, we introduce drawdown constraints which require that the investor's wealth does not fall below a previously fixed percentage of its running maximum. The financial market consists of a riskless bond and a stock which is driven by a Lévy process. Moreover, a general utility function is assumed. In this setting we solve the investment problem using a related limsup Markov decision process. We show that the value function can be characterized as the unique fixed point of the Bellman equation and verify the existence of an optimal stationary policy. Under some mild assumptions the value function can be approximated by the value function of a contracting Markov decision process. We are able to use Howard's policy improvement algorithm for computing the value function as well as an optimal policy. These results are illustrated in a numerical example.
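
To illustrate only the algorithmic pattern, the sketch below runs Howard's policy improvement (policy iteration) on a made-up finite discounted MDP; it is not the paper's Lévy-driven investment model, and the transition probabilities, rewards, and discount factor are arbitrary assumptions.

```python
# Illustrative sketch of Howard's policy improvement on a toy finite discounted
# MDP (made-up data, not the paper's investment problem).
import numpy as np

n_states, n_actions, beta = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.random((n_states, n_actions))                              # rewards r(s, a)


def policy_iteration(P, R, beta):
    n_states, n_actions = R.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - beta * P_pi) v = r_pi exactly.
        P_pi = P[np.arange(n_states), policy]
        r_pi = R[np.arange(n_states), policy]
        v = np.linalg.solve(np.eye(n_states) - beta * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        q = R + beta * P @ v                 # q[s, a]
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy


policy, v = policy_iteration(P, R, beta)
print("optimal policy:", policy, "values:", np.round(v, 3))
```

Each improvement step weakly increases the value function, so the loop terminates with an optimal stationary policy after finitely many iterations on a finite model.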

