A Q-learning Approach to a Consumption-Investment Problem

2021 ◽  
Vol 10 (2) ◽  
pp. 110
Author(s):  
Ruy Lopez-Rios

The paper deals with a discrete-time consumption-investment problem over an infinite horizon. The problem is formulated as a Markov decision process with expected total discounted utility as the objective function. The paper presents a procedure for approximating the solution via machine learning, specifically a Q-learning technique, and provides numerical results for the problem.
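
The abstract does not specify the model's details, so the following is a minimal sketch of tabular Q-learning on a discretized consumption-investment problem with discounted CRRA utility; the wealth grid, return distribution, and all parameters are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Sketch only: tabular Q-learning for an illustrative consumption-investment MDP.
rng = np.random.default_rng(0)

wealth_grid = np.linspace(0.1, 10.0, 50)      # discretized wealth states (assumption)
consume_frac = np.linspace(0.05, 0.95, 10)    # actions: fraction of wealth consumed
beta, gamma_crra = 0.95, 2.0                  # discount factor, CRRA risk aversion
alpha, eps, episodes, horizon = 0.1, 0.1, 5000, 200

def utility(c):
    return (c ** (1 - gamma_crra) - 1) / (1 - gamma_crra)

def nearest_state(w):
    w = np.clip(w, wealth_grid[0], wealth_grid[-1])
    return int(np.abs(wealth_grid - w).argmin())

Q = np.zeros((len(wealth_grid), len(consume_frac)))

for _ in range(episodes):
    s = rng.integers(len(wealth_grid))
    for _ in range(horizon):
        a = rng.integers(len(consume_frac)) if rng.random() < eps else int(Q[s].argmax())
        w = wealth_grid[s]
        c = consume_frac[a] * w
        gross_return = rng.lognormal(mean=0.02, sigma=0.1)   # random return on invested wealth
        s_next = nearest_state((w - c) * gross_return)
        # Standard Q-learning update toward the discounted-utility target.
        target = utility(c) + beta * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print("Greedy consumption fractions for the first wealth states:",
      consume_frac[Q.argmax(axis=1)][:5])
```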


2017 ◽  
Vol 7 (1.5) ◽  
pp. 274
Author(s):  
D. Ganesha ◽  
Vijayakumar Maragal Venkatamuni

This research work presents an analysis of a modified SARSA learning algorithm. State-Action-Reward-State-Action (SARSA) is a technique for learning a Markov decision process (MDP) policy, used in reinforcement learning within the fields of artificial intelligence (AI) and machine learning (ML). The modified SARSA algorithm selects better actions in order to obtain better rewards. Experiments are conducted to evaluate the performance of each agent individually, and the same statistics are collected across agents so that results can be compared. The work considers various kinds of agents at different levels of the architecture. The Fungus World testbed, implemented in SWI-Prolog 5.4.6, is used for the experiments; fixed obstacles are placed to create locations specific to the Fungus World environment, and various parameters are introduced into the environment to test an agent's performance. The modified SARSA learning algorithm is well suited to the EMCAP architecture, and in the experiments it obtains more rewards than the existing SARSA algorithm.
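
For reference, below is a sketch of a standard (unmodified) tabular SARSA episode; the paper's modified variant, the EMCAP architecture, and the Fungus World testbed in SWI-Prolog are not reproduced, and `env_reset`/`env_step` are assumed interfaces.

```python
import numpy as np

def sarsa_episode(env_reset, env_step, Q, n_actions, alpha=0.1, gamma=0.9,
                  eps=0.1, max_steps=100, rng=None):
    """env_reset() -> state; env_step(state, action) -> (next_state, reward, done).
    Q is a dict mapping (state, action) -> value."""
    rng = rng or np.random.default_rng()

    def policy(s):
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            return int(rng.integers(n_actions))
        return max(range(n_actions), key=lambda a: Q.get((s, a), 0.0))

    s = env_reset()
    a = policy(s)
    for _ in range(max_steps):
        s2, r, done = env_step(s, a)
        a2 = policy(s2)
        # On-policy TD target: bootstraps from the action actually chosen next
        # (SARSA), unlike Q-learning, which uses the greedy maximum.
        target = r + (0.0 if done else gamma * Q.get((s2, a2), 0.0))
        q_sa = Q.get((s, a), 0.0)
        Q[(s, a)] = q_sa + alpha * (target - q_sa)
        if done:
            break
        s, a = s2, a2
    return Q
```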


2020 ◽  
Vol 17 (4A) ◽  
pp. 677-682
Author(s):  
Adnan Shaout ◽  
Brennan Crispin

This paper presents a method using neural networks and a Markov decision process (MDP) to identify the source and class of video streaming services. The paper presents the design and implementation of an end-to-end pipeline for training a machine learning system that takes in packets collected over a network interface and classifies the data stream as belonging to one of five streaming video services: YouTube, YouTube TV, Netflix, Amazon Prime, or HBO.
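
As an illustration only, the sketch below trains a small neural-network classifier over simple per-flow packet features; the feature set, labels, and synthetic data are placeholders and do not reflect the authors' actual pipeline or dataset.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

SERVICES = ["YouTube", "YouTube TV", "Netflix", "Amazon Prime", "HBO"]

def flow_features(pkt_sizes, inter_arrivals):
    """Summarize one capture window into a fixed-length feature vector."""
    return np.array([
        np.mean(pkt_sizes), np.std(pkt_sizes), np.percentile(pkt_sizes, 90),
        np.mean(inter_arrivals), np.std(inter_arrivals), len(pkt_sizes),
    ])

# Synthetic stand-in data: 500 flows with random features and labels (real
# features would come from packets captured on a network interface).
rng = np.random.default_rng(0)
X = np.vstack([flow_features(rng.integers(60, 1500, 200), rng.exponential(0.01, 200))
               for _ in range(500)])
y = rng.integers(0, len(SERVICES), 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy (meaningless on random labels):", clf.score(X_te, y_te))
```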


Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial intelligence makes it possible to build engines that can explore and learn an environment and thereby derive policies that control it in real time with no human intervention. Through its reinforcement learning component, using frameworks such as temporal differences, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to any system that can be modeled as a Markov decision process. This opens the door to applying reinforcement learning to cloud load balancing, so that load can be dispatched dynamically across a given cloud system. The authors describe different techniques that can be used to implement a reinforcement-learning-based engine in a cloud system.
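
One hedged sketch of the idea: a tabular Q-learning dispatcher that chooses which server receives the next request. The state discretization, reward, and toy queue dynamics below are assumptions for illustration, not the authors' engine.

```python
import random
from collections import defaultdict

N_SERVERS = 3
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = defaultdict(float)

def state_of(queues):
    # Discretize each server's backlog into low / medium / high.
    return tuple(min(q // 5, 2) for q in queues)

def step(queues, server):
    queues = list(queues)
    queues[server] += 1                                           # dispatch one request
    queues = [max(q - random.randint(0, 2), 0) for q in queues]   # servers drain work
    reward = -max(queues)                                         # penalize the most loaded server
    return queues, reward

queues = [0] * N_SERVERS
s = state_of(queues)
for _ in range(10000):
    if random.random() < EPS:
        a = random.randrange(N_SERVERS)
    else:
        a = max(range(N_SERVERS), key=lambda x: Q[(s, x)])
    queues, r = step(queues, a)
    s2 = state_of(queues)
    best_next = max(Q[(s2, x)] for x in range(N_SERVERS))
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
    s = s2

print("Learned dispatch values for the all-idle state:",
      [round(Q[((0, 0, 0), a)], 2) for a in range(N_SERVERS)])
```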


Author(s):  
Marek Laskowski

Science is on the verge of practical agent-based modeling systems capable of machine learning for healthcare policy decision support. The details of integrating an agent-based model of a hospital emergency department with a genetic programming (GP) machine learning system are presented in this paper. A novel GP heuristic, or extension, is introduced to better represent the Markov decision process that underlies agent decision making in an unknown environment. The capabilities of the resulting prototype for automated hypothesis generation in the context of healthcare policy decision support are demonstrated by automatically generating patient-flow and infection-spread prevention policies. Finally, some observations are made regarding moving forward from the prototype stage.
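
A speculative sketch of the kind of loop such an integration implies: candidate policies (plain Python predicates standing in here for GP-evolved expressions) are scored by running them inside a toy agent-based emergency-department simulation. Everything below is a placeholder, not the paper's ABM or GP system.

```python
import random

def simulate_ed(policy, n_patients=200, seed=0):
    """Toy fitness evaluation: throughput minus an infection penalty."""
    rng = random.Random(seed)
    infected, treated = 0, 0
    for _ in range(n_patients):
        severity = rng.random()
        crowding = rng.random()
        if policy(severity, crowding):       # policy decides: admit now vs. queue/isolate
            treated += 1
            infected += crowding > 0.8       # crude contagion proxy under crowding
    return treated - 5 * infected

candidates = [
    lambda sev, crowd: True,                 # admit everyone
    lambda sev, crowd: sev > 0.5,            # admit only severe cases
    lambda sev, crowd: sev > 0.3 and crowd < 0.7,
]
best = max(candidates, key=simulate_ed)
print("best candidate fitness:", simulate_ed(best))
```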


2021 ◽  
Author(s):  
Xiaocheng Li ◽  
Huaiyang Zhong ◽  
Margaret L. Brandeau

Sequential Decision Making Using Quantiles

The goal of a traditional Markov decision process (MDP) is to maximize the expectation of cumulative reward over a finite or infinite horizon. In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward. For example, a physician may want to determine the optimal drug regimen for a risk-averse patient with the objective of maximizing the 0.10 quantile of the cumulative reward; this is the cumulative improvement in health that is expected to occur with at least 90% probability for the patient. In "Quantile Markov Decision Processes," X. Li, H. Zhong, and M. Brandeau provide analytic results to solve the quantile Markov decision process (QMDP) problem. They develop an efficient dynamic programming procedure that finds the optimal QMDP value function for all states and quantiles in one pass. The algorithm also extends to the MDP problem with a conditional value-at-risk objective.
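
The paper's one-pass dynamic program is not reproduced here; the sketch below only illustrates the objective itself, estimating the 0.10 quantile of cumulative reward by Monte Carlo for two candidate actions of a toy MDP (all dynamics are assumptions) and showing why a quantile-maximizing decision maker can prefer a different action than an expectation-maximizing one.

```python
import numpy as np

rng = np.random.default_rng(0)
TAU, HORIZON, N_RUNS = 0.10, 20, 5000

def rollout(action):
    # Action 0: safe, steady reward; action 1: higher mean but a heavy left tail.
    if action == 0:
        return rng.normal(1.0, 0.2, HORIZON).sum()
    return rng.choice([3.0, -2.0], p=[0.7, 0.3], size=HORIZON).sum()

for action in (0, 1):
    returns = np.array([rollout(action) for _ in range(N_RUNS)])
    print(f"action {action}: mean={returns.mean():.1f}, "
          f"{TAU:.2f}-quantile={np.quantile(returns, TAU):.1f}")
# A risk-averse (quantile) decision maker may prefer action 0 even though
# action 1 has the higher expected cumulative reward.
```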


2014 ◽  
Vol 46 (01) ◽  
pp. 121-138 ◽  
Author(s):  
Ulrich Rieder ◽  
Marc Wittlinger

We consider an investment problem where observing and trading are only possible at random times. In addition, we introduce drawdown constraints which require that the investor's wealth does not fall below a fixed percentage of its running maximum. The financial market consists of a riskless bond and a stock driven by a Lévy process. Moreover, a general utility function is assumed. In this setting we solve the investment problem using a related limsup Markov decision process. We show that the value function can be characterized as the unique fixed point of the Bellman equation and verify the existence of an optimal stationary policy. Under some mild assumptions the value function can be approximated by the value function of a contracting Markov decision process. We are able to use Howard's policy improvement algorithm for computing the value function as well as an optimal policy. These results are illustrated in a numerical example.
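
For orientation, here is a minimal sketch of Howard's policy improvement (policy iteration) on a small, generic contracting MDP; the Lévy-driven market, random trading times, and drawdown constraint of the paper are not modeled, and the transition and reward matrices are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, beta = 6, 3, 0.9

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # one-step rewards

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve (I - beta * P_pi) v = r_pi exactly.
    P_pi = P[np.arange(n_states), policy]
    r_pi = R[np.arange(n_states), policy]
    v = np.linalg.solve(np.eye(n_states) - beta * P_pi, r_pi)
    # Policy improvement: act greedily with respect to the evaluated values.
    q = R + beta * P @ v            # shape (n_states, n_actions)
    new_policy = q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("optimal stationary policy:", policy)
print("value function:", np.round(v, 3))
```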

