A Q-learning Approach to a Consumption-Investment Problem

2021 ◽  
Vol 10 (2) ◽  
pp. 110
Author(s):  
Ruy Lopez-Rios

The paper deals with a discrete-time consumption-investment problem over an infinite horizon. The problem is formulated as a Markov decision process with expected total discounted utility as the objective function. The paper presents a procedure for approximating the solution via machine learning, specifically a Q-learning technique, and provides numerical results for the problem.
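
The abstract does not specify the model's details, so the following is a minimal sketch of tabular Q-learning on a discretized consumption-investment problem with discounted CRRA utility; the wealth grid, return distribution, and all parameters are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Sketch only: tabular Q-learning for an illustrative consumption-investment MDP.
rng = np.random.default_rng(0)

wealth_grid = np.linspace(0.1, 10.0, 50)      # discretized wealth states (assumption)
consume_frac = np.linspace(0.05, 0.95, 10)    # actions: fraction of wealth consumed
beta, gamma_crra = 0.95, 2.0                  # discount factor, CRRA risk aversion
alpha, eps, episodes, horizon = 0.1, 0.1, 5000, 200

def utility(c):
    return (c ** (1 - gamma_crra) - 1) / (1 - gamma_crra)

def nearest_state(w):
    w = np.clip(w, wealth_grid[0], wealth_grid[-1])
    return int(np.abs(wealth_grid - w).argmin())

Q = np.zeros((len(wealth_grid), len(consume_frac)))

for _ in range(episodes):
    s = rng.integers(len(wealth_grid))
    for _ in range(horizon):
        a = rng.integers(len(consume_frac)) if rng.random() < eps else int(Q[s].argmax())
        w = wealth_grid[s]
        c = consume_frac[a] * w
        gross_return = rng.lognormal(mean=0.02, sigma=0.1)   # random return on invested wealth
        s_next = nearest_state((w - c) * gross_return)
        # Standard Q-learning update toward the discounted-utility target.
        target = utility(c) + beta * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print("Greedy consumption fractions for the first wealth states:",
      consume_frac[Q.argmax(axis=1)][:5])
```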


2017 ◽  
Vol 7 (1.5) ◽  
pp. 274
Author(s):  
D. Ganesha ◽  
Vijayakumar Maragal Venkatamuni

This research work presents an analysis of a modified SARSA learning algorithm. State-Action-Reward-State-Action (SARSA) is a technique for learning a Markov decision process (MDP) policy, used in reinforcement learning within the fields of artificial intelligence (AI) and machine learning (ML). The modified SARSA algorithm selects better actions in order to obtain better rewards. Experiments are conducted to evaluate the performance of each agent individually, and the same statistics are collected across agents so that results can be compared. The work considers various kinds of agents at different levels of the architecture. The Fungus World testbed, implemented in SWI-Prolog 5.4.6, is used for the experiments; fixed obstacles are placed to create locations specific to the Fungus World environment, and various parameters are introduced into the environment to test an agent's performance. The modified SARSA learning algorithm is well suited to the EMCAP architecture, and in the experiments it obtains more rewards than the existing SARSA algorithm.
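
For reference, below is a sketch of a standard (unmodified) tabular SARSA episode; the paper's modified variant, the EMCAP architecture, and the Fungus World testbed in SWI-Prolog are not reproduced, and `env_reset`/`env_step` are assumed interfaces.

```python
import numpy as np

def sarsa_episode(env_reset, env_step, Q, n_actions, alpha=0.1, gamma=0.9,
                  eps=0.1, max_steps=100, rng=None):
    """env_reset() -> state; env_step(state, action) -> (next_state, reward, done).
    Q is a dict mapping (state, action) -> value."""
    rng = rng or np.random.default_rng()

    def policy(s):
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            return int(rng.integers(n_actions))
        return max(range(n_actions), key=lambda a: Q.get((s, a), 0.0))

    s = env_reset()
    a = policy(s)
    for _ in range(max_steps):
        s2, r, done = env_step(s, a)
        a2 = policy(s2)
        # On-policy TD target: bootstraps from the action actually chosen next
        # (SARSA), unlike Q-learning, which uses the greedy maximum.
        target = r + (0.0 if done else gamma * Q.get((s2, a2), 0.0))
        q_sa = Q.get((s, a), 0.0)
        Q[(s, a)] = q_sa + alpha * (target - q_sa)
        if done:
            break
        s, a = s2, a2
    return Q
```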


2020 ◽  
Vol 17 (4A) ◽  
pp. 677-682
Author(s):  
Adnan Shaout ◽  
Brennan Crispin

This paper presents a method using neural networks and a Markov decision process (MDP) to identify the source and class of video streaming services. The paper presents the design and implementation of an end-to-end pipeline for training a machine learning system that takes in packets collected over a network interface and classifies the data stream as belonging to one of five streaming video services: YouTube, YouTube TV, Netflix, Amazon Prime, or HBO.
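
As an illustration only, the sketch below trains a small neural-network classifier over simple per-flow packet features; the feature set, labels, and synthetic data are placeholders and do not reflect the authors' actual pipeline or dataset.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

SERVICES = ["YouTube", "YouTube TV", "Netflix", "Amazon Prime", "HBO"]

def flow_features(pkt_sizes, inter_arrivals):
    """Summarize one capture window into a fixed-length feature vector."""
    return np.array([
        np.mean(pkt_sizes), np.std(pkt_sizes), np.percentile(pkt_sizes, 90),
        np.mean(inter_arrivals), np.std(inter_arrivals), len(pkt_sizes),
    ])

# Synthetic stand-in data: 500 flows with random features and labels (real
# features would come from packets captured on a network interface).
rng = np.random.default_rng(0)
X = np.vstack([flow_features(rng.integers(60, 1500, 200), rng.exponential(0.01, 200))
               for _ in range(500)])
y = rng.integers(0, len(SERVICES), 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy (meaningless on random labels):", clf.score(X_te, y_te))
```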


Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial intelligence makes it possible to build engines that can explore and learn an environment and thereby derive policies that control it in real time with no human intervention. Through its reinforcement learning component, using frameworks such as temporal differences, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to any system that can be modeled as a Markov decision process. This opens the door to applying reinforcement learning to cloud load balancing, so that load can be dispatched dynamically across a given cloud system. The authors describe different techniques that can be used to implement a reinforcement-learning-based engine in a cloud system.
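
One hedged sketch of the idea: a tabular Q-learning dispatcher that chooses which server receives the next request. The state discretization, reward, and toy queue dynamics below are assumptions for illustration, not the authors' engine.

```python
import random
from collections import defaultdict

N_SERVERS = 3
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = defaultdict(float)

def state_of(queues):
    # Discretize each server's backlog into low / medium / high.
    return tuple(min(q // 5, 2) for q in queues)

def step(queues, server):
    queues = list(queues)
    queues[server] += 1                                           # dispatch one request
    queues = [max(q - random.randint(0, 2), 0) for q in queues]   # servers drain work
    reward = -max(queues)                                         # penalize the most loaded server
    return queues, reward

queues = [0] * N_SERVERS
s = state_of(queues)
for _ in range(10000):
    if random.random() < EPS:
        a = random.randrange(N_SERVERS)
    else:
        a = max(range(N_SERVERS), key=lambda x: Q[(s, x)])
    queues, r = step(queues, a)
    s2 = state_of(queues)
    best_next = max(Q[(s2, x)] for x in range(N_SERVERS))
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
    s = s2

print("Learned dispatch values for the all-idle state:",
      [round(Q[((0, 0, 0), a)], 2) for a in range(N_SERVERS)])
```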


Author(s):  
Marek Laskowski

Science is on the verge of practical agent-based modeling systems capable of machine learning for healthcare policy decision support. The details of integrating an agent-based model of a hospital emergency department with a genetic programming (GP) machine learning system are presented in this paper. A novel GP heuristic, or extension, is introduced to better represent the Markov decision process that underlies agent decision making in an unknown environment. The capabilities of the resulting prototype for automated hypothesis generation in the context of healthcare policy decision support are demonstrated by automatically generating patient-flow and infection-spread prevention policies. Finally, some observations are made regarding moving forward from the prototype stage.
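
A speculative sketch of the kind of loop such an integration implies: candidate policies (plain Python predicates standing in here for GP-evolved expressions) are scored by running them inside a toy agent-based emergency-department simulation. Everything below is a placeholder, not the paper's ABM or GP system.

```python
import random

def simulate_ed(policy, n_patients=200, seed=0):
    """Toy fitness evaluation: throughput minus an infection penalty."""
    rng = random.Random(seed)
    infected, treated = 0, 0
    for _ in range(n_patients):
        severity = rng.random()
        crowding = rng.random()
        if policy(severity, crowding):       # policy decides: admit now vs. queue/isolate
            treated += 1
            infected += crowding > 0.8       # crude contagion proxy under crowding
    return treated - 5 * infected

candidates = [
    lambda sev, crowd: True,                 # admit everyone
    lambda sev, crowd: sev > 0.5,            # admit only severe cases
    lambda sev, crowd: sev > 0.3 and crowd < 0.7,
]
best = max(candidates, key=simulate_ed)
print("best candidate fitness:", simulate_ed(best))
```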


2021 ◽  
Author(s):  
Xiaocheng Li ◽  
Huaiyang Zhong ◽  
Margaret L. Brandeau

Sequential Decision Making Using Quantiles

The goal of a traditional Markov decision process (MDP) is to maximize the expectation of cumulative reward over a finite or infinite horizon. In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward. For example, a physician may want to determine the optimal drug regimen for a risk-averse patient with the objective of maximizing the 0.10 quantile of the cumulative reward; this is the cumulative improvement in health that is expected to occur with at least 90% probability for the patient. In "Quantile Markov Decision Processes," X. Li, H. Zhong, and M. Brandeau provide analytic results to solve the quantile Markov decision process (QMDP) problem. They develop an efficient dynamic programming procedure that finds the optimal QMDP value function for all states and quantiles in one pass. The algorithm also extends to the MDP problem with a conditional value-at-risk objective.
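
The paper's one-pass dynamic program is not reproduced here; the sketch below only illustrates the objective itself, estimating the 0.10 quantile of cumulative reward by Monte Carlo for two candidate actions of a toy MDP (all dynamics are assumptions) and showing why a quantile-maximizing decision maker can prefer a different action than an expectation-maximizing one.

```python
import numpy as np

rng = np.random.default_rng(0)
TAU, HORIZON, N_RUNS = 0.10, 20, 5000

def rollout(action):
    # Action 0: safe, steady reward; action 1: higher mean but a heavy left tail.
    if action == 0:
        return rng.normal(1.0, 0.2, HORIZON).sum()
    return rng.choice([3.0, -2.0], p=[0.7, 0.3], size=HORIZON).sum()

for action in (0, 1):
    returns = np.array([rollout(action) for _ in range(N_RUNS)])
    print(f"action {action}: mean={returns.mean():.1f}, "
          f"{TAU:.2f}-quantile={np.quantile(returns, TAU):.1f}")
# A risk-averse (quantile) decision maker may prefer action 0 even though
# action 1 has the higher expected cumulative reward.
```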


2014 ◽  
Vol 46 (01) ◽  
pp. 121-138 ◽  
Author(s):  
Ulrich Rieder ◽  
Marc Wittlinger

We consider an investment problem where observing and trading are only possible at random times. In addition, we introduce drawdown constraints which require that the investor's wealth does not fall below a fixed percentage of its running maximum. The financial market consists of a riskless bond and a stock driven by a Lévy process. Moreover, a general utility function is assumed. In this setting we solve the investment problem using a related limsup Markov decision process. We show that the value function can be characterized as the unique fixed point of the Bellman equation and verify the existence of an optimal stationary policy. Under some mild assumptions the value function can be approximated by the value function of a contracting Markov decision process. We are able to use Howard's policy improvement algorithm for computing the value function as well as an optimal policy. These results are illustrated in a numerical example.
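
For orientation, here is a minimal sketch of Howard's policy improvement (policy iteration) on a small, generic contracting MDP; the Lévy-driven market, random trading times, and drawdown constraint of the paper are not modeled, and the transition and reward matrices are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, beta = 6, 3, 0.9

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # one-step rewards

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve (I - beta * P_pi) v = r_pi exactly.
    P_pi = P[np.arange(n_states), policy]
    r_pi = R[np.arange(n_states), policy]
    v = np.linalg.solve(np.eye(n_states) - beta * P_pi, r_pi)
    # Policy improvement: act greedily with respect to the evaluated values.
    q = R + beta * P @ v            # shape (n_states, n_actions)
    new_policy = q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("optimal stationary policy:", policy)
print("value function:", np.round(v, 3))
```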

