Deep Reinforcement Learning for Optimization

Deep reinforcement learning (DRL) has transformed the field of artificial intelligence (AI), especially after the success of Google DeepMind. This branch of machine learning represents a step toward building autonomous systems that understand the visual world. DRL is currently applied to many kinds of problems that were previously intractable. In this chapter, the authors begin with an introduction to the general field of reinforcement learning (RL) and the Markov decision process (MDP). They then clarify the common DRL framework and the necessary components of RL settings. Moreover, they analyze stochastic gradient descent (SGD)-based optimizers such as ADAM and a non-specific multi-policy selection mechanism in a multi-objective Markov decision process. The chapter also includes a comparison of different Deep Q networks. In conclusion, the authors describe several challenges and trends in research within the deep reinforcement learning field.

An IoT based Smart Irrigation Management System using Reinforcement Learning modeled through a Markov Decision Process

10.1109/ds-rt52167.2021.9576130 ◽

2021 ◽

Author(s):

Luis Miguel Samaniego Campoverde ◽

Mauro Tropea ◽

Floriano De Rango

Keyword(s):

Reinforcement Learning ◽

Markov Decision Process ◽

Decision Process ◽

Management System ◽

Irrigation Management ◽

Markov Decision

Cloud Load Balancing and Reinforcement Learning

Advances in Business Information Systems and Analytics - Cloud Computing Technologies for Green Enterprises ◽

10.4018/978-1-5225-3038-1.ch011 ◽

2018 ◽

pp. 266-291

Author(s):

Abdelghafour Harraz ◽

Mostapha Zbakh

Keyword(s):

Artificial Intelligence ◽

Reinforcement Learning ◽

Load Balancing ◽

Decision Process ◽

Cloud System ◽

Human Intervention ◽

Q Learning ◽

State Action ◽

Learning Techniques ◽

Markov Decision

Artificial intelligence makes it possible to create engines that explore and learn environments and then derive policies that control them in real time with no human intervention. Through its reinforcement learning techniques, such as temporal differences, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to systems that can be modeled as a Markov decision process. This opens the door to applying reinforcement learning to cloud load balancing in order to dispatch load dynamically to a given cloud system. The authors describe different techniques that can be used to implement a reinforcement learning based engine in a cloud system.
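
As a rough illustration of how a Q-learning update could drive a dispatcher, the sketch below trains a tabular agent on a toy queueing model. It is not the authors' engine: the state encoding (a tuple of per-server queue lengths), the reward (the negative queue length of the chosen server), the service probabilities, and every function name are assumptions made for the example.

```python
import random
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, n_actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Off-policy target: bootstrap from the best action in the next state.
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Pick a random server with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q[(state, a)])


def train_dispatcher(service_prob=(0.9, 0.2), steps=5000, max_load=5, seed=0):
    """Toy model (assumed): one request arrives per step and is sent to a server;
    each busy server then finishes one job with its own probability."""
    rng = random.Random(seed)
    n = len(service_prob)
    q = defaultdict(float)
    loads = [0] * n
    state = tuple(loads)
    for _ in range(steps):
        action = epsilon_greedy(q, state, n, epsilon=0.1, rng=rng)
        loads[action] = min(loads[action] + 1, max_load)
        reward = -loads[action]  # longer queue on the chosen server is worse
        for i in range(n):
            if loads[i] > 0 and rng.random() < service_prob[i]:
                loads[i] -= 1
        next_state = tuple(loads)
        q_learning_update(q, state, action, reward, next_state, n)
        state = next_state
    return q
```

Calling train_dispatcher() returns the learned table, keyed by (state, action) pairs. A real cloud system would need a richer state (for example CPU and memory utilization) and a reward tied to its actual service-level objectives.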

Implementation of modified SARSA learning technique in EMCAP

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i1.5.9161 ◽

2017 ◽

Vol 7 (1.5) ◽

pp. 274

Author(s):

D. Ganesha ◽

Vijayakumar Maragal Venkatamuni

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Decision Process ◽

Learning Algorithm ◽

Research Work ◽

Learning System ◽

State Action ◽

Learning Technique ◽

Markov Decision ◽

Experiment Analysis

This research work presents an analysis of a modified SARSA learning algorithm. State-Action-Reward-State-Action (SARSA) is a technique for learning a Markov decision process (MDP) policy, used for reinforcement learning in the field of artificial intelligence (AI) and machine learning (ML). The modified SARSA algorithm selects better actions to obtain better rewards. Experiments are conducted to evaluate the performance of each agent individually. For comparison of results among different agents, the same statistics were collected. This work considered various kinds of agents at different levels of the architecture for experiment analysis. The Fungus world testbed, implemented using SWI-Prolog 5.4.6, was used for the experiments. The fixed obstacles tend to be more versatile, making a location that is specific to the Fungus world testbed environment. Various parameters are introduced in the environment to test an agent's performance. This modified SARSA learning algorithm can be more suitable for the EMCAP architecture. In the experiments conducted, the modified SARSA learning system obtains more rewards than the existing SARSA algorithm.
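
For readers unfamiliar with the baseline, the following is a minimal sketch of the standard (unmodified) tabular SARSA update. The abstract does not specify the modification, so none is attempted here. The corridor environment, the step cost of -1, and all names are assumptions for illustration and have nothing to do with the Fungus world testbed.

```python
import random
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9, done=False):
    """On-policy TD update: the target uses the action actually chosen next."""
    target = r if done else r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def choose(q, s, epsilon, rng):
    """Epsilon-greedy over two actions: 0 = left, 1 = right."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    return max((0, 1), key=lambda a: q[(s, a)])


def run_corridor(length=5, episodes=500, seed=0):
    """Hypothetical corridor: start at cell 0, each move costs 1,
    and the episode ends on reaching the last cell."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        s = 0
        a = choose(q, s, 0.1, rng)
        while True:
            s_next = max(0, s - 1) if a == 0 else s + 1
            done = s_next == length - 1
            r = 0.0 if done else -1.0
            a_next = choose(q, s_next, 0.1, rng)
            sarsa_update(q, s, a, r, s_next, a_next, done=done)
            if done:
                break
            s, a = s_next, a_next
    return q
```

The name comes from the five quantities used in each update: the state, the action, the reward, the next state, and the next action.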

A Multi-Step Reinforcement Learning Algorithm

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.44-47.3611 ◽

2010 ◽

Vol 44-47 ◽

pp. 3611-3615 ◽

Cited By ~ 1

Author(s):

Zhi Cong Zhang ◽

Kai Shun Hu ◽

Hui Yu Huang ◽

Shuai Li ◽

Shao Yong Zhao

Keyword(s):

Reinforcement Learning ◽

Markov Decision Process ◽

Decision Process ◽

Large Scale ◽

Learning Algorithm ◽

Machine Learning Method ◽

Learning Method ◽

K Value ◽

Markov Decision ◽

Action Value

Reinforcement learning (RL) is a state or action value based machine learning method which approximately solves a large-scale Markov Decision Process (MDP) or Semi-Markov Decision Process (SMDP). A multi-step RL algorithm called Sarsa(λ,k) is proposed, which is a compromise between Sarsa and Sarsa(λ). It is equivalent to Sarsa if k is 1 and is equivalent to Sarsa(λ) if k is infinite. Sarsa(λ,k) adjusts its performance by setting the value of k. Two forms of Sarsa(λ,k), forward view Sarsa(λ,k) and backward view Sarsa(λ,k), are constructed and proved equivalent in off-line updating.
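
One natural way to obtain a forward-view target with these two limiting cases is a λ-return truncated after k steps. The sketch below computes that target. Whether it matches the authors' exact definition is an assumption, as are the function names and the episode representation.

```python
def n_step_return(rewards, q_values, t, n, gamma):
    """n-step Sarsa return from time t: up to n discounted rewards plus a bootstrap.

    rewards[i] is the reward received after the action at step i, and
    q_values[i] is the current estimate Q(s_i, a_i). The episode ends after
    len(rewards) steps, so no bootstrap is added past the end.
    """
    T = len(rewards)
    end = min(t + n, T)
    g = sum(gamma ** (i - t) * rewards[i] for i in range(t, end))
    if end < T:
        g += gamma ** (end - t) * q_values[end]
    return g


def truncated_lambda_return(rewards, q_values, t, k, gamma, lam):
    """Mix the 1..k-step returns: weights (1 - lam) * lam**(n - 1) for n < k,
    with the k-step return taking the remaining weight lam**(k - 1)."""
    g = sum((1 - lam) * lam ** (n - 1)
            * n_step_return(rewards, q_values, t, n, gamma)
            for n in range(1, k))
    g += lam ** (k - 1) * n_step_return(rewards, q_values, t, k, gamma)
    return g
```

With k = 1 the sum is empty and the target reduces to the one-step Sarsa target. Once k reaches the number of steps remaining in the episode, larger values of k no longer change the result, which then coincides with the usual episodic λ-return.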

Cooperative retransmissions using Markov decision process with reinforcement learning

2009 IEEE 20th International Symposium on Personal, Indoor and Mobile Radio Communications ◽

10.1109/pimrc.2009.5450098 ◽

2009 ◽

Cited By ~ 1

Author(s):

Ghasem Naddafzadeh Shirazi ◽

Peng-Yong Kong ◽

Chen-Khong Tham

Keyword(s):

Reinforcement Learning ◽

Markov Decision Process ◽

Decision Process ◽

Markov Decision

Continuous-time Markov decision process with average reward: Using reinforcement learning method

2015 34th Chinese Control Conference (CCC) ◽

10.1109/chicc.2015.7260117 ◽

2015 ◽

Author(s):

Shengde Jia ◽

Lincheng Shen ◽

Hongtao Xue

Keyword(s):

Reinforcement Learning ◽

Markov Decision Process ◽

Continuous Time ◽

Decision Process ◽

Learning Method ◽

Average Reward ◽

Markov Decision

Universal Reinforcement Learning Algorithms: Survey and Experiments

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/194 ◽

2017 ◽

Author(s):

John Aslanides ◽

Jan Leike ◽

Marcus Hutter

Keyword(s):

Reinforcement Learning ◽

Open Source ◽

Markov Decision Process ◽

Decision Process ◽

Empirical Investigation ◽

State Of The Art ◽

Learning Algorithms ◽

Markov Decision ◽

Reference Implementation ◽

Partially Observable

Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.

Decision Making in Complex Multiagent Contexts: A Tale of Two Frameworks

AI Magazine ◽

10.1609/aimag.v33i4.2402 ◽

2012 ◽

Vol 33 (4) ◽

pp. 82 ◽

Cited By ~ 6

Author(s):

Prashant J. Doshi

Keyword(s):

Decision Making ◽

Markov Decision Process ◽

Decision Process ◽

Partial Information ◽

Autonomous Systems ◽

Relevant Research ◽

Physical Context ◽

Markov Decision ◽

Partially Observable Markov ◽

Partially Observable

Decision making is a key feature of autonomous systems. It involves choosing optimally between different lines of action in various information contexts that range from perfectly knowing all aspects of the decision problem to having just partial knowledge about it. The physical context often includes other interacting autonomous systems, typically called agents. In this article, I focus on decision making in a multiagent context with partial information about the problem. Relevant research in this complex but realistic setting has converged around two complementary, general frameworks and also introduced myriad specializations on its way. I put the two frameworks, decentralized partially observable Markov decision process (Dec-POMDP) and the interactive partially observable Markov decision process (I-POMDP), in context and review the foundational algorithms for these frameworks, while briefly discussing the advances in their specializations. I conclude by examining the avenues that research pertaining to these frameworks is pursuing.

Design Synthesis through a Markov Decision Process and Reinforcement Learning Framework

Journal of Computing and Information Science in Engineering ◽

10.1115/1.4051598 ◽

2021 ◽

pp. 1-19

Author(s):

Maximilian Ororbia ◽

Gordon P. Warn

Keyword(s):

Reinforcement Learning ◽

Optimal Design ◽

Markov Decision Process ◽

Decision Process ◽

Plastic Material ◽

Cross Sectional ◽

Design Synthesis ◽

Learning Agent ◽

Markov Decision ◽

Elastic Plastic Material

This paper presents a framework that mathematically models optimal design synthesis as a Markov Decision Process that is solved with reinforcement learning. In this context, the states correspond to specific design configurations, the actions correspond to the available alterations modeled after generative design grammars, and the immediate rewards are constructed to be related to the improvement in the altered configuration's performance with respect to the design objective. Since in the context of optimal design synthesis the immediate rewards are in general not known at the onset of the process, reinforcement learning is employed to efficiently solve the MDP. The goal of the reinforcement learning agent is to maximize the cumulative rewards and hence synthesize the best performing or optimal design. The framework is demonstrated for the optimization of planar trusses with binary cross-sectional areas, and its utility is investigated with four numerical examples, each with a unique combination of domain, constraint, and external force(s) considering both linear-elastic and elastic-plastic material behaviors. The design solutions obtained with the framework are also compared with other methods in order to demonstrate its efficiency and accuracy.
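
The state, action, and reward mapping described above can be sketched on a toy problem. Everything concrete in the example is an assumption: a design is a tuple of binary area choices, an action flips one member, and toy_performance is a made-up stand-in for the structural analysis a real truss problem would require. It is not the authors' formulation.

```python
from typing import Callable, Tuple

Design = Tuple[int, ...]  # one binary cross-sectional-area choice per member


def step(design: Design, action: int,
         performance: Callable[[Design], float]) -> Tuple[Design, float]:
    """Flip one member's area choice; the reward is the change in performance."""
    altered = design[:action] + (1 - design[action],) + design[action + 1:]
    reward = performance(altered) - performance(design)
    return altered, reward


def toy_performance(design: Design) -> float:
    """Hypothetical objective: lighter is better, but members 0 and 2
    must use the larger area or a penalty applies."""
    weight = sum(design)
    penalty = 10.0 * ((1 - design[0]) + (1 - design[2]))
    return -(weight + penalty)


start: Design = (0, 0, 0, 0)
design, total = start, 0.0
for action in (0, 2, 1, 1):  # a hand-picked sequence, not a learned policy
    design, reward = step(design, action, toy_performance)
    total += reward
```

Because each reward is a difference in performance, the cumulative reward telescopes to the final performance minus the initial one. Under this reward construction, maximizing cumulative reward is therefore the same as reaching the best-performing design.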
