Heterogeneous Flow Scheduling using Deep Reinforcement Learning in Partially Observable NFV Environment

Author(s):  
Chun-Jen Lin ◽  
Yan Luo ◽  
Liang-min Wang


2021 ◽
Author(s):  
Wenjie Shang ◽  
Qingyang Li ◽  
Zhiwei Qin ◽  
Yang Yu ◽  
Yiping Meng ◽  
...  

Author(s):  
Jan Leike ◽  
Tor Lattimore ◽  
Laurent Orseau ◽  
Marcus Hutter

We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges in mean to the optimal value, and (2) given a recoverability assumption, regret is sublinear. We conclude with a discussion about optimality in reinforcement learning.
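The mechanism behind these results can be pictured with a small, purely illustrative sketch: maintain a posterior over a class of candidate environments, sample one, act optimally for the sampled environment, and re-weight the class with the observed outcome. The finite `BernoulliBandit` class and per-step resampling below are simplifying assumptions for the example, not the paper's setting (which resamples over a countable class at effective-horizon boundaries).

```python
"""Minimal Thompson-sampling sketch over a finite class of candidate
environments (a toy stand-in for a countable class). Names are illustrative."""
import random

class BernoulliBandit:
    """Candidate environment: two arms with fixed success probabilities."""
    def __init__(self, p0, p1):
        self.p = (p0, p1)

    def optimal_action(self):
        return 0 if self.p[0] >= self.p[1] else 1

    def likelihood(self, action, reward):
        # Probability this candidate assigns to the observed outcome.
        return self.p[action] if reward == 1 else 1.0 - self.p[action]

def thompson_step(candidates, weights, true_env, rng):
    # 1. Sample an environment from the current posterior.
    env = rng.choices(candidates, weights=weights, k=1)[0]
    # 2. Act optimally with respect to the sampled environment.
    action = env.optimal_action()
    reward = 1 if rng.random() < true_env.p[action] else 0
    # 3. Bayesian update of the posterior with the observed outcome.
    new_weights = [w * c.likelihood(action, reward)
                   for w, c in zip(weights, candidates)]
    total = sum(new_weights)
    return action, reward, [w / total for w in new_weights]

if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [BernoulliBandit(0.2, 0.8), BernoulliBandit(0.8, 0.2)]
    weights = [0.5, 0.5]          # uniform prior over the class
    true_env = candidates[0]
    total_reward = 0
    for _ in range(200):
        _, r, weights = thompson_step(candidates, weights, true_env, rng)
        total_reward += r
    print("posterior:", weights, "total reward:", total_reward)
```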


Algorithms ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 307
Author(s):  
Luca Pasqualini ◽  
Maurizio Parton

A Pseudo-Random Number Generator (PRNG) is any algorithm generating a sequence of numbers approximating the properties of random numbers. These numbers are widely employed in mid-level cryptography and in software applications. Test suites are used to evaluate the quality of PRNGs by checking statistical properties of the generated sequences. These sequences are commonly represented bit by bit. This paper proposes a Reinforcement Learning (RL) approach to the task of generating PRNGs from scratch by learning a policy to solve a partially observable Markov Decision Process (MDP), where the full state is the period of the generated sequence and the observation at each time step is the last sequence of bits appended to that state. We use a Long Short-Term Memory (LSTM) architecture to model the temporal relationship between observations at different time steps, tasking the LSTM memory with extracting significant features of the hidden portion of the MDP's states. We show that modeling a PRNG with a partially observable MDP and an LSTM architecture substantially improves the results of the fully observable feedforward RL approach introduced in previous work.
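As a rough illustration of this setup (not the authors' code), the sketch below, assuming PyTorch and arbitrary sizes, shows a recurrent policy whose observation is only the last block of generated bits while the LSTM hidden state summarises the unobserved history; names such as `RecurrentBitPolicy` are made up for the example.

```python
"""Illustrative sketch of an LSTM policy for the partially observable PRNG task."""
import torch
import torch.nn as nn

class RecurrentBitPolicy(nn.Module):
    def __init__(self, block_size=8, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=block_size, hidden_size=hidden_size,
                            batch_first=True)
        # One Bernoulli logit per bit of the next block (the action).
        self.head = nn.Linear(hidden_size, block_size)

    def forward(self, obs, state=None):
        # obs: (batch, 1, block_size) -- the last block of generated bits.
        out, state = self.lstm(obs, state)
        logits = self.head(out[:, -1])          # (batch, block_size)
        return torch.distributions.Bernoulli(logits=logits), state

if __name__ == "__main__":
    policy = RecurrentBitPolicy()
    obs = torch.zeros(1, 1, 8)                  # initial observation: all zeros
    state, bits = None, []
    for _ in range(4):                          # generate 4 blocks of 8 bits
        dist, state = policy(obs, state)
        action = dist.sample()                  # next block of bits
        bits.append(action)
        obs = action.unsqueeze(1)               # the new block becomes the observation
    print(torch.cat(bits, dim=1))
```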


2020 ◽  
Vol 34 (02) ◽  
pp. 2128-2135
Author(s):  
Yang Liu ◽  
Qi Liu ◽  
Hongke Zhao ◽  
Zhen Pan ◽  
Chuanren Liu

In recent years, considerable efforts have been devoted to developing AI techniques for finance research and applications. For instance, AI techniques (e.g., machine learning) can help traders in quantitative trading (QT) by automating two tasks: market condition recognition and trading strategy execution. However, existing methods in QT face challenges such as representing noisy high-frequency financial data and balancing exploration and exploitation for the trading agent. To address these challenges, we propose an adaptive trading model, namely iRDPG, that automatically develops QT strategies through an intelligent trading agent. Our model is enhanced by deep reinforcement learning (DRL) and imitation learning techniques. Specifically, considering the noisy financial data, we formulate the QT process as a Partially Observable Markov Decision Process (POMDP). We also introduce imitation learning to leverage classical trading strategies, which helps balance exploration and exploitation. For better simulation, we train our trading agent in the real financial market using minute-frequency data. Experimental results demonstrate that our model can extract robust market features and adapt to different markets.
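A hedged sketch of the POMDP framing (not iRDPG itself): a gym-style environment over minute bars in which the agent observes only a short window of recent log returns, and the reward is the position times the next-minute return. The class name, feature choice, and reward here are assumptions made for illustration.

```python
"""Illustrative POMDP wrapper for minute-bar trading; not the paper's formulation."""
import numpy as np

class MinuteBarTradingPOMDP:
    ACTIONS = (-1, 0, 1)  # short, flat, long

    def __init__(self, close_prices, window=30):
        self.prices = np.asarray(close_prices, dtype=float)
        self.window = window
        self.t = window
        self.position = 0

    def _observation(self):
        # Only a short window of recent log returns is observable;
        # the broader market state stays hidden (the POMDP aspect).
        window = self.prices[self.t - self.window:self.t + 1]
        return np.diff(np.log(window))

    def reset(self):
        self.t, self.position = self.window, 0
        return self._observation()

    def step(self, action_index):
        self.position = self.ACTIONS[action_index]
        self.t += 1
        # Reward: position times next-minute log return (transaction costs omitted).
        reward = self.position * np.log(self.prices[self.t] / self.prices[self.t - 1])
        done = self.t >= len(self.prices) - 1
        return self._observation(), reward, done

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prices = 100 * np.exp(np.cumsum(rng.normal(0, 1e-3, 500)))   # synthetic bars
    env = MinuteBarTradingPOMDP(prices)
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, r, done = env.step(rng.integers(0, 3))              # random policy
        total += r
    print("random-policy cumulative log return:", total)
```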


Author(s):  
Yu. V. Dubenko

This paper is devoted to the problem of collective artificial intelligence, in which intelligent agents solve problems in external environments. The environments may be fully or partially observable, deterministic or stochastic, static or dynamic, discrete or continuous. The paper identifies problems of collective interaction among intelligent agents for the class of tasks that require coordinating the actions of an agent group, e.g., exploring the territory of a complex infrastructure facility. It is noted that reinforcement learning in multi-agent systems is poorly covered in the literature, especially in Russian-language publications. The article analyzes reinforcement learning, describes hierarchical reinforcement learning, and presents the basic methods for implementing it. The concept of macro-actions performed by agents combined into groups is introduced. The main problems of collective interaction among intelligent agents are identified: calculating individual rewards for each agent, coordinating the agents, applying macro-actions by agents combined into groups, and exchanging the experience generated by different agents while solving a collective problem. The model of multi-agent reinforcement learning is described in detail, along with the problems of building this approach on existing solutions. The conclusion formulates the basic problems of multi-agent reinforcement learning.
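The macro-action idea can be pictured with a minimal sketch, using hypothetical names such as `MacroAction` and `run_macro`: a macro-action is a named sequence of primitive actions that every member of a group executes before control returns to the learner, and the per-agent rewards it returns touch the credit-assignment problem listed above. This is an illustration of the general concept, not the paper's model.

```python
"""Sketch of macro-actions executed by a group of agents; names are illustrative."""
from dataclasses import dataclass
from typing import Callable, Dict, List

Action = str  # primitive action label, e.g. "move_north"

@dataclass
class MacroAction:
    name: str
    primitives: List[Action]     # executed in order by each agent in the group

def run_macro(group: List[str], macro: MacroAction,
              execute: Callable[[str, Action], float]) -> Dict[str, float]:
    """Execute the macro for every agent and return each agent's individual reward."""
    rewards = {agent: 0.0 for agent in group}
    for step in macro.primitives:
        for agent in group:
            rewards[agent] += execute(agent, step)
    return rewards

if __name__ == "__main__":
    sweep_corridor = MacroAction("sweep_corridor",
                                 ["move_north", "scan", "move_north", "scan"])
    # Toy executor: scanning yields reward, moving costs a little.
    toy_execute = lambda agent, act: 1.0 if act == "scan" else -0.1
    print(run_macro(["agent_1", "agent_2"], sweep_corridor, toy_execute))
```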


Author(s):  
John Aslanides ◽  
Jan Leike ◽  
Marcus Hutter

Many state-of-the-art reinforcement learning (RL) algorithms assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of experiments that qualitatively illustrate properties of the resulting policies and their relative performance on partially observable gridworld environments. We also present an open-source reference implementation of the algorithms, which we hope will facilitate further understanding of, and experimentation with, these ideas.


2011 ◽  
Vol 40 ◽  
pp. 95-142 ◽  
Author(s):  
J. Veness ◽  
K.S. Ng ◽  
M. Hutter ◽  
W. Uther ◽  
D. Silver

This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. Our approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a new Monte-Carlo Tree Search algorithm along with an agent-specific extension to the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a variety of stochastic and partially observable domains. We conclude by proposing a number of directions for future research.
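For context, Context Tree Weighting builds on the Krichevsky-Trofimov (KT) estimator as its leaf-level sequential predictor. The sketch below shows only that standard building block, not the agent-specific CTW extension or the Monte-Carlo Tree Search algorithm introduced in the paper.

```python
"""Sketch of the Krichevsky-Trofimov (KT) estimator used inside CTW."""

class KTEstimator:
    """Sequential Bernoulli predictor: P(next bit = 1) = (ones + 0.5) / (n + 1)."""
    def __init__(self):
        self.zeros = 0
        self.ones = 0

    def prob_one(self) -> float:
        return (self.ones + 0.5) / (self.zeros + self.ones + 1.0)

    def update(self, bit: int) -> float:
        """Return the probability assigned to `bit`, then update the counts."""
        p = self.prob_one() if bit else 1.0 - self.prob_one()
        if bit:
            self.ones += 1
        else:
            self.zeros += 1
        return p

if __name__ == "__main__":
    kt = KTEstimator()
    seq = [1, 1, 0, 1, 1, 1, 0, 1]
    seq_prob = 1.0
    for b in seq:
        seq_prob *= kt.update(b)
    # The product of per-bit probabilities is the KT block probability of the sequence.
    print("KT probability of the sequence:", seq_prob)
```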

