On Thompson Sampling and Asymptotic Optimality

Author(s):  
Jan Leike ◽  
Tor Lattimore ◽  
Laurent Orseau ◽  
Marcus Hutter

We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges in mean to the optimal value and (2) given a recoverability assumption, its regret is sublinear. We conclude with a discussion of optimality in reinforcement learning.
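
A minimal sketch of the Thompson sampling loop described above, using a two-armed Bernoulli bandit with a finite hypothesis class as a stand-in for the countable class of general environments; the hypothesis set, the prior, and the step counts are assumptions made for illustration only, not the paper's construction.

```python
# Thompson sampling over a finite hypothesis class (illustrative toy example).
import random

HYPOTHESES = [(0.2, 0.8), (0.8, 0.2), (0.5, 0.5)]  # candidate (arm0, arm1) biases
TRUE_ENV = (0.2, 0.8)                              # environment actually generating data

def thompson_sampling(steps=2000):
    posterior = {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}  # uniform prior
    total_reward = 0.0
    for _ in range(steps):
        # 1. Sample an environment hypothesis from the posterior.
        h = random.choices(list(posterior), weights=list(posterior.values()))[0]
        # 2. Act optimally for the sampled hypothesis (pull its better arm).
        #    In the paper, the agent instead follows the sampled environment's
        #    optimal policy for an effective horizon before resampling.
        arm = 0 if h[0] >= h[1] else 1
        reward = 1.0 if random.random() < TRUE_ENV[arm] else 0.0
        total_reward += reward
        # 3. Bayesian update with the Bernoulli likelihood of the observation.
        for hyp in posterior:
            p = hyp[arm] if reward == 1.0 else 1.0 - hyp[arm]
            posterior[hyp] *= p
        norm = sum(posterior.values())
        posterior = {hyp: w / norm for hyp, w in posterior.items()}
    return total_reward / steps

if __name__ == "__main__":
    print("average reward:", thompson_sampling())
```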

Author(s):  
Michael K. Cohen ◽  
Elliot Catt ◽  
Marcus Hutter

Reinforcement Learning agents are expected to eventually perform well. Typically, this takes the form of a guarantee about the asymptotic behavior of an algorithm given some assumptions about the environment. We present an algorithm for a policy whose value approaches the optimal value with probability 1 in all computable probabilistic environments, provided the agent has a bounded horizon. This is known as strong asymptotic optimality, and it was previously unknown whether it was possible for a policy to be strongly asymptotically optimal in the class of all computable probabilistic environments. Our agent, Inquisitive Reinforcement Learner (Inq), is more likely to explore the more it expects an exploratory action to reduce its uncertainty about which environment it is in, hence the term inquisitive. Exploring inquisitively is a strategy that can be applied generally; for more manageable environment classes, inquisitiveness is tractable. We conducted experiments in "grid-worlds" to compare the Inquisitive Reinforcement Learner to other weakly asymptotically optimal agents.
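
A rough sketch of the "inquisitive" exploration rule described above: the probability of taking an exploratory action scales with the expected information gain, i.e. the expected drop in posterior entropy over environment hypotheses. The toy two-hypothesis model, the helper names, and the scaling constant are assumptions for illustration, not Inq's exact exploration probability.

```python
# Exploration probability driven by expected information gain (toy example).
import math

def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def expected_information_gain(posterior, predict):
    """posterior: hypothesis -> probability; predict(h) -> P(observation = 1 | h)."""
    gain = 0.0
    for obs_given in (predict, lambda h: 1 - predict(h)):   # observation = 1, then 0
        p_obs = sum(posterior[h] * obs_given(h) for h in posterior)
        if p_obs == 0:
            continue
        updated = {h: posterior[h] * obs_given(h) / p_obs for h in posterior}
        gain += p_obs * (entropy(posterior) - entropy(updated))
    return gain

# Example: uncertain between two coin models; the exploratory observation is one flip.
posterior = {"biased": 0.5, "fair": 0.5}
predict = lambda h: 0.9 if h == "biased" else 0.5
ig = expected_information_gain(posterior, predict)
explore_prob = min(1.0, 5.0 * ig)   # exploration probability grows with expected gain
print(f"expected information gain: {ig:.3f}, explore with probability {explore_prob:.2f}")
```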


2003 ◽  
Vol 19 ◽  
pp. 11-23 ◽  
Author(s):  
R. I. Brafman ◽  
M. Tennenholtz

In common-interest stochastic games all players receive an identical payoff. Players participating in such games must learn to coordinate with each other in order to receive the highest possible value. A number of reinforcement learning algorithms have been proposed for this problem, and some have been shown to converge to good solutions in the limit. In this paper we show that much better (i.e., polynomial) convergence rates can be attained using very simple model-based algorithms. Moreover, our model-based algorithms are guaranteed to converge to the optimal value, unlike many of the existing algorithms.
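
A minimal sketch of the model-based idea for the simplest case, a common-interest matrix game: untried joint actions are optimistically assumed to yield the maximum payoff, so every joint action gets tried, and a shared deterministic tie-breaking rule keeps the players coordinated. The payoff matrix, the bound, and the number of rounds are assumptions for illustration, not the paper's exact algorithm or analysis.

```python
# Optimistic model-based coordination in a common-interest matrix game (toy example).
import itertools

ACTIONS = ["a", "b"]
PAYOFF = {("a", "a"): 1.0, ("a", "b"): 0.0, ("b", "a"): 0.0, ("b", "b"): 0.7}
R_MAX = 1.0   # known upper bound on the common payoff

def play(rounds=10):
    model = {}   # joint action -> observed common payoff
    for t in range(rounds):
        # Both players apply the same deterministic rule, so they stay coordinated.
        estimate = lambda ja: model.get(ja, R_MAX)   # optimism for untried joint actions
        joint = max(itertools.product(ACTIONS, ACTIONS),
                    key=lambda ja: (estimate(ja), ja))  # deterministic tie-breaking
        reward = PAYOFF[joint]        # identical payoff received by both players
        model[joint] = reward         # update the shared model
        print(t, joint, reward)

play()
```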


Algorithms ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 307
Author(s):  
Luca Pasqualini ◽  
Maurizio Parton

A Pseudo-Random Number Generator (PRNG) is any algorithm generating a sequence of numbers approximating the properties of random numbers. Such numbers are widely employed in mid-level cryptography and in software applications. Test suites are used to evaluate the quality of PRNGs by checking statistical properties of the generated sequences, which are commonly represented bit by bit. This paper proposes a Reinforcement Learning (RL) approach to the task of generating PRNGs from scratch by learning a policy to solve a partially observable Markov Decision Process (MDP), where the full state is the period of the generated sequence and the observation at each time-step is the last sequence of bits appended to that state. We use a Long Short-Term Memory (LSTM) architecture to model the temporal relationship between observations at different time-steps by tasking the LSTM memory with extracting significant features of the hidden portion of the MDP's states. We show that modeling a PRNG with a partially observable MDP and an LSTM architecture substantially improves on the results of the fully observable feedforward RL approach introduced in previous work.
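
A compact sketch of that formulation, assuming PyTorch: the observation is the last block of bits, the LSTM's hidden state stands in for the unobserved remainder of the sequence, and the reward here is a deliberately simplified bit-balance score rather than the test-suite score used in the paper. The network sizes, block length, and reward are illustrative assumptions.

```python
# LSTM policy emitting bit blocks, with a toy statistical reward (illustrative sketch).
import torch
import torch.nn as nn

BLOCK = 8  # bits emitted per time-step (assumed block size)

class LSTMPolicy(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=BLOCK, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, BLOCK)   # probabilities for the next block of bits

    def forward(self, obs, state=None):
        out, state = self.lstm(obs, state)     # obs: (batch, 1, BLOCK)
        return torch.sigmoid(self.head(out[:, -1])), state

def balance_reward(bits):
    """Toy reward: closer to 50% ones is better (stand-in for a test-suite score)."""
    return 1.0 - 2.0 * abs(bits.mean().item() - 0.5)

policy = LSTMPolicy()
obs = torch.zeros(1, 1, BLOCK)                 # initial observation: an all-zero block
state, sequence = None, []
with torch.no_grad():                          # generation only; no training step shown
    for _ in range(16):                        # emit 16 blocks = 128 bits
        probs, state = policy(obs, state)
        bits = torch.bernoulli(probs)          # sample the next block of bits
        sequence.append(bits)
        obs = bits.unsqueeze(1)                # the new block becomes the next observation
print("toy reward:", balance_reward(torch.cat(sequence, dim=1)))
```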


2020 ◽  
Vol 34 (02) ◽  
pp. 2128-2135
Author(s):  
Yang Liu ◽  
Qi Liu ◽  
Hongke Zhao ◽  
Zhen Pan ◽  
Chuanren Liu

In recent years, considerable efforts have been devoted to developing AI techniques for finance research and applications. For instance, AI techniques (e.g., machine learning) can help traders in quantitative trading (QT) by automating two tasks: market condition recognition and trading strategy execution. However, existing methods in QT face challenges such as representing noisy high-frequency financial data and balancing exploration and exploitation for the trading agent. To address these challenges, we propose an adaptive trading model, namely iRDPG, that automatically develops QT strategies through an intelligent trading agent. Our model is enhanced by deep reinforcement learning (DRL) and imitation learning techniques. Specifically, considering the noisy financial data, we formulate the QT process as a Partially Observable Markov Decision Process (POMDP). We also introduce imitation learning to leverage classical trading strategies, which helps balance exploration and exploitation. For more realistic simulation, we train our trading agent on real financial market data at minute frequency. Experimental results demonstrate that our model can extract robust market features and adapt to different markets.
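
A schematic sketch (not the authors' iRDPG implementation) of how an imitation term derived from a classical strategy can be mixed into a recurrent actor's update, assuming PyTorch: the actor is pulled both toward actions a critic scores highly and toward the demonstrator's actions. All modules, shapes, the stand-in critic, and the mixing weight are assumptions for illustration.

```python
# Recurrent actor updated with a policy-gradient term plus a behavior-cloning term.
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    def __init__(self, obs_dim=8, hidden=32):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)        # position in [-1, 1] (short .. long)

    def forward(self, obs_seq):
        out, _ = self.gru(obs_seq)
        return torch.tanh(self.head(out))       # action at every time-step

actor = RecurrentActor()
critic_score = lambda actions: -(actions - 0.3).pow(2).mean()   # stand-in critic
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)

obs_seq = torch.randn(4, 16, 8)                 # batch of minute-frequency feature windows
demo_actions = torch.sign(obs_seq[..., :1])     # stand-in "classical strategy" demonstrations

actions = actor(obs_seq)
rl_loss = -critic_score(actions)                          # exploit: follow the critic
bc_loss = nn.functional.mse_loss(actions, demo_actions)   # imitate the demonstrator
loss = rl_loss + 0.5 * bc_loss                            # assumed mixing weight
optimizer.zero_grad()
loss.backward()
optimizer.step()
```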


Author(s):  
Dean C. Wardell ◽ 
Gilbert L. Peterson

Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and its ability to continually learn even as the operating environment changes. Additionally, applying reinforcement learning to multiple cooperative software agents (a multi-agent system) not only allows each individual agent to learn from its own experience, but also opens up the opportunity for the individual agents to learn from the other agents in the system, thus accelerating the rate of learning. This research presents the novel use of fuzzy state aggregation as the means of function approximation, combined with the fast policy hill climbing methods Win or Learn Fast (WoLF) and policy-dynamics based WoLF (PD-WoLF). The combination of fast policy hill climbing and fuzzy state aggregation function approximation is tested in two stochastic environments: Tileworld and the simulated robot soccer domain, RoboCup. The Tileworld results demonstrate that a single agent using the combination of fuzzy state aggregation (FSA) and policy hill climbing (PHC) learns more quickly and performs better than fuzzy state aggregation combined with Q-learning alone. Results from the multi-agent RoboCup domain again illustrate that the policy hill climbing algorithms perform better than Q-learning alone in a multi-agent environment. The learning is further enhanced by allowing the agents to share their experience through weighted strategy sharing.
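
A minimal sketch of the WoLF ("Win or Learn Fast") policy hill-climbing update for a single-state, two-action problem; in the paper, fuzzy state aggregation would replace the tabular state. The learning rates and the toy reward are illustrative assumptions.

```python
# WoLF policy hill climbing for one state and two actions (toy example).
import random

ACTIONS = [0, 1]
alpha, delta_win, delta_lose = 0.1, 0.01, 0.04    # learn faster when losing

Q = {a: 0.0 for a in ACTIONS}
policy = {a: 0.5 for a in ACTIONS}
avg_policy = {a: 0.5 for a in ACTIONS}
count = 0

def reward(action):
    return 1.0 if action == 1 else 0.2            # toy single-state problem

for t in range(5000):
    a = random.choices(ACTIONS, weights=[policy[x] for x in ACTIONS])[0]
    r = reward(a)
    Q[a] += alpha * (r - Q[a])                    # single-state Q-learning update

    count += 1
    for x in ACTIONS:                             # running average policy
        avg_policy[x] += (policy[x] - avg_policy[x]) / count

    # WoLF: take small steps when "winning" (current policy beats the average policy).
    winning = sum(policy[x] * Q[x] for x in ACTIONS) > sum(avg_policy[x] * Q[x] for x in ACTIONS)
    delta = delta_win if winning else delta_lose

    best = max(ACTIONS, key=lambda x: Q[x])
    for x in ACTIONS:                             # hill-climb toward the greedy action
        policy[x] += delta if x == best else -delta / (len(ACTIONS) - 1)
        policy[x] = min(1.0, max(0.0, policy[x]))
    norm = sum(policy.values())
    policy = {x: p / norm for x, p in policy.items()}

print("learned policy:", policy, "Q-values:", Q)
```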


Author(s):  
Yu. V. Dubenko

This paper is devoted to the problem of collective artificial intelligence when intelligent agents solve problems in external environments. These environments may be fully or partially observable, deterministic or stochastic, static or dynamic, and discrete or continuous. The paper identifies problems of collective interaction among intelligent agents when they solve a class of tasks that requires coordinating the actions of a group of agents, e.g., exploring the territory of a complex infrastructure facility. It is noted that the problem of reinforcement learning in multi-agent systems is poorly covered in the literature, especially in Russian-language publications. The article analyzes reinforcement learning, describes hierarchical reinforcement learning, and presents the basic methods used to implement it. The concept of macro-actions performed by agents organized into groups is introduced. The main problems of collective interaction among intelligent agents (i.e., computing individual rewards for each agent, coordinating the agents, applying macro-actions by agents organized into groups, and exchanging the experience generated by different agents while solving a collective task) are identified. The model of multi-agent reinforcement learning is described in detail, along with the problems that arise when building this approach on existing solutions. The basic problems of multi-agent reinforcement learning are formulated in the conclusion.
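
A small illustrative sketch of the macro-action idea for a group of agents: an option-like object bundling an initiation condition, a joint sub-policy mapping each agent to a primitive action, and a termination condition. The dataclass and the exploration example are assumptions for illustration, not a formalization taken from the article.

```python
# Macro-action for a group of agents as an option-like object (illustrative sketch).
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, object]          # e.g. {"explored": 0.4, "at_checkpoint": True}

@dataclass
class MacroAction:
    name: str
    can_start: Callable[[State], bool]                # initiation condition
    joint_policy: Callable[[State], Dict[str, str]]   # agent id -> primitive action
    is_done: Callable[[State], bool]                  # termination condition

sweep_sector = MacroAction(
    name="sweep_sector",
    can_start=lambda s: s["at_checkpoint"],
    joint_policy=lambda s: {"agent_1": "move_north", "agent_2": "move_east"},
    is_done=lambda s: s["explored"] >= 1.0,
)

state = {"at_checkpoint": True, "explored": 0.4}
if sweep_sector.can_start(state) and not sweep_sector.is_done(state):
    print(sweep_sector.joint_policy(state))   # primitive actions issued to the group
```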

