Constrained representation learning for recurrent policy optimisation under uncertainty

Learning to make decisions in partially observable environments is a notoriously hard problem that requires a complex controller representation. In most work, the controller is designed as a non-linear mapping from a sequence of temporal observations to actions. Such problems can, in principle, be formulated as a partially observable Markov decision process (POMDP) whose policy is parameterised by a recurrent neural network. In this paper, we propose an alternative framework that (a) uses a Long Short-Term Memory (LSTM) encoder-decoder to learn an internal state representation of historical observations and (b) integrates it into existing recurrent policy models to improve task performance. The LSTM encoder maps a history of observations to a representation of internal states. The LSTM decoder can perform one of two alternative decoding tasks: reconstructing the input observation sequence or predicting future observation sequences. The first decoder acts like an auto-encoder that guides and constrains the learning of a useful internal state for the policy optimisation task. The second decoder decodes the internal state learnt by the encoder to predict future observation sequences, which makes the network act like a non-linear predictive state representation model. Both decoders introduce constraints on the policy representation and thereby guide both policy optimisation and latent state representation learning. The integration of representation learning and policy optimisation aims to help learn more complex policies and improve the performance of policy learning tasks.
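One possible reading of the constraint idea is a joint objective in which the decoder loss regularises the policy's internal state. The form below is only a sketch under assumptions not stated in the abstract: the symbols f_φ (encoder), g_ψ (decoder), π_θ (policy), the weight λ, the horizon K, and the squared-error loss are all illustrative placeholders.

```latex
h_t = f_\phi(o_{1:t}), \qquad a_t \sim \pi_\theta(\,\cdot \mid h_t),
\qquad
\max_{\theta,\phi,\psi}\;
\mathbb{E}\Big[\sum_{t} r_t\Big] \;-\; \lambda\, \mathcal{L}_{\mathrm{dec}}(\phi,\psi),
\qquad
\mathcal{L}_{\mathrm{dec}} =
\begin{cases}
\sum_{k=1}^{t} \big\lVert g_\psi(h_t)_k - o_k \big\rVert^2
  & \text{(reconstruction decoder)} \\[4pt]
\sum_{k=1}^{K} \big\lVert g_\psi(h_t)_k - o_{t+k} \big\rVert^2
  & \text{(future-prediction decoder)}
\end{cases}
```

With λ = 0 this reduces to an unconstrained recurrent policy; a larger λ pushes the internal state h_t to retain information about past or future observations.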

Autonomous Decision-Making While Drilling

Energies ◽ 10.3390/en14040969 ◽ 2021 ◽ Vol 14 (4) ◽ pp. 969

Author(s): Eric Cayeux ◽ Benoît Daireaux ◽ Adrian Ambrus ◽ Rodica Mihai ◽ Liv Carlsen

Keyword(s): Decision Making ◽ Internal State ◽ Drilling Process ◽ Autonomous Decision ◽ Internal States ◽ Markov Decision ◽ Drilling Operations ◽ Drilling System ◽ Drilling Conditions ◽ Erratic Behavior

The drilling process is complex because unexpected situations may occur at any time. Furthermore, the drilling system is extremely long and slender, and is therefore prone to vibrations and often dominated by long transient periods. In addition, measurements are not well distributed along the drilling system: most real-time measurements are available only at the top side, and only very sparse data are available from downhole. The drilling process is therefore poorly observed, which makes it difficult to use standard control methods. To achieve completely autonomous drilling operations, it is thus necessary to use a method that can estimate the internal state of the drilling system from parsimonious information while making decisions that keep the operation safe but effective. A solution enabling autonomous decision-making while drilling has been developed. It relies on optimizing the time to reach the section total depth (TD). The estimated time to reach the section TD is decomposed into the effective time spent conducting the drilling operation and the likely time lost to solving unexpected drilling events. This optimization problem is solved using a Markov decision process method. Several example scenarios have been run in a virtual rig environment to test the validity of the concept. The system is found to be capable of adapting to various drilling conditions, for example by being aggressive when the operation runs smoothly and the estimated uncertainty of the internal states is low, and more cautious when the downhole drilling conditions deteriorate or when observations indicate more erratic behavior, which is often observed prior to a drilling event.
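The decomposition of the time to TD into effective time and likely lost time can be illustrated with a one-step expected-time comparison. This is only a sketch: the `Option` fields, the numbers, and the single-event model are hypothetical, and it does not reproduce the Markov decision process used by the authors.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """A candidate drilling setting (all fields are hypothetical)."""
    name: str
    rop_m_per_h: float   # assumed rate of penetration
    event_prob: float    # assumed probability of a drilling event before TD
    recovery_h: float    # assumed time lost if the event occurs


def expected_time_to_td(remaining_m: float, opt: Option) -> float:
    """Effective drilling time plus expected time lost to an event."""
    effective_h = remaining_m / opt.rop_m_per_h
    expected_lost_h = opt.event_prob * opt.recovery_h
    return effective_h + expected_lost_h


def best_option(remaining_m: float, options: list[Option]) -> Option:
    """Pick the option with the smallest expected time to reach TD."""
    return min(options, key=lambda o: expected_time_to_td(remaining_m, o))


# Smooth conditions: the faster setting carries little extra risk.
smooth = [
    Option("cautious", rop_m_per_h=20.0, event_prob=0.05, recovery_h=24.0),
    Option("aggressive", rop_m_per_h=30.0, event_prob=0.10, recovery_h=24.0),
]
# Deteriorated conditions: the faster setting is much more likely to cause an event.
degraded = [
    Option("cautious", rop_m_per_h=20.0, event_prob=0.10, recovery_h=24.0),
    Option("aggressive", rop_m_per_h=30.0, event_prob=0.50, recovery_h=24.0),
]

if __name__ == "__main__":
    print(best_option(300.0, smooth).name)
    print(best_option(300.0, degraded).name)
```

Under these made-up numbers the preferred setting switches from aggressive to cautious once the event probability rises, which mirrors the qualitative behavior the abstract reports.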

A Continuous Internal-State Controller for Partially Observable Markov Decision Processes

Artificial Neural Networks - ICANN 2008 - Lecture Notes in Computer Science ◽ 10.1007/978-3-540-87536-9_41 ◽ 2008 ◽ pp. 397-406

Author(s): Yuki Taniguchi ◽ Takeshi Mori ◽ Shin Ishii

Keyword(s): Markov Decision Processes ◽ Internal State ◽ Decision Processes ◽ State Controller ◽ Markov Decision ◽ Partially Observable Markov ◽ Partially Observable

Pseudo Random Number Generation through Reinforcement Learning and Recurrent Neural Networks

Algorithms ◽ 10.3390/a13110307 ◽ 2020 ◽ Vol 13 (11) ◽ pp. 307

Author(s): Luca Pasqualini ◽ Maurizio Parton

Keyword(s): Reinforcement Learning ◽ Random Number ◽ Short Term Memory ◽ Random Number Generator ◽ Random Number Generation ◽ Time Step ◽ Software Applications ◽ Pseudo Random Number ◽ Markov Decision ◽ Partially Observable

A Pseudo-Random Number Generator (PRNG) is any algorithm that generates a sequence of numbers approximating the properties of random numbers. These numbers are widely employed in mid-level cryptography and in software applications. Test suites are used to evaluate the quality of PRNGs by checking statistical properties of the generated sequences, which are commonly represented bit by bit. This paper proposes a Reinforcement Learning (RL) approach to the task of generating PRNGs from scratch by learning a policy that solves a partially observable Markov Decision Process (MDP), where the full state is the period of the generated sequence and the observation at each time-step is the last sequence of bits appended to that state. We use a Long Short-Term Memory (LSTM) architecture to model the temporal relationship between observations at different time-steps, tasking the LSTM memory with extracting significant features of the hidden portion of the MDP's states. We show that modeling a PRNG with a partially observable MDP and an LSTM architecture substantially improves on the results of the fully observable feedforward RL approach introduced in previous work.
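As an illustration of the kind of statistical check a PRNG test suite performs on a bit sequence, the sketch below implements the standard frequency (monobit) test. The abstract does not say which tests the authors use, so this choice is an assumption made purely for illustration; the function name is hypothetical.

```python
import math


def monobit_p_value(bits):
    """Frequency (monobit) test: p-value for the hypothesis that
    zeros and ones are equally likely in `bits`.

    `bits` is any sequence of 0/1 values. Small p-values (for example
    below 0.01) suggest the sequence is biased.
    """
    n = len(bits)
    if n == 0:
        raise ValueError("bits must be non-empty")
    # Map 1 -> +1 and 0 -> -1, then sum.
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


if __name__ == "__main__":
    balanced = [0, 1] * 50   # perfectly balanced counts
    biased = [1] * 100       # all ones
    print(monobit_p_value(balanced))
    print(monobit_p_value(biased) < 0.01)
```

A balanced but obviously non-random sequence such as the alternating one still passes this test, which is why suites combine many such checks rather than relying on one.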

Maintenance planning using continuous-state partially observable Markov decision processes and non-linear action models

Structure and Infrastructure Engineering ◽ 10.1080/15732479.2015.1076485 ◽ 2015 ◽ Vol 12 (8) ◽ pp. 977-994 ◽ Cited By ~ 15

Author(s): Roland Schöbi ◽ Eleni N. Chatzi

Keyword(s): Markov Decision Processes ◽ Decision Processes ◽ Maintenance Planning ◽ Linear Action ◽ Continuous State ◽ Non Linear ◽ Markov Decision ◽ Action Models ◽ Partially Observable Markov ◽ Partially Observable

Contracts for Difference: A Reinforcement Learning Approach

Journal of Risk and Financial Management ◽ 10.3390/jrfm13040078 ◽ 2020 ◽ Vol 13 (4) ◽ pp. 78

Author(s): Nico Zengeler ◽ Uwe Handmann

Keyword(s): Reinforcement Learning ◽ Short Term Memory ◽ Learning Agents ◽ Learning Framework ◽ Learning Agent ◽ Markov Decision ◽ Economic Trends ◽ Model Size ◽ Contracts For Difference ◽ Partially Observable

We present a deep reinforcement learning framework for automatic high-frequency trading of contracts for difference (CfD) on indices. Our contribution proves that reinforcement learning agents with recurrent long short-term memory (LSTM) networks can learn from recent market history and outperform the market. Usually, these approaches depend on low latency. In a real-world example, we show that an increased model size may compensate for a higher latency. As the noisy nature of economic trends complicates predictions, especially in speculative assets, our approach does not predict prices but instead uses a reinforcement learning agent to learn an overall lucrative trading policy. To this end, we simulate a virtual market environment based on historical trading data. Our environment presents a partially observable Markov decision process (POMDP) to reinforcement learners and allows the training of various strategies.

A Self-Organizing Incremental Spatiotemporal Associative Memory Networks Model for Problems with Hidden State

Computational Intelligence and Neuroscience ◽ 10.1155/2016/7158507 ◽ 2016 ◽ Vol 2016 ◽ pp. 1-14

Author(s): Zuo-wei Wang

Keyword(s): Associative Memory ◽ State Transition ◽ Short Term Memory ◽ Activation Mechanism ◽ Long Term Memory ◽ Transition Model ◽ Term Memory ◽ Markov Decision ◽ State Transition Model ◽ Partially Observable

Identifying the hidden state is important for solving problems with hidden state. We prove that any deterministic partially observable Markov decision process (POMDP) can be represented by a minimal, looping hidden state transition model, and we propose a heuristic algorithm for constructing this state transition model. A new spatiotemporal associative memory network (STAMN) is proposed to realize the minimal, looping hidden state transition model. STAMN uses neural activity decay to realize short-term memory, connection weights between nodes to represent long-term memory, and presynaptic potentials with a synchronized activation mechanism to perform identification and recall simultaneously. Finally, we give empirical illustrations of the STAMN and compare its performance with that of other methods.

State Representation Learning with Robotic Priors for Partially Observable Environments

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) ◽ 10.1109/iros40897.2019.8967938 ◽ 2019

Author(s): Marco Morik ◽ Divyam Rastogi ◽ Rico Jonschkowski ◽ Oliver Brock

Keyword(s): Representation Learning ◽ State Representation ◽ Partially Observable

LSTM-DDPG for Trading with Variable Positions

Sensors ◽ 10.3390/s21196571 ◽ 2021 ◽ Vol 21 (19) ◽ pp. 6571

Author(s): Zhichao Jia ◽ Qiang Gao ◽ Xiaohong Peng

Keyword(s): Short Term Memory ◽ Index Futures ◽ Reward Function ◽ Variable Position ◽ Policy Gradient ◽ Trading Decisions ◽ Markov Decision ◽ Market State ◽ Lstm Network ◽ Partially Observable

In recent years, machine learning for trading has been widely studied. In trading decisions, both the direction and the size of a position should be determined from market conditions. However, no research so far considers variable position sizes in models developed for trading purposes. In this paper, we propose a deep reinforcement learning model named LSTM-DDPG to make trading decisions with variable positions. Specifically, we treat the trading process as a Partially Observable Markov Decision Process, in which a long short-term memory (LSTM) network extracts market state features and the deep deterministic policy gradient (DDPG) framework makes trading decisions concerning the direction and variable size of the position. We test the LSTM-DDPG model on IF300 (index futures of the Chinese stock market) data, and the results show that LSTM-DDPG with variable positions performs better in terms of return and risk than models with fixed or few-level positions. In addition, the investment potential of the model is better tapped by a reward function based on the differential Sharpe ratio than by a profit reward function.
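For readers unfamiliar with the differential Sharpe ratio, the sketch below follows the standard incremental formulation usually attributed to Moody and Saffell, based on exponential moving estimates of the first and second moments of returns. Whether the paper uses exactly this form is not stated in the abstract; the function name, the adaptation rate `eta`, and the zero-variance guard `eps` are assumptions made for illustration.

```python
def dsr_step(a_prev, b_prev, ret, eta=0.01, eps=1e-12):
    """One step of the differential Sharpe ratio.

    a_prev, b_prev: moving estimates of E[R] and E[R^2] before this step.
    ret:            the return observed at this step.
    Returns (dsr, a_new, b_new).
    """
    delta_a = ret - a_prev
    delta_b = ret * ret - b_prev
    variance = b_prev - a_prev * a_prev
    if variance > eps:
        dsr = (b_prev * delta_a - 0.5 * a_prev * delta_b) / variance ** 1.5
    else:
        # Guard (an assumption): no meaningful ratio until variance is positive.
        dsr = 0.0
    a_new = a_prev + eta * delta_a
    b_new = b_prev + eta * delta_b
    return dsr, a_new, b_new


def dsr_rewards(returns, eta=0.01):
    """Per-step rewards for a sequence of returns, starting from zero moments."""
    a, b = 0.0, 0.0
    rewards = []
    for r in returns:
        d, a, b = dsr_step(a, b, r, eta)
        rewards.append(d)
    return rewards


if __name__ == "__main__":
    print(dsr_rewards([0.01, -0.02, 0.015, 0.0, 0.03], eta=0.1))
```

Unlike a raw profit reward, each step's value reflects how the latest return changes a risk-adjusted measure, so large but volatile gains are rewarded less.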

Optimal adaptive inspection and maintenance for redundant systems

Proceedings of the Institution of Mechanical Engineers Part O Journal of Risk and Reliability ◽ 10.1177/1748006x211020151 ◽ 2021 ◽ pp. 1748006X2110201

Author(s): Chaochao Lin ◽ Matteo Pozzi

Keyword(s): Engineering Systems ◽ Discounted Cost ◽ Markov Decision ◽ Inspection And Maintenance ◽ And Performance ◽ Partially Observable ◽ Series Systems ◽ Selection Of ◽ Redundant Systems

Optimal exploration of engineering systems can be guided by the principle of Value of Information (VoI), which accounts for the topological importance of components, their reliability, and the management costs. For series systems, in most cases higher inspection priority should be given to unreliable components. For redundant systems such as parallel systems, analysis of one-shot decision problems shows that higher inspection priority should be given to more reliable components. This paper investigates the optimal exploration of redundant systems in long-term decision making with sequential inspection and repair. When the expected cumulative discounted cost is considered, it may become more efficient to give higher inspection priority to less reliable components in order to preserve system redundancy. To investigate this problem, we develop a Partially Observable Markov Decision Process (POMDP) framework for sequential inspection and maintenance of redundant systems, where the VoI analysis is embedded in the optimal selection of exploratory actions. We investigate the use of alternative approximate POMDP solvers for parallel and more general systems, compare their computational complexity and performance, and show how the inspection priorities depend on the economic discount factor, the degradation rate, the inspection precision, and the repair cost.
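Two ingredients mentioned above, the effect of inspection precision on what is known about a component and the difference between series and parallel systems, can be sketched in a few lines. This is a minimal illustration assuming binary component states, independent components, and a symmetric inspection accuracy; none of these assumptions or function names come from the paper, and no POMDP solver is involved.

```python
import math


def update_failure_belief(prior_fail, says_failed, accuracy):
    """Bayes update of P(component failed) after one imperfect inspection.

    accuracy is the assumed probability that the inspection reports the
    true state (same for both states), strictly between 0 and 1.
    """
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    like_fail = accuracy if says_failed else 1.0 - accuracy
    like_ok = 1.0 - accuracy if says_failed else accuracy
    numerator = like_fail * prior_fail
    evidence = numerator + like_ok * (1.0 - prior_fail)
    return numerator / evidence


def series_failure_prob(fail_probs):
    """A series system fails if any component fails."""
    return 1.0 - math.prod(1.0 - p for p in fail_probs)


def parallel_failure_prob(fail_probs):
    """A parallel (redundant) system fails only if all components fail."""
    return math.prod(fail_probs)


if __name__ == "__main__":
    beliefs = [0.1, 0.2]
    print(series_failure_prob(beliefs), parallel_failure_prob(beliefs))
    # Inspect the second component; the inspection reports a failure.
    beliefs[1] = update_failure_belief(beliefs[1], says_failed=True, accuracy=0.9)
    print(series_failure_prob(beliefs), parallel_failure_prob(beliefs))
```

In a sequential setting, the updated beliefs would become the state on which the next inspection or repair decision is based.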

A constraint partially observable semi-Markov decision process for the attack–defence relationships in various critical infrastructures

Cyber-Physical Systems ◽ 10.1080/23335777.2021.1879935 ◽ 2021 ◽ pp. 1-26

Author(s): Nadia Niknami ◽ Jie Wu

Keyword(s): Markov Decision Process ◽ Decision Process ◽ Critical Infrastructures ◽ Markov Decision ◽ Partially Observable
