Contracts for Difference: A Reinforcement Learning Approach

We present a deep reinforcement learning framework for the automatic trading of contracts for difference (CfD) on indices at high frequency. Our contribution shows that reinforcement learning agents with recurrent long short-term memory (LSTM) networks can learn from recent market history and outperform the market. Such approaches usually depend on low latency; in a real-world example, we show that an increased model size may compensate for higher latency. Because the noisy nature of economic trends complicates predictions, especially for speculative assets, our approach does not predict prices but instead uses a reinforcement learning agent to learn an overall lucrative trading policy. To this end, we simulate a virtual market environment based on historical trading data. Our environment provides a partially observable Markov decision process (POMDP) to reinforcement learners and allows the training of various strategies.
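As a rough illustration of what a simulated market that exposes a POMDP might look like, the sketch below replays a historical price series and shows the agent only a short window of recent prices. It is not the authors' environment: the class name `ToyCfdEnv`, the three-valued action encoding, and the reward (next-tick price change signed by the position) are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical action encoding (an assumption, not taken from the paper).
SHORT, FLAT, LONG = -1, 0, 1


@dataclass
class ToyCfdEnv:
    """Replays historical prices; the agent sees only a recent window."""
    prices: List[float]   # historical price series (assumed input)
    window: int = 3       # number of recent prices the agent observes
    t: int = field(init=False, default=0)

    def reset(self) -> Tuple[float, ...]:
        # Start at the first index where a full window is available.
        self.t = self.window - 1
        return self._observe()

    def _observe(self) -> Tuple[float, ...]:
        # Partial observability: only the last `window` prices are visible.
        return tuple(self.prices[self.t - self.window + 1 : self.t + 1])

    def step(self, action: int):
        # Reward: price change over the next tick, signed by the position.
        reward = action * (self.prices[self.t + 1] - self.prices[self.t])
        self.t += 1
        done = self.t == len(self.prices) - 1
        return self._observe(), reward, done
```

A recurrent policy such as an LSTM would consume these observations one step at a time; costs, spreads, and latency are deliberately left out of this toy version.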

Pseudo Random Number Generation through Reinforcement Learning and Recurrent Neural Networks

Algorithms ◽

10.3390/a13110307 ◽

2020 ◽

Vol 13 (11) ◽

pp. 307

Author(s):

Luca Pasqualini ◽

Maurizio Parton

Keyword(s):

Reinforcement Learning ◽

Random Number ◽

Short Term Memory ◽

Random Number Generator ◽

Random Number Generation ◽

Time Step ◽

Software Applications ◽

Pseudo Random Number ◽

Markov Decision ◽

Partially Observable

A Pseudo-Random Number Generator (PRNG) is any algorithm generating a sequence of numbers approximating properties of random numbers. These numbers are widely employed in mid-level cryptography and in software applications. Test suites are used to evaluate the quality of PRNGs by checking statistical properties of the generated sequences. These sequences are commonly represented bit by bit. This paper proposes a Reinforcement Learning (RL) approach to the task of generating PRNGs from scratch by learning a policy to solve a partially observable Markov Decision Process (MDP), where the full state is the period of the generated sequence, and the observation at each time-step is the last sequence of bits appended to such a state. We use a Long Short-Term Memory (LSTM) architecture to model the temporal relationship between observations at different time-steps by tasking the LSTM memory with the extraction of significant features of the hidden portion of the MDP’s states. We show that modeling a PRNG with a partially observable MDP and an LSTM architecture largely improves on the results of the fully observable feedforward RL approach introduced in previous work.
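The formulation described above, with a hidden full sequence and an observation made of only the most recently appended bits, can be sketched as a small environment. This is a minimal illustration under stated assumptions, not the paper's implementation: `ToyPrngEnv` is a hypothetical name, and the reward here is a simple balance-of-ones score standing in for a real statistical test suite.

```python
from typing import List, Tuple


def monobit_score(bits: List[int]) -> float:
    """Toy quality score in [0, 1]; 1.0 means zeros and ones are balanced."""
    if not bits:
        return 0.0
    ones = sum(bits)
    return 1.0 - abs(2 * ones - len(bits)) / len(bits)


class ToyPrngEnv:
    """Each action appends `chunk` bits; only those bits are observed."""

    def __init__(self, chunk: int = 2, total_bits: int = 8):
        self.chunk = chunk
        self.total_bits = total_bits
        self.state: List[int] = []   # full sequence, hidden from the agent

    def reset(self) -> Tuple[int, ...]:
        self.state = []
        return tuple([0] * self.chunk)   # placeholder first observation

    def step(self, action: Tuple[int, ...]):
        assert len(action) == self.chunk
        self.state.extend(action)
        done = len(self.state) >= self.total_bits
        # Reward is given only once the sequence is complete (an assumption).
        reward = monobit_score(self.state) if done else 0.0
        return tuple(action), reward, done
```

Because the observation omits everything generated earlier, a memoryless policy cannot tell two different histories apart, which is the gap a recurrent network is meant to fill.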

Cooperation and coordination between fuzzy reinforcement learning agents in continuous state partially observable Markov decision processes

FUZZ-IEEE'99. 1999 IEEE International Fuzzy Systems. Conference Proceedings (Cat. No.99CH36315) ◽

10.1109/fuzzy.1999.793014 ◽

1999 ◽

Cited By ~ 10

Author(s):

H.R. Berenji ◽

D. Vengerov

Keyword(s):

Reinforcement Learning ◽

Markov Decision Processes ◽

Decision Processes ◽

Learning Agents ◽

Continuous State ◽

Markov Decision ◽

Partially Observable Markov ◽

Partially Observable

Continuous reinforcement learning to adapt multi-objective optimization online for robot motion

International Journal of Advanced Robotic Systems ◽

10.1177/1729881420911491 ◽

2020 ◽

Vol 17 (2) ◽

pp. 172988142091149

Author(s):

Kai Zhang ◽

Sterling McLeod ◽

Minwoo Lee ◽

Jing Xiao

Keyword(s):

Reinforcement Learning ◽

Environmental Changes ◽

Short Term Memory ◽

Dynamic Environments ◽

Multi Objective Optimization ◽

Continuous Reinforcement ◽

Multi Objective ◽

Learning Framework ◽

Time Motion ◽

Learning Agent

This article introduces a continuous reinforcement learning framework to enable online adaptation of multi-objective optimization functions for guiding a mobile robot to move in changing dynamic environments. The robot with this framework can continuously learn from multiple or changing environments where it encounters different numbers of obstacles moving in unknown ways at different times. Using both planned trajectories from a real-time motion planner and already executed trajectories as feedback observations, our reinforcement learning agent enables the robot to adapt motion behaviors to environmental changes. The agent contains a Q network connected to a long short-term memory network. The proposed framework is tested in both simulations and real robot experiments over various dynamically varied task environments. The results show the efficacy of online continuous reinforcement learning for quick adaptation to different, unknown, and dynamic environments.

Observation Time Effects in Reinforcement Learning on Contracts for Difference

Journal of Risk and Financial Management ◽

10.3390/jrfm14020054 ◽

2021 ◽

Vol 14 (2) ◽

pp. 54

Author(s):

Maximilian Wehrmann ◽

Nico Zengeler ◽

Uwe Handmann

Keyword(s):

Reinforcement Learning ◽

Short Term Memory ◽

Observation Time ◽

Sequence Length ◽

Learning Agents ◽

A Value ◽

Observation Sequence ◽

Long Short Term Memory ◽

Contracts For Difference ◽

Simulated Market

In this paper, we present a study on Reinforcement Learning optimization models for automatic trading, in which we focus on the effects of varying the observation time. Our Reinforcement Learning agents feature a Convolutional Neural Network (CNN) together with Long Short-Term Memory (LSTM) and act on the basis of different observation time spans. Each agent tries to maximize trading profit by buying or selling one of a number of contracts in a simulated market environment for Contracts for Difference (CfD), considering correlations between individual assets by architecture. To decide which action to take on a specific contract, an agent develops a policy which relies on an observation of the whole market for a certain period of time. We investigate whether or not there exists an optimal observation sequence length, and conclude that such a value depends on market dynamics.

UAV Autonomous Tracking and Landing Based on Deep Reinforcement Learning Strategy

Sensors ◽

10.3390/s20195630 ◽

2020 ◽

Vol 20 (19) ◽

pp. 5630

Author(s):

Jingyi Xie ◽

Xiaodong Peng ◽

Haijiao Wang ◽

Wenlong Niu ◽

Xiao Zheng

Keyword(s):

Reinforcement Learning ◽

Learning Strategy ◽

Control Method ◽

Heuristic Rules ◽

Learning Framework ◽

Model Free ◽

Simulation Engine ◽

Markov Decision ◽

Moving Platform ◽

Partially Observable

Unmanned aerial vehicle (UAV) autonomous tracking and landing is playing an increasingly important role in military and civil applications. In particular, machine learning has been successfully introduced to robotics-related tasks. A novel UAV autonomous tracking and landing approach based on a deep reinforcement learning strategy is presented in this paper, with the aim of dealing with the UAV motion control problem in an unpredictable and harsh environment. Instead of building a prior model and inferring the landing actions based on heuristic rules, a model-free method based on a partially observable Markov decision process (POMDP) is proposed. In the POMDP model, the UAV automatically learns the landing maneuver by an end-to-end neural network, which combines the Deep Deterministic Policy Gradients (DDPG) algorithm and heuristic rules. A Modular Open Robots Simulation Engine (MORSE)-based reinforcement learning framework is designed and validated with a continuous UAV tracking and landing task on a randomly moving platform in high sensor noise and intermittent measurements. The simulation results show that when the moving platform is moving in different trajectories, the average landing success rate of the proposed algorithm is about 10% higher than that of the Proportional-Integral-Derivative (PID) method. As an indirect result, a state-of-the-art deep reinforcement learning-based UAV control method is validated, where the UAV can learn the optimal strategy of a continuously autonomous landing and perform properly in a simulation environment.

An Evaluation Methodology for Interactive Reinforcement Learning with Simulated Users

Biomimetics ◽

10.3390/biomimetics6010013 ◽

2021 ◽

Vol 6 (1) ◽

pp. 13

Author(s):

Adam Bignold ◽

Francisco Cruz ◽

Richard Dazeley ◽

Peter Vamplew ◽

Cameron Foale

Keyword(s):

Reinforcement Learning ◽

Information Source ◽

Human Interaction ◽

Evaluation Methodology ◽

External Information ◽

Preliminary Evaluation ◽

Learning Agents ◽

Learning Agent ◽

Knowledge Bias ◽

The Impact

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, to require human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluative assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.
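To make the idea of a simulated user concrete, the sketch below models an advisor with two tunable characteristics. It is only an illustration under stated assumptions: the class `SimulatedUser` and its parameters `accuracy` (probability that advice is correct) and `availability` (probability that advice is given at all) are hypothetical names chosen for this example, not the methodology's own definitions.

```python
import random
from typing import Optional, Sequence


class SimulatedUser:
    """Stand-in for a human trainer with configurable characteristics."""

    def __init__(self, accuracy: float, availability: float, seed: int = 0):
        self.accuracy = accuracy          # chance the advice is correct
        self.availability = availability  # chance the user responds at all
        self.rng = random.Random(seed)    # seeded for repeatable experiments

    def advise(self, best_action: str, actions: Sequence[str]) -> Optional[str]:
        # The user may decline to give advice on this step.
        if self.rng.random() >= self.availability:
            return None
        # Correct advice with probability `accuracy`.
        if self.rng.random() < self.accuracy:
            return best_action
        # Otherwise recommend one of the other actions, if any exist.
        wrong = [a for a in actions if a != best_action]
        return self.rng.choice(wrong) if wrong else best_action
```

Sweeping `accuracy` and `availability` across runs is one way to compare how an agent behaves under different types of advisors without recruiting people for every repetition.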

AUTOMATED VULNERABILITY SEARCH IN A WEB APPLICATION BASED ON REINFORCEMENT LEARNING

CASPIAN JOURNAL Control and High Technologies ◽

10.21672/2074-1707.2021.53.1.091-097 ◽

2021 ◽

Vol 53 (1) ◽

pp. 91-97

Author(s):

OLGA N. VYBORNOVA ◽

ALEKSANDER N. RYZHIKOV

Keyword(s):

Reinforcement Learning ◽

Web Application ◽

Web Applications ◽

Subject Area ◽

Learning Technology ◽

Web Application Security ◽

Vulnerability Scanner ◽

Learning Agent ◽

Markov Decision ◽

Python Programming

We analyzed the urgency of the task of creating a more efficient (compared to analogues) means of automated vulnerability search based on modern technologies. We have shown the similarity of the vulnerability identification process to a Markov decision process and justified the feasibility of using reinforcement learning technology for solving this problem. Since the analysis of web application security is currently the highest priority and in demand, this work considers the application of the mathematical apparatus of reinforcement learning to this subject area. The mathematical model is presented, and the specifics of the training and testing processes for the problem of automated vulnerability search in web applications are described. Based on an analysis of the OWASP Testing Guide, an action space and a set of environment states are identified. The characteristics of the software implementation of the proposed model are described: Q-learning is implemented in the Python programming language; a neural network was created to implement the learning policy using the TensorFlow library. We demonstrated the results of the Reinforcement Learning agent on a real web application, as well as their comparison with the report of the Acunetix Vulnerability Scanner. The findings indicate that the proposed solution is promising.
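For readers unfamiliar with Q-learning, the core update can be sketched in a few lines. This is a tabular simplification for illustration only; the abstract describes a neural network built with TensorFlow, which is not reproduced here. The state and action names in the usage note are hypothetical and are not taken from the paper or from the OWASP Testing Guide.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """One tabular Q-learning step; returns the updated Q-value."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction `alpha` toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]
```

With `q = defaultdict(float)` and the hypothetical actions `["probe_form", "probe_header"]`, calling `q_update(q, "login_page", "probe_form", 1.0, "form_found", actions)` raises the estimate for that state-action pair toward the observed reward.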

Adaptive Quantitative Trading: An Imitative Deep Reinforcement Learning Approach

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i02.5587 ◽

2020 ◽

Vol 34 (02) ◽

pp. 2128-2135

Author(s):

Yang Liu ◽

Qi Liu ◽

Hongke Zhao ◽

Zhen Pan ◽

Chuanren Liu

Keyword(s):

Reinforcement Learning ◽

Trading Strategies ◽

Financial Data ◽

Imitation Learning ◽

Market Condition ◽

Exploration And Exploitation ◽

Markov Decision ◽

Trading Model ◽

Trading Agent ◽

Partially Observable

In recent years, considerable effort has been devoted to developing AI techniques for finance research and applications. For instance, AI techniques (e.g., machine learning) can help traders in quantitative trading (QT) by automating two tasks: market condition recognition and trading strategy execution. However, existing methods in QT face challenges such as representing noisy high-frequency financial data and finding the balance between exploration and exploitation of the trading agent with AI techniques. To address these challenges, we propose an adaptive trading model, namely iRDPG, to automatically develop QT strategies by an intelligent trading agent. Our model is enhanced by deep reinforcement learning (DRL) and imitation learning techniques. Specifically, considering the noisy financial data, we formulate the QT process as a Partially Observable Markov Decision Process (POMDP). Also, we introduce imitation learning to leverage classical trading strategies, which helps balance exploration and exploitation. For better simulation, we train our trading agent in the real financial market using minute-frequency data. Experimental results demonstrate that our model can extract robust market features and be adaptive in different markets.

Constrained representation learning for recurrent policy optimisation under uncertainty

Adaptive Behavior ◽

10.1177/1059712319891641 ◽

2019 ◽

pp. 105971231989164

Author(s):

Viet-Hung Dang ◽

Ngo Anh Vien ◽

TaeChoong Chung

Keyword(s):

Short Term Memory ◽

Internal State ◽

Representation Learning ◽

Linear Mapping ◽

State Representation ◽

Future Observation ◽

Non Linear ◽

Internal States ◽

Markov Decision ◽

Partially Observable

Learning to make decisions in partially observable environments is a notoriously hard problem that requires a complex representation of controllers. In most work, the controllers are designed as a non-linear mapping from a sequence of temporal observations to actions. These problems can, in principle, be formulated as a partially observable Markov decision process whose policy can be parameterised through the use of recurrent neural networks. In this paper, we propose an alternative framework that (a) uses the Long Short-Term Memory (LSTM) Encoder-Decoder framework to learn an internal state representation for historical observations and then (b) integrates it into existing recurrent policy models to improve task performance. The LSTM Encoder encodes a history of observations as input into a representation of internal states. The LSTM Decoder can perform two alternative decoding tasks: predicting the same input observation sequence or predicting future observation sequences. The first proposed decoder acts like an auto-encoder that guides and constrains the learning of a useful internal state for the policy optimisation task. The second proposed decoder decodes the internal state learnt by the encoder to predict future observation sequences. This idea makes the network act like a non-linear predictive state representation model. Both decoding parts, which introduce constraints on the policy representation, help guide both the policy optimisation problem and latent state representation learning. The integration of representation learning and policy optimisation aims to help learn more complex policies and improve the performance of policy learning tasks.
