Reinforcement Learning for Systematic FX Trading

Author(s):  
Gabriel Borrageiro ◽  
Nick Firoozye ◽  
Paolo Barucca

We explore online inductive transfer learning, with a feature representation transfer from a radial basis function network formed of Gaussian mixture model hidden processing units to a direct, recurrent reinforcement learning agent. The agent is put to work in an experiment trading the major spot market currency pairs, where we accurately account for transaction and funding costs. These sources of profit and loss, including the price trends that occur in the currency markets, are made available to the agent via a quadratic utility, and the agent learns to target a position directly. We improve upon earlier work by learning to target a risk position in an online transfer learning context. Our agent achieves an annualised portfolio information ratio of 0.52 with a compound return of 9.3%, net of execution and funding costs, over a 7-year test set; this is despite forcing the model to trade at the close of the trading day, 5 pm EST, when trading costs are statistically the most expensive.
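
As a concrete illustration of learning to target a position directly through a quadratic utility, here is a minimal Python sketch of a direct recurrent reinforcement learning update in the style the abstract describes. The learning rate, risk-aversion and cost parameters, and the tanh position function are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

class RecurrentRLTrader:
    """A minimal sketch of direct recurrent reinforcement learning: the
    position is a differentiable function of the features and the previous
    position, trained online by gradient ascent on a quadratic utility of
    trading returns. All hyperparameters are illustrative assumptions."""

    def __init__(self, n_features, lr=0.01, risk_aversion=0.1, cost=1e-4):
        self.w = np.zeros(n_features + 2)       # weights for [x_t, F_{t-1}, 1]
        self.lr, self.ra, self.cost = lr, risk_aversion, cost
        self.prev_pos = 0.0
        self.prev_grad = np.zeros_like(self.w)  # dF_{t-1}/dw, carried forward

    def step(self, x, price_return):
        u = np.concatenate([x, [self.prev_pos, 1.0]])
        pos = np.tanh(self.w @ u)               # target position in [-1, 1]
        # trading return: the previous position earns the price move,
        # and changing position pays a proportional transaction cost
        ret = self.prev_pos * price_return - self.cost * abs(pos - self.prev_pos)
        # recurrent gradient: dF_t/dw = (1 - F_t^2)(u_t + w_F * dF_{t-1}/dw)
        grad = (1.0 - pos ** 2) * (u + self.w[-2] * self.prev_grad)
        s = np.sign(pos - self.prev_pos)
        dret_dw = (-self.cost * s) * grad \
                  + (price_return + self.cost * s) * self.prev_grad
        # quadratic utility U(R) = R - (risk_aversion / 2) * R^2
        self.w += self.lr * (1.0 - self.ra * ret) * dret_dw
        self.prev_pos, self.prev_grad = pos, grad
        return pos, ret
```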

2021 ◽  
Author(s):  
Gabriel Borrageiro ◽  
Nick Firoozye ◽  
Paolo Barucca

We conduct a detailed experiment on major cash FX pairs, accurately accounting for transaction and funding costs. These sources of profit and loss, including the price trends that occur in the currency markets, are made available to our recurrent reinforcement learner via a quadratic utility, and the learner targets a position directly. We improve upon earlier work by casting the problem of learning to target a risk position in an online learning context. This online learning occurs sequentially in time, but also in the form of transfer learning. We transfer the output of radial basis function hidden processing units, whose means, covariances and overall number are determined by Gaussian mixture models, to the recurrent reinforcement learner and to a baseline momentum trader. The intrinsic nature of the feature space is thus learnt and made available to the upstream models. The recurrent reinforcement learning trader achieves an annualised portfolio information ratio of 0.52 with a compound return of 9.3%, net of execution and funding costs, over a 7-year test set. This is despite forcing the model to trade at the close of the trading day, 5 pm EST, when trading costs are statistically the most expensive. These results are comparable with the momentum baseline trader, reflecting the low interest-rate-differential environment since the 2008 financial crisis and the pronounced currency trends since then. The recurrent reinforcement learner nevertheless maintains an important advantage, in that the model's weights can be adapted to reflect the different sources of profit and loss variation. This is demonstrated visually by a USDRUB trading agent, which learns to target different positions that reflect trading in the presence or absence of costs.
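
The feature-transfer step described above can be sketched as follows: a Gaussian mixture model determines the means, covariances and number of the RBF hidden units, and the units' outputs are handed to the upstream traders. This is a minimal sketch, assuming BIC for model-size selection and posterior responsibilities as the hidden-unit activations; neither choice is confirmed by the abstract.

```python
from sklearn.mixture import GaussianMixture

def fit_rbf_layer(X, max_components=10):
    """Fit Gaussian mixture models of increasing size and keep the best by
    BIC; the chosen model's means, covariances and component count define
    the RBF hidden processing units. (BIC as the size criterion is an
    assumption for this sketch.)"""
    candidates = [
        GaussianMixture(n_components=k, covariance_type="full",
                        random_state=0).fit(X)
        for k in range(1, max_components + 1)
    ]
    return min(candidates, key=lambda g: g.bic(X))

def rbf_features(gmm, X):
    """Hidden-unit activations for the upstream models: the posterior
    responsibility of each Gaussian unit for each observation (standing
    in for the exact RBF activation, which is an assumption here)."""
    return gmm.predict_proba(X)    # shape: (n_samples, n_components)
```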


2020 ◽  
Author(s):  
Felipe Leno Da Silva ◽  
Anna Helena Reali Costa

Reinforcement Learning (RL) is a powerful tool that has been used to solve increasingly complex tasks. RL operates through repeated interactions of the learning agent with the environment, via trial and error. However, this learning process is extremely slow, requiring many interactions. In this thesis, we leverage previous knowledge to accelerate learning in multiagent RL problems. We propose knowledge reuse both from previous tasks and from other agents, and introduce several flexible methods to enable each of these two types of reuse. This thesis takes important steps towards more flexible and broadly applicable multiagent transfer learning methods.
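
As a minimal illustration of the two forms of reuse (not the thesis's specific methods), one can warm-start a target task's Q-table from a previously learned one, and accept action advice from another agent; the state mapping and confidence parameter below are assumptions.

```python
import numpy as np

def reuse_q_values(source_q, n_states, n_actions, mapping=None):
    """Initialise a target task's Q-table from a previously learned one:
    inter-task knowledge reuse in its simplest form (the identity state
    mapping used by default is an assumption)."""
    q = np.zeros((n_states, n_actions))
    for s in range(n_states):
        src = mapping(s) if mapping else s
        if src < source_q.shape[0]:
            q[s] = source_q[src]        # warm start from the source task
    return q

def advise(advisor_q, state, confidence=0.5, rng=np.random):
    """Action advice from another agent: with some probability the learner
    follows the advisor's greedy action (agent-to-agent knowledge reuse)."""
    if rng.random() < confidence:
        return int(np.argmax(advisor_q[state]))
    return None                         # no advice; learner acts on its own
```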


Biomimetics ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 13
Author(s):  
Adam Bignold ◽  
Francisco Cruz ◽  
Richard Dazeley ◽  
Peter Vamplew ◽  
Cameron Foale

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice can significantly improve a learning agent's performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, requiring human interaction every time an experiment is restarted is undesirable, particularly when the expense of doing so can be considerable. Additionally, reusing the same people across experiments introduces bias, as they learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated; they allow the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulated users in evaluating agent performance when assisted by different types of trainers. Experimental results show that this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.
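
A minimal sketch of a simulated user follows, assuming the two characteristics most often varied in this line of work: availability (how often advice is given) and accuracy (how often it is correct). The oracle policy and parameter values are illustrative assumptions, not the paper's exact trainer models.

```python
import random

class SimulatedUser:
    """A simulated trainer for interactive RL: offers advice with a fixed
    availability, and that advice is correct only with a fixed accuracy,
    mirroring the idea of modelling human bias and interaction."""

    def __init__(self, oracle_policy, availability=0.3, accuracy=0.9,
                 n_actions=4):
        self.oracle = oracle_policy        # maps state -> "correct" action
        self.availability = availability   # how often the user offers advice
        self.accuracy = accuracy           # how often the advice is right
        self.n_actions = n_actions

    def advise(self, state):
        if random.random() > self.availability:
            return None                    # user stays silent this step
        if random.random() < self.accuracy:
            return self.oracle(state)      # correct advice
        return random.randrange(self.n_actions)   # erroneous advice
```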


2021 ◽  
Vol 2 (1) ◽  
pp. 1-25
Author(s):  
Yongsen Ma ◽  
Sheheryar Arshad ◽  
Swetha Muniraju ◽  
Eric Torkildson ◽  
Enrico Rantala ◽  
...  

In recent years, Channel State Information (CSI) measured by WiFi has been widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of CSI data. The state machine learns temporal dependency information from historical classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations/orientations, and multiple persons. It achieves 97% average accuracy when the test devices and persons are not seen during training, and accuracies of 80% and 83% on two public datasets. The proposed design requires very little human effort for ground truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.
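
A minimal PyTorch sketch of the two-network idea: a 2D CNN classifies a window of CSI data, and a 1D CNN smooths the history of class scores. The layer sizes are assumptions that stand in for the architecture the RL agent would search over, which the abstract does not specify.

```python
import torch
import torch.nn as nn

class CSIRecognizer(nn.Module):
    """Recognition algorithm (2D CNN over a CSI window) plus state machine
    (1D CNN over the history of class scores); sizes are illustrative."""

    def __init__(self, n_classes=6):
        super().__init__()
        self.cnn2d = nn.Sequential(            # recognition algorithm
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )
        self.cnn1d = nn.Sequential(            # state machine over history
            nn.Conv1d(n_classes, n_classes, kernel_size=5, padding=2),
        )

    def forward(self, csi_window, score_history):
        scores = self.cnn2d(csi_window)        # (batch, n_classes)
        smoothed = self.cnn1d(score_history)   # (batch, n_classes, time)
        return scores, smoothed[..., -1]       # raw and history-smoothed
```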


2021 ◽  
Vol 13 (2) ◽  
pp. 223
Author(s):  
Zhenyang Hui ◽  
Shuanggen Jin ◽  
Dajun Li ◽  
Yao Yevenyo Ziggah ◽  
Bo Liu

Individual tree extraction is an important process for forest resource surveying and monitoring. To obtain more accurate individual tree extraction results, this paper proposes an individual tree extraction method based on transfer learning and Gaussian mixture model separation. In this study, transfer learning is first adopted to classify trunk points, which serve as clustering centers for initial tree segmentation. Subsequently, principal component analysis (PCA) transformation and kernel density estimation are used to determine the number of mixed components in the initial segmentation. Based on this number, Gaussian mixture model separation is applied to separate the canopies of individual trees. Finally, the trunk stems corresponding to each canopy are extracted based on the vertical continuity principle. Six tree plots with different forest environments were used to test the performance of the proposed method. Experimental results show that the proposed method achieves 87.68% average correctness, which is much higher than that of two other classical methods. In terms of completeness and mean accuracy, the proposed method also outperforms the other two methods.
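
The PCA, kernel density estimation and GMM separation steps might be sketched as follows; the density grid, the mode-counting rule and the use of only the horizontal point coordinates are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from scipy.stats import gaussian_kde

def estimate_n_trees(points):
    """Estimate the number of mixed canopies in an initial segment by
    counting modes of a kernel density estimate along the first principal
    component of the horizontal coordinates."""
    proj = PCA(n_components=1).fit_transform(points[:, :2]).ravel()
    grid = np.linspace(proj.min(), proj.max(), 200)
    dens = gaussian_kde(proj)(grid)
    # interior local maxima of the density curve ~ number of canopies
    modes = np.sum((dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:]))
    return max(1, int(modes))

def separate_canopies(points):
    """Split an initial segment into individual canopies with a GMM whose
    component count comes from the PCA/KDE estimate above."""
    k = estimate_n_trees(points)
    return GaussianMixture(n_components=k, random_state=0).fit_predict(points)
```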


Algorithms ◽  
2021 ◽  
Vol 14 (1) ◽  
pp. 26
Author(s):  
Yiran Xue ◽  
Rui Wu ◽  
Jiafeng Liu ◽  
Xianglong Tang

Existing crowd evacuation guidance systems require the manual design of models and input parameters, incurring a significant workload and a potential for errors. This paper proposes an end-to-end intelligent evacuation guidance method based on deep reinforcement learning, together with an interactive simulation environment based on the social force model. The agent can automatically learn a scene model and path-planning strategy with only scene images as input, and directly output dynamic signage information. To address the "dimension disaster" (curse of dimensionality) that the deep Q-network (DQN) algorithm faces in crowd evacuation, this paper proposes a combined action-space DQN (CA-DQN) algorithm that groups Q-network output-layer nodes according to action dimensions, significantly reducing network complexity and improving the system's practicality in complex scenes. The evacuation guidance system is defined as a reinforcement learning agent and implemented by the CA-DQN method, providing a novel approach to the evacuation guidance problem. Experiments demonstrate that the proposed method is superior to the static guidance method, and on par with the manually designed model method.
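
The combined action-space idea can be sketched as grouping the Q-network's output layer into one head per signage unit, so output size grows additively rather than multiplicatively with the number of signs. The network sizes below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CombinedActionDQN(nn.Module):
    """One Q-value head per action dimension: n_signs * n_directions output
    nodes instead of n_directions ** n_signs for the joint action space."""

    def __init__(self, n_signs=8, n_directions=4, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(          # shared scene-image encoder
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        # grouped output layer: one head per signage unit
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, n_directions) for _ in range(n_signs)]
        )

    def forward(self, scene_image):
        z = self.backbone(scene_image)
        return [head(z) for head in self.heads]   # per-sign Q-values
```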

