Reinforcement Learning in Financial Markets

Data ◽  
2019 ◽  
Vol 4 (3) ◽  
pp. 110 ◽  
Author(s):  
Terry Lingze Meng ◽  
Matloob Khushi

Recently, there has been an exponential increase in the use of artificial intelligence for trading in financial markets such as stocks and forex. Reinforcement learning has been of particular interest to financial traders ever since the program AlphaGo defeated the strongest contemporary human Go player, Lee Sedol, in 2016. We systematically reviewed all recent stock/forex prediction or trading articles that used reinforcement learning as their primary machine learning method. All of the reviewed articles relied on unrealistic assumptions, such as zero transaction costs, no liquidity constraints, and no bid-ask spread. Transaction costs had a significant impact on the profitability of the reinforcement learning algorithms relative to the baseline algorithms tested. Although many studies reported statistically significant profitability for reinforcement learning compared with baseline models, some showed no meaningful level of profitability, in particular when price patterns changed substantially between the training and testing data. Furthermore, few performance comparisons between reinforcement learning and other sophisticated machine/deep learning models were provided. The impact of transaction costs, including the bid-ask spread, on profitability was also assessed. In conclusion, reinforcement learning in stock/forex trading is still in its early development, and further research is needed to make it a reliable method in this domain.
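As a hedged illustration of why these frictions matter (not taken from any reviewed paper; the function and cost parameters are hypothetical), a per-step trading reward can charge a proportional transaction cost plus half the bid-ask spread on every position change:

```python
import numpy as np

def step_reward(price_prev, price_now, pos_prev, pos_now,
                cost_rate=0.001, half_spread=0.0005):
    """Per-step trading reward with frictions (illustrative values).

    pos_* are positions in {-1, 0, 1} (short/flat/long).
    A position change pays a proportional transaction cost plus
    half the bid-ask spread on the traded notional.
    """
    pnl = pos_prev * (price_now - price_prev) / price_prev  # mark-to-market return
    traded = abs(pos_now - pos_prev)                        # turnover this step
    friction = traded * (cost_rate + half_spread)
    return pnl - friction

# Example: a round trip on a flat price series nets a pure loss from frictions,
# which an agent trained with zero-cost assumptions never sees.
prices = [100.0, 100.0, 100.0]
r = step_reward(prices[0], prices[1], 0, 1) + step_reward(prices[1], prices[2], 1, 0)
print(r)  # -0.003: two position changes, no price movement
```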

Author(s):  
Duccio Martelli

In recent decades, trading has become very popular among retail investors, mainly owing to the widespread use of technology and a reduction in transaction costs. However, the growing volume of information available to individuals and the greater complexity of financial markets have made psychological mistakes easier to commit. This chapter describes the main types of behavioral bias that affect individual investors, especially retail traders who frequently churn their portfolios. The chapter compares momentum and contrarian trading strategies used by such traders. It also discusses the impact of new information on market sentiment and its effect on trader psychology. Finally, the chapter examines the main behaviors of novice traders, followed by a summary of various studies that analyze the conduct of novice investors in the course of investment challenges and trading simulations.


2019 ◽  
Author(s):  
Jennifer R Sadler ◽  
Grace Elisabeth Shearrer ◽  
Nichollette Acosta ◽  
Kyle Stanley Burger

BACKGROUND: Dietary restraint represents an individual's intent to limit their food intake and has been associated with impaired passive food reinforcement learning. However, the impact of dietary restraint on active, response-dependent learning is poorly understood. In this study, we tested the relationship between dietary restraint and food reinforcement learning using an active, instrumental conditioning task.
METHODS: A sample of ninety adults completed a response-dependent instrumental conditioning task with reward and punishment using sweet and bitter tastes. Brain response was measured via functional MRI during the task. Participants also completed anthropometric measures, reward/motivation-related questionnaires, and a working memory task. Dietary restraint was assessed via the Dutch Restrained Eating Scale.
RESULTS: Two groups were selected from the sample: high restraint (n=29; score > 2.5) and low restraint (n=30; score < 1.85). High restraint was associated with significantly higher BMI (p=0.003) and lower N-back accuracy (p=0.045). The high-restraint group was also marginally better at the instrumental conditioning task (p=0.066, r=0.37). High restraint was also associated with significantly greater brain response to bitter taste, compared with neutral taste, in the intracalcarine cortex (MNI: 15, -69, 12; k=35, p_FWE < 0.05).
CONCLUSIONS: High restraint was associated with improved performance on an instrumental task testing how individuals learn from reward and punishment. This may be mediated by greater brain response in the primary visual cortex, which has been associated with mental representation. Results suggest that dietary restraint does not impair response-dependent reinforcement learning.
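The response-dependent learning this task measures is commonly modeled with a simple action-value (Q-learning) update. The following is a minimal sketch of such a model with illustrative reward probabilities and parameters, not the authors' task or analysis code:

```python
import random

def simulate_instrumental_task(n_trials=100, alpha=0.2, epsilon=0.1):
    """Minimal Q-learning model of a two-response instrumental task.

    Response 0 yields reward (+1, e.g., sweet taste) with 80% probability;
    response 1 yields punishment (-1, e.g., bitter taste) with 80% probability.
    All probabilities and parameters are illustrative.
    """
    q = [0.0, 0.0]  # learned value of each response
    correct = 0
    for _ in range(n_trials):
        # epsilon-greedy choice: mostly exploit, occasionally explore
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = 0 if q[0] >= q[1] else 1
        outcome = (1 if random.random() < 0.8 else -1) if a == 0 \
                  else (-1 if random.random() < 0.8 else 1)
        q[a] += alpha * (outcome - q[a])  # prediction-error update
        correct += (a == 0)
    return correct / n_trials  # fraction of reward-maximizing responses

print(simulate_instrumental_task())
```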


2001 ◽  
Vol 62 (2) ◽  
pp. 83-95
Author(s):  
Ernst-Ludwig von Thadden

Author(s):  
Peter Dietsch

Monetary policy, and the response it elicits from financial markets, raises normative questions. This chapter, building on an introductory section on the objectives and instruments of monetary policy, analyzes two such questions. First, it assesses the impact of monetary policy on inequality and argues that the unconventional policies adopted in the wake of the financial crisis exacerbate inequalities in income and wealth. Depending on the theory of justice one holds, this impact is problematic. Should monetary policy be sensitive to inequalities and, if so, how? Second, the chapter argues that the leverage that financial markets have today over the monetary policy agenda undermines democratic legitimacy.


Biomimetics ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 13
Author(s):  
Adam Bignold ◽  
Francisco Cruz ◽  
Richard Dazeley ◽  
Peter Vamplew ◽  
Cameron Foale

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice can significantly improve a learning agent's performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, requiring human interaction every time an experiment is restarted is undesirable, particularly when the expense of doing so can be considerable. Additionally, reusing the same people across experiments introduces bias, as they learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. Their use allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation using simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulated users in evaluating agent performance when assisted by different types of trainers. Experimental results show that this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows the impact of those characteristics on the behaviour of the learning agent to be evaluated.
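To make the idea concrete, a simulated user might be parameterised by how often it offers advice and how often that advice is correct. The following is a minimal sketch under those assumptions; the class name and parameters are illustrative, not the paper's implementation:

```python
import random

class SimulatedUser:
    """Simulated trainer for interactive RL (illustrative sketch).

    availability: probability the user offers advice on a given step.
    accuracy:     probability the offered advice is the optimal action.
    """
    def __init__(self, availability=0.3, accuracy=0.9, n_actions=4):
        self.availability = availability
        self.accuracy = accuracy
        self.n_actions = n_actions

    def advise(self, optimal_action):
        if random.random() > self.availability:
            return None  # user stays silent this step
        if random.random() < self.accuracy:
            return optimal_action
        # wrong advice: any non-optimal action
        choices = [a for a in range(self.n_actions) if a != optimal_action]
        return random.choice(choices)

# Sweeping availability and accuracy lets experiments be repeated without
# recruiting new trainers, and without trainers learning the task over time.
user = SimulatedUser(availability=0.5, accuracy=0.8)
print(user.advise(optimal_action=2))
```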


2021 ◽  
Vol 11 (3) ◽  
pp. 1291
Author(s):  
Bonwoo Gu ◽  
Yunsick Sung

Gomoku is a two-player board game that originated in ancient China. Gomoku AI has been developed using various artificial intelligence techniques, such as genetic algorithms and tree search algorithms. Alpha-Gomoku, a Gomoku AI built on AlphaGo's algorithm, defines all possible situations on the Gomoku board using Monte Carlo tree search (MCTS) and minimizes the probability of learning other correct answers for duplicated board situations. However, the accuracy of the tree search algorithm drops because its classification criteria are set manually. In this paper, we propose an improved reinforcement learning-based high-level decision approach using a convolutional neural network (CNN). The proposed algorithm expresses each state as a one-hot encoded vector and determines the state of the Gomoku board by combining similar one-hot encoded states. For cases where the stone selected by the CNN has already been placed or cannot be placed, we suggest a method for selecting an alternative move. We verify the proposed Gomoku AI in GuPyEngine, a Python-based 3D simulation platform.
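As a hedged illustration of the encoding step (the paper's exact scheme may differ), a Gomoku board can be one-hot encoded as one binary plane per cell state before being fed to a CNN:

```python
import numpy as np

def encode_board(board):
    """One-hot encode a Gomoku board for a CNN (illustrative scheme).

    board: (15, 15) array with 0 = empty, 1 = black, 2 = white.
    Returns a (3, 15, 15) float tensor: one binary plane per cell state.
    """
    return np.stack([(board == v) for v in (0, 1, 2)]).astype(np.float32)

board = np.zeros((15, 15), dtype=np.int64)
board[7, 7] = 1   # black stone in the centre
board[7, 8] = 2   # white reply
x = encode_board(board)
print(x.shape)     # (3, 15, 15)
print(x[1, 7, 7])  # 1.0: the black plane marks the centre stone
# Plane 0 (empty cells) doubles as a legal-move mask, which is what an
# alternative-selection step needs when the CNN's chosen cell is occupied.
```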


2021 ◽  
Vol 11 (4) ◽  
pp. 1514 ◽  
Author(s):  
Quang-Duy Tran ◽  
Sang-Hoon Bae

To reduce the impact of congestion, it is necessary to improve our overall understanding of the influence of autonomous vehicles. Recently, deep reinforcement learning has become an effective means of solving complex control tasks. Accordingly, we present an advanced deep reinforcement learning method that investigates how leading autonomous vehicles affect an urban network in a mixed-traffic environment. We also suggest a set of hyperparameters for achieving better performance. Firstly, we feed the set of hyperparameters into our deep reinforcement learning agents. Secondly, we investigate the leading-autonomous-vehicle experiment in the urban network with different autonomous vehicle penetration rates. Thirdly, the advantage of leading autonomous vehicles is evaluated against entire-manual-vehicle and leading-manual-vehicle experiments. Finally, proximal policy optimization with a clipped objective is compared to proximal policy optimization with an adaptive Kullback–Leibler penalty to verify the superiority of the proposed hyperparameters. We demonstrate that fully automated traffic increased the average speed to 1.27 times that of the entire-manual-vehicle experiment. Our proposed method becomes significantly more effective at higher autonomous vehicle penetration rates. Furthermore, the leading autonomous vehicles could help to mitigate traffic congestion.
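The two PPO variants compared here differ only in how they constrain the policy update. A minimal numpy sketch of both surrogate objectives, in their standard textbook form rather than the authors' implementation:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate (to be maximized; eps is the clip range)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()

def ppo_kl_penalty_loss(ratio, advantage, kl, beta=1.0):
    """PPO adaptive-KL surrogate: beta is raised when the measured KL is
    too large and lowered when too small (the adaptation loop is omitted)."""
    return (ratio * advantage).mean() - beta * kl

# ratio = pi_new(a|s) / pi_old(a|s); clipping keeps each update close to
# the old policy without tracking an explicit KL term.
ratio = np.array([0.9, 1.1, 1.5])
adv = np.array([1.0, 1.0, 1.0])
print(ppo_clip_loss(ratio, adv))  # the 1.5 ratio is clipped at 1.2
```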


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Abu Quwsar Ohi ◽  
M. F. Mridha ◽  
Muhammad Mostafa Monowar ◽  
Md. Abdul Hamid

A pandemic is the global outbreak of a disease with a high transmission rate. The impact of a pandemic can be lessened by restricting the movement of the population. However, one of its concomitant circumstances is an economic crisis. In this article, we demonstrate what actions an agent (trained using reinforcement learning) may take in different possible scenarios of a pandemic, depending on the spread of the disease and economic factors. To train the agent, we design a virtual pandemic scenario closely related to the present COVID-19 crisis. Then, we apply reinforcement learning, a branch of artificial intelligence that deals with how an agent (human/machine) should interact with an environment (real/virtual) to achieve a desired goal. Finally, we demonstrate what optimal actions the agent performs to reduce the spread of disease while accounting for economic factors. In our experiment, we let the agent find an optimal solution without providing any prior knowledge. After training, we observed that the agent imposes a long lockdown to suppress the first surge of the disease, followed by a combination of cyclic lockdowns and short lockdowns to halt its resurgence. Analyzing the agent's actions, we discover that the agent decides on movement restrictions based not only on the size of the infectious population but also on the reproduction rate of the disease. The agent's estimates and policy may improve human strategies for imposing lockdowns, so that an economic crisis can be avoided while an infectious disease is mitigated.
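As a hedged sketch of the kind of environment this describes (illustrative, not the authors' simulator), SIR-style disease dynamics can be combined with a reward that trades infection burden against the economic cost of lockdown:

```python
import numpy as np

class LockdownEnv:
    """Toy pandemic-control environment (illustrative, not the paper's).

    State: (susceptible, infectious, recovered) population fractions.
    Action: 0 = no lockdown, 1 = lockdown (scales down transmission).
    Reward trades infection burden against the economic cost of lockdown.
    """
    def __init__(self, beta=0.3, gamma=0.1, lockdown_factor=0.3, econ_cost=0.02):
        self.beta, self.gamma = beta, gamma
        self.lockdown_factor = lockdown_factor
        self.econ_cost = econ_cost
        self.reset()

    def reset(self):
        self.s, self.i, self.r = 0.99, 0.01, 0.0
        return np.array([self.s, self.i, self.r])

    def step(self, action):
        beta = self.beta * (self.lockdown_factor if action == 1 else 1.0)
        new_inf = beta * self.s * self.i   # new infections this step
        new_rec = self.gamma * self.i      # new recoveries this step
        self.s -= new_inf
        self.i += new_inf - new_rec
        self.r += new_rec
        reward = -self.i - self.econ_cost * action  # infections + economy
        return np.array([self.s, self.i, self.r]), reward

env = LockdownEnv()
state = env.reset()
state, reward = env.step(1)  # one step under lockdown
print(state, reward)
```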

