Reinforcement-learning-based optimal trading in a simulated futures market with heterogeneous agents

SIMULATION ◽  
2021 ◽  
pp. 003754972110611
Author(s):  
Nadi Serhan Aydin

This paper simulates a futures market with multiple agents and sequential auctions, in which agents receive long-lived, heterogeneous signals about the true value of an asset and trade against a known deadline. The evolution of the amount of differential information, and its impact on the distribution of overall gains and the pace of truth discovery, is examined for various depth levels of the limit order book (LOB). The paper also formulates a dynamic programming model for the problem and presents an associated reinforcement learning (RL) algorithm for finding the optimal strategy for exploiting informational disparity. This is done from the perspective of an agent whose information is superior to the collective information of the rest of the market. Finally, a numerical analysis based on a futures market example is presented to validate the proposed methodology. We find evidence in favor of a waiting strategy, in which the agent does not reveal her signal until the last auction before the deadline. This result may offer further insight into the micro-structural dynamics that work against market efficiency.
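For intuition, here is a minimal sketch of how a tabular RL agent could learn such a waiting strategy over K sequential auctions. The state space, payoff model, and decay of the informational edge are illustrative assumptions, not the paper's actual LOB simulation.

```python
# A minimal sketch (not the paper's model): tabular Q-learning for an informed
# agent choosing, at each of K sequential auctions before a known deadline,
# whether to trade on (and thereby reveal) her signal. The reward model below
# is a hypothetical stand-in for the paper's limit-order-book simulation.
import numpy as np

rng = np.random.default_rng(0)
K = 5                           # auctions before the deadline
alpha, gamma, eps = 0.1, 1.0, 0.1
Q = np.zeros((K, 2))            # state: auction index; actions: 0 = wait, 1 = trade

def reward(t, action):
    # Hypothetical payoff: trading early leaks information and erodes the
    # informational edge; waiting until the last auction preserves it.
    if action == 0:
        return 0.0
    edge = 1.0 - 0.15 * (K - 1 - t)   # assumed decay of the edge if revealed early
    return edge + 0.05 * rng.standard_normal()

for episode in range(5000):
    for t in range(K):
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[t]))
        r = reward(t, a)
        target = r if (t == K - 1 or a == 1) else r + gamma * np.max(Q[t + 1])
        Q[t, a] += alpha * (target - Q[t, a])
        if a == 1:
            break               # signal spent; episode ends

print(Q)   # under these assumptions, 'trade' is preferred only at the last auction
```

Under these toy assumptions, the learned policy waits at every auction except the last, mirroring the waiting strategy the paper finds evidence for.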

2018 ◽  
Author(s):  
Minryung R. Song ◽  
Sang Wan Lee

Dopamine activity may transition between two patterns: phasic responses to reward-predicting cues and ramping activity arising as an agent approaches the reward. However, when and why dopamine activity transitions between these modes is not understood. We hypothesize that the transition between ramping and phasic patterns reflects resource allocation, which addresses the task dimensionality problem during reinforcement learning (RL). By parsimoniously modifying a standard temporal difference (TD) learning model to accommodate a mixed presentation of both experimental and environmental stimuli, we simulated dopamine transitions and compared them with experimental data from four different studies. The results suggest that dopamine transitions from ramping to phasic patterns as the agent narrows down candidate stimuli for the task; the opposite occurs when the agent needs to re-learn candidate stimuli due to a value change. These results lend insight into how dopamine handles the tradeoff between cognitive resources and task dimensionality during RL.
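As a rough illustration of the standard TD baseline the authors modify, the following sketch runs TD(0) over a chain of states; the environment and parameters are assumptions for illustration only.

```python
# A minimal TD(0) sketch of the standard model the authors build on (the chain
# environment and parameters are illustrative assumptions): an agent traverses
# a chain of states ending in reward, and the learned value function ramps
# toward the reward location -- the kind of profile associated with ramping
# dopamine activity during reward approach.
import numpy as np

n_states, gamma, alpha = 10, 0.9, 0.1
V = np.zeros(n_states + 1)          # V[n_states] is the terminal state

for episode in range(2000):
    for s in range(n_states):
        r = 1.0 if s == n_states - 1 else 0.0
        delta = r + gamma * V[s + 1] - V[s]   # TD prediction error
        V[s] += alpha * delta

print(np.round(V[:n_states], 3))    # values ramp up toward the reward location
```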


2020 ◽  
Author(s):  
Muzhao Jin ◽  
Fearghal Joseph Kearney ◽  
Youwei Li ◽  
Yung Chiang Yang

2016 ◽  
Vol 19 (08) ◽  
pp. 1650055 ◽  
Author(s):  
M. ALESSANDRA CRISAFI ◽  
ANDREA MACRINA

We consider an optimal trading problem over a finite period of time during which an investor has access to both a standard exchange and a dark pool. We take the exchange to be an order-driven market and propose a continuous-time setup for the best bid price and the market spread, both modeled by Lévy processes. Effects on the best bid price arising from the arrival of limit buy orders at more favorable prices, from incoming market sell orders potentially walking the book, and from the cancellation of limit sell orders at the best ask price are incorporated into the proposed price dynamics. A permanent impact, which occurs when ‘lit’ pool trades cannot be avoided, is built in, and an instantaneous impact modeling the slippage to which all lit exchange trades are subject is also considered. We assume that the trading price in the dark pool is the mid-price and that no fees are due for posting orders. We allow for partial trade executions in the dark pool, and we find the optimal trading strategy in both venues. Since the mid-price is taken from the exchange, the dynamics of the limit order book also affect the optimal allocation of shares in the dark pool. We propose a general objective function and show that, subject to suitable technical conditions, the value function can be characterized by the unique continuous viscosity solution to the associated partial integro-differential equation. We present two explicit examples of the price and spread models and derive the associated optimal trading strategy numerically. We discuss various degrees of the agent's risk aversion and further show that round trips are not necessarily beneficial.
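The following toy simulation conveys the flavor of such Lévy-driven best-bid dynamics using a compound-Poisson jump structure; the jump intensities, tick size, and symmetry are assumptions, not the paper's calibrated model.

```python
# An illustrative sketch of Levy-driven best-bid dynamics of the kind described
# above (a compound Poisson process; not the paper's model): upward jumps for
# better-priced limit buy orders arriving, downward jumps for market sell
# orders walking the book or for cancellations affecting the quote.
import numpy as np

rng = np.random.default_rng(1)
T, dt = 1.0, 1e-3
n = int(T / dt)
lam_up, lam_down = 50.0, 50.0        # assumed jump intensities per unit time
tick = 0.01

bid = np.empty(n)
bid[0] = 100.0
for i in range(1, n):
    up = rng.poisson(lam_up * dt)      # improving limit buy orders
    down = rng.poisson(lam_down * dt)  # sell flow / cancellations hitting the bid
    bid[i] = bid[i - 1] + tick * (up - down)

print(f"terminal best bid: {bid[-1]:.2f}")
```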


2020 ◽  
Vol 12 (1) ◽  
pp. 80-90
Author(s):  
Jinhyung Kim ◽  
Andrew G. Christy ◽  
Grace N. Rivera ◽  
Joshua A. Hicks ◽  
Rebecca J. Schlegel

Many people endorse a “true-self-as-guide” (TSAG) lay theory of decision-making, which holds that following one’s true self is an optimal strategy for making decisions. Across five studies (N = 1,320), we test whether perceived use of the true self enhances decision satisfaction. Study 1 provides correlational evidence. Studies 2 and 3 provide experimental evidence that participants felt more satisfied with choices made under TSAG instructions than under alternative strategies. Critically, we argue that perceived use of the true self enhances decision satisfaction regardless of whether consulting the true self actually influences the decision made. Studies 4 and 5 find evidence supporting this perceptual mechanism. This research provides insight into one way people find satisfaction amid life’s uncertainty, extending existing research on the role of true selves in positive functioning.


2016 ◽  
Vol 115 (6) ◽  
pp. 3195-3203 ◽  
Author(s):  
Simon Dunne ◽  
Arun D'Souza ◽  
John P. O'Doherty

A major open question is whether computational strategies thought to be used during experiential learning, specifically model-based and model-free reinforcement learning, also support observational learning. Furthermore, the question of how observational learning occurs when observers must learn about the value of options from observing outcomes in the absence of choice has not been addressed. In the present study we used a multi-armed bandit task that encouraged human participants to employ both experiential and observational learning while they underwent functional magnetic resonance imaging (fMRI). We found evidence for the presence of model-based learning signals during both observational and experiential learning in the intraparietal sulcus. However, unlike during experiential learning, model-free learning signals in the ventral striatum were not detectable during this form of observational learning. These results provide insight into the flexibility of the model-based learning system, implicating this system in learning during observation as well as from direct experience, and further suggest that the model-free reinforcement learning system may be less flexible with regard to its involvement in observational learning.
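To make the experiential/observational distinction concrete, here is a minimal delta-rule learner of the kind whose signals the study tests for; the alternating trial structure and parameters are illustrative assumptions, not the authors' task design or fMRI analysis model.

```python
# A minimal sketch of a model-free learner (illustrative, not the authors'
# analysis model): the same delta-rule update can in principle run on
# experienced outcomes (own choices) or on outcomes merely observed for
# another agent, without choosing.
import numpy as np

rng = np.random.default_rng(2)
p = [0.8, 0.3]                 # assumed reward probabilities of two bandit arms
Q, alpha, eps = np.zeros(2), 0.2, 0.1

for trial in range(200):
    experiential = trial % 2 == 0
    if experiential:
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q))
    else:
        a = rng.integers(2)    # demonstrator's arm; the observer makes no choice
    r = float(rng.random() < p[a])
    delta = r - Q[a]           # model-free reward prediction error
    Q[a] += alpha * delta

print(np.round(Q, 2))          # estimates approach the underlying probabilities
```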


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Rui Zhang ◽  
Hui Xia ◽  
Chao Liu ◽  
Ruo-bing Jiang ◽  
Xiang-guo Cheng

The Internet of Things enables the leap from traditional to intelligent industry. However, it also makes edge devices more vulnerable to attackers while they process perceptual data in real time. To address this problem, we model the interactions between attackers and edge devices as a zero-sum game and propose an anti-attack scheme based on deep reinforcement learning. First, we use the kNN-DTW algorithm to find samples similar to the current sample and apply the weighted moving mean method to calculate the mean and variance of those samples. Second, to solve the overestimation problem, we develop an optimal strategy algorithm to find the optimal strategy for the edge devices. Experimental results show that the new scheme improves the payoff of attacked edge devices and decreases the payoff of attackers, thus forcing the attackers to give up the attack.
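For concreteness, the following sketch pairs a dynamic time warping (DTW) distance with a k-nearest-neighbour search and a weighted moving mean/variance over the matches; the exponential weighting scheme and toy data are assumptions, not the authors' exact pipeline.

```python
# A minimal sketch of the preprocessing step described above (assumptions
# marked; not the authors' exact pipeline): DTW as the distance for a
# k-nearest-neighbour search over stored samples, then a weighted mean and
# variance over the k matches.
import numpy as np

def dtw(a, b):
    # Classic O(len(a) * len(b)) DTW distance between two 1-D series.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_dtw_stats(query, bank, k=3, decay=0.5):
    # Find the k series in `bank` closest to `query` under DTW, then take an
    # exponentially weighted mean and variance of their final values
    # (the weighting scheme here is an assumption).
    dists = np.array([dtw(query, s) for s in bank])
    idx = np.argsort(dists)[:k]
    w = decay ** np.arange(k)
    w /= w.sum()
    vals = np.array([bank[i][-1] for i in idx])
    mean = float(w @ vals)
    var = float(w @ (vals - mean) ** 2)
    return mean, var

rng = np.random.default_rng(3)
bank = [np.sin(np.linspace(0, 6, 50)) + 0.1 * rng.standard_normal(50) for _ in range(20)]
print(knn_dtw_stats(np.sin(np.linspace(0, 6, 50)), bank))
```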


2020 ◽  
Vol 1 ◽  
pp. 6
Author(s):  
Alexandra Vedeler ◽  
Narada Warakagoda

The task of obstacle avoidance for maritime vessels, such as Unmanned Surface Vehicles (USVs), has traditionally been solved using specialized modules that are designed and optimized separately. However, this approach requires deep insight into the environment, the vessel, and their complex dynamics. We propose an alternative method using Imitation Learning (IL), through Deep Reinforcement Learning (RL) and Deep Inverse Reinforcement Learning (IRL), and present a system that learns an end-to-end steering model capable of mapping radar-like images directly to steering actions in an obstacle avoidance scenario. The USV used in this work is equipped with a radar sensor, and we study the problem of generating a single action parameter, heading. We apply an IL algorithm known as generative adversarial imitation learning (GAIL) to develop an end-to-end steering model for a scenario in which avoiding an obstacle is the goal. The performance of the system was studied for different design choices and compared to that of a system based on pure RL. The IL system produces results indicating that it is able to grasp the concept of the task, and that are in many ways on par with those of the RL system. We deem this promising for future use in tasks that are not as easily described by a reward function.
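The core GAIL loop can be sketched as follows; the logistic discriminator, toy features, and random stand-in policy below are heavy simplifications of the deep networks used in the paper, included only to show where the surrogate reward comes from.

```python
# A schematic sketch of GAIL's core loop (heavily simplified; the actual
# system maps radar-like images to headings with deep networks): a
# discriminator D is trained to separate expert (state, action) pairs from
# policy pairs, and -log(1 - D) serves as the surrogate reward for the policy
# update. The logistic discriminator and random 'policy' are placeholders.
import numpy as np

rng = np.random.default_rng(4)
dim = 4                                    # toy (state, action) feature size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(dim)                          # discriminator weights
expert = rng.normal(1.0, 1.0, (256, dim))  # stand-in expert (state, action) features
for it in range(200):
    policy = rng.normal(0.0, 1.0, (256, dim))  # samples from the current policy
    X = np.vstack([expert, policy])
    y = np.concatenate([np.ones(256), np.zeros(256)])   # 1 = expert
    p = sigmoid(X @ w)
    w += 0.05 * X.T @ (y - p) / len(y)     # logistic-regression gradient step

    surrogate_r = -np.log1p(-sigmoid(policy @ w) + 1e-8)  # GAIL reward signal
    # ...a policy-gradient step (e.g. TRPO/PPO) would maximize surrogate_r here.

print(f"mean surrogate reward for policy samples: {surrogate_r.mean():.3f}")
```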


2018 ◽  
Author(s):  
Tobias Morville ◽  
Karl Friston ◽  
Denis Burdakov ◽  
Hartwig R. Siebner ◽  
Oliver J. Hulme

Energy homeostasis depends on behavior to predictively regulate metabolic states within narrow bounds. Here we review three theories of homeostatic control and ask how they provide insight into the circuitry underlying energy homeostasis. We offer two contributions. First, we detail how control theory and reinforcement learning are applied to homeostatic control. We show how these schemes rest on implausible assumptions, whether via circular definitions, unprincipled drive functions, or the neglect of environmental volatility. We argue that active inference can elude these shortcomings while retaining important features of each model. Second, we review the neural basis of energetic control. We focus on a subset of arcuate subpopulations that project directly to, and are thus in a privileged position to opponently modulate, dopaminergic cells as a function of energetic predictions over a spectrum of time horizons. We discuss how this circuit can be interpreted under these theories, and how it can resolve paradoxes that have arisen. We propose that this circuit constitutes a homeostatic-reward interface that underwrites the conjoint optimisation of physiological and behavioural homeostasis.
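One common formalization of the drive-function idea reviewed in this literature (an illustrative sketch, not the authors' own proposal) defines reward as the reduction of a distance from a homeostatic setpoint:

```python
# A minimal sketch of a homeostatic drive function (one common formulation,
# not the authors' proposal): drive is a distance from a setpoint, and reward
# is defined as drive reduction, so actions that move internal state toward
# the setpoint are reinforced.
import numpy as np

setpoint = np.array([50.0])          # e.g. an idealised energy level (assumed)

def drive(h, m=2.0):
    # Drive as an m-norm distance from the setpoint (an assumed form).
    return float(np.sum(np.abs(setpoint - h) ** m) ** (1.0 / m))

h = np.array([40.0])                 # current internal state: below setpoint
h_after_eating = h + 5.0
r = drive(h) - drive(h_after_eating) # reward = reduction in drive
print(f"drive before: {drive(h):.1f}, after: {drive(h_after_eating):.1f}, reward: {r:.1f}")
```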


2014 ◽  
Vol 45 (6) ◽  
pp. 466-478 ◽  
Author(s):  
Robert Schnuerch ◽  
Henning Gibbons

In groups, individuals often adjust their behavior to match the majority’s. Here, we provide a brief introduction to the research on social conformity and review the first, very recent investigations elucidating the underlying neurocognitive mechanisms. Multiple studies suggest that conformity is a behavioral adjustment based on reinforcement-learning mechanisms in the posterior medial frontal cortex and ventral striatum. It has also been suggested that the detection of cognitive inconsistency and the modulation of basic encoding processes are involved. Together, these recent findings provide valuable insight into the neural and cognitive mechanisms underlying social conformity and clearly point to the need for further studies in this field.
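A minimal sketch of the reinforcement-learning account mentioned above: the mismatch between one's own judgment and the group's acts like a prediction error that pulls subsequent ratings toward the majority. The update rule and parameters below are illustrative, not a fitted model from any particular study.

```python
# An illustrative prediction-error account of conformity (not a specific
# study's fitted model): the own-vs-group mismatch acts as an error signal
# that nudges subsequent ratings toward the majority.
own = 6.0                # initial rating of some item (arbitrary scale)
group = 8.0              # majority rating
alpha = 0.3              # assumed adjustment rate

for exposure in range(5):
    error = group - own              # conflict / prediction-error signal
    own += alpha * error             # ratings drift toward the majority
    print(f"after exposure {exposure + 1}: own rating = {own:.2f}")
```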

