QF-TraderNet: Intraday Trading via Deep Reinforcement With Quantum Price Levels Based Profit-And-Loss Control

2021
Vol 4
Author(s):
Yifu Qiu
Yitao Qiu
Yicong Yuan
Zheng Chen
Raymond Lee

Reinforcement Learning (RL) based machine trading attracts considerable interest. However, in existing research, RL for the day-trade task suffers from noisy financial movement at short time scales, difficulty in order settlement, and expensive action search in a continuous-value space. This paper introduces an end-to-end RL intraday trading agent, QF-TraderNet, based on quantum finance theory (QFT) and deep reinforcement learning. We propose a novel design for the intraday RL trader’s action space, inspired by the Quantum Price Levels (QPLs). This action space design also gives the model a learnable profit-and-loss control strategy. QF-TraderNet comprises two neural networks: 1) a long short-term memory (LSTM) network for feature learning on financial time series; 2) a policy generator network (PGN) for generating the distribution of actions. The profitability and robustness of QF-TraderNet have been verified on multi-type financial datasets, including FOREX, metals, crude oil, and financial indices. The experimental results demonstrate that QF-TraderNet outperforms other baselines in terms of cumulative price returns and Sharpe Ratio, as well as in robustness to accidental market shifts.
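The Sharpe Ratio named above as an evaluation metric can be sketched as follows. This is a minimal illustration rather than the paper's evaluation code: the function name `sharpe_ratio`, the zero default risk-free rate, and the sample returns are assumptions, and no annualization factor is applied.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation of excess returns.

    `returns` is a sequence of per-period returns (hypothetical data);
    `risk_free` is the per-period risk-free rate (assumed 0 by default).
    """
    excess = [r - risk_free for r in returns]
    spread = stdev(excess)  # needs at least two observations
    if spread == 0:
        raise ValueError("returns have zero variance; Sharpe ratio is undefined")
    return mean(excess) / spread


# Hypothetical per-period returns of a trading agent.
example_returns = [0.01, 0.03]
example_sharpe = sharpe_ratio(example_returns)
```

A higher value means more return per unit of volatility, which is why the metric is reported alongside cumulative returns rather than in place of them.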

2019
Author(s):
Niclas Ståhl
Göran Falkman
Alexander Karlsson
Gunnar Mathiason
Jonas Boström

In medicinal chemistry programs it is key to design and make compounds that are efficacious and safe. This is a long, complex and difficult multi-parameter optimization process, often including several properties with orthogonal trends. New methods for the automated design of compounds against profiles of multiple properties are thus of great value. Here we present a fragment-based reinforcement learning approach based on an actor-critic model, for the generation of novel molecules with optimal properties. The actor and the critic are both modelled with bidirectional long short-term memory (LSTM) networks. The AI method learns how to generate new compounds with desired properties by starting from an initial set of lead molecules and then improving these by replacing some of their fragments. A balanced binary tree based on the similarity of fragments is used in the generative process to bias the output towards structurally similar molecules. The method is demonstrated by a case study showing that 93% of the generated molecules are chemically valid, and a third satisfy the targeted objectives, while there were none in the initial set.


Author(s):
Yuntao Han
Qibin Zhou
Fuqing Duan

The digital curling game is a two-player zero-sum extensive game in a continuous action space. Several challenging problems are still not solved well, such as the uncertainty of strategy, searching a large game tree, and the need for large amounts of supervised data. In this work, we combine NFSP and KR-UCT for digital curling games, where NFSP uses two adversary learning networks and can automatically produce supervised data, and KR-UCT can be used for large game tree searching in a continuous action space. We propose two reward mechanisms to make reinforcement learning converge quickly. Experimental results validate the proposed method and show that the strategy model can reach the Nash equilibrium.


2021
Vol 11 (3)
pp. 1125
Author(s):
Htet Myet Lynn
Pankoo Kim
Sung Bum Pan

In this report, non-fiducial approaches to electrocardiogram (ECG) biometric authentication are examined, and several techniques are proposed and compared experimentally to identify the best possible approach for all the classification tasks. Non-fiducial methods are designed to extract the discriminative information of a signal without annotating fiducial points. However, this process requires peak detection to identify a heartbeat signal. Because recent studies usually rely on heartbeat segmentation, QRS detection is required, and the process can be complicated for ECG signals in which the QRS complex is absent. Thus, many studies only conduct biometric authentication tasks on ECG signals with QRS complexes, and are hindered by similar limitations. To overcome this issue, we propose a data-independent acquisition method to facilitate highly generalizable signal processing and feature learning processes. This is achieved by enhancing random segmentation to avoid complicated fiducial feature extraction, along with auto-correlation to eliminate the phase difference due to random segmentation. Subsequently, a bidirectional recurrent neural network (RNN) with long short-term memory (BLSTM) is used to automatically learn the features associated with the signal and to perform an authentication task. The experimental results suggest that the proposed data-independent approach using a BLSTM network achieves a relatively high classification accuracy for every dataset relative to the compared techniques. Moreover, it exhibited a significantly higher accuracy rate in experiments using ECG signals without the QRS complex. The results also revealed that data-dependent methods can only perform well for specified data types and amendments of data variations, whereas the presented approach can also be considered for generalization to other quasi-periodical biometric signal-based classification tasks in future studies.
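The idea that auto-correlation removes the phase difference introduced by random segmentation can be illustrated with a small sketch. It is not the authors' pipeline: the synthetic two-harmonic signal, the window length, the start offsets, and the helper `circular_autocorrelation` are all assumptions. The windows span a whole number of periods so that the invariance is exact; for real, quasi-periodic ECG it would only hold approximately.

```python
import math


def circular_autocorrelation(x):
    """Circular autocorrelation: r[lag] = sum_i x[i] * x[(i + lag) mod n]."""
    n = len(x)
    return [sum(x[i] * x[(i + lag) % n] for i in range(n)) for lag in range(n)]


# Synthetic periodic stand-in for a heartbeat-like signal (assumption).
period = 50
signal = [
    math.sin(2 * math.pi * t / period) + 0.5 * math.sin(4 * math.pi * t / period)
    for t in range(4 * period)
]

# Two segments of equal length cut at different (arbitrary) start points.
window = 2 * period
seg_a = signal[0:window]
seg_b = signal[17:17 + window]

# The raw segments differ because they start at different phases...
max_raw_diff = max(abs(a - b) for a, b in zip(seg_a, seg_b))

# ...but their autocorrelations coincide up to floating-point error.
ac_a = circular_autocorrelation(seg_a)
ac_b = circular_autocorrelation(seg_b)
max_ac_diff = max(abs(a - b) for a, b in zip(ac_a, ac_b))
```

Here `max_raw_diff` is clearly nonzero while `max_ac_diff` is negligible, which is the property that lets a downstream network see the same features regardless of where a segment was cut.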


Sensors
2020
Vol 20 (10)
pp. 2789
Author(s):
Hang Qi
Hao Huang
Zhiqun Hu
Xiangming Wen
Zhaoming Lu

To meet the ever-increasing traffic demand of Wireless Local Area Networks (WLANs), channel bonding was introduced in the IEEE 802.11 standards. Although channel bonding effectively increases the transmission rate, the wider channel reduces the number of non-overlapping channels and is more susceptible to interference. Meanwhile, the traffic load differs from one access point (AP) to another and changes significantly depending on the time of day. Therefore, the primary channel and channel bonding bandwidth should be carefully selected to meet traffic demand and guarantee the performance gain. In this paper, we propose an On-Demand Channel Bonding (O-DCB) algorithm based on Deep Reinforcement Learning (DRL) for heterogeneous WLANs, where the APs have different channel bonding capabilities, to reduce transmission delay. In this problem, the state space is continuous and the action space is discrete. However, with single-agent DRL the size of the action space increases exponentially with the number of APs, which severely slows learning. To accelerate learning, Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is used to train O-DCB. Real traffic traces collected from a campus WLAN are used to train and test O-DCB. Simulation results reveal that the proposed algorithm converges well and achieves lower delay than other algorithms.
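The exponential growth of the single-agent action space can be made concrete with a short sketch. The per-AP option counts below are hypothetical (the abstract gives no numbers), and the function names are illustrative only.

```python
import math


def joint_action_space_size(options_per_ap):
    """Single-agent view: one action picks a setting for every AP at once,
    so the action count is the product of the per-AP option counts."""
    return math.prod(options_per_ap)


def multi_agent_action_space_sizes(options_per_ap):
    """Multi-agent view: each AP's agent only chooses among its own options."""
    return list(options_per_ap)


# Hypothetical: each AP chooses among 8 (primary channel, bandwidth) pairs.
for num_aps in (2, 4, 8):
    options = [8] * num_aps
    print(num_aps, joint_action_space_size(options),
          max(multi_agent_action_space_sizes(options)))
```

With 8 options per AP, the joint space is 8 to the power of the number of APs, while each agent in a multi-agent formulation still faces only 8 choices.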


Energies
2021
Vol 14 (11)
pp. 3156
Author(s):
Tanvir Alam Shifat
Rubiya Yasmin
Jang-Wook Hur

An effective remaining useful life (RUL) estimation method is of great concern in industrial machinery to ensure system reliability and reduce the risk of unexpected failures. Anticipation of an electric motor’s future state can improve the yield of a system and warrant the reuse of the industrial asset. In this paper, we present an effective RUL estimation framework for brushless DC (BLDC) motors using third harmonic analysis and output apparent power monitoring. In this work, the mechanical output of the BLDC motor is monitored through a coupled generator. To emphasize the total power generation, we have analyzed the trend of apparent power, which preserves the characteristics of real power and reactive power in an AC power system. A normalized modal current (NMC) is used to extract the current features from the BLDC motor. Fault characteristics of motor current and generator power are fused using a Kalman filter to estimate the RUL. Degradation patterns for the BLDC motor have been monitored for three different scenarios, and for future predictions an attention-layer-optimized bidirectional long short-term memory (ABLSTM) neural network model is trained. The ABLSTM model’s performance is evaluated on several metrics and compared with other state-of-the-art deep learning models.
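Apparent power combines real and reactive power through the standard AC relation S = sqrt(P^2 + Q^2), which is why its trend reflects both. The sketch below only illustrates that relation; the function names and sample values are assumptions and are not taken from the paper's measurements.

```python
import math


def apparent_power(real_power_w, reactive_power_var):
    """Apparent power S (VA) from real power P (W) and reactive power Q (var)."""
    return math.hypot(real_power_w, reactive_power_var)


def power_factor(real_power_w, reactive_power_var):
    """Ratio P / S; undefined when both P and Q are zero."""
    s = apparent_power(real_power_w, reactive_power_var)
    if s == 0:
        raise ValueError("power factor is undefined for zero apparent power")
    return real_power_w / s


# Hypothetical generator reading: P = 3 W, Q = 4 var.
s_va = apparent_power(3.0, 4.0)
pf = power_factor(3.0, 4.0)
```

A change in either P or Q moves S, so a single apparent-power trend can be monitored instead of tracking the two components separately.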


2021
Vol 16 (3)
pp. 54-69
Author(s):
Pier Giuseppe Giribone
Duccio Martelli

An Inflation-Indexed Swap (IIS) is a derivative in which, at every payment date, the counterparties swap an inflation rate for a fixed rate. To calculate the Inflation Leg cash flows, it is necessary to build a mathematical model suitable for projecting the Consumer Price Index (CPI). For this purpose, quants typically start from market quotes for Zero-Coupon swaps in order to derive the future trend of the inflation index, together with a seasonality model that captures the typical periodic effects. In this study, we propose a forecasting model for inflation seasonality based on a Long Short-Term Memory (LSTM) network: a deep learning methodology particularly useful for forecasting purposes. The CPI predictions are conducted using a FinTech paradigm, while respecting the traditional quantitative finance theory developed in this research field. The paper is structured as follows: the first two sections illustrate the pricing methodologies for the most popular IIS, the Zero Coupon Inflation-Indexed Swap (ZCIIS) and the Year-on-Year Inflation-Indexed Swap (YYIIS); section 3 deals with the traditional standard method for forecasting CPI values (trend + seasonality); section 4 describes the LSTM architecture; and section 5 focuses on CPI projections, also called inflation bootstrap. Section 6 then describes a robustness check, implementing a traditional SARIMA model in order to improve the interpretation of the LSTM outputs; finally, section 7 concludes with a real market case, where the two methodologies are used to compute the fair value of a YYIIS and the model risk is quantified.


Author(s):
Qingyuan Zheng
Duo Wang
Zhang Chen
Yiyong Sun
Bin Liang

Single-track two-wheeled robots have become an important research topic in recent years, owing to their simple structure, energy savings and ability to run on narrow roads. However, the ramp jump remains a challenging task. In this study, we propose a way to realize a ramp jump with a single-track two-wheeled robot. We present a control method that employs continuous-action reinforcement learning techniques for single-track two-wheeled robot control. We design a novel reward function for reinforcement learning, optimize the dimensions of the action space, and enable training under the deep deterministic policy gradient algorithm. Finally, we validate the control method through simulation experiments and successfully realize the single-track two-wheeled robot ramp jump task. Simulation results show that the control method is effective and has several advantages over control with a high-dimensional action space, reinforcement learning control with a sparse reward function, and discrete-action reinforcement learning control.


2019
Vol 1 (2)
pp. 74-84
Author(s):
Evan Kusuma Susanto
Yosi Kristian

Asynchronous Advantage Actor-Critic (A3C) is a deep reinforcement learning algorithm developed by Google DeepMind. The algorithm can be used to build an artificial intelligence architecture that masters a variety of different games through trial and error, learning from the game's screen display and the score obtained from its actions, without human intervention. An A3C network consists of a Convolutional Neural Network (CNN) at the front, a Long Short-Term Memory (LSTM) network in the middle, and an Actor-Critic network at the back. The CNN summarizes the screen output image by extracting the important features present on the screen. The LSTM remembers previous game states. The Actor-Critic network determines the best action to take when faced with a given situation. In the experiments conducted, the method proved fairly effective and was able to beat beginner players in the 5 games used for testing.


2003
Vol 7 (1)
pp. 29-48
Author(s):
Riccardo Biondini
Yan-Xia Lin
Michael Mccrae

The study of long-run equilibrium processes is a significant component of economic and finance theory. The Johansen technique for identifying the existence of such long-run stationary equilibrium conditions among financial time series allows the identification of all potential linearly independent cointegrating vectors within a given system of eligible financial time series. The practical application of the technique may be restricted, however, by the pre-condition that the underlying data generating process fits a finite-order vector autoregression (VAR) model with white noise. This paper studies an alternative method for determining cointegrating relationships without such a pre-condition. The method is simple to implement through commonly available statistical packages. This ‘residual-based cointegration’ (RBC) technique uses the relationship between cointegration and univariate Box-Jenkins ARIMA models to identify cointegrating vectors through the rank of the covariance matrix of the residual processes which result from the fitting of univariate ARIMA models. The RBC approach for identifying multivariate cointegrating vectors is explained and then demonstrated through simulated examples. The RBC and Johansen techniques are then both implemented using several real-life financial time series.

