Return Prediction Based on Discriminating market-styles with Reinforcement Learning

For hedge funds, return prediction has always been a fundamental and important problem. Usually, a good return prediction model directly determines the performance of a quantitative investment strategy. However, the performance of the model will be influenced by the market-style. Even the models trained through the same data set, their performance is different in different market-styles. Traditional methods hope to train a universal linear or nonlinear model on the data set to cope with different market-styles. However, the linear model has limited fitting ability and is insufficient to deal with hundreds of features in the hedge fund features pool. The nonlinear model has a risk to be over-fitting. Simultaneously, changes in market-style will make certain features valid or invalid, and a traditional linear or nonlinear model is not sufficient to deal with this situation. This thesis proposes a method based on Reinforcement Learning that automatically discriminates market-styles and automatically selects the model that best ﬁts the current market-style from sub-models pre-trained with different categories of features to predict the return of stocks. Compared with the traditional method that training return prediction model directly through the full data sets, the experiment shows that the proposed method has a better performance, which has a higher Sharpe ratio and annualized return.

Download Full-text

On The Persistence Of Selectivity And Market Timing Skills In Hedge Funds

International Business & Economics Research Journal (IBER) ◽

10.19030/iber.v12i12.8251 ◽

2013 ◽

Vol 12 (12) ◽

pp. 1575

Author(s):

John Muteba Mwamba

Keyword(s):

Hedge Funds ◽

Market Timing ◽

Hedge Fund ◽

Efficient Market Hypothesis ◽

Investment Strategy ◽

Chi Square ◽

Cross Sectional ◽

Data Set ◽

Fund Managers ◽

Limited Arbitrage

This paper investigates the persistence of hedge fund managers skills during periods of boom and/or recession. We consider a data set of monthly investment strategy indices published by Hedge Fund Research group. The data set spans from January 1995 to June 2010. We divide this sample period into four overlapping sub-sample periods that contain different economic cycles. We define a skilled manager as a manager who can outperform the market consistently during two consecutive sub-sample periods. We first estimate outperformance, selectivity and market timing skills using both linear and quadratic Capital Asset Pricing Model-CAPM. Persistence in performance is carried out in three different fashions: contingence table, chi-square test and cross-sectional auto-regression technique. The results show that fund managers have the skills to outperform the market during periods of positive economic growth only. This market outperformance is due to both selectivity and market timing skills. These results contradict the Efficient Market Hypothesis-EMH due to limited arbitrage opportunity.

Download Full-text

HEDGE FUNDS: RISK AND PERFORMANCE

Journal of Financial Management Markets and Institutions ◽

10.1142/s2591768418500034 ◽

2018 ◽

Vol 06 (01) ◽

pp. 1850003

Author(s):

SANGHEON SHIN ◽

JAN SMOLARSKI ◽

GÖKÇE SOYDEMIR

Keyword(s):

Risk Factors ◽

Hedge Funds ◽

Factor Model ◽

Explanatory Power ◽

Hedge Fund ◽

Investment Strategy ◽

Time Varying ◽

Asset Class ◽

And Performance ◽

Fund Portfolio

This paper models hedge fund exposure to risk factors and examines time-varying performance of hedge funds. From existing models such as asset-based style (ABS)-factor model, standard asset class (SAC)-factor model, and four-factor model, we extract the best six factors for each hedge fund portfolio by investment strategy. Then, we find combinations of risk factors that explain most of the variance in performance of each hedge fund portfolio based on investment strategy. The results show instability of coefficients in the performance attribution regression. Incorporating a time-varying factor exposure feature would be the best way to measure hedge fund performance. Furthermore, the optimal models with fewer factors exhibit greater explanatory power than existing models. Using rolling regressions, our customized investment strategy model shows how hedge funds are sensitive to risk factors according to market conditions.

Download Full-text

Long-Term Rainfall Forecast Model Based on The TabNet and LightGbm Algorithm

10.21203/rs.3.rs-107107/v1 ◽

2020 ◽

Author(s):

Tianyu Xu ◽

Yongchuan Yu ◽

Jianzhuo Yan ◽

Hongxia Xu

Keyword(s):

Prediction Model ◽

Feature Fusion ◽

Forecast Model ◽

Data Sets ◽

Good Prediction ◽

Data Set ◽

Rainfall Prediction ◽

Improve Model ◽

Probability Prediction

Abstract Due to the problems of unbalanced data sets and distribution differences in long-term rainfall prediction, the current rainfall prediction model had poor generalization performance and could not achieve good prediction results in real scenarios. This study uses multiple atmospheric parameters (such as temperature, humidity, atmospheric pressure, etc.) to establish a TabNet-LightGbm rainfall probability prediction model. This research uses feature engineering (such as generating descriptive statistical features, feature fusion) to improve model accuracy, Borderline Smote algorithm to improve data set imbalance, and confrontation verification to improve distribution differences. The experiment uses 5 years of precipitation data from 26 stations in the Beijing-Tianjin-Hebei region of China to verify the proposed rainfall prediction model. The test set is to predict the rainfall of each station in one month. The experimental results shows that the model has good performance with AUC larger than 92%. The method proposed in this study further improves the accuracy of rainfall prediction, and provides a reference for data mining tasks.

Download Full-text

Dr Foster global frailty score: an international retrospective observational study developing and validating a risk prediction model for hospitalised older persons from administrative data sets

BMJ Open ◽

10.1136/bmjopen-2018-026759 ◽

2019 ◽

Vol 9 (6) ◽

pp. e026759 ◽

Cited By ~ 2

Author(s):

John T Y Soong ◽

Jurgita Kaubryte ◽

Danny Liew ◽

Carol Jane Peden ◽

Alex Bottle ◽

...

Keyword(s):

Prediction Model ◽

Risk Prediction ◽

Administrative Data ◽

Older Persons ◽

Cohort Analysis ◽

Risk Prediction Model ◽

Data Sets ◽

Predictive Capacity ◽

Data Set ◽

Frailty Score

ObjectivesThis study aimed to examine the prevalence of frailty coding within the Dr Foster Global Comparators (GC) international database. We then aimed to develop and validate a risk prediction model, based on frailty syndromes, for key outcomes using the GC data set.DesignA retrospective cohort analysis of data from patients over 75 years of age from the GC international administrative data. A risk prediction model was developed from the initial analysis based on seven frailty syndrome groups and their relationship to outcome metrics. A weighting was then created for each syndrome group and summated to create the Dr Foster Global Frailty Score. Performance of the score for predictive capacity was compared with an established prognostic comorbidity model (Elixhauser) and tested on another administrative database Hospital Episode Statistics (2011-2015), for external validation.Setting34 hospitals from nine countries across Europe, Australia, the UK and USA.ResultsOf 6.7 million patient records in the GC database, 1.4 million (20%) were from patients aged 75 years or more. There was marked variation in coding of frailty syndromes between countries and hospitals. Frailty syndromes were coded in 2% to 24% of patient spells. Falls and fractures was the most common syndrome coded (24%). The Dr Foster Global Frailty Score was significantly associated with in-hospital mortality, 30-day non-elective readmission and long length of hospital stay. The score had significant predictive capacity beyond that of other known predictors of poor outcome in older persons, such as comorbidity and chronological age. The score’s predictive capacity was higher in the elective group compared with non-elective, and may reflect improved performance in lower acuity states.ConclusionsFrailty syndromes can be coded in international secondary care administrative data sets. The Dr Foster Global Frailty Score significantly predicts key outcomes. This methodology may be feasibly utilised for case-mix adjustment for older persons internationally.

Download Full-text

An Accurate Substitution Method To Minimize Left Censoring Bias in Serum Steroid Measurements

Endocrinology ◽

10.1210/en.2019-00340 ◽

2019 ◽

Vol 160 (10) ◽

pp. 2395-2400 ◽

Cited By ~ 5

Author(s):

David J Handelsman ◽

Lam P Ly

Keyword(s):

Data Analysis ◽

Ad Hoc ◽

Likelihood Estimation ◽

Large Data ◽

Serum Testosterone ◽

Accurate Method ◽

Estimation Methods ◽

Data Sets ◽

Full Data ◽

Data Set

Abstract Hormone assay results below the assay detection limit (DL) can introduce bias into quantitative analysis. Although complex maximum likelihood estimation methods exist, they are not widely used, whereas simple substitution methods are often used ad hoc to replace the undetectable (UD) results with numeric values to facilitate data analysis with the full data set. However, the bias of substitution methods for steroid measurements is not reported. Using a large data set (n = 2896) of serum testosterone (T), DHT, estradiol (E2) concentrations from healthy men, we created modified data sets with increasing proportions of UD samples (≤40%) to which we applied five different substitution methods (deleting UD samples as missing and substituting UD sample with DL, DL/√2, DL/2, or 0) to calculate univariate descriptive statistics (mean, SD) or bivariate correlations. For all three steroids and for univariate as well as bivariate statistics, bias increased progressively with increasing proportion of UD samples. Bias was worst when UD samples were deleted or substituted with 0 and least when UD samples were substituted with DL/√2, whereas the other methods (DL or DL/2) displayed intermediate bias. Similar findings were replicated in randomly drawn small subsets of 25, 50, and 100. Hence, we propose that in steroid hormone data with ≤40% UD samples, substituting UD with DL/√2 is a simple, versatile, and reasonably accurate method to minimize left censoring bias, allowing for data analysis with the full data set.

Download Full-text

Learning to rank with click-through features in a reinforcement learning framework

International Journal of Web Information Systems ◽

10.1108/ijwis-12-2015-0046 ◽

2016 ◽

Vol 12 (4) ◽

pp. 448-476 ◽

Cited By ~ 2

Author(s):

Amir Hosein Keyhanipour ◽

Behzad Moshiri ◽

Maryam Piroozmand ◽

Farhad Oroumchian ◽

Ali Moeini

Keyword(s):

Reinforcement Learning ◽

Learning To Rank ◽

Training Data ◽

High Dimensionality ◽

Compact Representation ◽

Second Phase ◽

Data Sets ◽

Data Set ◽

Content Type ◽

Benchmark Data

Purpose Learning to rank algorithms inherently faces many challenges. The most important challenges could be listed as high-dimensionality of the training data, the dynamic nature of Web information resources and lack of click-through data. High dimensionality of the training data affects effectiveness and efficiency of learning algorithms. Besides, most of learning to rank benchmark datasets do not include click-through data as a very rich source of information about the search behavior of users while dealing with the ranked lists of search results. To deal with these limitations, this paper aims to introduce a novel learning to rank algorithm by using a set of complex click-through features in a reinforcement learning (RL) model. These features are calculated from the existing click-through information in the data set or even from data sets without any explicit click-through information. Design/methodology/approach The proposed ranking algorithm (QRC-Rank) applies RL techniques on a set of calculated click-through features. QRC-Rank is as a two-steps process. In the first step, Transformation phase, a compact benchmark data set is created which contains a set of click-through features. These feature are calculated from the original click-through information available in the data set and constitute a compact representation of click-through information. To find most effective click-through feature, a number of scenarios are investigated. The second phase is Model-Generation, in which a RL model is built to rank the documents. This model is created by applying temporal difference learning methods such as Q-Learning and SARSA. Findings The proposed learning to rank method, QRC-rank, is evaluated on WCL2R and LETOR4.0 data sets. Experimental results demonstrate that QRC-Rank outperforms the state-of-the-art learning to rank methods such as SVMRank, RankBoost, ListNet and AdaRank based on the precision and normalized discount cumulative gain evaluation criteria. The use of the click-through features calculated from the training data set is a major contributor to the performance of the system. Originality/value In this paper, we have demonstrated the viability of the proposed features that provide a compact representation for the click through data in a learning to rank application. These compact click-through features are calculated from the original features of the learning to rank benchmark data set. In addition, a Markov Decision Process model is proposed for the learning to rank problem using RL, including the sets of states, actions, rewarding strategy and the transition function.

Download Full-text

Current Topics in Avian Conservation Genetics with Special Reference to the Southwestern Willow Flycatcher

The Open Ornithology Journal ◽

10.2174/1874453201609010060 ◽

2016 ◽

Vol 9 (1) ◽

pp. 60-69

Author(s):

Robert M. Zink

Keyword(s):

Analytical Methods ◽

Scientific Method ◽

Morphological Data ◽

Data Sets ◽

Avian Conservation ◽

Full Data ◽

Data Set ◽

Willow Flycatcher ◽

Existing Data ◽

Conservation Decisions

It is sometimes said that scientists are entitled to their own opinions but not their own set of facts. This suggests that application of the scientific method ought to lead to a single conclusion from a given set of data. However, sometimes scientists have conflicting opinions about which analytical methods are most appropriate or which subsets of existing data are most relevant, resulting in different conclusions. Thus, scientists might actually lay claim to different sets of facts. However, if a contrary conclusion is reached by selecting a subset of data, this conclusion should be carefully scrutinized to determine whether consideration of the full data set leads to different conclusions. This is important because conservation agencies are required to consider all of the best available data and make a decision based on them. Therefore, exploring reasons why different conclusions are reached from the same body of data has relevance for management of species. The purpose of this paper was to explore how two groups of researchers can examine the same data and reach opposite conclusions in the case of the taxonomy of the endangered subspecies Southwestern Willow Flycatcher (Empidonax traillii extimus). It was shown that use of subsets of data and characters rather than reliance on entire data sets can explain conflicting conclusions. It was recommend that agencies tasked with making conservation decisions rely on analyses that include all relevant molecular, ecological, behavioral, and morphological data, which in this case show that the subspecies is not valid, and hence its listing is likely not warranted.

Download Full-text

Activist Hedge Funds

10.1093/oso/9780190607371.003.0007 ◽

2017 ◽

Author(s):

Tony Calenda ◽

Christopher Milliken ◽

Andrew C. Spieler

Keyword(s):

Hedge Funds ◽

Hedge Fund ◽

Investment Strategy ◽

Professional Literature ◽

Public Companies ◽

Long Run ◽

Public Company ◽

Corporate Raiders ◽

Proxy Fights ◽

Policy Discussion

Activist hedge funds (AHFs), a relatively new alternative investment strategy, have had a large and growing impact on investing and on how public companies are managed. Although activist investing was once the province of corporate raiders, it is now an accepted hedge fund strategy. Often acquiring an influential stake in an undervalued public company before direct intervention, AHFs create their own catalyst for share appreciation. The actions or interventions taken by an AHF can range from direct communication with a board or management team to launching highly visible proxy fights or legal action. Through a review of academic and professional literature, this chapter offers a look into the relevant public policy discussion, implications for target companies in the short and long run, and the techniques AHFs commonly deploy.

Download Full-text

Methodology for eliminating imbalance of image data sets

Bulletin of the National Technical University KhPI A series of Information and Modeling ◽

10.20998/2411-0558.2021.02.04 ◽

2021 ◽

Vol 1 (2 (6)) ◽

Author(s):

Tatyana Biloborodova ◽

Inna Skarga-Bandurova ◽

Mark Koverga

Keyword(s):

Feature Extraction ◽

Reinforcement Learning ◽

Key Words ◽

Class Imbalance ◽

Image Data ◽

Unbalanced Data ◽

Data Sets ◽

Learning Technology ◽

Data Set ◽

Image Fragment

The methodology of solving the problem of eliminating class imbalance in image data sets is presented. The proposed methodology includes the stages of image fragment extraction, fragment augmentation, feature extraction, duplication of minority objects, and is based on reinforcement learning technology. The degree of imbalance indicator was used as a measure to determine the imbalance of the data set. An experiment was performed using a set of images of the faces of patients with skin rashes, annotated according to the severity of acne. The main steps of the methodology implementation are considered. The results of the classification showed the feasibility of applying the proposed methodology. The accuracy of classification on test data was 85%, which is 5% higher than the result obtained without the use of the proposed methodology. Key words: class imbalance, unbalanced data set, image fragment extraction, augmentation.

Download Full-text

Efficiency of HLA-A, -B, -C, -DRB1 High-Resolution Typings of Newly Recruited Potential Stem Cell Donors.

Blood ◽

10.1182/blood.v108.11.5415.5415 ◽

2006 ◽

Vol 108 (11) ◽

pp. 5415-5415 ◽

Cited By ~ 3

Author(s):

Alexander H. Schmidt ◽

Andrea Stahr ◽

Daniel Baier ◽

Gerhard Ehninger ◽

Claudia Rutt

Keyword(s):

Stem Cell ◽

High Resolution ◽

Final Decision ◽

Data Sets ◽

Donor Group ◽

Donor Center ◽

Full Data ◽

Data Set ◽

Group A ◽

Group B

Abstract In strategic stem cell donor registry planning, it is of special importance to decide how to type newly registered donors. This question refers to both the selection of HLA loci and the resolution (low, intermediate, or high) of HLA typings. In principle, high-resolution typings of all transplant-relevant loci are preferable. However, cost considerations generally lead to incomplete typings (only selected HLA loci with low or intermediate typing resolution) in practice. Here, we present results of a project in which newly recruited donors are typed for the HLA-A, -B, -C, and -DRB1 loci with high resolution by sequencing. Efficiency of these typings is measured by subsequent requests for confirmatory typings (CTs) and stem cell donations. Results for donors who were included in the project (Donor Group A) are compared to requests for donors with other, less complete typing levels: HLA-A and HLA-B at intermediate resolution, HLA-DRB1 at high resolution (Group B); HLA-A, -B, -C, and -DRB1 at intermediate resolution (Group C); HLA-A, -B, and -DRB1 at intermediate resolution (Group D). All data are taken from the donor file of DKMS German Bone Marrow Donor Center. Since the four groups differ considerably regarding their age and sex distributions, calculations are also carried through for restricted data sets that include only male donors up to age 25. Results are shown in Table 1. Donors of Groups A and B have similar CT request frequencies of 5.90 and 5.92 requests per 100 donors per year in the resctricted data sets, respectively. These frequencies significantly exceed the corresponding frequencies of the other groups with less complete typing levels. For donation requests, the frequency is signifcantly higher for Group A than for Group B (restricted data sets): 1.45 vs 1.02 requests per donor per year (p<0.05). Obviously, the additional HLA information for Group A donors leads to a higher ratio between donations and CT requests. Again, figures are much lower for Groups C and D. These results are based on a high number of requests even for the restricted data sets, namely between 44 and 90 donation requests and between 227 and 619 CT requests per group. Our results show that full (HLA-A, -B, -C, and -DRB1) high-resolution typings at donor recruitment lead to significantly higher probabilities for donation requests. Donor centers and registries should carefully take into account these higher probabilities when they consider full high-resolution typings for newly recruited donors. However, the final decision regarding the typing strategy at recruitment must also depend on the individual cost structure of a donor center or registry. The presented results are based on a donor file that consists mainly (≈99%) of Caucasian donors. It should be subject to further analyses if these results also apply to other, more heterogeneous donor pools. Table 1: Requests per 100 donors per year by donor group CT requests Donation requests Donor Group Full data set Only male donors≤ 25 Full data set Only male donors≤ 25 A 5.14 5.90 1.45 1.45 B 4.60 5.92 0.84 1.02 C 2.50 3.03 0.58 0.67 D 2.36 2.80 0.38 0.48

Download Full-text