Stock Selection Strategy Based on Support Vector Machine and eXtreme Gradient Boosting Methods

Author(s):  
Haoyue Liu
2017 ◽  
Vol 25 (3) ◽  
pp. 321-330 ◽  
Author(s):  
Shang Gao ◽  
Michael T Young ◽  
John X Qiu ◽  
Hong-Jun Yoon ◽  
James B Christian ◽  
...  

Abstract. Objective: We explored how a deep learning (DL) approach based on hierarchical attention networks (HANs) can improve model performance for multiple information extraction tasks from unstructured cancer pathology reports, compared to conventional methods that do not sufficiently capture syntactic and semantic contexts from free-text documents. Materials and Methods: Data for our analyses were obtained from 942 deidentified pathology reports collected by the National Cancer Institute Surveillance, Epidemiology, and End Results program. The HAN was implemented for 2 information extraction tasks: (1) primary site, matched to 12 International Classification of Diseases for Oncology topography codes (7 breast, 5 lung primary sites), and (2) histological grade classification, matched to G1–G4. Model performance metrics were compared to conventional machine learning (ML) approaches, including naive Bayes, logistic regression, support vector machine, random forest, and extreme gradient boosting, and to other DL models, including a recurrent neural network (RNN), a recurrent neural network with attention (RNN w/A), and a convolutional neural network. Results: For both information extraction tasks, the HAN performed significantly better than the conventional ML and DL techniques. In particular, across the 2 tasks, the mean micro and macro F-scores for the HAN with pretraining were (0.852, 0.708), compared to naive Bayes (0.518, 0.213), logistic regression (0.682, 0.453), support vector machine (0.634, 0.434), random forest (0.698, 0.508), extreme gradient boosting (0.696, 0.522), RNN (0.505, 0.301), RNN w/A (0.637, 0.471), and convolutional neural network (0.714, 0.460). Conclusions: HAN-based DL models show promise in information abstraction tasks within unstructured clinical pathology reports.
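Every model in the abstract above scores lower on the macro F-score than on the micro F-score. The following minimal sketch shows one way such a gap can arise: micro-averaging pools counts over all samples, while macro-averaging weights every class equally, so a missed rare class pulls the macro score down. The function name `f_scores` and the toy grade labels are illustrative assumptions, not data or code from the paper.

```python
from collections import Counter

def f_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro

# Hypothetical toy labels: the rare class "G2" is never predicted.
y_true = ["G1", "G1", "G1", "G1", "G2", "G3"]
y_pred = ["G1", "G1", "G1", "G1", "G1", "G3"]
micro, macro = f_scores(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # micro = 5/6, macro = 17/27
```

In this toy case the single missed "G2" sample costs one sixth of the micro score but zeroes out one third of the macro average.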


2018 ◽  
Vol 10 (5) ◽  
pp. 9 ◽  
Author(s):  
Ru Zhang ◽  
Zi-ang Lin ◽  
Shaozhen Chen ◽  
Zhixuan Lin ◽  
Xingwei Liang

In recent years, the combination of machine learning methods with traditional financial investment has become a hotspot in academia and industry. This paper takes CSI 300 and CSI 500 stocks as the research objects. First, it carries out kernel function tests and parameter optimization for the kernel support vector machine system, and then predicts and optimizes the combination of a market-neutral stock selection strategy and a stock right strategy. The experimental results show that the multi-factor model based on SVM has strong predictive power for stock selection, and that this predictive power differs across kernel functions.
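To illustrate why the choice of kernel function can change predictive power, here is a small dependency-free sketch. It is not the authors' method: a kernel perceptron is swapped in as a simpler stand-in for a kernel SVM, and the XOR-style data, the `gamma` value, and all function names are assumptions made for illustration only.

```python
import math

def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))

def rbf_kernel(a, b, gamma=1.0):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def train_kernel_perceptron(X, y, kernel, epochs=20):
    """Return dual coefficients (per-sample mistake counts)."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        for i, xi in enumerate(X):
            score = sum(a * yj * kernel(xj, xi)
                        for a, yj, xj in zip(alpha, y, X))
            if y[i] * score <= 0:  # misclassified (or on the boundary)
                alpha[i] += 1
    return alpha

def predict(X, y, alpha, kernel, x):
    score = sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X))
    return 1 if score > 0 else -1

def accuracy(X, y, kernel):
    """Training accuracy of a kernel perceptron fitted with the given kernel."""
    alpha = train_kernel_perceptron(X, y, kernel)
    hits = sum(predict(X, y, alpha, kernel, xi) == yi for xi, yi in zip(X, y))
    return hits / len(X)

# Toy XOR-style data: not separable by a linear decision function.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [-1, 1, 1, -1]

acc_linear = accuracy(X, y, linear_kernel)  # stays below 1.0
acc_rbf = accuracy(X, y, rbf_kernel)        # reaches 1.0 on this toy set
print(acc_linear, acc_rbf)
```

A real kernel test on stock factors would compare kernels on held-out data rather than on training accuracy; the sketch only shows that the same learner can behave very differently under different kernels.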


2021 ◽  
Vol 12 (2) ◽  
pp. 28-55
Author(s):  
Fabiano Rodrigues ◽  
Francisco Aparecido Rodrigues ◽  
Thelma Valéria Rocha Rodrigues

This study analyzes results obtained with machine learning models for predicting startup success. As a proxy for success, it takes the investor's perspective, in which the acquisition of the startup or an IPO (Initial Public Offering) is a way to recover the investment. The literature review covers startups and financing vehicles, previous studies on predicting startup success with machine learning models, and trade-offs between machine learning techniques. In the empirical part, a quantitative study was carried out based on secondary data from the American platform Crunchbase, covering startups from 171 countries. The research design filtered for startups founded between June 2010 and June 2015 and used a prediction window from June 2015 to June 2020 to predict startup success. After data preprocessing, the sample comprised 18,571 startups. Six binary classification models were used for prediction: Logistic Regression, Decision Tree, Random Forest, Extreme Gradient Boosting, Support Vector Machine, and Neural Network. In the end, the Random Forest and Extreme Gradient Boosting models showed the best performance on the classification task. This article, involving machine learning and startups, contributes to hybrid research areas by merging the fields of Management and Data Science. It also gives investors a tool for the initial screening of startups in the search for targets with a higher probability of success.


2018 ◽  
Vol 7 (5) ◽  
pp. 9
Author(s):  
Ru Zhang ◽  
Zi-ang Lin ◽  
Shaozhen Chen ◽  
Min Zhao ◽  
Mingjie Yuan

In recent years, the application of machine learning techniques to improve traditional financial investment models has gained widespread attention from academia and the financial industry. This paper takes CSI300 stocks as its research object, uses Adaboost to enhance the classification ability of the original linear support vector machine, and combines all major factors to build an Adaboost-SVM multi-factor stock selection model. In the backtesting analysis, the stock selection strategy of the original linear support vector machine is compared with the Adaboost-SVM multi-factor stock selection strategy. The results show that the Adaboost-SVM strategy delivers stronger profitability and smaller income fluctuation than the original algorithm model.
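The boosting idea in the abstract above can be sketched as follows. This is a hedged illustration, not the paper's model: one-feature decision stumps are swapped in for the linear SVM weak learner to keep the example dependency-free, and the toy data and function names are assumptions.

```python
import math

def stump_predict(x, feature, thr, sign):
    return sign if x[feature] > thr else -sign

def best_stump(X, y, w):
    """Pick the stump (feature, threshold, sign) with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                err = sum(wi for wi, x, yi in zip(w, X, y)
                          if stump_predict(x, f, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best

def adaboost(X, y, rounds):
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, f, thr, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, sign))
        # Up-weight misclassified samples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(x, f, thr, sign))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(x, f, t, s) for a, f, t, s in ensemble)
    return 1 if score >= 0 else -1

def accuracy(ensemble, X, y):
    return sum(predict(ensemble, x) == yi for x, yi in zip(X, y)) / len(X)

# Toy 1-D data that no single stump can classify perfectly.
X = [(float(i),) for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

acc_single = accuracy(adaboost(X, y, rounds=1), X, y)   # 0.7
acc_boosted = accuracy(adaboost(X, y, rounds=3), X, y)  # 1.0
print(acc_single, acc_boosted)
```

Each round refits the weak learner on reweighted samples, so later learners concentrate on the cases earlier ones got wrong. With an SVM as the weak learner, the weights would be passed to the SVM's training step in place of the stump search.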


2021 ◽  
Vol 4 (2(112)) ◽  
pp. 58-72
Author(s):  
Chingiz Kenshimov ◽  
Zholdas Buribayev ◽  
Yedilkhan Amirgaliyev ◽  
Aisulyu Ataniyazova ◽  
Askhat Aitimov

In the course of this research, the American, Russian and Turkish sign languages were analyzed, and a program for recognizing the Kazakh dactylic sign language using machine learning methods was implemented. A dataset of 5000 images was formed for each gesture, and gesture recognition algorithms (Random Forest, Support Vector Machine, Extreme Gradient Boosting) were applied; two data types were combined into one database, which changed the architecture of the system as a whole. The quality of the algorithms was also evaluated. The work was motivated by the fact that existing research on recognition systems for the Kazakh dactyl sign language is insufficient for a complete representation of the language. The Kazakh language has specific letters, and because of the peculiarities of its spelling, problems arise when developing recognition systems for the Kazakh sign language. The results showed that the Support Vector Machine and Extreme Gradient Boosting algorithms are superior in real-time performance, while the Random Forest algorithm has the highest recognition accuracy: 98.86 % for Random Forest, 98.68 % for Support Vector Machine and 98.54 % for Extreme Gradient Boosting. The classical algorithms also scored highly on the quality evaluation. The practical significance of this work lies in the fact that gesture recognition with the updated alphabet of the Kazakh language has not previously been studied, so the results can be used by other researchers for further work on recognizing the Kazakh dactyl sign language, as well as by researchers engaged in the development of the international sign language.

