Performance Evaluation of Machine Learning Algorithms for Stock Price and Stock Index Movement Prediction Using Trend Deterministic Data Prediction

2022 ◽  
Vol 13 (1) ◽  
pp. 0-0

This experimental study addresses the problem of predicting the direction of stock prices and the movement of stock price indices for three major stocks and stock indices. The proposed approach for processing input data involves the computation of ten technical indicators using stock trading data. The dataset used for the evaluation of all the prediction models consists of 11 years of historical data from January 2007 to December 2017. The study comprises four prediction models: Long Short-Term Memory, XGBoost, Support Vector Machine, and Random Forest. Accuracy scores and F1 scores for each of the prediction models have been evaluated using this input approach. Experimental results reveal that a continuous data approach using ten technical indicators gives the best performance in the case of the Random Forest classifier model, with the highest accuracy of 84.89% (83.74% on average) and the highest F1 score of 89.33% (83.74% on average). The experiments also give insight into why a Naïve Bayes classification model is not a suitable prediction model for the above task.
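The abstract reports accuracy and F1 scores without showing how they are computed. As a minimal sketch, assuming binary direction labels (1 for an upward move, 0 for a downward move) and using made-up label lists rather than data from the study, the two metrics can be derived from the confusion counts as follows. The function name `accuracy_and_f1` is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; `positive` marks the 'up' class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Illustrative labels only (assumption, not taken from the study).
actual = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(actual, predicted)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

Reporting F1 alongside accuracy matters when upward and downward days are not equally frequent, since accuracy alone can look good for a model that mostly predicts the majority class.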

2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate the analysis and prediction of time-dependent data. We focus our attention on four different stocks selected from the Yahoo Finance historical database. To build models and predict the future stock price, we consider three different machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN) and Support Vector Regression (SVR). By treating close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in the machine learning methods, it can be shown that the prediction accuracy is improved.


2019 ◽  
Vol 8 (2) ◽  
pp. 3186-3193

Stock price prediction has long been a focal point of analytical activity in the financial domain for both researchers and investors. Accurate prediction is essential for improved investment decisions that carry minimal risk. For this reason, the majority of investors depend on intelligent trading systems that generate better forecasting results. As forecasting stock market prices with high accuracy is quite a challenging task for analysts, machine learning has been adopted as one of the popular techniques for predicting future trends. Although there are many recognized time series analysis techniques, categorized either as conventional statistical techniques or as soft computing techniques such as fuzzy logic, artificial neural networks and genetic algorithms, researchers have been looking for more appropriate techniques that can deliver improved results. In this paper, we developed different hybrid machine learning based prediction models and compared their efficiency. Dimension reduction techniques such as orthogonal forward selection (OFS) and kernel principal component analysis (KPCA) are used separately with support vector regression (SVR) and teaching learning based optimization (TLBO) to predict the stock price of Tata Steel. The performance of both proposed approaches is evaluated with 4143 days of daily transactional data on Tata Steel's stock price, collected from the Bombay Stock Exchange (BSE). We compared the results of the OFS-SVR-TLBO and KPCA-SVR-TLBO hybrid models and conclude that incorporating KPCA is more practicable and yields better results than OFS.


2020 ◽  
Vol 23 (4) ◽  
pp. 274-284 ◽  
Author(s):  
Jingang Che ◽  
Lei Chen ◽  
Zi-Han Guo ◽  
Shuaiqun Wang ◽  
Aorigele

Background: Identification of drug-target interaction is essential in drug discovery. It is beneficial to predict unexpected therapeutic or adverse side effects of drugs. To date, several computational methods have been proposed to predict drug-target interactions because they are prompt and low-cost compared with traditional wet experiments. Methods: In this study, we investigated this problem in a different way. According to KEGG, drugs were classified into several groups based on their target proteins. A multi-label classification model was presented to assign drugs to the correct target groups. To make full use of the known drug properties, five networks were constructed, each of which represented drug associations in one property. A powerful network embedding method, Mashup, was adopted to extract drug features from the above-mentioned networks, based on which several machine learning algorithms, including the RAndom k-labELsets (RAKEL) algorithm, the Label Powerset (LP) algorithm and Support Vector Machine (SVM), were used to build the classification model. Results and Conclusion: Tenfold cross-validation yielded an accuracy of 0.839, an exact match of 0.816 and a Hamming loss of 0.037, indicating good performance of the model. The contribution of each network was also analyzed. Furthermore, the model with multiple networks was found to be superior to the one with a single network and to the classic model, indicating the superiority of the proposed model.
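Two of the multi-label metrics named in this abstract, exact match and Hamming loss, have simple standard definitions. The sketch below illustrates them on invented 0/1 label matrices (one row per drug, one column per target group); the data and function names are assumptions for illustration. The abstract does not state how its multi-label "accuracy" is defined, so that metric is not reproduced here.

```python
def exact_match(y_true, y_pred):
    """Fraction of samples whose whole predicted label vector equals the true one."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    n_labels = len(y_true[0])
    wrong = sum(
        1
        for t, p in zip(y_true, y_pred)
        for ti, pi in zip(t, p)
        if ti != pi
    )
    return wrong / (len(y_true) * n_labels)


# Illustrative labels only: 4 samples, 3 possible target groups each.
truth = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
preds = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 1]]
em = exact_match(truth, preds)
hl = hamming_loss(truth, preds)
print(f"exact match={em:.3f}, Hamming loss={hl:.3f}")
```

Exact match is the stricter of the two: a sample with a single wrong label counts as fully wrong, whereas Hamming loss only penalizes the one incorrect slot.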


2021 ◽  
Vol 186 (Supplement_1) ◽  
pp. 445-451
Author(s):  
Yifei Sun ◽  
Navid Rashedi ◽  
Vikrant Vaze ◽  
Parikshit Shah ◽  
Ryan Halter ◽  
...  

ABSTRACT Introduction Early prediction of the acute hypotensive episode (AHE) in critically ill patients has the potential to improve outcomes. In this study, we apply different machine learning algorithms to the MIMIC III Physionet dataset, containing more than 60,000 real-world intensive care unit records, to test commonly used machine learning technologies and compare their performances. Materials and Methods Five classification methods including K-nearest neighbor, logistic regression, support vector machine, random forest, and a deep learning method called long short-term memory are applied to predict an AHE 30 minutes in advance. An analysis comparing model performance when including versus excluding invasive features was conducted. To further study the pattern of the underlying mean arterial pressure (MAP), we apply a regression method to predict the continuous MAP values using linear regression over the next 60 minutes. Results Support vector machine yields the best performance in terms of recall (84%). Including the invasive features in the classification improves the performance significantly with both recall and precision increasing by more than 20 percentage points. We were able to predict the MAP with a root mean square error (a frequently used measure of the differences between the predicted values and the observed values) of 10 mmHg 60 minutes in the future. After converting continuous MAP predictions into AHE binary predictions, we achieve a 91% recall and 68% precision. In addition to predicting AHE, the MAP predictions provide clinically useful information regarding the timing and severity of the AHE occurrence. Conclusion We were able to predict AHE with precision and recall above 80% 30 minutes in advance with the large real-world dataset. The predictions of the regression model can provide a more fine-grained, interpretable signal to practitioners. Model performance is improved by the inclusion of invasive features in predicting AHE, when compared to predicting the AHE based on only the available, restricted set of noninvasive technologies. This demonstrates the importance of exploring more noninvasive technologies for AHE prediction.
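The abstract describes two evaluation steps: scoring continuous MAP predictions with root mean square error, and converting those predictions into binary AHE predictions scored with recall and precision. A minimal sketch of both steps follows. The 65 mmHg cut-off, the sample values and the function names are illustrative assumptions; the abstract does not give the study's actual AHE definition or conversion rule.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def to_ahe_flags(map_values, threshold_mmhg=65.0):
    """Flag a reading as hypotensive when MAP falls below the threshold.

    The threshold is an assumption for illustration, not the study's definition.
    """
    return [value < threshold_mmhg for value in map_values]


def precision_recall(true_flags, pred_flags):
    """Return (precision, recall) for boolean event flags."""
    pairs = list(zip(true_flags, pred_flags))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up MAP readings in mmHg (not from MIMIC III).
observed_map = [72, 60, 58, 80, 63, 70]
predicted_map = [70, 66, 55, 78, 61, 64]

err = rmse(observed_map, predicted_map)
precision, recall = precision_recall(
    to_ahe_flags(observed_map), to_ahe_flags(predicted_map)
)
print(f"RMSE={err:.2f} mmHg, precision={precision:.2f}, recall={recall:.2f}")
```

Because the binary flags are derived from the continuous predictions, moving the threshold trades recall against precision, which is one reason a regression output can be more informative to practitioners than a bare yes/no prediction.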


Author(s):  
Cheng-Chien Lai ◽  
Wei-Hsin Huang ◽  
Betty Chia-Chen Chang ◽  
Lee-Ching Hwang

Predictors for success in smoking cessation have been studied, but a prediction model capable of providing a success rate for each patient attempting to quit smoking is still lacking. The aim of this study is to develop prediction models using machine learning algorithms to predict the outcome of smoking cessation. Data were acquired from patients who underwent a smoking cessation program at one medical center in Northern Taiwan. A total of 4875 enrollments fulfilled our inclusion criteria. Models with artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LoR), k-nearest neighbor (KNN), classification and regression tree (CART), and naïve Bayes (NB) were trained to predict the final smoking status of the patients in a six-month period. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC or ROC value) were used to determine the performance of the models. We adopted the ANN model, which reached a slightly better performance, with a sensitivity of 0.704, a specificity of 0.567, an accuracy of 0.640, and an ROC value of 0.660 (95% confidence interval (CI): 0.617–0.702) for prediction of the smoking cessation outcome. A predictive model for smoking cessation was constructed. The model could aid in providing the predicted success rate for all smokers. It also has the potential to support personalized and precision medicine in the treatment of smoking cessation.


2021 ◽  
Vol 11 (10) ◽  
pp. 4443
Author(s):  
Rokas Štrimaitis ◽  
Pavel Stefanovič ◽  
Simona Ramanauskaitė ◽  
Asta Slotkienė

Financial area analysis is not limited to enterprise performance analysis. It is worth analyzing as wide an area as possible to obtain the full impression of a specific enterprise. News website content is a data source that expresses the public’s opinion on enterprise operations, status, etc. Therefore, it is worth analyzing the text of news portal articles. Sentiment analysis methods for English texts and for financial area texts exist and are accurate; for the more complex Lithuanian language, existing work concentrates mostly on sentiment analysis of comment texts and does not provide high accuracy. Therefore, in this paper, a supervised machine learning model was implemented to assign sentiment to financial context news gathered from Lithuanian language websites. The analysis was made using three classification algorithms commonly used in the field of sentiment analysis. Hyperparameter optimization using grid search was performed to discover the best parameters of each classifier. All experimental investigations were made using newly collected datasets from four Lithuanian news websites. The results of the applied machine learning algorithms show that the highest accuracy is obtained using a non-balanced dataset, via the multinomial Naive Bayes algorithm (71.1%). The accuracies of the other algorithms were slightly lower: long short-term memory (71%) and support vector machine (70.4%).


2021 ◽  
pp. 016555152110065
Author(s):  
Rahma Alahmary ◽  
Hmood Al-Dossari

Sentiment analysis (SA) aims to extract users’ opinions automatically from their posts and comments. Almost all prior works have used machine learning algorithms. Recently, SA research has shown promising performance in using the deep learning approach. However, deep learning is greedy and requires large datasets to learn, so it takes more time for data annotation. In this research, we proposed a semiautomatic approach using Naïve Bayes (NB) to annotate a new dataset in order to reduce the human effort and time spent on the annotation process. We created a dataset for the purpose of training and testing the classifier by collecting Saudi dialect tweets. The dataset produced from the semiautomatic model was then used to train and test deep learning classifiers to perform Saudi dialect SA. The accuracy achieved by the NB classifier was 83%. The trained semiautomatic model was used to annotate the new dataset before it was fed into the deep learning classifiers. The three deep learning classifiers tested in this research were convolutional neural network (CNN), long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM). Support vector machine (SVM) was used as the baseline for comparison. Overall, the performance of the deep learning classifiers exceeded that of SVM. The results showed that CNN reported the highest performance. On one hand, the performance of Bi-LSTM was higher than that of LSTM and SVM, and, on the other hand, the performance of LSTM was higher than that of SVM. The proposed semiautomatic annotation approach is usable and promising to increase speed and save time and effort in the annotation process.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Lei Li ◽  
Desheng Wu

Purpose The infraction of securities regulations (ISRs) by listed firms in their day-to-day operations and management has become a common problem. This paper proposes several machine learning approaches to forecast the risk of infractions by listed companies, in order to address financial supervision that is neither effective nor precise. Design/methodology/approach The overall research framework designed for forecasting ISRs includes data collection and cleaning, feature engineering, data split, prediction approach application and model performance evaluation. We select Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machines, Artificial Neural Network and Long Short-Term Memory Networks (LSTMs) as ISRs prediction models. Findings The research results show that the proposed models that include prior infractions predict ISRs significantly better than those without them, especially for large sample sets. The results also indicate that, when judging whether a company has infractions, we should pay attention to novel artificial intelligence methods, previous infractions of the company, and large data sets. Originality/value The findings could be utilized, to a certain degree, to address the problem of identifying listed companies' ISRs. Overall, the results elucidate the value of prior infractions of securities regulations (ISRs). This shows the importance of including more data sources when constructing distress models, rather than only focusing on building increasingly complex models on the same data. This is also beneficial to the regulatory authorities.


Kybernetes ◽  
2019 ◽  
Vol 49 (9) ◽  
pp. 2335-2348 ◽  
Author(s):  
Milad Yousefi ◽  
Moslem Yousefi ◽  
Masood Fathi ◽  
Flavio S. Fogliatto

Purpose This study aims to investigate the factors affecting daily demand in an emergency department (ED) and to provide a forecasting tool in a public hospital for horizons of up to seven days. Design/methodology/approach In this study, the important factors influencing demand in EDs were first extracted from the literature, and the factors relevant to the study were then selected. Then, a deep neural network is applied to construct a reliable predictor. Findings Although many statistical approaches have been proposed for tackling this issue, better forecasts are achievable by using machine learning algorithms. Results indicate that the proposed approach outperforms statistical alternatives available in the literature such as multiple linear regression, autoregressive integrated moving average, support vector regression, generalized linear models, generalized estimating equations, seasonal ARIMA and combined ARIMA and linear regression. Research limitations/implications The authors applied this study in a single ED to forecast patient visits. Applying the same method in different EDs may give a better understanding of the model's performance. The same approach can be applied to other demand forecasting problems after some minor modifications. Originality/value To the best of the authors' knowledge, this is the first study to propose the use of long short-term memory for constructing a predictor of the number of patient visits in EDs.


2019 ◽  
Vol 2019 ◽  
pp. 1-10 ◽  
Author(s):  
Xianglong Luo ◽  
Danyang Li ◽  
Yu Yang ◽  
Shengrui Zhang

Traffic flow prediction is becoming increasingly crucial in Intelligent Transportation Systems. An accurate prediction result is the precondition of traffic guidance, management, and control. To improve the prediction accuracy, a spatiotemporal traffic flow prediction method combining k-nearest neighbor (KNN) and long short-term memory network (LSTM) is proposed, which is called the KNN-LSTM model in this paper. KNN is used to select the neighboring stations most related to the test station and to capture spatial features of traffic flow. LSTM is utilized to mine the temporal variability of traffic flow, and a two-layer LSTM network is applied to predict traffic flow in each of the selected stations. The final prediction results are obtained by result-level fusion with the rank-exponent weighting method. The prediction performance is evaluated with real-time traffic flow data provided by the Transportation Research Data Lab (TDRL) at the University of Minnesota Duluth (UMD) Data Center. Experimental results indicate that the proposed model achieves better performance than well-known prediction models including autoregressive integrated moving average (ARIMA), support vector regression (SVR), wavelet neural network (WNN), deep belief networks combined with support vector regression (DBN-SVR), and LSTM models, with an average accuracy improvement of 12.59%.
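The abstract names rank-exponent weighting for the result-level fusion step but does not give the formula. The sketch below assumes the common textbook form, in which the station ranked r out of K receives a weight proportional to (K - r + 1)^z. The exponent `z`, the function names and the sample predictions are all assumptions for illustration and may differ from the paper's settings.

```python
def rank_exponent_weights(n_stations, z=2.0):
    """Weights for ranks 1..n_stations (rank 1 = most related), summing to 1."""
    raw = [(n_stations - rank + 1) ** z for rank in range(1, n_stations + 1)]
    total = sum(raw)
    return [value / total for value in raw]


def fuse_predictions(predictions_by_rank, z=2.0):
    """Weighted average of per-station predictions, ordered from rank 1 downward."""
    weights = rank_exponent_weights(len(predictions_by_rank), z)
    return sum(w * p for w, p in zip(weights, predictions_by_rank))


# Made-up per-station flow predictions (vehicles per interval), best-ranked first.
station_predictions = [100.0, 110.0, 130.0]
weights = rank_exponent_weights(len(station_predictions))
fused = fuse_predictions(station_predictions)
print(weights, fused)
```

With z = 0 the weights become equal and the fusion reduces to a plain average; larger values of z concentrate the weight on the stations KNN ranks as most related.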

