A Novel Approach for Stock Price Prediction Using Gradient Boosting Machine with Feature Engineering (GBM-wFE)

Rebwar M. Nabi; Soran Ab. M. Saeed; Habibollah Harron

doi:10.24017/science.2020.1.3

Kurdistan Journal of Applied Research ◽

2020 ◽

Vol 5 (1) ◽

pp. 28-48

Keyword(s):

Feature Selection ◽

Stock Prices ◽

Stock Price ◽

Ensemble Methods ◽

Principal Component ◽

Multiclass Classification ◽

Gradient Boosting ◽

Feature Engineering ◽

Stock Price Prediction ◽

Gradient Boosting Machine

The prediction of stock prices has become an exciting area for researchers and academics because of its economic impact and potential business profits. This study proposes a novel multiclass classification ensemble learning approach that predicts stock prices from historical data using feature engineering. The proposed approach comprises four main steps: pre-processing, feature selection, feature engineering, and ensemble learning. We use 11 datasets from the Nasdaq and S&P 500 to evaluate the accuracy of the proposed approach. Furthermore, eight feature selection algorithms are studied and implemented. More importantly, feature engineering is applied to construct two new features, which appear to be very promising for improving classification accuracy; to our knowledge, this is the first study to use feature engineering for multiclass classification with ensemble methods. Finally, seven ensemble machine learning (ML) algorithms are used and compared to identify the best-performing prediction model, and the best feature selection algorithm is identified. The resulting approach, called Gradient Boosting Machine with Feature Engineering (GBM-wFE), uses Principal Component Analysis (PCA) for feature selection. We find that GBM-wFE outperforms previous studies and that the overall prediction results are promising: a MAPE of 0.0406% is achieved, the best result among the available studies in the literature.
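The headline result above is stated as a mean absolute percentage error (MAPE). As a minimal sketch, the conventional definition, MAPE = (100/n) * sum(|y_t - yhat_t| / |y_t|), is shown below. The abstract does not say how MAPE was computed for a multiclass model, so this illustrates only the standard formula; the function name and the price values are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    Assumes every actual value is non-zero (true for positive stock prices).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical closing prices and forecasts, for illustration only.
closes = [100.0, 200.0]
forecasts = [110.0, 190.0]
print(mape(closes, forecasts))  # approximately 7.5
```

Because each error is divided by the actual value, MAPE is scale-free, which is why it is often used to compare results across stocks with very different price levels.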

FEB-Stacking and FEB-DNN Models for Stock Trend Prediction: A Performance Analysis for Pre and Post Covid-19 Periods

Decision Making Applications in Management and Engineering ◽

10.31181/dmame2104051g ◽

2021 ◽

Vol 4 (1) ◽

pp. 51-86 ◽

Cited By ~ 3

Author(s):

Indranil Ghosh ◽

Tamal Datta Chaudhuri ◽

Keyword(s):

Performance Analysis ◽

Stock Price ◽

Binary Classification ◽

Class Imbalance ◽

Principal Component ◽

Classification Problem ◽

Feature Engineering ◽

Trend Prediction ◽

Neural Network Models ◽

Stock Price Prediction

In this paper, stock price prediction is framed as a binary classification problem: the goal is to predict whether the closing price will rise or fall the next day. The framework is intended for both investors and traders. In the aftermath of the Covid-19 pandemic, global financial markets have seen growing uncertainty and volatility, and as a consequence, precise prediction of stock price trends has become extremely challenging. Against this background, we propose two integrated frameworks that combine rigorous feature engineering, a methodology for handling class imbalance, and predictive modeling to perform stock trend prediction during normal and new normal times. A number of technical and macroeconomic indicators are chosen as explanatory variables and further refined through a dedicated feature engineering process that applies Kernel Principal Component Analysis (KPCA). A bootstrapping procedure is used to deal with class imbalance. Finally, two artificial intelligence models, namely Stacking and Deep Neural Network models, are deployed separately on the feature-engineered and bootstrapped samples to estimate price trends of the underlying stocks during the pre- and post-Covid-19 periods. Rigorous performance analysis and comparative evaluation against other well-known models demonstrate the effectiveness and superiority of the proposed frameworks.
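The two data-preparation ideas in this abstract, next-day up/down labels and bootstrapping to correct class imbalance, can be sketched as follows. This is a minimal illustration under stated assumptions: a day is labelled 1 when the next close is strictly higher, and bootstrapping is taken to mean resampling each minority class with replacement up to the majority-class size. The abstract does not give the authors' exact procedure, and the function names and prices are hypothetical.

```python
import random
from collections import Counter


def next_day_labels(closes):
    """Label day t as 1 if the close rises on day t + 1, else 0.

    The final day has no next-day close, so it gets no label.
    """
    return [1 if closes[t + 1] > closes[t] else 0 for t in range(len(closes) - 1)]


def bootstrap_balance(samples, labels, seed=0):
    """Resample each minority class with replacement up to the majority-class size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Keep every original sample, then draw extra ones with replacement.
        resampled = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(resampled)
        out_y.extend([y] * len(resampled))
    return out_x, out_y


closes = [10.0, 11.0, 12.0, 11.5, 13.0, 14.0]  # hypothetical prices
labels = next_day_labels(closes)                # [1, 1, 0, 1, 1]
features = list(range(len(labels)))             # placeholder feature rows
x_bal, y_bal = bootstrap_balance(features, labels)
print(Counter(y_bal))                           # 4 samples of each class
```

In a real pipeline the resampling would be applied to the training split only, so that duplicated rows never leak into the evaluation data.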

Multiclass classification of leukemia cancer data using Fuzzy Support Vector Machine (FSVM) with feature selection using Principal Component Analysis (PCA)

Journal of Physics Conference Series ◽

10.1088/1742-6596/1725/1/012012 ◽

2021 ◽

Vol 1725 ◽

pp. 012012

Author(s):

I R Fauzi ◽

Z Rustam ◽

A Wibowo

Keyword(s):

Principal Component Analysis ◽

Support Vector Machine ◽

Feature Selection ◽

Principal Component ◽

Component Analysis ◽

Multiclass Classification ◽

Support Vector ◽

Fuzzy Support Vector Machine ◽

Cancer Data

Stock Price Prediction Using Convolutional Neural Networks on a Multivariate Time Series

10.36227/techrxiv.15088734 ◽

2021 ◽

Author(s):

Sidra Mehtab ◽

Jaydip Sen

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Stock Prices ◽

Stock Price ◽

Regression Models ◽

Research Work ◽

Hybrid Approach ◽

Forecast Horizon ◽

Stock Price Prediction ◽

Price Prediction

Predicting the future movement of stock prices has been the subject of much research. On one hand, proponents of the Efficient Market Hypothesis claim that stock prices cannot be predicted; on the other hand, some propositions show that, if appropriately modelled, stock prices can be predicted with a high level of accuracy. There is also a large body of literature on technical analysis of stock prices, where the objective is to identify patterns in stock price movements and profit from them. In this work, we propose a hybrid approach for stock price prediction using machine learning and deep learning-based methods. We select the NIFTY 50 index values of the National Stock Exchange (NSE) of India over a period of four years, 2015–2018. Based on the NIFTY data from 2015–2018, we build various predictive models using machine learning approaches, and then use those models to predict the “Close” value of NIFTY 50 for the year 2019 with a forecast horizon of one week, i.e., five days. To predict NIFTY index movement patterns, we use a number of classification methods, while to forecast the actual “Close” values of the NIFTY index, we build various regression models. We then augment the predictive power of the models by building a deep learning-based regression model using a Convolutional Neural Network (CNN) with walk-forward validation. The CNN model's parameters are fine-tuned so that the validation loss stabilizes with an increasing number of iterations and the training and validation accuracies converge. We exploit the power of CNN in forecasting future NIFTY index values using three approaches, which differ in the number of variables used in forecasting, the number of sub-models used in the overall models, and the size of the input data for training the models. Extensive results are presented on various metrics for all classification and regression models.
The results clearly indicate that the CNN-based multivariate forecasting model is the most effective and accurate in predicting the movement of NIFTY index values with a weekly forecast horizon.
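Walk-forward validation, as used for the CNN above, repeatedly trains on all data up to a point, forecasts the next block, then moves forward. A minimal sketch follows, assuming an expanding training window and a five-day horizon to mirror the one-week forecast horizon; whether the authors used an expanding or a sliding window is not stated. The naive "repeat the last value" model is a hypothetical placeholder for the CNN, and the series is synthetic.

```python
def walk_forward_splits(n_obs, initial_train, horizon):
    """Yield (train, test) index ranges for expanding-window walk-forward validation.

    Each fold trains on everything before `start` and tests on the next
    `horizon` observations; the window then advances by `horizon`.
    """
    start = initial_train
    while start + horizon <= n_obs:
        yield range(0, start), range(start, start + horizon)
        start += horizon


def naive_forecast(history, horizon):
    """Placeholder model: repeat the last observed value `horizon` times."""
    return [history[-1]] * horizon


series = [float(v) for v in range(20)]  # hypothetical daily closes
for train_idx, test_idx in walk_forward_splits(len(series), initial_train=10, horizon=5):
    history = [series[i] for i in train_idx]
    forecast = naive_forecast(history, len(test_idx))
    actual = [series[i] for i in test_idx]
    print(test_idx, forecast, actual)
```

Unlike a random train/test split, every fold only ever sees past data, which is what makes this scheme appropriate for time series.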

Predictability of Stock Price Fluctuations Based on Business Relationships: A Comparison of Normal and the COVID-19 Pandemic Periods in Japan

Sustainability ◽

10.3390/su131810146 ◽

2021 ◽

Vol 13 (18) ◽

pp. 10146

Author(s):

Shoma Sakamoto ◽

Shintaro Sengoku

Keyword(s):

Stock Prices ◽

Stock Price ◽

Factor Model ◽

Customer Relationships ◽

Business Relationships ◽

Monthly Basis ◽

Limited Region ◽

Stock Price Prediction ◽

Available Information ◽

The Impact

A company's stock price is significantly influenced by changes in its business relationships. However, the effectiveness of stock price prediction based on such inter-firm business relationships has been confirmed only partially, in cases limited to particular regions and/or timeframes. In particular, it has not been verified under highly volatile market conditions such as those caused by the COVID-19 pandemic. To address these issues, we analyzed the impact of supplier–customer relationships on stock prices in the Japanese stock market using the Fama-French three-factor model and publicly available information on business relationships. The subjects were classified into two conditions, normal and COVID-19, and the stock price predictability associated with changes in the stock prices of related companies was examined for both short and long holding periods. As a result, significant stock price predictability was confirmed on both a daily and a monthly basis in the given region. In addition, specific factors, including a volatile event caused by a customer company, a stock price downturn, and the size of the customer company, particularly improved stock price predictability during the pandemic.
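For reference, the standard Fama-French three-factor regression that this study builds on can be written as below. Here R_{i,t} is the return of stock i in period t, R_{f,t} the risk-free rate, R_{m,t} the market return, SMB the small-minus-big size factor and HML the high-minus-low book-to-market factor. The abstract does not describe how the authors add business-relationship information to the model, so only the textbook form is shown.

```latex
R_{i,t} - R_{f,t} = \alpha_i + \beta_i \, (R_{m,t} - R_{f,t}) + s_i \, \mathrm{SMB}_t + h_i \, \mathrm{HML}_t + \varepsilon_{i,t}
```

In this framework, a significant intercept \alpha_i after controlling for the three factors is the usual signal of an abnormal return that the factors do not explain.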

Comparison of ARIMA, ANN and LSTM for Stock Price Prediction

E3S Web of Conferences ◽

10.1051/e3sconf/202021801026 ◽

2020 ◽

Vol 218 ◽

pp. 01026

Author(s):

Qihang Ma

Keyword(s):

Stock Prices ◽

Stock Price ◽

Short Term Memory ◽

Moving Average ◽

Arima Model ◽

Predictive Ability ◽

Research Direction ◽

Ann Model ◽

Advantages And Disadvantages ◽

Stock Price Prediction

The prediction of stock prices has always been a hot topic of research. The commonly used autoregressive integrated moving average (ARIMA) model and artificial neural networks (ANNs) each have their own advantages and disadvantages, and the use of long short-term memory (LSTM) networks for prediction also shows interesting possibilities. This article compares the three models by analyzing their underlying principles and their prediction results. We conclude that the LSTM model may have the best predictive ability but is strongly affected by data processing, and that the ANN model performs better than the ARIMA model. Combining time series with external factors may be a worthwhile research direction.
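The abstract compares ARIMA with neural models without stating the ARIMA orders used. To make the classical baseline concrete, here is a minimal sketch of its simplest nontrivial member, ARIMA(1,1,0) without a constant: difference the series once, fit d_t = phi * d_{t-1} by ordinary least squares, then iterate forward and undo the differencing. The function names and prices are hypothetical, and practical work would use a statistics library with proper order selection and maximum-likelihood estimation.

```python
def fit_ar1_on_differences(series):
    """Estimate phi in d_t = phi * d_{t-1} + e_t by least squares (no intercept),
    where d_t = y_t - y_{t-1}. This is an ARIMA(1,1,0) model without a constant."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    if len(diffs) < 2:
        raise ValueError("need at least three observations")
    num = sum(d_prev * d for d_prev, d in zip(diffs, diffs[1:]))
    den = sum(d_prev * d_prev for d_prev in diffs[:-1])
    if den == 0:
        return 0.0  # flat series: no information about phi
    return num / den


def forecast_arima110(series, steps):
    """Iterate the fitted difference equation forward and undo the differencing."""
    phi = fit_ar1_on_differences(series)
    last_level = series[-1]
    last_diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff
        last_level = last_level + last_diff
        forecasts.append(last_level)
    return forecasts


prices = [100.0, 101.0, 103.0, 102.0, 104.0, 105.0]  # hypothetical closes
print(forecast_arima110(prices, steps=3))
```

The linear, single-lag structure is what makes ARIMA cheap and interpretable, and also why it cannot capture the nonlinear patterns that ANN and LSTM models are meant to learn.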

Stock Price Prediction Using Deep Learning Models

10.36227/techrxiv.16640197 ◽

2021 ◽

Author(s):

Jaydip Sen ◽

Sidra Mehtab ◽

Gourab Nath

Keyword(s):

Deep Learning ◽

Stock Prices ◽

Stock Price ◽

Regression Models ◽

Research Work ◽

The Other ◽

Learning Models ◽

Stock Price Prediction ◽

Price Prediction ◽

Other Hand

Predicting the future movement of stock prices has been the subject of much research. On one hand, proponents of the Efficient Market Hypothesis claim that stock prices cannot be predicted; on the other hand, some propositions show that, if appropriately modeled, stock prices can be predicted with a high level of accuracy. There is also a large body of literature on technical analysis of stock prices, where the objective is to identify patterns in stock price movements and profit from them. In this work, we propose a hybrid approach for stock price prediction using five deep learning-based regression models. We select the NIFTY 50 index values of the National Stock Exchange (NSE) of India over the period December 29, 2014 to July 31, 2020. Using the NIFTY data from December 29, 2014 to December 28, 2018, we build two regression models using convolutional neural networks (CNNs) and three regression models using long short-term memory (LSTM) networks to predict the open values of the NIFTY 50 index for the period December 31, 2018 to July 31, 2020. We adopt a multi-step prediction technique with walk-forward validation. The parameters of the five deep learning models are optimized using grid search so that the validation losses of the models stabilize with an increasing number of training epochs and the training and validation accuracies converge. Extensive results are presented on various metrics for all the proposed regression models. The results indicate that while both the CNN- and LSTM-based regression models are very accurate in forecasting the NIFTY 50 open values, the CNN model that uses the previous week’s data as input is the fastest to execute. On the other hand, the encoder-decoder convolutional LSTM model that uses the previous two weeks’ data as input is found to be the most accurate.
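Multi-step models like those above need the series cut into (input, target) pairs, for example one week of inputs mapped to the next week of outputs. A minimal sketch follows, assuming a trading week of five days (a two-week input would use n_in=10) and a configurable stride between windows; the abstract does not say whether the authors' windows overlap, and the function name and data are hypothetical.

```python
def make_windows(series, n_in, n_out, step=1):
    """Split a series into (input, target) pairs for multi-step forecasting.

    Each input holds `n_in` consecutive values and the matching target holds
    the next `n_out` values. `step` controls how far consecutive windows move.
    """
    pairs = []
    last_start = len(series) - n_in - n_out
    for start in range(0, last_start + 1, step):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        pairs.append((x, y))
    return pairs


# One trading week = 5 days (assumption); two-week inputs would use n_in=10.
daily_open = list(range(15))  # hypothetical daily open values
for x, y in make_windows(daily_open, n_in=5, n_out=5, step=5):
    print(x, "->", y)
```

With step=5 the windows are non-overlapping weeks; step=1 produces many more overlapping training pairs from the same history at the cost of highly correlated samples.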

Evaluating Variable Selection and Machine Learning Algorithms for Estimating Forest Heights by Combining Lidar and Hyperspectral Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9090507 ◽

2020 ◽

Vol 9 (9) ◽

pp. 507

Author(s):

Sanjiwana Arjasakusuma ◽

Sandiaga Swahyu Kusuma ◽

Stuart Phinn

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Algorithms ◽

Principal Component ◽

Hyperspectral Data ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Forest Height ◽

Extreme Gradient Boosting

Machine learning has been employed for various mapping and modeling tasks using input variables from different remote sensing data sources. For feature selection on data with high spatial and spectral dimensionality, various methods have been developed and incorporated into the machine learning framework to ensure an efficient and optimal computational process. This research aims to assess the accuracy of various feature selection and machine learning methods for estimating forest height using AISA (airborne imaging spectrometer for applications) hyperspectral bands (479 bands) and airborne light detection and ranging (lidar) height metrics (36 metrics), alone and combined. Feature selection and dimensionality reduction using Boruta (BO), principal component analysis (PCA), simulated annealing (SA), and genetic algorithm (GA) were evaluated in combination with machine learning algorithms such as multivariate adaptive regression splines (MARS), extra trees (ET), support vector regression (SVR) with a radial basis function kernel, and extreme gradient boosting (XGB) with tree (XGBtree and XGBdart) and linear (XGBlin) boosters. The results demonstrated that the BO-XGBdart and BO-SVR combinations delivered the best model performance for estimating tropical forest height by combining lidar and hyperspectral data, with R2 = 0.53 and RMSE = 1.7 m (nRMSE of 18.4% and bias of 0.046 m) for BO-XGBdart and R2 = 0.51 and RMSE = 1.8 m (nRMSE of 15.8% and bias of −0.244 m) for BO-SVR. Our study also demonstrated the effectiveness of BO for variable selection: it reduced the data by about 95%, selecting the 29 most important variables from the initial 516 lidar and hyperspectral variables.
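The accuracy figures quoted above (R2, RMSE, nRMSE and bias) can be computed as sketched below. Conventions differ between studies, so this sketch makes three labelled assumptions: R2 is 1 - SS_res/SS_tot, nRMSE is RMSE divided by the mean observed value, and bias is mean(predicted - observed). The abstract does not confirm which conventions the authors used, and the heights are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R^2, RMSE, nRMSE (RMSE / mean of observed, in percent) and mean bias.

    Bias is taken as mean(predicted - observed); conventions differ between studies.
    Assumes `observed` is non-empty and not constant (otherwise R^2 is undefined).
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "nrmse_pct": 100.0 * rmse / mean_obs,
        "bias": sum(residuals) / n,
    }


# Hypothetical field-measured vs. model-estimated forest heights in metres.
heights_obs = [10.0, 12.0, 14.0, 16.0]
heights_pred = [11.0, 12.0, 13.0, 17.0]
print(regression_metrics(heights_obs, heights_pred))
```

Reporting nRMSE alongside RMSE makes errors comparable across sites with different mean canopy heights, while a bias near zero indicates no systematic over- or underestimation.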

Kernel principal component analysis and support vector machines for stock price prediction

IIE Transactions ◽

10.1080/07408170600897486 ◽

2007 ◽

Vol 39 (6) ◽

pp. 629-637 ◽

Cited By ~ 30

Author(s):

Huseyin Ince ◽

Theodore B. Trafalis

Keyword(s):

Principal Component Analysis ◽

Support Vector Machines ◽

Stock Price ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Support Vector ◽

Stock Price Prediction ◽

Price Prediction ◽

Vector Machines

Stock Price Prediction Using Machine Learning and LSTM-Based Deep Learning Models

10.36227/techrxiv.15103602 ◽

2021 ◽

Author(s):

Jaydip Sen ◽

Sidra Mehtab ◽

Abhishek Dutta

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Stock Prices ◽

Stock Price ◽

Regression Models ◽

Short Term Memory ◽

Efficient Market Hypothesis ◽

Training Data ◽

Stock Price Prediction ◽

Price Prediction

Prediction of stock prices has long been an important area of research. While supporters of the efficient market hypothesis believe that it is impossible to predict stock prices accurately, there are formal propositions demonstrating that careful modeling and the design of appropriate variables can yield models that predict stock prices and stock price movement patterns very accurately. Researchers have also worked on technical analysis of stocks, aiming to identify patterns in stock price movements using advanced data mining techniques. In this work, we propose a hybrid modeling approach for stock price prediction that builds different machine learning and deep learning-based models. For our study, we used NIFTY 50 index values of the National Stock Exchange (NSE) of India from December 29, 2014 to July 31, 2020. We built eight regression models using training data consisting of NIFTY 50 index records from December 29, 2014 to December 28, 2018. Using these regression models, we predicted the open values of NIFTY 50 for the period December 31, 2018 to July 31, 2020. We then augment the predictive power of our forecasting framework by building four deep learning-based regression models using long short-term memory (LSTM) networks with a novel approach to walk-forward validation. The hyperparameters of the LSTM models are optimized by grid search to ensure that validation losses stabilize with an increasing number of epochs and that the validation accuracy converges. We exploit the power of LSTM regression models in forecasting future NIFTY 50 open values using four different models that differ in their architecture and in the structure of their input data. Extensive results are presented on various metrics for all the regression models.
The results clearly indicate that the LSTM-based univariate model that uses one week of prior data as input to predict the next week's open values of the NIFTY 50 time series is the most accurate model.
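The grid search mentioned above exhaustively tries every combination of candidate hyperparameter values and keeps the one with the lowest validation loss. A minimal sketch follows. The search space (units, epochs, batch_size) is hypothetical, and `toy_validation_loss` is a stand-in for "train the LSTM with these settings and return its validation loss", which the abstract does not specify.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in `param_grid` and return the one with the
    lowest score (e.g. a validation loss), together with that score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space and a stand-in for "train the LSTM, return validation loss".
grid = {"units": [32, 64, 128], "epochs": [20, 50], "batch_size": [16, 64]}


def toy_validation_loss(params):
    return (abs(params["units"] - 64) / 64
            + abs(params["epochs"] - 50) / 50
            + params["batch_size"] / 1000)


print(grid_search(grid, toy_validation_loss))
```

The cost grows multiplicatively with each added hyperparameter (here 3 x 2 x 2 = 12 training runs), which is why grids for deep models are usually kept small.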

Towards Optimization of Malware Detection using Chi-square Feature Selection on Ensemble Classifiers

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d2359.0410421 ◽

2021 ◽

Vol 10 (4) ◽

pp. 254-262

Author(s):

Fadare Oluwaseun Gbenga ◽

Adetunmbi Adebayo Olusola ◽

(Mrs) Oyinloye Oghenerukevwe Eloho ◽

Mogaji Stephen Alaba

Keyword(s):

Feature Selection ◽

Malware Detection ◽

Feature Selection Method ◽

Ensemble Methods ◽

Nearest Neighbors ◽

Selection Method ◽

Gradient Boosting ◽

K Nearest Neighbors ◽

Chi Square ◽

Extreme Gradient Boosting

The proliferation of malware variants is probably the greatest problem in PC security, and protecting information in the form of source code against unauthorized access is a central issue in computer security. In recent times, machine learning has been extensively researched for malware detection, and ensemble techniques have been shown to be highly effective in terms of detection accuracy. This paper proposes a framework that combines Chi-square feature selection with eight ensemble learning classifiers built on five base learners: K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Decision Trees, and Logistic Regression. Among the base learners, K-Nearest Neighbors achieves the highest accuracy: 95.37% with chi-square feature selection and 87.89% without feature selection. Among the ensembles, the Extreme Gradient Boosting classifier achieves the highest accuracy: 97.407% with chi-square feature selection and 91.72% without feature selection. Extreme Gradient Boosting and Random Forest lead on the seven evaluation measures with chi-square feature selection and without feature selection, respectively. The results show that tree-based ensemble models are compelling for malware classification.
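Chi-square feature selection scores each feature by how strongly its values depend on the class label and keeps the top-scoring ones. A minimal sketch for categorical features is below, using the Pearson chi-square statistic over the feature-by-class contingency table. The feature names and data are hypothetical, and library implementations (for example scikit-learn's `chi2`) compute a related statistic from non-negative feature values, so their scores may differ from this contingency-table form.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic of independence between a categorical
    feature and the class label, computed from their contingency table."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_counts = Counter(feature)
    y_counts = Counter(labels)
    score = 0.0
    for f_val, f_n in f_counts.items():
        for y_val, y_n in y_counts.items():
            expected = f_n * y_n / n
            observed = joint[(f_val, y_val)]
            score += (observed - expected) ** 2 / expected
    return score


def select_k_best(columns, labels, k):
    """Return the names of the k features with the largest chi-square scores."""
    ranked = sorted(columns, key=lambda name: chi_square_score(columns[name], labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical binary indicators (1 = API call present) for four files.
labels = [1, 1, 0, 0]  # 1 = malware, 0 = benign
columns = {
    "calls_CreateRemoteThread": [1, 1, 0, 0],
    "calls_GetTickCount": [1, 0, 1, 0],
}
print(select_k_best(columns, labels, k=1))
```

A feature whose values are independent of the label scores zero, while a feature that perfectly separates the classes scores highest, so the selected subset is the one most informative for the downstream ensemble classifiers.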
