An Ensemble of Random Forest Gradient Boosting Machine and Deep Learning Methods for Stock Price Prediction

Ravinder Kumar; Lokesh Kumar Shrivastav

doi:10.4018/jitr.2022010102

Journal of Information Technology Research ◽

10.4018/jitr.2022010102 ◽

2022 ◽

Vol 15 (1) ◽

pp. 1-19

Author(s):

Ravinder Kumar ◽

Lokesh Kumar Shrivastav

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Random Forest ◽

Stock Market ◽

Data Analytics ◽

Gradient Boosting ◽

Ensemble Model ◽

Market Data ◽

Stock Price Prediction ◽

Gradient Boosting Machine

Stochastic time series analysis of high-frequency stock market data is a very challenging task for analysts because efficient tools and techniques for big data analytics are lacking. This has opened opportunities for developers and researchers to build intelligent, machine-learning-based tools and techniques for data analytics. This paper proposes an ensemble for stock market data prediction using three of the most prominent machine learning techniques. The stock market dataset has a raw size of 39,364 KB with all attributes and a processed size of 11,826 KB containing 872,435 instances. The proposed work implements an ensemble model comprising deep learning, Gradient Boosting Machine (GBM) and distributed Random Forest techniques. The performance of the ensemble model is compared with that of each individual method, i.e. deep learning, GBM and Random Forest. The ensemble model performs better, achieving the highest accuracy of 0.99 and the lowest error (RMSE) of 0.1.
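
The abstract does not say how the three base models' outputs are combined. As a minimal sketch, assuming an unweighted average of the regression predictions (stacking or weighted blending are equally plausible alternatives), the ensemble output and the RMSE used for comparison could be computed as follows. The function names and all sample numbers are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def average_ensemble(member_preds: Sequence[Sequence[float]]) -> list[float]:
    """Combine base-model predictions by an unweighted per-instance mean.

    Assumption: simple averaging; the abstract does not state how the deep
    learning, GBM and random forest outputs are actually combined.
    """
    if not member_preds:
        raise ValueError("need predictions from at least one model")
    n = len(member_preds[0])
    if any(len(p) != n for p in member_preds):
        raise ValueError("every model must predict the same instances")
    return [sum(p[i] for p in member_preds) / len(member_preds) for i in range(n)]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    observed = [10.0, 11.0, 12.0]
    # Hypothetical base-model predictions (illustrative values, not from the paper).
    members = {
        "DL": [10.2, 11.1, 11.7],
        "GBM": [9.9, 10.8, 12.1],
        "RF": [10.1, 11.3, 12.0],
    }
    members["Ensemble"] = average_ensemble(list(members.values()))
    for name, pred in members.items():
        print(f"{name}: RMSE = {rmse(observed, pred):.4f}")
```

With these toy values the averaged prediction has the lowest RMSE (about 0.067, versus roughly 0.14 to 0.22 for the single models), which reflects the made-up numbers rather than the paper's data.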

Data Analytics for Monitoring the Satisfactory Parameters of Airline Passengers using Machine Learning Algorithms in Python

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8677.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 1231-1235

Keyword(s):

Machine Learning ◽

Data Analytics ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Complex Information ◽

Huge Data ◽

Gradient Boosting Machine ◽

Airline Passengers ◽

Effective Representation

Machine learning algorithms offer an effective way to obtain results, especially from big data. Numerous applications can produce outcomes, but the Random Forest (RF), Gradient Boosting Machine (GBM) and Decision Tree (DT) algorithms in Python can give higher accuracy when classifying the various parameters of airline passenger satisfaction levels. The complex information on airline passengers provides large volumes of data for interpretation across different satisfaction parameters. An algorithm must be able to classify these data accurately. Some methods may provide lower precision, and conventional techniques risk discarding or missing information. RF and GBM are therefore used to overcome the unpredictability of the data and improve the accuracy of the results. The aim of this study is to identify an algorithm suitable for classifying the satisfaction level of airline passengers through data analytics in Python, using known outputs. Optimizing the independent variables and training and testing the models for accuracy in Python revealed the variation among the parameters and identified RF and GBM as better algorithms than the other classifiers.
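
The study compares classifiers by training and testing them for accuracy in Python. A minimal, dependency-free sketch of that holdout evaluation is shown below. The 30% test fraction, the seed, the labels, and the function names are illustrative assumptions; the predictions stand in for fitted RF, GBM and DT models, which are not reproduced here.

```python
import random
from typing import Hashable, Sequence


def train_test_split(rows: Sequence, test_fraction: float = 0.3, seed: int = 42):
    """Shuffle a copy of the rows and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of test instances whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rank_classifiers(predictions: dict[str, Sequence[Hashable]],
                     y_true: Sequence[Hashable]) -> list[tuple[str, float]]:
    """Return (classifier name, test accuracy) pairs, best first."""
    scores = [(name, accuracy(y_true, pred)) for name, pred in predictions.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    passengers = list(range(20))  # stand-in for 20 passenger records
    train, test = train_test_split(passengers, test_fraction=0.3, seed=7)
    print(len(train), "training rows,", len(test), "test rows")
    # Hypothetical test labels and predictions from three already-trained models.
    y_test = ["satisfied", "dissatisfied", "satisfied", "satisfied", "dissatisfied", "dissatisfied"]
    preds = {
        "RF": ["satisfied", "dissatisfied", "satisfied", "satisfied", "dissatisfied", "satisfied"],
        "GBM": ["satisfied", "dissatisfied", "satisfied", "dissatisfied", "dissatisfied", "satisfied"],
        "DT": ["satisfied", "satisfied", "dissatisfied", "satisfied", "dissatisfied", "satisfied"],
    }
    for name, acc in rank_classifiers(preds, y_test):
        print(f"{name}: accuracy {acc:.2f}")
```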

Software reuse analytics using integrated random forest and gradient boosting machine learning algorithm

Software Practice and Experience ◽

10.1002/spe.2921 ◽

2020 ◽

Author(s):

Amandeep Kaur Sandhu ◽

Ranbir Singh Batth

Keyword(s):

Machine Learning ◽

Random Forest ◽

Software Reuse ◽

Learning Algorithm ◽

Gradient Boosting ◽

Machine Learning Algorithm ◽

Gradient Boosting Machine

Prediction of probable backorder scenarios in the supply chain using Distributed Random Forest and Gradient Boosting Machine learning techniques

Journal Of Big Data ◽

10.1186/s40537-020-00345-2 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Samiul Islam ◽

Saman Hassanzadeh Amin

Keyword(s):

Machine Learning ◽

Supply Chain ◽

Random Forest ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Learning Techniques ◽

Gradient Boosting Machine

Harvesting social media sentiment analysis to enhance stock market prediction using deep learning

PeerJ Computer Science ◽

10.7717/peerj-cs.476 ◽

2021 ◽

Vol 7 ◽

pp. e476

Author(s):

Pooja Mehta ◽

Sharnil Pandya ◽

Ketan Kotecha

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Stock Market ◽

Social Networking ◽

Stock Prices ◽

Short Term Memory ◽

Support Vector ◽

Stock Price Prediction ◽

Public Sentiment ◽

Area Of Interest

Information gathering has become an integral part of assessing people’s behaviors and actions. The Internet is used as an online learning site for sharing and exchanging ideas. People can actively give their reviews and recommendations for a variety of products and services using popular social sites and personal blogs. Social networking sites, including Twitter, Facebook, and Google+, are examples of the sites used to share opinions. The stock market (SM) is an essential area of the economy and plays a significant role in trade and industry development. Predicting SM movements is a well-known area of interest to researchers. Social networking perfectly reflects the public’s views of current affairs. Financial news stories are thought to have an impact on stock price trends and returns, and many data mining techniques are used to address fluctuations in the SM. Machine learning can provide a more accurate and robust approach to handling SM-related predictions. We sought to identify how movements in a company’s stock prices correlate with the expressed opinions (sentiments) of the public about that company. We designed and implemented a stock price prediction accuracy tool that considers public sentiment in addition to other parameters. The proposed algorithm considers public sentiment, opinions, news and historical stock prices to forecast future stock prices. Our experiments were performed using machine-learning and deep-learning methods including Support Vector Machine, MNB classifier, linear regression, Naïve Bayes and Long Short-Term Memory. Our results validate the success of the proposed methodology.
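
Among the methods listed is an MNB (multinomial Naive Bayes) classifier, a common choice for labelling the sentiment of short texts such as posts or headlines. The sketch below is a minimal from-scratch version with add-alpha (Laplace) smoothing, assuming lowercase whitespace tokenization and a tiny hypothetical training set. It is not the authors' pipeline, which also combines news and historical prices.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes text classifier with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: list[str], labels: list[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class label -> token counts
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def _log_posterior(self, tokens: list[str], c: str) -> float:
        # log P(c) + sum of log P(token | c), smoothed over the whole vocabulary
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n_docs)
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, doc: str) -> str:
        tokens = doc.lower().split()
        return max(self.class_counts, key=lambda c: self._log_posterior(tokens, c))


if __name__ == "__main__":
    # Hypothetical labelled headlines; real training data would be far larger.
    texts = ["shares surge on strong earnings", "record profit and strong growth",
             "shares fall after weak outlook", "lawsuit and weak sales hit stock"]
    labels = ["positive", "positive", "negative", "negative"]
    model = MultinomialNB().fit(texts, labels)
    print(model.predict("strong earnings growth"))  # positive
```

Working in log space avoids numeric underflow when many token probabilities are multiplied together.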

PeNGaRoo, a combined gradient boosting and ensemble learning framework for predicting non-classical secreted proteins

Bioinformatics ◽

10.1093/bioinformatics/btz629 ◽

2019 ◽

Author(s):

Yanju Zhang ◽

Sha Yu ◽

Ruopeng Xie ◽

Jiahui Li ◽

André Leier ◽

...

Keyword(s):

Machine Learning ◽

Secreted Proteins ◽

Gradient Boosting ◽

Ensemble Model ◽

Gram Positive ◽

Gram Positive Bacteria ◽

Single Feature ◽

Gradient Boosting Machine ◽

Benchmark Datasets ◽

Feature Based

Motivation: Gram-positive bacteria have developed secretion systems to transport proteins across their cell wall, a process that plays an important role during host infection. These secretion mechanisms have also been harnessed for therapeutic purposes in many biotechnology applications. Accordingly, the identification of features that select a protein for efficient secretion from these microorganisms has become an important task. Among all the secreted proteins, ‘non-classical’ secreted proteins are difficult to identify as they lack discernable signal peptide sequences and can make use of diverse secretion pathways. Currently, several computational methods have been developed to facilitate the discovery of such non-classical secreted proteins; however, the existing methods are based on either simulated or limited experimental datasets. In addition, they often employ basic features to train the models in a simple and coarse-grained manner. The availability of more experimentally validated datasets, advanced feature engineering techniques and novel machine learning approaches creates new opportunities for the development of improved predictors of ‘non-classical’ secreted proteins from sequence data.

Results: In this work, we first constructed a high-quality dataset of experimentally verified ‘non-classical’ secreted proteins, which we then used to create benchmark datasets. Using these benchmark datasets, we comprehensively analyzed a wide range of features and assessed their individual performance. Subsequently, we developed a two-layer Light Gradient Boosting Machine (LightGBM) ensemble model that integrates several single feature-based models into an overall prediction framework. At this stage, LightGBM, a gradient boosting machine, was used as a machine learning approach and the necessary parameter optimization was performed by a particle swarm optimization strategy.
All single feature-based LightGBM models were then integrated into a unified ensemble model to further improve predictive performance. Consequently, the final ensemble model achieved superior performance, with an accuracy of 0.900, an F-value of 0.903, a Matthews correlation coefficient of 0.803 and an area under the curve value of 0.963, outperforming previous state-of-the-art predictors on the independent test. Based on our proposed optimal ensemble model, we further developed an accessible online predictor, PeNGaRoo, to serve users’ demands. We believe this online web server, together with our proposed methodology, will expedite the discovery of non-classically secreted effector proteins in Gram-positive bacteria and further inspire the development of next-generation predictors.

Availability and implementation: http://pengaroo.erc.monash.edu/.

Supplementary information: Supplementary data are available at Bioinformatics online.
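
The accuracy, F-value and Matthews correlation coefficient (MCC) reported for PeNGaRoo are standard functions of a binary confusion matrix. The sketch below computes them from true-positive, false-positive, true-negative and false-negative counts using the textbook formulas; the function name and the counts in the example are hypothetical and unrelated to PeNGaRoo's independent test set.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, F-value (F1) and Matthews correlation coefficient from counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 if undefined
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": (tp + tn) / total, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts for a balanced 100-protein test set.
    print(binary_metrics(tp=45, fp=5, tn=45, fn=5))
```

For these counts the function gives accuracy 0.9, F1 0.9 and MCC 0.8. MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, so it is not directly comparable to accuracy.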

Classification of hazelnut cultivars: comparison of DL4J and ensemble learning algorithms

Notulae Botanicae Horti Agrobotanici Cluj-Napoca ◽

10.15835/nbha48412041 ◽

2020 ◽

Vol 48 (4) ◽

pp. 2316-2327

Author(s):

Caner KOC ◽

Dilara GERDAN ◽

Maksut B. EMİNOĞLU ◽

Uğur YEGÜL ◽

Bulent KOC ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Random Forest ◽

Ensemble Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Performance Criteria ◽

Gradient Boosting ◽

Data Set

Classification of hazelnuts is one of the value-adding processes that increase the marketability and profitability of their production. While traditional classification methods are commonly used, machine learning and deep learning can be implemented to enhance hazelnut classification. This paper presents the results of a comparative study of machine learning frameworks to classify hazelnut (Corylus avellana L.) cultivars (‘Sivri’, ‘Kara’, ‘Tombul’) using DL4J and ensemble learning algorithms. For each cultivar, 50 samples were used for evaluation. The maximum length, width, compression strength, and weight of the hazelnuts were measured using a caliper and a force transducer. Gradient boosting machine (boosting), random forest (bagging), and DL4J feedforward (deep learning) algorithms were applied and compared. The data set was partitioned using 10-fold cross-validation. The classifier performance criteria of accuracy (%), error percentage (%), F-measure, Cohen’s kappa, recall, precision, true positive (TP), false positive (FP), true negative (TN), and false negative (FN) values are provided in the results section. The results showed classification accuracies of 94% for Gradient Boosting, 100% for Random Forest, and 94% for DL4J Feedforward algorithms.
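
Two of the evaluation steps named above, 10-fold cross-validation and Cohen's kappa, can be sketched in a few lines of Python. This is an illustrative sketch rather than the study's tooling; the optional shuffling seed, the function names, and the example labels are assumptions. With 50 samples per cultivar (150 in total) and k = 10, each test fold holds 15 samples.

```python
import random
from collections import Counter
from typing import Hashable, Iterator, Optional, Sequence


def k_fold_indices(n: int, k: int = 10,
                   seed: Optional[int] = None) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists for k folds over n samples."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    order = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(order)  # mix class-sorted rows across folds
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)  # spread any remainder
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        yield train, test
        start += size


def cohens_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Agreement between true and predicted labels beyond chance agreement."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_freq, pred_freq = Counter(y_true), Counter(y_pred)
    # Chance agreement: products of the marginal class frequencies, summed.
    expected = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    if expected == 1.0:
        return float("nan")  # one class everywhere: kappa is undefined
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # 3 cultivars x 50 samples = 150 rows, as described in the abstract.
    folds = list(k_fold_indices(150, k=10, seed=0))
    print([len(test) for _, test in folds])
    # Hypothetical true and predicted cultivar labels for one test fold.
    y_true = ["Sivri", "Sivri", "Kara", "Kara", "Tombul", "Tombul"]
    y_pred = ["Sivri", "Sivri", "Kara", "Tombul", "Tombul", "Tombul"]
    print(round(cohens_kappa(y_true, y_pred), 3))  # 0.75
```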

Development of machine learning model for diagnostic disease prediction based on laboratory tests

Scientific Reports ◽

10.1038/s41598-021-87171-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Dong Jin Park ◽

Min Woo Park ◽

Homin Lee ◽

Young-Jin Kim ◽

Yeongsic Kim ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Laboratory Tests ◽

Gradient Boosting ◽

Disease Prediction ◽

Ensemble Model ◽

Test Results ◽

Laboratory Test Results ◽

Classification Of Diseases

The use of deep learning and machine learning (ML) in medical science is increasing, particularly in the visual, audio, and language data fields. We aimed to build a new optimized ensemble model by blending a DNN (deep neural network) model with two ML models for disease prediction using laboratory test results. A total of 86 attributes (laboratory tests) were selected from the datasets based on value counts, clinical importance-related features, and missing values. We collected sample datasets covering 5145 cases, including 326,686 laboratory test results. We investigated a total of 39 specific diseases based on International Classification of Diseases, 10th revision (ICD-10) codes. These datasets were used to construct light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) ML models and a DNN model using TensorFlow. The optimized ensemble model achieved an F1-score of 81% and a prediction accuracy of 92% for the five most common diseases. The deep learning and ML models showed differences in predictive power and disease classification patterns. We used a confusion matrix and analyzed feature importance using the SHAP value method. Our new ML model achieved high efficiency in disease prediction through classification of diseases. This study will be useful in the prediction and diagnosis of diseases.
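
The abstract describes blending a DNN with LightGBM and XGBoost models but does not give the blending rule. One common option, shown here purely as an assumption, is a weighted average of each model's class probabilities followed by an argmax. The weights and probability values are hypothetical, and macro-averaged F1 is likewise an assumption, since the abstract does not say how its F1-score is averaged across diseases.

```python
from typing import Sequence


def blend_probabilities(model_probs: Sequence[Sequence[Sequence[float]]],
                        weights: Sequence[float]) -> list[list[float]]:
    """Weighted average of per-class probabilities from several models.

    model_probs[m][i][c] is model m's probability that sample i has class c.
    """
    if not weights or len(model_probs) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    n_samples, n_classes = len(model_probs[0]), len(model_probs[0][0])
    blended = []
    for i in range(n_samples):
        row = [sum(w * probs[i][c] for w, probs in zip(weights, model_probs)) / total
               for c in range(n_classes)]
        blended.append(row)
    return blended


def argmax_labels(probs: Sequence[Sequence[float]]) -> list[int]:
    """Index of the most probable class for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical class probabilities (3 samples, 2 diseases) from each model.
    dnn = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]
    lgbm = [[0.6, 0.4], [0.6, 0.4], [0.3, 0.7]]
    xgb = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
    blended = blend_probabilities([dnn, lgbm, xgb], weights=[0.4, 0.3, 0.3])
    y_pred = argmax_labels(blended)
    print(y_pred, macro_f1([0, 1, 1], y_pred))
```

Soft voting of this kind lets a confident model outweigh two uncertain ones; in practice the weights would be tuned on validation data rather than fixed by hand.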

Predicting the 10-year risk of cataract surgery using machine learning techniques on questionnaire data: findings from the 45 and Up Study

British Journal of Ophthalmology ◽

10.1136/bjophthalmol-2020-318609 ◽

2021 ◽

pp. bjophthalmol-2020-318609

Author(s):

Wei Wang ◽

Xiaotong Han ◽

Jiaqing Zhang ◽

Xianwen Shang ◽

Jason Ha ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Cataract Surgery ◽

Logistic Model ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Questionnaire Data ◽

Gradient Boosting Machine ◽

Logistic Regression Method ◽

Baseline Information

Background/aims: To investigate the feasibility and accuracy of using machine learning (ML) techniques on self-reported questionnaire data to predict the 10-year risk of cataract surgery, and to identify meaningful predictors of cataract surgery in middle-aged and older Australians.

Methods: Baseline information regarding demographic, socioeconomic, medical history and family history, lifestyle, dietary and self-rated health status was collected as risk factors. Cataract surgery events were confirmed by the Medicare Benefits Schedule Claims dataset. Three ML algorithms (random forests [RF], gradient boosting machine and deep learning) and one traditional regression algorithm (logistic model) were compared on the accuracy of their predictions for the risk of cataract surgery. The performance was assessed using 10-fold cross-validation. The main outcome measures were areas under the receiver operating characteristic curves (AUCs).

Results: In total, 207 573 participants, aged 45 years and above without a history of cataract surgery at baseline, were recruited from the 45 and Up Study. The performance of gradient boosting machine (AUC 0.790, 95% CI 0.785 to 0.795), RF (AUC 0.785, 95% CI 0.780 to 0.790) and deep learning (AUC 0.781, 95% CI 0.775 to 0.786) was robust and outperformed the traditional logistic regression method (AUC 0.767, 95% CI 0.762 to 0.773, all p<0.05). Age, self-rated eye vision and health insurance were consistently identified as important predictors in all models.

Conclusions: The study demonstrated that ML modelling was able to reasonably accurately predict the 10-year risk of cataract surgery based on questionnaire data alone and was marginally superior to the conventional logistic model.
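
The study's main outcome measure is the area under the ROC curve. As an illustrative sketch (not the study's code), AUC can be computed from observed outcomes and predicted risks through its rank-sum (Mann-Whitney) formulation. Confidence intervals, p-values and the 10-fold cross-validation loop are left out, and the example values are hypothetical.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve for binary labels (1 = event, 0 = no event).

    Uses the Mann-Whitney U identity: AUC is the probability that a random
    positive receives a higher score than a random negative, ties counting 1/2.
    """
    if len(y_true) != len(scores):
        raise ValueError("labels and scores must have equal length")
    if any(y not in (0, 1) for y in y_true):
        raise ValueError("labels must be 0 or 1")
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    # Assign 1-based ranks to the scores, averaging ranks within tied groups.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical outcomes (1 = had cataract surgery) and predicted risks.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The rank formulation needs only one sort, so it scales to cohorts of the size described above without building the ROC curve point by point.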

Investigating the use of random forest, gradient boosting machine, support vector machine and their ensemble applied to fault detection

10.26678/abcm.cobem2017.cob17-1600 ◽

2017 ◽

Author(s):

Luis Felipe Nogoseke ◽

Gabriel Herman Bernardim Andrade ◽

Marco Boaretto ◽

Leandro Coelho

Keyword(s):

Support Vector Machine ◽

Random Forest ◽

Fault Detection ◽

Gradient Boosting ◽

Support Vector ◽

Gradient Boosting Machine

An Ensemble Model for Short-Term Wind Power Forecasting using Deep Learning and Gradient Boosting Algorithms

2020 21st National Power Systems Conference (NPSC) ◽

10.1109/npsc49263.2020.9331902 ◽

2020 ◽

Author(s):

Devesh Kumar ◽

Rishabh Abhinav ◽

Naran Pindoriya

Keyword(s):

Deep Learning ◽

Wind Power ◽

Gradient Boosting ◽

Ensemble Model ◽

Short Term ◽

Wind Power Forecasting ◽

Boosting Algorithms ◽

Power Forecasting
