An Ensemble of Random Forest Gradient Boosting Machine and Deep Learning Methods for Stock Price Prediction

2022, Vol 15 (1), pp. 1-19
Author(s): Ravinder Kumar, Lokesh Kumar Shrivastav

Stochastic time series analysis of high-frequency stock market data is a very challenging task for analysts because efficient tools and techniques for big data analytics are lacking. This has opened opportunities for developers and researchers to build intelligent, machine-learning-based tools and techniques for data analytics. This paper proposes an ensemble for stock market data prediction using three of the most prominent machine learning techniques. The stock market dataset has a raw size of 39,364 KB with all attributes and a processed size of 11,826 KB containing 872,435 instances. The proposed ensemble model comprises deep learning, Gradient Boosting Machine (GBM) and distributed Random Forest techniques of data analytics. Its performance is compared with that of each individual method, i.e. deep learning, GBM and Random Forest. The ensemble model performs best, achieving the highest accuracy (0.99) and the lowest error (RMSE of 0.1).
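The abstract does not say how the three models' outputs are combined. As a minimal sketch, assuming a simple unweighted average of per-sample predictions (one common ensembling rule, not necessarily the authors'), the combination step and the RMSE used to score it can be written in plain Python. The prices and forecasts below are invented for illustration and are not the paper's data.

```python
import math


def average_ensemble(*model_predictions):
    """Combine several models' predictions by taking the per-sample mean.

    Each argument is a list with one prediction per sample from one model
    (e.g. deep learning, GBM, random forest).
    """
    if not model_predictions:
        raise ValueError("need predictions from at least one model")
    n = len(model_predictions[0])
    if any(len(p) != n for p in model_predictions):
        raise ValueError("all prediction lists must have the same length")
    k = len(model_predictions)
    return [sum(p[i] for p in model_predictions) / k for i in range(n)]


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical closing prices and per-model forecasts (not the paper's data).
y_true = [10.0, 11.0, 12.0]
dl = [10.2, 11.1, 11.8]
gbm = [9.9, 10.8, 12.3]
rf = [10.0, 11.3, 12.1]

ensemble = average_ensemble(dl, gbm, rf)
for name, preds in [("DL", dl), ("GBM", gbm), ("RF", rf), ("Ensemble", ensemble)]:
    print(f"{name:8s} RMSE = {rmse(y_true, preds):.3f}")
```

With these made-up numbers the individual models' errors partly cancel, so the averaged forecast scores a lower RMSE than any single model. Averaging only helps in this way when the models' errors are not strongly correlated; a weighted or stacked combination is a common alternative.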

Machine learning algorithms are an effective means of obtaining results, especially from big data, and numerous applications can produce such outcomes. Among them, Random Forest (RF), Gradient Boosting Machine (GBM) and Decision Tree (DT) algorithms implemented in Python can give higher accuracy in classifying the various parameters of airline passenger satisfaction levels. Airline passenger records provide a large volume of data that can be interpreted through different satisfaction parameters, and an algorithm is needed to classify these data accurately. Some conventional techniques provide lower precision and risk cancelling out or missing information, so RF and GBM were used to overcome this uncertainty and improve accuracy on the data provided. The aim of this study is to identify an algorithm suitable for classifying the satisfaction levels of airline passengers through data analytics in Python, given known outputs. Optimizing the independent variables and training and testing the models for accuracy on the Python platform revealed the variation among the parameters and identified RF and GBM as better algorithms than the other classification algorithms compared.


2021, Vol 7, pp. e476
Author(s): Pooja Mehta, Sharnil Pandya, Ketan Kotecha

Information gathering has become an integral part of assessing people’s behaviors and actions. The Internet serves as an online venue for sharing and exchanging ideas, and people actively give reviews and recommendations for a variety of products and services on popular social sites and personal blogs. Social networking sites such as Twitter, Facebook and Google+ are examples of sites used to share opinions. The stock market (SM) is an essential area of the economy and plays a significant role in the development of trade and industry. Predicting SM movements is a well-known area of interest to researchers. Social networking closely reflects the public’s views of current affairs. Financial news stories are thought to affect stock returns, and many data mining techniques are used to address fluctuations in the SM. Machine learning can provide a more accurate and robust approach to SM-related predictions. We sought to identify how movements in a company’s stock prices correlate with the expressed opinions (sentiments) of the public about that company. We designed and implemented a stock price prediction tool that takes public sentiment into account in addition to other parameters. The proposed algorithm considers public sentiment, opinions, news and historical stock prices to forecast future stock prices. Our experiments used machine-learning and deep-learning methods including Support Vector Machine, the MNB classifier, linear regression, Naïve Bayes and Long Short-Term Memory. Our results validate the success of the proposed methodology.
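The correlation between public sentiment and a company's price movements can be quantified in several ways. A minimal sketch, assuming sentiment has already been reduced to one score per day (for example by the classifiers named in the abstract) and assuming a one-day lag between a day's sentiment and the following day's return, is a Pearson correlation in plain Python. The sentiment scores and prices are invented for illustration.

```python
import math


def daily_returns(prices):
    """Simple returns r_t = (p_{t+1} - p_t) / p_t for consecutive closing prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical data: mean sentiment of posts about one company on days 0..4
# (scaled to [-1, 1]) and the company's closing prices on days 0..5.
sentiment = [0.6, -0.2, 0.1, 0.8, -0.5]
closes = [100.0, 102.0, 101.0, 101.5, 104.0, 102.0]

# returns[t] is the change from day t to day t+1, so sentiment[t] is paired
# with the next day's return (the one-day lag is an assumption).
returns = daily_returns(closes)
print(round(pearson(sentiment, returns), 3))
```

A correlation near +1 would suggest that positive sentiment tends to precede price rises; on real data the lag, the aggregation window and the sentiment scale all need to be chosen and validated.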


Author(s): Yanju Zhang, Sha Yu, Ruopeng Xie, Jiahui Li, André Leier, ...

Motivation: Gram-positive bacteria have developed secretion systems to transport proteins across their cell wall, a process that plays an important role during host infection. These secretion mechanisms have also been harnessed for therapeutic purposes in many biotechnology applications. Accordingly, identifying the features that select a protein for efficient secretion from these microorganisms has become an important task. Among all the secreted proteins, ‘non-classical’ secreted proteins are difficult to identify because they lack discernible signal peptide sequences and can make use of diverse secretion pathways. Several computational methods have been developed to facilitate the discovery of such non-classical secreted proteins; however, the existing methods are based on either simulated or limited experimental datasets. In addition, they often employ basic features to train the models in a simple and coarse-grained manner. The availability of more experimentally validated datasets, advanced feature engineering techniques and novel machine learning approaches creates new opportunities for the development of improved predictors of ‘non-classical’ secreted proteins from sequence data. Results: In this work, we first constructed a high-quality dataset of experimentally verified ‘non-classical’ secreted proteins, which we then used to create benchmark datasets. Using these benchmark datasets, we comprehensively analyzed a wide range of features and assessed their individual performance. Subsequently, we developed a two-layer Light Gradient Boosting Machine (LightGBM) ensemble model that integrates several single feature-based models into an overall prediction framework. At this stage, LightGBM, a gradient boosting machine, was used as the machine learning approach, and the necessary parameter optimization was performed using a particle swarm optimization strategy. All single feature-based LightGBM models were then integrated into a unified ensemble model to further improve the predictive performance. Consequently, the final ensemble model achieved superior performance, with an accuracy of 0.900, an F-value of 0.903, a Matthews correlation coefficient of 0.803 and an area under the curve value of 0.963, outperforming previous state-of-the-art predictors on the independent test. Based on our proposed optimal ensemble model, we further developed an accessible online predictor, PeNGaRoo, to serve users’ demands. We believe this online web server, together with our proposed methodology, will expedite the discovery of non-classically secreted effector proteins in Gram-positive bacteria and further inspire the development of next-generation predictors. Availability and implementation: http://pengaroo.erc.monash.edu/. Supplementary information: Supplementary data are available at Bioinformatics online.
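The abstract states that LightGBM parameters were tuned by particle swarm optimization but gives no details. The following is a minimal global-best PSO sketch in plain Python; the inertia and acceleration constants, swarm size and iteration count are common textbook choices assumed here, not the authors' settings. In the paper's setting the objective would be a cross-validated error of a LightGBM model for given hyperparameters; here it is replaced by a hypothetical quadratic so the sketch runs without LightGBM.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best particle swarm.

    bounds: list of (low, high) pairs, one per dimension.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in for "validation error as a function of two
# hyperparameters" (think learning rate and a leaf count treated as continuous).
def stand_in_error(p):
    return (p[0] - 0.1) ** 2 + ((p[1] - 31.0) / 10.0) ** 2


best, best_val = pso_minimize(stand_in_error, [(0.01, 0.3), (8.0, 128.0)])
print(best, best_val)
```

Integer hyperparameters such as a leaf count would need rounding inside the objective, and each real evaluation would mean training and validating a model, so swarm size and iteration count trade search quality against compute.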


2020, Vol 48 (4), pp. 2316-2327
Author(s): Caner KOC, Dilara GERDAN, Maksut B. EMİNOĞLU, Uğur YEGÜL, Bulent KOC, ...

Classification of hazelnuts is one of the value-adding processes that increase the marketability and profitability of their production. While traditional classification methods are commonly used, machine learning and deep learning can be applied to enhance hazelnut classification. This paper presents the results of a comparative study of machine learning frameworks for classifying hazelnut (Corylus avellana L.) cultivars (‘Sivri’, ‘Kara’, ‘Tombul’) using DL4J and ensemble learning algorithms. Fifty samples of each cultivar were used for evaluation. The maximum length, width, compression strength and weight of the hazelnuts were measured using a caliper and a force transducer. Gradient boosting machine (boosting), random forest (bagging) and DL4J feedforward (deep learning) algorithms were applied as the machine learning classifiers. The data set was partitioned using 10-fold cross-validation. The classifier performance criteria of accuracy (%), error percentage (%), F-measure, Cohen’s kappa, recall, precision, true positive (TP), false positive (FP), true negative (TN) and false negative (FN) values are provided in the results section. The results showed classification accuracies of 94% for Gradient Boosting, 100% for Random Forest and 94% for DL4J Feedforward.
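Most of the criteria listed above derive from per-class TP, FP, TN and FN counts. A minimal plain-Python sketch for a multi-class problem such as the three cultivars follows, using one-vs-rest counts per class and the usual chance-corrected definition of Cohen's kappa. The six labels are invented for illustration, not the study's measurements.

```python
from collections import Counter


def one_vs_rest_counts(y_true, y_pred, positive):
    """TP, FP, TN, FN for one class, treating every other class as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, tn, fn


def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure (their harmonic mean); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_freq, pred_freq = Counter(y_true), Counter(y_pred)
    expected = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six hazelnuts (not the study's data).
y_true = ["Sivri", "Sivri", "Kara", "Kara", "Tombul", "Tombul"]
y_pred = ["Sivri", "Kara", "Kara", "Kara", "Tombul", "Tombul"]

for cultivar in ("Sivri", "Kara", "Tombul"):
    tp, fp, tn, fn = one_vs_rest_counts(y_true, y_pred, cultivar)
    print(cultivar, (tp, fp, tn, fn), precision_recall_f(tp, fp, fn))
print("kappa:", cohen_kappa(y_true, y_pred))
```

Accuracy is the fraction of matching labels and the error percentage is its complement; kappa is lower than accuracy because it discounts the agreement expected from the label frequencies alone.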


2021, Vol 11 (1)
Author(s): Dong Jin Park, Min Woo Park, Homin Lee, Young-Jin Kim, Yeongsic Kim, ...

The use of deep learning and machine learning (ML) in medical science is increasing, particularly in the visual, audio and language data fields. We aimed to build a new optimized ensemble model by blending a DNN (deep neural network) model with two ML models for disease prediction using laboratory test results. Eighty-six attributes (laboratory tests) were selected from the datasets based on value counts, clinical importance-related features and missing values. We collected sample datasets on 5145 cases, including 326,686 laboratory test results. We investigated a total of 39 specific diseases based on International Classification of Diseases, 10th revision (ICD-10) codes. These datasets were used to construct light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) ML models and a DNN model using TensorFlow. The optimized ensemble model achieved an F1-score of 81% and a prediction accuracy of 92% for the five most common diseases. The deep learning and ML models showed differences in predictive power and disease classification patterns. We used a confusion matrix and analyzed feature importance using the SHAP value method. Our new ML model achieved highly efficient disease prediction through classification of diseases. This study will be useful in the prediction and diagnosis of diseases.
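The abstract does not specify how the DNN, LightGBM and XGBoost outputs were blended or how the blend was optimized. One common approach, sketched here as an assumption, is a weighted average of the models' class probabilities with non-negative weights summing to 1, chosen by grid search to maximize accuracy on a validation set. The probabilities below are invented; in practice they would come from the three trained models.

```python
from itertools import product


def blend(prob_lists, weights):
    """Weighted average of several models' class-probability rows, sample by sample."""
    n_samples, n_classes = len(prob_lists[0]), len(prob_lists[0][0])
    return [
        [sum(w * probs[i][c] for w, probs in zip(weights, prob_lists)) for c in range(n_classes)]
        for i in range(n_samples)
    ]


def accuracy(prob_rows, labels):
    """Fraction of samples whose highest-probability class equals the label."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in prob_rows]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def search_blend_weights(prob_lists, labels, step=0.1):
    """Grid-search non-negative weights that sum to 1 and maximize validation accuracy."""
    ticks = round(1 / step)
    best_weights, best_acc = None, -1.0
    for head in product(range(ticks + 1), repeat=len(prob_lists) - 1):
        if sum(head) > ticks:
            continue  # the first weights alone would already exceed 1
        weights = [c / ticks for c in head] + [(ticks - sum(head)) / ticks]
        acc = accuracy(blend(prob_lists, weights), labels)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Hypothetical validation-set probabilities for two classes from three models.
labels = [0, 1, 0, 1]
dnn = [[0.6, 0.4], [0.4, 0.6], [0.4, 0.6], [0.45, 0.55]]
lgbm = [[0.4, 0.6], [0.3, 0.7], [0.9, 0.1], [0.6, 0.4]]
xgb = [[0.7, 0.3], [0.6, 0.4], [0.6, 0.4], [0.2, 0.8]]

weights, acc = search_blend_weights([dnn, lgbm, xgb], labels)
print("single models:", [accuracy(m, labels) for m in (dnn, lgbm, xgb)])
print("blend weights:", weights, "accuracy:", acc)
```

Because the weights here are tuned and scored on the same four samples, the blended accuracy is optimistic; in practice the search would run on a validation split and the chosen weights would be checked on a separate test set. The study reports F1 as well as accuracy, and either could serve as the search criterion.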


2021, pp. bjophthalmol-2020-318609
Author(s): Wei Wang, Xiaotong Han, Jiaqing Zhang, Xianwen Shang, Jason Ha, ...

Background/aims: To investigate the feasibility and accuracy of using machine learning (ML) techniques on self-reported questionnaire data to predict the 10-year risk of cataract surgery, and to identify meaningful predictors of cataract surgery in middle-aged and older Australians. Methods: Baseline information on demographics, socioeconomic status, medical history, family history, lifestyle, diet and self-rated health status was collected as risk factors. Cataract surgery events were confirmed by the Medicare Benefits Schedule Claims dataset. Three ML algorithms (random forests [RF], gradient boosting machine and deep learning) and one traditional regression algorithm (logistic model) were compared on the accuracy of their predictions of the risk of cataract surgery. Performance was assessed using 10-fold cross-validation. The main outcome measures were areas under the receiver operating characteristic curves (AUCs). Results: In total, 207 573 participants aged 45 years and above without a history of cataract surgery at baseline were recruited from the 45 and Up Study. The performance of gradient boosting machine (AUC 0.790, 95% CI 0.785 to 0.795), RF (AUC 0.785, 95% CI 0.780 to 0.790) and deep learning (AUC 0.781, 95% CI 0.775 to 0.786) was robust and outperformed the traditional logistic regression method (AUC 0.767, 95% CI 0.762 to 0.773, all p<0.05). Age, self-rated eye vision and health insurance were consistently identified as important predictors in all models. Conclusions: The study demonstrated that ML modelling was able to predict the 10-year risk of cataract surgery reasonably accurately based on questionnaire data alone and was marginally superior to the conventional logistic model.
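The evaluation protocol described above, 10-fold cross-validation scored by AUC, can be sketched in plain Python. The shuffled, non-stratified folds and the fixed seed are assumptions, and the AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties count one half), which equals the area under the ROC curve. The labels and risk scores are invented for illustration.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1, split into k nearly equal folds, yield (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test


def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = cataract surgery within 10 years) and model risk scores.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.9]
print("AUC:", roc_auc(labels, scores))

# In a full run, a model would be fitted on each `train` split and scored on `test`.
folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])  # 25 samples -> fold sizes of 2 or 3
```

The pairwise AUC is quadratic in the number of cases, which is fine for a sketch but not for a cohort of this size; a rank-based computation runs in O(n log n) and gives the same value.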

