Predicting bitcoin price movements using sentiment analysis: a machine learning approach

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Ikhlaas Gurrib ◽  
Firuz Kamalov

Purpose Cryptocurrencies such as Bitcoin (BTC) have attracted a lot of attention in recent months due to their unprecedented price fluctuations. This paper aims to propose a new method for predicting the direction of BTC price using linear discriminant analysis (LDA) together with sentiment analysis. Design/methodology/approach Concretely, the authors train an LDA-based classifier that uses the current BTC price information and the headlines of BTC news announcements to forecast the next-day direction of BTC prices. The authors compare the results with a support vector machine (SVM) model and a random-guess approach. The use of BTC price information and crypto-related news announcements makes it possible to assess the importance of these different sources and types of information. Findings Relative to the LDA results, the SVM model was more accurate in predicting the next day’s BTC price movement. All models yielded better forecasts of an increase in tomorrow’s BTC price than of a decrease. The inclusion of news sentiment resulted in the highest forecast accuracy of 0.585 on the test data, which is superior to a random guess. The LDA (SVM) model with asset-specific (news sentiment and asset-specific) input features ranked first within its respective model class, suggesting that both BTC news sentiment and asset-specific information are valuable factors in predicting tomorrow’s price direction. Originality/value To the best of the authors’ knowledge, this is the first study to analyze the potential effect of crypto-related sentiment and BTC-specific news on BTC’s price using LDA and sentiment analysis.
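As a rough illustration of the kind of classifier described in this abstract, the sketch below fits a two-class LDA on two hypothetical features per day (a daily return and a headline sentiment score) and predicts the next-day direction. The feature choice, the toy numbers, the equal-prior threshold and all function names are assumptions made for illustration; the abstract does not specify the authors' actual inputs or implementation.

```python
from statistics import mean

def direction_labels(prices):
    """Label day t with 1 if the price is higher on day t+1, else 0."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(prices, prices[1:])]

def fit_lda_2d(X, y):
    """Fit a two-class LDA on 2-D features; returns (weights, threshold)."""
    groups = {c: [x for x, label in zip(X, y) if label == c] for c in (0, 1)}
    mu = {c: (mean(p[0] for p in g), mean(p[1] for p in g))
          for c, g in groups.items()}
    # Pooled within-class scatter matrix [[a, b], [b, d]].
    a = b = d = 0.0
    for c, g in groups.items():
        for p in g:
            dx, dy = p[0] - mu[c][0], p[1] - mu[c][1]
            a += dx * dx
            b += dx * dy
            d += dy * dy
    det = a * d - b * b
    m0, m1 = mu[0], mu[1]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # w = S^{-1} (mu1 - mu0), using the closed-form inverse of a 2x2 matrix.
    w = ((d * diff[0] - b * diff[1]) / det,
         (-b * diff[0] + a * diff[1]) / det)
    # Assumed equal class priors: threshold at the midpoint of the projected means.
    thr = 0.5 * (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1]))
    return w, thr

def predict(w, thr, x):
    """Return 1 (price up) if the projection exceeds the threshold, else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] > thr else 0

# Hypothetical toy data: (daily return, headline sentiment score) per day.
X = [(-0.02, -0.5), (-0.01, -0.2), (-0.03, -0.4),
     (0.01, 0.3), (0.02, 0.6), (0.03, 0.2)]
y = [0, 0, 0, 1, 1, 1]
w, thr = fit_lda_2d(X, y)
predictions = [predict(w, thr, x) for x in X]
```

On real data the labels would come from `direction_labels` applied to a price series, and accuracy would be measured on held-out days rather than on the training set as in this toy example.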

Kybernetes ◽  
2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Shilpa B L ◽  
Shambhavi B R

Purpose Stock market forecasters are focusing on creating a positive approach for predicting stock prices. The fundamental principle of effective stock market prediction is not only to produce the maximum outcomes but also to reduce unreliable stock price estimates. In the stock market, sentiment analysis enables people to make educated decisions regarding investment in a business. Moreover, stock analysis identifies the business of an organization or a company. In fact, the prediction of stock prices is complex because of their highly volatile nature, which varies with a wide range of factors such as investor sentiment, economic and political factors and changes in leadership. Prediction often becomes ineffective when only the historical data or only the textual information is considered. Attempts are therefore made to make the prediction more precise by using the news sentiment along with the stock price information. Design/methodology/approach This paper introduces a prediction framework via sentiment analysis, in which both the stock data and the news sentiment data are considered. From the stock data, technical indicator-based features like moving average convergence divergence (MACD), relative strength index (RSI) and moving average (MA) are extracted. At the same time, the news data are processed to determine the sentiments through (1) pre-processing, where keyword extraction and sentiment categorization take place; (2) keyword extraction, where WordNet and sentiment categorization are applied; (3) feature extraction, where the proposed holoentropy-based features are extracted; and (4) classification, where a deep neural network (NN) returns the sentiment output. To make the system more accurate in predicting the sentiment, the NN is trained by the self-improved whale optimization algorithm (SIWOA).
Finally, an optimized deep belief network (DBN) is used to predict the stock price from the features of the stock data and the sentiment results from the news data. Here, the weights of the DBN are tuned by the new SIWOA. Findings The performance of the adopted scheme is compared with the existing models in terms of certain measures. The stock dataset covers two companies, Reliance Communications and Relaxo Footwear. Each company has three datasets: (a) in daily option, set start day 1-1-2019 and end day 1-12-2020, (b) in monthly option, set start Jan 2000 and end Dec 2020 and (c) in yearly option, set year 2000. The adopted NN + DBN + SIWOA model was compared with traditional classifiers like LSTM, NN + RF, NN + MLP and NN + SVM, and with existing optimization algorithms like NN + DBN + MFO, NN + DBN + CSA, NN + DBN + WOA and NN + DBN + PSO. Further, the performance was calculated at learning percentages of 60, 70, 80 and 90 in terms of measures like MAE, MSE and RMSE for the six datasets. On observing the graph, the MAE of the adopted NN + DBN + SIWOA model was 91.67, 80, 91.11 and 93.33% superior to the existing classifiers like LSTM, NN + RF, NN + MLP and NN + SVM, respectively, for dataset 1. The proposed NN + DBN + SIWOA method holds the minimum MAE value (∼0.21) at learning percentage 80 for dataset 1, whereas the existing optimization-based models hold the values NN + DBN + CSA (∼1.20), NN + DBN + MFO (∼1.21), NN + DBN + PSO (∼0.23) and NN + DBN + WOA (∼0.25), respectively. From the table, it was clear that the RMSRE of the proposed NN + DBN + SIWOA model was 3.14, 1.08, 1.38 and 15.28% better than the existing classifiers like LSTM, NN + RF, NN + MLP and NN + SVM, respectively, for dataset 6.
In addition, the MSE of the adopted NN + DBN + SIWOA method attained lower values (∼54944.41) for dataset 2 than other existing schemes like NN + DBN + CSA (∼9.43), NN + DBN + MFO (∼56728.68), NN + DBN + PSO (∼2.95) and NN + DBN + WOA (∼56767.88), respectively. Originality/value This paper has introduced a prediction framework via sentiment analysis, in which news sentiment data were considered along with the stock data. From the stock data, technical indicator-based features like MACD, RSI and MA are extracted. The proposed work is therefore said to be well suited for stock market prediction.
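The technical indicators named in this abstract (MA, MACD and RSI) can be sketched in a few lines. The version below is a simplified illustration, not the authors' feature pipeline: the 12/26 EMA spans, the 14-day RSI period, the first-value EMA seed and the use of plain sums of gains and losses (rather than Wilder smoothing) are all conventional defaults assumed here.

```python
def sma(values, window):
    """Simple moving average; one value per full window."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1).

    Seeded with the first observation (an assumption; other seeds are common).
    """
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd(values, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA, element by element."""
    return [f - s for f, s in zip(ema(values, fast), ema(values, slow))]

def rsi(values, period=14):
    """RSI over the last `period` price changes, using plain gain/loss sums."""
    changes = [b - a for a, b in zip(values, values[1:])][-period:]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if losses == 0:
        return 100.0  # no losing days in the window
    rs = gains / losses
    return 100 - 100 / (1 + rs)
```

Each function takes a plain list of closing prices, so the outputs can be lined up by date and joined with a per-day sentiment score to form the kind of combined feature vector the abstract describes.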


2020 ◽  
Vol 92 (3) ◽  
pp. 502-518 ◽  
Author(s):  
Seyed Amin Bagherzadeh

Purpose This paper aims to propose a nonlinear model for aeroelastic aircraft that can predict the flight parameters throughout the investigated flight envelopes. Design/methodology/approach A system identification method based on the support vector machine (SVM) is developed and applied to the nonlinear dynamics of an aeroelastic aircraft. In the proposed non-parametric gray-box method, force and moment coefficients are estimated based on the state variables, flight conditions and control commands. Then, flight parameters are estimated using aircraft equations of motion. Nonlinear system identification is performed using the SVM network by minimizing errors between the calculated and estimated force and moment coefficients. To that end, a least squares algorithm is used as the training rule to optimize the generalization bound given for the regression. Findings The results confirm that the SVM is successful at aircraft system identification. The precision of the SVM model is preserved when the models are excited by input commands different from the training ones. Also, the generalization of the SVM model is acceptable at non-trained flight conditions that lie within the range of the trained flight conditions. Considering the precision and generalization of the model, the results indicate that the SVM is more successful than well-known methods such as artificial neural networks. Practical implications In this paper, both the simulated and real flight data of the F/A-18 aircraft are used to provide aeroelastic models for its lateral-directional dynamics. Originality/value This paper proposes a non-parametric system identification method for aeroelastic aircraft based on the SVM method for the first time. To the best of the author’s knowledge, the SVM has not been used for aircraft system identification or aircraft parameter estimation until now.


Kybernetes ◽  
2018 ◽  
Vol 47 (5) ◽  
pp. 957-984 ◽  
Author(s):  
Sajjad Tofighy ◽  
Seyed Mostafa Fakhrahmad

Purpose This paper aims to propose a statistical and context-aware feature reduction algorithm that improves sentiment classification accuracy. Classifying reviews of different granularities into two classes, with negative and positive polarities, is among the objectives of sentiment analysis. One of the major issues in sentiment analysis is feature engineering, because it severely affects the time complexity and accuracy of sentiment classification. Design/methodology/approach In this paper, a feature reduction method is proposed that uses context-based knowledge as well as synset statistical knowledge. To do so, a one-dimensional presentation proposed for SentiWordNet calculates statistical knowledge that involves the polarity concentration and variation tendency of each synset. Feature reduction involves two phases. In the first phase, features that meet both semantic and statistical similarity conditions are put in the same cluster. In the second phase, features are ranked, and the features given lower ranks are eliminated. The experiments are conducted with support vector machine (SVM), naive Bayes (NB), decision tree (DT) and k-nearest neighbors (KNN) algorithms to classify the vectors of unigram and bigram features into two classes of positive or negative sentiment. Findings The results showed that the applied clustering algorithm reduces the SentiWordNet synsets to less than half, which reduced the size of the feature vector by less than half. In addition, the accuracy of sentiment classification is improved by at least 1.5 per cent. Originality/value The presented feature reduction method is the first use of synset clustering for feature reduction. The proposed feature reduction algorithm first aggregates similar features into clusters and then eliminates unsatisfactory clusters.


Author(s):  
M. Chelabi ◽  
T. Hacib ◽  
Z. Belli ◽  
M. R. Mekideche ◽  
Y. Le Bihan

Purpose – Eddy current testing (ECT) is a nondestructive testing method for the detection of flaws that uses electromagnetic induction to find defects in conductive materials. In this method, eddy currents are generated in a conductive material by a changing magnetic field. A defect is detected when there is a disruption in the flow of the eddy current. The purpose of this paper is to develop a new noniterative inversion methodology for detecting degradation (defect characterization) such as cracking, corrosion and erosion from the measurement of the impedance variations. Design/methodology/approach – The methodology is based on multi-output support vector machines (SVM) combined with the adaptive database schema design method (SDM). The forward problem was solved numerically using the finite element method (FEM), with its accuracy experimentally verified. The multi-output SVM is a statistical learning method that has good generalization capability and learning performance. FEM is used to create the adaptive database required to train the multi-output SVM, and a genetic algorithm is used to tune the parameters of the multi-output SVM model. Findings – The results show the applicability of the multi-output SVM to solving eddy current inverse problems in place of traditional iterative inversion methods, which can be very time-consuming. With the experimental results, the authors demonstrate the accuracy that the multi-output SVM technique can provide. Practical implications – The work extends the capability of the experimental ECT defect characterization system developed at LGEP. Originality/value – A new inversion method is developed and applied to ECT defect characterization. This new concept introduces the multi-output SVM in the context of ECT. The real data and the estimates obtained by the multi-output SVM model are compared to evaluate the effectiveness of the developed technique.


Kybernetes ◽  
2019 ◽  
Vol 49 (10) ◽  
pp. 2547-2567 ◽  
Author(s):  
Himanshu Sharma ◽  
Anu G. Aggarwal

Purpose The experiential nature of travel and tourism services has popularized the importance of electronic word-of-mouth (EWOM) among potential customers. EWOM has a significant influence on the hotel booking intention of customers, as they tend to trust EWOM more than the messages spread by marketers. Amid the abundant reviews available online, it becomes difficult for travelers to identify the most significant ones. This calls the credibility of reviewers into question, as various online businesses allow reviewers to post their feedback using a nickname or email address rather than a real name, photo or other personal information. Therefore, this study aims to determine the factors leading to reviewer credibility. Design/methodology/approach The paper proposes an econometric model to determine the variables that affect reviewer credibility in the hospitality and tourism sector. The proposed model uses quantifiable variables of reviewers and reviews to estimate reviewer credibility, defined as the ratio of the number of helpful votes received by a reviewer to the total number of reviews written by that reviewer. This covers both aspects of source credibility, i.e. trustworthiness and expertness. The authors have used a data set from TripAdvisor.com to validate the models. Findings Regression analysis significantly validated the econometric models proposed here. To check the predictive efficiency of the models, predictive modeling is performed using five commonly used classifiers: random forest (RF), linear discriminant analysis, k-nearest neighbor, decision tree and support vector machine. RF gave the best accuracy for the overall model. Practical implications The findings of this research paper suggest various implications for hoteliers and managers to help retain credible reviewers in the online travel community. This will help them to achieve long-term relationships with clients and increase their trust in the brand. Originality/value To the best of the authors’ knowledge, this study applies an econometric modeling approach to find the determinants of reviewer credibility, which has not been done in previous studies. Moreover, the study contrasts with earlier works by treating reviewer credibility as an endogenous variable rather than an exogenous one.
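The credibility measure defined in this abstract (helpful votes received divided by total reviews written) is simple enough to state directly. The snippet below is only a sketch of that ratio; the reviewer names, the counts and the choice to return `None` for reviewers with no reviews are hypothetical and not taken from the study.

```python
def reviewer_credibility(helpful_votes, total_reviews):
    """Helpful votes received per review written; None when no reviews exist."""
    if total_reviews <= 0:
        return None
    return helpful_votes / total_reviews

# Hypothetical reviewers: name -> (helpful votes received, reviews written).
reviewers = {"rev_a": (30, 12), "rev_b": (5, 10), "rev_c": (0, 0)}
scores = {name: reviewer_credibility(h, n) for name, (h, n) in reviewers.items()}

# Rank the reviewers that have a defined score, most credible first.
ranked = sorted((name for name, s in scores.items() if s is not None),
                key=lambda name: scores[name], reverse=True)
```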


2016 ◽  
Vol 28 (1) ◽  
pp. 65-76 ◽  
Author(s):  
Xudong Sun ◽  
Mingxing Zhou ◽  
Yize Sun

Purpose – The purpose of this paper is to develop near infrared (NIR) techniques coupled with multivariate calibration methods to rapidly measure cotton content in blend fabrics. Design/methodology/approach – In total, 124 and 41 samples were used to calibrate the models and to assess their performance, respectively. The multivariate calibration methods of partial least squares (PLS), extreme learning machine (ELM) and least squares support vector machine (LS-SVM) were employed to develop the models. By comparing the performance of the PLS, ELM and LS-SVM models on new samples, the optimal model of cotton content was obtained with the LS-SVM model. The correlation coefficient of prediction (r_p) and the root mean square error of prediction were 0.98 and 4.50 percent, respectively. Findings – The results suggest that the NIR technique combined with the LS-SVM method has significant potential for quantitatively analyzing cotton content in blend fabrics. Originality/value – The approach may have commercial and regulatory potential as a way to avoid time-consuming, costly and laborious chemical analysis of cotton content in blend fabrics.
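The two figures of merit quoted in this abstract, the correlation coefficient of prediction and the root mean square error of prediction (RMSEP), can be computed from a held-out prediction set as sketched below. The function names are illustrative and no sample values from the study are reproduced here.

```python
from math import sqrt

def rmsep(actual, predicted):
    """Root mean square error of prediction over a held-out sample set."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)
```

With cotton content expressed in percent, `rmsep` is in the same units as the reference values, which is why the abstract reports it as a percentage.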


2018 ◽  
Vol 16 (3) ◽  
pp. 385-397 ◽  
Author(s):  
Ralph Olusola Aluko ◽  
Emmanuel Itodo Daniel ◽  
Olalekan Shamsideen Oshodi ◽  
Clinton Ohis Aigbavboa ◽  
Abiodun Olatunji Abisuga

Purpose In recent years, there has been a tremendous increase in the number of applicants seeking placements in undergraduate architecture programs. It is important during the selection phase of admission at universities to identify new intakes who possess the capability to succeed. The admission variable (i.e. prior academic achievement) is one of the most important criteria considered during the selection process. This paper aims to investigate the efficacy of using data mining techniques to predict the academic performance of architecture students based on information contained in prior academic achievement. Design/methodology/approach The input variables, i.e. prior academic achievement, were extracted from students’ academic records. Logistic regression and support vector machine (SVM) are the data mining techniques adopted in this study. The collected data were divided into two parts. The first part was used for training the models, while the other part was used to evaluate the predictive accuracy of the developed models. Findings The results revealed that the SVM model outperformed the logistic regression model in terms of accuracy. Taken together, it is evident that prior academic achievement is a good predictor of the academic performance of architecture students. Research limitations/implications Although the factors affecting the academic performance of students are numerous, the present study focuses on the effect of prior academic achievement on the academic performance of architecture students. Originality/value The developed SVM model can be used as a decision-making tool for selecting new intakes into the architecture program at Nigerian universities.


2019 ◽  
Vol 37 (1) ◽  
pp. 161-180
Author(s):  
Min Hao ◽  
Guangyuan Liu ◽  
Desheng Xie ◽  
Ming Ye ◽  
Jing Cai

Purpose Happiness is an important emotion, and yet it is becoming a major health concern nowadays. For this reason, an objective understanding of how humans respond to event-related observations in their daily lives is especially important. Design/methodology/approach This paper uses a non-intrusive technology (hyperspectral imaging [HSI]) for happiness recognition. An experimental setup is used for data collection in real-life environments, where observers show spontaneous expressions of emotions (calm, happy, unhappy: angry) during the experimental process. Based on facial imaging captured with HSI, this work builds an emotional database, named SWU Happiness DB, and studies whether a physiological signal (i.e. tissue oxygen saturation [StO2], obtained by an optical absorption model) can be used to recognize observer happiness automatically. It proposes a novel method to capture local dynamic patterns (LDP) in facial regions, introducing local variations in facial StO2 to fully use physiological characteristics with regard to hyperspectral patterns. Further, it applies a linear discriminant analysis-based support vector machine to recognize happiness patterns. Findings The results show that the best classification accuracy is 97.89 per cent, objectively demonstrating a feasible application of LDP features to happiness recognition. Originality/value This paper proposes a novel feature (i.e. LDP) to represent the local variations in facial StO2 for modeling active happiness. It provides a possible extension to promising practical applications.


Author(s):  
Hendri Murfi ◽  
Furida Lusi Siagian ◽  
Yudi Satria

Purpose The purpose of this paper is to analyze topics as alternative features for sentiment analysis in Indonesian tweets. Design/methodology/approach Given Indonesian tweets, the process of sentiment analysis starts by extracting features from the tweets. The features are words or topics. The authors use non-negative matrix factorization to extract the topics and apply a support vector machine to classify the tweets into their sentiment classes. Findings The authors analyze the accuracy using two-class and three-class sentiment analysis data sets. Both data sets are about sentiments toward candidates in the Indonesian presidential election. The experiments show that the standard word features give better accuracies than the topic features for the two-class sentiment analysis. Moreover, the topic features can slightly improve the accuracy of the standard word features. The topic features can also improve the accuracy of the standard word features for the three-class sentiment analysis. Originality/value The standard textual data representation for sentiment analysis using machine learning is the bag of words and its extensions, mainly created by natural language processing. This paper applies topics as novel features for machine learning-based sentiment analysis in Indonesian tweets.


2017 ◽  
Vol 4 (1) ◽  
pp. 56-74 ◽  
Author(s):  
Abinash Tripathy ◽  
Santanu Kumar Rath

Sentiment analysis helps to determine the hidden intention of the author of any topic and provides an evaluation report on the polarity of any document. The polarity may be positive, negative or neutral. Very often, the data associated with sentiment analysis consist of feedback given by various specialists on a topic or product. The reviews may therefore be categorized into classes based on their polarity, in order to gain a good knowledge of the product. This article proposes an approach to classify review datasets into different polarity groups on the basis of sentiment analysis. Four machine learning algorithms, viz. Naive Bayes (NB), Support Vector Machine (SVM), Random Forest and Linear Discriminant Analysis (LDA), have been considered in this paper for the classification process. The accuracy values obtained by the algorithms are critically examined using different performance parameters, applied on two different datasets.

