Real-Time Sentiment Analysis of Twitter Streaming Data for Stock Prediction

2018 ◽  
Vol 132 ◽  
pp. 956-964 ◽  
Author(s):  
Sushree Das ◽  
Ranjan Kumar Behera ◽  
Mukesh Kumar ◽  
Santanu Kumar Rath
Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Xiongwei Zhang ◽  
Hager Saleh ◽  
Eman M. G. Younis ◽  
Radhya Sahal ◽  
Abdelmgeid A. Ali

Twitter is a virtual social network where people share posts and opinions about current events, such as the coronavirus pandemic. It is considered the most significant streaming data source for machine learning research on analysis, prediction, knowledge extraction, and opinion mining. Sentiment analysis is a text analysis method that has gained further significance with the emergence of social networks. This paper therefore introduces a real-time system for sentiment prediction on Twitter streaming data for tweets about the coronavirus pandemic. The proposed system aims to find the machine learning model that achieves the best performance for coronavirus sentiment prediction and then to use that model in real time. The system has two components: an offline sentiment analysis component and an online prediction pipeline. For the offline component, a dataset of historical tweets was collected between 23/01/2020 and 01/06/2020 and filtered by the #COVID-19 and #Coronavirus hashtags. Two feature extraction methods for textual data, n-gram and TF-IDF, were used to extract the dataset's essential features. Then, five standard machine learning algorithms were applied and compared to select the best model for the online prediction component: decision tree, logistic regression, k-nearest neighbors, random forest (RF), and support vector machine. The online prediction pipeline was developed using the Twitter Streaming API, Apache Kafka, and Apache Spark. The experimental results indicate that the RF model with unigram feature extraction achieved the best performance, and it is used for sentiment prediction on Twitter streaming data about the coronavirus.
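
The two feature extraction methods named in this abstract, n-grams and TF-IDF (term frequency, inverse document frequency), can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the whitespace tokenizer, the unsmoothed weighting formula, and the toy tweets are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real tweet cleaning."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Compute TF-IDF weights per document over n-gram terms.

    tf  = term count / number of terms in the document
    idf = log(N / df), where df is the number of documents containing the term
    (Assumed plain formulation; libraries often add smoothing.)
    """
    term_lists = [ngrams(tokenize(d), n) for d in docs]
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))  # count each term once per document
    total_docs = len(docs)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        length = len(terms) or 1  # avoid division by zero for empty documents
        weights.append({
            t: (c / length) * math.log(total_docs / df[t])
            for t, c in counts.items()
        })
    return weights


# Hypothetical toy tweets, not taken from the paper's dataset.
tweets = [
    "stay safe stay home",
    "vaccine news is good news",
    "stay positive good vibes",
]
unigram_weights = tfidf(tweets, n=1)
bigram_weights = tfidf(tweets, n=2)
```

Each resulting dictionary maps a term to its weight in one tweet; a term that appears in every document gets weight zero under this formulation. Such vectors would then be fed to the classifiers being compared.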


2020 ◽  
Vol 26 (9) ◽  
pp. 1128-1147
Author(s):  
Ranjan Behera ◽  
Sushree Das ◽  
Santanu Rath ◽  
Sanjay Misra ◽  
Robertas Damasevicius

Stock prediction is one of the emerging applications in the field of data science, helping companies to make better decision strategies. Machine learning models play a vital role in the field of prediction. In this paper, we propose various machine learning models that predict the stock price from real-time streaming data. Streaming data is a potential source for real-time prediction; it involves a continuous flow of data carrying information from various sources such as social networking websites, server logs, mobile phone applications, and trading floors. We have adopted the distributed platform Spark to analyze the streaming data collected from two different sources, represented as two case studies in this paper. The first case study is based on stock prediction from historical data collected from Google Finance websites through Node.js, and the second is based on sentiment analysis of Twitter data collected through the Twitter API available in the Stanford NLP package. Several studies have developed models for stock prediction based on static data. In this work, an effort has been made to develop scalable, fault-tolerant models for stock prediction from real-time streaming data. The proposed model is based on a distributed architecture known as the Lambda architecture. An extensive comparison is made between actual and predicted output for different machine learning models. Support vector regression is found to have better accuracy than the other models. The historical data is treated as ground truth for validation.
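
The Lambda architecture mentioned in this abstract combines a batch layer over historical data, a speed layer over recent events, and a serving layer that merges both views. The abstract does not describe the authors' implementation, so the following is only a toy sketch of the general pattern; the class names, the mean-price query, and the sample symbol are hypothetical.

```python
from collections import defaultdict


class BatchLayer:
    """Recomputes a view from the full, immutable historical dataset."""

    def __init__(self, records):
        self.records = list(records)  # (symbol, price) pairs

    def build_view(self):
        view = defaultdict(lambda: [0.0, 0])  # symbol -> [price_sum, count]
        for symbol, price in self.records:
            view[symbol][0] += price
            view[symbol][1] += 1
        return dict(view)


class SpeedLayer:
    """Keeps an incremental view of events that arrived after the last batch run."""

    def __init__(self):
        self.view = defaultdict(lambda: [0.0, 0])

    def ingest(self, symbol, price):
        self.view[symbol][0] += price
        self.view[symbol][1] += 1


def serve_mean_price(symbol, batch_view, speed_view):
    """Serving layer: merge the batch and real-time views to answer a query."""
    b_sum, b_count = batch_view.get(symbol, (0.0, 0))
    s_sum, s_count = speed_view.get(symbol, (0.0, 0))
    count = b_count + s_count
    return (b_sum + s_sum) / count if count else None


# Hypothetical data: two historical prices and one freshly streamed price.
batch_view = BatchLayer([("ACME", 10.0), ("ACME", 12.0)]).build_view()
speed = SpeedLayer()
speed.ingest("ACME", 14.0)
mean_price = serve_mean_price("ACME", batch_view, speed.view)
```

Storing sums and counts rather than means keeps the two views mergeable, which is what lets the serving layer answer from both without reprocessing the history.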


2019 ◽  
Vol 23 (1) ◽  
pp. 346-357
Author(s):  
Vithya G ◽  
Naren J ◽  
Varun V

2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Suppawong Tuarob ◽  
Poom Wettayakorn ◽  
Ponpat Phetchai ◽  
Siripong Traivijitkhun ◽  
Sunghoon Lim ◽  
...  

Abstract: The explosion of online information with the recent advent of digital technology in information processing, information storing, information sharing, natural language processing, and text mining techniques has enabled stock investors to uncover market movement and volatility from heterogeneous content. For example, a typical stock market investor reads the news, explores market sentiment, and analyzes technical details in order to make a sound decision prior to purchasing or selling a particular company’s stock. However, capturing a dynamic stock market trend is challenging owing to high fluctuation and the non-stationary nature of the stock market. Although existing studies have attempted to enhance stock prediction, few have provided a complete decision-support system for investors to retrieve real-time data from multiple sources and extract insightful information for sound decision-making. To address the above challenge, we propose a unified solution for data collection, analysis, and visualization in real-time stock market prediction to retrieve and process relevant financial data from news articles, social media, and company technical information. We aim to provide not only useful information for stock investors but also meaningful visualization that enables investors to effectively interpret storyline events affecting stock prices. Specifically, we utilize an ensemble stacking of diversified machine-learning-based estimators and innovative contextual feature engineering to predict the next day’s stock prices. Experiment results show that our proposed stock forecasting method outperforms a traditional baseline with an average mean absolute percentage error of 0.93. Our findings confirm that leveraging an ensemble scheme of machine learning methods with contextual information improves stock prediction performance. 
Finally, our study could be further extended to a wide variety of innovative financial applications that seek to incorporate external insight from contextual information such as large-scale online news articles and social media data.
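
The evaluation metric named in this abstract, mean absolute percentage error (MAPE), can be computed as in the short sketch below. The abstract does not say whether its reported value is a percentage or a fraction; this sketch assumes the common percentage convention, and the sample prices are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent (assumed convention)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical closing prices and next-day forecasts.
error = mape([100.0, 200.0], [110.0, 180.0])
```

Here both forecasts are off by 10% of the actual value, so the average error is 10%.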


Algorithms ◽  
2019 ◽  
Vol 12 (2) ◽  
pp. 37 ◽  
Author(s):  
Zhigang Hu ◽  
Hui Kang ◽  
Meiguang Zheng

A distributed data stream processing system must handle real-time, variable, and bursty streaming data loads. Elastic resource allocation has become a fundamental and challenging problem, because a fixed strategy results in either wasted resources or reduced QoS (quality of service). Spark Streaming is an emerging system developed to process real-time stream data analytics using a micro-batch approach. In this paper, first, we propose an improved SVR (support vector regression) based stream data load prediction scheme. Then, we design a Spark-based maximum sustainable throughput of time window (MSTW) performance model to find the optimized number of virtual machines. Finally, we present a resource scaling algorithm, TWRES (time window resource elasticity scaling algorithm), with the MSTW constraint and streaming data load prediction. The evaluation results show that TWRES could improve resource utilization and mitigate SLA (service level agreement) violations.
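
The idea of sizing a cluster from a predicted load and a sustainable throughput can be illustrated with a deliberately simple rule. This is not the TWRES algorithm or the MSTW model, whose details the abstract does not give; it assumes throughput scales linearly with the number of virtual machines, and every parameter name and bound is hypothetical.

```python
import math


def required_vms(predicted_load, throughput_per_vm, min_vms=1, max_vms=32):
    """Smallest VM count whose combined sustainable throughput covers the load.

    Assumptions: predicted_load and throughput_per_vm are both measured in
    records per time window, and throughput grows linearly with VM count.
    """
    if throughput_per_vm <= 0:
        raise ValueError("throughput_per_vm must be positive")
    needed = math.ceil(predicted_load / throughput_per_vm)
    # Clamp to the allowed cluster size.
    return max(min_vms, min(max_vms, needed))


# Hypothetical forecast: 4500 records per window, 1000 records per VM per window.
vm_count = required_vms(4500, 1000)
```

A real scaler would also need to account for scaling delay and non-linear throughput, which is presumably part of what a performance model such as MSTW is designed to capture.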


The purpose of this work is to develop a UJSON web technology with a C# application to analyze student data in real time. It executes continuous queries on JSON streaming data using advanced technologies for parallel stream computing, suitable for solving analytic problems and calculating metrics in real time. The management information system developed in this research is designed for filtering an event flow, building an event flow as a query result, grouping and aggregating events, and creating window semantics. To test the proposed work, several queries were selected that implement aggregation with different types of semantic windows (Steps, Slides). Testing was done locally and on educational Moodle clusters. Four configurations were used, with 2, 4, 8, and 16 computing nodes. The results show noticeable scalability as the number of nodes increases. The updated functions of the proposed UJSON could improve the construction of parallel flow systems and data processing. The approach is based on modern, advanced parallel flow technologies for output calculations and takes into account the pros and cons of the various existing approaches.
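
Window-based aggregation of an event stream, as described in this abstract, might look like the sketch below. It is written in Python rather than the C# used in the paper, and it reads the "Steps" and "Slides" window types as tumbling (non-overlapping) and sliding (overlapping) windows, which is an assumption. Timestamps are assumed to be non-negative integers, and the events are hypothetical.

```python
def tumbling_windows(events, size):
    """Sum (timestamp, value) events over non-overlapping windows of `size`.

    Returns {window_start: sum_of_values}.
    """
    sums = {}
    for ts, value in events:
        start = (ts // size) * size
        sums[start] = sums.get(start, 0) + value
    return sums


def sliding_windows(events, size, slide):
    """Sum events over overlapping windows of `size` that advance by `slide`.

    Window starts are non-negative multiples of `slide`; an event at time ts
    is counted in every window [start, start + size) that contains it.
    """
    sums = {}
    for ts, value in events:
        start = (ts // slide) * slide  # latest window start at or before ts
        while start > ts - size and start >= 0:
            sums[start] = sums.get(start, 0) + value
            start -= slide
    return sums


# Hypothetical event stream of (timestamp, value) pairs.
events = [(0, 1), (1, 2), (4, 3), (5, 4), (9, 5)]
tumbling = tumbling_windows(events, size=4)
sliding = sliding_windows(events, size=4, slide=2)
```

With these events, the tumbling result has one sum per 4-unit block, while the sliding result also contains the intermediate windows starting at 2 and 6.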


2021 ◽  
Vol 1 (2) ◽  
pp. 9-15
Author(s):  
V Mareeswari ◽  
Sunita S Patil ◽  
Ramanan G

Sentiment analysis is becoming a field of increasing focus, since user experience weighs heavily both in business growth and in research. Sentimental expressions refer to a person's emotions or feelings about a particular topic or issue. In this project, with the assistance of the Apache Spark framework, an open-source data streaming and processing platform, sentiment evaluation is performed on tweets from Twitter through real-time processing as well as an ad hoc run. The textual data was preprocessed for better feature extraction, resulting in greater accuracy. Validation was carried out by comparing the results with those of other processes when the Naive Bayes algorithm is used.
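
A Naive Bayes sentiment classifier of the kind named in this abstract can be sketched in plain Python. This is a generic multinomial Naive Bayes with add-one smoothing, not the authors' Spark-based implementation; the whitespace tokenizer and the tiny training set are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled tweets.
train_texts = [
    "great service love it",
    "love this phone",
    "terrible battery hate it",
    "hate the delay",
]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("love the phone")
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.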


Author(s):  
Wanting Zhao ◽  
Fan Wu ◽  
Zhongqi Fu ◽  
Zesen Wang ◽  
Xiaoqi Zhang
