Stock market prediction using machine learning classifiers and social media, news

Author(s):  
Wasiat Khan ◽  
Mustansar Ali Ghazanfar ◽  
Muhammad Awais Azam ◽  
Amin Karami ◽  
Khaled H. Alyoubi ◽  
...  
2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Suppawong Tuarob ◽  
Poom Wettayakorn ◽  
Ponpat Phetchai ◽  
Siripong Traivijitkhun ◽  
Sunghoon Lim ◽  
...  

Abstract: The explosion of online information with the recent advent of digital technology in information processing, information storing, information sharing, natural language processing, and text mining techniques has enabled stock investors to uncover market movement and volatility from heterogeneous content. For example, a typical stock market investor reads the news, explores market sentiment, and analyzes technical details in order to make a sound decision prior to purchasing or selling a particular company’s stock. However, capturing a dynamic stock market trend is challenging owing to high fluctuation and the non-stationary nature of the stock market. Although existing studies have attempted to enhance stock prediction, few have provided a complete decision-support system for investors to retrieve real-time data from multiple sources and extract insightful information for sound decision-making. To address the above challenge, we propose a unified solution for data collection, analysis, and visualization in real-time stock market prediction to retrieve and process relevant financial data from news articles, social media, and company technical information. We aim to provide not only useful information for stock investors but also meaningful visualization that enables investors to effectively interpret storyline events affecting stock prices. Specifically, we utilize an ensemble stacking of diversified machine-learning-based estimators and innovative contextual feature engineering to predict the next day’s stock prices. Experiment results show that our proposed stock forecasting method outperforms a traditional baseline with an average mean absolute percentage error of 0.93. Our findings confirm that leveraging an ensemble scheme of machine learning methods with contextual information improves stock prediction performance. Finally, our study could be further extended to a wide variety of innovative financial applications that seek to incorporate external insight from contextual information such as large-scale online news articles and social media data.
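The abstract above reports its result as a mean absolute percentage error (MAPE). As a minimal sketch of how that metric is usually computed, the snippet below averages the absolute relative errors and scales the result to percent. The abstract does not say whether its 0.93 figure is a percentage or a fraction, so the percent scaling here is an assumption. The function name `mape` and the sample prices are hypothetical and are not taken from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent (assumed scaling)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Average of |actual - predicted| / |actual| over all observations.
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical next-day closing prices, for illustration only.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 102.0, 104.0]
print(round(mape(actual, predicted), 3))
```

A lower value means the forecasts track the actual prices more closely. The metric is undefined when an actual value is zero, which is why the sketch rejects such inputs.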


2021 ◽  
Author(s):  
Zhaoxia Wang ◽  
Zhenda HU ◽  
Fang LI ◽  
Seng-Beng HO

Abstract: Stock market trending analysis is one of the key research topics in financial analysis. Various theories once highlighted the non-viability of stock market prediction. With the advent of machine learning and Artificial Intelligence (AI), more and more effort has been devoted to this research area, and predicting the stock market has been demonstrated to be possible. Learning-based methods have been widely studied for stock price prediction. However, due to the dynamic nature of the stock market and its non-linearity, stock market prediction is still one of the most difficult tasks. With the rise of social networks, a huge amount of data is generated every day, and incorporating these data into prediction models to enhance prediction performance is gaining popularity. Therefore, this paper explores the viability of learning-based stock market trending prediction by incorporating social media sentiment analysis. Six machine learning methods, namely Multi-Layer Perceptron, Support Vector Machine, Naïve Bayes, Random Forest, Logistic Regression, and Extreme Gradient Boosting, are selected as the baseline models. The results indicate that successful stock market trending prediction is possible, and the performance of the different learning-based methods is discussed. It is discovered that the distribution of the value of stocks may affect the prediction performance of the methods involved. This research not only demonstrates the merits and weaknesses of different learning-based methods, but also points out that incorporating social opinion is the right direction for improving the performance of stock market trending prediction.


2021 ◽  
Vol 7 ◽  
pp. e775
Author(s):  
Malik Daler Ali Awan ◽  
Nadeem Iqbal Kajla ◽  
Amnah Firdous ◽  
Mujtaba Husnain ◽  
Malik Muhammad Saad Missen

The real-time availability of the Internet has engaged millions of users around the world. Regional languages are increasingly preferred for effective and easy communication, which produces multilingual data on social networks and news channels. People share ideas, opinions, and events happening globally (e.g., sports, inflation, protests, explosions, and sexual assault) in regional (local) languages on social media. Extraction and classification of events from multilingual data have become bottlenecks because of a lack of resources. In this research paper, we present the event classification task for Urdu-language text on social media and news channels using machine learning classifiers. The dataset contains more than 0.1 million (102,962) labeled instances of twelve (12) different types of events. The title, its length, and the last four words of a sentence are used as features to classify the events. Term Frequency-Inverse Document Frequency (tf-idf) showed the best results as a feature vector for evaluating the performance of six popular machine learning classifiers. Random Forest (RF) and K-Nearest Neighbor (KNN) outperformed the other classifiers, achieving 98.00% and 99.00% accuracy, respectively. The novelty lies in the fact that, to the best of our knowledge, the aforementioned features have not previously been applied to event extraction from text written in the Urdu language.
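The features named in the abstract above (the title, its length, the last four words of a sentence, and a tf-idf vector) can be illustrated with a small sketch. This is not the authors' pipeline: the function names are hypothetical, tokenization is plain whitespace splitting, title length is assumed to be counted in tokens, and the tf-idf variant (term frequency times log(N / df)) is one common textbook form chosen for illustration.

```python
import math
from collections import Counter


def last_n_words(sentence, n=4):
    """Return the last n whitespace-separated tokens of a sentence."""
    return sentence.split()[-n:]


def surface_features(title, sentence):
    """Title, title length in tokens (assumed unit), and last four words."""
    return {
        "title": title,
        "title_length": len(title.split()),
        "last_four_words": last_n_words(sentence, 4),
    }


def tfidf(documents):
    """Per-document tf-idf weights using tf * log(N / df)."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical English stand-ins for Urdu text, for illustration only.
features = surface_features("flood hits city", "heavy rain caused a flood in the city")
vectors = tfidf(["flood hits city", "protest hits capital"])
print(features["last_four_words"], sorted(vectors[0]))
```

With this variant, a term that appears in every document gets weight zero, so only terms that distinguish one event description from another contribute to the vector passed to a classifier.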


2021 ◽  
Vol 58 (1) ◽  
pp. 1932-1939
Author(s):  
Alim Al Ayub Ahmed et al.

The Internet is an important invention with a large number of users, who use it for different purposes. Various social media platforms are accessible to these users. Any user can make a post or spread news through these online platforms, which do not verify users or their posts. As a result, some users try to spread fake news through them. Such fake news can be propaganda against an individual, society, organization, or political party. A human being is unable to detect all of this fake news, so there is a need for machine learning classifiers that can detect it automatically. This systematic literature review describes the use of machine learning classifiers for detecting fake news.


2020 ◽  
Vol 29 (6) ◽  
pp. 629-656
Author(s):  
Anatoliy Gruzd ◽  
Priya Kumar ◽  
Deena Abul-Fottouh ◽  
Caroline Haythornthwaite

Abstract: As social media become a staple for knowledge discovery and sharing, questions arise about how self-organizing communities manage learning outside the domain of organized, authority-led institutions. Yet examination of such communities is challenged by the quantity of posts and variety of media now used for learning. This paper addresses the challenges of identifying (1) what information, communication, and discursive practices support successful online communities, (2) whether such practices are similar on Twitter and Reddit, and (3) whether machine learning classifiers can be successfully used to analyze larger datasets of learning exchanges. This paper builds on earlier work that used manual coding of learning and exchange in Reddit ‘Ask’ communities to derive a coding schema we refer to as ‘learning in the wild’. This schema consists of eight categories: explanation with disagreement, agreement, or neutral presentation; socializing with negative or positive intent; information seeking; providing resources; and comments about forum rules and norms. To compare across media, results from coding Reddit’s AskHistorians are compared to results from coding a sample of #Twitterstorians tweets (n = 594). High agreement between coders affirmed the applicability of the coding schema to this different medium. LIWC lexicon-based text analysis was used to build machine learning classifiers and apply these to code a larger dataset of tweets (n = 69,101). This research shows that the ‘learning in the wild’ coding schema holds across at least two different platforms, and is partially scalable to study larger online learning communities.


2020 ◽  
Vol 16 (1) ◽  
pp. 67
Author(s):  
Minghua Jia ◽  
Xiaodong Wang ◽  
Yue Xu ◽  
Zhanqi Cui ◽  
Ruilin Xie
