Text‑Mining in Streams of Textual Data Using Time Series Applied to Stock Market

Author(s):  
Pavel Netolický ◽  
Jonáš Petrovský ◽  
František Dařena

Each day, large amounts of text data are generated. These data come from various sources and may contain valuable information. In this article, we use text mining methods to discover whether there is a connection between news articles and changes in the S&P 500 stock index. The index values and documents were divided into time windows according to the direction of the index value changes. We achieved a classification accuracy of 65–74%.
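The general setup described in this abstract can be sketched as follows. This is a minimal illustration with synthetic headlines and invented index deltas, not the study's data or its exact method; the labeling rule (up vs. down window) and the bag-of-words classifier are assumptions for demonstration only.

```python
# Sketch: label each window's news by the direction of the index change,
# then train a TF-IDF bag-of-words classifier on the labeled documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["fed raises rates markets fall", "strong earnings lift shares",
        "recession fears hit stocks", "tech rally pushes index higher",
        "profit warning sinks market", "record sales boost optimism"]
index_change = [-0.8, +1.2, -1.5, +0.9, -1.1, +0.7]   # invented window deltas
labels = [1 if d > 0 else 0 for d in index_change]     # 1 = up window, 0 = down

X = TfidfVectorizer().fit_transform(docs)              # documents -> TF-IDF vectors
clf = LogisticRegression().fit(X, labels)              # direction classifier
preds = clf.predict(X)
```

In the actual study the labels would come from the S&P 500 windows and the evaluation would use held-out time periods, not training data.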

Author(s):  
Masaomi Kimura

Text mining has been growing, mainly due to the need to extract useful information from vast amounts of textual data. Our target here is the free-text data collected from questionnaire responses. Unlike research papers, newspaper articles, call-center logs, and web pages, which are the usual targets of text mining analysis, free-text questionnaire responses have specific characteristics: each piece of data consists of a small number of short sentences, while the wide variety of content precludes the application of the clustering algorithms normally used to classify such data. In this paper, we propose a method to extract opinions expressed by multiple respondents, based on the modification relationships within each sentence of the free-text data. After introducing our approach, we also present several applications of the method.


2019 ◽  
Vol 8 (4) ◽  
pp. 1357-1360

This paper focuses on automatic sarcasm detection, which is crucial for sentiment analysis. The rapid development of automatic speech recognition and text mining, together with the large amounts of available voice and text data, opens a broader way for researchers to develop new methods and improve the accuracy of automatic sarcasm detection. We survey the approaches that have been used to detect sarcasm and the kinds of data and features involved, including the growing use of context to improve detection accuracy. We found that some contextual cues are not reliable without the presence of others, and that some approaches are highly dependent on the dataset. Twitter is the main source researchers mine for sentiment analysis, but in some respects it is flawed because many methods depend on Twitter-specific features, such as hashtags and author history, that are absent from other ordinary text data. In addition, the small amount of research on automatic sarcasm detection from acoustic data, and on its correlation with textual data, presents a new opportunity for sarcasm detection in speech: from acoustic data we can obtain both acoustic and textual features, so sarcasm detection from voice has the potential to reach higher accuracy because two data types can be extracted. By describing each beneficial method, this paper offers a brief guide to sarcasm detection from acoustic and textual data.


2021 ◽  
Vol 18 (2) ◽  
pp. 215
Author(s):  
Dita Afida ◽  
Erika Devi Udayanti ◽  
Etika Kartikadarma

Social media is a service that strongly supports government activities, especially in providing openness and community-based government. One implementation is by the Semarang City government through the Center for Community Complaints Management (P3M), whose task is to manage community complaints arriving through one of its communication channels, the social medium Twitter. The public complaints that arrive every day are highly varied, which makes it quite difficult for managers to categorize complaint reports according to the relevant Local Government Organizations (OPD). This paper focuses on the problem of how to cluster community complaints. The data source is Twitter, using the keyword "Laporhendi". Text documents from community complaint tweets were analyzed with text mining methods. Pre-processing of the text data begins with case folding, tokenizing, stemming, and stopword removal, followed by term weighting with TF-IDF. For cluster mapping, the k-means clustering algorithm is used to divide the complaints into clusters. Cluster results are evaluated with the purity measure to determine the accuracy of the grouping.
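The pipeline this abstract describes (TF-IDF weighting, k-means clustering, purity evaluation) can be sketched as below. The complaint texts and department labels are toy stand-ins, not P3M data, and the stemming and stopword-removal steps are omitted for brevity.

```python
# Sketch: TF-IDF vectorisation (lowercase = case folding), k-means clustering,
# and a purity score against assumed ground-truth department labels.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

tweets = ["broken street light near the market", "street light outage reported",
          "garbage pile not collected", "late garbage collection downtown"]
true_opd = np.array([0, 0, 1, 1])   # assumed labels: 0 = public works, 1 = sanitation

X = TfidfVectorizer(lowercase=True).fit_transform(tweets)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

def purity(y_true, y_pred):
    """Fraction of points assigned to their cluster's majority class."""
    total = sum(np.bincount(y_true[y_pred == c]).max()
                for c in np.unique(y_pred))
    return total / len(y_true)

p = purity(true_opd, km.labels_)
```

Purity is 1.0 when every cluster contains complaints from a single OPD, and approaches the majority-class fraction when clusters are mixed.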


Author(s):  
Jonathan S. Lewis

Text mining presents an efficient, scalable method to separate signal from noise in large-scale text data, and therefore to effectively analyze open-ended survey responses as well as the tremendous amount of text that students, faculty, and staff produce through their interactions online. Traditional qualitative methods are impractical when working with these data, and text mining methods are consonant with the current literature on thematic analysis. This chapter provides a tutorial for researchers new to the method, including a lengthy discussion of preprocessing tasks, knowledge extraction from both supervised and unsupervised activities, potential data sources, and the range of software (both proprietary and open-source) available. Examples of text mining at work are provided throughout the chapter from two studies involving data collected from college students. Limitations of the method and implications for future research and policy are discussed.


Author(s):  
A. Durfee ◽  
A. Visa ◽  
H. Vanharanta ◽  
S. Schneberger ◽  
B. Back

Text documents are the most common means of exchanging formal knowledge among people. Text is a rich medium that can contain a vast range of information, but it can be difficult to decipher automatically. Many organizations have vast repositories of textual data but few means of automatically mining that text. Text mining methods seek to use an understanding of natural language to extract information relevant to user needs. This article evaluates a new text mining methodology, prototype-matching for text clustering, developed by the authors' research group. The methodology was applied to four applications: clustering documents based on their abstracts, analyzing financial data, distinguishing authorship, and evaluating the similarity of multiple translations. The results are discussed in terms of common business applications and possible future research.


2017 ◽  
Vol 13 (21) ◽  
pp. 429
Author(s):  
Nadeem Ur-Rahman

Business Intelligence solutions are key to enabling industrial organisations (whether in manufacturing or construction) to remain competitive in the market. These solutions rely on the analysis of data that is collected, retrieved, and re-used for prediction and classification purposes. However, many sources of industrial data are not fully utilised to improve the business processes of the associated industry. It is generally left to decision makers or managers within a company to take effective decisions based on the information available throughout product design and manufacture, or from the operation of business or production processes. Substantial effort and energy, in terms of time and money, are required to identify and exploit the appropriate information available in the data. Data Mining techniques have long been applied mainly to numerical data from various sources, but their application to semi-structured or unstructured databases is still limited to a few specific domains. Applying these techniques in combination with Text Mining methods, based on statistical, natural language processing, and visualisation techniques, could give beneficial results. Text Mining methods mainly deal with document clustering, text summarisation, and classification, and rely largely on methods and techniques from Information Retrieval (IR). These help to uncover, at an initial level, the hidden information in text documents. This paper investigates applications of Text Mining in terms of Textual Data Mining (TDM) methods, which share techniques from IR and data mining. These techniques may be applied to textual databases in general, but they are demonstrated here on Post Project Reviews (PPR) from the construction industry as a case study. The research focuses on finding key single- or multiple-term phrases for classifying documents into two classes, good-information and bad-information documents, to help decision makers or project managers identify key issues discussed in PPRs, which can be used as a guide for the future project management process.


Author(s):  
Hoang T. P. Thanh ◽  
Phayung Meesad
Predicting the behavior of stock markets is always an interesting topic not only for financial investors but also for scholars and professionals from different fields, because successful prediction can help investors yield significant profits. Previous researchers have shown a strong correlation between financial news and its impact on the movements of stock prices. This paper proposes an approach that uses time series analysis and text mining techniques to predict daily stock market trends. The research is conducted on a database containing stock index prices and news articles collected from Vietnamese websites over three years, from 2010 to 2012. Robust feature selection and a strong machine learning algorithm are able to lift the forecasting accuracy. By combining Linear Support Vector Machine weighting with the Support Vector Machine algorithm, the proposed approach improves prediction accuracy significantly over related research approaches. The results show that a data set represented by 42 features achieves the highest accuracy with one-against-one Support Vector Machines (up to 75%), and that the one-against-one method outperforms the one-against-all method in almost all case studies.
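The combination described above (Linear SVM weights for feature selection, then a one-against-one SVM classifier) can be sketched as follows. The dataset here is synthetic, the 42-feature cutoff simply mirrors the number reported in the abstract, and the exact weighting scheme is an assumption, not the authors' implementation.

```python
# Sketch: rank features by absolute Linear SVM coefficients, keep the top 42,
# then train a one-against-one SVM on the reduced representation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC, SVC
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=200, n_informative=42,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) Feature selection: absolute Linear SVM weights, summed over classes
lsvc = LinearSVC(dual=False).fit(X_tr, y_tr)
weights = np.abs(lsvc.coef_).sum(axis=0)
top = np.argsort(weights)[::-1][:42]        # keep 42 features, as in the abstract

# 2) One-against-one SVM on the selected features
clf = SVC(decision_function_shape="ovo").fit(X_tr[:, top], y_tr)
acc = clf.score(X_te[:, top], y_te)
```

Note that scikit-learn's `SVC` always trains one-against-one classifiers internally for multiclass problems; `decision_function_shape` only controls the shape of the decision values.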


2020 ◽  
Author(s):  
Pathikkumar Patel ◽  
Bhargav Lad ◽  
Jinan Fiaidhi

During the last few years, RNN models have been used extensively and have proven to be well suited to sequence and text data. RNNs have achieved state-of-the-art performance in several applications such as text classification, sequence-to-sequence modelling, and time series forecasting. In this article we review different Machine Learning and Deep Learning approaches for text data and look at the results obtained from these methods. This work also explores the use of transfer learning in NLP and how it affects the performance of models on a specific application, sentiment analysis.


Entropy ◽  
2019 ◽  
Vol 21 (5) ◽  
pp. 455 ◽  
Author(s):  
Hongjun Guan ◽  
Zongli Dai ◽  
Shuang Guan ◽  
Aiwu Zhao

In time series forecasting, information presentation directly affects prediction efficiency. Most existing time series forecasting models follow logical rules derived from the relationships between neighboring states, without considering the inconsistency of fluctuations over a related period. In this paper, we propose a new perspective on the prediction problem, in which inconsistency is quantified and regarded as a key characteristic of prediction rules. First, a time series is converted into a fluctuation time series by comparing each current value with the corresponding previous value. Then, the upward trend of each fluctuation is mapped to the truth-membership of a neutrosophic set, while a falsity-membership is used for the downward trend. The information entropy of the high-order fluctuation time series is introduced to describe the inconsistency of historical fluctuations and is mapped to the indeterminacy-membership of the neutrosophic set. Finally, an existing similarity measure for neutrosophic sets is used to find similar states during the forecasting stage, and a weighted arithmetic averaging (WAA) aggregation operator is applied to obtain the forecasting result according to the corresponding similarity. Compared to existing forecasting models, the neutrosophic forecasting model based on information entropy (NFM-IE) can represent both fluctuation-trend and fluctuation-consistency information. To test its performance, we used the proposed model to forecast several real time series, including the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX), the Shanghai Stock Exchange Composite Index (SHSECI), and the Hang Seng Index (HSI). The experimental results show that the proposed model predicts stably across different datasets, and comparison of the prediction error with other approaches shows that the model has outstanding prediction accuracy and universality.
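The first steps of the construction above (fluctuation series, trend-based truth/falsity memberships, entropy-based indeterminacy) can be sketched as below. The price series is a toy example and the specific mapping functions are assumptions; the paper defines its own membership mappings, which this sketch does not reproduce.

```python
# Sketch: build a fluctuation series, map each fluctuation to truth/falsity
# memberships, and use normalised entropy of high-order fluctuation signs
# as the indeterminacy-membership.
import numpy as np
from collections import Counter

prices = np.array([100.0, 102.0, 101.0, 104.0, 104.0, 103.0, 106.0])
fluct = np.diff(prices)                          # fluctuation time series

std = fluct.std() or 1.0
truth = np.clip(fluct / (2 * std) + 0.5, 0, 1)   # stronger upward move -> higher truth
falsity = 1 - truth                              # downward trend -> falsity

def sign_entropy(window):
    """Shannon entropy of up/flat/down signs within a high-order window."""
    counts = Counter(np.sign(window))
    p = np.array(list(counts.values())) / len(window)
    return -(p * np.log2(p)).sum()

order = 3                                        # "high-order" window length
indeterminacy = [sign_entropy(fluct[i - order:i]) / np.log2(3)   # normalise to [0, 1]
                 for i in range(order, len(fluct) + 1)]
```

Windows whose fluctuations all point the same way get indeterminacy near 0 (consistent history), while mixed up/flat/down windows approach 1 (inconsistent history), which is exactly the information the model feeds into the neutrosophic similarity search.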

