Text‑Mining in Streams of Textual Data Using Time Series Applied to Stock Market

Author(s):  
Pavel Netolický ◽  
Jonáš Petrovský ◽  
František Dařena

Each day, large amounts of text data are generated. These data come from various sources and may contain valuable information. In this article, we use text mining methods to discover whether there is a connection between news articles and changes in the S&P 500 stock index. The index values and documents were divided into time windows according to the direction of the index value changes. We achieved a classification accuracy of 65–74%.
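The general setup described in this abstract can be sketched as follows. This is a minimal illustration with synthetic headlines and invented index deltas, not the study's data or its exact method; the labeling rule (up vs. down window) and the bag-of-words classifier are assumptions for demonstration only.

```python
# Sketch: label each window's news by the direction of the index change,
# then train a TF-IDF bag-of-words classifier on the labeled documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["fed raises rates markets fall", "strong earnings lift shares",
        "recession fears hit stocks", "tech rally pushes index higher",
        "profit warning sinks market", "record sales boost optimism"]
index_change = [-0.8, +1.2, -1.5, +0.9, -1.1, +0.7]   # invented window deltas
labels = [1 if d > 0 else 0 for d in index_change]     # 1 = up window, 0 = down

X = TfidfVectorizer().fit_transform(docs)              # documents -> TF-IDF vectors
clf = LogisticRegression().fit(X, labels)              # direction classifier
preds = clf.predict(X)
```

In the actual study the labels would come from the S&P 500 windows and the evaluation would use held-out time periods, not training data.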

Author(s):  
Masaomi Kimura

Text mining has been growing, mainly due to the need to extract useful information from vast amounts of textual data. Our target here is the free-text data collected from questionnaire responses. Unlike research papers, newspaper articles, call-center logs, and web pages, which are the usual targets of text mining analysis, free-text questionnaire responses have specific characteristics: each piece of data consists of a small number of short sentences, while the wide variety of content precludes the application of the clustering algorithms normally used to classify such data. In this paper, we propose a method to extract opinions expressed by multiple respondents, based on the modification relationships within each sentence of the free-text data. After introducing our approach, we also present several applications of the method.


2019 ◽  
Vol 8 (4) ◽  
pp. 1357-1360

This paper focuses on automatic sarcasm detection, which is crucial for sentiment analysis. The rapid development of automatic speech recognition and text mining, together with the large amounts of available voice and text data, opens a broader way for researchers to develop new methods and improve the accuracy of automatic sarcasm detection. We survey the approaches that have been used to detect sarcasm and the kinds of data and features involved, including the growing use of context to improve detection accuracy. We found that some contextual cues are not reliable without the presence of others, and that some approaches are highly dependent on the dataset. Twitter is the main source researchers mine for sentiment analysis, but in some respects it is flawed because many methods depend on Twitter-specific features, such as hashtags and author history, that are absent from other ordinary text data. In addition, the small amount of research on automatic sarcasm detection from acoustic data, and on its correlation with textual data, presents a new opportunity for sarcasm detection in speech: from acoustic data we can obtain both acoustic and textual features, so sarcasm detection from voice has the potential to reach higher accuracy because two data types can be extracted. By describing each beneficial method, this paper offers a brief guide to sarcasm detection from acoustic and textual data.


2021 ◽  
Vol 18 (2) ◽  
pp. 215
Author(s):  
Dita Afida ◽  
Erika Devi Udayanti ◽  
Etika Kartikadarma

Social media is a service that strongly supports government activities, especially in providing openness and community-based government. One implementation is by the Semarang City government through the Center for Community Complaints Management (P3M), whose task is to manage community complaints arriving through one of its communication channels, the social medium Twitter. The public complaints that arrive every day are highly varied, which makes it quite difficult for managers to categorize complaint reports according to the relevant Local Government Organizations (OPD). This paper focuses on the problem of how to cluster community complaints. The data source is Twitter, using the keyword "Laporhendi". Text documents from community complaint tweets were analyzed with text mining methods. Pre-processing of the text data begins with case folding, tokenizing, stemming, and stopword removal, followed by term weighting with TF-IDF. For cluster mapping, the k-means clustering algorithm is used to divide the complaints into clusters. Cluster results are evaluated with the purity measure to determine the accuracy of the grouping.
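The pipeline this abstract describes (TF-IDF weighting, k-means clustering, purity evaluation) can be sketched as below. The complaint texts and department labels are toy stand-ins, not P3M data, and the stemming and stopword-removal steps are omitted for brevity.

```python
# Sketch: TF-IDF vectorisation (lowercase = case folding), k-means clustering,
# and a purity score against assumed ground-truth department labels.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

tweets = ["broken street light near the market", "street light outage reported",
          "garbage pile not collected", "late garbage collection downtown"]
true_opd = np.array([0, 0, 1, 1])   # assumed labels: 0 = public works, 1 = sanitation

X = TfidfVectorizer(lowercase=True).fit_transform(tweets)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

def purity(y_true, y_pred):
    """Fraction of points assigned to their cluster's majority class."""
    total = sum(np.bincount(y_true[y_pred == c]).max()
                for c in np.unique(y_pred))
    return total / len(y_true)

p = purity(true_opd, km.labels_)
```

Purity is 1.0 when every cluster contains complaints from a single OPD, and approaches the majority-class fraction when clusters are mixed.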


Author(s):  
Jonathan S. Lewis

Text mining presents an efficient, scalable method to separate signal from noise in large-scale text data, and therefore to effectively analyze open-ended survey responses as well as the tremendous amount of text that students, faculty, and staff produce through their interactions online. Traditional qualitative methods are impractical when working with these data, and text mining methods are consonant with the current literature on thematic analysis. This chapter provides a tutorial for researchers new to the method, including a lengthy discussion of preprocessing tasks, knowledge extraction from both supervised and unsupervised activities, potential data sources, and the range of software (both proprietary and open-source) available. Examples of text mining at work are provided throughout the chapter from two studies involving data collected from college students. Limitations of the method and implications for future research and policy are discussed.


Author(s):  
A. Durfee ◽  
A. Visa ◽  
H. Vanharanta ◽  
S. Schneberger ◽  
B. Back

Text documents are the most common means of exchanging formal knowledge among people. Text is a rich medium that can contain a vast range of information, but it can be difficult to decipher automatically. Many organizations have vast repositories of textual data but few means of automatically mining that text. Text mining methods seek to use an understanding of natural language to extract information relevant to user needs. This article evaluates a new text mining methodology, prototype-matching for text clustering, developed by the authors' research group. The methodology was applied to four applications: clustering documents based on their abstracts, analyzing financial data, distinguishing authorship, and evaluating the similarity of multiple translations. The results are discussed in terms of common business applications and possible future research.


2017 ◽  
Vol 13 (21) ◽  
pp. 429
Author(s):  
Nadeem Ur-Rahman

Business Intelligence solutions are key to enabling industrial organisations (whether in manufacturing or construction) to remain competitive in the market. These solutions rely on the analysis of data that is collected, retrieved, and re-used for prediction and classification purposes. However, many sources of industrial data are not fully utilised to improve the business processes of the associated industry. It is generally left to decision makers or managers within a company to take effective decisions based on the information available throughout product design and manufacture, or from the operation of business or production processes. Substantial effort and energy, in terms of time and money, are required to identify and exploit the appropriate information available in the data. Data Mining techniques have long been applied mainly to numerical data from various sources, but their application to semi-structured or unstructured databases is still limited to a few specific domains. Applying these techniques in combination with Text Mining methods, based on statistical, natural language processing, and visualisation techniques, could give beneficial results. Text Mining methods mainly deal with document clustering, text summarisation, and classification, and rely largely on methods and techniques from Information Retrieval (IR). These help to uncover, at an initial level, the hidden information in text documents. This paper investigates applications of Text Mining in terms of Textual Data Mining (TDM) methods, which share techniques from IR and data mining. These techniques may be applied to textual databases in general, but they are demonstrated here on Post Project Reviews (PPR) from the construction industry as a case study. The research focuses on finding key single- or multiple-term phrases for classifying documents into two classes, good-information and bad-information documents, to help decision makers or project managers identify key issues discussed in PPRs, which can be used as a guide for the future project management process.


Author(s):  
Hoang T. P. Thanh ◽  
Phayung Meesad
Predicting the behavior of stock markets is always an interesting topic not only for financial investors but also for scholars and professionals from different fields, because successful prediction can help investors yield significant profits. Previous researchers have shown a strong correlation between financial news and its impact on the movements of stock prices. This paper proposes an approach that uses time series analysis and text mining techniques to predict daily stock market trends. The research is conducted on a database containing stock index prices and news articles collected from Vietnamese websites over three years, from 2010 to 2012. Robust feature selection and a strong machine learning algorithm are able to lift the forecasting accuracy. By combining Linear Support Vector Machine weighting with the Support Vector Machine algorithm, the proposed approach improves prediction accuracy significantly over related research approaches. The results show that a data set represented by 42 features achieves the highest accuracy with one-against-one Support Vector Machines (up to 75%), and that the one-against-one method outperforms the one-against-all method in almost all case studies.
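The combination described above (Linear SVM weights for feature selection, then a one-against-one SVM classifier) can be sketched as follows. The dataset here is synthetic, the 42-feature cutoff simply mirrors the number reported in the abstract, and the exact weighting scheme is an assumption, not the authors' implementation.

```python
# Sketch: rank features by absolute Linear SVM coefficients, keep the top 42,
# then train a one-against-one SVM on the reduced representation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC, SVC
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=200, n_informative=42,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) Feature selection: absolute Linear SVM weights, summed over classes
lsvc = LinearSVC(dual=False).fit(X_tr, y_tr)
weights = np.abs(lsvc.coef_).sum(axis=0)
top = np.argsort(weights)[::-1][:42]        # keep 42 features, as in the abstract

# 2) One-against-one SVM on the selected features
clf = SVC(decision_function_shape="ovo").fit(X_tr[:, top], y_tr)
acc = clf.score(X_te[:, top], y_te)
```

Note that scikit-learn's `SVC` always trains one-against-one classifiers internally for multiclass problems; `decision_function_shape` only controls the shape of the decision values.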


2020 ◽  
Author(s):  
Pathikkumar Patel ◽  
Bhargav Lad ◽  
Jinan Fiaidhi

During the last few years, RNN models have been used extensively and have proven to be well suited to sequence and text data. RNNs have achieved state-of-the-art performance in several applications such as text classification, sequence-to-sequence modelling, and time series forecasting. In this article we review different Machine Learning and Deep Learning approaches for text data and look at the results obtained from these methods. This work also explores the use of transfer learning in NLP and how it affects the performance of models on a specific application, sentiment analysis.


Entropy ◽  
2019 ◽  
Vol 21 (5) ◽  
pp. 455 ◽  
Author(s):  
Hongjun Guan ◽  
Zongli Dai ◽  
Shuang Guan ◽  
Aiwu Zhao

In time series forecasting, information presentation directly affects prediction efficiency. Most existing time series forecasting models follow logical rules derived from the relationships between neighboring states, without considering the inconsistency of fluctuations over a related period. In this paper, we propose a new perspective on the prediction problem, in which inconsistency is quantified and regarded as a key characteristic of prediction rules. First, a time series is converted into a fluctuation time series by comparing each current value with the corresponding previous value. Then, the upward trend of each fluctuation is mapped to the truth-membership of a neutrosophic set, while a falsity-membership is used for the downward trend. The information entropy of the high-order fluctuation time series is introduced to describe the inconsistency of historical fluctuations and is mapped to the indeterminacy-membership of the neutrosophic set. Finally, an existing similarity measure for neutrosophic sets is used to find similar states during the forecasting stage, and a weighted arithmetic averaging (WAA) aggregation operator is applied to obtain the forecasting result according to the corresponding similarity. Compared to existing forecasting models, the neutrosophic forecasting model based on information entropy (NFM-IE) can represent both fluctuation-trend and fluctuation-consistency information. To test its performance, we used the proposed model to forecast several real time series, including the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX), the Shanghai Stock Exchange Composite Index (SHSECI), and the Hang Seng Index (HSI). The experimental results show that the proposed model predicts stably across different datasets, and comparison of the prediction error with other approaches shows that the model has outstanding prediction accuracy and universality.
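The first steps of the construction above (fluctuation series, trend-based truth/falsity memberships, entropy-based indeterminacy) can be sketched as below. The price series is a toy example and the specific mapping functions are assumptions; the paper defines its own membership mappings, which this sketch does not reproduce.

```python
# Sketch: build a fluctuation series, map each fluctuation to truth/falsity
# memberships, and use normalised entropy of high-order fluctuation signs
# as the indeterminacy-membership.
import numpy as np
from collections import Counter

prices = np.array([100.0, 102.0, 101.0, 104.0, 104.0, 103.0, 106.0])
fluct = np.diff(prices)                          # fluctuation time series

std = fluct.std() or 1.0
truth = np.clip(fluct / (2 * std) + 0.5, 0, 1)   # stronger upward move -> higher truth
falsity = 1 - truth                              # downward trend -> falsity

def sign_entropy(window):
    """Shannon entropy of up/flat/down signs within a high-order window."""
    counts = Counter(np.sign(window))
    p = np.array(list(counts.values())) / len(window)
    return -(p * np.log2(p)).sum()

order = 3                                        # "high-order" window length
indeterminacy = [sign_entropy(fluct[i - order:i]) / np.log2(3)   # normalise to [0, 1]
                 for i in range(order, len(fluct) + 1)]
```

Windows whose fluctuations all point the same way get indeterminacy near 0 (consistent history), while mixed up/flat/down windows approach 1 (inconsistent history), which is exactly the information the model feeds into the neutrosophic similarity search.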

