Multidimensional Text Warehousing for Automated Text Classification

2018 ◽  
Vol 11 (2) ◽  
pp. 168-183
Author(s):  
Jiyun Kim ◽  
Han-joon Kim

In the era of big data, a data warehouse is an integrated multidimensional database that provides the basis for the decision making required to establish crucial business strategies. Efficient, effective analysis requires a data organization system that integrates and manages data of various dimensions. However, conventional data warehousing techniques do not consider the various data manipulation operations required for data-mining activities. With the current explosion of text data, much research has examined text (or document) repositories to support text mining and document retrieval. Building on that work, this article presents a method of developing a text warehouse that provides a machine-learning-based text classification service. Each document is represented as a term-by-concept matrix using a 3rd-order tensor-based textual representation model, which emphasizes the meaning of the words occurring in the document. As a result, the proposed text warehouse makes it possible to develop a semantic Naïve Bayes text classifier solely by executing appropriate SQL statements.
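
As a rough illustration of classifying text "by executing SQL statements", the sketch below scores a document with a plain multinomial Naïve Bayes model whose counts and log-probabilities are computed inside a single SQL query. It is not the authors' method: the schema (`doc`, `doc_term`, `query_term`), the toy training data, and the `classify` helper are all hypothetical, and it works on raw term counts rather than the term-by-concept matrix described in the abstract. SQLite is used only because it ships with Python.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register a natural-log function so the query does not depend on
# SQLite having been compiled with its optional math functions.
conn.create_function("ln", 1, math.log)

# Hypothetical schema: one row per document, one row per (document, term).
conn.executescript("""
CREATE TABLE doc (doc_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE doc_term (doc_id INTEGER, term TEXT, cnt INTEGER);
CREATE TEMP TABLE query_term (term TEXT, cnt INTEGER);
""")

# Toy training data (assumed, for illustration only).
train = [
    (1, "sports", {"goal": 2, "match": 1}),
    (2, "sports", {"match": 2, "team": 1}),
    (3, "tech", {"code": 2, "bug": 1}),
    (4, "tech", {"code": 1, "team": 1}),
]
for doc_id, label, terms in train:
    conn.execute("INSERT INTO doc VALUES (?, ?)", (doc_id, label))
    conn.executemany(
        "INSERT INTO doc_term VALUES (?, ?, ?)",
        [(doc_id, t, c) for t, c in terms.items()],
    )

# Multinomial Naive Bayes with add-one (Laplace) smoothing, in one query.
NB_QUERY = """
WITH vocab AS (SELECT COUNT(DISTINCT term) AS v FROM doc_term),
class_total AS (
    SELECT d.label, SUM(dt.cnt) AS total
    FROM doc d JOIN doc_term dt USING (doc_id)
    GROUP BY d.label),
prior AS (
    SELECT label, COUNT(*) * 1.0 / (SELECT COUNT(*) FROM doc) AS p
    FROM doc GROUP BY label),
term_count AS (
    SELECT d.label, dt.term, SUM(dt.cnt) AS n
    FROM doc d JOIN doc_term dt USING (doc_id)
    GROUP BY d.label, dt.term)
SELECT p.label,
       ln(p.p) + SUM(q.cnt * ln((COALESCE(tc.n, 0) + 1.0) / (ct.total + v.v)))
           AS score
FROM prior p
JOIN class_total ct ON ct.label = p.label
CROSS JOIN vocab v
CROSS JOIN query_term q
LEFT JOIN term_count tc ON tc.label = p.label AND tc.term = q.term
GROUP BY p.label, p.p
ORDER BY score DESC
LIMIT 1
"""

def classify(terms):
    """Return the most probable label for a {term: count} document."""
    conn.execute("DELETE FROM query_term")
    conn.executemany("INSERT INTO query_term VALUES (?, ?)", list(terms.items()))
    return conn.execute(NB_QUERY).fetchone()[0]

print(classify({"goal": 1, "match": 1}))
print(classify({"code": 1, "bug": 1}))
```

Keeping the model as aggregate queries over warehouse tables means that "training" amounts to inserting rows; a semantic variant would presumably replace `doc_term` with concept-weighted counts.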

2020 ◽  
Author(s):  
Pathikkumar Patel ◽  
Bhargav Lad ◽  
Jinan Fiaidhi

Over the last few years, RNN models have been used extensively and have proven well suited to sequence and text data. RNNs have achieved state-of-the-art performance in several applications, such as text classification, sequence-to-sequence modelling, and time series forecasting. In this article, we review different machine learning and deep learning approaches for text data and look at the results obtained with these methods. This work also explores the use of transfer learning in NLP and how it affects model performance in one specific application, sentiment analysis.


2014 ◽  
Vol 48 (1) ◽  
pp. 42-42 ◽  
Author(s):  
Giacomo Berardi

2019 ◽  
Vol 9 (11) ◽  
pp. 2347 ◽  
Author(s):  
Hannah Kim ◽  
Young-Seob Jeong

As the amount of textual data increases exponentially, it becomes more important to develop models that analyze text automatically. Texts may carry various labels such as gender, age, country, sentiment, and so forth. Using such labels may benefit some industrial fields, so many studies of text classification have appeared. Recently, the Convolutional Neural Network (CNN) has been adopted for text classification and has shown quite successful results. In this paper, we propose convolutional neural networks for the task of sentiment classification. Through experiments with three well-known datasets, we show that employing consecutive convolutional layers is effective for relatively longer texts, and that our networks outperform other state-of-the-art deep learning models.
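
To make "consecutive convolutional layers" concrete, here is a minimal pure-Python sketch of two stacked 1D convolutions over token embeddings followed by max-over-time pooling. It does not reproduce the paper's architecture: the dimensions, the random weights, and the helper names (`conv1d_relu`, `random_kernels`) are assumptions chosen for illustration, and no training is performed.

```python
import random

def conv1d_relu(seq, kernels, width):
    """Valid 1D convolution followed by ReLU over a sequence of vectors."""
    out = []
    for i in range(len(seq) - width + 1):
        # Flatten the window of `width` consecutive vectors into one list.
        window = [x for vec in seq[i:i + width] for x in vec]
        out.append([max(0.0, sum(w * x for w, x in zip(k, window)))
                    for k in kernels])
    return out

def random_kernels(rng, n_filters, width, in_dim):
    """One flat weight list per filter (random here; learned in practice)."""
    return [[rng.uniform(-1, 1) for _ in range(width * in_dim)]
            for _ in range(n_filters)]

rng = random.Random(0)
emb_dim, width, n_filters = 4, 3, 5

# A toy "text" of 10 tokens, each a random 4-dimensional embedding.
tokens = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(10)]

# Two consecutive convolutional layers.
h1 = conv1d_relu(tokens, random_kernels(rng, n_filters, width, emb_dim), width)
h2 = conv1d_relu(h1, random_kernels(rng, n_filters, width, n_filters), width)

# Max-over-time pooling: one feature per filter, independent of text length.
features = [max(col) for col in zip(*h2)]
print(len(h1), len(h2), len(features))
```

With width-3 filters, each position in `h2` depends on five consecutive tokens, whereas a single layer sees only three. This wider receptive field is one plausible reason stacking helps on longer texts.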


Author(s):  
Yasufumi Takama ◽  
Takuma Tonegawa

This paper proposes an interactive document clustering system designed around the concept of CMV (coordinated multiple views). Interactive document clustering lets a user obtain a set of document groups from a document collection in an interactive manner. It is expected to be useful for various tasks such as text mining and document retrieval. Because the result of document clustering consists of multiple kinds of objects, such as clusters (document groups), documents, and words, each of them should be presented to users in a different way. Based on this consideration, the proposed system employs multiple views, each designed for a specific object such as a document or a keyword. A prototype system is implemented on TETDM (Total Environment for Text Data Mining), one of the environments for developing text data mining tools, which was chosen because it provides a mechanism for coordination between modules. The proposed system classifies the information to be presented into four levels (clusters, document, bag of words, and word), each of which is displayed in a different view. Experimental results with test participants show the effectiveness of the proposed system.


2014 ◽  
Vol 46 (3) ◽  
pp. 611-632 ◽  
Author(s):  
Zachary Greene ◽  
Matthias Haber

Theories often explain intraparty competition based on electoral conditions and intraparty rules. This article further opens this black box by considering intraparty statements of preferences. In particular, it predicts that intraparty preference heterogeneity increases after electoral losses, but that candidates deviating from the party’s median receive fewer intraparty votes. Party members grant candidates greater leeway to accommodate competing policy demands when in government. The study tests the hypotheses using a new database of party congress speeches from Germany and France, and uses automated text classification to estimate speakers’ relative preferences. The results demonstrate that speeches at party meetings provide valuable insights into actors’ preferences and intraparty politics. The article finds evidence of a complex relationship between the governing context, the economy and intraparty disagreement.


2021 ◽  
Vol 9 (09) ◽  
pp. 484-488
Author(s):  
Rajeev Tripathi ◽  

Problems and strategies for text classification have been known for a long time. They're widely utilised by companies like Google and Yahoo for email spam screening, sentiment analysis of Twitter data, and automatic news categorisation in Google Alerts. We're still working on getting the findings to be as accurate as possible. When dealing with large amounts of text data, however, the model's performance and accuracy become a challenge. The type of words utilised in the corpus and the type of features produced for classification have a big impact on the performance of a text classification model.

