Multidimensional Text Warehousing for Automated Text Classification

Jiyun Kim; Han-joon Kim

doi:10.4018/jitr.2018040110

Journal of Information Technology Research ◽

10.4018/jitr.2018040110 ◽

2018 ◽

Vol 11 (2) ◽

pp. 168-183

Author(s):

Jiyun Kim ◽

Han-joon Kim

Keyword(s):

Text Classification ◽

Document Retrieval ◽

Business Strategies ◽

Data Organization ◽

Order Tensor ◽

Text Data ◽

Data Manipulation ◽

Textual Representation ◽

Automated Text Classification ◽

Effective Analysis

This article observes that, in the era of big data, a data warehouse serves as an integrated multidimensional database that underpins the decision making required to establish crucial business strategies. Efficient, effective analysis requires a data organization system that integrates and manages data of various dimensions. However, conventional data warehousing techniques do not consider the data manipulation operations required for data-mining activities. With the current explosion of text data, much research has examined text (or document) repositories that support text mining and document retrieval. Building on this line of work, this article presents a method of developing a text warehouse that provides a machine-learning-based text classification service. Each document is represented as a term-by-concept matrix using a 3rd-order tensor-based textual representation model, which emphasizes the meaning of the words occurring in the document. As a result, the proposed text warehouse makes it possible to build a semantic Naïve Bayes text classifier simply by executing appropriate SQL statements.
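The closing claim, that a Naïve Bayes classifier can be built from the warehouse with SQL alone, can be illustrated with a minimal sketch. It assumes a hypothetical `doc_term` fact table of plain term counts (a simplification of the paper's term-by-concept tensor model) held in an in-memory SQLite database; SQL aggregates supply the class priors and the Laplace-smoothed term likelihoods. The table, the toy rows, and the helpers `build_warehouse` and `classify` are illustrative only and are not the authors' schema or queries.

```python
import math
import sqlite3

# Hypothetical toy rows: (doc_id, label, term, count).
# Simplification: plain term counts instead of the paper's term-by-concept matrices.
TRAINING = [
    (1, "sports", "ball", 3), (1, "sports", "team", 2),
    (2, "sports", "team", 1), (2, "sports", "score", 2),
    (3, "tech", "code", 3), (3, "tech", "data", 2),
    (4, "tech", "data", 1), (4, "tech", "team", 1),
]


def build_warehouse(rows):
    """Load the toy fact table into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_term (doc_id INTEGER, label TEXT, term TEXT, cnt INTEGER)"
    )
    conn.executemany("INSERT INTO doc_term VALUES (?, ?, ?, ?)", rows)
    return conn


def classify(conn, terms):
    """Multinomial Naive Bayes whose statistics all come from SQL aggregates."""
    (vocab,) = conn.execute("SELECT COUNT(DISTINCT term) FROM doc_term").fetchone()
    # Class priors P(c) = documents in c / all documents.
    priors = conn.execute(
        """SELECT label,
                  COUNT(DISTINCT doc_id) * 1.0 /
                  (SELECT COUNT(DISTINCT doc_id) FROM doc_term)
           FROM doc_term GROUP BY label"""
    ).fetchall()
    scores = {}
    for label, prior in priors:
        (total,) = conn.execute(
            "SELECT SUM(cnt) FROM doc_term WHERE label = ?", (label,)
        ).fetchone()
        score = math.log(prior)
        for term in terms:
            (tc,) = conn.execute(
                "SELECT COALESCE(SUM(cnt), 0) FROM doc_term WHERE label = ? AND term = ?",
                (label, term),
            ).fetchone()
            # Laplace-smoothed likelihood P(term | class), summed in log space.
            score += math.log((tc + 1) / (total + vocab))
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    warehouse = build_warehouse(TRAINING)
    print(classify(warehouse, ["ball", "team"]))
```

Keeping the counts in a relational table means retraining amounts to inserting new rows; only the aggregate queries need to be rerun.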

Deep Learning for text in limited data settings

10.36227/techrxiv.12100692 ◽

2020 ◽

Author(s):

Pathikkumar Patel ◽

Bhargav Lad ◽

Jinan Fiaidhi

Keyword(s):

Machine Learning ◽

Time Series ◽

Deep Learning ◽

Sentiment Analysis ◽

Transfer Learning ◽

Text Classification ◽

State Of The Art ◽

Time Series Forecasting ◽

Text Data ◽

Performance Levels

Over the last few years, RNN models have been used extensively and have proven effective for sequence and text data. RNNs have achieved state-of-the-art performance in several applications, such as text classification, sequence-to-sequence modelling, and time series forecasting. This article reviews different machine learning and deep learning approaches for text data and examines the results these methods obtain. It also explores the use of transfer learning in NLP and how it affects model performance on a specific application: sentiment analysis.
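To make the RNN idea concrete, here is a minimal sketch of an Elman-style recurrent forward pass for binary text classification. The vocabulary, the 2-dimensional embeddings, and every weight (`EMBED`, `W_X`, `W_H`, `W_OUT`, and so on) are hypothetical hand-set values rather than trained parameters; the point is only that the hidden state `h` is updated token by token, so the final prediction depends on the whole sequence.

```python
import math

# Hypothetical, hand-set parameters; a real model would learn them from data.
EMBED = {"good": [1.0, 0.0], "bad": [0.0, 1.0], "not": [0.5, 0.5]}
W_X = [[0.8, -0.6], [0.3, 0.9]]   # input-to-hidden weights
W_H = [[0.5, 0.1], [-0.2, 0.4]]   # hidden-to-hidden (recurrent) weights
B_H = [0.0, 0.0]                  # hidden bias
W_OUT = [1.5, -1.5]               # hidden-to-output weights
B_OUT = 0.0                       # output bias


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_classify(tokens):
    """Return P(positive) after reading the tokens left to right."""
    h = [0.0, 0.0]  # initial hidden state
    for tok in tokens:
        x = EMBED[tok]
        # h_t = tanh(W_x x_t + W_h h_{t-1} + b_h)
        pre = [a + b + c for a, b, c in zip(matvec(W_X, x), matvec(W_H, h), B_H)]
        h = [math.tanh(p) for p in pre]
    logit = sum(w * v for w, v in zip(W_OUT, h)) + B_OUT
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


if __name__ == "__main__":
    print(rnn_classify(["not", "good"]))
```

With these toy weights an empty sequence scores exactly 0.5, since the hidden state stays at zero and the output bias is zero.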

Semi-automated text classification

ACM SIGIR Forum ◽

10.1145/2641383.2641392 ◽

2014 ◽

Vol 48 (1) ◽

p. 42 ◽

Cited By ~ 2

Author(s):

Giacomo Berardi

Keyword(s):

Text Classification ◽

Automated Text Classification

A utility-theoretic ranking method for semi-automated text classification

Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '12 ◽

10.1145/2348283.2348411 ◽

2012 ◽

Cited By ~ 7

Author(s):

Giacomo Berardi ◽

Andrea Esuli ◽

Fabrizio Sebastiani

Keyword(s):

Text Classification ◽

Ranking Method ◽

Automated Text Classification

Automated Text Classification for Fast Feedback – Investigating the Effects of Document Representation

Lecture Notes in Computer Science - Knowledge-Based Intelligent Information and Engineering Systems ◽

10.1007/978-3-540-45226-3_138 ◽

2003 ◽

pp. 1008-1014

Author(s):

Rakesh Menon ◽

Loh Han Tong ◽

S. Sathiyakeerthi ◽

Aarnout Brombacher

Keyword(s):

Text Classification ◽

Document Representation ◽

Automated Text Classification

Sentiment Classification Using Convolutional Neural Networks

Applied Sciences ◽

10.3390/app9112347 ◽

2019 ◽

Vol 9 (11) ◽

pp. 2347 ◽

Cited By ~ 18

Author(s):

Hannah Kim ◽

Young-Seob Jeong

Keyword(s):

Neural Network ◽

Neural Networks ◽

Convolutional Neural Networks ◽

Text Classification ◽

State Of The Art ◽

Sentiment Classification ◽

Learning Models ◽

Text Data ◽

Textual Data ◽

Better Than

As the amount of textual data increases exponentially, it becomes more important to develop models that analyze text data automatically. Texts may carry various labels, such as gender, age, country, and sentiment. Because such labels can benefit several industrial fields, many studies of text classification have appeared. Recently, the Convolutional Neural Network (CNN) has been adopted for text classification and has shown strong results. In this paper, we propose convolutional neural networks for the task of sentiment classification. Through experiments with three well-known datasets, we show that employing consecutive convolutional layers is effective for relatively long texts and that our networks outperform other state-of-the-art deep learning models.
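The idea of consecutive convolutional layers can be sketched in plain Python. This is a toy illustration, not the authors' architecture: `conv_layer`, `max_over_time`, `kernel`, the embeddings, and the filter weights are all hypothetical. Each valid convolution of width 3 shortens the sequence by two positions, and each position in the second layer depends on five input tokens, which is one common intuition for why stacking layers can help with longer texts.

```python
def conv_layer(seq, filters):
    """Valid 1-D convolution followed by ReLU.

    seq: list of n vectors, each of dimension d.
    filters: list of (kernel, bias); each kernel is k vectors of dimension d.
    Returns n - k + 1 vectors with one feature per filter.
    """
    k = len(filters[0][0])
    out = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        feats = []
        for kern, bias in filters:
            s = bias + sum(
                w * x
                for w_row, x_row in zip(kern, window)
                for w, x in zip(w_row, x_row)
            )
            feats.append(max(0.0, s))  # ReLU
        out.append(feats)
    return out


def max_over_time(seq):
    """Max pooling over positions: one value per feature."""
    return [max(col) for col in zip(*seq)]


def kernel(k, d, value):
    """Hypothetical constant kernel of width k over d-dimensional inputs."""
    return [[value] * d for _ in range(k)]


embeddings = [[float(i), 1.0] for i in range(7)]           # 7 tokens, d = 2
layer1 = [(kernel(3, 2, 1.0), 0.0), (kernel(3, 2, -1.0), 0.0)]
layer2 = [(kernel(3, 2, 1.0), 0.0)]  # input dim 2 = number of layer-1 filters

h1 = conv_layer(embeddings, layer1)  # 7 - 3 + 1 = 5 positions
h2 = conv_layer(h1, layer2)          # 5 - 3 + 1 = 3 positions
pooled = max_over_time(h2)           # fixed-size vector for a classifier
```

The pooled vector has one entry per filter in the last layer regardless of text length, so it can feed a fixed-size classification layer.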

Interactive Document Clustering System Based on Coordinated Multiple Views

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2016.p0139 ◽

2016 ◽

Vol 20 (1) ◽

pp. 139-145 ◽

Cited By ~ 3

Author(s):

Yasufumi Takama ◽

Takuma Tonegawa

Keyword(s):

Data Mining ◽

Document Clustering ◽

Document Retrieval ◽

Prototype System ◽

Multiple Views ◽

Multiple Objects ◽

Text Data ◽

Text Data Mining ◽

Specific Object ◽

Document Collection

This paper proposes an interactive document clustering system designed around the concept of CMV (coordinated multiple views). Interactive document clustering lets a user obtain a set of document groups from a document collection interactively, and it is expected to be useful for tasks such as text mining and document retrieval. Because the result of document clustering consists of multiple kinds of objects, such as clusters (document groups), documents, and words, each should be presented to users in a different way. Based on this consideration, the proposed system employs multiple views, each designed for a specific kind of object, such as documents or keywords. A prototype system is implemented on TETDM (Total Environment for Text Data Mining), an environment for developing text data mining tools; we chose it because it provides a mechanism for coordination between modules. The proposed system classifies the information to be presented into four levels: clusters, documents, bags of words, and words, each displayed in a different view. Experiments with test participants show the effectiveness of the proposed system.
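Coordination between views is commonly realised as linked selection: choosing an item in one view highlights the related items in the others. The sketch below assumes hypothetical `CLUSTERS` and `DOCS` tables and a hypothetical `select_word` function; it does not use TETDM's API and only shows how a word selection could propagate across the four levels named above (word, bag of words, document, cluster).

```python
from collections import Counter

# Hypothetical toy data: cluster id -> member documents, document id -> text.
CLUSTERS = {"c1": ["d1", "d2"], "c2": ["d3"]}
DOCS = {
    "d1": "stock market prices rise as market opens",
    "d2": "market news today",
    "d3": "football match result",
}


def bag_of_words(text):
    """Bag-of-words level: term counts for one document."""
    return Counter(text.split())


BAGS = {doc_id: bag_of_words(text) for doc_id, text in DOCS.items()}


def select_word(word):
    """Propagate a selection in the word view to the other three views."""
    # Bag-of-words view: how often the word occurs in each document.
    counts = {d: bag[word] for d, bag in BAGS.items() if word in bag}
    # Document view: documents containing the word.
    docs = sorted(counts)
    # Cluster view: clusters with at least one highlighted document.
    clusters = sorted(
        c for c, members in CLUSTERS.items() if any(d in counts for d in members)
    )
    return {"word": word, "bag_counts": counts, "documents": docs, "clusters": clusters}


if __name__ == "__main__":
    print(select_word("market"))
```

In an interactive system each view would redraw from this shared selection state, so the views stay consistent without knowing about each other directly.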

Automated text classification using a multi-agent framework

Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries - JCDL '05 ◽

10.1145/1065385.1065420 ◽

2005 ◽

Cited By ~ 3

Author(s):

Yueyu Fu ◽

Weimao Ke ◽

Javed Mostafa

Keyword(s):

Text Classification ◽

Multi Agent ◽

Automated Text Classification

Leadership Competition and Disagreement at Party National Congresses

British Journal of Political Science ◽

10.1017/s0007123414000283 ◽

2014 ◽

Vol 46 (3) ◽

pp. 611-632 ◽

Cited By ~ 32

Author(s):

Zachary Greene ◽

Matthias Haber

Keyword(s):

Text Classification ◽

Black Box ◽

Preference Heterogeneity ◽

Complex Relationship ◽

Party Members ◽

Intraparty Politics ◽

Automated Text Classification ◽

Relative Preferences ◽

Party Congress

Theories often explain intraparty competition based on electoral conditions and intraparty rules. This article further opens this black box by considering intraparty statements of preferences. In particular, it predicts that intraparty preference heterogeneity increases after electoral losses, but that candidates deviating from the party’s median receive fewer intraparty votes. Party members grant candidates greater leeway to accommodate competing policy demands when in government. The study tests the hypotheses using a new database of party congress speeches from Germany and France, and uses automated text classification to estimate speakers’ relative preferences. The results demonstrate that speeches at party meetings provide valuable insights into actors’ preferences and intraparty politics. The article finds evidence of a complex relationship between the governing context, the economy and intraparty disagreement.

Automated text classification using a dynamic artificial neural network model

Expert Systems with Applications ◽

10.1016/j.eswa.2012.03.027 ◽

2012 ◽

Vol 39 (12) ◽

pp. 10967-10976 ◽

Cited By ~ 37

Author(s):

M. Ghiassi ◽

M. Olschimke ◽

B. Moon ◽

P. Arnaudo

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Network Model ◽

Neural Network Model ◽

Text Classification ◽

Artificial Neural Network Model ◽

Artificial Neural ◽

Automated Text Classification

PERFECTION OF CLASSIFICATION ACCURACY IN TEXT CATEGORIZATION

International Journal of Advanced Research ◽

10.21474/ijar01/13437 ◽

2021 ◽

Vol 9 (09) ◽

pp. 484-488

Author(s):

Rajeev Tripathi ◽

Keyword(s):

Sentiment Analysis ◽

Text Classification ◽

Classification Accuracy ◽

Text Categorization ◽

Classification Model ◽

Text Data ◽

Twitter Data ◽

Long Time ◽

Google Alerts ◽

Email Spam

Problems and strategies for text classification have been known for a long time. They are widely utilised by companies such as Google and Yahoo for email spam screening, sentiment analysis of Twitter data, and automatic news categorization in Google Alerts. We are still working on making the results as accurate as possible. When dealing with large amounts of text data, however, maintaining the model's performance and accuracy becomes difficult. The type of words utilised in the corpus and the type of features produced for classification have a big impact on the performance of a text classification model.
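The point that the type of features matters can be illustrated by comparing unigram and bigram features. This is a minimal sketch with hypothetical helpers (`tokenize`, `ngram_features`) and a deliberately simple regex tokenizer: with unigrams alone, "not good, not bad" is just a bag of word counts, whereas adding bigrams keeps local word order such as "not good".

```python
import re
from collections import Counter


def tokenize(text):
    """Simple lowercase tokenizer (an assumption, not a recommended one)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_features(text, n_values=(1,)):
    """Count n-gram features for each n in n_values."""
    tokens = tokenize(text)
    feats = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


if __name__ == "__main__":
    print(ngram_features("not good, not bad"))          # unigrams only
    print(ngram_features("not good, not bad", (1, 2)))  # unigrams + bigrams
```

Adding bigrams lets a classifier distinguish negated phrases at the cost of a larger, sparser feature space, which is one reason feature choice affects both accuracy and scalability on large corpora.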
