Context-Dependent Feature Values in Text Categorization

Feature engineering is one aspect of knowledge engineering. Besides feature selection, the appropriate assignment of feature values is also crucial to the performance of many software applications, such as text categorization (TC) and speech recognition. In this work, we develop a general method to enhance TC performance through context-dependent feature values (also known as term weights), which are obtained by a novel adaptation of a context-dependent adjustment procedure previously shown to be effective in information retrieval. The motivation for our approach is that the general method can be used with different text representations and in combination with other TC techniques. Experiments on several test collections show that our context-dependent feature values can improve TC over traditional context-independent unigram feature values when using a strong classifier such as a Support Vector Machine (SVM), which past work has found hard to improve. We also show that the relative performance improvement of our method over the context-independent baseline is comparable to the levels attained by recent word embedding methods in the literature, while our approach has the advantage of not requiring the substantial training needed to learn word embedding representations.
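
For orientation, the kind of context-independent unigram term weight that the abstract treats as the baseline can be sketched with plain TF-IDF. This is only an illustrative assumption: the abstract does not say which baseline weighting was used, and it does not specify the context-dependent adjustment, so that step is not shown. The function name `tfidf_weights` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed baseline: weight = raw term frequency * log(N / document frequency).
    The weight depends only on counts, never on the words around the term.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy corpus (hypothetical): "bank" appears in different contexts.
docs = [
    ["bank", "river", "water"],
    ["bank", "loan", "money"],
    ["bank", "money", "money"],
]
weights = tfidf_weights(docs)
```

In this toy corpus "bank" receives the same weight in the river document and in the finance documents, because the weight ignores neighbouring words. That is the limitation a context-dependent feature value is meant to address.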

Algorithm of Text Categorization Based on Cloud Computing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.311.158 ◽

2013 ◽

Vol 311 ◽

pp. 158-163 ◽

Cited By ~ 1

Author(s):

Li Qin Huang ◽

Li Qun Lin ◽

Yan Huang Liu

Keyword(s):

Cloud Computing ◽

Text Categorization ◽

Experimental Results ◽

Support Vector ◽

Computing Environment ◽

Mapreduce Framework ◽

Cloud Computing Environment ◽

Environment Map ◽

Vector Machines ◽

Parallel Text

The MapReduce framework of cloud computing offers an effective way to perform large-scale text categorization. In this paper, a distributed parallel text training algorithm based on multi-class Support Vector Machines (SVM) is designed for a cloud computing environment. In this environment, Map tasks distribute the samples of the various classes and Reduce tasks carry out the actual SVM training. Experimental results show that the execution time of text training decreases as the number of Reduce tasks increases. A parallel text classification procedure based on cloud computing is also designed and implemented to classify texts of unknown type. Experimental results show that the speed of text classification increases as the number of Map tasks increases.
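
The division of labour described in this abstract (Map tasks route samples by class, Reduce tasks train per-class models) can be sketched in a single process as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function names are hypothetical, the shuffle step is simulated with a dictionary, and the Reduce step trains a class centroid as a stand-in for the SVM training the paper performs.

```python
from collections import defaultdict

def map_task(sample):
    """Map: emit (class_label, feature_vector) so samples are routed by class."""
    features, label = sample
    return label, features

def shuffle(pairs):
    """Group mapped pairs by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_task(label, vectors):
    """Reduce: train one model per class. Stand-in model: a centroid, NOT an SVM."""
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return label, centroid

def classify(models, features):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(models, key=lambda label: dist(models[label]))

# Toy training set (hypothetical two-dimensional feature vectors).
samples = [([1.0, 0.0], "sports"), ([0.9, 0.1], "sports"),
           ([0.0, 1.0], "finance"), ([0.2, 0.8], "finance")]
groups = shuffle(map_task(s) for s in samples)
models = dict(reduce_task(label, vecs) for label, vecs in groups.items())
prediction = classify(models, [0.8, 0.3])
```

Because each Reduce call only sees the samples for its own key, the per-class training jobs are independent and can run in parallel, which is consistent with the reported drop in training time as Reduce tasks are added.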

A Method Based on Support Vector Machine for Feature Selection of Latent Semantic Features

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.181-182.830 ◽

2011 ◽

Vol 181-182 ◽

pp. 830-835

Author(s):

Min Song Li

Keyword(s):

Support Vector Machine ◽

Text Categorization ◽

Latent Semantic Indexing ◽

Classification Performance ◽

Compact Representation ◽

Support Vector ◽

Semantic Features ◽

Semantic Indexing ◽

Feature Extraction Method ◽

Feature Subspace

Latent Semantic Indexing (LSI) is an effective feature extraction method that can capture the underlying latent semantic structure between words in documents. However, using it to select a feature subspace is probably not the most appropriate choice for text categorization, since it orders the extracted features by their variance, not by their classification power. We propose a method based on the support vector machine to extract features and select the Latent Semantic Indexing features that are suited to classification. Experimental results indicate that the method improves classification performance with a more compact representation.

Solving multi-label text categorization problem using support vector machine approach with membership function

Neurocomputing ◽

10.1016/j.neucom.2011.07.001 ◽

2011 ◽

Vol 74 (17) ◽

pp. 3682-3689 ◽

Cited By ~ 15

Author(s):

Tai-Yue Wang ◽

Huei-Min Chiang

Keyword(s):

Support Vector Machine ◽

Membership Function ◽

Text Categorization ◽

Support Vector

Design of Text Categorization System Based on SVM

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.532-533.1191 ◽

2012 ◽

Vol 532-533 ◽

pp. 1191-1195 ◽

Cited By ~ 1

Author(s):

Zhen Yan Liu ◽

Wei Ping Wang ◽

Yong Wang

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Text Categorization ◽

Feature Selection Method ◽

Extraction Methods ◽

Support Vector ◽

Text Representation ◽

Text Feature ◽

Categorization System ◽

Classifier Training

This paper introduces the design of a text categorization system based on the Support Vector Machine (SVM). It analyzes the high-dimensional character of text data, which is the reason why SVM is suitable for text categorization. The system is constructed according to its data flow and consists of three subsystems: text representation, classifier training and text classification. The core of the system is classifier training, but text representation directly influences the accuracy of the classifier and the performance of the system. The text feature vector space can be built with different kinds of feature selection and feature extraction methods. No research indicates which method is best, so many feature selection and feature extraction methods are implemented in this system. For a specific classification task, every feature selection method and every feature extraction method is tested, and the best-performing set of methods is then adopted.

Support vector machines for text categorization

36th Annual Hawaii International Conference on System Sciences, 2003. Proceedings of the ◽

10.1109/hicss.2003.1174243 ◽

2003 ◽

Cited By ~ 45

Author(s):

A. Basu ◽

C. Walters ◽

M. Shepherd

Keyword(s):

Support Vector Machines ◽

Text Categorization ◽

Support Vector ◽

Vector Machines

Italian Text Categorization with Lemmatization and Support Vector Machines

Neural Approaches to Dynamics of Signal Exchanges - Smart Innovation, Systems and Technologies ◽

10.1007/978-981-13-8950-4_5 ◽

2019 ◽

pp. 47-54 ◽

Cited By ~ 1

Author(s):

Francesco Camastra ◽

Gennaro Razi

Keyword(s):

Support Vector Machines ◽

Italian Text ◽

Text Categorization ◽

Support Vector ◽

Vector Machines

Comparative Analysis for Topic Classification in Juz Al-Baqarah

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v12.i1.pp406-411 ◽

2018 ◽

Vol 12 (1) ◽

pp. 406-411

Author(s):

Mohamad Izzuddin Rahman ◽

Noor Azah Samsudin ◽

Aida Mustapha ◽

Adeleke Abdullahi

Keyword(s):

Text Mining ◽

Text Categorization ◽

Arabic Language ◽

Support Vector ◽

Original Text ◽

Research Project ◽

Computational Environment ◽

Nearest Neighbours ◽

Association Discovery ◽

Relationship Of

In Islam, the Quran is the holy book that was revealed to the Prophet Muhammad. It functions as a complete code of life for Muslims. It contains the words of Allah, more than 77,000 words in all, which were passed down through the Prophet Muhammad to mankind over 23 years starting in 610 CE. The Quran is divided into 114 chapters, and its original text is in Arabic. Muslims across the world need to find its meaning in order to understand its content. Understanding the Quran is of interest to Muslims and also draws the attention of millions of people of other faiths. Over the generations, Muslim scholars have disseminated a great deal of Quran-related content in the form of tafsirs, translations and books of hadith. A current problem is that most Muslims in Malaysia do not understand sentences in the Quran because of the language barrier. The purpose of this research is to classify the topic of each verse of the Quran according to its specific theme. It pursues the objectives of text mining that are based on linguistic information and domain. The use of a corpus helps to perform various data mining tasks, including information extraction, text categorization, the relationship of concepts, association discovery, and pattern evaluation and assessment. The classification experiment uses the Support Vector Machine to find themes in Juz’ Baqarah. The SVM performance is then compared against other classification algorithms such as Naive Bayes, J48 Decision Tree and K-Nearest Neighbours. This research project aims to create an enabling computational environment for text mining the Qur’an and to help users understand every verse in Juz’ Baqarah.

Categorizing Natural Language-Based Customer Satisfaction: An Implementation Method Using Support Vector Machine and Long Short-Term Memory Neural Network

International Journal of Integrated Engineering ◽

10.30880/ijie.2021.13.04.007 ◽

2021 ◽

Vol 13 (4) ◽

Author(s):

Ralph Sherwin A. Corpuz

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Natural Language ◽

Text Categorization ◽

Short Term Memory ◽

Support Vector ◽

Feature Engineering ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

Analyzing natural language-based Customer Satisfaction (CS) is a tedious process. This is particularly true if one has to categorize large datasets manually. Fortunately, the advent of supervised machine learning techniques has paved the way toward the design of efficient categorization systems for CS. This paper presents the feasibility of designing a text categorization model using two popular and robust algorithms, the Support Vector Machine (SVM) and the Long Short-Term Memory (LSTM) Neural Network, in order to automatically categorize complaints, suggestions, feedback, and commendations. The study found that, in terms of training accuracy, SVM achieved a best rating of 98.63% while LSTM achieved a best rating of 99.32%. These results mean that the SVM and LSTM algorithms are on par with each other in terms of training accuracy, but SVM is significantly faster than LSTM, by approximately 35.47 s. The training performance of both algorithms is attributed to the limited dataset size, the high dimensionality of both the English and Tagalog languages, and the applicability of the feature engineering techniques used. Interestingly, based on the results of actual implementation, both algorithms were found to be 100% effective in predicting the correct CS categories. Hence, the preference between the two algorithms comes down to the available dataset and to the skill in optimizing these algorithms through feature engineering techniques and in implementing them in actual text categorization applications.

Research on Digital Forensics Based on Uyghur Web Text Classification

Cyber Warfare and Terrorism ◽

10.4018/978-1-7998-2466-4.ch093 ◽

2020 ◽

pp. 1586-1597

Author(s):

Yasen Aizezi ◽

Anwar Jamal ◽

Ruxianguli Abudurexiti ◽

Mutalipu Muming

Keyword(s):

Mutual Information ◽

Text Classification ◽

Text Categorization ◽

Digital Forensics ◽

Feature Space ◽

Experimental Result ◽

Support Vector ◽

Web Documents ◽

Normalized Mutual Information ◽

Plain Text

This paper mainly discusses the use of mutual information (MI) and Support Vector Machines (SVMs) for Uyghur Web text classification and for the digital forensics process of web text categorization. It covers automatic classification and identification, as well as the conversion and pretreatment of plain text based on the encoding features of the various existing Uyghur Web documents, and it introduces the preparatory work for Uyghur Web text encoding. Focusing on filtering non-Uyghur characters and stop words from the web texts, we put forward a Multi-feature Space Normalized Mutual Information (M-FNMI) algorithm, which replaces the MI between a single feature and a category with the MI between a combination of input features and a category so as to extract more accurate feature words. Finally, we classify the features with the support vector machine (SVM) algorithm. The experimental result shows that this scheme achieves high classification precision and can provide a criterion for digital forensics with a specific purpose.
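
As a point of reference, the single-feature mutual information that the abstract says M-FNMI replaces can be computed from a 2x2 contingency table of term presence against category membership. The sketch below shows only that standard baseline quantity. It is not the M-FNMI algorithm, whose normalization and feature-combination steps the abstract does not specify, and the function name and argument names are hypothetical.

```python
import math

def mutual_information(n11, n10, n01, n00):
    """Expected mutual information (in nats) between a term and a category.

    n11: documents in the category that contain the term
    n10: documents outside the category that contain the term
    n01: documents in the category that lack the term
    n00: documents outside the category that lack the term
    """
    n = n11 + n10 + n01 + n00
    # Each cell: (joint count, term-side marginal, category-side marginal).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    mi = 0.0
    for joint, term_total, cat_total in cells:
        if joint:  # a cell with zero count contributes nothing
            mi += (joint / n) * math.log(joint * n / (term_total * cat_total))
    return mi

# A term spread evenly across categories carries no information.
uninformative = mutual_information(25, 25, 25, 25)
# A term that occurs in exactly the documents of one category is maximally informative.
informative = mutual_information(50, 0, 0, 50)
```

Ranking terms by this score and keeping the top-scoring ones is the usual single-feature selection step. A combination-based variant such as the one the abstract proposes would score groups of features jointly instead.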