Research of Support Vector Machine in Text Classification

Purpose: Owing to the huge volume of documents available on the internet, text classification has become a necessary task for handling them. To achieve good classification results, feature selection is an important stage that reduces the dimensionality of text documents by choosing suitable features. The main purpose of this research is to classify documents stored on a personal computer based on their content.

Design/methodology/approach: This paper proposes a new feature selection algorithm based on artificial bee colony (ABCFS) to improve text classification accuracy. ABCFS is evaluated on real and benchmark data sets and compared with existing feature selection approaches such as information gain and the χ2 statistic. To assess the efficiency of the proposed algorithm, a support vector machine (SVM) and an improved SVM classifier are used.

Findings: The experiment was conducted on real and benchmark data sets. The real data set consisted of documents stored on a personal computer, and the benchmark data sets were drawn from the Reuters and 20 Newsgroups corpora. The results show that the proposed feature selection algorithm improves text document classification accuracy.

Originality/value: This paper proposes a new ABCFS algorithm for feature selection, evaluates its efficiency and improves the support vector machine. Here ABCFS is used to select features from text (unstructured) documents; in existing work, the artificial bee colony approach has been used to select features from structured data but not from text. The proposed algorithm classifies documents automatically based on their content.
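The χ2 statistic named above as a comparison baseline can be sketched in a few lines. This is a minimal illustration of the standard 2x2 contingency-table formula, not the ABCFS algorithm and not the authors' code; the function names `chi_square` and `select_top_k` and the toy documents are assumptions made for the example.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-squared score of a term for a category from a 2x2 table.

    n11: documents in the category that contain the term
    n10: documents outside the category that contain the term
    n01: documents in the category that lack the term
    n00: documents outside the category that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0  # term or category is constant, so it carries no signal
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_top_k(docs, labels, category, k):
    """Rank vocabulary terms by chi-squared against one category, keep k."""
    vocab = {term for tokens in docs for term in tokens}
    scores = {}
    for term in vocab:
        n11 = n10 = n01 = n00 = 0
        for tokens, label in zip(docs, labels):
            has_term = term in tokens
            in_cat = label == category
            if has_term and in_cat:
                n11 += 1
            elif has_term:
                n10 += 1
            elif in_cat:
                n01 += 1
            else:
                n00 += 1
        scores[term] = chi_square(n11, n10, n01, n00)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (hypothetical): each document is a set of tokens.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"stock", "team"}]
labels = ["sport", "sport", "finance", "finance"]
top_terms = select_top_k(docs, labels, "sport", 2)
```

Terms that appear equally often inside and outside the category (here `team`) score zero and are dropped first, which is how such a filter reduces dimensionality before a classifier such as an SVM is trained.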

A Hybrid Text Classification Method Based on K-Congener-Nearest-Neighbors and Hypersphere Support Vector Machine

2013 International Conference on Information Technology and Applications ◽

10.1109/ita.2013.120 ◽

2013 ◽

Cited By ~ 2

Author(s):

Y.H. Chen ◽

Y.F. Zheng ◽

J.F. Pan ◽

N. Yang

Keyword(s):

Support Vector Machine ◽

Text Classification ◽

Nearest Neighbors ◽

Classification Method ◽

Support Vector

Improving the Accuracy of Text Classification using Stemming Method, A Case of Non-formal Indonesian Conversation

10.21203/rs.3.rs-41431/v2 ◽

2020 ◽

Author(s):

Rianto Rianto ◽

Achmad Benny Mutiara ◽

Eri Prasetyo Wibowo ◽

Paulus Insap Santosa

Keyword(s):

Support Vector Machine ◽

Information Retrieval ◽

Text Classification ◽

Experimental Evaluation ◽

Hate Speech ◽

Text Processing ◽

High Accuracy ◽

Support Vector ◽

Support Vector Machine Algorithm ◽

Text Data

Abstract: Stemming has long been used in data pre-processing for information retrieval, where it reduces affixed words to their root forms. However, there are few stemming methods for non-formal Indonesian text. The existing stemming method achieves high accuracy on formal Indonesian but low accuracy on non-formal Indonesian, so a stemming method that yields an accurate classifier model for non-formal Indonesian remains an open challenge. This study introduces a new stemming method to address problems in pre-processing non-formal Indonesian text data. It also aims to provide comprehensive research on improving the accuracy of text classifier models by strengthening the stemming method. A text classifier model is developed using the Support Vector Machine algorithm, and its accuracy is checked. The experimental evaluation was done by testing 550 datasets in Indonesian using two different stemming methods. The results show that the text classifier model has higher accuracy with the proposed stemming method than with the existing method, with scores of 0.85 and 0.73, respectively. In the future, the proposed stemming method can be used to develop Indonesian text classifier models for various purposes, including text clustering, summarization, hate speech detection, and other text processing applications.
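To make the idea of reducing affixed words to root words concrete, here is a toy dictionary-checked affix stripper. It is only a sketch of the general technique and is not the stemming method proposed in the study, whose details the abstract does not give; the lexicon `ROOTS`, the affix lists, and the function name `stem` are all hypothetical and far too small for real use.

```python
# Hypothetical, tiny root-word lexicon and affix lists for illustration only.
ROOTS = {"makan", "baca", "main"}
PREFIXES = ("di", "ber", "ter")
SUFFIXES = ("kan", "an", "nya")


def stem(word):
    """Return a root from ROOTS reachable by stripping one prefix and/or
    one suffix; return the lowercased word unchanged if none is found."""
    word = word.lower()
    if word in ROOTS:
        return word
    # Candidate forms: the word itself, then the word minus one prefix.
    candidates = [word]
    for prefix in PREFIXES:
        if word.startswith(prefix):
            candidates.append(word[len(prefix):])
    # For every candidate so far, also try removing one suffix.
    for cand in list(candidates):
        for suffix in SUFFIXES:
            if cand.endswith(suffix):
                candidates.append(cand[:-len(suffix)])
    # Accept the first candidate confirmed by the lexicon.
    for cand in candidates:
        if cand in ROOTS:
            return cand
    return word


tokens = [stem(w) for w in "Dibaca makanan bermain".split()]
```

Checking each candidate against a lexicon keeps the stripper from over-stemming a word such as `makan`, which itself ends in a suffix-like string. Non-formal text typically contains forms missing from any fixed lexicon, which is one reason the problem is harder than in formal text.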

Progressive Similarity Transductive Support Vector Machine Algorithm for Small Sample Text Classification

Information Technology Journal ◽

10.3923/itj.2013.7673.7676 ◽

2013 ◽

Vol 12 (23) ◽

pp. 7673-7676

Author(s):

Jianbin Ma ◽

Ying Li

Keyword(s):

Support Vector Machine ◽

Text Classification ◽

Small Sample ◽

Support Vector ◽

Support Vector Machine Algorithm ◽

Transductive Support Vector Machine

Application Research on the Text Classification Parameters Optimization based on the Support Vector Machine

Journal of Convergence Information Technology ◽

10.4156/jcit.vol8.issue5.83 ◽

2013 ◽

Vol 8 (5) ◽

pp. 717-724

Author(s):

Zhao Yuanqing

Keyword(s):

Support Vector Machine ◽

Text Classification ◽

Parameters Optimization ◽

Support Vector ◽

Application Research

A new hypersphere multi-class support vector machine applied in text classification

2011 IEEE 3rd International Conference on Communication Software and Networks ◽

10.1109/iccsn.2011.6014314 ◽

2011 ◽

Author(s):

Sun Ai-xiang ◽

Huang Shun-liang ◽

Li Ming-hui ◽

Zhang Jun

Keyword(s):

Support Vector Machine ◽

Text Classification ◽

Support Vector

An Experimental Study for the Effect of Stop Words Elimination for Arabic Text Classification Algorithms

Network and Communication Technology Innovations for Web and IT Advancement ◽

10.4018/978-1-4666-2157-2.ch012 ◽

2014 ◽

pp. 184-190

Author(s):

Bassam Al-Shargabi ◽

Fekry Olayah ◽

Waseem AL Romimah

Keyword(s):

Experimental Study ◽

Support Vector Machine ◽

Text Classification ◽

Error Rate ◽

Support Vector ◽

Arabic Text ◽

Sequential Minimal Optimization ◽

Split Method ◽

Arabic Text Classification ◽

Fold Cross Validation

In this paper, an experimental study was conducted on three techniques for Arabic text classification: Support Vector Machine (SVM) trained with Sequential Minimal Optimization (SMO), Naïve Bayesian (NB), and J48. The paper assesses the accuracy of each classifier and determines which is more accurate for Arabic text classification when stop words are eliminated. Accuracy is measured with the percentage split (holdout) method and with K-fold cross validation, along with the time needed to classify Arabic text. The results show that the SMO classifier achieves the highest accuracy and the lowest error rate, and that building the SMO model takes much less time than building the other classifiers.
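The two evaluation protocols mentioned above, percentage split (holdout) and K-fold cross validation, differ only in how document indices are partitioned. The following is a minimal sketch of both; the function names, the 0.66 training fraction, the value k=10, and the fixed seed are assumptions for illustration and are not taken from the study.

```python
import random


def holdout_split(n, train_fraction=0.66, seed=0):
    """Shuffle indices 0..n-1 once and cut them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


def k_fold_splits(n, k=10, seed=0):
    """Yield k (train, test) pairs; each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train_idx, test_idx = holdout_split(100)
splits = list(k_fold_splits(25, k=5))
```

With holdout, a classifier is trained and scored once; with K-fold, it is trained k times and the k accuracy values are averaged, so the error rate (one minus accuracy) depends less on a single lucky or unlucky split.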
