Solving multi-label text categorization problem using support vector machine approach with membership function

2011 · Vol 74 (17) · pp. 3682-3689
Author(s): Tai-Yue Wang, Huei-Min Chiang
2011 · Vol 181-182 · pp. 830-835
Author(s): Min Song Li

Latent Semantic Indexing (LSI) is an effective feature-extraction method that can capture the underlying latent semantic structure among the words in a collection of documents. However, it is probably not the most appropriate way to select a feature subspace for text categorization, since it orders the extracted features by their variance, not by their classification power. We propose a method based on support vector machines to extract features and select a latent semantic subspace suited for classification. Experimental results indicate that the method improves classification performance while yielding a more compact representation.
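The contrast the abstract draws (LSI ranks latent dimensions by variance, whereas classification calls for discriminative power) can be sketched as follows. This is a minimal illustration, not the paper's method: the toy term-document matrix, the labels, and the simple class-separation score are all invented for this example.

```python
import numpy as np

# Toy term-document count matrix: rows = terms, columns = documents.
# Hypothetical data; documents belong to two classes (0 and 1).
X = np.array([
    [2, 1, 0, 0],   # "game"
    [1, 2, 0, 1],   # "team"
    [0, 0, 2, 1],   # "stock"
    [0, 1, 1, 2],   # "market"
], dtype=float)
labels = np.array([0, 0, 1, 1])

# LSI: truncated SVD of the term-document matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
docs_latent = (np.diag(s) @ Vt).T   # documents in latent space, one row each

def separation(dim):
    """Rank a latent dimension by a crude class-separation score
    (|difference of class means| / pooled std) rather than by variance,
    reflecting the idea of choosing dimensions for classification power."""
    a = docs_latent[labels == 0, dim]
    b = docs_latent[labels == 1, dim]
    pooled = np.sqrt((a.var() + b.var()) / 2) + 1e-9
    return abs(a.mean() - b.mean()) / pooled

scores = [separation(d) for d in range(docs_latent.shape[1])]
order = np.argsort(scores)[::-1]   # most discriminative dimensions first
```

Keeping only the top-ranked dimensions from `order` (instead of the top singular values) gives the kind of compact, classification-oriented subspace the abstract describes.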


Author(s): Ralph Sherwin A. Corpuz

Analyzing natural-language Customer Satisfaction (CS) data is a tedious process, particularly when large datasets must be categorized manually. Fortunately, the advent of supervised machine learning techniques has paved the way for efficient CS categorization systems. This paper presents the feasibility of designing a text categorization model using two popular and robust algorithms, the Support Vector Machine (SVM) and the Long Short-Term Memory (LSTM) neural network, to automatically categorize complaints, suggestions, feedback, and commendations. The study found that, in terms of training accuracy, SVM achieved a best rating of 98.63% while LSTM achieved 99.32%. These results mean that the two algorithms are on par in training accuracy, but SVM trains significantly faster than LSTM, by approximately 35.47 s. The training performance of both algorithms is attributed to the limited dataset size, the high dimensionality of both the English and Tagalog languages, and the applicability of the feature engineering techniques used. Interestingly, in actual implementation both algorithms proved 100% effective in predicting the correct CS categories. Hence, the preference between the two algorithms boils down to the available dataset and to skill in optimizing them through feature engineering techniques and deploying them in actual text categorization applications.
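To make the SVM side of this comparison concrete, here is a minimal sketch of a linear SVM trained on bag-of-words vectors with a Pegasos-style subgradient update. It is not the paper's implementation: the four toy feature vectors, the two CS categories (+1 = commendation, -1 = complaint), and all hyperparameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bag-of-words vectors for two CS categories (hypothetical data):
# class +1 = commendation, class -1 = complaint.
X = np.array([
    [3, 0, 1, 0],
    [2, 1, 0, 0],
    [0, 3, 0, 2],
    [0, 2, 1, 3],
], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)

def train_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style training of a linear SVM:
    minimize (lam/2)*||w||^2 + average hinge loss."""
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            if y[i] * (w @ X[i]) < 1:        # margin violated: hinge gradient
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                            # only the regularizer shrinks w
                w = (1 - eta * lam) * w
    return w

w = train_svm(X, y)
preds = np.sign(X @ w)
```

An LSTM classifier for the same task would instead consume token sequences, which is one reason the paper reports it training more slowly than the SVM on the same data.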


Author(s): Ricco Rakotomalala, Faouzi Mhamdi

In this chapter, we are interested in protein classification starting from primary structures. The goal is to automatically assign protein sequences to their families. The main originality of the approach is that we apply the text categorization framework directly to protein classification, with only minor modifications. The main steps of the task are clearly identified: we extract features from the unstructured dataset using fixed-length n-gram descriptors; we select and combine the most relevant ones for the learning phase; and we then select the most promising learning algorithm to produce an accurate predictive model. We obtain two main results. First, the approach is credible, giving accurate results with 2-gram descriptors alone. Second, in our context, where many irrelevant descriptors are generated automatically, we must combine aggressive feature selection algorithms with low-variance classifiers such as the Support Vector Machine (SVM).
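The feature-extraction step described above, treating a primary structure as text and counting fixed-length n-grams, can be sketched in a few lines. The sequences below are toy placeholders, not data from the chapter:

```python
def ngram_features(sequence, n=2):
    """Count the fixed-length n-grams occurring in one protein sequence."""
    counts = {}
    for i in range(len(sequence) - n + 1):
        gram = sequence[i:i + n]
        counts[gram] = counts.get(gram, 0) + 1
    return counts

# Toy amino-acid sequences (placeholders for real primary structures).
seqs = ["MKVLA", "MKVLG", "GGHAT"]
feats = [ngram_features(s, n=2) for s in seqs]

# The union of observed 2-grams defines the descriptor space; over a real
# corpus it grows large and mostly irrelevant, which is why aggressive
# feature selection is needed before the learning phase.
vocab = sorted(set().union(*feats))
vectors = [[f.get(g, 0) for g in vocab] for f in feats]
```

Each row of `vectors` is then a fixed-length input suitable for a feature-selection pass followed by an SVM, mirroring the pipeline the chapter describes.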


2006 · pp. 13-21
Author(s): Feng-Yao Liu, Kai-An Wang, Bao-Liang Lu, Masao Utiyama, Hitoshi Isahara
