A Comparative Study on Feature Selection of Text Categorization for Hidden Markov Models

Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l'ACSI ◽

10.29173/cais341 ◽

2013 ◽

Author(s):

Kwan Yi ◽

Jamshid Beheshti

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Markov Models ◽

Hidden Markov ◽

Model Performance ◽

Document Representation ◽

Selection Methods ◽

Learning Models ◽

Text Feature ◽

Selection Of

In document representation for digitized text, feature selection refers to choosing the terms that represent a document and distinguish it from other documents. This study examines different feature selection methods for HMM learning models to explore how they affect model performance, evaluated in the context of a text categorization task.


A survey of feature selection methods for Gaussian mixture models and hidden Markov models

Artificial Intelligence Review ◽

10.1007/s10462-017-9581-3 ◽

2017 ◽

Vol 52 (3) ◽

pp. 1739-1779 ◽

Cited By ~ 7

Author(s):

Stephen Adams ◽

Peter A. Beling

Keyword(s):

Feature Selection ◽

Hidden Markov Models ◽

Mixture Models ◽

Markov Models ◽

Hidden Markov ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Selection Methods


Maximum a Posteriori Approximation of Hidden Markov Models for Proportional Sequential Data Modeling With Simultaneous Feature Selection

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2021.3071083 ◽

2021 ◽

pp. 1-12

Author(s):

Samr Ali ◽

Nizar Bouguila

Keyword(s):

Feature Selection ◽

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov ◽

Data Modeling ◽

Maximum A Posteriori ◽

Sequential Data ◽

A Posteriori


Design of Text Categorization System Based on SVM

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.532-533.1191 ◽

2012 ◽

Vol 532-533 ◽

pp. 1191-1195 ◽

Cited By ~ 1

Author(s):

Zhen Yan Liu ◽

Wei Ping Wang ◽

Yong Wang

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Text Categorization ◽

Feature Selection Method ◽

Extraction Methods ◽

Support Vector ◽

Text Representation ◽

Text Feature ◽

Categorization System ◽

Classifier Training

This paper introduces the design of a text categorization system based on the Support Vector Machine (SVM). It analyzes the high-dimensional character of text data and explains why SVM is suitable for text categorization. The system is constructed according to its data flow and consists of three subsystems: text representation, classifier training, and text classification. The core of the system is classifier training, but text representation directly influences the accuracy of the classifier and the performance of the system. The text feature vector space can be built with different feature selection and feature extraction methods. No research indicates which method is best, so many feature selection and feature extraction methods are implemented in this system. For a specific classification task, every feature selection method and every feature extraction method is tested, and the set of best-performing methods is adopted.


Feature Selection for Hidden Markov Models with Discrete Features

Advances in Intelligent Systems and Computing - Intelligent Systems and Applications ◽

10.1007/978-3-030-29516-5_7 ◽

2019 ◽

pp. 67-82

Author(s):

Stephen Adams ◽

Peter A. Beling

Keyword(s):

Feature Selection ◽

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov ◽

Selection For


Competitive Particle Swarm Optimization for Multi-Category Text Feature Selection

Entropy ◽

10.3390/e21060602 ◽

2019 ◽

Vol 21 (6) ◽

pp. 602 ◽

Cited By ~ 2

Author(s):

Jaesung Lee ◽

Jaegyun Park ◽

Hae-Cheon Kim ◽

Dae-Won Kim

Keyword(s):

Feature Selection ◽

Particle Swarm Optimization ◽

Text Categorization ◽

Relative Effectiveness ◽

Search Process ◽

Feature Subset ◽

Evolutionary Search ◽

Swarm Optimization ◽

Text Feature ◽

Conventional Methods

Multi-label feature selection is an important task for text categorization. This is because it enables learning algorithms to focus on essential features that foreshadow relevant categories, thereby improving the accuracy of text categorization. Recent studies have considered the hybridization of evolutionary feature wrappers and filters to enhance the evolutionary search process. However, the relative effectiveness of feature subset searches of evolutionary and feature filter operators has not been considered. This results in degenerated final feature subsets. In this paper, we propose a novel hybridization approach based on competition between the operators. This enables the proposed algorithm to apply each operator selectively and modify the feature subset according to its relative effectiveness, unlike conventional methods. The experimental results on 16 text datasets verify that the proposed method is superior to conventional methods.


Data-driven global-ranking local feature selection methods for text categorization

Expert Systems with Applications ◽

10.1016/j.eswa.2014.10.011 ◽

2015 ◽

Vol 42 (4) ◽

pp. 1941-1949 ◽

Cited By ~ 35

Author(s):

Roberto H.W. Pinheiro ◽

George D.C. Cavalcanti ◽

Tsang Ing Ren

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Local Feature ◽

Data Driven ◽

Selection Methods ◽

Global Ranking


A Text Categorization Model Based on Hidden Markov Models

Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l'ACSI ◽

10.29173/cais539 ◽

2013 ◽

Cited By ~ 1

Author(s):

Kwan Yi ◽

Jamshid Beheshti

Keyword(s):

Text Categorization ◽

Classification Scheme ◽

Markov Models ◽

Hidden Markov ◽

Part Of Speech Tagging ◽

Digital Documents ◽

Part Of Speech ◽

Speech Tagging ◽

Standard Library ◽

Categorization Model

The Hidden Markov Model (HMM) has been successfully used for speech recognition, part-of-speech tagging, and pattern recognition. In this study, we apply the HMM to automatically categorize digital documents into a standard library classification scheme. In the proposed framework, an HMM-based system is viewed as a model to generate a list of words and each document is seen as. . .


Streamflow-based evaluation of climate model sub-selection methods

Climatic Change ◽

10.1007/s10584-020-02854-8 ◽

2020 ◽

Vol 163 (3) ◽

pp. 1267-1285 ◽

Cited By ~ 2

Author(s):

Jens Kiesel ◽

Philipp Stanzel ◽

Harald Kling ◽

Nicola Fohrer ◽

Sonja C. Jähnig ◽

...

Keyword(s):

Climate Change ◽

Climate Model ◽

Historical Data ◽

Model Performance ◽

Climate Change Impacts ◽

Climate Impact ◽

Selection Method ◽

Danube River ◽

Selection Methods ◽

Selection Of

Abstract: The assessment of climate change and its impact relies on the ensemble of models available and/or sub-selected. However, assessing the validity of simulated climate change impacts is not straightforward, because historical data is commonly used for bias adjustment, to select ensemble members, or to define a baseline against which impacts are compared; naturally, there are no observations to evaluate future projections. We hypothesize that historical streamflow observations contain valuable information for investigating practices for the selection of model ensembles. The Danube River at Vienna is used as a case study, with EURO-CORDEX climate simulations driving the COSERO hydrological model. For each selection method, we compare the observed to the simulated streamflow shift from the reference period (1960–1989) to the evaluation period (1990–2014). Comparison against no selection shows that an informed selection of ensemble members improves the quantification of climate change impacts. However, the selection method matters: model selection based on hindcasted climate or streamflow alone is misleading, while methods that maintain the diversity and information content of the full ensemble are favorable. Prior to carrying out climate impact assessments, we propose splitting the long-term historical data and using it to test climate model performance, sub-selection methods, and their agreement in reproducing the indicator of interest, which further provides the expected benchmark for near- and far-future impact assessments. This test is well suited to multi-basin experiments to obtain a better understanding of uncertainty propagation and more universal recommendations regarding uncertainty reduction in hydrological impact studies.
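The split-sample comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not say how the "shift" is measured, so the relative change in period-mean streamflow is assumed here, and the function names (`period_mean`, `relative_shift`, `selection_error`) and the yearly-dictionary data layout are hypothetical. Only the two periods (1960–1989 and 1990–2014) come from the abstract.

```python
from statistics import mean

REFERENCE = (1960, 1989)   # reference period from the abstract
EVALUATION = (1990, 2014)  # evaluation period from the abstract

def period_mean(series, start, end):
    """Mean of yearly values for years in [start, end], inclusive.
    `series` maps year -> streamflow (assumed layout)."""
    return mean(v for year, v in series.items() if start <= year <= end)

def relative_shift(series, reference=REFERENCE, evaluation=EVALUATION):
    """Relative change of the period mean from reference to evaluation.
    Assumption: the abstract does not define the shift metric."""
    ref = period_mean(series, *reference)
    ev = period_mean(series, *evaluation)
    return (ev - ref) / ref

def selection_error(observed, members):
    """Absolute gap between the observed shift and the mean simulated
    shift of the selected ensemble members (hypothetical score)."""
    simulated = mean(relative_shift(m) for m in members)
    return abs(relative_shift(observed) - simulated)
```

With a score of this kind, each sub-selection method would be ranked by how closely its selected members reproduce the observed shift, and the full ensemble (no selection) would serve as the baseline.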


ACOUSTIC-PHONETIC DECODING OF SPANISH CONTINUOUS SPEECH

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001494000073 ◽

1994 ◽

Vol 08 (01) ◽

pp. 155-180 ◽

Cited By ~ 4

Author(s):

I. GALIANO ◽

E. SANCHIS ◽

F. CASACUBERTA ◽

I. TORRES

Keyword(s):

Markov Models ◽

Hidden Markov ◽

Recognition Rate ◽

Grammatical Inference ◽

Mathematical Framework ◽

Markov Modelling ◽

Unit Model ◽

Training Samples ◽

Speech Corpora ◽

Selection Of

The design of current acoustic-phonetic decoders for a specific language involves the selection of an adequate set of sublexical units and a choice of the mathematical framework for modelling the corresponding units. In this work, the baseline chosen for continuous Spanish speech consists of 23 sublexical units that roughly correspond to the 24 Spanish phonemes. The selection of this baseline was based on phonetic criteria for the language and on some experiments with available speech corpora. Two types of models were chosen for this work: conventional Hidden Markov Models and Inferred Stochastic Regular Grammars. With these two choices we could compare classical Hidden Markov modelling, where the structure of a unit-model is deductively supplied, with Grammatical Inference modelling, where the baseforms of model-units are automatically generated from training samples. The best speaker-independent phone recognition rate was 64% for the first type of modelling and 66% for the second.


Chinese Sentiment Classifier Machine Learning Based on Optimized Information Gain Feature Selection

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.988.511 ◽

2014 ◽

Vol 988 ◽

pp. 511-516 ◽

Cited By ~ 3

Author(s):

Jin Tao Shi ◽

Hui Liang Liu ◽

Yuan Xu ◽

Jun Feng Yan ◽

Jian Feng Xu

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Word Frequency ◽

Chinese Text ◽

Information Gain ◽

Classification Performance ◽

Selection Methods ◽

Text Feature ◽

Important Solution ◽

Feature Word

Machine learning is an important approach in research on Chinese text sentiment categorization, and text feature selection is critical to classification performance. Classical feature selection methods work well across the global set of categories, but they miss many representative feature words of each individual category. This paper presents an improved information gain method that integrates word frequency and the degree of feature word sentiment into traditional information gain methods. Experiments show that a classifier improved by this method achieves better classification.
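For orientation, the traditional information gain score that this abstract builds on can be written as IG(t) = H(C) − [P(t)·H(C|t) + P(¬t)·H(C|¬t)], where C is the category variable and t indicates that a term is present in a document. The sketch below implements only this standard score. It does not reproduce the paper's improved variant, whose word-frequency and sentiment-degree terms are not specified in the abstract. The toy corpus and the function names are invented for illustration.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(docs, labels, term):
    """Standard IG of a term: H(C) - [P(t) H(C|t) + P(not t) H(C|not t)].
    `docs` is a list of token sets, `labels` the matching categories."""
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    n_with = sum(with_term.values())
    prior = entropy(Counter(labels).values())
    conditional = ((n_with / n) * entropy(with_term.values())
                   + ((n - n_with) / n) * entropy(without_term.values()))
    return prior - conditional

# Toy corpus, invented for illustration only.
docs = [{"good", "film"}, {"good", "plot"}, {"bad", "film"}, {"bad", "plot"}]
labels = ["pos", "pos", "neg", "neg"]

# Rank the vocabulary by IG, highest first.
vocab = sorted(set().union(*docs))
ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                reverse=True)
```

In this toy corpus, "good" and "bad" each separate the two categories perfectly and score 1 bit, while "film" and "plot" occur equally in both categories and score 0, so a selector that keeps the top-ranked terms would retain the sentiment words.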
