A Study on Topic Modeling for Feature Space Reduction in Text Classification

Multi-label text classification assigns more than one class to a given text document, which makes the task both more ambiguous and more challenging. The ambiguity arises because several labels in the prescribed label set are often semantically close to each other, making clear demarcation between them difficult. As a consequence, any machine-learning approach to multi-label classification needs to define its feature space by choosing features beyond linguistic or semi-linguistic ones, so that the semantic closeness between the labels is also taken into account. The present work describes a feature extraction scheme in which the training document set and the prescribed label set are intertwined in a novel way to capture this ambiguity. In particular, experiments were conducted using topic modeling and fuzzy C-means clustering, which measure the underlying uncertainty using probability-based and membership-based measures, respectively. Several nonparametric hypothesis tests establish the effectiveness of the features obtained through fuzzy C-means clustering in multi-label classification. A new algorithm is proposed for training the system for multi-label classification using this set of features.
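To make the membership-based measure concrete, the following is a minimal sketch of the standard fuzzy C-means membership formula, in which a point's membership in a cluster depends on its distance to that cluster's center relative to its distances to all centers. It is not the authors' feature extraction scheme: the function name `fcm_memberships`, the fuzzifier value `m=2.0`, and the toy 2-D vectors are assumptions chosen for illustration.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Return the fuzzy membership of `point` in each cluster center.

    Uses the standard fuzzy C-means rule
        u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    where d_i is the Euclidean distance from the point to center i
    and m > 1 is the fuzzifier (assumed value, not from the paper).
    """
    dists = [math.dist(point, c) for c in centers]

    # A point that coincides with one or more centers belongs fully
    # (and equally) to those clusters; this also avoids division by zero.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]

    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


# Hypothetical 2-D vectors standing in for document features.
centers = [(0.0, 0.0), (4.0, 0.0)]
doc = (1.0, 0.0)
memberships = fcm_memberships(doc, centers)
print(memberships)
```

Unlike a hard cluster assignment, the memberships sum to one across clusters, so a document lying between two semantically close labels can contribute to both.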

Multi-label dataless text classification with topic modeling

Knowledge and Information Systems ◽

10.1007/s10115-018-1280-0 ◽

2018 ◽

Vol 61 (1) ◽

pp. 137-160 ◽

Cited By ~ 4

Author(s):

Daochen Zha ◽

Chenliang Li

Keyword(s):

Text Classification ◽

Topic Modeling

Feature Space Reduction for Human Activity Recognition based on Multi-channel Biosignals

Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies ◽

10.5220/0010260802150222 ◽

2021 ◽

Author(s):

Yale Hartmann ◽

Hui Liu ◽

Tanja Schultz

Keyword(s):

Activity Recognition ◽

Human Activity ◽

Feature Space ◽

Human Activity Recognition ◽

Space Reduction

Supervised Feature Space Reduction for Multi-Label Nearest Neighbors

Advances in Artificial Intelligence: From Theory to Practice - Lecture Notes in Computer Science ◽

10.1007/978-3-319-60042-0_21 ◽

2017 ◽

pp. 182-191

Author(s):

Wissam Siblini ◽

Reda Alami ◽

Frank Meyer ◽

Pascale Kuntz

Keyword(s):

Feature Space ◽

Nearest Neighbors ◽

Space Reduction

Research on Digital Forensics Based on Uyghur Web Text Classification

Cyber Warfare and Terrorism ◽

10.4018/978-1-7998-2466-4.ch093 ◽

2020 ◽

pp. 1586-1597

Author(s):

Yasen Aizezi ◽

Anwar Jamal ◽

Ruxianguli Abudurexiti ◽

Mutalipu Muming

Keyword(s):

Mutual Information ◽

Text Classification ◽

Text Categorization ◽

Digital Forensics ◽

Feature Space ◽

Experimental Result ◽

Support Vector ◽

Web Documents ◽

Normalized Mutual Information ◽

Plain Text

This paper discusses the use of mutual information (MI) and support vector machines (SVMs) for Uyghur Web text classification and the digital forensics process of Web text categorization: automatic classification and identification, and conversion and pretreatment of plain text based on the encoding features of various existing Uyghur Web documents. It also introduces the preparatory work for Uyghur Web text encoding. Focusing on filtering the non-Uyghur characters and stop words in the Web texts, we put forward a Multi-feature Space Normalized Mutual Information (M-FNMI) algorithm, which replaces the MI between a single feature and a category with the MI between an input feature combination and a category so as to extract more accurate feature words. Finally, we classify the features with the SVM algorithm. The experimental results show that this scheme achieves high classification precision and can provide a criterion for digital forensics with a specific purpose.
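As a point of reference for the single-feature baseline that the abstract says M-FNMI replaces, here is a minimal sketch of the mutual information between the presence of one term and membership in one category, computed from document counts. It does not implement M-FNMI, whose details the abstract does not give; the function name `mutual_information` and the example counts are assumptions for illustration.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between term presence and class membership.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    mi = 0.0
    # Each cell: (joint count, term-side marginal, class-side marginal).
    cells = (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    )
    for joint, term_total, class_total in cells:
        if joint > 0:  # empty cells contribute nothing to the sum
            mi += (joint / n) * math.log2(n * joint / (term_total * class_total))
    return mi


# Hypothetical counts: a term unrelated to the class, and a term
# that appears in exactly the documents of the class.
independent = mutual_information(25, 25, 25, 25)
perfect = mutual_information(50, 0, 0, 50)
print(independent, perfect)
```

Terms would then be ranked by this score per category and the top-scoring ones kept as features before training a classifier such as an SVM.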

Research on Digital Forensics Based on Uyghur Web Text Classification

Digital Forensics and Forensic Investigations ◽

10.4018/978-1-7998-3025-2.ch032 ◽

2020 ◽

pp. 485-496

Author(s):

Yasen Aizezi ◽

Anwar Jamal ◽

Ruxianguli Abudurexiti ◽

Mutalipu Muming

Keyword(s):

Mutual Information ◽

Text Classification ◽

Text Categorization ◽

Digital Forensics ◽

Feature Space ◽

Experimental Result ◽

Support Vector ◽

Web Documents ◽

Normalized Mutual Information ◽

Plain Text

This paper discusses the use of mutual information (MI) and support vector machines (SVMs) for Uyghur Web text classification and the digital forensics process of Web text categorization: automatic classification and identification, and conversion and pretreatment of plain text based on the encoding features of various existing Uyghur Web documents. It also introduces the preparatory work for Uyghur Web text encoding. Focusing on filtering the non-Uyghur characters and stop words in the Web texts, we put forward a Multi-feature Space Normalized Mutual Information (M-FNMI) algorithm, which replaces the MI between a single feature and a category with the MI between an input feature combination and a category so as to extract more accurate feature words. Finally, we classify the features with the SVM algorithm. The experimental results show that this scheme achieves high classification precision and can provide a criterion for digital forensics with a specific purpose.

Deep Classifier for News Text Classification Using Topic Modeling Approach

10.1007/978-981-16-3071-2_13 ◽

2021 ◽

pp. 139-147

Author(s):

Megha Singla ◽

Maitreyee Dutta

Keyword(s):

Text Classification ◽

Topic Modeling ◽

Modeling Approach

A New Feature Selection Method for Text Classification Based on Independent Feature Space Search

Mathematical Problems in Engineering ◽

10.1155/2020/6076272 ◽

2020 ◽

Vol 2020 ◽

pp. 1-14 ◽

Cited By ~ 3

Author(s):

Yong Liu ◽

Shenggen Ju ◽

Junfeng Wang ◽

Chong Su

Keyword(s):

Feature Selection ◽

Text Classification ◽

Predictive Accuracy ◽

Feature Selection Method ◽

Feature Space ◽

Selection Method ◽

The Other ◽

Feature Subset ◽

Search Range ◽

Text Documents

Feature selection methods are designed to select representative feature subsets from the original feature set by evaluating feature relevance in different ways, reducing the dimension of the features while maintaining the predictive accuracy of a classifier. In this study, we propose a feature selection method for text classification based on independent feature space search. First, a relative document-term frequency difference (RDTFD) method is proposed to divide the features in all text documents into two independent feature sets according to the features' ability to discriminate between positive and negative samples. This division has two important functions: one is to improve the class correlation of the features and reduce the correlation between the features; the other is to reduce the search range of the feature space and maintain appropriate feature redundancy. Second, a feature search strategy is used to search for the optimal feature subset in the independent feature space, which can improve the performance of text classification. Finally, we evaluate the method in several experiments conducted on six benchmark corpora; the experimental results show that the RDTFD method based on independent feature space search is more robust than the other feature selection methods.
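The abstract does not state the RDTFD formula, so the following is only a hypothetical stand-in that illustrates the general idea of dividing terms into two sets by how strongly they lean toward positive or negative samples. The score used here, the normalized difference of per-class document frequencies, and the function name `split_features_by_class_lean` are assumptions and should not be read as the paper's method.

```python
def split_features_by_class_lean(pos_docs, neg_docs):
    """Split terms into positive-leaning and negative-leaning sets.

    Each document is a list of tokens. The score (an assumed stand-in,
    not the paper's RDTFD definition) is (p - q) / (p + q), where p and q
    are the fractions of positive and negative documents containing the term.
    """
    def doc_freq(docs):
        counts = {}
        for doc in docs:
            for term in set(doc):  # count each term once per document
                counts[term] = counts.get(term, 0) + 1
        return counts

    pos_df, neg_df = doc_freq(pos_docs), doc_freq(neg_docs)
    pos_set, neg_set = {}, {}
    for term in set(pos_df) | set(neg_df):
        p = pos_df.get(term, 0) / len(pos_docs)
        q = neg_df.get(term, 0) / len(neg_docs)
        score = (p - q) / (p + q)  # p + q > 0: the term occurs somewhere
        if score > 0:
            pos_set[term] = score
        elif score < 0:
            neg_set[term] = score
        # Terms with score 0 do not discriminate and are left out.
    return pos_set, neg_set


# Toy corpus, invented for illustration.
pos_docs = [["goal", "match", "team"], ["goal", "team"]]
neg_docs = [["vote", "team"], ["vote", "party"]]
pos_set, neg_set = split_features_by_class_lean(pos_docs, neg_docs)
print(sorted(pos_set), sorted(neg_set))
```

A subsequent search for a feature subset could then run over each set separately, which is one way the search range might be narrowed.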