Method of Feature Reduction in Short Text Classification Based on Feature Clustering

A central problem in short text classification is the severe curse of dimensionality (the "dimensional disaster") that arises when a statistics-based approach is used to construct the vector space. Here, a feature reduction method for short text classification is proposed that is based on two-stage feature clustering (TSFC). Features are first semi-loosely clustered by combining spectral clustering with a graph traversal algorithm. Next, intra-cluster feature screening rules are designed to remove outlier feature words, which improves the effectiveness of the similar feature clusters. Short texts are then classified using the corresponding similar feature clusters in place of the original feature words, which significantly reduces the dimension of the vector space. Several classifiers are used to evaluate the effectiveness of the method. The results show that it largely resolves the dimensional disaster and significantly improves the accuracy of short text classification.
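The idea of replacing feature words with feature clusters can be illustrated with a minimal sketch. It is not the authors' TSFC method: the spectral clustering stage and the intra-cluster screening rules are omitted, and clusters are simply the connected components found by a breadth-first graph traversal. The word list, the similarity pairs, and the function names `cluster_features` and `to_cluster_vector` are all hypothetical.

```python
from collections import Counter, deque


def cluster_features(similar_pairs, vocabulary):
    """Assign each feature word a cluster id.

    Assumption: two words belong to the same cluster when they are linked,
    directly or transitively, by a pair in `similar_pairs`. Every word in a
    pair must also appear in `vocabulary`.
    """
    neighbours = {word: set() for word in vocabulary}
    for a, b in similar_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    cluster_of = {}
    next_id = 0
    for start in vocabulary:
        if start in cluster_of:
            continue
        # Breadth-first traversal collects one connected component.
        cluster_of[start] = next_id
        queue = deque([start])
        while queue:
            word = queue.popleft()
            for other in neighbours[word]:
                if other not in cluster_of:
                    cluster_of[other] = next_id
                    queue.append(other)
        next_id += 1
    return cluster_of


def to_cluster_vector(tokens, cluster_of, n_clusters):
    """Represent a short text by cluster counts instead of word counts."""
    counts = Counter(cluster_of[t] for t in tokens if t in cluster_of)
    return [counts.get(i, 0) for i in range(n_clusters)]


# Hypothetical vocabulary of six feature words and hand-picked similarity pairs.
vocabulary = ["cheap", "inexpensive", "bargain", "goal", "match", "striker"]
similar_pairs = [
    ("cheap", "inexpensive"),
    ("inexpensive", "bargain"),
    ("goal", "striker"),
    ("match", "goal"),
]

cluster_of = cluster_features(similar_pairs, vocabulary)
n_clusters = len(set(cluster_of.values()))
vector = to_cluster_vector(["cheap", "bargain", "match"], cluster_of, n_clusters)
print(n_clusters, vector)
```

In this toy case the six-dimensional word space collapses to two cluster dimensions, which is the kind of reduction the abstract describes, though on a far smaller scale.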


Feature Reduction of Short Text Classification by Using Bag of Words and Word Embedding

International Journal of Control and Automation ◽

10.33832/ijca.2019.12.2.01 ◽

2019 ◽

Vol 12 (2) ◽

pp. 1-16

Author(s):

Narongsak Chayangkoon ◽

Anongnart Srivihok

Keyword(s):

Text Classification ◽

Feature Reduction ◽

Word Embedding ◽

Bag Of Words ◽

Short Text


Attention-based Joint Representation Learning Network for Short text Classification

Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence ◽

10.1145/3404555.3404578 ◽

2020 ◽

Author(s):

Xinyue Liu ◽

Yexuan Tang

Keyword(s):

Text Classification ◽

Representation Learning ◽

Short Text ◽

Learning Network ◽

Joint Representation


Mobile App Third-Party Library traffic discovery method based on short text classification and clustering

2020 International Conference on Information Science, Parallel and Distributed Systems (ISPDS) ◽

10.1109/ispds51347.2020.00024 ◽

2020 ◽

Author(s):

Yuanhao Li ◽

Shuhui Chen ◽

Shuang Zhao ◽

Lian Liu

Keyword(s):

Text Classification ◽

Mobile App ◽

Third Party ◽

Short Text ◽

Discovery Method ◽

Classification And Clustering


An efficient stock market prediction model using hybrid feature reduction method based on variational autoencoders and recursive feature elimination

Financial Innovation ◽

10.1186/s40854-021-00243-3 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Hakan Gunduz

Keyword(s):

Reduction Method ◽

Feature Reduction ◽

The Other ◽

Recursive Feature Elimination ◽

Accuracy Rate ◽

Stock Performance ◽

Attention Model ◽

Feature Sets ◽

Accuracy Rates ◽

F Measure

Abstract: In this study, the hourly directions of eight banking stocks in Borsa Istanbul were predicted using linear-based, deep-learning (LSTM) and ensemble learning (LightGBM) models. These models were trained with four different feature sets, and their performance was evaluated in terms of accuracy and F-measure. The first experiments used each stock's own features directly as the model inputs, while the second experiments used stock features reduced through Variational AutoEncoders (VAE). In the last experiments, to capture the effects of the other banking stocks on individual stock performance, the features belonging to other stocks were also given as inputs to the models. Other stock features were combined with both the own (allstock_own) and the VAE-reduced (allstock_VAE) stock features, and the expanded dimensions of the resulting feature sets were reduced by Recursive Feature Elimination. The highest success rate, 0.685, was obtained with allstock_own and the LSTM with attention model, while the combination of allstock_VAE and the LSTM with attention model obtained an accuracy rate of 0.675. Although the classification results achieved with the two feature types were close, allstock_VAE achieved them using nearly 16.67% fewer features than allstock_own. Across all experiments, the models trained with allstock_own and allstock_VAE achieved higher accuracy rates than those using individual stock features. It was also concluded that the results obtained with the VAE-reduced stock features were similar to those obtained with the own stock features.
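The two evaluation metrics named in the abstract, accuracy and F-measure, can be computed as in the following minimal sketch. The label encoding (1 for an upward hourly move, 0 for a downward one) and the sample predictions are made up for illustration and are not taken from the paper; the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only: 1 = price went up in the hour, 0 = it went down.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
f1 = f_measure(y_true, y_pred)
print(acc, f1)
```

Reporting both metrics is useful when the up and down classes are unbalanced, since accuracy alone can look good for a model that always predicts the majority direction.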


Review of short-text classification

International Journal of Web Information Systems ◽

10.1108/ijwis-12-2017-0083 ◽

2019 ◽

Vol 15 (2) ◽

pp. 155-182 ◽

Cited By ~ 5

Author(s):

Issa Alsmadi ◽

Keng Hoon Gan

Keyword(s):

Social Networks ◽

Text Classification ◽

Development Trend ◽

Practical Reasons ◽

Significant Implication ◽

Classification Problems ◽

Content Type ◽

Short Text ◽

Promising Area ◽

Low Performance

Purpose: Rapid developments in social networks and their use in everyday life have caused an explosion in the number of short electronic documents. Thus, the need to classify this type of document based on its content has significant implications for many applications. Classifying these documents into relevant classes according to their text content is also of interest for many practical reasons. Short-text classification is an essential step in many applications, such as spam filtering, sentiment analysis, Twitter personalization, customer review and many other applications related to social networks. Reviews of short text and its applications are limited. Thus, this paper aims to discuss the characteristics of short text and its challenges and difficulties in classification. The paper attempts to introduce all principal stages of classification, the techniques used in each stage and the possible development trend in each stage.

Design/methodology/approach: The paper is a review of the main aspects of short-text classification. It is structured according to the stages of the classification task.

Findings: This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges in short texts and avoid poor accuracy in classification. Problems of low performance can be addressed with optimized solutions, such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft computing solutions include fuzzy logic, which makes short-text problems a promising area of research.

Originality/value: Using a powerful short-text classification method significantly affects many applications in terms of efficiency enhancement. Current solutions still have low performance, implying the need for improvement. This paper discusses related issues and approaches to these problems.


A New SVM Method for Short Text Classification Based on Semi-Supervised Learning

2015 4th International Conference on Advanced Information Technology and Sensor Application (AITS) ◽

10.1109/aits.2015.34 ◽

2015 ◽

Cited By ~ 12

Author(s):

Chunyong Yin ◽

Jun Xiang ◽

Hui Zhang ◽

Jin Wang ◽

Zhichao Yin ◽

...

Keyword(s):

Supervised Learning ◽

Text Classification ◽

Short Text


Incorporate Syntactic Information for Short Text Classification

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.268-270.697 ◽

2011 ◽

Vol 268-270 ◽

pp. 697-700

Author(s):

Rui Xue Duan ◽

Xiao Jie Wang ◽

Wen Feng Li

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Environment ◽

Text Classification ◽

The Internet ◽

Selection Methods ◽

Text Documents ◽

Short Text ◽

Syntactic Information ◽

Dependency Relations

As the volume of online short text documents grows tremendously on the Internet, organizing short texts well becomes increasingly urgent. However, traditional feature selection methods are not suitable for short text. In this paper, we propose a method that incorporates syntactic information for short text. It emphasizes features that have more dependency relations with other words. The SVM classifier and the Weka machine learning environment are used in our experiments. The experimental results show that, by incorporating syntactic information in short text, we obtain more powerful features than with traditional feature selection methods such as DF and CHI. The precision of short text classification improved from 86.2% to 90.8%.
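One way to emphasize features with more dependency relations is sketched below. The abstract does not give the weighting formula, so the rule used here, term frequency multiplied by one plus the number of dependency relations the word takes part in, is an assumption, as are the function name `dependency_weights` and the hand-written dependency pairs, which in practice would come from a dependency parser.

```python
from collections import Counter


def dependency_weights(tokens, dependencies):
    """Weight each word by its frequency and its dependency degree.

    `dependencies` is a list of (head, dependent) word pairs.
    Assumed rule (not from the paper): weight = tf * (1 + degree).
    """
    degree = Counter()
    for head, dependent in dependencies:
        degree[head] += 1
        degree[dependent] += 1
    term_freq = Counter(tokens)
    # Words with no dependency relations keep their plain term frequency.
    return {word: tf * (1 + degree[word]) for word, tf in term_freq.items()}


# Hypothetical short text with hand-written (head, dependent) pairs.
tokens = ["cheap", "flights", "to", "paris"]
dependencies = [("flights", "cheap"), ("flights", "paris"), ("paris", "to")]

weights = dependency_weights(tokens, dependencies)
print(weights)
```

Here "flights" and "paris" receive the largest weights because each takes part in two relations, so a classifier fed these weights would lean more on the syntactically central words.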
