Feature Reduction of Short Text Classification by Using Bag of Words and Word Embedding

2019, Vol 12 (2), pp. 1-16
Author(s): Narongsak Chayangkoon, Anongnart Srivihok

2019, Vol 9 (8), pp. 1578
Author(s): Li, Yin, Shi, Mao, Shi

A central problem in short text classification is the severe curse of dimensionality that arises when a statistics-based approach is used to construct vector spaces. Here, a feature reduction method based on two-stage feature clustering (TSFC) is proposed and applied to short text classification. Features are first semi-loosely clustered by combining spectral clustering with a graph traversal algorithm. Next, intra-cluster feature screening rules are designed to remove outlier feature words, which improves the quality of the similar feature clusters. Short texts are then classified using the corresponding similar feature clusters instead of the original feature words, so the dimension of the vector space is significantly reduced. Several classifiers are used to evaluate the effectiveness of this method. The results show that the method largely mitigates the curse of dimensionality and can significantly improve the accuracy of short text classification.
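
The idea of replacing feature words with feature clusters can be illustrated with a minimal sketch. This is not the TSFC method from the abstract: the spectral clustering stage and the intra-cluster screening rules are omitted, and the graph traversal step is approximated by taking connected components of a word-similarity graph with a breadth-first search. The function names `cluster_features` and `bag_of_clusters`, the toy vocabulary, and the hand-written similarity pairs are all assumptions made for illustration.

```python
from collections import Counter, deque


def cluster_features(similar_pairs, vocabulary):
    """Group words into clusters using breadth-first graph traversal.

    Each cluster is a connected component of the graph whose edges are
    the given (hypothetical) similar word pairs. Every word in a pair
    must appear in the vocabulary.
    """
    neighbors = {word: set() for word in vocabulary}
    for a, b in similar_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    cluster_of = {}
    next_id = 0
    for word in vocabulary:
        if word in cluster_of:
            continue  # already reached from an earlier word
        cluster_of[word] = next_id
        queue = deque([word])
        while queue:
            current = queue.popleft()
            for other in neighbors[current]:
                if other not in cluster_of:
                    cluster_of[other] = next_id
                    queue.append(other)
        next_id += 1
    return cluster_of, next_id


def bag_of_clusters(tokens, cluster_of, n_clusters):
    """Count cluster occurrences instead of individual word occurrences."""
    counts = Counter(cluster_of[t] for t in tokens if t in cluster_of)
    return [counts.get(i, 0) for i in range(n_clusters)]


# Toy data (assumed): five feature words, two similarity edges.
vocabulary = ["cheap", "inexpensive", "phone", "mobile", "battery"]
similar_pairs = [("cheap", "inexpensive"), ("phone", "mobile")]

cluster_of, n_clusters = cluster_features(similar_pairs, vocabulary)
vector = bag_of_clusters(["cheap", "mobile", "phone"], cluster_of, n_clusters)
```

In this toy case the five-word vocabulary collapses to three clusters, so each short text is represented by a three-dimensional vector rather than a five-dimensional one. In practice the similarity edges would come from some learned measure rather than a hand-written list.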


2016, Vol 174, pp. 806-814
Author(s): Peng Wang, Bo Xu, Jiaming Xu, Guanhua Tian, Cheng-Lin Liu, ...

2019, Vol 15 (2), pp. 155-182
Author(s): Issa Alsmadi, Keng Hoon Gan

Purpose: Rapid developments in social networks and their usage in everyday life have caused an explosion in the amount of short electronic documents. Classifying this type of document by content therefore has significant implications for many applications. Short-text classification is an essential step in applications such as spam filtering, sentiment analysis, Twitter personalization, customer review and many others related to social networks. Reviews on short text and its applications are limited. Thus, this paper aims to discuss the characteristics of short text and the challenges and difficulties in its classification. The paper attempts to introduce all stages of the classification process, the techniques used in each stage and the possible development trend in each stage.

Design/methodology/approach: The paper is a review of the main aspects of short-text classification. It is structured according to the stages of the classification task.

Findings: This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges in short texts and avoid poor accuracy in classification. Low performance can be addressed by using optimized solutions, such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft computing solutions that incorporate fuzzy logic make short-text problems a promising area of research.

Originality/value: Using a powerful short-text classification method significantly affects many applications in terms of efficiency enhancement. Current solutions still have low performance, implying the need for improvement.

