A New Method of Improving BERT for Text Classification

Author(s):  
Shaomin Zheng ◽  
Meng Yang
2019 ◽  
Vol 17 (2) ◽  
pp. 241-249
Author(s):  
Yangyang Li ◽  
Bo Liu

Short-text classification is hindered mainly by the brevity and sparsity of the texts and by synonyms and homonyms. In recent years, research on short-text classification has focused on expanding short texts but has rarely guaranteed the validity of the expanded words. This study proposes a new method that weakens these effects without external knowledge. The proposed method analyses short texts with a topic model based on Latent Dirichlet Allocation (LDA), represents each short text with a vector space model, and introduces a new way to adjust the vectors of the short texts. In the experiments, two open short-text data sets, composed of Google News and web search snippets, are used to evaluate classification performance and demonstrate the effectiveness of the method.


2016 ◽  
Vol 49 ◽  
pp. 19-42 ◽  
Author(s):  
Jiamin Xu ◽  
Palaiahnakote Shivakumara ◽  
Tong Lu ◽  
Chew Lim Tan ◽  
Seiichi Uchida

Author(s):  
M. Ali Fauzi ◽  
Agus Zainal Arifin ◽  
Sonny Christiano Gosaria

Since the rise of the World Wide Web, the amount of information available online has grown rapidly; Indonesian online news is one example. Automatic text classification has therefore become a very important task for information filtering. One of the major issues in text classification is the high dimensionality of the feature space. Most features are irrelevant, noisy, or redundant, which may reduce the accuracy of the system. Hence, feature selection is needed. Maximal Marginal Relevance for Feature Selection (MMR-FS) has been shown to be a good feature selection method for text with many redundant features, but it has high computational complexity. In this paper, we propose a two-phase feature selection method. In the first phase, we use Information Gain to reduce the feature set and thereby lower the cost of MMR-FS. In the second phase, MMR-FS selects from the reduced feature set. The experimental results show that the new method reaches a best accuracy of 86%. The method lowers the complexity of MMR-FS while retaining its accuracy.
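
The two-phase idea described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the MMR-FS formula, so the second phase uses a generic MMR criterion (relevance minus redundancy), with Jaccard overlap of document sets as an assumed redundancy measure. The function names and the parameters `keep_phase1`, `keep_phase2`, and `lam` are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting the documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    gain = entropy(labels)
    for part in (with_term, without_term):
        if part:
            gain -= len(part) / len(labels) * entropy(part)
    return gain

def jaccard(docs, a, b):
    """Assumed redundancy proxy: overlap of the document sets containing a and b."""
    docs_a = {i for i, d in enumerate(docs) if a in d}
    docs_b = {i for i, d in enumerate(docs) if b in d}
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0

def two_phase_select(docs, labels, keep_phase1, keep_phase2, lam=0.7):
    """docs: list of token sets. Returns the terms chosen after both phases."""
    vocab = sorted(set().union(*docs))
    ig = {t: information_gain(docs, labels, t) for t in vocab}

    # Phase 1: cheap filter, keep only the terms with the highest Information Gain.
    candidates = sorted(vocab, key=lambda t: (-ig[t], t))[:keep_phase1]

    # Phase 2: greedy MMR-style pass over the reduced set (generic criterion).
    selected = []
    while candidates and len(selected) < keep_phase2:
        def mmr(t):
            redundancy = max((jaccard(docs, t, s) for s in selected), default=0.0)
            return lam * ig[t] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

docs = [{"goal", "match", "team"}, {"goal", "match", "coach"},
        {"vote", "party", "team"}, {"vote", "party", "law"}]
labels = ["sport", "sport", "politics", "politics"]
print(two_phase_select(docs, labels, keep_phase1=4, keep_phase2=2))
```

In this toy corpus "goal" and "match" always co-occur, so the second phase skips "match" as redundant even though its Information Gain equals that of "goal". The pairwise redundancy check is the expensive part, which is why shrinking the candidate set first reduces the cost.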


A similarity measure is a core component of text processing methods. We propose a new similarity measure that improves the performance of the K-NN method and increases the accuracy of text classification. We implemented the proposed method on an Amazon dataset and observed that the proposed measure increases the accuracy of the similarity computed between documents in a corpus. The results show that the performance achieved with the proposed measure is higher than that obtained with other measures.


2012 ◽  
Vol 524-527 ◽  
pp. 3866-3869
Author(s):  
Pei Ying Zhang

Text classification is the task of assigning natural-language documents to predefined categories based on their context. The main concern of this paper is to improve the accuracy of a text classification system by combining an improved CHI method with a category relevance factor. First, an improved CHI method selects features from the raw feature set in order to reduce its dimensionality. Second, the TF-CRF method calculates the feature weights, taking into account that features are distributed differently across categories. Finally, we carried out a series of experiments comparing the approach with other methods using the F1-measure. Experimental results show that the new method yields an important improvement in all categories.
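
For orientation, the standard CHI (chi-square) statistic that such feature selection builds on can be sketched as below. The abstract does not specify the improvement to CHI or the TF-CRF weighting, so neither is reproduced here; the function names `chi_square` and `select_by_chi`, and the choice to rank terms by their maximum score over categories, are assumptions for illustration only.

```python
def chi_square(docs, labels, term, category):
    """Standard chi-square statistic between a term and a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_category = label == category
        if has_term and in_category:
            a += 1
        elif has_term:
            b += 1
        elif in_category:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def select_by_chi(docs, labels, k):
    """Keep the k terms with the highest chi-square score over all categories."""
    vocab = sorted(set().union(*docs))
    categories = sorted(set(labels))
    score = {t: max(chi_square(docs, labels, t, c) for c in categories) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]

docs = [{"goal", "match", "team"}, {"goal", "match", "coach"},
        {"vote", "party", "team"}, {"vote", "party", "law"}]
labels = ["sport", "sport", "politics", "politics"]
print(chi_square(docs, labels, "goal", "sport"))  # term perfectly tied to one category
print(chi_square(docs, labels, "team", "sport"))  # term spread evenly across categories
```

A term that occurs equally often inside and outside a category scores zero and is dropped first, which is how the selection step reduces the feature dimensions.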


2011 ◽  
Vol 50-51 ◽  
pp. 700-703 ◽  
Author(s):  
Wei Ran Lin ◽  
Zhi Hui Wu ◽  
Li Chao Feng ◽  
Wai Bin Huang

This paper applies the KNN algorithm to Chinese text classification. First, TF-IDF is chosen as the feature weighting method. TF-IDF is then adjusted into a new weighting method to suit the characteristics of the corpus used in this paper. Finally, experimental results show that the adjusted feature weighting method improves the accuracy of the KNN text classifier.
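
A baseline of the kind described, KNN over TF-IDF vectors, might look like the sketch below. It uses plain TF-IDF with cosine similarity and majority voting, not the adjusted weighting from the paper, which the abstract does not define. It also assumes the documents are already segmented into tokens (Chinese text would need a word segmenter first); the function names and the English toy tokens are hypothetical.

```python
import math
from collections import Counter

def fit_idf(train_docs):
    """Inverse document frequency for every term seen in the training documents."""
    n = len(train_docs)
    df = Counter(t for doc in train_docs for t in set(doc))
    return {t: math.log(n / df[t]) for t in df}

def tfidf(doc, idf):
    """Sparse TF-IDF vector as a {term: weight} dict; unseen terms are ignored."""
    counts = Counter(doc)
    return {t: (c / len(doc)) * idf[t] for t, c in counts.items() if t in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def knn_predict(train_docs, train_labels, query_doc, k=3):
    """Label the query by majority vote among its k most similar training documents."""
    idf = fit_idf(train_docs)
    train_vecs = [tfidf(doc, idf) for doc in train_docs]
    query_vec = tfidf(query_doc, idf)
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(pair[0], query_vec), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train = [["goal", "match", "team"], ["goal", "coach", "match"],
         ["vote", "party", "team"], ["vote", "law", "party"]]
labels = ["sport", "sport", "politics", "politics"]
print(knn_predict(train, labels, ["goal", "coach", "win"], k=3))
```

Because the weighting function is isolated in `tfidf`, an adjusted scheme could be swapped in there without touching the neighbour search.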


2013 ◽  
Vol 411-414 ◽  
pp. 1112-1116 ◽  
Author(s):  
Wei An ◽  
Qi Hua Liu

This paper combines a domain ontology with the LDA model to propose a new method of hierarchical web text classification. Experimental results show that the method performs well, with high recall and accuracy.

