scholarly journals A text classification model constructed by Latent Dirichlet Allocation and Deep Learning

Author(s):  
Yu Liu ◽  
Zhengping Jin
2018 ◽  
Vol 10 (11) ◽  
pp. 113 ◽  
Author(s):  
Yue Li ◽  
Xutao Wang ◽  
Pengjian Xu

Text classification is of importance in natural language processing, as the massive text information containing huge amounts of value needs to be classified into different categories for further use. In order to better classify text, our paper tries to build a deep learning model which achieves better classification results in Chinese text than those of other researchers’ models. After comparing different methods, long short-term memory (LSTM) and convolutional neural network (CNN) methods were selected as deep learning methods to classify Chinese text. LSTM is a special kind of recurrent neural network (RNN), which is capable of processing serialized information through its recurrent structure. By contrast, CNN has shown its ability to extract features from visual imagery. Therefore, two layers of LSTM and one layer of CNN were integrated to our new model: the BLSTM-C model (BLSTM stands for bi-directional long short-term memory while C stands for CNN.) LSTM was responsible for obtaining a sequence output based on past and future contexts, which was then input to the convolutional layer for extracting features. In our experiments, the proposed BLSTM-C model was evaluated in several ways. In the results, the model exhibited remarkable performance in text classification, especially in Chinese texts.


2020 ◽  
Vol 47 (11) ◽  
pp. 1054-1060
Author(s):  
Gihyeon Choi ◽  
Youngjin Jang ◽  
Harksoo Kim ◽  
Kwanwoo Kim

2021 ◽  
Author(s):  
Zhiqiang Liu ◽  
Jingkun Feng ◽  
Zhihao Yang ◽  
Lei Wang

BACKGROUND With the development of biomedicine, the number of biomedical documents has increased rapidly, which brings a great challenge for researchers retrieving the information they need. Information retrieval aims to meet this challenge by searching relevant documents from abundant documents based on the given query. However, sometimes the relevance of search results needs to be evaluated from multiple aspects in some specific retrieval tasks and thereby increases the difficulty of biomedical information retrieval. OBJECTIVE This study aims to find a more systematic method to retrieve relevant scientific literature for a given patient. METHODS In the initial retrieval stage, we supplement query terms through query expansion strategies and apply query boosting to obtain an initial ranking list of relevant documents. In the re-ranking phase, we employ a text classification model and relevance matching model to evaluate documents respectively from different dimensions, then we combine the outputs through logistic regression to re-rank all the documents from the initial ranking list. RESULTS The proposed ensemble method contributes to the improvement of biomedical retrieval performance. Comparing with the existing deep learning-based methods, experimental results show that our method achieves state-of-the-art performance on the data collection provided by TREC 2019 Precision Medicine Track. CONCLUSIONS In this paper, we propose a novel ensemble method based on deep learning. As shown in the experiments, the strategies we used in the initial retrieval phase such as query expansion and query boosting are effective. The application of the text classification model and the relevance matching model can better capture semantic context information and improve retrieval performance.


2021 ◽  
Vol 5 (1) ◽  
pp. 123-131
Author(s):  
Ni Luh Putu Merawati Putu ◽  
Ahmad Zuli Amrullah ◽  
Ismarmiaty

Lombok Island is one of the favorite tourist destinations. Various topics and comments about Lombok tourism experience through social media accounts are difficult to manually identify public sentiments and topics. The opinion expressed by tourists through social media is interesting for further research. This study aims to classify tourists' opinions into two classes, positive and negative, and topics modelling by using the Naive Bayes method and modeling the topic by using Latent Dirichlet Allocation (LDA). The stages of this research include data collection, data cleaning, data transformation, data classification. The results performance testing of the classification model using Naive Bayes method is shown with an accuracy value of 92%, precision of 100%, recall of 84% and specificity of 100%. The results of modeling topics using LDA in each positive and negative class from the coherence value shows the highest value for the positive class was obtained on the 8th topic with a value of 0.613 and for the negative class on the 12th topic with a value of 0.528. The use of the Naive Bayes and LDA algorithms is considered effective for analyzing the sentiment and topic modelling for Lombok tourism.  


2021 ◽  
Vol 5 (3) ◽  
pp. 584-593
Author(s):  
Naufal Hilmiaji ◽  
Kemas Muslim Lhaksmana ◽  
Mahendra Dwifebri Purbolaksono

especially with the advancement of deep learning methods for text classification. Despite some effort to identify emotion on Indonesian tweets, its performance evaluation results have not achieved acceptable numbers. To solve this problem, this paper implements a classification model using a convolutional neural network (CNN), which has demonstrated expected performance in text classification. To easily compare with the previous research, this classification is performed on the same dataset, which consists of 4,403 tweets in Indonesian that were labeled using five different emotion classes: anger, fear, joy, love, and sadness. The performance evaluation results achieve the precision, recall, and F1-score at respectively 90.1%, 90.3%, and 90.2%, while the highest accuracy achieves 89.8%. These results outperform previous research that classifies the same classification on the same dataset.


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0247984
Author(s):  
Xuyang Wang ◽  
Yixuan Tong

With the rapid development of the mobile internet, people are becoming more dependent on the internet to express their comments on products or stores; meanwhile, text sentiment classification of these comments has become a research hotspot. In existing methods, it is fairly popular to apply a deep learning method to the text classification task. Aiming at solving information loss, weak context and other problems, this paper makes an improvement based on the transformer model to reduce the difficulty of model training and training time cost and achieve higher overall model recall and accuracy in text sentiment classification. The transformer model replaces the traditional convolutional neural network (CNN) and the recurrent neural network (RNN) and is fully based on the attention mechanism; therefore, the transformer model effectively improves the training speed and reduces training difficulty. This paper selects e-commerce reviews as research objects and applies deep learning theory. First, the text is preprocessed by word vectorization. Then the IN standardized method and the GELUs activation function are applied based on the original model to analyze the emotional tendencies of online users towards stores or products. The experimental results show that our method improves by 9.71%, 6.05%, 5.58% and 5.12% in terms of recall and approaches the peak level of the F1 value in the test model by comparing BiLSTM, Naive Bayesian Model, the serial BiLSTM_CNN model and BiLSTM with an attention mechanism model. Therefore, this finding proves that our method can be used to improve the text sentiment classification accuracy and effectively apply the method to text classification.


Author(s):  
Bambang Subeno ◽  
Retno Kusumaningrum ◽  
Farikhin Farikhin

<span lang="EN-GB">Latent Dirichlet Allocation (LDA) is a probability model for grouping hidden topics in documents by the number of predefined topics. If conducted incorrectly, determining the amount of K topics will result in limited word correlation with topics. Too large or too small number of K topics causes inaccuracies in grouping topics in the formation of training models. This study aims to determine the optimal number of corpus topics in the LDA method using the maximum likelihood and Minimum Description Length (MDL) approach. The experimental process uses Indonesian news articles with the number of documents at 25, 50, 90, and 600; in each document, the numbers of words are 3898, 7760, 13005, and 4365. The results show that the maximum likelihood and MDL approach result in the same number of optimal topics. The optimal number of topics is influenced by alpha and beta parameters. In addition, the number of documents does not affect the computation times but the number of words does. Computational times for each of those datasets are 2.9721, 6.49637, 13.2967, and 3.7152 seconds. The optimisation model has resulted in many LDA topics as a classification model. This experiment shows that the highest average accuracy is 61% with alpha 0.1 and beta 0.001.</span>


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Ningfeng Sun ◽  
Chengye Du

This paper uses the database as the data source, using bibliometrics and visual analysis methods, to statistically analyze the relevant documents published in the field of text classification in the past ten years, to clarify the development context and research status of the text classification field, and to predict the research in the field of text classification priorities and research frontiers. Based on the in-depth study of the background, research status, related theories, and developments of online news text classification, this article analyzes the annual publication trend, subject distribution, journal distribution, institution distribution, author distribution, highly cited literature analysis, and research hotspots. Forefront and other aspects clarify the development context and research status of the text classification field and provide a theoretical reference for the further development of the text classification field. Then, on the basis of systematic research on text classification, deep learning, and news text classification theories, a deep learning-based network news text classification model is constructed, and the function of each module is introduced in detail, which will help the future news text classification of application and improvement provide theoretical basis. On the basis of the predecessors, this article separately studied and improved the neural network model based on the convolutional neural network, cyclic neural network, and attention mechanism and merged the three models into one model, which can obtain local associated features and contextual features and highlight the role of keywords. Finally, experiments are used to verify the effectiveness of the model proposed in this paper and compared with traditional text classification to prove the superiority of the network news text classification based on deep learning proposed in this paper. This article aims to study the internal connection between news comments and the number of votes received by news comments, and through the proposed model, the number of votes for news comments can be predicted.


Sign in / Sign up

Export Citation Format

Share Document