Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: the case of gluten bibliome

2021
Author(s): Martín Pérez-Pérez, Tânia Ferreira, Anália Lourenço, Gilberto Igrejas, Florentino Fdez-Riverola
Author(s): Bing Tian, Yong Zhang, Jin Wang, Chunxiao Xing

Document classification is an essential task in many real-world applications. Existing approaches use both text semantics and document structure to obtain the document representation. However, these models usually require a large collection of annotated training instances, which is not always feasible to obtain, especially in low-resource settings. In this paper, we propose a multi-task learning framework to jointly train multiple related document classification tasks. We devise a hierarchical architecture that uses the knowledge shared across all tasks to enhance the document representation of each task. We further propose an inter-attention approach that improves the task-specific modeling of documents with global information. Experimental results on 15 public datasets demonstrate the benefits of our proposed model.
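The inter-attention idea can be illustrated with a small sketch. It assumes that each task has already produced a vector for the same document; the current task's vector attends over all of them with dot-product attention, and the resulting context vector ("global information") is concatenated to it. The function name `inter_attention`, the dot-product scoring, and the concatenation are illustrative assumptions, not the authors' architecture, which relies on learned encoders and parameters.

```python
import math
from typing import List, Tuple

Vector = List[float]

def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def inter_attention(task_vec: Vector, shared_vecs: List[Vector]) -> Tuple[List[float], Vector]:
    """Hypothetical inter-attention step: score every task's representation
    of the document against the current task's vector, turn the scores into
    weights, and append the weighted sum (the "global" context) to task_vec."""
    weights = softmax([dot(task_vec, s) for s in shared_vecs])
    dim = len(task_vec)
    context = [sum(w * s[i] for w, s in zip(weights, shared_vecs)) for i in range(dim)]
    return weights, task_vec + context

if __name__ == "__main__":
    task_vec = [1.0, 0.0]
    shared = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    weights, enhanced = inter_attention(task_vec, shared)
    print([round(w, 3) for w in weights])
    print(len(enhanced))  # 4: original 2 dims + 2-dim context
```

In a trained model the input vectors would come from learned encoders and the scores would typically pass through learned projections; plain dot products keep this sketch dependency-free.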


2018, Vol 2018, pp. 1-10
Author(s): Jianming Zheng, Yupu Guo, Chong Feng, Honghui Chen

Document representation is widely used in practical applications such as sentiment classification, text retrieval, and text classification. Previous work is mainly based on statistical methods or neural networks, which suffer from data sparsity and limited model interpretability, respectively. In this paper, we propose a general framework for document representation with a hierarchical architecture. In particular, we incorporate the hierarchical architecture into three traditional neural-network models for document representation, resulting in three hierarchical neural representation models for document classification, namely TextHFT, TextHRNN, and TextHCNN. Comprehensive experiments on two public datasets, Yelp 2016 and Amazon Reviews (Electronics), show that our hierarchical models outperform the corresponding neural-network models for document classification, with significant accuracy improvements ranging from 4.65% to 35.08% at a comparable (or substantially lower) time cost. In addition, we find that long documents benefit more from the hierarchical architecture than short ones, as the accuracy improvement on long documents is greater than that on short documents.
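The effect of a hierarchical architecture is visible even with plain averaging. The sketch contrasts a flat average of all word vectors in a document (in the spirit of fastText) with a two-level average: words to sentence vectors, then sentence vectors to a document vector. Mean pooling at both levels and the toy embeddings are assumptions for illustration only; hierarchical RNN or CNN models would use recurrent or convolutional encoders at each level instead.

```python
from typing import Dict, List

Vector = List[float]
Document = List[List[str]]  # a document is a list of tokenized sentences

def mean(vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def flat_doc_vector(doc: Document, emb: Dict[str, Vector]) -> Vector:
    # Flat baseline: average every word embedding in the document at once.
    return mean([emb[w] for sentence in doc for w in sentence])

def hierarchical_doc_vector(doc: Document, emb: Dict[str, Vector]) -> Vector:
    # Level 1: words -> one vector per sentence.
    sentence_vecs = [mean([emb[w] for w in sentence]) for sentence in doc]
    # Level 2: sentences -> document vector (each sentence weighted equally).
    return mean(sentence_vecs)

if __name__ == "__main__":
    emb = {"good": [1.0, 0.0], "bad": [0.0, 1.0], "very": [0.0, 0.0]}
    doc = [["good"], ["very", "very", "very", "bad"]]
    print(flat_doc_vector(doc, emb))          # [0.2, 0.2]
    print(hierarchical_doc_vector(doc, emb))  # [0.5, 0.125]
```

In the flat average every word counts equally, so the longer second sentence dominates; in the two-level version each sentence contributes equally regardless of its length. For a single-sentence document the two representations coincide.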


2019, Vol 9 (4), pp. 743
Author(s): Sanda Martinčić-Ipšić, Tanja Miličić, Ljupčo Todorovski

In this paper we perform a comparative analysis of three models for the feature representation of text documents in the context of document classification. In particular, we consider the most frequently used family of bag-of-words models, the recently proposed continuous-space models word2vec and doc2vec, and a model based on representing text documents as language networks. While bag-of-words models have been used extensively for document classification, the performance of the other two models on the same task is not well understood. This is especially true for network-based models, which have rarely been considered for representing text documents for classification. In this study, we measure the performance of document classifiers trained with random forests on features generated by the three models and their variants. Multi-objective rankings are proposed as the framework for a multi-criteria comparative analysis of the results. The results of the empirical comparison show that the commonly used bag-of-words model performs comparably to the emerging continuous-space model doc2vec. In particular, the low-dimensional variants of doc2vec, generating up to 75 features, are among the top-performing document representation models. Finally, the results show that doc2vec achieves superior performance in classifying large documents.
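Two of the compared representations can be sketched in a few lines: a term-frequency bag-of-words vector over a fixed vocabulary, and a language network in which words are nodes linked when they occur next to each other. Adjacency-based edges and the particular graph statistics below (node and edge counts, density, maximum degree) are illustrative assumptions; the abstract does not say which network measures the study used. Feature vectors like these could then be passed to a random-forest classifier, as in the study.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

def bag_of_words(tokens: List[str], vocabulary: List[str]) -> List[int]:
    # Term-frequency vector: one feature per vocabulary word.
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]

def cooccurrence_edges(tokens: List[str]) -> Set[Tuple[str, str]]:
    # Undirected language network: link every pair of adjacent, distinct words.
    edges = set()
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return edges

def network_features(tokens: List[str]) -> Dict[str, float]:
    # A few graph-level statistics (assumed, not from the study) used as features.
    edges = cooccurrence_edges(tokens)
    n, m = len(set(tokens)), len(edges)
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return {"nodes": n, "edges": m, "density": density,
            "max_degree": max(degree.values(), default=0)}

if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    print(bag_of_words(tokens, ["cat", "dog", "mat", "the"]))  # [1, 0, 1, 2]
    print(network_features(tokens))
    # {'nodes': 5, 'edges': 5, 'density': 0.5, 'max_degree': 3}
```

The bag-of-words vector grows with the vocabulary, whereas the network statistics have a fixed size per document, which is one reason network-based features behave differently as classifier inputs.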


2011, Vol 131 (8), pp. 1459-1466
Author(s): Yasunari Maeda, Hideki Yoshida, Masakiyo Suzuki, Toshiyasu Matsushima

2021, Vol 212, pp. 106597
Author(s): Wenlong Fu, Bing Xue, Xiaoying Gao, Mengjie Zhang

2019, Vol 56 (5), pp. 1618-1632
Author(s): Zenun Kastrati, Ali Shariq Imran, Sule Yildirim Yayilgan

2020, Vol 50 (12), pp. 4602-4615
Author(s): Wei Wang, Bing Guo, Yan Shen, Han Yang, Yaosen Chen, ...
