An ensemble information retrieval method for the biomedical domain (Preprint)

2021 ◽  
Author(s):  
Zhiqiang Liu ◽  
Jingkun Feng ◽  
Zhihao Yang ◽  
Lei Wang

BACKGROUND With the development of biomedicine, the number of biomedical documents has increased rapidly, which makes it challenging for researchers to retrieve the information they need. Information retrieval aims to meet this challenge by finding the documents relevant to a given query within a large collection. However, in some retrieval tasks the relevance of search results must be evaluated from multiple aspects, which increases the difficulty of biomedical information retrieval. OBJECTIVE This study aims to find a more systematic method to retrieve relevant scientific literature for a given patient. METHODS In the initial retrieval stage, we supplement query terms through query expansion strategies and apply query boosting to obtain an initial ranking list of relevant documents. In the re-ranking phase, we employ a text classification model and a relevance matching model to evaluate documents along different dimensions, and then combine their outputs through logistic regression to re-rank all the documents in the initial ranking list. RESULTS The proposed ensemble method improves biomedical retrieval performance. Compared with existing deep learning-based methods, our method achieves state-of-the-art performance on the data collection provided by the TREC 2019 Precision Medicine Track. CONCLUSIONS In this paper, we propose a novel ensemble method based on deep learning. As shown in the experiments, the strategies we used in the initial retrieval phase, such as query expansion and query boosting, are effective. The application of the text classification model and the relevance matching model can better capture semantic context information and improve retrieval performance.
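The re-ranking step described in METHODS, where two model outputs are combined through logistic regression, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy scores, the document identifiers, the plain gradient-descent fit, and the function names `fit_logistic` and `rerank` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(z) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def rerank(candidates, weights, bias):
    """candidates: (doc_id, classifier_score, matching_score) tuples."""
    scored = [
        (doc_id, sigmoid(weights[0] * cls + weights[1] * rel + bias))
        for doc_id, cls, rel in candidates
    ]
    # Highest combined relevance probability first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical training pairs: (classifier score, matching score) -> relevant?
train_x = [(0.9, 0.8), (0.8, 0.7), (0.7, 0.9), (0.2, 0.3), (0.1, 0.4), (0.3, 0.1)]
train_y = [1, 1, 1, 0, 0, 0]
weights, bias = fit_logistic(train_x, train_y)

# Hypothetical initial ranking list produced by the first retrieval stage.
initial_list = [("doc_a", 0.2, 0.3), ("doc_b", 0.9, 0.9), ("doc_c", 0.6, 0.4)]
ranking = rerank(initial_list, weights, bias)
print([doc_id for doc_id, _ in ranking])
```

In this toy setting both learned weights come out positive, so a document that scores higher on both models is always ranked higher. A real system would fit the combiner on held-out relevance judgments rather than on six hand-made points.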

2018 ◽  
Vol 10 (11) ◽  
pp. 113 ◽  
Author(s):  
Yue Li ◽  
Xutao Wang ◽  
Pengjian Xu

Text classification is important in natural language processing, because massive amounts of valuable text need to be classified into categories for further use. To classify text better, this paper builds a deep learning model that achieves better classification results on Chinese text than other researchers’ models. After comparing different methods, long short-term memory (LSTM) and convolutional neural network (CNN) methods were selected as the deep learning methods for classifying Chinese text. LSTM is a special kind of recurrent neural network (RNN), which is capable of processing sequential information through its recurrent structure. By contrast, CNN has shown its ability to extract features from visual imagery. Therefore, two layers of LSTM and one layer of CNN were integrated into our new model, BLSTM-C (BLSTM stands for bi-directional long short-term memory and C stands for CNN). The LSTM layers produce a sequence output based on past and future contexts, which is then fed to the convolutional layer for feature extraction. In our experiments, the proposed BLSTM-C model was evaluated in several ways. In the results, the model exhibited remarkable performance in text classification, especially on Chinese texts.


2020 ◽  
Vol 47 (11) ◽  
pp. 1054-1060
Author(s):  
Gihyeon Choi ◽  
Youngjin Jang ◽  
Harksoo Kim ◽  
Kwanwoo Kim

2021 ◽  
Vol 5 (3) ◽  
pp. 584-593
Author(s):  
Naufal Hilmiaji ◽  
Kemas Muslim Lhaksmana ◽  
Mahendra Dwifebri Purbolaksono

especially with the advancement of deep learning methods for text classification. Despite some effort to identify emotion in Indonesian tweets, performance evaluation results have not reached acceptable levels. To address this problem, this paper implements a classification model using a convolutional neural network (CNN), which has demonstrated the expected performance in text classification. For easy comparison with previous research, classification is performed on the same dataset, which consists of 4,403 Indonesian tweets labeled with five emotion classes: anger, fear, joy, love, and sadness. The model achieves precision, recall, and F1-score of 90.1%, 90.3%, and 90.2%, respectively, and a highest accuracy of 89.8%. These results outperform previous research that performed the same classification on the same dataset.
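For readers unfamiliar with the reported metrics, the following sketch shows how precision, recall, F1-score, and accuracy can be computed for the five emotion classes. The abstract does not state which averaging scheme was used, so macro averaging is an assumption here, and the eight toy labels and predictions are invented for illustration only.

```python
LABELS = ["anger", "fear", "joy", "love", "sadness"]


def per_class_prf(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_prf(y_true, y_pred, labels):
    """Unweighted mean of the per-class scores (macro averaging)."""
    scores = [per_class_prf(y_true, y_pred, label) for label in labels]
    return tuple(sum(s[i] for s in scores) / len(labels) for i in range(3))


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical gold labels and predictions for eight tweets.
y_true = ["anger", "fear", "joy", "love", "sadness", "joy", "anger", "love"]
y_pred = ["anger", "fear", "joy", "joy", "sadness", "joy", "fear", "love"]

macro_p, macro_r, macro_f1 = macro_prf(y_true, y_pred, LABELS)
acc = accuracy(y_true, y_pred)
print(round(macro_p, 4), round(macro_r, 4), round(macro_f1, 4), acc)
```

Macro averaging weights every class equally, which matters when some emotions are much rarer than others; a weighted or micro average would give different numbers on the same predictions.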


2014 ◽  
Vol 977 ◽  
pp. 464-467
Author(s):  
Li Xin Gan ◽  
Wei Tu

Query expansion is one of the key technologies for improving precision and recall in information retrieval. To overcome the limitations of a single corpus, this paper combines the semantic characteristics of the Wikipedia corpus with a standard corpus to extract richer relationships between terms and construct a steady Markov semantic network. Information from the entity pages and disambiguation pages in Wikipedia is used comprehensively to classify query terms and improve query classification accuracy. High-quality related candidates can then be selected for query expansion through semantic pruning. The proposed approach helps improve retrieval performance and reduce the computational cost of search.
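The general idea of expanding a query with pruned candidate terms can be sketched as follows. This is not the Markov semantic network from the abstract: the association weights, the threshold `min_weight`, the cap `max_per_term`, and the function `expand_query` are all hypothetical stand-ins that only illustrate pruning weak candidates before expansion.

```python
# Hypothetical term-association weights, e.g. derived from co-occurrence statistics.
RELATED_TERMS = {
    "jaguar": {"cat": 0.62, "car": 0.58, "panthera": 0.41, "river": 0.05},
    "speed": {"velocity": 0.70, "fast": 0.35, "limit": 0.12},
}


def expand_query(query_terms, related, min_weight=0.3, max_per_term=2):
    """Append the strongest related terms to the query, pruning weak candidates."""
    expanded = list(query_terms)
    seen = set(query_terms)
    for term in query_terms:
        candidates = related.get(term, {})
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        kept = 0
        for candidate, weight in ranked:
            # Candidates are sorted, so the first weak one ends the scan.
            if weight < min_weight or kept >= max_per_term:
                break
            if candidate not in seen:
                expanded.append(candidate)
                seen.add(candidate)
                kept += 1
    return expanded


expanded = expand_query(["jaguar", "speed"], RELATED_TERMS)
print(expanded)
# → ['jaguar', 'speed', 'cat', 'car', 'velocity', 'fast']
```

Capping the number of candidates per term keeps the expanded query short, which is one simple way to limit the extra search cost that expansion introduces.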


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0247984
Author(s):  
Xuyang Wang ◽  
Yixuan Tong

With the rapid development of the mobile internet, people increasingly rely on the internet to comment on products or stores, and sentiment classification of these comments has become a research hotspot. Among existing methods, applying deep learning to text classification is fairly popular. To address information loss, weak context, and other problems, this paper improves on the transformer model to reduce the difficulty and time cost of model training and to achieve higher overall recall and accuracy in text sentiment classification. The transformer model replaces the traditional convolutional neural network (CNN) and recurrent neural network (RNN) and is based entirely on the attention mechanism; it therefore improves training speed and reduces training difficulty. This paper selects e-commerce reviews as its research objects and applies deep learning theory. First, the text is preprocessed by word vectorization. Then the IN standardization method and the GELUs activation function are applied to the original model to analyze the emotional tendencies of online users towards stores or products. The experimental results show that our method improves recall by 9.71%, 6.05%, 5.58% and 5.12% compared with BiLSTM, the naive Bayes model, the serial BiLSTM_CNN model and BiLSTM with an attention mechanism, respectively, and approaches the peak F1 value among the tested models. These findings indicate that our method improves text sentiment classification accuracy and can be applied effectively to text classification.
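The GELU activation mentioned above has a standard closed form, GELU(x) = x · Φ(x), where Φ is the standard normal cumulative distribution function. The sketch below evaluates that exact form with `math.erf` and contrasts it with ReLU; it illustrates the activation only and makes no claim about how the paper's model applies it.

```python
import math


def gelu(x):
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x):
    return max(0.0, x)


for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(value, round(gelu(value), 4), relu(value))
```

Unlike ReLU, GELU is smooth and lets small negative inputs pass with a reduced weight instead of zeroing them, which is one reason it is commonly used in transformer models.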


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Ningfeng Sun ◽  
Chengye Du

This paper uses a database as its data source and applies bibliometric and visual analysis methods to statistically analyze the documents published in the field of text classification over the past ten years, in order to clarify the development context and research status of the field and to predict its research priorities and frontiers. Based on an in-depth study of the background, research status, related theories, and developments of online news text classification, this article analyzes the annual publication trend, subject distribution, journal distribution, institution distribution, author distribution, highly cited literature, research hotspots, and research frontiers. This analysis clarifies the development context and research status of the text classification field and provides a theoretical reference for its further development. Then, on the basis of systematic research on text classification, deep learning, and news text classification theories, a deep learning-based online news text classification model is constructed, and the function of each module is introduced in detail, providing a theoretical basis for the future application and improvement of news text classification. Building on previous work, this article separately studies and improves neural network models based on the convolutional neural network, the recurrent neural network, and the attention mechanism, and merges the three into a single model that can capture local associated features and contextual features and highlight the role of keywords. Finally, experiments verify the effectiveness of the proposed model and compare it with traditional text classification to demonstrate the superiority of the proposed deep learning-based online news text classification.
This article aims to study the internal connection between news comments and the number of votes received by news comments, and through the proposed model, the number of votes for news comments can be predicted.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Mohammed Ali Al-Garadi ◽  
Yuan-Chi Yang ◽  
Haitao Cai ◽  
Yucheng Ruan ◽  
Karen O’Connor ◽  
...  

Abstract Background Prescription medication (PM) misuse/abuse has emerged as a national crisis in the United States, and social media has been suggested as a potential resource for performing active monitoring. However, automating a social media-based monitoring system is challenging, requiring advanced natural language processing (NLP) and machine learning methods. In this paper, we describe the development and evaluation of automatic text classification models for detecting self-reports of PM abuse from Twitter. Methods We experimented with state-of-the-art bi-directional transformer-based language models, which utilize tweet-level representations that enable transfer learning (e.g., BERT, RoBERTa, XLNet, ALBERT, and DistilBERT), proposed fusion-based approaches, and compared the developed models with several traditional machine learning, including deep learning, approaches. Using a public dataset, we evaluated the performances of the classifiers on their abilities to classify the non-majority “abuse/misuse” class. Results Our proposed fusion-based model performs significantly better than the best traditional model (F1-score [95% CI]: 0.67 [0.64–0.69] vs. 0.45 [0.42–0.48]). We illustrate, via experimentation using varying training set sizes, that the transformer-based models are more stable and require less annotated data compared to the other models. The significant improvements achieved by our best-performing classification model over past approaches make it suitable for automated continuous monitoring of nonmedical PM use from Twitter. Conclusions BERT, BERT-like and fusion-based models outperform traditional machine learning and deep learning models, achieving substantial improvements over many years of past research on the topic of prescription medication misuse/abuse classification from social media, which had been shown to be a complex task due to the unique ways in which information about nonmedical use is presented.
Several challenges associated with the lack of context and the nature of social media language need to be overcome to further improve BERT and BERT-like models. These experimentally driven challenges are presented as potential future research directions.


2014 ◽  
Vol 484-485 ◽  
pp. 183-186 ◽  
Author(s):  
Ji Ying Yang ◽  
Bei Zhang ◽  
Yu Mao

The core problem of information retrieval is to retrieve, from a document collection, the subset of documents most relevant to the user. Retrieval relies on a ranking algorithm that sorts the search results by relevance and returns the sorted results as the response to the user's query. Retrieval performance is determined by many factors, such as the quality of the query expression, indexing, stemming, stop-word removal, and query expansion, but fundamentally it is determined by the ranking function. According to some criterion, the ranking function indicates the degree to which a document matches the user's query, judges the document's relevance to the user accordingly, sorts the documents in descending order of relevance, and returns the ordered list as the retrieval result. The quality of the ranking algorithm directly affects the efficiency of retrieval.
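The ranking step described above, scoring each document against the query and returning the documents in descending order of relevance, can be sketched with a simple TF-IDF score. The abstract does not name a specific ranking function, so the smoothed TF-IDF formula, the whitespace tokenizer, and the three toy documents are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf_score(query_terms, doc_tokens, doc_freq, n_docs):
    """Sum of term frequency times smoothed inverse document frequency."""
    counts = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        if counts[term] == 0:
            continue
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
        score += counts[term] / len(doc_tokens) * idf
    return score


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by descending relevance score."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each term once per document
    query_terms = query.lower().split()
    scored = [
        (doc_id, tf_idf_score(query_terms, tokens, doc_freq, len(docs)))
        for doc_id, tokens in tokenized.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-document collection.
docs = {
    "d1": "query expansion improves recall in retrieval",
    "d2": "ranking functions sort documents by relevance",
    "d3": "relevance ranking of documents for a user query",
}
ranking = rank("relevance ranking", docs)
print([doc_id for doc_id, _ in ranking])
# → ['d2', 'd3', 'd1']
```

Here `d2` outranks `d3` only because it is shorter, so the same two matches make up a larger share of its tokens; this shows how the choice of ranking function, rather than the matching itself, determines the final order.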

