Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering

2020 · Vol 514 · pp. 88-105
Author(s): Massimo Esposito, Emanuele Damiano, Aniello Minutolo, Giuseppe De Pietro, Hamido Fujita

2020 · Vol 10 (12) · pp. 4316
Author(s): Ivan Boban, Alen Doko, Sven Gotovac

Sentence retrieval is an information retrieval technique that aims to find sentences matching an information need. It is used for tasks such as question answering (QA) and novelty detection. Because sentence retrieval resembles document retrieval with a smaller unit of retrieval, document retrieval methods such as term frequency-inverse document frequency (TF-IDF), BM25, and language-modeling-based methods are also applied to it. The effect of partial matching of words on sentence retrieval is an issue that has not been analyzed. We believe there is substantial potential to improve sentence retrieval methods by considering this approach. We adapted TF-ISF (term frequency-inverse sentence frequency), BM25, and language-modeling-based methods to test the partial matching of terms by combining sentence retrieval with sequence similarity, which allows matching of words that are similar but not identical. All tests were conducted using data from the novelty tracks of the Text REtrieval Conference (TREC). The scope of this paper was to find out whether such an approach is generally beneficial to sentence retrieval; we did not examine in depth how partial matching helps or hinders the finding of relevant sentences.
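As a rough illustration of the idea (not the authors' exact formulation), partial term matching can be implemented with character-level sequence similarity, for example Python's difflib.SequenceMatcher; the 0.8 cutoff and the soft term-frequency helper below are illustrative choices, not values from the paper:

```python
from difflib import SequenceMatcher

def partial_match_weight(query_term, sentence_term, threshold=0.8):
    """Return a similarity weight in [0, 1] for two terms.

    Exact matches score 1.0; near matches (e.g. morphological
    variants) score their character-level similarity ratio if it
    reaches the threshold, and 0.0 otherwise.
    """
    if query_term == sentence_term:
        return 1.0
    ratio = SequenceMatcher(None, query_term, sentence_term).ratio()
    return ratio if ratio >= threshold else 0.0

def partial_tf(query_term, sentence_tokens, threshold=0.8):
    """Soft term frequency: sum of partial-match weights over a sentence."""
    return sum(partial_match_weight(query_term, t, threshold)
               for t in sentence_tokens)

# morphological variants match partially; unrelated terms do not
print(partial_match_weight("retrieval", "retrievals"))  # close to 1
print(partial_match_weight("apple", "retrieval"))       # 0.0
```

Such a soft term frequency can then stand in for the exact-match counts inside TF-ISF, BM25, or language-modeling scoring functions.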


Author(s): Miguel A. Silva-Fuentes, Hugo D. Calderon-Vilca, Edwin F. Calderon-Vilca, Flor C. Cardenas-Marino

Author(s): Kamal Al-Sabahi, Zhang Zuping

In the era of information overload, text summarization has become a focus of attention in a number of diverse fields, such as question answering systems, intelligence analysis, news recommendation systems, and search results in web search engines. A good document representation is the key to any successful summarizer, and learning this representation has become a very active research topic in natural language processing (NLP). Traditional approaches mostly fail to deliver a good representation, whereas word embeddings have shown excellent performance in learning one. In this paper, a modified BM25 combined with word embeddings is used to build sentence vectors from word vectors. The entire document is represented as a set of sentence vectors, the similarity between every pair of sentence vectors is computed, and TextRank, a graph-based model, is then used to rank the sentences. The summary is generated by picking the top-ranked sentences according to the compression rate. Two well-known datasets, DUC2002 and DUC2004, are used to evaluate the models. The experimental results show that the proposed models perform comprehensively better than state-of-the-art methods.
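A minimal sketch of the described pipeline, assuming toy word vectors, the standard BM25 weighting formula, cosine similarity, and a plain power-iteration TextRank (the paper's exact BM25 modification and parameters are not reproduced here):

```python
import math
import numpy as np

def bm25_idf(term, sentences):
    """Standard BM25 IDF, computed over sentences instead of documents."""
    n = sum(1 for s in sentences if term in s)
    return math.log((len(sentences) - n + 0.5) / (n + 0.5) + 1.0)

def sentence_vector(tokens, sentences, embeddings, avg_len, k1=1.2, b=0.75):
    """Sentence vector = BM25-weighted average of its word vectors."""
    dim = len(next(iter(embeddings.values())))
    vec = np.zeros(dim)
    for term in set(tokens):
        if term not in embeddings:
            continue
        tf = tokens.count(term)
        weight = bm25_idf(term, sentences) * tf * (k1 + 1) / (
            tf + k1 * (1 - b + b * len(tokens) / avg_len))
        vec += weight * np.asarray(embeddings[term])
    return vec

def textrank(sim, d=0.85, iters=50):
    """Power-iteration TextRank over a row-normalized similarity matrix."""
    n = sim.shape[0]
    np.fill_diagonal(sim, 0.0)
    row_sums = sim.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    trans = sim / row_sums
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1 - d) / n + d * trans.T @ scores
    return scores

def summarize(sentences, embeddings, top_k=1):
    """Return the top_k sentences ranked by TextRank over cosine similarity."""
    avg_len = sum(map(len, sentences)) / len(sentences)
    vecs = [sentence_vector(s, sentences, embeddings, avg_len)
            for s in sentences]
    n = len(vecs)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            denom = np.linalg.norm(vecs[i]) * np.linalg.norm(vecs[j])
            sim[i, j] = vecs[i] @ vecs[j] / denom if denom else 0.0
    scores = textrank(sim)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:top_k]]
```

In practice the toy embedding dictionary would be replaced by pretrained word vectors, and top_k would follow from the desired compression rate.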


2008 · Vol 13 (4) · pp. 505-508
Author(s): Keliang Jia, Xiuling Pang, Zhinuo Li, Xiaozhong Fan
