Passage retrieval vs. document retrieval for factoid question answering

Abstract: In this paper, we present our approach to improving the performance of open-domain Arabic Question Answering systems. We focus on the passage retrieval phase, which aims to retrieve the passages most likely to contain the correct answer. To extract passages related to the question, the system passes through three phases: Question Analysis, Document Retrieval and Passage Retrieval. We define a passage as a sentence that ends with a dot ("."). In the Question Analysis phase, we applied the traditional NLP steps of tokenization, removal of stop words and unrelated symbols, and replacement of the question words with their stems. We also applied query expansion by adding synonyms of the question words. In the Document Retrieval phase, we used the Vector Space Model (VSM) with a TF-IDF vectorizer and cosine similarity. For the Passage Retrieval phase, which is the core of our system, we measured the similarity between passages and the question with a combination of the BM25 ranker and a word embedding approach. We tested our system on the ARCD dataset, which contains 1395 questions in different domains; the system achieved a precision of 92.2% and a recall of 79.9% in finding the top-3 passages related to each query.
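
The retrieval pipeline described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: stemming, stop-word removal and synonym-based query expansion are omitted, the BM25 parameters `k1=1.5` and `b=0.75` are common defaults rather than the paper's values, and the interpolation weight `alpha` and the `embed_sim` callback are hypothetical stand-ins for the unspecified BM25/word-embedding combination. Passages are split on "." as the abstract defines them.

```python
import math
import re
from collections import Counter


def split_passages(text):
    # A passage is a sentence ending with "." (the definition given in the abstract).
    return [p.strip() + "." for p in text.split(".") if p.strip()]


def tokenize(text):
    # Placeholder for tokenization, stop-word removal and stemming.
    # \w is Unicode-aware in Python 3, so Arabic letters are kept.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs_tokens):
    # Sparse TF-IDF vectors (term -> weight) with a smoothed idf = ln(N/df) + 1.
    n = len(docs_tokens)
    df = Counter(t for toks in docs_tokens for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in docs_tokens]
    return vecs, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def bm25_scores(query_tokens, passages_tokens, k1=1.5, b=0.75):
    # Okapi BM25 over the candidate passages (k1, b are assumed defaults).
    if not passages_tokens:
        return []
    n = len(passages_tokens)
    avgdl = sum(len(p) for p in passages_tokens) / n
    df = Counter(t for toks in passages_tokens for t in set(toks))
    scores = []
    for toks in passages_tokens:
        tf = Counter(toks)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(toks) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores


def answer_passages(question, documents, top_docs=2, top_passages=3, alpha=0.5, embed_sim=None):
    q = tokenize(question)
    # Document retrieval: VSM with TF-IDF weights and cosine similarity.
    vecs, idf = tfidf_vectors([tokenize(d) for d in documents])
    qvec = {t: c * idf[t] for t, c in Counter(q).items() if t in idf}
    ranked = sorted(range(len(documents)), key=lambda i: cosine(qvec, vecs[i]), reverse=True)
    passages = [p for i in ranked[:top_docs] for p in split_passages(documents[i])]
    # Passage retrieval: BM25 scaled to [0, 1], optionally interpolated with an
    # embedding similarity supplied by the caller (hypothetical hook and weight).
    bm25 = bm25_scores(q, [tokenize(p) for p in passages])
    top = max(bm25, default=0.0) or 1.0

    def score(i):
        s = bm25[i] / top
        if embed_sim is not None:
            s = alpha * s + (1 - alpha) * embed_sim(question, passages[i])
        return s

    order = sorted(range(len(passages)), key=score, reverse=True)
    return [passages[i] for i in order[:top_passages]]
```

Filtering documents first and then running BM25 only over their sentences keeps the passage ranker's statistics (document frequency, average length) local to the candidate set. A real system would pass a function backed by pretrained Arabic word vectors as `embed_sim`.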

Passage Retrieval Based on Density Distributions of Terms and Its Applications to Document Retrieval and Question Answering

Reading and Learning - Lecture Notes in Computer Science ◽

10.1007/978-3-540-24642-8_17 ◽

2004 ◽

pp. 306-327 ◽

Cited By ~ 7

Author(s):

Koichi Kise ◽

Markus Junker ◽

Andreas Dengel ◽

Keinosuke Matsumoto

Keyword(s):

Question Answering ◽

Document Retrieval ◽

Passage Retrieval ◽

Density Distributions

A Markov Network Based Passage Retrieval Method for Multimodal Question Answering in the Cultural Heritage Domain

MultiMedia Modeling - Lecture Notes in Computer Science ◽

10.1007/978-3-319-73603-7_1 ◽

2018 ◽

pp. 3-15 ◽

Cited By ~ 2

Author(s):

Shurong Sheng ◽

Aparna Nurani Venkitasubramanian ◽

Marie-Francine Moens

Keyword(s):

Cultural Heritage ◽

Question Answering ◽

Retrieval Method ◽

Markov Network ◽

Passage Retrieval

A Passage Retrieval System for Multilingual Question Answering

Text, Speech and Dialogue - Lecture Notes in Computer Science ◽

10.1007/11551874_57 ◽

2005 ◽

pp. 443-450 ◽

Cited By ~ 13

Author(s):

José Manuel Gómez Soriano ◽

Manuel Montes y Gómez ◽

Emilio Sanchis Arnal ◽

Paolo Rosso

Keyword(s):

Question Answering ◽

Retrieval System ◽

Passage Retrieval

Passage Retrieval vs. Document Retrieval in the CLEF 2006 Ad Hoc Monolingual Tasks with the IR-n System

Evaluation of Multilingual and Multi-modal Information Retrieval - Lecture Notes in Computer Science ◽

10.1007/978-3-540-74999-8_8 ◽

2007 ◽

pp. 62-65

Author(s):

Elisa Noguera ◽

Fernando Llopis

Keyword(s):

Ad Hoc ◽

Document Retrieval ◽

Passage Retrieval

Improving Sentence Retrieval Using Sequence Similarity

Applied Sciences ◽

10.3390/app10124316 ◽

2020 ◽

Vol 10 (12) ◽

pp. 4316 ◽

Cited By ~ 1

Author(s):

Ivan Boban ◽

Alen Doko ◽

Sven Gotovac

Keyword(s):

Question Answering ◽

Sequence Similarity ◽

Novelty Detection ◽

Document Retrieval ◽

Language Modeling ◽

Information Need ◽

Partial Matching ◽

Retrieval Technique ◽

Sentence Retrieval ◽

Using Data

Sentence retrieval is an information retrieval technique that aims to find sentences corresponding to an information need. It is used for tasks such as question answering (QA) and novelty detection. Because it is similar to document retrieval but with a smaller unit of retrieval, document retrieval methods such as term frequency-inverse document frequency (TF-IDF), BM25, and language-modeling-based methods are also used for sentence retrieval. The effect of partial word matching on sentence retrieval has not yet been analyzed, and we believe that considering it offers substantial potential for improving sentence retrieval methods. We adapted TF-ISF (term frequency-inverse sentence frequency), BM25, and language-modeling-based methods to test partial term matching by combining sentence retrieval with sequence similarity, which allows words that are similar but not identical to match. All tests were conducted using data from the novelty tracks of the Text Retrieval Conference (TREC). The scope of this paper was to determine whether such an approach is generally beneficial to sentence retrieval; we did not examine in depth how partial matching helps or hinders the finding of relevant sentences.
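
The idea of partial matching can be illustrated with a TF-ISF scorer whose term frequency counts near matches. This is a sketch under assumptions: it uses a common TF-ISF formulation, sum over query terms of log(tf_q + 1) · log(tf_s + 1) · log((n + 1) / (0.5 + sf)), and substitutes Python's `difflib.SequenceMatcher.ratio()` with a hypothetical cut-off of 0.8 for the paper's sequence similarity measure, whose exact definition and threshold are not given here. Setting `threshold=1.0` recovers exact matching.

```python
import math
import re
from difflib import SequenceMatcher


def tokens(text):
    return re.findall(r"\w+", text.lower())


def word_sim(a, b, threshold=0.8):
    # 1.0 for identical words, the SequenceMatcher ratio for near matches,
    # and 0.0 below the (assumed) threshold.
    if a == b:
        return 1.0
    r = SequenceMatcher(None, a, b).ratio()
    return r if r >= threshold else 0.0


def soft_tf(term, sent_tokens, threshold):
    # Partial-match term frequency: sum of similarities to every word in the sentence.
    return sum(word_sim(term, w, threshold) for w in sent_tokens)


def tf_isf_scores(query, sentences, threshold=0.8):
    q = tokens(query)
    sents = [tokens(s) for s in sentences]
    n = len(sents)
    scores = [0.0] * n
    for t in set(q):
        tf_q = q.count(t)
        tfs = [soft_tf(t, s, threshold) for s in sents]
        # Sentence frequency: sentences with at least one (partial) match for t.
        sf = sum(1 for x in tfs if x > 0)
        isf = math.log((n + 1) / (0.5 + sf))
        for i, tf_s in enumerate(tfs):
            scores[i] += math.log(tf_q + 1) * math.log(tf_s + 1) * isf
    return scores
```

With exact matching, the query "sentences" scores zero against "We retrieve relevant sentence units."; with partial matching, "sentence" counts as a near match (ratio about 0.94), so that sentence receives a positive score.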

A Persian Medical Question Answering System

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213020500190 ◽

2020 ◽

Vol 29 (06) ◽

pp. 2050019

Author(s):

Hadi Veisi ◽

Hamed Fakour Shandi

Keyword(s):

Natural Language ◽

Language Processing ◽

Question Answering ◽

Document Retrieval ◽

Main Concept ◽

Question Answering System ◽

Detection Algorithms ◽

Persian Language ◽

Part Of Speech ◽

Answer Extraction

A question answering system is a type of information retrieval system that takes a natural-language question from a user as input and returns the best answer to it as output. In this paper, a medical question answering system for the Persian language is designed and implemented. During this research, a dataset of diseases and drugs was collected and structured. The proposed system includes three main modules: question processing, document retrieval, and answer extraction. For the question processing module, a sequential architecture is designed that extracts the main concept of a question using several components, which apply rule-based methods, natural language processing, and dictionary-based techniques. In the document retrieval module, the documents are indexed and searched using the Lucene library. The retrieved documents are ranked using similarity detection algorithms, and the highest-ranked document is passed to the answer extraction module, which extracts the most relevant section of its text. Customized language processing tools for Persian, such as a part-of-speech tagger and a lemmatizer, were also developed. Evaluation results show that the system performs well in answering different questions about diseases and drugs, with an accuracy of 83.6% on 500 sample questions.
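
The retrieve, rank, and extract flow can be sketched as follows. This is illustrative only: Lucene is a Java library, so a toy in-memory inverted index stands in for it here; Jaccard similarity is an assumed choice because the abstract does not name its similarity detection algorithms; and documents are assumed to be divided into newline-separated sections. The function name `answer_from_best_document` is hypothetical.

```python
import re
from collections import defaultdict


def terms(text):
    # Unicode-aware \w also matches Persian letters.
    return set(re.findall(r"\w+", text.lower()))


class InvertedIndex:
    """Toy stand-in for a Lucene index: term -> ids of documents containing it."""

    def __init__(self, docs):
        self.docs = docs
        self.postings = defaultdict(set)
        for doc_id, text in enumerate(docs):
            for t in terms(text):
                self.postings[t].add(doc_id)

    def candidates(self, query_terms):
        # OR semantics: any document sharing at least one query term.
        ids = set()
        for t in query_terms:
            ids |= self.postings.get(t, set())
        return ids


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def answer_from_best_document(question, docs):
    q = terms(question)
    cands = InvertedIndex(docs).candidates(q)
    if not cands:
        return None
    # Rank candidates and keep only the highest-ranked document.
    best = max(sorted(cands), key=lambda i: jaccard(q, terms(docs[i])))
    # Answer extraction: the section sharing the most terms with the question.
    sections = [s for s in docs[best].split("\n") if s.strip()]
    return max(sections, key=lambda s: len(q & terms(s)))
```

For example, given one drug description per document, the question "What are the side effects of aspirin?" selects the aspirin document and returns its "Side effects" line.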

ONTOLOGY-BASED PARAGRAPH EXTRACTION AND CAUSALITY DETECTION-BASED SIMILARITY FOR ANSWERING WHY-QUESTION

Jurnal Ilmu Komputer ◽

10.24843/jik.2018.v11.i01.p02 ◽

2018 ◽

Vol 11 (1) ◽

pp. 9

Author(s):

A A I N Eka Karyawati

Keyword(s):

Knowledge Base ◽

Main Part ◽

Question Answering ◽

Document Retrieval ◽

Domain Ontology ◽

Typical Problem ◽

Scoring Method ◽

Method Performance ◽

Question Answering System ◽

Selection For

Paragraph extraction is a main part of an automatic question answering system, especially for answering why-questions, because the answer to a why-question is usually contained in a whole paragraph rather than in one or two sentences. There has been some research on paragraph extraction approaches, but few studies have involved a domain ontology as a knowledge base. Most paragraph extraction studies used keyword-based methods with only a small portion of semantic approaches. As a result, such question answering systems face a typical problem of keyword-based methods: word mismatch. The main contribution of this research is a paragraph scoring method that incorporates TF-IDF-based and causality-detection-based similarity. This research is part of an ontology-based why-question answering method, in which the ontology is used as a knowledge base for each step of the method, including indexing, question analysis, document retrieval, and paragraph extraction/selection. To measure performance, the proposed method was compared with two baseline methods that did not use causality-detection-based similarity. The proposed method showed improvements over the baseline methods in MRR (0.42 to 0.82, +95%), P@1 (0.38 to 0.78, +105%), P@5 (0.46 to 0.88, +91%), Precision (0.41 to 0.80, +95%), and Recall (0.53 to 0.88, +66%).
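
The evaluation measures and the relative improvements reported above can be computed as in the sketch below. One assumption is labeled in the code: P@k is taken to mean the fraction of questions with at least one relevant paragraph in the top k, which is consistent with P@5 exceeding P@1 in the reported numbers, but the paper may define it differently. Each run pairs a ranked list of paragraph ids with the set of relevant ids; the ids are made up for illustration.

```python
def reciprocal_rank(ranked, relevant):
    # 1 / rank of the first relevant item, or 0 if none is retrieved.
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mrr(runs):
    # Mean reciprocal rank over (ranked_list, relevant_set) pairs.
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def success_at_k(runs, k):
    # Assumed reading of P@k: share of questions answered within the top k.
    return sum(1 for r, rel in runs if any(i in rel for i in r[:k])) / len(runs)


def relative_improvement(new, old):
    # Percentage gain of the proposed method over the baseline.
    return 100.0 * (new - old) / old


runs = [
    (["p3", "p1", "p7"], {"p1"}),  # first relevant at rank 2
    (["p2", "p5"], {"p2"}),        # first relevant at rank 1
    (["p4", "p6"], {"p9"}),        # relevant paragraph not retrieved
]
```

On these runs the MRR is 0.5, and `relative_improvement(0.82, 0.42)` is about 95.2, matching the 95% MRR gain reported in the abstract.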

Interactive and Bilingual Question Answering Using Term Suggestion and Passage Retrieval

Multilingual Information Access for Text, Speech and Images - Lecture Notes in Computer Science ◽

10.1007/11519645_37 ◽

2005 ◽

pp. 363-370 ◽

Cited By ~ 1

Author(s):

Carlos G. Figuerola ◽

Angel F. Zazo ◽

José L. Alonso Berrocal ◽

Emilio Rodríguez Vázquez de Aldana

Keyword(s):

Question Answering ◽

Passage Retrieval

Question Answering

10.1093/oxfordhb/9780199276349.013.0031 ◽

2012 ◽

Cited By ~ 1

Author(s):

Sanda Harabagiu ◽

Dan Moldovan

Keyword(s):

Language Processing ◽

Question Answering ◽

Document Retrieval ◽

Semantic Features ◽

Knowledge Based ◽

Answer Extraction ◽

Small Set ◽

Processing Module ◽

On Line ◽

Processing Techniques

Textual Question Answering (QA) identifies the answer to a question in large collections of on-line documents. By providing a small set of exact answers to questions, QA takes a step closer to information retrieval rather than document retrieval. A QA system comprises three modules: a question-processing module, a document-processing module, and an answer extraction and formulation module. Questions may be asked about any topic, in contrast with Information Extraction (IE), which identifies textual information relevant only to a predefined set of events and entities. The natural language processing (NLP) techniques used in open-domain QA systems may range from simple lexical and semantic disambiguation of question stems to complex processing that combines syntactic and semantic features of the questions with pragmatic information derived from the context of candidate answers. This article reviews current research in integrating knowledge-based NLP methods with shallow processing techniques for QA.
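
The three-module architecture can be made concrete with a deliberately shallow sketch. The article does not specify these components; the wh-word to answer-type rules, the stop-word list, and the regular-expression "recognizers" below are hypothetical simplifications of what would normally be a question classifier and a named-entity recognizer.

```python
import re

# Hypothetical question-processing rules: wh-word -> expected answer type.
ANSWER_TYPES = {"when": "DATE", "how many": "NUMBER", "who": "PERSON"}
# Hypothetical surface patterns per answer type (real systems use NER).
PATTERNS = {
    "DATE": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),
    "NUMBER": re.compile(r"\b\d+(?:,\d{3})*\b"),
    "PERSON": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}
STOP = {"when", "how", "many", "who", "what", "did", "was", "is", "the", "a", "of", "in"}


def process_question(question):
    # Question processing: expected answer type plus content keywords.
    q = question.lower()
    answer_type = next((t for w, t in ANSWER_TYPES.items() if q.startswith(w)), None)
    return answer_type, set(re.findall(r"\w+", q)) - STOP


def process_documents(keywords, documents, top_n=3):
    # Document processing: split into sentences, rank by keyword overlap.
    sentences = [s.strip() for d in documents for s in re.split(r"(?<=[.!?])\s+", d) if s.strip()]
    overlap = lambda s: len(keywords & set(re.findall(r"\w+", s.lower())))
    return sorted(sentences, key=overlap, reverse=True)[:top_n]


def extract_answer(answer_type, passages):
    # Answer extraction: first span in the best passages matching the expected type.
    pattern = PATTERNS.get(answer_type)
    if pattern is None:
        return None
    for p in passages:
        m = pattern.search(p)
        if m:
            return m.group(0)
    return None


def answer_question(question, documents):
    answer_type, keywords = process_question(question)
    return extract_answer(answer_type, process_documents(keywords, documents))
```

For instance, for "When was the Eiffel Tower completed?" the question processor expects a DATE, the document processor ranks "The Eiffel Tower was completed in 1889." first, and extraction returns the exact answer "1889" rather than a whole document.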
