Retrieving Relevant Passages Using N-grams for Open-Domain Question Answering



A new Passage Retrieval Method in Arabic Question Answering Systems

10.21203/rs.3.rs-119562/v1 ◽ 2020

Author(s): Lana Alsabbagh ◽ Oumayma AlDakkak ◽ Nada Ghneim

Keyword(s): Query Expansion ◽ Question Answering ◽ Document Retrieval ◽ Open Domain ◽ Retrieval Method ◽ Passage Retrieval ◽ Question Analysis ◽ The Core ◽ Question Answering Systems ◽ Retrieval Phase

Abstract: In this paper, we present our approach to improving the performance of open-domain Arabic Question Answering systems. We focus on the passage retrieval phase, which aims to retrieve the passages most related to the correct answer. To extract passages that are related to the question, the system passes through three phases: Question Analysis, Document Retrieval, and Passage Retrieval. We define a passage as a sentence that ends with a period ("."). In the Question Analysis phase, we applied the traditional NLP steps of tokenization, removal of stopwords and unrelated symbols, and replacement of the question words with their stems. We also applied Query Expansion by adding synonyms of the question words. In the Document Retrieval phase, we used the Vector Space Model (VSM) with a TF-IDF vectorizer and cosine similarity. For the Passage Retrieval phase, which is the core of our system, we measured the similarity between passages and the question by combining the BM25 ranker with a Word Embedding approach. We tested our system on the ACRD dataset, which contains 1395 questions in different domains, and the system achieved a precision of 92.2% and a recall of 79.9% in finding the top-3 related passages for the query.
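The passage-ranking step described in this abstract can be illustrated with a minimal sketch. It covers only the BM25 half of the combination; the word-embedding similarity, Arabic stemming, and query expansion are omitted. The function names (`split_passages`, `bm25_scores`, `top_k`), the simple regex tokenizer, and the parameter values `k1=1.5` and `b=0.75` are assumptions made for illustration and are not taken from the paper.

```python
import math
import re
from collections import Counter


def split_passages(text):
    """Split text into passages: one passage per sentence ending with '.'."""
    return [p.strip() for p in text.split(".") if p.strip()]


def tokenize(text):
    """Lowercase word tokenizer (a stand-in for real Arabic preprocessing)."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, passages, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per passage for the given query."""
    docs = [tokenize(p) for p in passages]
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of passages containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(d) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def top_k(query, text, k=3):
    """Return the k passages with the highest BM25 score for the query."""
    passages = split_passages(text)
    scores = bm25_scores(query, passages)
    ranked = sorted(zip(scores, passages), key=lambda sp: sp[0], reverse=True)
    return [p for _, p in ranked[:k]]


if __name__ == "__main__":
    text = (
        "Damascus is the capital of Syria. "
        "Paris is the capital of France. "
        "The Nile is a river in Africa."
    )
    print(top_k("capital of Syria", text, k=3))
```

In a fuller system, the BM25 score would presumably be blended with an embedding-based similarity before the final ranking; how the two are weighted is not stated in the abstract.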


RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering

10.18653/v1/2021.naacl-main.466 ◽ 2021

Author(s): Yingqi Qu ◽ Yuchen Ding ◽ Jing Liu ◽ Kai Liu ◽ Ruiyang Ren ◽ ...

Keyword(s): Question Answering ◽ Open Domain ◽ Passage Retrieval ◽ Training Approach


Dense Passage Retrieval for Open-Domain Question Answering

10.18653/v1/2020.emnlp-main.550 ◽ 2020 ◽ Cited By ~ 2

Author(s): Vladimir Karpukhin ◽ Barlas Oguz ◽ Sewon Min ◽ Patrick Lewis ◽ Ledell Wu ◽ ...

Keyword(s): Question Answering ◽ Open Domain ◽ Passage Retrieval


Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering

10.18653/v1/2021.eacl-main.74 ◽ 2021

Author(s): Gautier Izacard ◽ Edouard Grave

Keyword(s): Question Answering ◽ Generative Models ◽ Open Domain ◽ Passage Retrieval


SWFQA Semantic Web Based Framework for Question Answering

International Journal of Information Retrieval Research ◽ 10.4018/ijirr.2019010106 ◽ 2019 ◽ Vol 9 (1) ◽ pp. 88-106

Author(s): Irphan Ali ◽ Divakar Yadav ◽ Ashok Kumar Sharma

Keyword(s): Semantic Web ◽ Natural Language ◽ Knowledge Base ◽ Language Processing ◽ Question Answering ◽ Knowledge Bases ◽ Digital Information ◽ Web Based ◽ Question Answering System ◽ User Query

A question answering system aims to provide correct and quick answers to users' queries from a knowledge base. Due to the growth of digital information on the web, information retrieval systems are the need of the day. Most recent question answering systems consult knowledge bases to answer a question, after parsing natural language queries and transforming them into knowledge base-executable forms. In this article, the authors propose a semantic web-based approach to question answering that uses natural language processing to analyze and understand the user query. It employs a “Total Answer Relevance Score” to measure the relevance of each answer returned by the system. The results obtained are promising. The real-time performance of the system has been evaluated on the answers extracted from the knowledge base.


Events Automatic Extraction from Arabic Texts

International Journal of Information Retrieval Research ◽ 10.4018/ijirr.2016010103 ◽ 2016 ◽ Vol 6 (1) ◽ pp. 36-51 ◽ Cited By ~ 2

Author(s): Emna Hkiri ◽ Souheyl Mallat ◽ Mounir Zrigui

Keyword(s): Information Retrieval ◽ Natural Language Processing ◽ Language Processing ◽ Question Answering ◽ Arabic Language ◽ Event Extraction ◽ Mining Machine ◽ Automatic Extraction ◽ Open Domain ◽ Ongoing Effort

The event extraction task consists of identifying and classifying events within open-domain text. It is very new for the Arabic language, whereas it has reached maturity for some languages such as English and French. Event extraction has also been shown to help Natural Language Processing tasks such as information retrieval, question answering, text mining, and machine translation achieve higher performance. In this article, we present an ongoing effort to build a system for event extraction from Arabic texts using the GATE platform and other tools.


What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams

10.20944/preprints202105.0498.v1 ◽ 2021

Author(s): Di Jin ◽ Eileen Pan ◽ Nassim Oufattole ◽ Wei-Hung Weng ◽ Hanyi Fang ◽ ...

Keyword(s): Language Processing ◽ Large Scale ◽ Question Answering ◽ Free Form ◽ Test Accuracy ◽ Open Domain ◽ Medical Problems ◽ Rule Based ◽ Medical Board ◽ Simplified Chinese

Open domain question answering (OpenQA) tasks have recently attracted increasing attention from the natural language processing (NLP) community. In this work, we present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA, collected from professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively. We implement both rule-based and popular neural methods by sequentially combining a document retriever and a machine comprehension model. Through experiments, we find that even the current best method achieves test accuracy of only 36.7%, 42.0%, and 70.1% on the English, traditional Chinese, and simplified Chinese questions, respectively. We expect MedQA to present great challenges to existing OpenQA systems and hope that it can serve as a platform to promote much stronger OpenQA models from the NLP community in the future.


What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams

Applied Sciences ◽ 10.3390/app11146421 ◽ 2021 ◽ Vol 11 (14) ◽ pp. 6421

Author(s): Di Jin ◽ Eileen Pan ◽ Nassim Oufattole ◽ Wei-Hung Weng ◽ Hanyi Fang ◽ ...

Keyword(s): Language Processing ◽ Large Scale ◽ Question Answering ◽ Free Form ◽ Test Accuracy ◽ Open Domain ◽ Medical Problems ◽ Rule Based ◽ Medical Board ◽ Simplified Chinese

Open domain question answering (OpenQA) tasks have recently attracted increasing attention from the natural language processing (NLP) community. In this work, we present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA, collected from professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively. We implement both rule-based and popular neural methods by sequentially combining a document retriever and a machine comprehension model. Through experiments, we find that even the current best method achieves test accuracy of only 36.7%, 42.0%, and 70.1% on the English, traditional Chinese, and simplified Chinese questions, respectively. We expect MedQA to present great challenges to existing OpenQA systems and hope that it can serve as a platform to promote much stronger OpenQA models from the NLP community in the future.


On the semantics of noun compounds

Natural Language Engineering ◽ 10.1017/s1351324913000090 ◽ 2013 ◽ Vol 19 (3) ◽ pp. 289-290 ◽ Cited By ~ 3

Author(s): Stan Szpakowicz ◽ Francis Bond ◽ Preslav Nakov ◽ Su Nam Kim

Keyword(s): Computational Linguistics ◽ Language Processing ◽ Question Answering ◽ Processing System ◽ Text Summarization ◽ Special Issue ◽ Compound A ◽ Research Challenges ◽ The Subject ◽ Noun Compounds

The noun compound – a sequence of nouns which functions as a single noun – is very common in English texts. No language processing system should ignore expressions like steel soup pot cover if it wants to be serious about such high-end applications of computational linguistics as question answering, information extraction, text summarization, machine translation – the list goes on. Processing noun compounds, however, is far from trouble-free. For one thing, they can be bracketed in various ways: is it steel soup, steel pot, or steel cover? Then there are relations inside a compound, annoyingly not signalled by any words: does pot contain soup or is it for cooking soup? These and many other research challenges are the subject of this special issue.
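The bracketing ambiguity mentioned in this abstract can be made concrete with a small sketch that enumerates every binary bracketing of a noun sequence. The function name `bracketings` is hypothetical, and treating each compound as a strictly binary tree is a simplifying assumption for illustration, not something the abstract specifies.

```python
def bracketings(words):
    """Return all binary bracketings of a list of words as strings."""
    if len(words) == 1:
        return [words[0]]
    results = []
    # Try every split point, then combine each left and right bracketing.
    for i in range(1, len(words)):
        for left in bracketings(words[:i]):
            for right in bracketings(words[i:]):
                results.append(f"({left} {right})")
    return results


if __name__ == "__main__":
    compound = "steel soup pot cover".split()
    for b in bracketings(compound):
        print(b)
    # A four-noun compound has 5 binary bracketings (the Catalan number C_3).
    print(len(bracketings(compound)))
```

The count grows with the Catalan numbers, which is one reason exhaustive enumeration becomes impractical for longer compounds and a system has to choose among candidate structures by other means.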
