scholarly journals Deep Learning Based Question Answering Search Engine

Author(s):  
Mrunal Malekar

Domain based Question Answering is concerned with building systems which provide answers to natural language questions that are asked specific to a domain. It comes under Information Retrieval and Natural language processing. Using Information Retrieval, one can search for the relevant documents which may contain the answer but it won’t give the exact answer for the question asked. In the presented work, a question answering search engine has been developed which first finds out the relevant documents from a huge textual document data of a construction company and then goes a step beyond to extract answer from the extracted document. The robust question answering system developed uses Elastic Search for Information Retrieval [paragraphs extraction] and Deep Learning for answering the question from the short extracted paragraph. It leverages BERT Deep Learning Model to understand the layers and representations between the question and answer. The research work also focuses on how to improve the search accuracy of the Information Retrieval based Elastic Search engine which returns the relevant documents which may contain the answer.

2021 ◽  
Vol 47 (05) ◽  
Author(s):  
NGUYỄN CHÍ HIẾU

Knowledge Graphs are applied in many fields such as search engines, semantic analysis, and question answering in recent years. However, there are many obstacles for building knowledge graphs as methodologies, data and tools. This paper introduces a novel methodology to build knowledge graph from heterogeneous documents.  We use the methodologies of Natural Language Processing and deep learning to build this graph. The knowledge graph can use in Question answering systems and Information retrieval especially in Computing domain


Author(s):  
Saravanakumar Kandasamy ◽  
Aswani Kumar Cherukuri

Semantic similarity quantification between concepts is one of the inevitable parts in domains like Natural Language Processing, Information Retrieval, Question Answering, etc. to understand the text and their relationships better. Last few decades, many measures have been proposed by incorporating various corpus-based and knowledge-based resources. WordNet and Wikipedia are two of the Knowledge-based resources. The contribution of WordNet in the above said domain is enormous due to its richness in defining a word and all of its relationship with others. In this paper, we proposed an approach to quantify the similarity between concepts that exploits the synsets and the gloss definitions of different concepts using WordNet. Our method considers the gloss definitions, contextual words that are helping in defining a word, synsets of contextual word and the confidence of occurrence of a word in other word’s definition for calculating the similarity. The evaluation based on different gold standard benchmark datasets shows the efficiency of our system in comparison with other existing taxonomical and definitional measures.


Author(s):  
Sunita Warjri ◽  
Partha Pakray ◽  
Saralin A. Lyngdoh ◽  
Arnab Kumar Maji

Part-of-speech (POS) tagging is one of the research challenging fields in natural language processing (NLP). It requires good knowledge of a particular language with large amounts of data or corpora for feature engineering, which can lead to achieving a good performance of the tagger. Our main contribution in this research work is the designed Khasi POS corpus. Till date, there has been no form of any kind of Khasi corpus developed or formally developed. In the present designed Khasi POS corpus, each word is tagged manually using the designed tagset. Methods of deep learning have been used to experiment with our designed Khasi POS corpus. The POS tagger based on BiLSTM, combinations of BiLSTM with CRF, and character-based embedding with BiLSTM are presented. The main challenges of understanding and handling Natural Language toward Computational linguistics to encounter are anticipated. In the presently designed corpus, we have tried to solve the problems of ambiguities of words concerning their context usage, and also the orthography problems that arise in the designed POS corpus. The designed Khasi corpus size is around 96,100 tokens and consists of 6,616 distinct words. Initially, while running the first few sets of data of around 41,000 tokens in our experiment the taggers are found to yield considerably accurate results. When the Khasi corpus size has been increased to 96,100 tokens, we see an increase in accuracy rate and the analyses are more pertinent. As results, accuracy of 96.81% is achieved for the BiLSTM method, 96.98% for BiLSTM with CRF technique, and 95.86% for character-based with LSTM. Concerning substantial research from the NLP perspectives for Khasi, we also present some of the recently existing POS taggers and other NLP works on the Khasi language for comparative purposes.


2021 ◽  
Author(s):  
Nathan Ji ◽  
Yu Sun

The digital age gives us access to a multitude of both information and mediums in which we can interpret information. A majority of the time, many people find interpreting such information difficult as the medium may not be as user friendly as possible. This project has examined the inquiry of how one can identify specific information in a given text based on a question. This inquiry is intended to streamline one's ability to determine the relevance of a given text relative to his objective. The project has an overall 80% success rate given 10 articles with three questions asked per article. This success rate indicates that this project is likely applicable to those who are asking for content level questions within an article.


2017 ◽  
Vol 11 (03) ◽  
pp. 345-371
Author(s):  
Avani Chandurkar ◽  
Ajay Bansal

With the inception of the World Wide Web, the amount of data present on the Internet is tremendous. This makes the task of navigating through this enormous amount of data quite difficult for the user. As users struggle to navigate through this wealth of information, the need for the development of an automated system that can extract the required information becomes urgent. This paper presents a Question Answering system to ease the process of information retrieval. Question Answering systems have been around for quite some time and are a sub-field of information retrieval and natural language processing. The task of any Question Answering system is to seek an answer to a free form factual question. The difficulty of pinpointing and verifying the precise answer makes question answering more challenging than simple information retrieval done by search engines. The research objective of this paper is to develop a novel approach to Question Answering based on a composition of conventional approaches of Information Retrieval (IR) and Natural Language processing (NLP). The focus is on using a structured and annotated knowledge base instead of an unstructured one. The knowledge base used here is DBpedia and the final system is evaluated on the Text REtrieval Conference (TREC) 2004 questions dataset.


2022 ◽  
Vol 31 (1) ◽  
pp. 113-126
Author(s):  
Jia Guo

Abstract Emotional recognition has arisen as an essential field of study that can expose a variety of valuable inputs. Emotion can be articulated in several means that can be seen, like speech and facial expressions, written text, and gestures. Emotion recognition in a text document is fundamentally a content-based classification issue, including notions from natural language processing (NLP) and deep learning fields. Hence, in this study, deep learning assisted semantic text analysis (DLSTA) has been proposed for human emotion detection using big data. Emotion detection from textual sources can be done utilizing notions of Natural Language Processing. Word embeddings are extensively utilized for several NLP tasks, like machine translation, sentiment analysis, and question answering. NLP techniques improve the performance of learning-based methods by incorporating the semantic and syntactic features of the text. The numerical outcomes demonstrate that the suggested method achieves an expressively superior quality of human emotion detection rate of 97.22% and the classification accuracy rate of 98.02% with different state-of-the-art methods and can be enhanced by other emotional word embeddings.


Author(s):  
Jia Zeng ◽  
Christian X. Cruz-Pico ◽  
Turçin Saridogan ◽  
Md Abu Shufean ◽  
Michael Kahle ◽  
...  

PURPOSE Despite advances in molecular therapeutics, few anticancer agents achieve durable responses. Rational combinations using two or more anticancer drugs have the potential to achieve a synergistic effect and overcome drug resistance, enhancing antitumor efficacy. A publicly accessible biomedical literature search engine dedicated to this domain will facilitate knowledge discovery and reduce manual search and review. METHODS We developed RetriLite, an information retrieval and extraction framework that leverages natural language processing and domain-specific knowledgebase to computationally identify highly relevant papers and extract key information. The modular architecture enables RetriLite to benefit from synergizing information retrieval and natural language processing techniques while remaining flexible to customization. We customized the application and created an informatics pipeline that strategically identifies papers that describe efficacy of using combination therapies in clinical or preclinical studies. RESULTS In a small pilot study, RetriLite achieved an F 1 score of 0.93. A more extensive validation experiment was conducted to determine agents that have enhanced antitumor efficacy in vitro or in vivo with poly (ADP-ribose) polymerase inhibitors: 95.9% of the papers determined to be relevant by our application were true positive and the application's feature of distinguishing a clinical paper from a preclinical paper achieved an accuracy of 97.6%. Interobserver assessment was conducted, which resulted in a 100% concordance. The data derived from the informatics pipeline have also been made accessible to the public via a dedicated online search engine with an intuitive user interface. CONCLUSION RetriLite is a framework that can be applied to establish domain-specific information retrieval and extraction systems. The extensive and high-quality metadata tags along with keyword highlighting facilitate information seekers to more effectively and efficiently discover knowledge in the combination therapy domain.


Information ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 200
Author(s):  
Ammar Arbaaeen ◽  
Asadullah Shah

For many users of natural language processing (NLP), it can be challenging to obtain concise, accurate and precise answers to a question. Systems such as question answering (QA) enable users to ask questions and receive feedback in the form of quick answers to questions posed in natural language, rather than in the form of lists of documents delivered by search engines. This task is challenging and involves complex semantic annotation and knowledge representation. This study reviews the literature detailing ontology-based methods that semantically enhance QA for a closed domain, by presenting a literature review of the relevant studies published between 2000 and 2020. The review reports that 83 of the 124 papers considered acknowledge the QA approach, and recommend its development and evaluation using different methods. These methods are evaluated according to accuracy, precision, and recall. An ontological approach to semantically enhancing QA is found to be adopted in a limited way, as many of the studies reviewed concentrated instead on NLP and information retrieval (IR) processing. While the majority of the studies reviewed focus on open domains, this study investigates the closed domain.


2017 ◽  
Vol 58 (2) ◽  
pp. 1
Author(s):  
Waheeb Ahmed ◽  
Babu Anto

An automatic web based Question Answering (QA) system is a valuable tool for improving e-learning and education. Several approaches employ natural language processing technology to understand questions given in natural language text, which is incomplete and error-prone. In addition, instead of extracting exact answer, many approaches simply return hyperlinks to documents containing the answers, which is inconvenient for the students or learners. In this paper we develop technique to detect the type of a question, based on which the proper technique for extracting the answer is used. The system returns only blocks or phrases of data containing the answer rather than full documents. Therefore, we can highly improve the efficiency of Web QA systems for e-learning.


Natural languages are ambiguous and computers are not capable of understanding the natural languages in the way people really understand them. Natural Language Processing (NLP) is concerned with the development of computational models based on the aspects of human language processing. Question Answering (QA) system is a field of Natural Language Processing that provides precise answer for the user question which is given in natural language. In this work, a MemN2N model based question answering system is implemented and its performance is evaluated with a complex question answering tasks using bAbI dataset of three different language text corpuses. The scope of this work is to understand the language independent and dependant aspects of a deep learning network. For this, we will study the performance of the deep learning network by training and testing it with different kinds of question answering tasks with different languages and also try to understand the difference in performance with respect to the languages


Sign in / Sign up

Export Citation Format

Share Document