QUESTION ANSWERING SYSTEM FOR PALEMBANG CITY TOURISM INFORMATION

2019 ◽  
Vol 21 (2) ◽  
pp. 128-138
Author(s):  
Marga Lenni ◽  
R. Kristoforus Jawa Bendi

The development of information technology is very rapid, resulting in an overflow of data. This data can be used to obtain information needed by users. The problem is that not all information can be found easily, especially very specific information, and information about tourism is no exception. One way to overcome this problem is to utilize natural language processing technology, in particular a question answering system, which allows computers to understand the meaning of questions posed by users in natural language. This study built a simple question answering system application, developed with the PHP programming language and a MySQL database. The preprocessing techniques used are tokenization, part-of-speech tagging, and named entity recognition. The test results show that the application is able to answer user questions with an accuracy of 82.05%.
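The preprocessing steps the abstract names (tokenization, POS tagging, named entity recognition) can be sketched in pure Python. This is a minimal illustration, not the authors' PHP implementation; the tiny lexicon and place-name gazetteer are invented for the example.

```python
import re

# Toy gazetteer of place names (illustrative assumption, not the paper's data)
GAZETTEER = {"palembang": "LOCATION", "ampera": "LANDMARK"}

def tokenize(question):
    """Split a natural-language question into lowercase word tokens."""
    return re.findall(r"\w+", question.lower())

def pos_tag(tokens):
    """Very rough POS tagging via a small hand-written lexicon;
    unknown words default to a noun tag."""
    lexicon = {"where": "WH", "is": "VB", "the": "DT", "in": "IN"}
    return [(t, lexicon.get(t, "NN")) for t in tokens]

def ner(tokens):
    """Mark tokens found in the gazetteer as named entities."""
    return [(t, GAZETTEER.get(t, "O")) for t in tokens]

# Usage: run the three stages over one tourism question
entities = ner(tokenize("Where is the Ampera bridge in Palembang?"))
```

A real system would replace the lexicon and gazetteer with trained models, but the pipeline shape (tokenize, then tag, then recognize entities) is the same.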

2020 ◽  
Vol 29 (06) ◽  
pp. 2050019
Author(s):  
Hadi Veisi ◽  
Hamed Fakour Shandi

A question answering system is a type of information retrieval system that takes a question from a user in natural language as the input and returns the best answer to it as the output. In this paper, a medical question answering system for the Persian language is designed and implemented. During this research, a dataset of diseases and drugs was collected and structured. The proposed system includes three main modules: question processing, document retrieval, and answer extraction. For the question processing module, a sequential architecture is designed which retrieves the main concept of a question by using different components. In these components, rule-based methods, natural language processing, and dictionary-based techniques are used. In the document retrieval module, the documents are indexed and searched using the Lucene library. The retrieved documents are ranked using similarity detection algorithms, and the highest-ranked document is selected to be used by the answer extraction module. This module is responsible for extracting the most relevant section of the text in the retrieved document. During this research, customized language processing tools such as a part-of-speech tagger and a lemmatizer were also developed for Persian. Evaluation results show that this system performs well for answering different questions about diseases and drugs. The accuracy of the system on 500 sample questions is 83.6%.
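The retrieval-and-ranking step described above can be approximated with TF-IDF cosine similarity. This is a simplified stand-in for Lucene's scoring, not the paper's actual pipeline; the sample documents below are invented.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build simple TF-IDF vectors for a small document collection."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: one count per document
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, docs):
    """Return the index of the highest-ranked document for the query."""
    vecs, idf = tf_idf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query.lower().split()).items()}
    return max((cosine(q, v), i) for i, v in enumerate(vecs))[1]
```

The answer extraction module would then operate only on the single document this function selects.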


Since its early days, Question Answering (QA) has been an intuitive way for humans to understand concepts. Given its importance, it is introduced to children from a very early age, and they are encouraged to ask more and more questions. With progress in machine learning and ontological semantics, Natural Language Question Answering (NLQA) has gained popularity in recent years. In this paper, QUASE (QUestion Answering System for Education), a system for answering natural language questions, is proposed, which helps to find the answer to any given question in a closed domain containing a finite set of documents. The QA system mainly focuses on factoid questions. QUASE uses a question taxonomy for question classification. Several natural language processing techniques, such as part-of-speech (POS) tagging, lemmatization, and sentence tokenization, are applied during document processing to make search better and faster. The DBpedia ontology is used to validate the candidate answers. With this system, learners can gain knowledge on their own by getting precise answers to questions asked in natural language, instead of merely getting back a list of documents. The precision, recall, and F-measure metrics are used to evaluate answer-type classification, and Mean Reciprocal Rank is used to evaluate the performance of the QA system as a whole. Our experiments show a significant improvement over other methods in classifying questions into correct answer types, with approximately 91% accuracy, as well as better performance as a QA system in closed-domain search.
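Question classification against a taxonomy, as QUASE does for factoid questions, can be illustrated with simple wh-word rules. The rules and answer-type labels below are an invented stand-in for the paper's taxonomy, which the abstract does not spell out.

```python
def classify_question(question):
    """Map a factoid question to an expected answer type using
    wh-word prefix rules (an illustrative stand-in for a real
    question taxonomy learned or curated for the domain)."""
    q = question.lower().strip()
    rules = [
        ("who", "PERSON"),
        ("where", "LOCATION"),
        ("when", "DATE"),
        ("how many", "NUMBER"),
        ("what", "DEFINITION"),
    ]
    for prefix, answer_type in rules:
        if q.startswith(prefix):
            return answer_type
    return "OTHER"
```

The predicted answer type is what lets a QA system keep only candidate answers of the right kind, e.g. validating that a PERSON-type question is answered with an entity DBpedia classifies as a person.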


Author(s):  
Rufai Yusuf Zakari ◽  
Zaharaddeen Karami Lawal ◽  
Idris Abdulmumin

Natural language processing is an area of computer science that has gained growing attention recently. NLP helps computers recognize, in other words, the ways in which people use language. NLP research, however, has been performed predominantly on languages with abundant quantities of annotated data, such as English, French, German and Arabic. While Hausa is Africa's second most commonly used language, only a few studies have so far focused on Hausa Natural Language Processing (HNLP). In this research paper, using a keyword index and article title search, we present a systematic analysis of the literature on HNLP in the Google Scholar database from 2015 to June 2020. Only a few research papers on HNLP, especially in areas such as part-of-speech (POS) tagging, Named Entity Recognition (NER), word embeddings, speech recognition and machine translation, have been released, and only recently. This is because NLP depends on large amounts of human-annotated data for training intelligent models. HNLP is now attracting researchers' attention after extensive research on NLP has been performed in English and other languages. The key objectives of this paper are to promote research, to identify likely areas for future studies in HNLP, and to assist researchers in conducting further relevant studies.


Author(s):  
Mwnthai Narzary ◽  
Gwmsrang Muchahary ◽  
Maharaj Brahma ◽  
Sanjib Narzary ◽  
Pranav Kumar Singh ◽  
...  

With over 1.4 million Bodo speakers, there is a need for automated language processing systems such as machine translation, part-of-speech tagging, speech recognition, named entity recognition, and so on. Developing such systems requires a sufficient amount of data. In this paper we present a detailed description of the primary resources available for the Bodo language that can be used as datasets for studying natural language processing and its applications. We list the different resources available for Bodo: a lexicon dataset of 8,005 entries collected from agriculture and health, a raw corpus of 2,915,544 words, a tagged corpus of 30,000 sentences, a parallel corpus of 28,359 sentences from tourism, agriculture and health, and a tagged and parallel corpus of 37,768 sentences. We further discuss the challenges and opportunities present in the Bodo language.


Author(s):  
Yuan Zhang ◽  
Hongshen Chen ◽  
Yihong Zhao ◽  
Qun Liu ◽  
Dawei Yin

Sequence tagging is the basis for multiple applications in natural language processing. Despite successes in learning long-term token sequence dependencies with neural networks, tag dependencies have rarely been considered previously. Sequence tagging actually possesses complex dependencies and interactions among the input tokens and the output tags. We propose a novel multi-channel model, which handles different ranges of token-tag dependencies and their interactions simultaneously. A tag LSTM is augmented to manage the output tag dependencies and word-tag interactions, while three mechanisms are presented to efficiently incorporate token context representation and tag dependency. Extensive experiments on part-of-speech tagging and named entity recognition tasks show that the proposed model outperforms the BiLSTM-CRF baseline by effectively incorporating the tag dependency feature.
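The tag dependencies this abstract emphasizes are what a CRF-style decoder captures on top of per-token scores. A minimal sketch of that idea is Viterbi decoding over transition scores between tags; the scores below are made-up numbers, not anything from the paper or its tag LSTM.

```python
def viterbi(obs_scores, trans):
    """Decode the best tag sequence given per-token tag scores
    (obs_scores: list of {tag: score}) and tag-transition scores
    (trans: {prev_tag: {tag: score}}) -- the output-tag dependency
    a plain per-token softmax would ignore."""
    tags = list(trans.keys())
    # dp[t] = (best score of a path ending in tag t, that path)
    dp = {t: (obs_scores[0][t], [t]) for t in tags}
    for scores in obs_scores[1:]:
        new_dp = {}
        for t in tags:
            best_prev = max(tags, key=lambda p: dp[p][0] + trans[p][t])
            score = dp[best_prev][0] + trans[best_prev][t] + scores[t]
            new_dp[t] = (score, dp[best_prev][1] + [t])
        dp = new_dp
    return max(dp.values())[1]
```

With a transition table that penalizes, say, an I tag after an O tag, the decoder can pick a globally consistent sequence even when individual token scores are ambiguous.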


2021 ◽  
Vol 72 ◽  
pp. 1385-1470
Author(s):  
Alexandra N. Uma ◽  
Tommaso Fornaciari ◽  
Dirk Hovy ◽  
Silviu Paun ◽  
Barbara Plank ◽  
...  

Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans disagree, from objective tasks such as part-of-speech tagging to more subjective tasks such as classifying an image or deciding whether a proposition follows from certain premises. While most learning in artificial intelligence (AI) still relies on the assumption that a single (gold) interpretation exists for each item, a growing body of research aims to develop learning methods that do not rely on this assumption. In this survey, we review the evidence for disagreements on NLP and CV tasks, focusing on tasks for which substantial datasets containing this information have been created. We discuss the most popular approaches to training models from datasets containing multiple judgments potentially in disagreement. We systematically compare these different approaches by training them with each of the available datasets, considering several ways to evaluate the resulting models. Finally, we discuss the results in depth, focusing on four key research questions, and assess how the type of evaluation and the characteristics of a dataset determine the answers to these questions. Our results suggest, first of all, that even if we abandon the assumption of a gold standard, it is still essential to reach a consensus on how to evaluate models. This is because the relative performance of the various training methods is critically affected by the chosen form of evaluation. Secondly, we observed a strong dataset effect. With substantial datasets, providing many judgments by high-quality coders for each item, training directly with soft labels achieved better results than training from aggregated or even gold labels. This result holds for both hard and soft evaluation. But when the above conditions do not hold, leveraging both gold and soft labels generally achieved the best results in the hard evaluation. 
All datasets and models employed in this paper are freely available as supplementary materials.
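The soft-label training this survey compares against aggregated gold labels can be sketched in a few lines: annotator votes become a probability distribution, and the loss is cross-entropy against that distribution rather than a one-hot majority label. This is a generic illustration of the idea, not the survey's evaluation code.

```python
import math

def soft_label_from_votes(votes, num_classes):
    """Turn raw annotator votes (class indices) into a normalized
    soft label, preserving the disagreement instead of aggregating."""
    counts = [0] * num_classes
    for v in votes:
        counts[v] += 1
    return [c / len(votes) for c in counts]

def soft_cross_entropy(pred_probs, soft_label):
    """Cross-entropy against the full distribution of annotator
    judgments, rather than a single aggregated (majority/gold) label."""
    return -sum(p * math.log(q) for p, q in zip(soft_label, pred_probs) if p > 0)
```

Under this loss, a model is rewarded for matching the annotator distribution itself; an overconfident prediction on a genuinely ambiguous item is penalized, which is the intuition behind the survey's finding that soft labels help when many high-quality judgments per item are available.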


Author(s):  
Minlong Peng ◽  
Qi Zhang ◽  
Xiaoyu Xing ◽  
Tao Gui ◽  
Jinlan Fu ◽  
...  

Word representation is a key component in neural-network-based sequence labeling systems. However, representations of unseen or rare words trained on the end task are usually too poor for appreciable performance. This is commonly referred to as the out-of-vocabulary (OOV) problem. In this work, we address the OOV problem in sequence labeling using only the training data of the task. To this end, we propose a novel method to predict representations for OOV words from their surface forms (e.g., character sequences) and contexts. The method is specifically designed to avoid the error-propagation problem suffered by existing approaches in the same paradigm. To evaluate its effectiveness, we performed extensive empirical studies on four part-of-speech (POS) tagging tasks and four named entity recognition (NER) tasks. Experimental results show that the proposed method achieves better or competitive performance on the OOV problem compared with existing state-of-the-art methods.
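Predicting a representation for an unseen word from its surface form can be illustrated with the fastText-style trick of averaging character n-gram vectors. This is a simplified stand-in for the paper's predictor, which also uses context; the n-gram vectors below would come from training.

```python
def char_ngrams(word, n=3):
    """Character n-grams of a word, with boundary markers so that
    prefixes and suffixes get distinct grams."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def oov_vector(word, ngram_vectors, dim):
    """Predict a representation for an unseen word by averaging the
    vectors of its known surface-form character n-grams."""
    grams = [g for g in char_ngrams(word) if g in ngram_vectors]
    if not grams:
        return [0.0] * dim  # no known subword information at all
    summed = [sum(ngram_vectors[g][i] for g in grams) for i in range(dim)]
    return [s / len(grams) for s in summed]
```

Because the n-gram inventory is learned on the task's own training data, no external embeddings are needed, matching the abstract's "using only training data" setting.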


Author(s):  
Dan Tufiș ◽  
Radu Ion

One of the fundamental tasks in natural-language processing is the morpho-lexical disambiguation of words occurring in text. Over the last twenty years or so, approaches to part-of-speech tagging based on machine learning techniques have been developed or ported to provide high-accuracy morpho-lexical annotation for an increasing number of languages. Due to recent increases in computing power, together with improvements in tagging technology and the extension of language typologies, part-of-speech tags have become significantly more complex. The need to address multilinguality more directly in the web environment has created a demand for interoperable, harmonized morpho-lexical descriptions across languages. Given the large number of morpho-lexical descriptors for a morphologically complex language, one has to consider ways to avoid the data sparseness threat in standard statistical tagging, yet ensure that full lexicon information is available for each word form in the output. The chapter overviews the current major approaches to part-of-speech tagging.
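The starting point that the statistical approaches surveyed in this chapter improve upon is the most-frequent-tag baseline, with some fallback for unseen word forms. A minimal sketch, with an invented one-sentence training set:

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_sentences):
    """Most-frequent-tag baseline: for each word form, remember the
    tag it received most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(tokens, model, default="NOUN"):
    """Tag tokens, falling back to a default open-class tag for
    unseen words -- a crude answer to the data-sparseness problem."""
    return [(t, model.get(t.lower(), default)) for t in tokens]
```

For morphologically complex languages with large tagsets, exactly this kind of lookup table becomes sparse, which is why the chapter's approaches turn to tiered tagsets, smoothing, and context models instead.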

