QUESTION ANSWERING SYSTEM FOR PALEMBANG CITY TOURISM INFORMATION

2019 ◽  
Vol 21 (2) ◽  
pp. 128-138
Author(s):  
Marga Lenni ◽  
R. Kristoforus Jawa Bendi

The development of information technology is very rapid, resulting in an overflow of data. This data can be used to obtain information needed by users. The problem is that not all information can be found easily, especially very specific information, and information about tourism is no exception. One way to overcome this problem is to utilize natural language processing technology, in particular a question answering system, which allows computers to understand the meaning of questions posed by users in natural language. This study built a simple question answering system application, developed with the PHP programming language and a MySQL database. The preprocessing techniques used are tokenization, part-of-speech tagging, and named entity recognition. The test results show that the application is able to answer user questions with an accuracy of 82.05%.
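The preprocessing steps the abstract names (tokenization, POS tagging, named entity recognition) can be sketched in pure Python. This is a minimal illustration, not the authors' PHP implementation; the tiny lexicon and place-name gazetteer are invented for the example.

```python
import re

# Toy gazetteer of place names (illustrative assumption, not the paper's data)
GAZETTEER = {"palembang": "LOCATION", "ampera": "LANDMARK"}

def tokenize(question):
    """Split a natural-language question into lowercase word tokens."""
    return re.findall(r"\w+", question.lower())

def pos_tag(tokens):
    """Very rough POS tagging via a small hand-written lexicon;
    unknown words default to a noun tag."""
    lexicon = {"where": "WH", "is": "VB", "the": "DT", "in": "IN"}
    return [(t, lexicon.get(t, "NN")) for t in tokens]

def ner(tokens):
    """Mark tokens found in the gazetteer as named entities."""
    return [(t, GAZETTEER.get(t, "O")) for t in tokens]

# Usage: run the three stages over one tourism question
entities = ner(tokenize("Where is the Ampera bridge in Palembang?"))
```

A real system would replace the lexicon and gazetteer with trained models, but the pipeline shape (tokenize, then tag, then recognize entities) is the same.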

2020 ◽  
Vol 29 (06) ◽  
pp. 2050019
Author(s):  
Hadi Veisi ◽  
Hamed Fakour Shandi

A question answering system is a type of information retrieval system that takes a question from a user in natural language as the input and returns the best answer to it as the output. In this paper, a medical question answering system for the Persian language is designed and implemented. During this research, a dataset of diseases and drugs was collected and structured. The proposed system includes three main modules: question processing, document retrieval, and answer extraction. For the question processing module, a sequential architecture is designed which retrieves the main concept of a question by using different components. In these components, rule-based methods, natural language processing, and dictionary-based techniques are used. In the document retrieval module, the documents are indexed and searched using the Lucene library. The retrieved documents are ranked using similarity detection algorithms, and the highest-ranked document is selected to be used by the answer extraction module. This module is responsible for extracting the most relevant section of the text in the retrieved document. During this research, customized language processing tools such as a part-of-speech tagger and a lemmatizer were also developed for Persian. Evaluation results show that this system performs well for answering different questions about diseases and drugs. The accuracy of the system on 500 sample questions is 83.6%.
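The retrieval-and-ranking step described above can be approximated with TF-IDF cosine similarity. This is a simplified stand-in for Lucene's scoring, not the paper's actual pipeline; the sample documents below are invented.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build simple TF-IDF vectors for a small document collection."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: one count per document
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, docs):
    """Return the index of the highest-ranked document for the query."""
    vecs, idf = tf_idf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query.lower().split()).items()}
    return max((cosine(q, v), i) for i, v in enumerate(vecs))[1]
```

The answer extraction module would then operate only on the single document this function selects.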


Since its early days, Question Answering (QA) has been an intuitive way for humans to understand concepts. Given its importance, it is introduced to children from a very early age, and they are encouraged to ask more and more questions. With progress in machine learning and ontological semantics, Natural Language Question Answering (NLQA) has gained popularity in recent years. In this paper, QUASE (QUestion Answering System for Education), a system for answering natural language questions, is proposed, which helps to find the answer to any given question in a closed domain containing a finite set of documents. The QA system mainly focuses on factoid questions. QUASE uses a question taxonomy for question classification. Several natural language processing techniques, such as part-of-speech (POS) tagging, lemmatization, and sentence tokenization, are applied during document processing to make search better and faster. The DBpedia ontology is used to validate the candidate answers. With this system, learners can gain knowledge on their own by getting precise answers to questions asked in natural language, instead of merely getting back a list of documents. The precision, recall, and F-measure metrics are used to evaluate answer-type classification, and Mean Reciprocal Rank is used to evaluate the performance of the QA system as a whole. Our experiments show a significant improvement over other methods in classifying questions into correct answer types, with approximately 91% accuracy, as well as better performance as a QA system in closed-domain search.
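Question classification against a taxonomy, as QUASE does for factoid questions, can be illustrated with simple wh-word rules. The rules and answer-type labels below are an invented stand-in for the paper's taxonomy, which the abstract does not spell out.

```python
def classify_question(question):
    """Map a factoid question to an expected answer type using
    wh-word prefix rules (an illustrative stand-in for a real
    question taxonomy learned or curated for the domain)."""
    q = question.lower().strip()
    rules = [
        ("who", "PERSON"),
        ("where", "LOCATION"),
        ("when", "DATE"),
        ("how many", "NUMBER"),
        ("what", "DEFINITION"),
    ]
    for prefix, answer_type in rules:
        if q.startswith(prefix):
            return answer_type
    return "OTHER"
```

The predicted answer type is what lets a QA system keep only candidate answers of the right kind, e.g. validating that a PERSON-type question is answered with an entity DBpedia classifies as a person.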


Author(s):  
Rufai Yusuf Zakari ◽  
Zaharaddeen Karami Lawal ◽  
Idris Abdulmumin

Natural language processing is an area of computer science that has gained growing attention recently. NLP helps computers recognize, in other words, the ways in which people use language. NLP research, however, has been performed predominantly on languages with abundant quantities of annotated data, such as English, French, German and Arabic. While Hausa is Africa's second most commonly used language, only a few studies have so far focused on Hausa Natural Language Processing (HNLP). In this research paper, using a keyword index and article title search, we present a systematic analysis of the literature on HNLP in the Google Scholar database from 2015 to June 2020. Only a few research papers on HNLP, especially in areas such as part-of-speech (POS) tagging, Named Entity Recognition (NER), word embeddings, speech recognition and machine translation, have been released, and only recently. This is because NLP depends on large amounts of human-annotated data for training intelligent models. HNLP is now attracting researchers' attention after extensive research on NLP has been performed in English and other languages. The key objectives of this paper are to promote research, to identify likely areas for future studies in HNLP, and to assist researchers in conducting further relevant studies.


Author(s):  
Mwnthai Narzary ◽  
Gwmsrang Muchahary ◽  
Maharaj Brahma ◽  
Sanjib Narzary ◽  
Pranav Kumar Singh ◽  
...  

With over 1.4 million Bodo speakers, there is a need for automated language processing systems such as machine translation, part-of-speech tagging, speech recognition, named entity recognition, and so on. Developing such systems requires a sufficient amount of data. In this paper we present a detailed description of the primary resources available for the Bodo language that can be used as datasets for studying natural language processing and its applications. We list the different resources available for Bodo: a lexicon dataset of 8,005 entries collected from agriculture and health, a raw corpus of 2,915,544 words, a tagged corpus of 30,000 sentences, a parallel corpus of 28,359 sentences from tourism, agriculture and health, and a tagged and parallel corpus of 37,768 sentences. We further discuss the challenges and opportunities present in the Bodo language.


Author(s):  
Yuan Zhang ◽  
Hongshen Chen ◽  
Yihong Zhao ◽  
Qun Liu ◽  
Dawei Yin

Sequence tagging is the basis for multiple applications in natural language processing. Despite successes in learning long-term token sequence dependencies with neural networks, tag dependencies have rarely been considered previously. Sequence tagging actually possesses complex dependencies and interactions among the input tokens and the output tags. We propose a novel multi-channel model, which handles different ranges of token-tag dependencies and their interactions simultaneously. A tag LSTM is augmented to manage the output tag dependencies and word-tag interactions, while three mechanisms are presented to efficiently incorporate token context representation and tag dependency. Extensive experiments on part-of-speech tagging and named entity recognition tasks show that the proposed model outperforms the BiLSTM-CRF baseline by effectively incorporating the tag dependency feature.
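The tag dependencies this abstract emphasizes are what a CRF-style decoder captures on top of per-token scores. A minimal sketch of that idea is Viterbi decoding over transition scores between tags; the scores below are made-up numbers, not anything from the paper or its tag LSTM.

```python
def viterbi(obs_scores, trans):
    """Decode the best tag sequence given per-token tag scores
    (obs_scores: list of {tag: score}) and tag-transition scores
    (trans: {prev_tag: {tag: score}}) -- the output-tag dependency
    a plain per-token softmax would ignore."""
    tags = list(trans.keys())
    # dp[t] = (best score of a path ending in tag t, that path)
    dp = {t: (obs_scores[0][t], [t]) for t in tags}
    for scores in obs_scores[1:]:
        new_dp = {}
        for t in tags:
            best_prev = max(tags, key=lambda p: dp[p][0] + trans[p][t])
            score = dp[best_prev][0] + trans[best_prev][t] + scores[t]
            new_dp[t] = (score, dp[best_prev][1] + [t])
        dp = new_dp
    return max(dp.values())[1]
```

With a transition table that penalizes, say, an I tag after an O tag, the decoder can pick a globally consistent sequence even when individual token scores are ambiguous.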


2021 ◽  
Vol 72 ◽  
pp. 1385-1470
Author(s):  
Alexandra N. Uma ◽  
Tommaso Fornaciari ◽  
Dirk Hovy ◽  
Silviu Paun ◽  
Barbara Plank ◽  
...  

Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans disagree, from objective tasks such as part-of-speech tagging to more subjective tasks such as classifying an image or deciding whether a proposition follows from certain premises. While most learning in artificial intelligence (AI) still relies on the assumption that a single (gold) interpretation exists for each item, a growing body of research aims to develop learning methods that do not rely on this assumption. In this survey, we review the evidence for disagreements on NLP and CV tasks, focusing on tasks for which substantial datasets containing this information have been created. We discuss the most popular approaches to training models from datasets containing multiple judgments potentially in disagreement. We systematically compare these different approaches by training them with each of the available datasets, considering several ways to evaluate the resulting models. Finally, we discuss the results in depth, focusing on four key research questions, and assess how the type of evaluation and the characteristics of a dataset determine the answers to these questions. Our results suggest, first of all, that even if we abandon the assumption of a gold standard, it is still essential to reach a consensus on how to evaluate models. This is because the relative performance of the various training methods is critically affected by the chosen form of evaluation. Secondly, we observed a strong dataset effect. With substantial datasets, providing many judgments by high-quality coders for each item, training directly with soft labels achieved better results than training from aggregated or even gold labels. This result holds for both hard and soft evaluation. But when the above conditions do not hold, leveraging both gold and soft labels generally achieved the best results in the hard evaluation. 
All datasets and models employed in this paper are freely available as supplementary materials.
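The soft-label training this survey compares against aggregated gold labels can be sketched in a few lines: annotator votes become a probability distribution, and the loss is cross-entropy against that distribution rather than a one-hot majority label. This is a generic illustration of the idea, not the survey's evaluation code.

```python
import math

def soft_label_from_votes(votes, num_classes):
    """Turn raw annotator votes (class indices) into a normalized
    soft label, preserving the disagreement instead of aggregating."""
    counts = [0] * num_classes
    for v in votes:
        counts[v] += 1
    return [c / len(votes) for c in counts]

def soft_cross_entropy(pred_probs, soft_label):
    """Cross-entropy against the full distribution of annotator
    judgments, rather than a single aggregated (majority/gold) label."""
    return -sum(p * math.log(q) for p, q in zip(soft_label, pred_probs) if p > 0)
```

Under this loss, a model is rewarded for matching the annotator distribution itself; an overconfident prediction on a genuinely ambiguous item is penalized, which is the intuition behind the survey's finding that soft labels help when many high-quality judgments per item are available.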


Author(s):  
Minlong Peng ◽  
Qi Zhang ◽  
Xiaoyu Xing ◽  
Tao Gui ◽  
Jinlan Fu ◽  
...  

Word representation is a key component in neural-network-based sequence labeling systems. However, representations of unseen or rare words trained on the end task are usually too poor for appreciable performance. This is commonly referred to as the out-of-vocabulary (OOV) problem. In this work, we address the OOV problem in sequence labeling using only the training data of the task. To this end, we propose a novel method to predict representations for OOV words from their surface forms (e.g., character sequences) and contexts. The method is specifically designed to avoid the error-propagation problem suffered by existing approaches in the same paradigm. To evaluate its effectiveness, we performed extensive empirical studies on four part-of-speech (POS) tagging tasks and four named entity recognition (NER) tasks. Experimental results show that the proposed method achieves better or competitive performance on the OOV problem compared with existing state-of-the-art methods.
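Predicting a representation for an unseen word from its surface form can be illustrated with the fastText-style trick of averaging character n-gram vectors. This is a simplified stand-in for the paper's predictor, which also uses context; the n-gram vectors below would come from training.

```python
def char_ngrams(word, n=3):
    """Character n-grams of a word, with boundary markers so that
    prefixes and suffixes get distinct grams."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def oov_vector(word, ngram_vectors, dim):
    """Predict a representation for an unseen word by averaging the
    vectors of its known surface-form character n-grams."""
    grams = [g for g in char_ngrams(word) if g in ngram_vectors]
    if not grams:
        return [0.0] * dim  # no known subword information at all
    summed = [sum(ngram_vectors[g][i] for g in grams) for i in range(dim)]
    return [s / len(grams) for s in summed]
```

Because the n-gram inventory is learned on the task's own training data, no external embeddings are needed, matching the abstract's "using only training data" setting.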


Author(s):  
Dan Tufiș ◽  
Radu Ion

One of the fundamental tasks in natural-language processing is the morpho-lexical disambiguation of words occurring in text. Over the last twenty years or so, approaches to part-of-speech tagging based on machine learning techniques have been developed or ported to provide high-accuracy morpho-lexical annotation for an increasing number of languages. Due to recent increases in computing power, together with improvements in tagging technology and the extension of language typologies, part-of-speech tags have become significantly more complex. The need to address multilinguality more directly in the web environment has created a demand for interoperable, harmonized morpho-lexical descriptions across languages. Given the large number of morpho-lexical descriptors for a morphologically complex language, one has to consider ways to avoid the data sparseness threat in standard statistical tagging, yet ensure that full lexicon information is available for each word form in the output. The chapter overviews the current major approaches to part-of-speech tagging.
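The starting point that the statistical approaches surveyed in this chapter improve upon is the most-frequent-tag baseline, with some fallback for unseen word forms. A minimal sketch, with an invented one-sentence training set:

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_sentences):
    """Most-frequent-tag baseline: for each word form, remember the
    tag it received most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(tokens, model, default="NOUN"):
    """Tag tokens, falling back to a default open-class tag for
    unseen words -- a crude answer to the data-sparseness problem."""
    return [(t, model.get(t.lower(), default)) for t in tokens]
```

For morphologically complex languages with large tagsets, exactly this kind of lookup table becomes sparse, which is why the chapter's approaches turn to tiered tagsets, smoothing, and context models instead.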

