A semi-automatic approach to construct Vietnamese ontology from online text

An ontology is an effective formal representation of knowledge commonly used in artificial intelligence, the semantic web, software engineering, and information retrieval. In open and distance learning, ontologies are used as knowledge bases for e-learning supplements, educational recommenders, and question answering systems that support students with much-needed resources. In such systems, ontology construction is one of the most important phases. Since there are abundant documents on the Internet, useful learning materials can be acquired openly with the use of an ontology. However, due to the lack of system support for ontology construction, it is difficult to construct self-instructional materials for Vietnamese people. In general, the cost of manually acquiring ontologies from domain documents and expert knowledge is too high. Therefore, we present a support system for Vietnamese ontology construction that uses pattern-based mechanisms to discover Vietnamese concepts and conceptual relations from Vietnamese text documents. The system combines statistics-based, data mining, and Vietnamese natural language processing methods in its concept and conceptual relation extraction algorithms. Our experiments show that this approach provides a feasible way to build Vietnamese ontologies for educational support systems.

BUILDING QUESTION ANSWERING SYSTEM BASED ON COMPUTING DOMAIN ONTOLOGY

Journal of Science and Technology - IUH ◽

10.46242/jst-iuh.v38i02.294 ◽

2020 ◽

Vol 38 (02) ◽

Author(s):

TẠ DUY CÔNG CHIẾN

Keyword(s):

Language Processing ◽

Digital Libraries ◽

Question Answering ◽

Domain Ontology ◽

Text Documents ◽

Question Answering System ◽

Domain Specific ◽

Sql Database ◽

Question Answering Systems ◽

Education Business

Question answering systems have been applied in many different fields in recent years, such as education, business, and surveys. The purpose of these systems is to automatically answer users' questions or queries. This paper introduces a question answering system built on a domain-specific ontology. This ontology, which contains the data and vocabulary of the computing domain, is built from text documents in the ACM Digital Library. Consequently, the system only answers questions pertaining to information technology domains such as databases, networks, and machine learning. We use natural language processing methods and the domain ontology to build this system. To increase performance, we use a graph database to store the computing ontology and a NoSQL database to query its data.

Building Graph for Events and Time in Natural Language Text

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8419.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 581-586

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Question Answering ◽

Relation Extraction ◽

Event Extraction ◽

Event Time ◽

Time Graph ◽

Question Answering Systems

Events and time are two key concepts in natural language processing; because of the many event-oriented tasks, they have become essential in information extraction. In natural language processing and in information extraction or retrieval, events and time underpin applications such as text summarization, document summarization, and question answering systems. In this paper, we present the event-time graph as a new way of organizing event-time information from text. In this graph, nodes are events, whereas edges represent the temporal and co-reference relations between events. Much previous research in natural language processing has focused on individual extraction tasks in a domain-specific way; in this work we present both the extraction and the representation of the relationships between events and time by constructing an event-time graph. Our overall system is a three-step process that performs event extraction, time extraction, and relation extraction. Each step performs at a level comparable with the state of the art. We train and evaluate our event extraction model on the MUC corpus annotated with event mentions. Next, we present the time extraction model, tested on several news articles from the Wikipedia corpus. We then represent event-time relations by constructing event-time graphs. Finally, we evaluate the overall quality of the event graphs with evaluation metrics and summarize the observations of the entire work.
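The graph structure described in this abstract (events as nodes, temporal and co-reference relations as edges) can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the authors' implementation: the `Event` and `EventTimeGraph` names, the `BEFORE` and `COREF` relation labels, and the toy events are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    event_id: str
    trigger: str                 # word or phrase that signals the event
    time: Optional[str] = None   # time expression attached to the event, if any


class EventTimeGraph:
    """Directed, labeled graph: nodes are events, edges are relations."""

    def __init__(self):
        self.events = {}   # event_id -> Event
        self.edges = {}    # event_id -> list of (relation, target_id)

    def add_event(self, event):
        self.events[event.event_id] = event
        self.edges.setdefault(event.event_id, [])

    def add_relation(self, source_id, relation, target_id):
        if source_id not in self.events or target_id not in self.events:
            raise KeyError("add both events before relating them")
        self.edges[source_id].append((relation, target_id))

    def related(self, source_id, relation):
        # All targets reachable from source_id through one edge with this label.
        return [t for r, t in self.edges.get(source_id, []) if r == relation]


# Toy data (hypothetical): two dated events and one co-referring mention.
graph = EventTimeGraph()
graph.add_event(Event("e1", "announced", time="2020-01-15"))
graph.add_event(Event("e2", "acquisition", time="2020-02-01"))
graph.add_event(Event("e3", "the deal"))
graph.add_relation("e1", "BEFORE", "e2")   # temporal edge
graph.add_relation("e3", "COREF", "e2")    # co-reference edge
```

In this sketch the extraction steps are out of scope; the graph only stores what an upstream event, time, and relation extractor would produce.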

Software architecture of the question-answering subsystem with elements of self-learning

Artificial Intelligence ◽

10.15407/jai2021.02.088 ◽

2021 ◽

Vol 26 (jai2021.26(2)) ◽

pp. 88-95

Author(s):

Hlybovets A ◽

Tsaruk A

Keyword(s):

Natural Language ◽

Language Processing ◽

Speech Synthesis ◽

Question Answering ◽

Building Blocks ◽

Knowledge Bases ◽

Learning Technologies ◽

Software Systems ◽

Question Answering Systems ◽

Self Learning

This paper analyzes question-answering software systems and their basic architectures. With the development of machine learning technologies, the creation of natural language processing (NLP) engines, and the rising popularity of virtual personal assistants that use speech synthesis (text-to-speech), there is a growing need to develop question-answering systems that can provide personalized answers to users' questions. All modern cloud providers have proposed frameworks for organizing question-answering systems, but personalized dialogs remain a problem. Personalization is very important: it places additional demands on a question-answering system’s ability to take this information into account while processing users’ questions. Traditionally, a question-answering system (QAS) is developed as an application that contains a knowledge base and a user interface, which provides a user with answers to questions, and a means of interaction with an expert. In this article we analyze modern approaches to architecture development and try to build a system from building blocks that already exist on the market. The main criteria for the NLP modules were: support for the Ukrainian language, natural language understanding, automatic detection of entities (attributes), the ability to construct a dialogue flow, quality and completeness of documentation, API capabilities and integration with external systems, and the possibility of integrating external knowledge bases. Following this analysis, the article proposes a detailed architecture for a question-answering subsystem with elements of self-learning in the Ukrainian language. The paper also gives a detailed description of the main semantic components of the system (architecture components).

BUILD KNOWLEDGE GRAPH FROM HETEROGENEOUS DOCUMENTS

Journal of Science and Technology - IUH ◽

10.46242/jst-iuh.v47i05.761 ◽

2021 ◽

Vol 47 (05) ◽

Author(s):

NGUYỄN CHÍ HIẾU

Keyword(s):

Information Retrieval ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Question Answering ◽

Semantic Analysis ◽

Knowledge Graph ◽

Question Answering Systems ◽

Knowledge Graphs

Knowledge graphs have been applied in many fields in recent years, such as search engines, semantic analysis, and question answering. However, building knowledge graphs faces many obstacles, including methodologies, data, and tools. This paper introduces a novel methodology for building a knowledge graph from heterogeneous documents. We use natural language processing and deep learning methods to build this graph. The knowledge graph can be used in question answering systems and information retrieval, especially in the computing domain.

Document Summarization Using Sentence-Level Semantic Based on Word Embeddings

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194019500086 ◽

2019 ◽

Vol 29 (02) ◽

pp. 177-196 ◽

Cited By ~ 1

Author(s):

Kamal Al-Sabahi ◽

Zhang Zuping

Keyword(s):

Language Processing ◽

Web Search ◽

Question Answering ◽

Information Overload ◽

Good Representation ◽

Intelligence Analysis ◽

Word Embeddings ◽

Question Answering Systems ◽

Active Research ◽

News Recommendation

In the era of information overload, text summarization has become a focus of attention in a number of diverse fields, such as question answering systems, intelligence analysis, news recommendation systems, and search results in web search engines. A good document representation is the key to any successful summarizer. Learning this representation has become a very active research area in natural language processing (NLP). Traditional approaches mostly fail to deliver a good representation. Word embeddings have shown excellent performance in learning the representation. In this paper, a modified BM25 with word embeddings is used to build sentence vectors from word vectors. The entire document is represented as a set of sentence vectors. Then, the similarity between every pair of sentence vectors is computed. After that, TextRank, a graph-based model, is used to rank the sentences. The summary is generated by picking the top-ranked sentences according to the compression rate. Two well-known datasets, DUC2002 and DUC2004, are used to evaluate the models. The experimental results show that the proposed models perform comprehensively better compared to the state-of-the-art methods.
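The ranking stage described in this abstract (pairwise similarity between sentence vectors, TextRank over the resulting graph, then selection by compression rate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the sentence vectors are taken as given (the modified BM25 weighting is not reproduced), cosine similarity is assumed as the similarity measure, and the function names, the damping value of 0.85, and the iteration count are hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def textrank(vectors, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over their similarity graph."""
    n = len(vectors)
    # Edge weights: clipped cosine similarity, no self-loops.
    sim = [[max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = sum(sim[j][i] / out_weight[j] * scores[j]
                           for j in range(n) if out_weight[j] > 0)
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def summarize(sentences, vectors, compression_rate=0.5):
    """Keep the top-ranked sentences, returned in document order."""
    k = max(1, round(len(sentences) * compression_rate))
    scores = textrank(vectors)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Toy input (hypothetical): three mutually similar sentences and one outlier.
sentences = ["s0", "s1", "s2", "s3"]
vectors = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [0.0, 1.0]]
summary = summarize(sentences, vectors, compression_rate=0.5)
```

Returning the selected sentences in document order rather than rank order is a design choice of this sketch; it keeps the summary readable but is not stated in the abstract.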

An Empirical Study of Content Understanding in Conversational Question Answering

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6257 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7578-7585

Author(s):

Ting-Rui Chiang ◽

Hao-Tong Ye ◽

Yun-Nung Chen

Keyword(s):

Natural Language Processing ◽

Empirical Study ◽

Language Processing ◽

Question Answering ◽

Source Code ◽

Content Understanding ◽

Question Answering Systems ◽

Benchmark Datasets ◽

Context Free ◽

Answering Questions

Alongside a large body of work on context-free question answering systems, there is an emerging trend of conversational question answering models in the natural language processing field. Thanks to recently collected datasets, including QuAC and CoQA, there has been more work on conversational question answering, and recent work has achieved competitive performance on both datasets. However, to the best of our knowledge, two important questions for conversational comprehension research have not been well studied: 1) How well can the benchmark datasets reflect models' content understanding? 2) Do the models make good use of the conversation content when answering questions? To investigate these questions, we design different training settings, testing settings, as well as an attack to verify the models' capability of content understanding on QuAC and CoQA. The experimental results indicate some potential hazards in the benchmark datasets, QuAC and CoQA, for conversational comprehension research. Our analysis also sheds light on both what models may learn and how datasets may bias the models. We believe that this deeper investigation of the task can benefit future progress in conversational comprehension. The source code is available at https://github.com/MiuLab/CQA-Study.

Super Agent Chatbot “3S” Sebagai Media Informasi Menggunakan Metoda Natural Language Processing(NLP)

JURNAL TEKNOLOGI DAN OPEN SOURCE ◽

10.36378/jtos.v2i1.144 ◽

2019 ◽

Vol 2 (1) ◽

pp. 53-64

Author(s):

Herwin H Herwin

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Web Site ◽

Question Answering ◽

Question Answering Systems ◽

Portal Website

STMIK Amik Riau operates a portal at http://www.sar.ac.id that serves as a medium for disseminating information to the academic community and stakeholders. Over the last three months the portal averaged 150 visits per day, with an increase during student admissions each academic year. This indicates growing public interest in information about STMIK Amik Riau. Unfortunately, use of the portal has so far been one-way, from STMIK Amik Riau to stakeholders and the public, and not the reverse. Stakeholders communicate with the institution about the portal's content through social media, which is not integrated with the website. Feedback, corrections, responses, and other communication likewise go through social media. So far, visitors to the portal, whether the general public or stakeholders, cannot be detected at the time of their visit and therefore cannot be greeted in line with the “3S” philosophy, even though members of the public who visit are a potential market to be educated. Visitors to the portal are politely greeted by the system, followed by direct communication; a machine is available that is ready to greet them and answer every question they ask. This research aims to build a chatbot that can communicate with website visitors. The resulting chatbot is named STMIK Amik Riau Intelligence Virtual Information, abbreviated SILVI. The chatbot is built on Question Answering Systems (QAS) and works with an algorithm that measures the similarity between two texts. The research produced a ready-to-use application, SILVI, that can communicate with website visitors. The chatbot optimizes communication so that visitors, seemingly unaware that they are talking to a machine, continue to regard their conversation partner as the staff member responsible for the relevant duties and functions.

Towards relation extraction from Arabic text: a review

International Robotics & Automation Journal ◽

10.15406/iratj.2019.05.00195 ◽

2019 ◽

Vol 5 (5) ◽

pp. 212-215

Author(s):

Abeer AlArfaj

Keyword(s):

Language Processing ◽

Question Answering ◽

State Of The Art ◽

Relation Extraction ◽

Arabic Language ◽

Extraction Methods ◽

The State ◽

Semantic Relations ◽

Arabic Text ◽

Taxonomic Relation

Semantic relation extraction is an important component of ontology construction and can support many applications, e.g., text mining, question answering, and information extraction. However, extracting semantic relations between concepts is not trivial and is one of the main challenges in the field of Natural Language Processing (NLP). The Arabic language has complex morphological, grammatical, and semantic aspects, since it is a highly inflectional and derivational language, which makes the task even more challenging. In this paper, we present a review of the state of the art for relation extraction from texts, addressing the progress and difficulties in this field. We discuss several aspects related to this task, considering both taxonomic and non-taxonomic relation extraction methods. The majority of relation extraction approaches combine statistical and linguistic techniques to extract semantic relations from text. We also give special attention to the state of the work on relation extraction from Arabic texts, which needs further progress.

Data Fusion in Question Answering Systems over Multiple-Knowledge Bases

10.2991/asum.k.210827.031 ◽

2021 ◽

Author(s):

Nhuan D. To ◽

Marek Z. Reformat

Keyword(s):

Data Fusion ◽

Question Answering ◽

Knowledge Bases ◽

Question Answering Systems

Introducing External Knowledge to Answer Questions with Implicit Temporal Constraints over Knowledge Base

Future Internet ◽

10.3390/fi12030045 ◽

2020 ◽

Vol 12 (3) ◽

pp. 45

Author(s):

Wenqing Wu ◽

Zhenfang Zhu ◽

Qiang Lu ◽

Dianyuan Zhang ◽

Qiangqiang Guo

Keyword(s):

Natural Language ◽

Knowledge Base ◽

Question Answering ◽

Knowledge Bases ◽

Temporal Information ◽

Temporal Constraints ◽

External Knowledge ◽

Question Answering Systems ◽

Natural Language Question ◽

Applied Knowledge

Knowledge base question answering (KBQA) aims to analyze the semantics of natural language questions and return accurate answers from the knowledge base (KB). More and more studies have applied knowledge bases to question answering systems, and when using a KB to answer a natural language question, there are some words that imply the tense (e.g., original and previous) and play a limiting role in questions. However, most existing methods for KBQA cannot model a question with implicit temporal constraints. In this work, we propose a model based on a bidirectional attentive memory network, which obtains the temporal information in the question through attention mechanisms and external knowledge. Specifically, we encode the external knowledge as vectors, and use additive attention between the question and external knowledge to obtain the temporal information, then further enhance the question vector to increase the accuracy. On the WebQuestions benchmark, our method not only performs better with the overall data, but also has excellent performance regarding questions with implicit temporal constraints, which are separate from the overall data. As we use attention mechanisms, our method also offers better interpretability.
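The additive attention step mentioned in this abstract (scoring each external-knowledge vector against the question, then using the weighted result to enhance the question vector) can be illustrated with a small sketch. It shows generic additive attention only and is not the paper's bidirectional attentive memory network: the parameter names `w_q`, `w_k`, and `v`, the toy values, and the use of element-wise addition in `enhance` are all assumptions made for illustration.

```python
import math


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def softmax(xs):
    """Numerically stable softmax."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(question, knowledge, w_q, w_k, v):
    """score_i = v . tanh(W_q q + W_k k_i); returns (weights, context)."""
    q_proj = matvec(w_q, question)
    scores = []
    for k in knowledge:
        k_proj = matvec(w_k, k)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, k_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(knowledge[0])
    # Context vector: attention-weighted sum of the knowledge vectors.
    context = [sum(w * k[d] for w, k in zip(weights, knowledge))
               for d in range(dim)]
    return weights, context


def enhance(question, context):
    # Assumption: combine by element-wise addition; the abstract does not
    # say how the question vector is enhanced.
    return [q + c for q, c in zip(question, context)]


# Toy parameters (hypothetical): identity projections, unit scoring vector.
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
question = [1.0, 0.0]
knowledge = [[1.0, 0.0], [-1.0, 0.0]]
weights, context = additive_attention(question, knowledge, w_q, w_k, v)
enhanced = enhance(question, context)
```

In a trained model the projections and scoring vector would be learned; here they are fixed so that the knowledge vector aligned with the question receives the larger weight.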
