Knowledge-Driven Event Extraction in Russian: Corpus-Based Linguistic Resources

Automatic event extraction form text is an important step in knowledge acquisition and knowledge base population. Manual work in development of extraction system is indispensable either in corpus annotation or in vocabularies and pattern creation for a knowledge-based system. Recent works have been focused on adaptation of existing system (for extraction from English texts) to new domains. Event extraction in other languages was not studied due to the lack of resources and algorithms necessary for natural language processing. In this paper we define a set of linguistic resources that are necessary in development of a knowledge-based event extraction system in Russian: a vocabulary of subordination models, a vocabulary of event triggers, and a vocabulary of Frame Elements that are basic building blocks for semantic patterns. We propose a set of methods for creation of such vocabularies in Russian and other languages using Google Books NGram Corpus. The methods are evaluated in development of event extraction system for Russian.

Download Full-text

Event Extraction and Representation: A Case Study for the Portuguese Language

Information ◽

10.3390/info10060205 ◽

2019 ◽

Vol 10 (6) ◽

pp. 205 ◽

Cited By ~ 1

Author(s):

Paulo Quaresma ◽

Vítor Beires Nogueira ◽

Kashyap Raiyani ◽

Roy Bayot

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Event Extraction ◽

Semantic Role Labeling ◽

Knowledge Based ◽

Part Of Speech ◽

Relevant Role ◽

Text Information ◽

Future Work

Text information extraction is an important natural language processing (NLP) task, which aims to automatically identify, extract, and represent information from text. In this context, event extraction plays a relevant role, allowing actions, agents, objects, places, and time periods to be identified and represented. The extracted information can be represented by specialized ontologies, supporting knowledge-based reasoning and inference processes. In this work, we will describe, in detail, our proposal for event extraction from Portuguese documents. The proposed approach is based on a pipeline of specialized natural language processing tools; namely, a part-of-speech tagger, a named entities recognizer, a dependency parser, semantic role labeling, and a knowledge extraction module. The architecture is language-independent, but its modules are language-dependent and can be built using adequate AI (i.e., rule-based or machine learning) methodologies. The developed system was evaluated with a corpus of Portuguese texts and the obtained results are presented and analysed. The current limitations and future work are discussed in detail.

Download Full-text

PERLEX: A Bilingual Persian-English Gold Dataset for Relation Extraction

Scientific Programming ◽

10.1155/2021/8893270 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Majid Asgari-Bidhendi ◽

Mehrdad Nasser ◽

Behrooz Janfada ◽

Behrouz Minaei-Bidgoli

Keyword(s):

Language Processing ◽

Question Answering ◽

State Of The Art ◽

Relation Extraction ◽

Neural Model ◽

Semantic Relations ◽

Base Population ◽

Neural Models ◽

Persian Language ◽

Knowledge Base Population

Relation extraction is the task of extracting semantic relations between entities in a sentence. It is an essential part of some natural language processing tasks such as information extraction, knowledge extraction, question answering, and knowledge base population. The main motivations of this research stem from a lack of a dataset for relation extraction in the Persian language as well as the necessity of extracting knowledge from the growing big data in the Persian language for different applications. In this paper, we present “PERLEX” as the first Persian dataset for relation extraction, which is an expert-translated version of the “SemEval-2010-Task-8” dataset. Moreover, this paper addresses Persian relation extraction utilizing state-of-the-art language-agnostic algorithms. We employ six different models for relation extraction on the proposed bilingual dataset, including a non-neural model (as the baseline), three neural models, and two deep learning models fed by multilingual BERT contextual word representations. The experiments result in the maximum F1-score of 77.66% (provided by BERTEM-MTB method) as the state of the art of relation extraction in the Persian language.

Download Full-text

EVENT EXTRACTION WITH COMPLEX EVENT CLASSIFICATION USING RICH FEATURES

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720010004586 ◽

2010 ◽

Vol 08 (01) ◽

pp. 131-146 ◽

Cited By ~ 97

Author(s):

MAKOTO MIWA ◽

RUNE SÆTRE ◽

JIN-DONG KIM ◽

JUN'ICHI TSUJII

Keyword(s):

Language Processing ◽

High Performance ◽

Classification Problem ◽

Event Extraction ◽

Extraction System ◽

Binary Relations ◽

Shared Task ◽

Large Numbers ◽

Biomedical Event Extraction ◽

Complex Events

Biomedical Natural Language Processing (BioNLP) attempts to capture biomedical phenomena from texts by extracting relations between biomedical entities (i.e. proteins and genes). Traditionally, only binary relations have been extracted from large numbers of published papers. Recently, more complex relations (biomolecular events) have also been extracted. Such events may include several entities or other relations. To evaluate the performance of the text mining systems, several shared task challenges have been arranged for the BioNLP community. With a common and consistent task setting, the BioNLP'09 shared task evaluated complex biomolecular events such as binding and regulation.Finding these events automatically is important in order to improve biomedical event extraction systems. In the present paper, we propose an automatic event extraction system, which contains a model for complex events, by solving a classification problem with rich features. The main contributions of the present paper are: (1) the proposal of an effective bio-event detection method using machine learning, (2) provision of a high-performance event extraction system, and (3) the execution of a quantitative error analysis. The proposed complex (binding and regulation) event detector outperforms the best system from the BioNLP'09 shared task challenge.

Download Full-text

Building natural language processing tools for Runyakitara

Applied Linguistics Review ◽

10.1515/applirev-2020-2004 ◽

2020 ◽

Vol 0 (0) ◽

Author(s):

Fridah Katushemererwe ◽

Andrew Caines ◽

Paula Buttery

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Learning ◽

Language Processing ◽

Primary Data ◽

Computer Assisted ◽

Endangered Languages ◽

Test Case ◽

Short Supply ◽

Linguistic Resources

AbstractThis paper describes an endeavour to build natural language processing (NLP) tools for Runyakitara, a group of four closely related Bantu languages spoken in western Uganda. In contrast with major world languages such as English, for which corpora are comparatively abundant and NLP tools are well developed, computational linguistic resources for Runyakitara are in short supply. First therefore, we need to collect corpora for these languages, before we can proceed to the design of a spell-checker, grammar-checker and applications for computer-assisted language learning (CALL). We explain how we are collecting primary data for a new Runya Corpus of speech and writing, we outline the design of a morphological analyser, and discuss how we can use these new resources to build NLP tools. We are initially working with Runyankore–Rukiga, a closely-related pair of Runyakitara languages, and we frame our project in the context of NLP for low-resource languages, as well as CALL for the preservation of endangered languages. We put our project forward as a test case for the revitalization of endangered languages through education and technology.

Download Full-text

A Survey of Arabic Named Entity Recognition and Classification

Computational Linguistics ◽

10.1162/coli_a_00178 ◽

2014 ◽

Vol 40 (2) ◽

pp. 469-510 ◽

Cited By ~ 62

Author(s):

Khaled Shaalan

Keyword(s):

Language Processing ◽

Named Entity Recognition ◽

Relevant Information ◽

Arabic Language ◽

Entity Recognition ◽

Named Entities ◽

Linguistic Resources ◽

Named Entity ◽

To Receive ◽

Made In

As more and more Arabic textual information becomes available through the Web in homes and businesses, via Internet and Intranet services, there is an urgent need for technologies and tools to process the relevant information. Named Entity Recognition (NER) is an Information Extraction task that has become an integral part of many other Natural Language Processing (NLP) tasks, such as Machine Translation and Information Retrieval. Arabic NER has begun to receive attention in recent years. The characteristics and peculiarities of Arabic, a member of the Semitic languages family, make dealing with NER a challenge. The performance of an Arabic NER component affects the overall performance of the NLP system in a positive manner. This article attempts to describe and detail the recent increase in interest and progress made in Arabic NER research. The importance of the NER task is demonstrated, the main characteristics of the Arabic language are highlighted, and the aspects of standardization in annotating named entities are illustrated. Moreover, the different Arabic linguistic resources are presented and the approaches used in Arabic NER field are explained. The features of common tools used in Arabic NER are described, and standard evaluation metrics are illustrated. In addition, a review of the state of the art of Arabic NER research is discussed. Finally, we present our conclusions. Throughout the presentation, illustrative examples are used for clarification.

Download Full-text

Evolution of Semantic Similarity—A Survey

ACM Computing Surveys ◽

10.1145/3440755 ◽

2021 ◽

Vol 54 (2) ◽

pp. 1-37

Author(s):

Dhivya Chandrasekaran ◽

Vijay Mago

Keyword(s):

Natural Language ◽

Semantic Similarity ◽

Language Processing ◽

Hybrid Methods ◽

Research Work ◽

Similarity Measures ◽

Text Data ◽

Knowledge Based ◽

Open Research ◽

Research Problems

Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity measures. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods beginning from traditional NLP techniques such as kernel-based methods to the most recent research work on transformer-based models, categorizing them based on their underlying principles as knowledge-based, corpus-based, deep neural network–based methods, and hybrid methods. Discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems in place for new researchers to experiment and develop innovative ideas to address the issue of semantic similarity.

Download Full-text

Knowledge-Based System supported by Chatbot to assist the Risk Classification Process of Patients in Hospital Emergency care (Preprint)

10.2196/preprints.33665 ◽

2021 ◽

Author(s):

Marciane Mueller ◽

Rejane Frozza ◽

Liane Mählmann Kipper ◽

Ana Carolina Kessler

Keyword(s):

Medical Care ◽

Knowledge Base ◽

Language Processing ◽

Emergency Care ◽

Risk Classification ◽

Conversational Agent ◽

Conversational Agents ◽

Hospital Emergency ◽

Knowledge Based System ◽

Knowledge Based

BACKGROUND This article presents the modeling and development of a Knowledge Based System, supported by the use of a virtual conversational agent called Dóris. Using natural language processing resources, Dóris collects the clinical data of patients in care in the context of urgency and hospital emergency. OBJECTIVE The main objective is to validate the use of virtual conversational agents to properly and accurately collect the data necessary to perform the evaluation flowcharts used to classify the degree of urgency of patients and determine the priority for medical care. METHODS The agent's knowledge base was modeled using the rules provided for in the evaluation flowcharts comprised by the Manchester Triage System. It also allows the establishment of a simple, objective and complete communication, through dialogues to assess signs and symptoms that obey the criteria established by a standardized, validated and internationally recognized system. RESULTS Thus, in addition to verifying the applicability of Artificial Intelligence techniques in a complex domain of health care, a tool is presented that helps not only in the perspective of improving organizational processes, but also in improving human relationships, bringing professionals and patients closer. The system's knowledge base was modeled on the IBM Watson platform. CONCLUSIONS The results obtained from simulations carried out by the human specialist allowed us to verify that a knowledge-based system supported by a virtual conversational agent is feasible for the domain of risk classification and priority determination of medical care for patients in the context of emergency care and hospital emergency.

Download Full-text

Knowledge base population from external data sources

10.14711/thesis-991012980225303412 ◽

2021 ◽

Author(s):

Xueling Lin

Keyword(s):

Knowledge Base ◽

Data Sources ◽

Base Population ◽

External Data ◽

Knowledge Base Population

Download Full-text

Enhancing Scientific Collaboration Through Knowledge Base Population and Linking for Meetings

Proceedings of the 51st Hawaii International Conference on System Sciences ◽

10.24251/hicss.2018.076 ◽

2018 ◽

Author(s):

Ning Gao ◽

Mark Dredze ◽

Douglas Oard

Keyword(s):

Knowledge Base ◽

Scientific Collaboration ◽

Base Population ◽

Knowledge Base Population

Download Full-text

Design of Link Evaluation Method to Improve Reliability based on Linked Open Big Data and Natural Language Processing

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.33.18601 ◽

2018 ◽

Vol 7 (3.33) ◽

pp. 168

Author(s):

Yonglak SHON ◽

Jaeyoung PARK ◽

Jangmook KANG ◽

Sangwon LEE

Keyword(s):

Big Data ◽

Natural Language Processing ◽

Language Processing ◽

Confidence Level ◽

Linked Data ◽

Evaluation Method ◽

Data Sets ◽

Knowledge Based ◽

Global Data ◽

Improve Reliability

The LOD data sets consist of RDF Triples based on the Ontology, a specification of existing facts, and by linking them to previously disclosed knowledge based on linked data principles. These structured LOD clouds form a large global data network, which provides a more accurate foundation for users to deliver the desired information. However, it is difficult to identify that, if the presence of the same object is identified differently across several LOD data sets, they are inherently identical. This is because objects with different URIs in the LOD datasets must be different and they must be closely examined for similarities in order to judge them as identical. The aim of this study is that the prosed model, RILE, evaluates similarity by comparing object values of existing specified predicates. After performing experiments with our model, we could check the improvement of the confidence level of the connection by extracting the link value.

Download Full-text