Multi-language Information Extraction with Text Pattern Recognition

2021 ◽  
Author(s):  
Johannes Lindén ◽  
Tingting Zhang ◽  
Stefan Forsström ◽  
Patrik Österberg

Information extraction is the task of extracting metadata from text. The research in this article proposes a new information extraction algorithm called GenerateIE. The proposed algorithm identifies pairs of entities and relations described in a piece of text. The extracted metadata is useful in many areas; within this research, the focus is on news-media contexts, where it provides the gist of written articles for analytics and for paraphrasing news information. GenerateIE is compared with existing state-of-the-art algorithms and offers two benefits. Firstly, GenerateIE provides the co-referenced word as the entity instead of a pronoun such as "he", "she", or "it", which is more useful for knowledge graphs. Secondly, GenerateIE can be applied to multiple languages without changing the algorithm itself, apart from the underlying natural-language text parsing. While GenerateIE does not significantly outperform state-of-the-art algorithms, it offers competitive results.
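To make the idea concrete, the following is a minimal sketch of dependency-based entity-relation pair extraction. It is not the authors' GenerateIE implementation; it only illustrates pulling (entity, relation, entity) triples out of a language-specific dependency parse, with the language swappable via the parsing model.

```python
# A minimal sketch of dependency-based (entity, relation, entity) extraction.
# This is NOT the authors' GenerateIE algorithm; it only illustrates the idea
# of deriving relation triples from a dependency parse.
import spacy

nlp = spacy.load("en_core_web_sm")  # swap the model to change the language

def extract_triples(text):
    """Yield (subject, relation, object) triples from subject-verb-object patterns."""
    doc = nlp(text)
    for token in doc:
        if token.pos_ != "VERB":
            continue
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c for c in token.children if c.dep_ in ("dobj", "obj", "attr")]
        for subj in subjects:
            for obj in objects:
                yield (subj.text, token.lemma_, obj.text)

print(list(extract_triples("Marie Curie discovered polonium. She also discovered radium.")))
# A full system would first resolve "She" to "Marie Curie" via coreference,
# so the second triple names the entity itself rather than a pronoun.
```

Note how the second triple surfaces the pronoun "She"; resolving it to the co-referenced entity before emitting triples is precisely the first benefit the abstract claims for GenerateIE.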

2021 ◽  
Author(s):  
Baosheng Yin ◽  
Yifei Sun

Abstract As an important part of information extraction, relation extraction aims to extract the relationships between given entities from natural-language text. Building on the pre-trained model R-BERT, this paper proposes an entity relation extraction method that integrates the entity dependency path with the pre-trained model: it generates a dependency parse tree through dependency parsing, obtains the dependency path between a given entity pair, and uses that path to exclude information such as modifier chunks and irrelevant entities from sentences. The model achieves a good F1 score on the SemEval-2010 Task 8 dataset, and the experiments show that dependency parsing provides context information for the model and improves performance.
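A minimal sketch of the dependency-path step follows, assuming spaCy for parsing and networkx for path finding; the function name and the entity-matching heuristic are illustrative, not the authors' code.

```python
# A minimal sketch of extracting the dependency path between two entities,
# the kind of signal the paper combines with R-BERT. Illustrative only.
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")

def dependency_path(sentence, e1, e2):
    """Return the tokens on the shortest dependency path between e1 and e2."""
    doc = nlp(sentence)
    graph = nx.Graph()
    for token in doc:
        for child in token.children:
            graph.add_edge(token.i, child.i)
    start = next(t.i for t in doc if t.text == e1)
    end = next(t.i for t in doc if t.text == e2)
    path = nx.shortest_path(graph, source=start, target=end)
    return [doc[i].text for i in path]

# Tokens off this path (e.g. modifier chunks) can be masked out before encoding.
print(dependency_path("The cursor moved slowly across the bright screen.", "cursor", "screen"))
```

Everything not on the returned path (adverbs, modifiers, unrelated entities) is a candidate for exclusion before the sentence is encoded, which is the filtering effect the abstract describes.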


2011 ◽  
Vol 1 (1) ◽  
pp. 7 ◽  
Author(s):  
Raheel Siddiqi

Automated marking of short textual answers is a challenging task due to the difficulties involved in accurately “understanding” natural language text. However, certain purpose-built Natural Language Processing (NLP) techniques can be used for this purpose. This paper describes an NLP-based approach to automated assessment that extends an earlier approach [1] to enable the automated marking of longer answers as well as answers that are partially correct. In the extended approach, the original Question Answer Language (QAL) is augmented to support the definition of regions of text that are expected to appear in a student’s answer. In order to explain the extensions to QAL, we present worked examples based on real exam questions. The system’s ability to accurately mark longer answer texts is shown to be on a par with that of existing state-of-the-art short-answer marking systems which are not capable of marking such longer texts.  
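The paper's QAL syntax is not reproduced here, but a hypothetical sketch of region-based marking conveys the core idea: expected regions of text are defined per question, and partial credit is awarded for each region found in the student's answer.

```python
# A hypothetical sketch of region-based answer marking with partial credit.
# This does not reproduce the paper's QAL; patterns and marks are assumptions.
import re

def mark_answer(answer, regions):
    """regions: list of (regex_pattern, marks). Award marks for each matched region."""
    score = 0
    for pattern, marks in regions:
        if re.search(pattern, answer, flags=re.IGNORECASE):
            score += marks
    return score

regions = [
    (r"photosynthesis", 1),                     # key term expected anywhere
    (r"light\s+energy.*chemical\s+energy", 2),  # expected ordered phrase
]
print(mark_answer("Photosynthesis converts light energy into chemical energy.", regions))  # 3
```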


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1407
Author(s):  
Peng Wang ◽  
Jing Zhou ◽  
Yuzhang Liu ◽  
Xingchen Zhou

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods focus only on triple facts in knowledge graphs, and models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circular convolution over the embeddings of an entity and its types maps the head and tail entities to type-specific representations, and a translation-based score function is then used to learn representations of triples. We evaluated our model on real-world datasets with two benchmark tasks, link prediction and triple classification; experimental results demonstrate that it outperforms state-of-the-art models in most cases.
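The scoring idea can be sketched as follows: compose each entity embedding with its type embedding via circular convolution, then apply a TransE-style translation distance. The dimensions and the exact composition below are assumptions for illustration, not the published model.

```python
# A minimal sketch of a TransET-style score: circular convolution of entity
# and type embeddings, followed by a translation-based distance. Illustrative.
import numpy as np

def circular_convolution(a, b):
    """Circular convolution of two equal-length vectors via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def score(h, r, t, h_type, t_type):
    """Lower is better: || conv(h, type_h) + r - conv(t, type_t) ||_2."""
    h_proj = circular_convolution(h, h_type)  # type-specific head representation
    t_proj = circular_convolution(t, t_type)  # type-specific tail representation
    return np.linalg.norm(h_proj + r - t_proj)

d = 50
rng = np.random.default_rng(0)
h, r, t, h_type, t_type = (rng.normal(size=d) for _ in range(5))
print(score(h, r, t, h_type, t_type))
```

In link prediction, candidate head or tail entities would be ranked by this score; in triple classification, a threshold on the score separates plausible from implausible triples.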


Semantic Web ◽  
2021 ◽  
pp. 1-16
Author(s):  
Esko Ikkala ◽  
Eero Hyvönen ◽  
Heikki Rantala ◽  
Mikko Koho

This paper presents a new software framework, Sampo-UI, for developing user interfaces for semantic portals. The goal is to provide the end-user with multiple application perspectives on Linked Data knowledge graphs, and a two-step usage cycle based on faceted search combined with ready-to-use tooling for data analysis. For the software developer, the Sampo-UI framework makes it possible to create highly customizable, user-friendly, and responsive user interfaces using current state-of-the-art JavaScript libraries and data from SPARQL endpoints, while saving substantial coding effort. Sampo-UI is published on GitHub under the open MIT License and has been utilized in several internal and external projects. The framework has been used thus far in creating six published and five forthcoming portals, mostly related to the Cultural Heritage domain, that have had tens of thousands of end-users on the Web.
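Sampo-UI itself is a JavaScript framework; the Python snippet below only illustrates the kind of query such a portal issues against a SPARQL endpoint to populate a faceted view. The endpoint and query are generic examples, not taken from Sampo-UI.

```python
# Illustrative only: fetching Linked Data from a public SPARQL endpoint,
# the data source a Sampo-UI portal perspective would be built on.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?person ?name WHERE {
        ?person a dbo:Person ;
                rdfs:label ?name .
        FILTER (lang(?name) = "en")
    } LIMIT 5
""")
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["name"]["value"])
```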


Author(s):  
Matheus C. Pavan ◽  
Vitor G. Santos ◽  
Alex G. J. Lan ◽  
Joao Martins ◽  
Wesley Ramos Santos ◽  
...  

2020 ◽  
pp. 1-21 ◽  
Author(s):  
Clément Dalloux ◽  
Vincent Claveau ◽  
Natalia Grabar ◽  
Lucas Emanuel Silva Oliveira ◽  
Claudia Maria Cabral Moro ◽  
...  

Abstract Automatic detection of negated content is often a prerequisite in information extraction systems in various domains. This task is especially important in the biomedical domain, where negation plays a central role. In this work, two main contributions are proposed. First, we work with languages that have been poorly addressed up to now, Brazilian Portuguese and French, developing new corpora for both that have been manually annotated with negation cues and their scopes. Second, we propose supervised machine-learning methods for the automatic detection of negation cues and their scopes. The methods prove to be robust in both languages and in cross-domain contexts (general and biomedical language). The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. Moreover, the application is accessible and usable online. We expect that these contributions (new annotated corpora, an online application, and cross-domain robustness) will improve the reproducibility of results and the robustness of NLP applications.
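Cue and scope detection of this kind is commonly cast as BIO sequence labeling; the sketch below shows that framing with a CRF. The features and the toy training pair are illustrative assumptions; the paper's actual corpora and models are not reproduced here.

```python
# A minimal sketch of negation cue/scope detection as BIO sequence labeling
# with a CRF. Toy features and data; not the paper's system.
import sklearn_crfsuite

def token_features(tokens, i):
    return {
        "word": tokens[i].lower(),
        "is_cue": tokens[i].lower() in {"no", "not", "without", "denies"},
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
    }

def featurize(tokens):
    return [token_features(tokens, i) for i in range(len(tokens))]

X = [featurize("The patient denies chest pain .".split())]
y = [["O", "O", "B-CUE", "B-SCOPE", "I-SCOPE", "O"]]  # cue and its scope

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])
```

The same framing transfers across languages and domains: only the tokenizer, feature set, and annotated corpus change, which is what makes the cross-lingual, cross-domain evaluation in the abstract possible.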


1995 ◽  
Vol 72 (3) ◽  
pp. 666-681 ◽  
Author(s):  
Robert H. Wicks

This article suggests a theoretical explanation of the processes related to recall and learning of media news information. It does so by linking the concepts of schematic thinking and the Search of Associative Memory (SAM) to the variable of time. It argues that learning from the news may be better than many recent studies suggest. Although humans may have trouble recalling discrete news stories in recall examinations, it seems likely that they acquire “common knowledge” from the news media. Time is an important variable in helping people to remember news if they use it to think about new information in the context of previously stored knowledge.


2008 ◽  
Vol 96 (3) ◽  
pp. 512-531 ◽  
Author(s):  
W.M. Ahmed ◽  
S.J. Leavesley ◽  
B. Rajwa ◽  
M.N. Ayyaz ◽  
A. Ghafoor ◽  
...  
