Multi-language Information Extraction with Text Pattern Recognition

2021 ◽  
Author(s):  
Johannes Lindén ◽  
Tingting Zhang ◽  
Stefan Forsström ◽  
Patrik Österberg

Information extraction is the task of extracting metadata from text. The research in this article proposes a new information extraction algorithm called GenerateIE. The proposed algorithm identifies pairs of entities and relations described in a piece of text. The extracted metadata is useful in many areas; within this research, the focus is on news-media contexts, where it provides the gist of written articles for analytics and for paraphrasing news information. GenerateIE is compared with existing state-of-the-art algorithms and offers two benefits. Firstly, GenerateIE provides the co-referenced word as the entity instead of a pronoun such as "he", "she", or "it", which is more useful for knowledge graphs. Secondly, GenerateIE can be applied to multiple languages without changing the algorithm itself, apart from the underlying natural-language text parsing. While GenerateIE does not significantly outperform state-of-the-art algorithms, it offers competitive results.
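To make the idea concrete, the following is a minimal sketch of dependency-based entity-relation pair extraction. It is not the authors' GenerateIE implementation; it only illustrates pulling (entity, relation, entity) triples out of a language-specific dependency parse, with the language swappable via the parsing model.

```python
# A minimal sketch of dependency-based (entity, relation, entity) extraction.
# This is NOT the authors' GenerateIE algorithm; it only illustrates the idea
# of deriving relation triples from a dependency parse.
import spacy

nlp = spacy.load("en_core_web_sm")  # swap the model to change the language

def extract_triples(text):
    """Yield (subject, relation, object) triples from subject-verb-object patterns."""
    doc = nlp(text)
    for token in doc:
        if token.pos_ != "VERB":
            continue
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c for c in token.children if c.dep_ in ("dobj", "obj", "attr")]
        for subj in subjects:
            for obj in objects:
                yield (subj.text, token.lemma_, obj.text)

print(list(extract_triples("Marie Curie discovered polonium. She also discovered radium.")))
# A full system would first resolve "She" to "Marie Curie" via coreference,
# so the second triple names the entity itself rather than a pronoun.
```

Note how the second triple surfaces the pronoun "She"; resolving it to the co-referenced entity before emitting triples is precisely the first benefit the abstract claims for GenerateIE.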

2021 ◽  
Author(s):  
Baosheng Yin ◽  
Yifei Sun

Abstract As an important part of information extraction, relation extraction aims to extract the relationships between given entities from natural-language text. Building on the pre-trained model R-BERT, this paper proposes an entity relation extraction method that integrates the entity dependency path with the pre-trained model: it generates a dependency parse tree through dependency parsing, obtains the dependency path between a given entity pair, and uses that path to exclude information such as modifier chunks and irrelevant entities from sentences. The model achieves a good F1 score on the SemEval-2010 Task 8 dataset, and the experiments show that dependency parsing provides context information for the model and improves performance.
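A minimal sketch of the dependency-path step follows, assuming spaCy for parsing and networkx for path finding; the function name and the entity-matching heuristic are illustrative, not the authors' code.

```python
# A minimal sketch of extracting the dependency path between two entities,
# the kind of signal the paper combines with R-BERT. Illustrative only.
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")

def dependency_path(sentence, e1, e2):
    """Return the tokens on the shortest dependency path between e1 and e2."""
    doc = nlp(sentence)
    graph = nx.Graph()
    for token in doc:
        for child in token.children:
            graph.add_edge(token.i, child.i)
    start = next(t.i for t in doc if t.text == e1)
    end = next(t.i for t in doc if t.text == e2)
    path = nx.shortest_path(graph, source=start, target=end)
    return [doc[i].text for i in path]

# Tokens off this path (e.g. modifier chunks) can be masked out before encoding.
print(dependency_path("The cursor moved slowly across the bright screen.", "cursor", "screen"))
```

Everything not on the returned path (adverbs, modifiers, unrelated entities) is a candidate for exclusion before the sentence is encoded, which is the filtering effect the abstract describes.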


2011 ◽  
Vol 1 (1) ◽  
pp. 7 ◽  
Author(s):  
Raheel Siddiqi

Automated marking of short textual answers is a challenging task due to the difficulties involved in accurately “understanding” natural language text. However, certain purpose-built Natural Language Processing (NLP) techniques can be used for this purpose. This paper describes an NLP-based approach to automated assessment that extends an earlier approach [1] to enable the automated marking of longer answers as well as answers that are partially correct. In the extended approach, the original Question Answer Language (QAL) is augmented to support the definition of regions of text that are expected to appear in a student’s answer. In order to explain the extensions to QAL, we present worked examples based on real exam questions. The system’s ability to accurately mark longer answer texts is shown to be on a par with that of existing state-of-the-art short-answer marking systems which are not capable of marking such longer texts.  
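The paper's QAL syntax is not reproduced here, but a hypothetical sketch of region-based marking conveys the core idea: expected regions of text are defined per question, and partial credit is awarded for each region found in the student's answer.

```python
# A hypothetical sketch of region-based answer marking with partial credit.
# This does not reproduce the paper's QAL; patterns and marks are assumptions.
import re

def mark_answer(answer, regions):
    """regions: list of (regex_pattern, marks). Award marks for each matched region."""
    score = 0
    for pattern, marks in regions:
        if re.search(pattern, answer, flags=re.IGNORECASE):
            score += marks
    return score

regions = [
    (r"photosynthesis", 1),                     # key term expected anywhere
    (r"light\s+energy.*chemical\s+energy", 2),  # expected ordered phrase
]
print(mark_answer("Photosynthesis converts light energy into chemical energy.", regions))  # 3
```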


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1407
Author(s):  
Peng Wang ◽  
Jing Zhou ◽  
Yuzhang Liu ◽  
Xingchen Zhou

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods focus only on triple facts in knowledge graphs, and models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circular convolution over the embeddings of an entity and its types maps the head and tail entities to type-specific representations, and a translation-based score function is then used to learn representations of triples. We evaluated our model on real-world datasets with two benchmark tasks, link prediction and triple classification; experimental results demonstrate that it outperforms state-of-the-art models in most cases.
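The scoring idea can be sketched as follows: compose each entity embedding with its type embedding via circular convolution, then apply a TransE-style translation distance. The dimensions and the exact composition below are assumptions for illustration, not the published model.

```python
# A minimal sketch of a TransET-style score: circular convolution of entity
# and type embeddings, followed by a translation-based distance. Illustrative.
import numpy as np

def circular_convolution(a, b):
    """Circular convolution of two equal-length vectors via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def score(h, r, t, h_type, t_type):
    """Lower is better: || conv(h, type_h) + r - conv(t, type_t) ||_2."""
    h_proj = circular_convolution(h, h_type)  # type-specific head representation
    t_proj = circular_convolution(t, t_type)  # type-specific tail representation
    return np.linalg.norm(h_proj + r - t_proj)

d = 50
rng = np.random.default_rng(0)
h, r, t, h_type, t_type = (rng.normal(size=d) for _ in range(5))
print(score(h, r, t, h_type, t_type))
```

In link prediction, candidate head or tail entities would be ranked by this score; in triple classification, a threshold on the score separates plausible from implausible triples.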


Semantic Web ◽  
2021 ◽  
pp. 1-16
Author(s):  
Esko Ikkala ◽  
Eero Hyvönen ◽  
Heikki Rantala ◽  
Mikko Koho

This paper presents a new software framework, Sampo-UI, for developing user interfaces for semantic portals. The goal is to provide the end-user with multiple application perspectives on Linked Data knowledge graphs, and a two-step usage cycle based on faceted search combined with ready-to-use tooling for data analysis. For the software developer, the Sampo-UI framework makes it possible to create highly customizable, user-friendly, and responsive user interfaces using current state-of-the-art JavaScript libraries and data from SPARQL endpoints, while saving substantial coding effort. Sampo-UI is published on GitHub under the open MIT License and has been utilized in several internal and external projects. The framework has been used thus far in creating six published and five forthcoming portals, mostly related to the Cultural Heritage domain, that have had tens of thousands of end-users on the Web.
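Sampo-UI itself is a JavaScript framework; the Python snippet below only illustrates the kind of query such a portal issues against a SPARQL endpoint to populate a faceted view. The endpoint and query are generic examples, not taken from Sampo-UI.

```python
# Illustrative only: fetching Linked Data from a public SPARQL endpoint,
# the data source a Sampo-UI portal perspective would be built on.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?person ?name WHERE {
        ?person a dbo:Person ;
                rdfs:label ?name .
        FILTER (lang(?name) = "en")
    } LIMIT 5
""")
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["name"]["value"])
```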


Author(s):  
Matheus C. Pavan ◽  
Vitor G. Santos ◽  
Alex G. J. Lan ◽  
Joao Martins ◽  
Wesley Ramos Santos ◽  
...  

2020 ◽  
pp. 1-21 ◽  
Author(s):  
Clément Dalloux ◽  
Vincent Claveau ◽  
Natalia Grabar ◽  
Lucas Emanuel Silva Oliveira ◽  
Claudia Maria Cabral Moro ◽  
...  

Abstract Automatic detection of negated content is often a prerequisite in information extraction systems in various domains. This task is especially important in the biomedical domain, where negation plays a central role. In this work, two main contributions are proposed. First, we work with languages that have been poorly addressed up to now, Brazilian Portuguese and French, developing new corpora for both that have been manually annotated with negation cues and their scopes. Second, we propose supervised machine-learning methods for the automatic detection of negation cues and their scopes. The methods prove to be robust in both languages and in cross-domain contexts (general and biomedical language). The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. Moreover, the application is accessible and usable online. We expect that these contributions (new annotated corpora, an online application, and cross-domain robustness) will improve the reproducibility of results and the robustness of NLP applications.
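Cue and scope detection of this kind is commonly cast as BIO sequence labeling; the sketch below shows that framing with a CRF. The features and the toy training pair are illustrative assumptions; the paper's actual corpora and models are not reproduced here.

```python
# A minimal sketch of negation cue/scope detection as BIO sequence labeling
# with a CRF. Toy features and data; not the paper's system.
import sklearn_crfsuite

def token_features(tokens, i):
    return {
        "word": tokens[i].lower(),
        "is_cue": tokens[i].lower() in {"no", "not", "without", "denies"},
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
    }

def featurize(tokens):
    return [token_features(tokens, i) for i in range(len(tokens))]

X = [featurize("The patient denies chest pain .".split())]
y = [["O", "O", "B-CUE", "B-SCOPE", "I-SCOPE", "O"]]  # cue and its scope

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])
```

The same framing transfers across languages and domains: only the tokenizer, feature set, and annotated corpus change, which is what makes the cross-lingual, cross-domain evaluation in the abstract possible.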


1995 ◽  
Vol 72 (3) ◽  
pp. 666-681 ◽  
Author(s):  
Robert H. Wicks

This article suggests a theoretical explanation of the processes related to recall and learning of media news information. It does so by linking the concepts of schematic thinking and the Search of Associative Memory (SAM) to the variable of time. It argues that learning from the news may be better than many recent studies suggest. Although humans may have trouble recalling discrete news stories in recall examinations, it seems likely that they acquire “common knowledge” from the news media. Time is an important variable in helping people to remember news if they use it to think about new information in the context of previously stored knowledge.


2008 ◽  
Vol 96 (3) ◽  
pp. 512-531 ◽  
Author(s):  
W.M. Ahmed ◽  
S.J. Leavesley ◽  
B. Rajwa ◽  
M.N. Ayyaz ◽  
A. Ghafoor ◽  
...  
