Collective List-only Entity Linking: A Graph-based Approach

Author(s):  
Weixin Zeng ◽  
Xiang Zhao ◽  
Jiuyang Tang

List-only entity linking is the task of mapping ambiguous mentions in texts to target entities in a group of entity lists. Unlike the traditional entity linking task, which leverages rich semantic relatedness in knowledge bases to improve linking accuracy, list-only entity linking can only take advantage of co-occurrence information in entity lists. State-of-the-art work utilizes this co-occurrence information to enrich entity descriptions, which are then used to calculate local compatibility between mentions and entities and determine the results. Nonetheless, entity coherence is also deemed to play an important part in entity linking, yet it is currently neglected. In this work, in addition to local compatibility, we take into account global coherence among entities. Specifically, we propose to harness co-occurrences in entity lists to mine both explicit and implicit entity relations. The relations are then integrated into an entity graph, on which Personalized PageRank is applied to compute entity coherence. The final results are derived by combining local mention-entity similarity and global entity coherence. Experimental studies validate the superiority of our method. Our proposal not only improves the performance of list-only entity linking, but also builds a bridge between list-only entity linking and conventional entity linking solutions.
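
The graph-based coherence step described above can be sketched as follows. The toy entity graph, seed choice, and parameters are illustrative assumptions for exposition, not the authors' implementation:

```python
# Sketch: Personalized PageRank (PPR) over an entity graph built from
# co-occurrence relations. Graph, entity names, and parameters are
# illustrative, not the authors' actual data or settings.

def personalized_pagerank(graph, seeds, alpha=0.85, iters=50):
    """Power iteration for PPR; the restart mass goes to the seed entities."""
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        new = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if not out:
                continue
            share = alpha * rank[n] / len(out)
            for m in out:
                new[m] += share
        rank = new
    return rank

# Entity graph whose edges stand in for explicit/implicit relations
# mined from entity-list co-occurrences.
graph = {
    "Michael_Jordan_(athlete)": ["Chicago_Bulls", "NBA"],
    "Michael_Jordan_(scientist)": ["UC_Berkeley"],
    "Chicago_Bulls": ["Michael_Jordan_(athlete)", "NBA"],
    "NBA": ["Michael_Jordan_(athlete)", "Chicago_Bulls"],
    "UC_Berkeley": ["Michael_Jordan_(scientist)"],
}

# Seed with unambiguous entities from the surrounding context.
coherence = personalized_pagerank(graph, seeds={"Chicago_Bulls", "NBA"})
# The athlete sense is more coherent with the basketball context.
assert coherence["Michael_Jordan_(athlete)"] > coherence["Michael_Jordan_(scientist)"]
```

In the full method, such a coherence score would then be combined with the local mention-entity similarity to produce the final linking decision.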

Author(s):  
Jian Guan ◽  
Fei Huang ◽  
Zhihao Zhao ◽  
Xiaoyan Zhu ◽  
Minlie Huang

Story generation, namely generating a reasonable story from a leading context, is an important but challenging task. Despite their success in modeling fluency and local coherence, existing neural language generation models (e.g., GPT-2) still suffer from repetition, logic conflicts, and a lack of long-range coherence in generated stories. We conjecture that this is because of the difficulty of associating relevant commonsense knowledge, understanding causal relationships, and planning entities and events in proper temporal order. In this paper, we devise a knowledge-enhanced pretraining model for commonsense story generation, utilizing commonsense knowledge from external knowledge bases to generate reasonable stories. To further capture the causal and temporal dependencies between the sentences of a reasonable story, we employ multi-task learning, which combines the generation objective with a discriminative objective that distinguishes true from fake stories during fine-tuning. Automatic and manual evaluation shows that our model can generate more reasonable stories than state-of-the-art baselines, particularly in terms of logic and global coherence.


2020 ◽  
Vol 12 (1) ◽  
Author(s):  
Pedro Ruas ◽  
Andre Lamurias ◽  
Francisco M. Couto

Abstract

Background: Named Entity Linking systems are a powerful aid to the manual curation of digital libraries, which is becoming increasingly costly and inefficient due to information overload. Models based on the Personalized PageRank (PPR) algorithm are among the state-of-the-art approaches, but these perform poorly when the disambiguation graphs are sparse.

Findings: This work proposes a Named Entity Linking framework, designated Relation Extraction for Entity Linking (REEL), that uses automatically extracted relations to overcome this limitation. Our method builds a disambiguation graph, where the nodes are the ontology candidates for the entities and the edges are added according to the relations established in the text, which the method extracts automatically. The PPR algorithm and the information content of each ontology are then applied to choose, for each entity, the candidate that maximises the coherence of the disambiguation graph. We evaluated the method on three gold standards: the subset of the CRAFT corpus with ChEBI annotations (CRAFT-ChEBI), the subset of the BC5CDR corpus with disease annotations from the MEDIC vocabulary (BC5CDR-Diseases) and the subset with chemical annotations from the CTD-Chemical vocabulary (BC5CDR-Chemicals). The F1-scores achieved by REEL were 85.8%, 80.9% and 90.3% on these gold standards, respectively, outperforming baseline approaches.

Conclusions: We demonstrated that relation extraction tools can improve Named Entity Linking by capturing semantic information that is expressed in text but missing from knowledge bases, and by using it to improve the disambiguation graph of Named Entity Linking models. REEL can be adapted to any text mining pipeline and potentially to any domain, as long as an ontology or other knowledge base is available.
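
The final selection step the abstract describes, combining a PPR coherence score with information content, might look like the sketch below. The ontology IDs, scores, IC values, and the product-based combination rule are illustrative assumptions, not REEL's exact formula:

```python
import math

# Sketch: pick, for each mention, the ontology candidate that maximises a
# combination of its PPR coherence score on the disambiguation graph and
# its information content (IC). All values below are illustrative.

def information_content(term_freq, total):
    """Classic corpus-based IC: -log p(term)."""
    return -math.log(term_freq / total)

def select_candidates(candidates, ppr, ic):
    """For each mention, choose the candidate maximising PPR * IC."""
    linked = {}
    for mention, cands in candidates.items():
        linked[mention] = max(cands, key=lambda c: ppr[c] * ic[c])
    return linked

candidates = {"aspirin": ["CHEBI:15365", "CHEBI:35480"]}   # ontology candidates
ppr = {"CHEBI:15365": 0.31, "CHEBI:35480": 0.12}           # coherence scores
total = 10000                                              # corpus size
ic = {"CHEBI:15365": information_content(12, total),
      "CHEBI:35480": information_content(40, total)}

assert select_candidates(candidates, ppr, ic) == {"aspirin": "CHEBI:15365"}
```

The rarer, more specific candidate carries higher IC, so a tie in coherence tends to break toward the more informative ontology term.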


2021 ◽  
Author(s):  
Yue Feng

Semantic analysis is the process of shifting the understanding of text from the level of phrases, clauses, and sentences to the level of semantic meaning. Two of the most important semantic analysis tasks are 1) semantic relatedness measurement and 2) entity linking. The semantic relatedness measurement task aims to quantitatively identify the relationship between two words or concepts based on the similarity or closeness of their semantic meaning, whereas the entity linking task focuses on linking plain text to structured knowledge resources, e.g., Wikipedia, to provide semantic annotation of texts. A limitation of current semantic analysis approaches is that they are built upon traditional documents that are well structured in formal English, e.g., news; however, with the emergence of social networks, enormous volumes of information can be extracted from social network posts, which are short, grammatically incorrect, and can contain special characters or newly invented words, e.g., LOL, BRB. Therefore, traditional semantic analysis approaches may not perform well on social network posts. In this thesis, we build semantic analysis techniques specifically for Twitter content. We build a semantic relatedness model to calculate semantic relatedness between any two words obtained from tweets, and using this model, we semantically annotate tweets by linking them to Wikipedia entries. Comparisons with state-of-the-art semantic relatedness and entity linking methods show promising results.


Author(s):  
Yue Feng ◽  
Ebrahim Bagheri ◽  
Faezeh Ensan ◽  
Jelena Jovanovic

Abstract

Semantic relatedness (SR) is a form of measurement that quantitatively identifies the relationship between two words or concepts based on the similarity or closeness of their meaning. In recent years, there have been noteworthy efforts to compute SR between pairs of words or concepts by exploiting various knowledge resources, such as linguistically structured knowledge bases (e.g. WordNet) and collaboratively developed ones (e.g. Wikipedia), among others. Existing approaches rely on different methods for utilizing these knowledge resources, for instance, methods that depend on the path between two words, or on a vector representation of the word descriptions. The purpose of this paper is to review and present the state of the art in SR research through a hierarchical framework. The dimensions of the proposed framework cover three main aspects of SR approaches: the resources they rely on, the computational methods applied to the resources for developing a relatedness metric, and the evaluation models used for measuring their effectiveness. We have selected 14 representative SR approaches to be analyzed using our framework. We compare and critically review each of them along the dimensions of our framework, thus identifying the strengths and weaknesses of each approach. In addition, we provide guidelines for researchers and practitioners on how to select the most relevant SR method for their purpose. Finally, based on the comparative analysis of the reviewed relatedness measures, we identify existing challenges and potentially valuable future research directions in this domain.
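
Two of the method families the survey distinguishes, path-based and vector-based measures, can be illustrated in miniature. The toy taxonomy and vectors below are illustrative, in the spirit of WordNet path measures and description-vector cosine, not any surveyed system:

```python
import math

# (1) Path-based SR over a toy is-a taxonomy; (2) vector-based SR via
# cosine similarity. Taxonomy, vectors, and scores are illustrative only.

parent = {"dog": "canine", "wolf": "canine", "canine": "mammal",
          "cat": "feline", "feline": "mammal", "mammal": "animal"}

def path_to_root(word):
    path = [word]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def path_similarity(w1, w2):
    """SR = 1 / (1 + shortest path length through the taxonomy)."""
    p1, p2 = path_to_root(w1), path_to_root(w2)
    common = set(p1) & set(p2)
    dist = min(p1.index(c) + p2.index(c) for c in common)
    return 1.0 / (1.0 + dist)

def cosine_similarity(v1, v2):
    """SR between vector representations of word descriptions."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm

# dog/wolf share the direct hypernym "canine"; dog/cat only share "mammal".
assert path_similarity("dog", "wolf") > path_similarity("dog", "cat")
```

Which family works best depends on the resource at hand, one of the trade-offs the survey's framework is designed to surface.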


2021 ◽  
Vol 15 (3) ◽  
pp. 1-31
Author(s):  
Haida Zhang ◽  
Zengfeng Huang ◽  
Xuemin Lin ◽  
Zhe Lin ◽  
Wenjie Zhang ◽  
...  

Driven by many real applications, we study the problem of seeded graph matching. Given two graphs G1 and G2, and a small set S of pre-matched node pairs (u, v) where u ∈ G1 and v ∈ G2, the problem is to identify a matching between G1 and G2 growing from S, such that each pair in the matching corresponds to the same underlying entity. Recent studies on efficient and effective seeded graph matching have drawn a great deal of attention, and many popular methods are largely based on exploiting the similarity between local structures to identify matching pairs. While these recent techniques work provably well on random graphs, their accuracy is low over many real networks. In this work, we propose to utilize higher-order neighboring information to improve matching accuracy and efficiency. As a result, we propose a new framework for seeded graph matching that employs Personalized PageRank (PPR) to quantify the matching score of each node pair. To further boost matching accuracy, we propose a novel postponing strategy, which postpones the selection of pairs that have competitors with similar matching scores. We show that the postponing strategy indeed significantly improves matching accuracy. To improve scalability to large graphs, we also propose efficient approximation techniques based on algorithms for computing PPR heavy hitters. Our comprehensive experimental studies on large-scale real datasets demonstrate that, compared with state-of-the-art approaches, our framework not only increases both precision and recall by a significant margin but also achieves speed-ups of more than one order of magnitude.
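
The postponing strategy can be sketched as follows: a candidate pair is matched only if its best score beats the runner-up by a clear margin, otherwise the decision is deferred until more pairs have been fixed. The scores and margin below are illustrative; in the paper's framework the scores would come from seed-based PPR:

```python
# Sketch of a postponing strategy for seeded graph matching. Scores and
# the margin threshold are illustrative assumptions, not the paper's values.

def match_with_postponing(scores, margin=0.1):
    """scores[u] = {v: matching score}; returns (matched, postponed)."""
    matched, postponed = {}, []
    for u, cand in scores.items():
        ranked = sorted(cand.items(), key=lambda kv: -kv[1])
        best, best_score = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
        if best_score - runner_up >= margin:
            matched[u] = best
        else:
            postponed.append(u)   # competitors too close: decide later
    return matched, postponed

# PPR-style matching scores for nodes of G1 against candidates in G2.
scores = {
    "a": {"x": 0.80, "y": 0.20},   # clear winner: match now
    "b": {"x": 0.45, "y": 0.42},   # near-tie: postpone
}
matched, postponed = match_with_postponing(scores)
assert matched == {"a": "x"} and postponed == ["b"]
```

Postponed pairs would be rescored in later rounds, once the newly fixed matches have sharpened the PPR scores around them.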


Author(s):  
Marius Wolf ◽  
Sergey Solovyev ◽  
Fatemi Arshia

In this paper, analytical equations for the central film thickness in slender elliptic contacts are investigated. A comparison of state-of-the-art formulas with simulation results from a multilevel elastohydrodynamic lubrication solver is conducted and shows considerable deviation. Therefore, a new film thickness formula for slender elliptic contacts with variable ellipticity is derived. It incorporates asymptotic solutions, which makes it valid over a large parameter domain, and it captures the behaviour of film thickness increasing with load for certain very slender contacts. The new formula proves to be significantly more accurate than current equations. Experimental studies and discussions of minimum film thickness will be presented in a subsequent publication.


Author(s):  
Antonio M. Rinaldi ◽  
Cristiano Russo ◽  
Kurosh Madani

Over the last few decades, data has assumed a central role, becoming one of the most valuable assets in society. The exponential increase along several dimensions of data, e.g., volume, velocity, variety, veracity, and value, has led to the definition of novel methodologies and techniques to represent, manage, and analyse data. In this context, many efforts have been devoted to data reuse and integration processes based on the semantic web approach. According to this vision, people are encouraged to share their data using standard common formats to allow more accurate interconnection and integration processes. In this article, the authors propose an ontology matching framework that uses novel combinations of semantic matching techniques to find accurate mappings between formal ontology schemas. Moreover, an upper-level ontology is used as a semantic bridge. An implementation of the proposed framework is able to retrieve, match, and align ontologies. The framework has been evaluated with state-of-the-art ontologies in the domain of cultural heritage, and its performance has been measured by means of standard measures.
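
One elementary matching signal that frameworks of this kind typically combine with others is label similarity between concepts. The sketch below uses token-level Jaccard overlap with a threshold; the labels, threshold, and scoring are illustrative assumptions, not the authors' framework:

```python
# Sketch: one candidate-mapping signal for ontology matching, token-level
# Jaccard similarity between concept labels. A real framework would combine
# several such signals (lexical, structural, semantic via an upper-level
# ontology). Labels and threshold below are illustrative.

def jaccard(label_a, label_b):
    a, b = set(label_a.lower().split()), set(label_b.lower().split())
    return len(a & b) / len(a | b)

def match_ontologies(schema_a, schema_b, threshold=0.5):
    """Return candidate mappings (concept_a, concept_b, score)."""
    mappings = []
    for ca in schema_a:
        for cb in schema_b:
            score = jaccard(ca, cb)
            if score >= threshold:
                mappings.append((ca, cb, score))
    return mappings

schema_a = ["cultural heritage site", "museum object"]
schema_b = ["heritage site", "museum artifact"]
mappings = match_ontologies(schema_a, schema_b)
assert ("cultural heritage site", "heritage site", 2/3) in mappings
```

Purely lexical overlap misses synonym pairs such as "object"/"artifact", which is exactly where semantic techniques and an upper-level ontology as a bridge earn their keep.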


Author(s):  
Paolo Marcatili ◽  
Anna Tramontano

This chapter provides an overview of current computational methods for PPI network cleansing. The authors first present the issue of identifying reliable protein-protein interactions (PPIs) from noisy and incomplete experimental data. Next, they address the questions of what results can be expected from the different experimental studies, what can be defined as true interactions, which kinds of data should be integrated when assigning reliability levels to PPIs, and which gold standard should be used in training and testing PPI filtering methods. Finally, Marcatili and Tramontano describe the state of the art in the field, presenting the different classes of algorithms and comparing their results. The aim of the chapter is to guide the reader in the choice of the most suitable methods, experiments, and integrative data, and to underline the most common biases and errors, so as to obtain a portrait of PPI networks that is not only reliable but also able to correctly retrieve the biological information contained in such data.


Author(s):  
Minghu Jiang ◽  
Dehai Chen ◽  
Lixin Zhao ◽  
Liying Sun

The state of the art in the development of de-oiling hydrocyclones and their separation principle are introduced. Through theoretical analysis, ways to enhance a hydrocyclone's separation efficiency are described. One way is to inject air into the hydrocyclone so that it combines with the oil to form an oil-gas compound body, thereby increasing de-oiling efficiency. Experiments were carried out in which air was injected into either the large cone segment or the fine cone segment of the hydrocyclone, and the fine cone segment was found to be the better injection location. Further experiments were conducted to pinpoint the best position within the fine cone segment, which was divided into a first one-third segment and a remaining two-thirds segment for this purpose. The results show that the best air-injection position is the first one-third of the fine cone segment. This conclusion should be useful for understanding the separation process of air-injected de-oiling hydrocyclones, and for their design and application.

