Collective List-only Entity Linking: A Graph-based Approach

Author(s):  
Weixin Zeng ◽  
Xiang Zhao ◽  
Jiuyang Tang

List-only entity linking is the task of mapping ambiguous mentions in texts to target entities in a group of entity lists. Unlike the traditional entity linking task, which leverages rich semantic relatedness in knowledge bases to improve linking accuracy, list-only entity linking can only take advantage of co-occurrence information in entity lists. State-of-the-art work utilizes this co-occurrence information to enrich entity descriptions, which are then used to calculate local compatibility between mentions and entities and determine the results. Nonetheless, entity coherence is also deemed to play an important part in entity linking, yet it is currently neglected. In this work, in addition to local compatibility, we take into account global coherence among entities. Specifically, we propose to harness co-occurrences in entity lists to mine both explicit and implicit entity relations. The relations are then integrated into an entity graph, on which Personalized PageRank is applied to compute entity coherence. The final results are derived by combining local mention-entity similarity and global entity coherence. Experimental studies validate the superiority of our method. Our proposal not only improves the performance of list-only entity linking, but also builds a bridge between list-only entity linking and conventional entity linking solutions.
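
The graph-based coherence step described above can be sketched as follows. The toy entity graph, seed choice, and parameters are illustrative assumptions for exposition, not the authors' implementation:

```python
# Sketch: Personalized PageRank (PPR) over an entity graph built from
# co-occurrence relations. Graph, entity names, and parameters are
# illustrative, not the authors' actual data or settings.

def personalized_pagerank(graph, seeds, alpha=0.85, iters=50):
    """Power iteration for PPR; the restart mass goes to the seed entities."""
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        new = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if not out:
                continue
            share = alpha * rank[n] / len(out)
            for m in out:
                new[m] += share
        rank = new
    return rank

# Entity graph whose edges stand in for explicit/implicit relations
# mined from entity-list co-occurrences.
graph = {
    "Michael_Jordan_(athlete)": ["Chicago_Bulls", "NBA"],
    "Michael_Jordan_(scientist)": ["UC_Berkeley"],
    "Chicago_Bulls": ["Michael_Jordan_(athlete)", "NBA"],
    "NBA": ["Michael_Jordan_(athlete)", "Chicago_Bulls"],
    "UC_Berkeley": ["Michael_Jordan_(scientist)"],
}

# Seed with unambiguous entities from the surrounding context.
coherence = personalized_pagerank(graph, seeds={"Chicago_Bulls", "NBA"})
# The athlete sense is more coherent with the basketball context.
assert coherence["Michael_Jordan_(athlete)"] > coherence["Michael_Jordan_(scientist)"]
```

In the full method, such a coherence score would then be combined with the local mention-entity similarity to produce the final linking decision.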

Author(s):  
Jian Guan ◽  
Fei Huang ◽  
Zhihao Zhao ◽  
Xiaoyan Zhu ◽  
Minlie Huang

Story generation, namely generating a reasonable story from a leading context, is an important but challenging task. Despite their success in modeling fluency and local coherence, existing neural language generation models (e.g., GPT-2) still suffer from repetition, logic conflicts, and a lack of long-range coherence in generated stories. We conjecture that this is because of the difficulty of associating relevant commonsense knowledge, understanding causal relationships, and planning entities and events in proper temporal order. In this paper, we devise a knowledge-enhanced pretraining model for commonsense story generation, utilizing commonsense knowledge from external knowledge bases to generate reasonable stories. To further capture the causal and temporal dependencies between the sentences of a reasonable story, we employ multi-task learning, which combines the generation objective with a discriminative objective that distinguishes true from fake stories during fine-tuning. Automatic and manual evaluation shows that our model can generate more reasonable stories than state-of-the-art baselines, particularly in terms of logic and global coherence.


2020 ◽  
Vol 12 (1) ◽  
Author(s):  
Pedro Ruas ◽  
Andre Lamurias ◽  
Francisco M. Couto

Abstract

Background: Named Entity Linking systems are a powerful aid to the manual curation of digital libraries, which is becoming increasingly costly and inefficient due to information overload. Models based on the Personalized PageRank (PPR) algorithm are among the state-of-the-art approaches, but these perform poorly when the disambiguation graphs are sparse.

Findings: This work proposes a Named Entity Linking framework, designated Relation Extraction for Entity Linking (REEL), that uses automatically extracted relations to overcome this limitation. Our method builds a disambiguation graph, where the nodes are the ontology candidates for the entities and the edges are added according to the relations established in the text, which the method extracts automatically. The PPR algorithm and the information content of each ontology are then applied to choose, for each entity, the candidate that maximises the coherence of the disambiguation graph. We evaluated the method on three gold standards: the subset of the CRAFT corpus with ChEBI annotations (CRAFT-ChEBI), the subset of the BC5CDR corpus with disease annotations from the MEDIC vocabulary (BC5CDR-Diseases) and the subset with chemical annotations from the CTD-Chemical vocabulary (BC5CDR-Chemicals). The F1-scores achieved by REEL were 85.8%, 80.9% and 90.3% on these gold standards, respectively, outperforming baseline approaches.

Conclusions: We demonstrated that relation extraction tools can improve Named Entity Linking by capturing semantic information that is expressed in text but missing from knowledge bases, and by using it to improve the disambiguation graph of Named Entity Linking models. REEL can be adapted to any text mining pipeline and potentially to any domain, as long as an ontology or other knowledge base is available.
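
The final selection step the abstract describes, combining a PPR coherence score with information content, might look like the sketch below. The ontology IDs, scores, IC values, and the product-based combination rule are illustrative assumptions, not REEL's exact formula:

```python
import math

# Sketch: pick, for each mention, the ontology candidate that maximises a
# combination of its PPR coherence score on the disambiguation graph and
# its information content (IC). All values below are illustrative.

def information_content(term_freq, total):
    """Classic corpus-based IC: -log p(term)."""
    return -math.log(term_freq / total)

def select_candidates(candidates, ppr, ic):
    """For each mention, choose the candidate maximising PPR * IC."""
    linked = {}
    for mention, cands in candidates.items():
        linked[mention] = max(cands, key=lambda c: ppr[c] * ic[c])
    return linked

candidates = {"aspirin": ["CHEBI:15365", "CHEBI:35480"]}   # ontology candidates
ppr = {"CHEBI:15365": 0.31, "CHEBI:35480": 0.12}           # coherence scores
total = 10000                                              # corpus size
ic = {"CHEBI:15365": information_content(12, total),
      "CHEBI:35480": information_content(40, total)}

assert select_candidates(candidates, ppr, ic) == {"aspirin": "CHEBI:15365"}
```

The rarer, more specific candidate carries higher IC, so a tie in coherence tends to break toward the more informative ontology term.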


2021 ◽  
Author(s):  
Yue Feng

Semantic analysis is the process of shifting the understanding of text from the level of phrases, clauses, and sentences to the level of semantic meaning. Two of the most important semantic analysis tasks are 1) semantic relatedness measurement and 2) entity linking. The semantic relatedness measurement task aims to quantitatively identify the relationship between two words or concepts based on the similarity or closeness of their semantic meaning, whereas the entity linking task focuses on linking plain text to structured knowledge resources, e.g., Wikipedia, to provide semantic annotation of texts. A limitation of current semantic analysis approaches is that they are built upon traditional documents that are well structured in formal English, e.g., news; however, with the emergence of social networks, enormous volumes of information can be extracted from social network posts, which are short, grammatically incorrect, and can contain special characters or newly invented words, e.g., LOL, BRB. Therefore, traditional semantic analysis approaches may not perform well on social network posts. In this thesis, we build semantic analysis techniques specifically for Twitter content. We build a semantic relatedness model to calculate semantic relatedness between any two words obtained from tweets, and using this model, we semantically annotate tweets by linking them to Wikipedia entries. Comparisons with state-of-the-art semantic relatedness and entity linking methods show promising results.


Author(s):  
Yue Feng ◽  
Ebrahim Bagheri ◽  
Faezeh Ensan ◽  
Jelena Jovanovic

Abstract

Semantic relatedness (SR) is a form of measurement that quantitatively identifies the relationship between two words or concepts based on the similarity or closeness of their meaning. In recent years, there have been noteworthy efforts to compute SR between pairs of words or concepts by exploiting various knowledge resources, such as linguistically structured knowledge bases (e.g. WordNet) and collaboratively developed ones (e.g. Wikipedia), among others. Existing approaches rely on different methods for utilizing these knowledge resources, for instance, methods that depend on the path between two words, or on a vector representation of the word descriptions. The purpose of this paper is to review and present the state of the art in SR research through a hierarchical framework. The dimensions of the proposed framework cover three main aspects of SR approaches: the resources they rely on, the computational methods applied to the resources for developing a relatedness metric, and the evaluation models used for measuring their effectiveness. We have selected 14 representative SR approaches to be analyzed using our framework. We compare and critically review each of them along the dimensions of our framework, thus identifying the strengths and weaknesses of each approach. In addition, we provide guidelines for researchers and practitioners on how to select the most relevant SR method for their purpose. Finally, based on the comparative analysis of the reviewed relatedness measures, we identify existing challenges and potentially valuable future research directions in this domain.
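
Two of the method families the survey distinguishes, path-based and vector-based measures, can be illustrated in miniature. The toy taxonomy and vectors below are illustrative, in the spirit of WordNet path measures and description-vector cosine, not any surveyed system:

```python
import math

# (1) Path-based SR over a toy is-a taxonomy; (2) vector-based SR via
# cosine similarity. Taxonomy, vectors, and scores are illustrative only.

parent = {"dog": "canine", "wolf": "canine", "canine": "mammal",
          "cat": "feline", "feline": "mammal", "mammal": "animal"}

def path_to_root(word):
    path = [word]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def path_similarity(w1, w2):
    """SR = 1 / (1 + shortest path length through the taxonomy)."""
    p1, p2 = path_to_root(w1), path_to_root(w2)
    common = set(p1) & set(p2)
    dist = min(p1.index(c) + p2.index(c) for c in common)
    return 1.0 / (1.0 + dist)

def cosine_similarity(v1, v2):
    """SR between vector representations of word descriptions."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm

# dog/wolf share the direct hypernym "canine"; dog/cat only share "mammal".
assert path_similarity("dog", "wolf") > path_similarity("dog", "cat")
```

Which family works best depends on the resource at hand, one of the trade-offs the survey's framework is designed to surface.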


2021 ◽  
Vol 15 (3) ◽  
pp. 1-31
Author(s):  
Haida Zhang ◽  
Zengfeng Huang ◽  
Xuemin Lin ◽  
Zhe Lin ◽  
Wenjie Zhang ◽  
...  

Driven by many real applications, we study the problem of seeded graph matching. Given two graphs G1 and G2, and a small set S of pre-matched node pairs (u, v) where u ∈ G1 and v ∈ G2, the problem is to identify a matching between G1 and G2 growing from S, such that each pair in the matching corresponds to the same underlying entity. Recent studies on efficient and effective seeded graph matching have drawn a great deal of attention, and many popular methods are largely based on exploiting the similarity between local structures to identify matching pairs. While these recent techniques work provably well on random graphs, their accuracy is low over many real networks. In this work, we propose to utilize higher-order neighboring information to improve matching accuracy and efficiency. As a result, we propose a new framework for seeded graph matching that employs Personalized PageRank (PPR) to quantify the matching score of each node pair. To further boost matching accuracy, we propose a novel postponing strategy, which postpones the selection of pairs that have competitors with similar matching scores. We show that the postponing strategy indeed significantly improves matching accuracy. To improve scalability to large graphs, we also propose efficient approximation techniques based on algorithms for computing PPR heavy hitters. Our comprehensive experimental studies on large-scale real datasets demonstrate that, compared with state-of-the-art approaches, our framework not only increases both precision and recall by a significant margin but also achieves speed-ups of more than one order of magnitude.
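
The postponing strategy can be sketched as follows: a candidate pair is matched only if its best score beats the runner-up by a clear margin, otherwise the decision is deferred until more pairs have been fixed. The scores and margin below are illustrative; in the paper's framework the scores would come from seed-based PPR:

```python
# Sketch of a postponing strategy for seeded graph matching. Scores and
# the margin threshold are illustrative assumptions, not the paper's values.

def match_with_postponing(scores, margin=0.1):
    """scores[u] = {v: matching score}; returns (matched, postponed)."""
    matched, postponed = {}, []
    for u, cand in scores.items():
        ranked = sorted(cand.items(), key=lambda kv: -kv[1])
        best, best_score = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
        if best_score - runner_up >= margin:
            matched[u] = best
        else:
            postponed.append(u)   # competitors too close: decide later
    return matched, postponed

# PPR-style matching scores for nodes of G1 against candidates in G2.
scores = {
    "a": {"x": 0.80, "y": 0.20},   # clear winner: match now
    "b": {"x": 0.45, "y": 0.42},   # near-tie: postpone
}
matched, postponed = match_with_postponing(scores)
assert matched == {"a": "x"} and postponed == ["b"]
```

Postponed pairs would be rescored in later rounds, once the newly fixed matches have sharpened the PPR scores around them.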


Author(s):  
Marius Wolf ◽  
Sergey Solovyev ◽  
Fatemi Arshia

In this paper, analytical equations for the central film thickness in slender elliptic contacts are investigated. A comparison of state-of-the-art formulas with simulation results from a multilevel elastohydrodynamic lubrication solver is conducted and shows considerable deviation. Therefore, a new film thickness formula for slender elliptic contacts with variable ellipticity is derived. It incorporates asymptotic solutions, which makes it valid over a large parameter domain, and it captures the behaviour of film thickness increasing with load for certain very slender contacts. The new formula proves to be significantly more accurate than current equations. Experimental studies and discussions of minimum film thickness will be presented in a subsequent publication.


Author(s):  
Antonio M. Rinaldi ◽  
Cristiano Russo ◽  
Kurosh Madani

Over the last few decades, data has assumed a central role, becoming one of the most valuable assets in society. The exponential increase along several dimensions of data, e.g., volume, velocity, variety, veracity, and value, has led to the definition of novel methodologies and techniques to represent, manage, and analyse data. In this context, many efforts have been devoted to data reuse and integration processes based on the semantic web approach. According to this vision, people are encouraged to share their data using standard common formats to allow more accurate interconnection and integration processes. In this article, the authors propose an ontology matching framework that uses novel combinations of semantic matching techniques to find accurate mappings between formal ontology schemas. Moreover, an upper-level ontology is used as a semantic bridge. An implementation of the proposed framework is able to retrieve, match, and align ontologies. The framework has been evaluated with state-of-the-art ontologies in the domain of cultural heritage, and its performance has been measured by means of standard measures.
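
One elementary matching signal that frameworks of this kind typically combine with others is label similarity between concepts. The sketch below uses token-level Jaccard overlap with a threshold; the labels, threshold, and scoring are illustrative assumptions, not the authors' framework:

```python
# Sketch: one candidate-mapping signal for ontology matching, token-level
# Jaccard similarity between concept labels. A real framework would combine
# several such signals (lexical, structural, semantic via an upper-level
# ontology). Labels and threshold below are illustrative.

def jaccard(label_a, label_b):
    a, b = set(label_a.lower().split()), set(label_b.lower().split())
    return len(a & b) / len(a | b)

def match_ontologies(schema_a, schema_b, threshold=0.5):
    """Return candidate mappings (concept_a, concept_b, score)."""
    mappings = []
    for ca in schema_a:
        for cb in schema_b:
            score = jaccard(ca, cb)
            if score >= threshold:
                mappings.append((ca, cb, score))
    return mappings

schema_a = ["cultural heritage site", "museum object"]
schema_b = ["heritage site", "museum artifact"]
mappings = match_ontologies(schema_a, schema_b)
assert ("cultural heritage site", "heritage site", 2/3) in mappings
```

Purely lexical overlap misses synonym pairs such as "object"/"artifact", which is exactly where semantic techniques and an upper-level ontology as a bridge earn their keep.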


Author(s):  
Paolo Marcatili ◽  
Anna Tramontano

This chapter provides an overview of current computational methods for PPI network cleansing. The authors first present the issue of identifying reliable protein-protein interactions (PPIs) from noisy and incomplete experimental data. Next, they address the questions of what results can be expected from the different experimental studies, what can be defined as true interactions, which kinds of data should be integrated when assigning reliability levels to PPIs, and which gold standard should be used in training and testing PPI filtering methods. Finally, Marcatili and Tramontano describe the state of the art in the field, presenting the different classes of algorithms and comparing their results. The aim of the chapter is to guide the reader in the choice of the most suitable methods, experiments, and integrative data, and to underline the most common biases and errors, so as to obtain a portrait of PPI networks that is not only reliable but also able to correctly retrieve the biological information contained in such data.


Author(s):  
Minghu Jiang ◽  
Dehai Chen ◽  
Lixin Zhao ◽  
Liying Sun

The state of the art in the development of de-oiling hydrocyclones and their separation principle are introduced. Through theoretical analysis, ways to enhance a hydrocyclone's separation efficiency are described. One way is to inject air into the hydrocyclone so that it combines with the oil to form an oil-gas compound body, thereby increasing de-oiling efficiency. Experiments were carried out in which air was injected into either the large cone segment or the fine cone segment of the hydrocyclone, and the fine cone segment was found to be the better injection location. Further experiments were conducted to pinpoint the best position within the fine cone segment, which was divided into a first one-third segment and a remaining two-thirds segment for this purpose. The results show that the best air-injection position is the first one-third of the fine cone segment. This conclusion should be useful for understanding the separation process of air-injected de-oiling hydrocyclones, and for their design and application.

