Unsupervised DNF Blocking for Efficient Linking of Knowledge Graphs and Tables

Information ◽  
2021 ◽  
Vol 12 (3) ◽  
pp. 134
Author(s):  
Mayank Kejriwal

Entity Resolution (ER) is the problem of identifying co-referent entity pairs across datasets, including knowledge graphs (KGs). ER is an important prerequisite in many applied KG search and analytics pipelines, with a typical workflow comprising two steps. In the first ‘blocking’ step, entities are mapped to blocks. Blocking avoids comparing all possible pairs of entities: in the second ‘similarity’ step, only entities within the same block are paired and compared, allowing for significant computational savings with a minimal loss of performance. Unfortunately, learning a blocking scheme in an unsupervised fashion is a non-trivial problem, and it has not been properly explored for the heterogeneous, semi-structured datasets that are prevalent in industrial and Web applications. This article presents an unsupervised algorithmic pipeline for learning Disjunctive Normal Form (DNF) blocking schemes on KGs, as well as on structurally heterogeneous tables that may not share a common schema. We evaluate the approach on six real-world dataset pairs, and show that it is competitive with supervised and semi-supervised baselines.
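To make the idea of a DNF blocking scheme concrete, the following is a minimal Python sketch, not the article's learning pipeline. It assumes a scheme is already given as a disjunction of conjunctions of blocking predicates, and that both datasets use the same field names (a simplification; the article targets heterogeneous schemas). The predicate factories `common_token` and `same_prefix`, the record layout, and the function names are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def common_token(field):
    """Hypothetical predicate: two records match if they share a token in `field`."""
    def keys(record):
        return {f"{field}:tok:{t}" for t in str(record.get(field, "")).lower().split()}
    return keys


def same_prefix(field, n=3):
    """Hypothetical predicate: two records match if `field` starts with the same n characters."""
    def keys(record):
        value = str(record.get(field, "")).lower().strip()
        return {f"{field}:pre:{value[:n]}"} if value else set()
    return keys


def block_keys(record, dnf_scheme):
    """Return the block keys of a record under a DNF scheme.

    dnf_scheme is a list of conjunctions; each conjunction is a list of predicates.
    For a conjunction, a key is one tuple drawn from the Cartesian product of the
    per-predicate key sets, so two records share a key only if every predicate
    in that conjunction is satisfied by the pair.
    """
    keys = set()
    for i, conjunction in enumerate(dnf_scheme):
        per_predicate = [predicate(record) for predicate in conjunction]
        for combo in product(*per_predicate):
            keys.add((i,) + combo)  # prefix with the conjunction index
    return keys


def candidate_pairs(left, right, dnf_scheme):
    """Pair records from `left` and `right` that share at least one block key."""
    index = defaultdict(set)
    for rid, rec in right.items():
        for key in block_keys(rec, dnf_scheme):
            index[key].add(rid)
    pairs = set()
    for lid, rec in left.items():
        for key in block_keys(rec, dnf_scheme):
            for rid in index.get(key, ()):
                pairs.add((lid, rid))
    return pairs


# Toy data (invented for illustration only).
left = {"a1": {"name": "Acme Corp", "city": "Boston"},
        "a2": {"name": "Globex", "city": "Springfield"}}
right = {"b1": {"name": "ACME Corporation", "city": "Boston"},
         "b2": {"name": "Initech", "city": "Springfield"}}

# (common token in name) OR (same name prefix AND same city prefix)
scheme = [[common_token("name")],
          [same_prefix("name"), same_prefix("city")]]

print(candidate_pairs(left, right, scheme))
```

Only the pairs produced by `candidate_pairs` would be passed to the similarity step, instead of all four possible left-right pairs in this toy example.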

Author(s):  
Bayu Distiawan Trisedya ◽  
Jianzhong Qi ◽  
Rui Zhang

The task of entity alignment between knowledge graphs aims to find entities in two knowledge graphs that represent the same real-world entity. Recently, embedding-based models have been proposed for this task. Such models are built on top of a knowledge graph embedding model that learns entity embeddings to capture the semantic similarity between entities in the same knowledge graph. We propose to learn embeddings that can capture the similarity between entities in different knowledge graphs. Our proposed model helps align entities from different knowledge graphs, and hence enables the integration of multiple knowledge graphs. Our model exploits the large numbers of attribute triples existing in the knowledge graphs and generates attribute character embeddings. The attribute character embedding shifts the entity embeddings from two knowledge graphs into the same space by computing the similarity between entities based on their attributes. We use a transitivity rule to further enrich the number of attributes of an entity to enhance the attribute character embedding. Experiments using real-world knowledge bases show that our proposed model achieves consistent improvements over the baseline models by over 50% in terms of hits@1 on the entity alignment task.
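The hits@1 metric mentioned above can be illustrated with a small Python sketch. It assumes entity embeddings from both knowledge graphs already live in one shared space and ranks target entities by cosine similarity; the abstract does not specify the similarity function, so that choice, the function names, and the toy vectors are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hits_at_k(source_emb, target_emb, gold, k=1):
    """Fraction of source entities whose true counterpart is among the k most similar targets.

    source_emb / target_emb: dicts mapping entity id -> embedding vector.
    gold: dict mapping source entity id -> correct target entity id.
    """
    hits = 0
    for src, true_tgt in gold.items():
        ranked = sorted(
            target_emb,
            key=lambda tgt: cosine(source_emb[src], target_emb[tgt]),
            reverse=True,
        )
        if true_tgt in ranked[:k]:
            hits += 1
    return hits / len(gold)


# Toy embeddings (invented for illustration only).
source_emb = {"s1": [1.0, 0.0], "s2": [0.0, 1.0]}
target_emb = {"t1": [0.9, 0.1], "t2": [0.8, 0.2], "t3": [0.1, 0.9]}
gold = {"s1": "t1", "s2": "t2"}

print(hits_at_k(source_emb, target_emb, gold, k=1))
print(hits_at_k(source_emb, target_emb, gold, k=2))
```

In this toy setup `s1` is aligned correctly at rank 1, while the true counterpart of `s2` only appears at rank 2, so hits@1 is lower than hits@2.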


2016 ◽  
Vol 2016 ◽  
pp. 1-14
Author(s):  
Shukai Liu ◽  
Xuexiong Yan ◽  
Qingxian Wang ◽  
Xu Zhao ◽  
Chuansen Chai ◽  
...  

High-profile attacks using malicious HTML and JavaScript code have seen a dramatic increase in both awareness and exploitation in recent years. Unfortunately, existing security mechanisms do not provide enough protection. We propose a new protection mechanism named PMHJ, based on the support of both web applications and web browsers, against malicious HTML and JavaScript code in vulnerable web applications. PMHJ prevents the injection attack of HTML elements with a random attribute value, and the node-split attack with an attribute carrying the hash value of the HTML element. PMHJ ensures content security in web pages by verifying HTML elements, confining the insecure HTML usages which can be exploited by attackers, and disabling the JavaScript APIs which may incur injection vulnerabilities. PMHJ provides a flexible way to rein in the high-risk, powerful JavaScript APIs according to the principle of least authority. The PMHJ policy is easy to deploy in real-world web applications. The test results show that PMHJ has little influence on the run time and code size of web pages.


1958 ◽  
Vol 23 (2) ◽  
pp. 149-154 ◽  
Author(s):  
C. C. Chang ◽  
Anne C. Morel

In 1951, Horn obtained a sufficient condition for an arithmetical class to be closed under direct product. A natural question which arose was whether Horn's condition is also necessary. We obtain a negative answer to that question. We shall discuss relational systems of the form $\mathfrak{A} = \langle A, R \rangle$, where $A$ and $R$ are non-empty sets; each element of $R$ is an ordered triple $\langle a, b, c \rangle$, with $a, b, c \in A$. If the triple $\langle a, b, c \rangle$ belongs to the relation $R$, we write $R(a, b, c)$; if $\langle a, b, c \rangle \notin R$, we write $\bar{R}(a, b, c)$. If $x_0$, $x_1$ and $x_2$ are variables, then $R(x_0, x_1, x_2)$ and $x_0 = x_1$ are predicates. The expressions $\bar{R}(x_0, x_1, x_2)$ and $x_0 \neq x_1$ will be referred to as negations of predicates. We speak of $\alpha_1, \ldots, \alpha_n$ as terms of the disjunction $\alpha_1 \vee \cdots \vee \alpha_n$ and as factors of the conjunction $\alpha_1 \wedge \cdots \wedge \alpha_n$. A sentence (open, closed or neither) of the form $Q_0 x_0 \cdots Q_k x_k \, \bigvee_{i} \bigwedge_{l} \alpha_{i,l}$, where each $Q_i$ (if there be any) is either the universal or the existential quantifier and each $\alpha_{i,l}$ is either a predicate or a negation of a predicate, is said to be in prenex disjunctive normal form.
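As a small illustration of the definition above, the following sentence is in prenex disjunctive normal form. It is an example constructed here and is not taken from the paper: the quantifier prefix is followed by a disjunction of two terms, the first a conjunction of a predicate and a negation of a predicate, the second a single predicate.

```latex
\forall x_0 \, \exists x_1 \,
\bigl[ \bigl( R(x_0, x_1, x_1) \wedge x_0 \neq x_1 \bigr) \vee x_0 = x_1 \bigr]
```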


1995 ◽  
Vol 19 (3) ◽  
pp. 183-208 ◽  
Author(s):  
Howard Aizenstein ◽  
Leonard Pitt

Author(s):  
Reisa Permatasari ◽  
Nur Aini Rakhmawati

Entity resolution is the process of determining whether two references to real-world objects refer to the same or different purposes. This study applies entity resolution to a Twitter prostitution dataset, both based on features, using Regularized Logistic Regression training and Active Learning in Dedupe, and based on graphs, using Neo4j and Node2Vec. This study found that the maximum similarity is 1 when the set of features (personal, location and bio specifications) is complete. The minimum similarity is 0.025662627 when the amount of harmful training data. The most influential similarity feature is the cellphone number, with the lowest starting range from 0.997678459 to 0.999993523. The parameter for the length of walk per source has the effect of achieving the best similarity accuracy, reaching 71.4% (prediction 14 and yield 10).

