Unsupervised DNF Blocking for Efficient Linking of Knowledge Graphs and Tables

Entity Resolution (ER) is the problem of identifying co-referent entity pairs across datasets, including knowledge graphs (KGs). ER is an important prerequisite in many applied KG search and analytics pipelines, with a typical workflow comprising two steps. In the first ‘blocking’ step, entities are mapped to blocks. Blocking is necessary to avoid comparing all possible pairs of entities: in the second ‘similarity’ step, only entities within the same block are paired and compared, allowing for significant computational savings with a minimal loss of performance. Unfortunately, learning a blocking scheme in an unsupervised fashion is a non-trivial problem, and it has not been properly explored for heterogeneous, semi-structured datasets, such as those prevalent in industrial and Web applications. This article presents an unsupervised algorithmic pipeline for learning Disjunctive Normal Form (DNF) blocking schemes on KGs, as well as on structurally heterogeneous tables that may not share a common schema. We evaluate the approach on six real-world dataset pairs and show that it is competitive with supervised and semi-supervised baselines.
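To make the idea of a DNF blocking scheme concrete, the following is a minimal sketch of how such a scheme could be applied once it exists. It does not reproduce the article's unsupervised learning pipeline: the scheme is written by hand, and the records, field names, and the predicates `common_token` and `exact_match` are all hypothetical. For brevity the sketch loops over every cross-dataset pair, whereas a practical blocker would index records by blocking key to avoid exactly that.

```python
from itertools import product

# Hypothetical toy records; the field names are illustrative assumptions.
left = [
    {"id": "a1", "name": "Acme Corp", "city": "Boston"},
    {"id": "a2", "name": "Globex Inc", "city": "Austin"},
]
right = [
    {"id": "b1", "name": "ACME Corporation", "city": "Boston"},
    {"id": "b2", "name": "Initech", "city": "Austin"},
]

def common_token(field):
    """Blocking predicate: both records share a lowercase token in `field`."""
    def pred(r, s):
        return bool(set(r[field].lower().split()) & set(s[field].lower().split()))
    return pred

def exact_match(field):
    """Blocking predicate: `field` is equal in both records, ignoring case."""
    def pred(r, s):
        return r[field].lower() == s[field].lower()
    return pred

# A DNF scheme is an OR over conjunctions; each inner list is an AND.
# Here: (common_token(name) AND exact_match(city)) OR exact_match(name).
scheme = [
    [common_token("name"), exact_match("city")],
    [exact_match("name")],
]

def candidate_pairs(left, right, scheme):
    """Return id pairs that satisfy at least one conjunction of the scheme."""
    pairs = []
    for r, s in product(left, right):
        if any(all(p(r, s) for p in conj) for conj in scheme):
            pairs.append((r["id"], s["id"]))
    return pairs

pairs = candidate_pairs(left, right, scheme)
print(pairs)
```

Only the pairs returned here would be passed to the more expensive similarity step; every other pair is discarded without comparison.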

Understanding Database Performance Inefficiencies in Real-world Web Applications

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management - CIKM '17 ◽

10.1145/3132847.3132954 ◽

2017 ◽

Cited By ~ 13

Author(s):

Cong Yan ◽

Alvin Cheung ◽

Junwen Yang ◽

Shan Lu

Keyword(s):

Real World ◽

Web Applications ◽

Database Performance

Entity Alignment between Knowledge Graphs Using Attribute Embeddings

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301297 ◽

2019 ◽

Vol 33 ◽

pp. 297-304 ◽

Cited By ~ 26

Author(s):

Bayu Distiawan Trisedya ◽

Jianzhong Qi ◽

Rui Zhang

Keyword(s):

Real World ◽

Graph Embedding ◽

Knowledge Bases ◽

Knowledge Graph ◽

World Knowledge ◽

Large Numbers ◽

Proposed Model ◽

Alignment Task ◽

Transitivity Rule ◽

Knowledge Graphs

The task of entity alignment between knowledge graphs aims to find entities in two knowledge graphs that represent the same real-world entity. Recently, embedding-based models have been proposed for this task. Such models are built on top of a knowledge graph embedding model that learns entity embeddings to capture the semantic similarity between entities in the same knowledge graph. We propose to learn embeddings that can capture the similarity between entities in different knowledge graphs. Our proposed model helps align entities from different knowledge graphs, and hence enables the integration of multiple knowledge graphs. Our model exploits large numbers of attribute triples existing in the knowledge graphs and generates attribute character embeddings. The attribute character embedding shifts the entity embeddings from two knowledge graphs into the same space by computing the similarity between entities based on their attributes. We use a transitivity rule to further enrich the number of attributes of an entity to enhance the attribute character embedding. Experiments using real-world knowledge bases show that our proposed model achieves consistent improvements over the baseline models by over 50% in terms of hits@1 on the entity alignment task.
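The hits@1 figure quoted above is a ranking metric. As a minimal sketch of how hits@k is commonly computed for entity alignment, assuming each source entity has a dictionary of similarity scores against candidate target entities: the function name `hits_at_k`, the entity ids, and the scores below are illustrative and are not taken from the paper.

```python
def hits_at_k(scores, gold, k=1):
    """Fraction of source entities whose gold target ranks in the top k.

    scores: {source_id: {target_id: similarity}}  (hypothetical layout)
    gold:   {source_id: correct target_id}
    """
    hits = 0
    for src, cand_scores in scores.items():
        # Rank candidate targets from most to least similar.
        ranked = sorted(cand_scores, key=cand_scores.get, reverse=True)
        if gold[src] in ranked[:k]:
            hits += 1
    return hits / len(scores)

# Made-up similarity scores for two source entities.
scores = {
    "e1": {"f1": 0.9, "f2": 0.2, "f3": 0.1},
    "e2": {"f1": 0.6, "f2": 0.5, "f3": 0.1},
}
gold = {"e1": "f1", "e2": "f2"}

h1 = hits_at_k(scores, gold, k=1)
h2 = hits_at_k(scores, gold, k=2)
print(h1, h2)
```

In this toy data only `e1` has its correct counterpart ranked first, while both source entities have it within the top two.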

The shortest disjunctive normal form of a random Boolean function

Random Structures and Algorithms ◽

10.1002/rsa.10065 ◽

2003 ◽

Vol 22 (2) ◽

pp. 161-186 ◽

Cited By ~ 6

Author(s):

Nicholas Pippenger

Keyword(s):

Normal Form ◽

Boolean Function ◽

Disjunctive Normal Form

Finding a short and accurate decision rule in disjunctive normal form by exhaustive search

Machine Learning ◽

10.1007/s10994-010-5168-9 ◽

2010 ◽

Vol 80 (1) ◽

pp. 33-62 ◽

Cited By ~ 7

Author(s):

Peter R. Rijnbeek ◽

Jan A. Kors

Keyword(s):

Normal Form ◽

Decision Rule ◽

Disjunctive Normal Form ◽

Exhaustive Search

A Protection Mechanism against Malicious HTML and JavaScript Code in Vulnerable Web Applications

Mathematical Problems in Engineering ◽

10.1155/2016/7107042 ◽

2016 ◽

Vol 2016 ◽

pp. 1-14

Author(s):

Shukai Liu ◽

Xuexiong Yan ◽

Qingxian Wang ◽

Xu Zhao ◽

Chuansen Chai ◽

...

Keyword(s):

High Risk ◽

Real World ◽

Web Applications ◽

Web Pages ◽

Web Browsers ◽

Test Results ◽

Code Size ◽

Protection Mechanism ◽

High Profile ◽

Content Security

The high-profile attacks of malicious HTML and JavaScript code have seen a dramatic increase in both awareness and exploitation in recent years. Unfortunately, existing security mechanisms do not provide enough protection. We propose a new protection mechanism named PMHJ, based on the support of both web applications and web browsers, against malicious HTML and JavaScript code in vulnerable web applications. PMHJ prevents the injection attack of HTML elements with a random attribute value and the node-split attack by an attribute with the hash value of the HTML element. PMHJ ensures content security in web pages by verifying HTML elements, confining the insecure HTML usages that can be exploited by attackers, and disabling the JavaScript APIs that may incur injection vulnerabilities. PMHJ provides a flexible way to rein in high-risk, powerful JavaScript APIs according to the principle of least authority. The PMHJ policy is easy to deploy in real-world web applications. The test results show that PMHJ has little influence on the run time and code size of web pages.
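The random-attribute idea can be illustrated with a small sketch. This is not PMHJ's implementation: it only assumes that the application stamps every element it emits with a per-response random value and that a checker flags any element lacking it. The attribute name `data-pmhj-token`, the `{tok}` template placeholder, and the `TokenChecker` class are hypothetical.

```python
import secrets
from html.parser import HTMLParser

ATTR = "data-pmhj-token"  # hypothetical attribute name, not from the paper

def tag_trusted(template, token):
    """Stamp each application-authored element with the random token."""
    return template.replace("{tok}", f'{ATTR}="{token}"')

class TokenChecker(HTMLParser):
    """Collects the names of elements that do not carry the expected token."""
    def __init__(self, token):
        super().__init__()
        self.token = token
        self.untrusted = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get(ATTR) != self.token:
            self.untrusted.append(tag)

# A fresh, unpredictable token per response; an attacker cannot guess it.
token = secrets.token_hex(16)

# Trusted markup is stamped first; user input is substituted afterwards,
# so injected elements never receive the token.
template = "<div {tok}><p {tok}>Hello, USER_INPUT</p></div>"
page = tag_trusted(template, token).replace(
    "USER_INPUT", "<script>alert(1)</script>"
)

checker = TokenChecker(token)
checker.feed(page)
print(checker.untrusted)
```

The injected `script` element is the only one reported, because it is the only element without the token; the node-split defense based on element hashes mentioned in the abstract is not covered by this sketch.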

On closure under direct product

Journal of Symbolic Logic ◽

10.2307/2964395 ◽

1958 ◽

Vol 23 (2) ◽

pp. 149-154 ◽

Cited By ~ 10

Author(s):

C. C. Chang ◽

Anne C. Morel

Keyword(s):

Normal Form ◽

Direct Product ◽

Disjunctive Normal Form ◽

Negative Answer ◽

Sufficient Condition ◽

Natural Question ◽

Existential Quantifier ◽

Relational Systems ◽

Image Position

In 1951, Horn obtained a sufficient condition for an arithmetical class to be closed under direct product. A natural question which arose was whether Horn's condition is also necessary. We obtain a negative answer to that question. We shall discuss relational systems of the form […], where A and R are non-empty sets; each element of R is an ordered triple 〈a, b, c〉, with a, b, c ∈ A. If the triple 〈a, b, c〉 belongs to the relation R, we write R(a, b, c); if 〈a, b, c〉 ∉ R, we write (a, b, c). If x0, x1 and x2 are variables, then R(x0, x1, x2) and x0 = x1 are predicates. The expressions (x0, x1, x2) and x0 ≠ x1 will be referred to as negations of predicates. We speak of α1, …, αn as terms of the disjunction α1 ∨ … ∨ αn and as factors of the conjunction α1 ∧ … ∧ αn. A sentence (open, closed or neither) of the form […], where each Qi (if there be any) is either the universal or the existential quantifier and each αi, l is either a predicate or a negation of a predicate, is said to be in prenex disjunctive normal form.

New Boolean Equation for Orthogonalizing of Disjunctive Normal Form based on the Method of Orthogonalizing Difference-Building

Journal of Electronic Testing ◽

10.1007/s10836-016-5572-6 ◽

2016 ◽

Vol 32 (2) ◽

pp. 197-208 ◽

Cited By ~ 3

Author(s):

Yavuz Can ◽

Hassen Kassim ◽

Georg Fischer

Keyword(s):

Normal Form ◽

Disjunctive Normal Form ◽

Boolean Equation

On the learnability of disjunctive normal form formulas

Machine Learning ◽

10.1007/bf00996269 ◽

1995 ◽

Vol 19 (3) ◽

pp. 183-208 ◽

Cited By ~ 7

Author(s):

Howard Aizenstein ◽

Leonard Pitt

Keyword(s):

Normal Form ◽

Disjunctive Normal Form

Features Selection for Entity Resolution in Prostitution on Twitter

International Journal of Advances in Data and Information Systems ◽

10.25008/ijadis.v2i1.1214 ◽

2021 ◽

Vol 2 (1) ◽

pp. 53-61

Author(s):

Reisa Permatasari ◽

Nur Aini Rakhmawati

Keyword(s):

Logistic Regression ◽

Active Learning ◽

Real World ◽

Entity Resolution ◽

Training Data ◽

Features Selection ◽

Selection For ◽

Maximum Similarity

Entity resolution is the process of determining whether two references to real-world objects refer to the same or different purposes. This study applies entity resolution to a Twitter prostitution dataset, based on features with Regularized Logistic Regression training and Active Learning in Dedupe, and based on graphs using Neo4j and Node2Vec. This study found that the maximum similarity is 1 when the set of features (personal, location and bio specifications) is complete. The minimum similarity is 0.025662627 when the amount of harmful training data. The most influential similarity feature is the cellphone number, with the lowest starting range from 0.997678459 to 0.999993523. The parameter "length of walk per source" has the effect of achieving the best similarity accuracy, reaching 71.4% (prediction 14 and yield 10).

Detection of Disjunctive Normal Form Predicate in Distributed Systems

Distributed Computing and Networking - Lecture Notes in Computer Science ◽

10.1007/978-3-540-77444-0_13 ◽

2007 ◽

pp. 158-169 ◽

Cited By ~ 1

Author(s):

Hongtao Huang

Keyword(s):

Distributed Systems ◽

Normal Form ◽

Disjunctive Normal Form
