Refining Automatically Extracted Knowledge Bases Using Crowdsourcing

2017, Vol 2017, pp. 1-17
Author(s): Chunhua Li, Pengpeng Zhao, Victor S. Sheng, Xuefeng Xian, Jian Wu, et al.

Machine-constructed knowledge bases often contain noisy and inaccurate facts. There is significant prior work on automated algorithms for knowledge base refinement; these approaches improve the quality of knowledge bases but are far from perfect. In this paper, we leverage crowdsourcing to improve the quality of automatically extracted knowledge bases. Because human labelling is costly, an important research challenge is how to use limited human resources to maximize the quality improvement of a knowledge base. To address this problem, we first introduce the concept of semantic constraints, which can be used to detect potential errors and to perform inference over candidate facts. Then, based on semantic constraints, we propose rank-based and graph-based algorithms for crowdsourced knowledge refinement, which judiciously select the most beneficial candidate facts for crowdsourcing and prune unnecessary questions. Our experiments show that our method improves the quality of knowledge bases significantly and outperforms state-of-the-art automatic methods at a reasonable crowdsourcing cost.
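A minimal sketch of the rank-based idea (not the authors' implementation; the constraint type and scoring function here are illustrative assumptions): score each candidate fact by how many semantic-constraint violations it participates in, weighted by how uncertain its extraction confidence is, and send the top-scoring facts to the crowd first.

```python
from itertools import combinations

# Each candidate fact: (subject, relation, object) with an extractor confidence.
facts = {
    ("einstein", "born_in", "ulm"): 0.9,
    ("einstein", "born_in", "munich"): 0.6,   # conflicts with the fact above
    ("ulm", "located_in", "germany"): 0.95,
}

# Illustrative semantic constraint: a functional relation admits one object per subject.
FUNCTIONAL = {"born_in"}

def violation_pairs(facts):
    """Yield pairs of facts that violate a functional-relation constraint."""
    for f1, f2 in combinations(facts, 2):
        s1, r1, o1 = f1
        s2, r2, o2 = f2
        if r1 == r2 and r1 in FUNCTIONAL and s1 == s2 and o1 != o2:
            yield f1, f2

def crowdsourcing_benefit(facts):
    """Rank candidate facts: facts in many conflicts whose confidence is
    most uncertain benefit most from a crowd question."""
    score = {f: 0.0 for f in facts}
    for f1, f2 in violation_pairs(facts):
        for f in (f1, f2):
            # Uncertainty peaks at confidence 0.5; conflicts accumulate.
            score[f] += 1.0 - abs(facts[f] - 0.5) * 2
    return sorted(score, key=score.get, reverse=True)

print(crowdsourcing_benefit(facts)[:2])  # ask the crowd about these first
```

Answers from the crowd can then be propagated along the same constraints, which is what allows unnecessary questions to be pruned.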

Author(s): Heiko Paulheim, Christian Bizer

Linked Data on the Web is created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data is likely to be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither algorithm uses external knowledge; both operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms were used in building the DBpedia 3.9 release: SDType added 3.4 million missing type statements, and SDValidate removed 13,000 erroneous RDF statements from the knowledge base.
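The core of SDType is a weighted vote: every property of a resource "votes" for types according to the conditional distribution P(type | property), estimated from the data itself, with votes weighted by how discriminative the property is. A simplified sketch under those assumptions (the distributions, weights, and threshold below are invented; the actual implementation differs in detail):

```python
from collections import defaultdict

# P(type | property), estimated from the knowledge base itself.
type_dist = {
    "dbo:author":     {"dbo:Writer": 0.7, "dbo:Person": 0.95},
    "dbo:birthPlace": {"dbo:Person": 0.9, "dbo:Place": 0.05},
}
# Illustrative per-property weight: more discriminative properties count more.
weight = {"dbo:author": 0.8, "dbo:birthPlace": 0.6}

def sdtype(properties, threshold=0.4):
    """Predict missing types for a resource from the properties it uses."""
    votes = defaultdict(float)
    total = sum(weight[p] for p in properties)
    for p in properties:
        for t, prob in type_dist[p].items():
            votes[t] += weight[p] * prob
    # Normalize by total weight and keep only confident predictions.
    return {t: v / total for t, v in votes.items() if v / total >= threshold}

print(sdtype(["dbo:author", "dbo:birthPlace"]))  # dbo:Person scores highest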


2013, Vol 1, pp. 379-390
Author(s): Hongsong Li, Kenny Q. Zhu, Haixun Wang

Recognizing metaphors and identifying their source-target mappings is an important task, as metaphorical text poses a significant challenge for machine reading. To address this problem, we automatically acquire a metaphor knowledge base and an isA knowledge base from billions of web pages. Using these knowledge bases, we develop an inference mechanism to recognize and explain the metaphors in text. To our knowledge, this is the first purely data-driven approach to probabilistic metaphor acquisition, recognition, and explanation. Our results show that it significantly outperforms other state-of-the-art methods in recognizing and explaining metaphors.
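A toy illustration of the inference (the knowledge bases and scores below are invented, and the paper's probabilistic model is considerably richer): an "X is a Y" statement is treated as literal when the isA knowledge base supports it, and as metaphorical when the metaphor knowledge base instead knows Y as a source concept for X's category.

```python
# Tiny stand-ins for the web-scale isA and metaphor knowledge bases.
isa_kb = {("dog", "animal"): 0.98, ("juliet", "person"): 0.9}
metaphor_kb = {("sun", "person"): 0.85}  # "sun" as a source for person targets

def classify(x, y, target_category):
    """Classify 'x is a y' as literal or metaphorical."""
    if isa_kb.get((x, y), 0.0) > 0.5:
        return "literal"
    score = metaphor_kb.get((y, target_category), 0.0)
    if score > 0.5:
        return f"metaphor: '{y}' maps onto the {target_category} '{x}'"
    return "unknown"

print(classify("juliet", "sun", "person"))   # "Juliet is the sun"
print(classify("dog", "animal", "animal"))   # plain isA statement
```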


Electronics, 2020, Vol 9 (10), pp. 1722
Author(s): Ivan Kovačević, Stjepan Groš, Karlo Slovenec

Intrusion Detection Systems (IDSs) automatically analyze event logs and network traffic in order to detect malicious activity and policy violations. Because IDSs produce large numbers of false positives and false negatives, and because the technical nature of their alerts requires extensive manual analysis, researchers have proposed approaches that automate the analysis of alerts to detect large-scale attacks and predict the attacker's next steps. Unfortunately, many such approaches use unique datasets and success metrics, making comparison difficult. This survey provides an overview of the state of the art in detecting and projecting cyberattack scenarios, with a focus on evaluation and the corresponding metrics. Representative papers were collected using Google Scholar and Scopus searches. Mutually comparable success metrics are calculated, and several comparison tables are provided. Our results show that commonly used metrics are saturated on popular datasets and cannot assess the practical usability of the approaches. In addition, approaches based on knowledge bases require constant maintenance, while data mining and machine learning approaches depend on the quality of available datasets, which, at the time of writing, are not representative enough to provide general knowledge regarding attack scenarios; more emphasis therefore needs to be placed on researching the behavior of attackers.
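Making results from papers with different reporting styles mutually comparable typically means recomputing standard metrics from the raw confusion counts each paper reports. A small sketch of that normalization (the counts below are placeholders, not values from the survey):

```python
def comparable_metrics(tp, fp, fn, tn):
    """Recompute standard detection metrics from raw confusion counts,
    so results reported differently across papers can be compared."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0  # false-positive rate
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}

# Placeholder counts for two hypothetical approaches on the same dataset.
print(comparable_metrics(tp=950, fp=40, fn=50, tn=8960))
print(comparable_metrics(tp=900, fp=10, fn=100, tn=8990))
```

Near-identical F1 scores on a saturated dataset are exactly the situation the survey warns about: the metric no longer discriminates between approaches of different practical value.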


1992, Vol 7 (2), pp. 115-141
Author(s): Alun D. Preece, Rajjan Shinghal, Aïda Batarekh

This paper surveys the verification of expert system knowledge bases by detecting anomalies. Such anomalies are highly indicative of errors in the knowledge base. The paper is in two parts. The first part describes four types of anomaly: redundancy, ambivalence, circularity, and deficiency. We consider rule bases based on first-order logic and explain the anomalies in terms of the syntax and semantics of logic. The second part reviews five programs that have been built to detect various subsets of these anomalies. The four anomalies provide a framework for comparing the capabilities of the five tools, and we highlight the strengths and weaknesses of each approach. This paper therefore provides not only a set of underlying principles for performing knowledge base verification through anomaly detection, but also a survey of the state of the art in building practical tools for carrying out such verification. The reader is expected to be familiar with first-order logic.
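Of the four anomalies, circularity is the easiest to illustrate: if rules allow a proposition to be derived, directly or transitively, from itself, the rule base can loop. A minimal sketch (propositional rather than full first-order, for brevity) that finds such cycles in a rule dependency graph:

```python
def find_circularity(rules):
    """Detect cyclic derivations: rules are (antecedents, consequent) pairs.
    Returns True if some proposition transitively depends on itself."""
    graph = {}
    for antecedents, consequent in rules:
        for a in antecedents:
            graph.setdefault(a, set()).add(consequent)

    def reachable_from_self(start):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in graph.get(node, ()):
                if nxt == start:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return any(reachable_from_self(p) for p in graph)

rules = [({"a"}, "b"), ({"b"}, "c"), ({"c"}, "a")]  # a -> b -> c -> a
print(find_circularity(rules))  # True: the rule base is circular
```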


Author(s): Yujin Yuan, Liyuan Liu, Siliang Tang, Zhongfei Zhang, Yueting Zhuang, et al.

Distant supervision leverages knowledge bases to automatically label instances, allowing us to train relation extractors without human annotation. However, the generated training data typically contain massive noise and may result in poor performance under vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C2SA), which leads to noise-robust training of distantly supervised relation extractors. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while capturing the correlation among relations to improve the quality of the attention weights. Moreover, instead of treating all entity pairs equally, we pay more attention to entity pairs of higher quality, again using a selective attention mechanism. Experiments with two types of relation extractors demonstrate the superiority of the proposed approach over the state of the art, while further ablation studies verify our intuitions and demonstrate the effectiveness of the two proposed techniques.
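Sentence-level selective attention can be sketched as a softmax over how well each sentence embedding in a bag matches a query vector for the target relation, followed by a weighted sum, so that noisy or mismatched sentences receive low weights. A minimal numpy sketch (the dimensions and vectors are illustrative; C2SA adds cross-relation and cross-bag attention on top of this base mechanism):

```python
import numpy as np

def selective_attention(sentence_embs, relation_query):
    """Weight sentences in a bag by their match with the relation query,
    then return the attention-weighted bag representation."""
    scores = sentence_embs @ relation_query          # one score per sentence
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over the bag
    return weights, weights @ sentence_embs          # weighted sum of embeddings

rng = np.random.default_rng(0)
bag = rng.normal(size=(4, 8))        # 4 candidate sentences, 8-dim embeddings
query = rng.normal(size=8)           # query vector for the target relation
weights, bag_repr = selective_attention(bag, query)
print(weights)                       # a low weight marks a likely noisy sentence
```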


2017, Vol 11 (03), pp. 279-292
Author(s): Elmer A. G. Peñaloza, Paulo E. Cruvinel, Vilma A. Oliveira, Augusto G. F. Costa

This paper presents a method to infer the quality of sprayers based on collected data of the drop spectra and their physical descriptors, which are used to generate a knowledge base to support decision-making in agriculture. The knowledge base is formed by experimental data collected in a controlled environment under specific operating conditions, together with the semantics used in the spraying process to infer the quality of the application. The electro-hydraulic operating conditions of the sprayer system, which include speed and flow measurements, are used to define the experimental tests, calibrate the spray booms, and select the nozzle types. Using the Grubbs test and the quantile-quantile plot, an exploratory analysis of the collected data was carried out to determine the consistency of the data, the presence of atypical values, the independence of the data across tests, their repeatability, and their normality. By integrating the measurements into a knowledge base, it was possible to improve decision-making in relation to the quality of the spraying process, defined in terms of a distribution function. Results showed that the use of advanced models and semantic interpretation improved the decision-making processes related to the quality of agricultural sprayers.
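The Grubbs test used in the exploratory analysis flags a single outlier in approximately normal data by comparing the largest standardized deviation against a critical value derived from the t-distribution. A sketch of that check (the droplet measurements below are placeholders, not the paper's data):

```python
import numpy as np
from scipy import stats

def grubbs_test(x, alpha=0.05):
    """Two-sided Grubbs test for a single outlier in approximately
    normal data. Returns (is_outlier, suspect_value)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    g = np.max(np.abs(x - x.mean())) / x.std(ddof=1)   # test statistic
    # Critical value from the t-distribution.
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(t**2 / (n - 2 + t**2))
    suspect = x[np.argmax(np.abs(x - x.mean()))]
    return g > g_crit, suspect

# Placeholder droplet-diameter measurements (micrometres), one suspicious value.
drops = [212, 208, 215, 210, 209, 214, 310]
print(grubbs_test(drops))  # flags 310.0 as an outlier under these assumptions
```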


2021, Vol 19 (2), pp. 65-75
Author(s): A. A. Mezentseva, E. P. Bruches, T. V. Batura

Due to the growth in the number of scientific publications, tasks related to processing scientific articles are becoming increasingly relevant. Such texts have a special structure and lexical and semantic content that should be taken into account during processing. Using information from knowledge bases can significantly improve the quality of text processing systems. This paper is dedicated to the entity linking task for scientific articles in Russian, where we consider scientific terms as entities. In the course of this work, we annotated a corpus of scientific texts in which each term is linked to an entity in a knowledge base. We also implemented an entity linking algorithm and evaluated it on the corpus. The algorithm consists of two stages: generating candidates for an input term, and ranking this set of candidates to choose the best match. We use string matching between an input term and the entities in the knowledge base to generate the candidate set. To rank the candidates and choose the most relevant entity for a term, we use information about the number of links to other entities within the knowledge base and to external sites. We analyzed the obtained results and propose possible ways to improve the quality of the algorithm, for example, using information about the context and the structure of the knowledge base. The annotated corpus is publicly available and may be useful to other researchers.
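The two-stage algorithm can be sketched as: generate candidates whose labels are string-similar to the input term, then rank them by their connectivity (links to other entities and to external sites) as a popularity prior. A simplified sketch with an invented miniature knowledge base (the entities, link counts, and threshold are assumptions for illustration):

```python
from difflib import SequenceMatcher

# Invented miniature knowledge base: label -> (in-KB links, external links).
kb = {
    "машинное обучение": (120, 35),
    "машинный перевод": (60, 18),
    "обучение с учителем": (45, 10),
}

def link_term(term, threshold=0.6):
    """Stage 1: string-match candidates; stage 2: rank by connectivity."""
    candidates = [
        label for label in kb
        if SequenceMatcher(None, term.lower(), label).ratio() >= threshold
    ]
    if not candidates:
        return None
    # Popularity prior: entities with more links rank higher.
    return max(candidates, key=lambda lbl: sum(kb[lbl]))

print(link_term("машинного обучения"))  # inflected form still links correctly
```

Character-level similarity is a reasonable first stage for Russian precisely because terms appear in many inflected forms; the paper's proposed improvements (context and knowledge base structure) would refine the second stage.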


2021, Vol 12 (5)
Author(s): João Pedro V. Pinheiro, Marco A. Casanova, Elisa S. Menendez

The answer to a query submitted to a database or a knowledge base is often long and may contain redundant data. The user is frequently forced to browse through a long answer, or to refine and repeat the query until the answer reaches a manageable size. Without proper treatment, consuming the answer may become a tedious task. This article therefore proposes a process that modifies the presentation of a query answer to improve the quality of the user's experience in the context of an RDF knowledge base. The process reorganizes the original query answer by applying heuristics to summarize the results, and selects template questions that create a user dialog guiding the presentation of the results. The article also includes experiments based on RDF versions of MusicBrainz, enriched with DBpedia data, and of IMDb, each with over 200 million RDF triples. The experiments use sample queries from well-known benchmarks.
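One simple instance of such a summarization heuristic is to group the result bindings by a shared variable and turn each group into a template question the user can expand. A sketch under that assumption (the bindings and template are invented, not taken from the article):

```python
from collections import defaultdict

# Invented bindings from an RDF query: (album, artist, year).
answer = [
    ("Abbey Road", "The Beatles", 1969),
    ("Let It Be", "The Beatles", 1970),
    ("Led Zeppelin IV", "Led Zeppelin", 1971),
]

def summarize_by(answer, key_index, template):
    """Group result rows by one variable and emit a dialog question per group."""
    groups = defaultdict(list)
    for row in answer:
        groups[row[key_index]].append(row)
    return [template.format(key=k, n=len(rows)) for k, rows in groups.items()]

for question in summarize_by(answer, 1, "Show the {n} album(s) by {key}?"):
    print(question)
```

Instead of one flat list of hundreds of rows, the user sees a handful of questions and drills into only the groups that interest them.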


AI Magazine, 2015, Vol 36 (1), pp. 65-74
Author(s): Jay Pujara, Hui Miao, Lise Getoor, William W. Cohen

Many information extraction and knowledge base construction systems address the challenge of deriving knowledge from text. A key problem in constructing these knowledge bases from sources like the web is overcoming the erroneous and incomplete information found in millions of candidate extractions. To solve this problem, we turn to semantics, using ontological constraints between candidate facts to eliminate errors. In this article, we represent the desired knowledge base as a knowledge graph and introduce the problem of knowledge graph identification: collectively resolving the entities, labels, and relations present in the knowledge graph. Knowledge graph identification requires reasoning jointly over millions of extractions simultaneously, posing a scalability challenge to many approaches. We use probabilistic soft logic (PSL), a recently introduced statistical relational learning framework, to implement an efficient solution to knowledge graph identification, and we present state-of-the-art results for knowledge graph construction while running an order of magnitude faster than competing methods.
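As a toy stand-in for the PSL formulation, the effect of one ontological constraint, mutual exclusion between labels, can be shown by jointly selecting the consistent subset of candidate extractions with the highest total confidence. PSL solves a continuous relaxation of this kind of problem at web scale; the sketch below simply brute-forces a tiny instance (the candidates and constraints are invented):

```python
from itertools import combinations

# Candidate label extractions with confidences.
candidates = {("obama", "Person"): 0.9, ("obama", "City"): 0.4,
              ("obama", "Politician"): 0.8}
# Ontological constraint: these label pairs are mutually exclusive.
mutex = {frozenset({"Person", "City"}), frozenset({"Politician", "City"})}

def consistent(subset):
    return all(frozenset({l1, l2}) not in mutex
               for (_, l1), (_, l2) in combinations(subset, 2))

def best_consistent(candidates):
    """Brute-force the consistent subset with the highest total confidence."""
    best, best_score = (), -1.0
    items = list(candidates)
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            score = sum(candidates[f] for f in subset)
            if consistent(subset) and score > best_score:
                best, best_score = subset, score
    return best

print(best_consistent(candidates))  # drops the conflicting City label
```

Brute force is exponential in the number of candidates, which is exactly why joint reasoning over millions of extractions demands a scalable relaxation like PSL.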


2003, Vol 3, pp. 1108-1116
Author(s): Soren Ventegodt, Niels Jorgen Andersen, Joav Merrick

The field of holistic medicine is in need of a scientific approach. We need holistic medicine, and we even need it to be spiritual to include the depths of human existence, but we need it to be a little less "cosmic" in order to encompass the whole human being. Many important research questions and challenges, empirical as well as theoretical, demand attention from medical researchers. Like a number of other practitioners and researchers, our group at the Quality of Life Research Center in Denmark, together with groups in Norway and Israel, is trying to tackle this research challenge by using conceptual frameworks of quality of life. We have suggested that quality of life represents a third influence on health beyond the genetic and traumatic factors so far emphasized by mainstream medicine. In our clinical and research efforts, we attempt to specify what a clinician may do to help patients help themselves by mobilizing the vast resources hidden in their subjective worlds and existence, in their hopes and dreams, and in their will to live. The field of holistic medicine must be upgraded to fully integrate human consciousness, scientifically as well as philosophically. We therefore present a number of important research questions for a consciousness-based holistic medicine. New directions in healthcare are called for, and we need a new vision of the future of the healthcare sector in the industrialized countries. Every person seems to have immense potential for self-healing that we scarcely know how to mobilize. A new holistic medicine must find ways to tackle this key challenge. A healthcare system that could do so successfully would bring quality of life, health, and new functional ability to many people.

