Refining Automatically Extracted Knowledge Bases Using Crowdsourcing

2017, Vol 2017, pp. 1-17
Author(s): Chunhua Li, Pengpeng Zhao, Victor S. Sheng, Xuefeng Xian, Jian Wu, et al.

Machine-constructed knowledge bases often contain noisy and inaccurate facts. There is significant prior work on automated algorithms for knowledge base refinement; these approaches improve the quality of knowledge bases but are far from perfect. In this paper, we leverage crowdsourcing to improve the quality of automatically extracted knowledge bases. Because human labelling is costly, an important research challenge is how to use limited human resources to maximize the quality improvement of a knowledge base. To address this problem, we first introduce the concept of semantic constraints, which can be used to detect potential errors and to perform inference over candidate facts. Then, based on semantic constraints, we propose rank-based and graph-based algorithms for crowdsourced knowledge refinement, which judiciously select the most beneficial candidate facts for crowdsourcing and prune unnecessary questions. Our experiments show that our method improves the quality of knowledge bases significantly and outperforms state-of-the-art automatic methods at a reasonable crowdsourcing cost.
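A minimal sketch of the rank-based idea (not the authors' implementation; the constraint type and scoring function here are illustrative assumptions): score each candidate fact by how many semantic-constraint violations it participates in, weighted by how uncertain its extraction confidence is, and send the top-scoring facts to the crowd first.

```python
from itertools import combinations

# Each candidate fact: (subject, relation, object) with an extractor confidence.
facts = {
    ("einstein", "born_in", "ulm"): 0.9,
    ("einstein", "born_in", "munich"): 0.6,   # conflicts with the fact above
    ("ulm", "located_in", "germany"): 0.95,
}

# Illustrative semantic constraint: a functional relation admits one object per subject.
FUNCTIONAL = {"born_in"}

def violation_pairs(facts):
    """Yield pairs of facts that violate a functional-relation constraint."""
    for f1, f2 in combinations(facts, 2):
        s1, r1, o1 = f1
        s2, r2, o2 = f2
        if r1 == r2 and r1 in FUNCTIONAL and s1 == s2 and o1 != o2:
            yield f1, f2

def crowdsourcing_benefit(facts):
    """Rank candidate facts: facts in many conflicts whose confidence is
    most uncertain benefit most from a crowd question."""
    score = {f: 0.0 for f in facts}
    for f1, f2 in violation_pairs(facts):
        for f in (f1, f2):
            # Uncertainty peaks at confidence 0.5; conflicts accumulate.
            score[f] += 1.0 - abs(facts[f] - 0.5) * 2
    return sorted(score, key=score.get, reverse=True)

print(crowdsourcing_benefit(facts)[:2])  # ask the crowd about these first
```

Answers from the crowd can then be propagated along the same constraints, which is what allows unnecessary questions to be pruned.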

Author(s): Heiko Paulheim, Christian Bizer

Linked Data on the Web is created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data is likely to be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither algorithm uses external knowledge; both operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms were used in building the DBpedia 3.9 release: SDType added 3.4 million missing type statements, and SDValidate removed 13,000 erroneous RDF statements from the knowledge base.
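The core of SDType is a weighted vote: every property of a resource "votes" for types according to the conditional distribution P(type | property), estimated from the data itself, with votes weighted by how discriminative the property is. A simplified sketch under those assumptions (the distributions, weights, and threshold below are invented; the actual implementation differs in detail):

```python
from collections import defaultdict

# P(type | property), estimated from the knowledge base itself.
type_dist = {
    "dbo:author":     {"dbo:Writer": 0.7, "dbo:Person": 0.95},
    "dbo:birthPlace": {"dbo:Person": 0.9, "dbo:Place": 0.05},
}
# Illustrative per-property weight: more discriminative properties count more.
weight = {"dbo:author": 0.8, "dbo:birthPlace": 0.6}

def sdtype(properties, threshold=0.4):
    """Predict missing types for a resource from the properties it uses."""
    votes = defaultdict(float)
    total = sum(weight[p] for p in properties)
    for p in properties:
        for t, prob in type_dist[p].items():
            votes[t] += weight[p] * prob
    # Normalize by total weight and keep only confident predictions.
    return {t: v / total for t, v in votes.items() if v / total >= threshold}

print(sdtype(["dbo:author", "dbo:birthPlace"]))  # dbo:Person scores highest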


2013, Vol 1, pp. 379-390
Author(s): Hongsong Li, Kenny Q. Zhu, Haixun Wang

Recognizing metaphors and identifying their source-target mappings is an important task, as metaphorical text poses a significant challenge for machine reading. To address this problem, we automatically acquire a metaphor knowledge base and an isA knowledge base from billions of web pages. Using these knowledge bases, we develop an inference mechanism to recognize and explain the metaphors in text. To our knowledge, this is the first purely data-driven approach to probabilistic metaphor acquisition, recognition, and explanation. Our results show that it significantly outperforms other state-of-the-art methods in recognizing and explaining metaphors.
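A toy illustration of the inference (the knowledge bases and scores below are invented, and the paper's probabilistic model is considerably richer): an "X is a Y" statement is treated as literal when the isA knowledge base supports it, and as metaphorical when the metaphor knowledge base instead knows Y as a source concept for X's category.

```python
# Tiny stand-ins for the web-scale isA and metaphor knowledge bases.
isa_kb = {("dog", "animal"): 0.98, ("juliet", "person"): 0.9}
metaphor_kb = {("sun", "person"): 0.85}  # "sun" as a source for person targets

def classify(x, y, target_category):
    """Classify 'x is a y' as literal or metaphorical."""
    if isa_kb.get((x, y), 0.0) > 0.5:
        return "literal"
    score = metaphor_kb.get((y, target_category), 0.0)
    if score > 0.5:
        return f"metaphor: '{y}' maps onto the {target_category} '{x}'"
    return "unknown"

print(classify("juliet", "sun", "person"))   # "Juliet is the sun"
print(classify("dog", "animal", "animal"))   # plain isA statement
```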


Electronics, 2020, Vol 9 (10), pp. 1722
Author(s): Ivan Kovačević, Stjepan Groš, Karlo Slovenec

Intrusion Detection Systems (IDSs) automatically analyze event logs and network traffic in order to detect malicious activity and policy violations. Because IDSs produce large numbers of false positives and false negatives, and because the technical nature of their alerts requires extensive manual analysis, researchers have proposed approaches that automate the analysis of alerts to detect large-scale attacks and predict the attacker's next steps. Unfortunately, many such approaches use unique datasets and success metrics, making comparison difficult. This survey provides an overview of the state of the art in detecting and projecting cyberattack scenarios, with a focus on evaluation and the corresponding metrics. Representative papers were collected using Google Scholar and Scopus searches. Mutually comparable success metrics are calculated, and several comparison tables are provided. Our results show that commonly used metrics are saturated on popular datasets and cannot assess the practical usability of the approaches. In addition, approaches based on knowledge bases require constant maintenance, while data mining and machine learning approaches depend on the quality of available datasets, which, at the time of writing, are not representative enough to provide general knowledge regarding attack scenarios; more emphasis therefore needs to be placed on researching the behavior of attackers.
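Making results from papers with different reporting styles mutually comparable typically means recomputing standard metrics from the raw confusion counts each paper reports. A small sketch of that normalization (the counts below are placeholders, not values from the survey):

```python
def comparable_metrics(tp, fp, fn, tn):
    """Recompute standard detection metrics from raw confusion counts,
    so results reported differently across papers can be compared."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0  # false-positive rate
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}

# Placeholder counts for two hypothetical approaches on the same dataset.
print(comparable_metrics(tp=950, fp=40, fn=50, tn=8960))
print(comparable_metrics(tp=900, fp=10, fn=100, tn=8990))
```

Near-identical F1 scores on a saturated dataset are exactly the situation the survey warns about: the metric no longer discriminates between approaches of different practical value.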


1992, Vol 7 (2), pp. 115-141
Author(s): Alun D. Preece, Rajjan Shinghal, Aïda Batarekh

This paper surveys the verification of expert system knowledge bases by detecting anomalies. Such anomalies are highly indicative of errors in the knowledge base. The paper is in two parts. The first part describes four types of anomaly: redundancy, ambivalence, circularity, and deficiency. We consider rule bases based on first-order logic and explain the anomalies in terms of the syntax and semantics of logic. The second part reviews five programs that have been built to detect various subsets of these anomalies. The four anomalies provide a framework for comparing the capabilities of the five tools, and we highlight the strengths and weaknesses of each approach. This paper therefore provides not only a set of underlying principles for performing knowledge base verification through anomaly detection, but also a survey of the state of the art in building practical tools for carrying out such verification. The reader is expected to be familiar with first-order logic.
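Of the four anomalies, circularity is the easiest to illustrate: if rules allow a proposition to be derived, directly or transitively, from itself, the rule base can loop. A minimal sketch (propositional rather than full first-order, for brevity) that finds such cycles in a rule dependency graph:

```python
def find_circularity(rules):
    """Detect cyclic derivations: rules are (antecedents, consequent) pairs.
    Returns True if some proposition transitively depends on itself."""
    graph = {}
    for antecedents, consequent in rules:
        for a in antecedents:
            graph.setdefault(a, set()).add(consequent)

    def reachable_from_self(start):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in graph.get(node, ()):
                if nxt == start:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return any(reachable_from_self(p) for p in graph)

rules = [({"a"}, "b"), ({"b"}, "c"), ({"c"}, "a")]  # a -> b -> c -> a
print(find_circularity(rules))  # True: the rule base is circular
```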


Author(s): Yujin Yuan, Liyuan Liu, Siliang Tang, Zhongfei Zhang, Yueting Zhuang, et al.

Distant supervision leverages knowledge bases to automatically label instances, allowing us to train relation extractors without human annotation. However, the generated training data typically contain massive noise and may result in poor performance under vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C2SA), which leads to noise-robust training of distantly supervised relation extractors. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while capturing the correlation among relations to improve the quality of the attention weights. Moreover, instead of treating all entity pairs equally, we pay more attention to entity pairs of higher quality, again using a selective attention mechanism. Experiments with two types of relation extractors demonstrate the superiority of the proposed approach over the state of the art, while further ablation studies verify our intuitions and demonstrate the effectiveness of the two proposed techniques.
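Sentence-level selective attention can be sketched as a softmax over how well each sentence embedding in a bag matches a query vector for the target relation, followed by a weighted sum, so that noisy or mismatched sentences receive low weights. A minimal numpy sketch (the dimensions and vectors are illustrative; C2SA adds cross-relation and cross-bag attention on top of this base mechanism):

```python
import numpy as np

def selective_attention(sentence_embs, relation_query):
    """Weight sentences in a bag by their match with the relation query,
    then return the attention-weighted bag representation."""
    scores = sentence_embs @ relation_query          # one score per sentence
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over the bag
    return weights, weights @ sentence_embs          # weighted sum of embeddings

rng = np.random.default_rng(0)
bag = rng.normal(size=(4, 8))        # 4 candidate sentences, 8-dim embeddings
query = rng.normal(size=8)           # query vector for the target relation
weights, bag_repr = selective_attention(bag, query)
print(weights)                       # a low weight marks a likely noisy sentence
```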


2017, Vol 11 (03), pp. 279-292
Author(s): Elmer A. G. Peñaloza, Paulo E. Cruvinel, Vilma A. Oliveira, Augusto G. F. Costa

This paper presents a method to infer the quality of sprayers based on collected data of the drop spectra and their physical descriptors, which are used to generate a knowledge base to support decision-making in agriculture. The knowledge base is formed by experimental data collected in a controlled environment under specific operating conditions, together with the semantics used in the spraying process to infer the quality of the application. The electro-hydraulic operating conditions of the sprayer system, which include speed and flow measurements, are used to define the experimental tests, calibrate the spray booms, and select the nozzle types. Using the Grubbs test and the quantile-quantile plot, an exploratory analysis of the collected data was carried out to determine the consistency of the data, the presence of atypical values, the independence of the data across tests, their repeatability, and their normality. By integrating the measurements into a knowledge base, it was possible to improve decision-making in relation to the quality of the spraying process, defined in terms of a distribution function. Results showed that the use of advanced models and semantic interpretation improved the decision-making processes related to the quality of agricultural sprayers.
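The Grubbs test used in the exploratory analysis flags a single outlier in approximately normal data by comparing the largest standardized deviation against a critical value derived from the t-distribution. A sketch of that check (the droplet measurements below are placeholders, not the paper's data):

```python
import numpy as np
from scipy import stats

def grubbs_test(x, alpha=0.05):
    """Two-sided Grubbs test for a single outlier in approximately
    normal data. Returns (is_outlier, suspect_value)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    g = np.max(np.abs(x - x.mean())) / x.std(ddof=1)   # test statistic
    # Critical value from the t-distribution.
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(t**2 / (n - 2 + t**2))
    suspect = x[np.argmax(np.abs(x - x.mean()))]
    return g > g_crit, suspect

# Placeholder droplet-diameter measurements (micrometres), one suspicious value.
drops = [212, 208, 215, 210, 209, 214, 310]
print(grubbs_test(drops))  # flags 310.0 as an outlier under these assumptions
```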


2021, Vol 19 (2), pp. 65-75
Author(s): A. A. Mezentseva, E. P. Bruches, T. V. Batura

Due to the growth in the number of scientific publications, tasks related to processing scientific articles are becoming increasingly relevant. Such texts have a special structure and lexical and semantic content that should be taken into account during processing. Using information from knowledge bases can significantly improve the quality of text processing systems. This paper is dedicated to the entity linking task for scientific articles in Russian, where we consider scientific terms as entities. In the course of this work, we annotated a corpus of scientific texts in which each term is linked to an entity in a knowledge base. We also implemented an entity linking algorithm and evaluated it on the corpus. The algorithm consists of two stages: generating candidates for an input term, and ranking this set of candidates to choose the best match. We use string matching between an input term and the entities in the knowledge base to generate the candidate set. To rank the candidates and choose the most relevant entity for a term, we use information about the number of links to other entities within the knowledge base and to external sites. We analyzed the obtained results and propose possible ways to improve the quality of the algorithm, for example, using information about the context and the structure of the knowledge base. The annotated corpus is publicly available and may be useful to other researchers.
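The two-stage algorithm can be sketched as: generate candidates whose labels are string-similar to the input term, then rank them by their connectivity (links to other entities and to external sites) as a popularity prior. A simplified sketch with an invented miniature knowledge base (the entities, link counts, and threshold are assumptions for illustration):

```python
from difflib import SequenceMatcher

# Invented miniature knowledge base: label -> (in-KB links, external links).
kb = {
    "машинное обучение": (120, 35),
    "машинный перевод": (60, 18),
    "обучение с учителем": (45, 10),
}

def link_term(term, threshold=0.6):
    """Stage 1: string-match candidates; stage 2: rank by connectivity."""
    candidates = [
        label for label in kb
        if SequenceMatcher(None, term.lower(), label).ratio() >= threshold
    ]
    if not candidates:
        return None
    # Popularity prior: entities with more links rank higher.
    return max(candidates, key=lambda lbl: sum(kb[lbl]))

print(link_term("машинного обучения"))  # inflected form still links correctly
```

Character-level similarity is a reasonable first stage for Russian precisely because terms appear in many inflected forms; the paper's proposed improvements (context and knowledge base structure) would refine the second stage.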


2021, Vol 12 (5)
Author(s): João Pedro V. Pinheiro, Marco A. Casanova, Elisa S. Menendez

The answer to a query submitted to a database or a knowledge base is often long and may contain redundant data. The user is frequently forced to browse through a long answer, or to refine and repeat the query until the answer reaches a manageable size. Without proper treatment, consuming the answer may become a tedious task. This article therefore proposes a process that modifies the presentation of a query answer to improve the quality of the user's experience in the context of an RDF knowledge base. The process reorganizes the original query answer by applying heuristics to summarize the results, and selects template questions that create a user dialog guiding the presentation of the results. The article also includes experiments based on RDF versions of MusicBrainz, enriched with DBpedia data, and of IMDb, each with over 200 million RDF triples. The experiments use sample queries from well-known benchmarks.
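One simple instance of such a summarization heuristic is to group the result bindings by a shared variable and turn each group into a template question the user can expand. A sketch under that assumption (the bindings and template are invented, not taken from the article):

```python
from collections import defaultdict

# Invented bindings from an RDF query: (album, artist, year).
answer = [
    ("Abbey Road", "The Beatles", 1969),
    ("Let It Be", "The Beatles", 1970),
    ("Led Zeppelin IV", "Led Zeppelin", 1971),
]

def summarize_by(answer, key_index, template):
    """Group result rows by one variable and emit a dialog question per group."""
    groups = defaultdict(list)
    for row in answer:
        groups[row[key_index]].append(row)
    return [template.format(key=k, n=len(rows)) for k, rows in groups.items()]

for question in summarize_by(answer, 1, "Show the {n} album(s) by {key}?"):
    print(question)
```

Instead of one flat list of hundreds of rows, the user sees a handful of questions and drills into only the groups that interest them.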


AI Magazine, 2015, Vol 36 (1), pp. 65-74
Author(s): Jay Pujara, Hui Miao, Lise Getoor, William W. Cohen

Many information extraction and knowledge base construction systems address the challenge of deriving knowledge from text. A key problem in constructing these knowledge bases from sources like the web is overcoming the erroneous and incomplete information found in millions of candidate extractions. To solve this problem, we turn to semantics, using ontological constraints between candidate facts to eliminate errors. In this article, we represent the desired knowledge base as a knowledge graph and introduce the problem of knowledge graph identification: collectively resolving the entities, labels, and relations present in the knowledge graph. Knowledge graph identification requires reasoning jointly over millions of extractions simultaneously, posing a scalability challenge to many approaches. We use probabilistic soft logic (PSL), a recently introduced statistical relational learning framework, to implement an efficient solution to knowledge graph identification, and we present state-of-the-art results for knowledge graph construction while running an order of magnitude faster than competing methods.
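As a toy stand-in for the PSL formulation, the effect of one ontological constraint, mutual exclusion between labels, can be shown by jointly selecting the consistent subset of candidate extractions with the highest total confidence. PSL solves a continuous relaxation of this kind of problem at web scale; the sketch below simply brute-forces a tiny instance (the candidates and constraints are invented):

```python
from itertools import combinations

# Candidate label extractions with confidences.
candidates = {("obama", "Person"): 0.9, ("obama", "City"): 0.4,
              ("obama", "Politician"): 0.8}
# Ontological constraint: these label pairs are mutually exclusive.
mutex = {frozenset({"Person", "City"}), frozenset({"Politician", "City"})}

def consistent(subset):
    return all(frozenset({l1, l2}) not in mutex
               for (_, l1), (_, l2) in combinations(subset, 2))

def best_consistent(candidates):
    """Brute-force the consistent subset with the highest total confidence."""
    best, best_score = (), -1.0
    items = list(candidates)
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            score = sum(candidates[f] for f in subset)
            if consistent(subset) and score > best_score:
                best, best_score = subset, score
    return best

print(best_consistent(candidates))  # drops the conflicting City label
```

Brute force is exponential in the number of candidates, which is exactly why joint reasoning over millions of extractions demands a scalable relaxation like PSL.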


2003, Vol 3, pp. 1108-1116
Author(s): Soren Ventegodt, Niels Jorgen Andersen, Joav Merrick

The field of holistic medicine is in need of a scientific approach. We need holistic medicine, and we even need it to be spiritual to include the depths of human existence, but we need it to be a little less "cosmic" in order to encompass the whole human being. Many important research questions and challenges, empirical as well as theoretical, demand attention from medical researchers. Like a number of other practitioners and researchers, our group at the Quality of Life Research Center in Denmark, together with groups in Norway and Israel, is trying to tackle this research challenge by using conceptual frameworks of quality of life. We have suggested that quality of life represents a third influence on health beyond the genetic and traumatic factors so far emphasized by mainstream medicine. In our clinical and research efforts, we attempt to specify what a clinician may do to help patients help themselves by mobilizing the vast resources hidden in their subjective worlds and existence, in their hopes and dreams, and in their will to live. The field of holistic medicine must be upgraded to fully integrate human consciousness, scientifically as well as philosophically. We therefore present a number of important research questions for a consciousness-based holistic medicine. New directions in healthcare are called for, and we need a new vision of the future of the healthcare sector in the industrialized countries. Every person seems to have immense potential for self-healing that we scarcely know how to mobilize. A new holistic medicine must find ways to tackle this key challenge. A healthcare system that could do so successfully would bring quality of life, health, and new functional ability to many people.

