Query Answer Reformulation over Knowledge Bases

Linked Data on the Web is either created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types for enhancing the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither of the algorithms uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate as well as scalable. Both algorithms have been used for building the DBpedia 3.9 release: With SDType, 3.4 million missing type statements have been added, while using SDValidate, 13,000 erroneous RDF statements have been removed from the knowledge base.

Download Full-text

Refining Automatically Extracted Knowledge Bases Using Crowdsourcing

Computational Intelligence and Neuroscience ◽

10.1155/2017/4092135 ◽

2017 ◽

Vol 2017 ◽

pp. 1-17

Author(s):

Chunhua Li ◽

Pengpeng Zhao ◽

Victor S. Sheng ◽

Xuefeng Xian ◽

Jian Wu ◽

...

Keyword(s):

Knowledge Base ◽

State Of The Art ◽

Knowledge Bases ◽

Important Research ◽

Semantic Constraints ◽

Knowledge Base Refinement ◽

Automatic Methods ◽

Automated Algorithms ◽

Research Challenge

Machine-constructed knowledge bases often contain noisy and inaccurate facts. There exists significant work in developing automated algorithms for knowledge base refinement. Automated approaches improve the quality of knowledge bases but are far from perfect. In this paper, we leverage crowdsourcing to improve the quality of automatically extracted knowledge bases. As human labelling is costly, an important research challenge is how we can use limited human resources to maximize the quality improvement for a knowledge base. To address this problem, we first introduce a concept of semantic constraints that can be used to detect potential errors and do inference among candidate facts. Then, based on semantic constraints, we propose rank-based and graph-based algorithms for crowdsourced knowledge refining, which judiciously select the most beneficial candidate facts to conduct crowdsourcing and prune unnecessary questions. Our experiments show that our method improves the quality of knowledge bases significantly and outperforms state-of-the-art automatic methods under a reasonable crowdsourcing cost.

Download Full-text

A Model Approach to Infer the Quality in Agricultural Sprayers Supported by Knowledge Bases and Experimental Measurements

International Journal of Semantic Computing ◽

10.1142/s1793351x17400104 ◽

2017 ◽

Vol 11 (03) ◽

pp. 279-292 ◽

Cited By ~ 2

Author(s):

Elmer A. G. Peñaloza ◽

Paulo E. Cruvinel ◽

Vilma A. Oliveira ◽

Augusto G. F. Costa

Keyword(s):

Decision Making ◽

Knowledge Base ◽

Experimental Tests ◽

Knowledge Bases ◽

Operating Conditions ◽

Semantic Interpretation ◽

Data Consistency ◽

Flow Measurements ◽

Spraying Process

This paper presents a method to infer the quality of sprayers based on data collection of the drop spectra and their physical descriptors, which are used to generate a knowledge base to support decision-making in agriculture. The knowledge base is formed by collected experimental data, obtained in a controlled environment under specific operating conditions, and the semantics used in the spraying process to infer the quality in the application. The electro-hydraulic operating conditions of the sprayer system, which include speed and flow measurements, are used to define experimental tests, perform calibration of the spray booms and select the nozzle types. Using the Grubbs test and the quartile-quartile plot an exploratory analysis of the collected data was made in order to determine the data consistency, the deviation of atypical values, the independence between the data of each test, the repeatability and the normal representation of them. Therefore, integrating measurements to a knowledge base it was possible to improve the decision-making in relation to the quality of the spraying process defined in terms of a distribution function. Results shown that the use of advanced models and semantic interpretation improved the decision-making processes related to the quality of the agricultural sprayers.

Download Full-text

Automatic Linking of Terms from Scientific Texts with Knowledge Base Entities

Vestnik NSU Series Information Technologies ◽

10.25205/1818-7900-2021-19-2-65-75 ◽

2021 ◽

Vol 19 (2) ◽

pp. 65-75

Author(s):

A. A. Mezentseva ◽

E. P. Bruches ◽

T. V. Batura

Keyword(s):

Knowledge Base ◽

Text Processing ◽

Semantic Content ◽

Knowledge Bases ◽

Scientific Article ◽

Entity Linking ◽

Scientific Publications ◽

Scientific Texts ◽

Two Stages

Due to the growth of the number of scientific publications, the tasks related to scientific article processing become more actual. Such texts have a special structure, lexical and semantic content that should be taken into account while processing. Using information from knowledge bases can significantly improve the quality of text processing systems. This paper is dedicated to the entity linking task for scientific articles in Russian, where we consider scientific terms as entities. During our work, we annotated a corpus with scientific texts, where each term was linked with an entity from a knowledge base. Also, we implemented an algorithm for entity linking and evaluated it on the corpus. The algorithm consists of two stages: candidate generation for an input term and ranking this set of candidates to choose the best match. We used string matching of an input term and an entity in a knowledge base to generate a set of candidates. To rank the candidates and choose the most relevant entity for a term, information about the number of links to other entities within the knowledge base and to other sites is used. We analyzed the obtained results and proposed possible ways to improve the quality of the algorithm, for example, using information about the context and a knowledge base structure. The annotated corpus is publicly available and can be useful for other researchers.

Download Full-text

Teisinių žinių bazių ir teisinių dokumentų izomorfizmo realizavimo metodas

Informacijos mokslai ◽

10.15388/im.2008.0.3437 ◽

2008 ◽

Vol 42 (43) ◽

pp. 91-97

Author(s):

Laima Paliulionienė

Keyword(s):

Knowledge Base ◽

Knowledge Bases ◽

Structural Elements ◽

Structural Units ◽

Legal Documents ◽

Xml Documents ◽

Legal Texts ◽

Legal Document ◽

Computer Based

Dirbtinio intelekto ir kitų kompiuterinių technologijų naudojimas rengiant teisės aktus pagerina jų kokybę ir sutrumpina rengimo laiką. Viena iš problemų, su kuriomus susiduriama formalizuojant teisės aktus, yra izomorfizmo problema. Tai teisinių žinių bazės fragmentų susiejimo su atitinkamais teisinių dokumentų struktūriniais elementais problema. Straipsnyje siūlomas teisės akto tekstų ir žinių bazių vaizdavimo būdas, užtikrinantis izomorfizmą tarp jų. Struktūruotam dokumento tekstui saugotinaudojamas XML dokumentas, o žinių bazei – F-logikos formalizmas. Be dokumento teksto ir žinių bazės, nagrinėjamas dar vienas izomorfizmo aspektas – testai, aprašantys realias arba hipotetines situacijas ir skirti patikrinti, ar teisės akto straipsnis adekvačiai apibrėžia jo pageidaujamą taikymą. Siūlomas metodas palengvina teisės akto ir žinių bazės pakeitimų valdymą ir sudaro prielaidas generuoti kokybiškesnius žinių bazėje atliekamų išvedimų paaiškinimus.A method of implementing isomorphism between legal knowledge bases and legal documentsLaima Paliulionienė SummaryThe use of artificial intelligence methods or other computer-based methods in legal drafting can improve the quality of legal documents. During the formalization of legal documents, a problem of isomorphism arises. Isomorphism can be defined as the well defined correspondence of the knowledge base elements to the structural elements of source texts. A method of the representation of legal texts and knowledge bases is proposed in this paper to ensure the isomorphism. XML documents are used to store texts of a legal document, and F-logic is used as a formalism for the knowledge base. Additionally to the document text and knowledge base, one more aspect of the isomorphism is included – tests that describe hypothetic or real situations, and are intended to check the adequacy of possible applications of structural units (articles) of the legal document. The proposed method is designed to simplify the management of the changes in legal documents and appropriate knowledge bases, and to generate explanations of the inference in knowledge bases.

Download Full-text

Improving the Quality of Linked Data Using Statistical Distributions

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2014040104 ◽

2014 ◽

Vol 10 (2) ◽

pp. 63-86 ◽

Cited By ~ 74

Author(s):

Heiko Paulheim ◽

Christian Bizer

Keyword(s):

Knowledge Base ◽

Linked Data ◽

Relational Databases ◽

Knowledge Bases ◽

Structured Data ◽

Data Sources ◽

Data Sets ◽

Statistical Distributions ◽

The Web

Linked Data on the Web is either created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types for enhancing the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither of the algorithms uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate as well as scalable. Both algorithms have been used for building the DBpedia 3.9 release: With SDType, 3.4 million missing type statements have been added, while using SDValidate, 13,000 erroneous RDF statements have been removed from the knowledge base.

Download Full-text

Concept Grounding to Multiple Knowledge Bases via Indirect Supervision

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00089 ◽

2016 ◽

Vol 4 ◽

pp. 141-154 ◽

Cited By ~ 5

Author(s):

Chen-Tse Tsai ◽

Dan Roth

Keyword(s):

Knowledge Base ◽

Structural Information ◽

Knowledge Bases ◽

Biomedical Domain ◽

Algorithmic Approach ◽

Ranking Model ◽

The Rich ◽

Supervised Ranking ◽

Indirect Supervision

We consider the problem of disambiguating concept mentions appearing in documents and grounding them in multiple knowledge bases, where each knowledge base addresses some aspects of the domain. This problem poses a few additional challenges beyond those addressed in the popular Wikification problem. Key among them is that most knowledge bases do not contain the rich textual and structural information Wikipedia does; consequently, the main supervision signal used to train Wikification rankers does not exist anymore. In this work we develop an algorithmic approach that, by carefully examining the relations between various related knowledge bases, generates an indirect supervision signal it uses to train a ranking model that accurately chooses knowledge base entries for a given mention; moreover, it also induces prior knowledge that can be used to support a global coherent mapping of all the concepts in a given document to the knowledge bases. Using the biomedical domain as our application, we show that our indirectly supervised ranking model outperforms other unsupervised baselines and that the quality of this indirect supervision scheme is very close to a supervised model. We also show that considering multiple knowledge bases together has an advantage over grounding concepts to each knowledge base individually.

Download Full-text

Methods and Technologies for Quality Assurance of Intelligent Decision-Making Systems

PROGRAMMNAYA INGENERIA ◽

10.17587/prin.12.189-199 ◽

2021 ◽

Vol 12 (4) ◽

pp. 189-199

Author(s):

O. N. Dolinina ◽

◽

V. A. Kushnikov ◽

Keyword(s):

Decision Making ◽

Quality Assurance ◽

Comparative Analysis ◽

Knowledge Base ◽

Action Plan ◽

Knowledge Bases ◽

Intelligent Decision Making ◽

Correct Knowledge ◽

Intelligent Decision

An increase in the degree of intellectualization of tasks requires the creation of methodology for improving the quality of intelligent decision-making systems. The possibility of automating decision-making in poorly formalized areas through the using of the expert knowledge leads to increasing of the number of errors in the software, and as a consequence to increasing of the number of various sources of failures.The article provides a detailed overview of existing methods and technologies for quality assurance of intelligent decision systems. The first part of the article describes the methodology for ensuring the quality of the intelligent systems (IS), based on the GOST/ ISO standards, where it is proposed to use a multilevel model to describe the quality of the IS software. It is shown that to ensure the required level of quality, an action plan can be formed and the use of a system dynamics model for the implementation of an action plan for ensuring the quality of IS is described. A comparative analysis of the complex criteria of quality and reliability is given. In the second part, the quality of knowledge base (KB) as a special element of the IS software is described, a comparative analysis of methods for static and dynamic analysis of knowledge bases is considered. An overview of research results in the classification of errors in the knowledge bases and their debugging is given. Special attention is given to the "forgetting about exception" type of errors. The concept of a statically correct knowledge base at the level of the knowledge structure is described and it is shown that statically correct knowledge bases can nevertheless give errors due to errors in the rules themselves because of the inconsistency of the field of studies. Neural network knowledge bases are allocated in a separate class, for neural networks methods of debugging are described.

Download Full-text

Methods of parallel computing for multilevel fuzzy Takagi – Sugeno systems

PROBLEMS IN PROGRAMMING ◽

10.15407/pp2016.02-03.141 ◽

2016 ◽

pp. 141-149

Author(s):

S.V. Yershov ◽

◽

R.М. Ponomarenko ◽

Keyword(s):

Dynamic Models ◽

Fuzzy Inference ◽

Knowledge Bases ◽

Comparative Characteristic ◽

Software Systems ◽

Scientific Papers ◽

Speed Up ◽

Diagnostic Software ◽

Takagi Sugeno

Parallel tiered and dynamic models of the fuzzy inference in expert-diagnostic software systems are considered, which knowledge bases are based on fuzzy rules. Tiered parallel and dynamic fuzzy inference procedures are developed that allow speed up of computations in the software system for evaluating the quality of scientific papers. Evaluations of the effectiveness of parallel tiered and dynamic schemes of computations are constructed with complex dependency graph between blocks of fuzzy Takagi – Sugeno rules. Comparative characteristic of the efficacy of parallel-stacked and dynamic models is carried out.

Download Full-text

Peculiarities of forming a knowledge base for assessing the quality of the geo resource potential of coal mines

MINING INFORMATIONAL AND ANALYTICAL BULLETIN ◽

10.25018/0236-1493-2019-12-50-19-25 ◽

2019 ◽

Vol 12 (50) ◽

pp. 19-25

Author(s):

I.G. Vetrova ◽

Keyword(s):

Knowledge Base ◽

Coal Mines ◽

Resource Potential

Download Full-text