SUMA: A Partial Materialization-Based Scalable Query Answering in OWL 2 DL

Data Science and Engineering ◽

10.1007/s41019-020-00150-0 ◽

2021 ◽

Author(s):

Xiaoyu Qin ◽

Xiaowang Zhang ◽

Muhammad Qasim Yasin ◽

Shujun Wang ◽

Zhiyong Feng ◽

...

Keyword(s):

Large Datasets ◽

Sparql Query ◽

Query Answering ◽

Query Rewriting ◽

Prototype System ◽

Implicit Information ◽

Ontology Language ◽

Query Engine ◽

The Cost ◽

Partial Materialization

AbstractOntology-mediated querying (OMQ) provides a paradigm for query answering according to which users not only query records at the database but also query implicit information inferred from ontology. A key challenge in OMQ is that the implicit information may be infinite, which cannot be stored at the database and queried by off -the -shelf query engine. The commonly adopted technique to deal with infinite entailments is query rewriting, which, however, comes at the cost of query rewriting at runtime. In this work, the partial materialization method is proposed to ensure that the extension is always finite. The partial materialization technology does not rewrite query but instead computes partial consequences entailed by ontology before the online query. Besides, a query analysis algorithm is designed to ensure the completeness of querying rooted and Boolean conjunctive queries over partial materialization. We also soundly and incompletely expand our method to support highly expressive ontology language, OWL 2 DL. Finally, we further optimize the materialization efficiency by role rewriting algorithm and implement our approach as a prototype system SUMA by integrating off-the-shelf efficient SPARQL query engine. The experiments show that SUMA is complete on each test ontology and each test query, which is the same as Pellet and outperforms PAGOdA. Besides, SUMA is highly scalable on large datasets.

Download Full-text

Query Rewriting for DL-Lite with n-ary Concrete Domains

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/109 ◽

2017 ◽

Cited By ~ 5

Author(s):

Franz Baader ◽

Stefan Borgwardt ◽

Marcel Lippmann

Keyword(s):

Query Answering ◽

Query Rewriting ◽

First Order ◽

Ontology Language ◽

Concrete Domains

We investigate ontology-based query answering (OBQA) in a setting where both the ontology and the query can refer to concrete values such as numbers and strings. In contrast to previous work on this topic, the built-in predicates used to compare values are not restricted to being unary. We introduce restrictions on these predicates and on the ontology language that allow us to reduce OBQA to query answering in databases using the so-called combined rewriting approach. Though at first sight our restrictions are different from the ones used in previous work, we show that our results strictly subsume some of the existing first-order rewritability results for unary predicates.

Download Full-text

Optimizing SPARQL Query Answering over OWL Ontologies

Journal of Artificial Intelligence Research ◽

10.1613/jair.3872 ◽

2013 ◽

Vol 48 ◽

pp. 253-303 ◽

Cited By ~ 21

Author(s):

I. Kollia ◽

B. Glimm

Keyword(s):

Query Language ◽

Sparql Query ◽

Query Rewriting ◽

Query Execution ◽

Subgraph Matching ◽

Role Hierarchy ◽

Dynamic Case ◽

Clustering Approach ◽

The Cost ◽

Execution Order

The SPARQL query language is currently being extended by the World Wide Web Consortium (W3C) with so-called entailment regimes. An entailment regime defines how queries are evaluated under more expressive semantics than SPARQL's standard simple entailment, which is based on subgraph matching. The queries are very expressive since variables can occur within complex concepts and can also bind to concept or role names. In this paper, we describe a sound and complete algorithm for the OWL Direct Semantics entailment regime. We further propose several novel optimizations such as strategies for determining a good query execution order, query rewriting techniques, and show how specialized OWL reasoning tasks and the concept and role hierarchy can be used to reduce the query execution time. For determining a good execution order, we propose a cost-based model, where the costs are based on information about the instances of concepts and roles that are extracted from a model abstraction built by an OWL reasoner. We present two ordering strategies: a static and a dynamic one. For the dynamic case, we improve the performance by exploiting an individual clustering approach that allows for computing the cost functions based on one individual sample from a cluster. We provide a prototypical implementation and evaluate the efficiency of the proposed optimizations. Our experimental study shows that the static ordering usually outperforms the dynamic one when accurate statistics are available. This changes, however, when the statistics are less accurate, e.g., due to nondeterministic reasoning decisions. For queries that go beyond conjunctive instance queries we observe an improvement of up to three orders of magnitude due to the proposed optimizations.

Download Full-text

Aggregation of cohorts for histopathological diagnosis with deep morphological analysis

Scientific Reports ◽

10.1038/s41598-021-82642-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Jeonghyuk Park ◽

Yul Ri Chung ◽

Seo Taek Kong ◽

Yeong Won Kim ◽

Hyunho Park ◽

...

Keyword(s):

Cancer Detection ◽

Morphological Analysis ◽

Deep Neural Networks ◽

Large Datasets ◽

Histopathological Diagnosis ◽

Single Model ◽

Trade Off ◽

Detection Model ◽

Optimal Behavior ◽

The Cost

AbstractThere have been substantial efforts in using deep learning (DL) to diagnose cancer from digital images of pathology slides. Existing algorithms typically operate by training deep neural networks either specialized in specific cohorts or an aggregate of all cohorts when there are only a few images available for the target cohort. A trade-off between decreasing the number of models and their cancer detection performance was evident in our experiments with The Cancer Genomic Atlas dataset, with the former approach achieving higher performance at the cost of having to acquire large datasets from the cohort of interest. Constructing annotated datasets for individual cohorts is extremely time-consuming, with the acquisition cost of such datasets growing linearly with the number of cohorts. Another issue associated with developing cohort-specific models is the difficulty of maintenance: all cohort-specific models may need to be adjusted when a new DL algorithm is to be used, where training even a single model may require a non-negligible amount of computation, or when more data is added to some cohorts. In resolving the sub-optimal behavior of a universal cancer detection model trained on an aggregate of cohorts, we investigated how cohorts can be grouped to augment a dataset without increasing the number of models linearly with the number of cohorts. This study introduces several metrics which measure the morphological similarities between cohort pairs and demonstrates how the metrics can be used to control the trade-off between performance and the number of models.

Download Full-text

$$\textsf {GQA}_{\textsf {RDF}}$$: A Graph-Based Approach Towards Efficient SPARQL Query Answering

Database Systems for Advanced Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-030-59416-9_33 ◽

2020 ◽

pp. 551-568

Author(s):

Xi Wang ◽

Qianzhen Zhang ◽

Deke Guo ◽

Xiang Zhao ◽

Jianye Yang

Keyword(s):

Sparql Query ◽

Query Answering

Download Full-text

A Robust Optimization Approach of SQL-to-SPARQL Query Rewriting

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2019.0101173 ◽

2019 ◽

Vol 10 (11) ◽

Author(s):

Ahmed Abatal ◽

Mohamed Bahaj ◽

Soussi Nassima

Keyword(s):

Robust Optimization ◽

Sparql Query ◽

Query Rewriting ◽

Optimization Approach

Download Full-text

Acyclicity Notions for Existential Rules and Their Application to Query Answering in Ontologies

Journal of Artificial Intelligence Research ◽

10.1613/jair.3949 ◽

2013 ◽

Vol 47 ◽

pp. 741-808 ◽

Cited By ~ 30

Author(s):

B. Cuenca Grau ◽

I. Horrocks ◽

M. Krötzsch ◽

C. Kupke ◽

D. Magka ◽

...

Keyword(s):

Knowledge Representation ◽

Sufficient Conditions ◽

Query Answering ◽

Conjunctive Queries ◽

Practical Solution ◽

Ontology Language ◽

Lower Complexity ◽

The Given ◽

Theoretical Side

Answering conjunctive queries (CQs) over a set of facts extended with existential rules is a prominent problem in knowledge representation and databases. This problem can be solved using the chase algorithm, which extends the given set of facts with fresh facts in order to satisfy the rules. If the chase terminates, then CQs can be evaluated directly in the resulting set of facts. The chase, however, does not terminate necessarily, and checking whether the chase terminates on a given set of rules and facts is undecidable. Numerous acyclicity notions were proposed as sufficient conditions for chase termination. In this paper, we present two new acyclicity notions called model-faithful acyclicity (MFA) and model-summarising acyclicity (MSA). Furthermore, we investigate the landscape of the known acyclicity notions and establish a complete taxonomy of all notions known to us. Finally, we show that MFA and MSA generalise most of these notions. Existential rules are closely related to the Horn fragments of the OWL 2 ontology language; furthermore, several prominent OWL 2 reasoners implement CQ answering by using the chase to materialise all relevant facts. In order to avoid termination problems, many of these systems handle only the OWL 2 RL profile of OWL 2; furthermore, some systems go beyond OWL 2 RL, but without any termination guarantees. In this paper we also investigate whether various acyclicity notions can provide a principled and practical solution to these problems. On the theoretical side, we show that query answering for acyclic ontologies is of lower complexity than for general ontologies. On the practical side, we show that many of the commonly used OWL 2 ontologies are MSA, and that the number of facts obtained by materialisation is not too large. Our results thus suggest that principled development of materialisation-based OWL 2 reasoners is practically feasible.

Download Full-text

Query Answering with Guarded Existential Rules under Stable Model Semantics

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i03.5695 ◽

2020 ◽

Vol 34 (03) ◽

pp. 3017-3024

Author(s):

Hai Wan ◽

Guohui Xiao ◽

Chenglin Wang ◽

Xianqiao Liu ◽

Junhong Chen ◽

...

Keyword(s):

Answer Set Programming ◽

Query Answering ◽

Prototype System ◽

Stable Model ◽

Logic Programs ◽

Termination Problem ◽

Answer Set

In this paper, we study the problem of query answering with guarded existential rules (also called GNTGDs) under stable model semantics. Our goal is to use existing answer set programming (ASP) solvers. However, ASP solvers handle only finitely-ground logic programs while the program translated from GNTGDs by Skolemization is not in general. To address this challenge, we introduce two novel notions of (1) guarded instantiation forest to describe the instantiation of GNTGDs and (2) prime block to characterize the repeated infinitely-ground program translated from GNTGDs. Using these notions, we prove that the ground termination problem for GNTGDs is decidable. We also devise an algorithm for query answering with GNTGDs using ASP solvers. We have implemented our approach in a prototype system. The evaluation over a set of benchmarks shows encouraging results.

Download Full-text

Application of Variational Geometry to the Analysis of Mechanical Tolerances

15th Design Automation Conference: Volume 1 — Computer-Aided and Computational Design ◽

10.1115/detc1989-0016 ◽

1989 ◽

Author(s):

P. M. Martino ◽

G. A. Gabriele

Keyword(s):

Mechanical Design ◽

Worst Case Analysis ◽

Prototype System ◽

Tolerance Design ◽

Worst Case ◽

Geometric Tolerances ◽

Solid Models ◽

The Cost ◽

Selection Of

Abstract The proper selection of tolerances is an important part of mechanical design that can have a significant impact on the cost and quality of the final product. Yet, despite their importance, current techniques for tolerance design are rather primitive and often based on experience and trial and error. Better tolerance design methods have been proposed but are seldom used because of the difficulty in formulating the necessary design equations for practical problems. In this paper we propose a technique for the automatic formulation of the design equations, or design functions, which is based on the use of solid models and variational geometry. A prototype system has been developed which can model conventional and statistical tolernaces, and a limited set of geometric tolerances. The prototype system is limited to the modeling of single parts, but can perform both a worst case analysis and a statistical analysis. Results on several simple parts with known characteristics are presented which demonstrate the accuracy of the system and the types of analysis it can perform. The paper concludes with a discussion of extensions to the prototype system to a broader range of geometry and the handling of assemblies.

Download Full-text

Semantic Search Engine and Object Database Guidelines for Service Oriented Architecture Models

Technology Diffusion and Adoption ◽

10.4018/978-1-4666-2791-8.ch015 ◽

2013 ◽

pp. 225-250

Author(s):

Omar Shehab ◽

Ali Hussein Saleh Zolait

Keyword(s):

Search Engine ◽

Service Oriented Architecture ◽

Query Language ◽

Semantic Search ◽

Semantic Query ◽

Qualitative Survey ◽

Ontology Language ◽

Service Oriented ◽

The Cost ◽

Semantic Search Engine

In this paper, the authors propose a Semantic Search Engine, which retrieves software components precisely and uses techniques to store these components in a database, such as ontology technology. The engine uses semantic query language to retrieve these components semantically. The authors use an exploratory study where the proposed method is mapped between object-oriented concepts and web ontology language. A qualitative survey and interview techniques were used to collect data. The findings after implementing this research are a set of guidelines, a model, and a prototype to describe the semantic search engine system. The guidelines provided help software developers and companies reduce the cost, time, and risks of software development.

Download Full-text

Discussion

Advances in Medical Technologies and Clinical Practice - Clinical Data Mining for Physician Decision Making and Investigating Health Outcomes ◽

10.4018/978-1-61520-905-7.ch017 ◽

2010 ◽

pp. 346-351

Author(s):

Patricia Cerrito ◽

John Cerrito

Keyword(s):

Clinical Trials ◽

Linear Regression ◽

Large Datasets ◽

Cost Of Care ◽

P Value ◽

Confounding Factors ◽

Chi Square ◽

Sequential Treatments ◽

The Cost ◽

Improve Patient

Now that the data are more readily available for outcomes research and the techniques to analyze that data are available, we need to use the tools to investigate the total complexity of patient care. We should no longer rely upon basic tools while ignoring sequential treatments for patients with chronic diseases or the issue of patient compliance, and we can start investigating treatments from birth to death. It is no longer possible, with these large datasets, to rely on t-tests, chi-square statistics and simple linear regression. Without the luxury of clinical trials and randomizing patients into treatment versus control, there will always be confounding factors that should be considered in the data. In addition, large datasets almost guarantee that the p-value in a standard regression is statistically significant, so other methods of model adequacy must be used. If we do not start using outcomes data, we are missing crucial knowledge that can be used to improve patient outcomes while simultaneously reducing the cost of care. If we continue to use inferential statistical methods that were not designed to work with large datasets, we will not extract the information that is readily available in the outcomes datasets.

Download Full-text