Most Probable Explanations for Probabilistic Database Queries

Author(s):  
İsmail İlkan Ceylan ◽  
Stefan Borgwardt ◽  
Thomas Lukasiewicz

Forming the foundations of large-scale knowledge bases, probabilistic databases have been widely studied in the literature. In particular, probabilistic query evaluation has been investigated intensively as a central inference mechanism. However, despite its power, query evaluation alone cannot extract all the relevant information encompassed in large-scale knowledge bases. To exploit this potential, we study two inference tasks, namely finding the most probable database and the most probable hypothesis for a given query. As natural counterparts of most probable explanations (MPE) and maximum a posteriori hypotheses (MAP) in probabilistic graphical models, they can be used in a variety of applications that involve prediction or diagnosis tasks. We investigate these problems relative to a variety of query languages, ranging from conjunctive queries to ontology-mediated queries, and provide a detailed complexity analysis.
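
To make the "most probable database" task concrete, here is a minimal brute-force sketch. It assumes a tuple-independent probabilistic database and enumerates every possible world, so it only works for tiny inputs. It is not the authors' method, and the facts, probabilities, and query below are hypothetical examples.

```python
from itertools import product

def world_probability(world, facts):
    """Probability of one world under tuple independence (an assumption)."""
    p = 1.0
    for fact, prob in facts.items():
        p *= prob if fact in world else 1.0 - prob
    return p

def most_probable_database(facts, query):
    """Return the most probable world that satisfies the query, by enumeration."""
    best_world, best_prob = None, 0.0
    names = list(facts)
    for bits in product([False, True], repeat=len(names)):
        world = frozenset(n for n, b in zip(names, bits) if b)
        if query(world):
            p = world_probability(world, facts)
            if p > best_prob:
                best_world, best_prob = world, p
    return best_world, best_prob

# Hypothetical facts with made-up probabilities.
facts = {
    ("Friend", "alice", "bob"): 0.9,
    ("Friend", "bob", "carol"): 0.4,
    ("Smoker", "bob"): 0.3,
}

def query(world):
    # Boolean conjunctive query: exists x, y such that Friend(x, y) and Smoker(y).
    return any(f[0] == "Friend" and ("Smoker", f[2]) in world for f in world)

best_world, best_prob = most_probable_database(facts, query)
```

The search is exponential in the number of facts; it only illustrates what is being maximized, not how to compute it efficiently.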

Author(s):  
Ismail Ilkan Ceylan ◽  
Adnan Darwiche ◽  
Guy Van den Broeck

Large-scale probabilistic knowledge bases are becoming increasingly important in academia and industry alike. They are constantly extended with new data, powered by modern information extraction tools that associate probabilities with database tuples. In this paper, we revisit the semantics underlying such systems. In particular, the closed-world assumption of probabilistic databases, that facts not in the database have probability zero, clearly conflicts with their everyday use. To address this discrepancy, we propose an open-world probabilistic database semantics, which relaxes the probabilities of open facts to default intervals. For this open-world setting, we lift the existing data complexity dichotomy of probabilistic databases, and propose an efficient evaluation algorithm for unions of conjunctive queries. We also show that query evaluation can become harder for non-monotone queries.
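
The open-world idea can be illustrated with a small sketch. It assumes tuple independence, a monotone Boolean query, and a default interval [0, lam] for each open fact; for monotone queries the lower and upper query probabilities are reached by setting all open facts to 0 and to lam, respectively. The enumeration below is a naive illustration, not the efficient algorithm the abstract refers to, and all names and numbers are hypothetical.

```python
from itertools import product

def query_probability(facts, query):
    """Sum the probabilities of all worlds that satisfy the query."""
    names = list(facts)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = frozenset(n for n, b in zip(names, bits) if b)
        if query(world):
            p = 1.0
            for n, b in zip(names, bits):
                p *= facts[n] if b else 1.0 - facts[n]
            total += p
    return total

def open_world_bounds(known, open_facts, lam, query):
    """Probability interval of a monotone query when open facts range over [0, lam]."""
    lower = query_probability({**known, **{f: 0.0 for f in open_facts}}, query)
    upper = query_probability({**known, **{f: lam for f in open_facts}}, query)
    return lower, upper

# Hypothetical data: two facts about smokers are known or open.
known = {
    ("Friend", "alice", "bob"): 0.9,
    ("Friend", "bob", "carol"): 0.4,
    ("Smoker", "bob"): 0.3,
}
open_facts = [("Smoker", "carol")]  # not in the database, so "open"

def query(world):
    # exists x, y such that Friend(x, y) and Smoker(y)
    return any(f[0] == "Friend" and ("Smoker", f[2]) in world for f in world)

lower, upper = open_world_bounds(known, open_facts, 0.5, query)
```

Under the closed-world assumption only the lower value would be reported; the interval shows how much an unlisted fact could change the answer. The endpoint substitution is not valid for non-monotone queries.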


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255562
Author(s):  
Eman Khashan ◽  
Ali Eldesouky ◽  
Sally Elghamrawy

The growing popularity of big data analysis and cloud computing has created new big data management standards. Programmers may have to interact with several heterogeneous data stores, both SQL and NoSQL, depending on the information they are responsible for. Interacting with heterogeneous data models via numerous APIs and query languages imposes challenging tasks on multi-data processing developers. Indeed, complex queries concerning homogenous data structures cannot currently be performed in a declarative manner when found in single data storage applications and therefore require additional development efforts. Many models have been presented to address complex queries via multistore applications. Some of these models implement a complex, unified, and fast model, while others are not efficient enough to solve this type of complex database query. This paper provides an automated, fast, and easy unified architecture to solve simple and complex SQL and NoSQL queries over heterogeneous data stores (CQNS). The proposed framework can be used in cloud environments or for any big data application to automatically help developers manage basic and complicated database queries. CQNS consists of three layers: a matching selector layer, a processing layer, and a query execution layer. The matching selector layer is the heart of this architecture, in which five of the user queries are examined to see whether they match another five queries stored in a single engine in the architecture library. This is achieved through a proposed algorithm that directs the query to the right SQL or NoSQL database engine. Furthermore, CQNS deals with many NoSQL databases, such as MongoDB, Cassandra, Riak, CouchDB, and Neo4j. This paper presents a Spark framework that can handle both SQL and NoSQL databases. Benchmark datasets from four scenarios are used to evaluate the proposed CQNS for querying different NoSQL databases in terms of optimization process performance and query execution time. The results show that CQNS achieves the best latency and throughput among the compared systems.


Author(s):  
Stefan Borgwardt ◽  
İsmail İlkan Ceylan ◽  
Thomas Lukasiewicz

Large-scale knowledge bases are at the heart of modern information systems. Their knowledge is inherently uncertain, and hence they are often materialized as probabilistic databases. However, probabilistic database management systems typically lack the capability to incorporate implicit background knowledge and, consequently, fail to capture some intuitive query answers. Ontology-mediated query answering is a popular paradigm for encoding commonsense knowledge, which can provide more complete answers to user queries. We propose a new data model that integrates the paradigm of ontology-mediated query answering with probabilistic databases, employing a log-linear probability model. We compare our approach to existing proposals, and provide supporting computational results.


Author(s):  
Tal Friedman ◽  
Guy Van den Broeck

Increasing amounts of available data have led to a heightened need for representing large-scale probabilistic knowledge bases. One approach is to use a probabilistic database, a model with strong assumptions that allow for efficiently answering many interesting queries. Recent work on open-world probabilistic databases strengthens the semantics of these probabilistic databases by discarding the assumption that any information not present in the data must be false. While intuitive, these semantics are not sufficiently precise to give reasonable answers to queries. We propose overcoming these issues by using constraints to restrict this open world. We provide an algorithm for one class of queries, and establish a basic hardness result for another. Finally, we propose an efficient and tight approximation for a large class of queries. 


2015 ◽  
Author(s):  
Stefan Borgwardt ◽  
Marcel Lippmann ◽  
Veronika Thost

2021 ◽  
Vol 11 (9) ◽  
pp. 3754
Author(s):  
René Reiss ◽  
Frank Hauser ◽  
Sven Ehlert ◽  
Michael Pütz ◽  
Ralf Zimmermann

While fast and reliable analytical results are crucial for first responders to make adequate decisions, these can be difficult to obtain, especially at large-scale clandestine laboratories. To overcome this issue, multiple techniques at different levels of complexity are available. Besides their level of complexity, these techniques also differ in their information value. In this publication, three techniques that can be applied for on-site analysis are compared. They range from techniques with a simple yes-or-no response to sophisticated ones that yield complex information about a sample. The three evaluated techniques are immunoassay drug tests, representing systems that are easy to handle and fast to explain; ion mobility spectrometry, as state-of-the-art equipment that needs training and experience prior to use; and ambient pressure laser desorption, a possible future technique that is currently under development and requires a highly skilled operator. In addition to the measurement of validation parameters, real case samples are investigated to obtain practically relevant information about the capabilities and limitations of these techniques for on-site operations. The results demonstrate that, in general, all techniques deliver valid results, but the bandwidth of information varies widely between the investigated techniques.


2011 ◽  
Vol 19 (4) ◽  
pp. 781-794 ◽  
Author(s):  
Jeong Euy Park ◽  
Chern-En Chiang ◽  
Muhammad Munawar ◽  
Gia Khai Pham ◽  
Apichard Sukonthasarn ◽  
...  

Background: Treatment of hypercholesterolaemia in Asia is rarely evaluated on a large scale, and data on treatment outcome are scarce. The Pan-Asian CEPHEUS study aimed to assess low-density lipoprotein cholesterol (LDL-C) goal attainment among patients on lipid-lowering therapy. Methods: This survey was conducted in eight Asian countries. Hypercholesterolaemic patients aged ≥18 years who had been on lipid-lowering treatment for ≥3 months (stable medication for ≥6 weeks) were recruited, and lipid concentrations were measured. Demographic and other clinically relevant information were collected, and the cardiovascular risk of each patient was determined. Definitions and criteria set by the updated 2004 National Cholesterol Education Program guidelines were applied. Results: In this survey, 501 physicians enrolled 8064 patients, of whom 7281 were included in the final analysis. The mean age was 61.0 years, 44.4% were female, and 85.1% were on statin monotherapy. LDL-C goal attainment was reported in 49.1% of patients overall, including 51.2% of primary and 48.7% of secondary prevention patients, and 36.6% of patients with familial hypercholesterolaemia. The LDL-C goal was attained in 75.4% of moderate risk, 55.4% of high risk, and only 34.9% of very high-risk patients. Goal attainment was directly related to age and inversely related to cardiovascular risk and baseline LDL-C. Conclusion: A large proportion of Asian hypercholesterolaemic patients on lipid-lowering drugs are not at recommended LDL-C levels and remain at risk for cardiovascular disease. Given the proven efficacy of lipid-lowering drugs in the reduction of LDL-C, there is room for further optimization of treatments to maximize benefits and improve outcomes.


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Kefaya Qaddoum ◽  
E. L. Hines ◽  
D. D. Iliescu

In the area of greenhouse operation, yield prediction still relies heavily on human expertise. This paper proposes an automatic tomato yield predictor to assist human operators in anticipating weekly fluctuations more effectively and in avoiding the problems of both overdemand and overproduction that arise when the yield cannot be predicted accurately. The parameters used by the predictor consist of environmental variables inside the greenhouse, namely, temperature, CO2, vapour pressure deficit (VPD), and radiation, as well as past yield. Greenhouse environment data and crop records from a large-scale commercial operation, Wight Salads Group (WSG) in the Isle of Wight, United Kingdom, collected during the period 2004 to 2008, were used to model tomato yield using an intelligent system called “Evolving Fuzzy Neural Network” (EFuNN). Our results show that the EFuNN model predicted weekly fluctuations of the yield with an average accuracy of 90%. The contribution suggests that multiple EFuNNs can be mapped to respective task-oriented rule-sets, giving rise to adaptive knowledge bases that could assist growers in the control of tomato supplies and, more generally, could inform decision making concerning overall crop management practices.


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 146 ◽  
Author(s):  
Guanming Wu ◽  
Eric Dawson ◽  
Adrian Duong ◽  
Robin Haw ◽  
Lincoln Stein

High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network and human curated pathways from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.

