Query Languages: Latest Research Papers

Complex event recognition (CER) has emerged as the unifying field for technologies that require processing and correlating distributed data sources in real time. CER finds applications in diverse domains, which has resulted in a large number of proposals for expressing and processing complex events. Existing CER languages lack a clear semantics, however, which makes them hard to understand and generalize. Moreover, there are no general techniques for evaluating CER query languages with clear performance guarantees. In this article, we embark on the task of giving a rigorous and efficient framework to CER. We propose a formal language for specifying complex events, called complex event logic (CEL), that contains the main features used in the literature and has a denotational and compositional semantics. We also formalize the so-called selection strategies, which had only been presented as by-design extensions to existing frameworks. We give insight into the language design trade-offs regarding the strict sequencing operators of CEL and selection strategies. With a well-defined semantics at hand, we discuss how to efficiently process complex events by evaluating CEL formulas with unary filters. We start by introducing a formal computational model for CER, called complex event automata (CEA), and study how to compile CEL formulas with unary filters into CEA. Furthermore, we provide efficient algorithms for evaluating CEA over event streams using constant time per event followed by output-linear delay enumeration of the results.
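
As a rough illustration of what a sequencing query with unary filters computes, the Python sketch below naively enumerates matches of a two-step pattern: a hot temperature reading followed, not necessarily immediately, by a dry humidity reading. The event schema, the thresholds, and the names `seq_matches`, `is_hot` and `is_dry` are hypothetical. This is not the CEA-based algorithm of the article: it keeps every earlier candidate in memory and gives neither constant time per event nor output-linear delay.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, object]          # hypothetical event schema: {"type": ..., "value": ...}
Filter = Callable[[Event], bool]   # unary filter: looks at one event at a time


def seq_matches(stream: Iterable[Event], first: Filter, second: Filter) -> Iterator[Tuple[int, int]]:
    """Yield (i, j) with i < j where event i passes `first` and event j passes `second`.

    Sequencing is non-contiguous: other events may occur between i and j.
    """
    candidates: List[int] = []     # positions of events that passed `first` so far
    for j, event in enumerate(stream):
        if second(event):
            for i in candidates:
                yield (i, j)
        if first(event):
            candidates.append(j)


def is_hot(e: Event) -> bool:
    return e["type"] == "T" and e["value"] > 40


def is_dry(e: Event) -> bool:
    return e["type"] == "H" and e["value"] < 25


stream = [
    {"type": "T", "value": 45},
    {"type": "H", "value": 60},
    {"type": "T", "value": 50},
    {"type": "H", "value": 20},
]
matches = list(seq_matches(stream, is_hot, is_dry))
print(matches)  # [(0, 3), (2, 3)]
```

Both hot readings pair with the single dry reading at position 3. A selection strategy, in the sense formalized in the abstract, would restrict which of these matches are reported.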

Balancing Expressiveness and Inexpressiveness in View Design

ACM Transactions on Database Systems ◽

10.1145/3488370 ◽

2021 ◽

Vol 46 (4) ◽

pp. 1-40

Author(s):

Michael Benedikt ◽

Pierre Bourhis ◽

Louis Jachiet ◽

Efthymia Tsamoura

Keyword(s):

Information Systems ◽

Query Languages ◽

Data Sources ◽

Data Publishing ◽

Common Mechanism ◽

Distributed Data ◽

Conjunctive Queries ◽

Derived Data ◽

Minimal Information

We study the design of data publishing mechanisms that allow a collection of autonomous distributed data sources to collaborate to support queries. A common mechanism for data publishing is via views: functions that expose derived data to users, usually specified as declarative queries. Our autonomy assumption is that the views must be on individual sources, but with the intention of supporting integrated queries. In deciding what data to expose to users, two considerations must be balanced. The views must be sufficiently expressive to support queries that users want to ask; this is the utility of the publishing mechanism. But there may also be some expressiveness restrictions. Here, we consider two restrictions: a minimal information requirement, saying that the views should reveal as little as possible while supporting the utility query, and a non-disclosure requirement, formalizing the need to prevent external users from computing information that data owners do not want revealed. We investigate the problem of designing views that satisfy both expressiveness and inexpressiveness requirements, for views in a restricted query language (conjunctive queries), and for arbitrary views.

Matrix Query Languages

ACM SIGMOD Record ◽

10.1145/3503780.3503782 ◽

2021 ◽

Vol 50 (3) ◽

pp. 6-19

Author(s):

Floris Geerts ◽

Thomas Muñoz ◽

Cristian Riveros ◽

Jan Van den Bussche ◽

Domagoj Vrgoč

Keyword(s):

Linear Algebra ◽

Data Analytics ◽

Query Language ◽

Query Languages ◽

Arithmetic Circuits ◽

Matrix Operations ◽

The Matrix

Due to the importance of linear algebra and matrix operations in data analytics, there has been a renewed interest in developing query languages that combine both standard relational operations and linear algebra operations. We survey aspects of the matrix query language MATLANG and extensions thereof, and connect matrix query languages to classical query languages and arithmetic circuits.
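
To make the connection between relational and linear algebra operations concrete, here is a minimal sketch in plain Python rather than in MATLANG syntax. It treats a binary relation as a 0/1 adjacency matrix, so that composing the relation with itself becomes a matrix product and a count aggregate becomes a product with an all-ones vector. The helper names `matmul` and `ones` and the example graph are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product; assumes len(a[0]) == len(b)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def ones(n: int) -> Matrix:
    """All-ones column vector of dimension n."""
    return [[1] for _ in range(n)]


# Adjacency matrix of the edge relation {(0, 1), (0, 2), (1, 2)}.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

# Entry (i, j) counts the paths of length two from i to j,
# i.e. the join of the edge relation with itself, with multiplicities.
two_step = matmul(A, A)

# Row sums: the out-degree of each node, a per-node count aggregate.
out_degree = matmul(A, ones(3))

print(two_step)    # [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
print(out_degree)  # [[2], [1], [0]]
```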

The Complexity of Counting Problems Over Incomplete Databases

ACM Transactions on Computational Logic ◽

10.1145/3461642 ◽

2021 ◽

Vol 22 (4) ◽

pp. 1-52

Author(s):

Marcelo Arenas ◽

Pablo Barceló ◽

Mikaël Monet

Keyword(s):

Polynomial Time ◽

Relational Databases ◽

Approximation Scheme ◽

Query Languages ◽

Complexity Classes ◽

Conjunctive Query ◽

Counting Problems ◽

Incomplete Databases ◽

Boolean Query ◽

The Impact

We study the complexity of various fundamental counting problems that arise in the context of incomplete databases, i.e., relational databases that can contain unknown values in the form of labeled nulls. Specifically, we assume that the domains of these unknown values are finite and, for a Boolean query q, we consider the following two problems: Given as input an incomplete database D, (a) return the number of completions of D that satisfy q; or (b) return the number of valuations of the nulls of D yielding a completion that satisfies q. We obtain dichotomies between #P-hardness and polynomial-time computability for these problems when q is a self-join–free conjunctive query and study the impact on the complexity of the following two restrictions: (1) every null occurs at most once in D (what is called Codd tables); and (2) the domain of each null is the same. Roughly speaking, we show that counting completions is much harder than counting valuations: For instance, while the latter is always in #P, we prove that the former is not in #P under some widely believed theoretical complexity assumption. Moreover, we find that both (1) and (2) can reduce the complexity of our problems. We also study the approximability of these problems and show that, while counting valuations always has a fully polynomial-time randomized approximation scheme (FPRAS), in most cases counting completions does not. Finally, we consider more expressive query languages and situate our problems with respect to known complexity classes.
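
The difference between problems (a) and (b) shows up even on a tiny instance. The brute-force Python sketch below is exponential in the number of nulls and is meant only to illustrate the definitions, not any algorithm from the article. It counts both quantities for a hypothetical Codd table with two nulls over the domain {a, b} and the Boolean query "there is an x with S(x) and T(x)". Encoding nulls as strings starting with "?" is an assumption of this example.

```python
from itertools import product

# Incomplete database: facts are (relation, arguments); "?1" and "?2" are labeled nulls.
D = {("S", ("?1",)), ("S", ("?2",)), ("T", ("a",))}
domains = {"?1": ["a", "b"], "?2": ["a", "b"]}


def q(db):
    """Boolean self-join-free conjunctive query: exists x. S(x) and T(x)."""
    s = {args[0] for rel, args in db if rel == "S"}
    t = {args[0] for rel, args in db if rel == "T"}
    return bool(s & t)


def apply_valuation(db, nu):
    """Replace every null by its value; the result is a set, so equal facts collapse."""
    return frozenset((rel, tuple(nu.get(v, v) for v in args)) for rel, args in db)


def count(db, domains, query):
    """Return (#valuations whose completion satisfies query, #distinct such completions)."""
    nulls = sorted(domains)
    n_valuations = 0
    completions = set()
    for values in product(*(domains[n] for n in nulls)):
        completion = apply_valuation(db, dict(zip(nulls, values)))
        if query(completion):
            n_valuations += 1
            completions.add(completion)
    return n_valuations, len(completions)


n_val, n_comp = count(D, domains, q)
print(n_val, n_comp)  # 3 2
```

Three valuations satisfy the query, but the valuations (a, b) and (b, a) produce the same completion, so only two distinct completions do.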

DESIGN AND EVALUATION OF A BIM-GIS INTEGRATED INFORMATION MODEL USING RDF GRAPH DATABASE

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-viii-4-w2-2021-175-2021 ◽

2021 ◽

Vol VIII-4/W2-2021 ◽

pp. 175-182

Author(s):

A.-H. Hor ◽

G. Sohn

Keyword(s):

Performance Metrics ◽

Graph Model ◽

Information Model ◽

Query Languages ◽

Semantic Integration ◽

Graph Database ◽

Graph Databases ◽

Rdf Graph ◽

Functional Components ◽

Integrated Information

Abstract. The semantic integration of BIM Industry Foundation Classes (IFC) and GIS City Geography Markup Language (CityGML) models is a milestone for many applications that involve both domains of knowledge. In this paper, we propose a system design architecture and an implementation of Extraction, Transformation and Loading (ETL) workflows that bring BIM and GIS models into an RDF graph database model. These workflows were created from functional components and ontological frameworks supporting the RDF query language SPARQL and the graph database query language Cypher. The paper aims to establish whether an RDF graph database is suitable for a BIM-GIS integrated information model, and it assesses the translation workflows and evaluates performance metrics of a BIM-GIS integrated data model managed in an RDF graph database. The process requires designing and developing various pipelines of workflows with semantic tools in order to get the data and its structure into an appropriate format and to demonstrate the potential of using RDF graph databases to integrate, manage and analyze information and relationships from both GIS and BIM models. The study also introduces the concept of Graph-Model occupancy indexes of nodes, attributes and relationships to measure query outputs and give insight into the data richness and performance of the resulting BIM-GIS semantically integrated model.

Translating synthetic natural language to database queries with a polyglot deep learning framework

Scientific Reports ◽

10.1038/s41598-021-98019-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Adrián Bazaga ◽

Nupur Gunwant ◽

Gos Micklem

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Natural Language ◽

User Interfaces ◽

Query Languages ◽

Database Queries ◽

Learning Framework ◽

The Creation ◽

To Come ◽

Multiple Domains

The number of databases as well as their size and complexity is increasing. This creates a barrier to use especially for non-experts, who have to come to grips with the nature of the data, the way it has been represented in the database, and the specific query languages or user interfaces by which data are accessed. These difficulties worsen in research settings, where it is common to work with many different databases. One approach to improving this situation is to allow users to pose their queries in natural language. In this work we describe a machine learning framework, Polyglotter, that in a general way supports the mapping of natural language searches to database queries. Importantly, it does not require the creation of manually annotated data for training and therefore can be applied easily to multiple domains. The framework is polyglot in the sense that it supports multiple different database engines that are accessed with a variety of query languages, including SQL and Cypher. Furthermore, Polyglotter supports multi-class queries. Good performance is achieved on both toy and real databases, as well as a human-annotated WikiSQL query set. Thus Polyglotter may help database maintainers make their resources more accessible.

60 Years of Databases

PROBLEMS IN PROGRAMMING ◽

10.15407/pp2021.03.040 ◽

2021 ◽

pp. 040-071

Author(s):

V.A. Reznichenko

Keyword(s):

Big Data ◽

Soviet Union ◽

Research And Development ◽

Relational Databases ◽

Rapid Development ◽

Conceptual Modeling ◽

Query Languages ◽

Active Object ◽

The Soviet Union ◽

Database Research

The article provides an overview of research and development of databases from their appearance in the 1960s to the present time. The following stages are distinguished: emergence and formation, rapid development, the era of relational databases, extended relational databases, post-relational databases, and big data. For the formation stage, the systems IDS, IMS, Total and Adabas are described. For the stage of rapid development, the ANSI/X3/SPARC database architecture, the CODASYL proposals, and the concepts and languages of conceptual modeling are highlighted. For the era of relational databases, the article covers the results of E. Codd's scientific activities, the theory of dependencies and normal forms, query languages, experimental research and development, optimization and standardization, and transaction management. The extended relational databases stage is devoted to describing temporal, spatial, deductive, active, object, distributed and statistical databases, array databases, and database machines and data warehouses. At the next stage, the problems of post-relational databases are discussed, namely NoSQL, NewSQL and ontological databases. The sixth stage is devoted to the causes of occurrence, characteristic properties, classification, principles of work, methods and technologies of big data. Finally, the last section provides a brief overview of database research and development in the Soviet Union.

An adaptive spark-based framework for querying large-scale NoSQL and relational databases

PLoS ONE ◽

10.1371/journal.pone.0255562 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0255562

Author(s):

Eman Khashan ◽

Ali Eldesouky ◽

Sally Elghamrawy

Keyword(s):

Big Data ◽

Data Storage ◽

Relational Databases ◽

Large Scale ◽

Query Languages ◽

Heterogeneous Data ◽

Query Execution ◽

Database Queries ◽

Nosql Databases ◽

Complex Queries

The growing popularity of big data analysis and cloud computing has created new big data management standards. Programmers may have to interact with a number of heterogeneous data stores, both SQL and NoSQL, depending on the information they are responsible for. Interacting with heterogeneous data models via numerous APIs and query languages imposes challenging tasks on multi-data processing developers. Indeed, complex queries concerning homogeneous data structures cannot currently be performed in a declarative manner when found in single data storage applications and therefore require additional development efforts. Many models have been presented to address complex queries via multistore applications. Some of these implement a complex, unified and fast model, while others are not efficient enough to handle this type of complex database query. This paper provides an automated, fast and easy unified architecture to solve simple and complex SQL and NoSQL queries over heterogeneous data stores (CQNS). The proposed framework can be used in cloud environments or for any big data application to automatically help developers manage basic and complicated database queries. CQNS consists of three layers: a matching selector layer, a processing layer, and a query execution layer. The matching selector layer is the heart of this architecture: five of the user queries are checked for a match against another five queries stored in a single engine in the architecture library. This is achieved through a proposed algorithm that directs the query to the right SQL or NoSQL database engine. Furthermore, CQNS deals with many NoSQL databases, such as MongoDB, Cassandra, Riak, CouchDB, and Neo4j. This paper presents a Spark framework that can handle both SQL and NoSQL databases. Benchmark datasets from four scenarios are used to evaluate the proposed CQNS for querying different NoSQL databases in terms of optimization process performance and query execution time. The results show that CQNS achieves the best latency and throughput among the compared systems.

Keyword-Based Knowledge Graph Exploration Based on Quadratic Group Steiner Trees

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/215 ◽

2021 ◽

Author(s):

Yuxuan Shi ◽

Gong Cheng ◽

Trung-Kien Tran ◽

Jie Tang ◽

Evgeny Kharlamov

Keyword(s):

Steiner Tree ◽

Exact Algorithm ◽

Query Languages ◽

Steiner Tree Problem ◽

Underlying Structure ◽

Graph Exploration ◽

Group Steiner Tree ◽

The Cost ◽

Semantic Distances ◽

Quadratic Group

Exploring complex structured knowledge graphs (KGs) is challenging for non-experts as it requires knowledge of query languages and the underlying structure of the KGs. Keyword-based exploration is a convenient paradigm, and computing a group Steiner tree (GST) as an answer is a popular implementation. Recent studies suggested improving the cohesiveness of an answer where entities have small semantic distances from each other. However, how to efficiently compute such an answer is open. In this paper, to model cohesiveness in a generalized way, the quadratic group Steiner tree problem (QGSTP) is formulated where the cost function extends GST with quadratic terms representing semantic distances. For QGSTP we design a branch-and-bound best-first (B3F) algorithm where we exploit combinatorial methods to estimate lower bounds for costs. This exact algorithm shows practical performance on medium-sized KGs.

TREE-BASED SEMANTIC ANALYSIS METHOD FOR NATURAL LANGUAGE PHRASE TO FORMAL QUERY CONVERSION

Radio Electronics Computer Science Control ◽

10.15588/1607-3274-2021-2-11 ◽

2021 ◽

pp. 105-113

Author(s):

A. A. Litvin ◽

V. Yu. Velychko ◽

V. V. Kaverynskyi

Keyword(s):

Natural Language ◽

Program Implementation ◽

Semantic Analysis ◽

Query Languages ◽

Performance Criteria ◽

Analysis Method ◽

Dialog System ◽

Natural Language Interface ◽

Minimum Number

Context. This work is devoted to the problem of constructing a natural language interface for ontological graph databases. The focus is on methods for converting natural language phrases into formal queries in the SPARQL and Cypher query languages. Objective. The goals of the work are to create a semantic analysis method that determines the semantic type of an input natural language phrase and extracts meaningful entities from it to initialize query template variables, to construct flexible query templates for the phrase types, and to develop a program implementation of the proposed technique. Method. A tree-based method was developed for determining the semantic type of a user's phrase and obtaining a set of terms from it to place into the appropriate slots of the most suitable formal query template. The proposed technique solves two tasks: determining the phrase type (which is the criterion for selecting the formal query template) and obtaining meaningful terms, which initialize the variables of the chosen template. In the current work, only interrogative and incentive user phrases are considered, i.e. ones that clearly ask the system to answer or to do something. It is assumed that the dialog or reference system under consideration uses a graph ontological database, which directly affects the formal query patterns: the resulting queries are to be in the SPARQL or Cypher query languages. The semantic analysis examples considered in this work are aimed primarily at inflective languages, especially Ukrainian and Russian, but the basic principles could suit most other languages. Results. The developed method for converting a natural language phrase to a formal query in SPARQL and Cypher has been implemented in software for Ukrainian and Norwegian languages using narrow-subject ontologies and tested against formal performance criteria. Conclusions. The proposed method allows the dialog system to select the most suitable query template quickly, in a minimum number of steps, and to extract informative entities from a natural language phrase despite the huge phrase variability in inflective languages. The experiments carried out have shown high precision and reliability of the constructed system and its potential for practical usage and further development.
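
The two tasks named in the abstract, determining the phrase type and filling template variables, can be illustrated with a deliberately simplified Python sketch. It swaps the tree-based semantic analysis for flat regular-expression matching on English phrases. The phrase types, the `ex:` predicates, and the function name `phrase_to_query` are all hypothetical and are not taken from the article.

```python
import re
from typing import Optional

# Hypothetical phrase types: each has a trigger pattern and a SPARQL query template.
# Doubled braces are literal braces in str.format; {term} is the template variable.
TEMPLATES = {
    "definition": (
        re.compile(r"^what is (?:an? |the )?(?P<term>.+?)\??$", re.IGNORECASE),
        'SELECT ?d WHERE {{ ?c rdfs:label "{term}" ; ex:definition ?d . }}',
    ),
    "parts": (
        re.compile(r"^what does (?:an? |the )?(?P<term>.+?) consist of\??$", re.IGNORECASE),
        'SELECT ?p WHERE {{ ?c rdfs:label "{term}" ; ex:hasPart ?p . }}',
    ),
}


def phrase_to_query(phrase: str) -> Optional[str]:
    """Pick the first template whose pattern matches and fill in the extracted term.

    Returns None when no phrase type matches. The term is not escaped,
    so this sketch must not be used on untrusted input.
    """
    text = phrase.strip()
    for _phrase_type, (pattern, template) in TEMPLATES.items():
        m = pattern.match(text)
        if m:
            return template.format(term=m.group("term").lower())
    return None


query = phrase_to_query("What is a query language?")
print(query)
# SELECT ?d WHERE { ?c rdfs:label "query language" ; ex:definition ?d . }
```

A flat pattern list like this breaks down quickly under free word order and rich inflection, which is the variability the abstract says the tree-based method is designed to handle.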

query languages
Recently Published Documents

A Formal Framework for Complex Event Recognition

Balancing Expressiveness and Inexpressiveness in View Design

Matrix Query Languages

The Complexity of Counting Problems Over Incomplete Databases

DESIGN AND EVALUATION OF A BIM-GIS INTEGRATED INFORMATION MODEL USING RDF GRAPH DATABASE

Translating synthetic natural language to database queries with a polyglot deep learning framework

60 Years of Databases

An adaptive spark-based framework for querying large-scale NoSQL and relational databases

Keyword-Based Knowledge Graph Exploration Based on Quadratic Group Steiner Trees

TREE-BASED SEMANTIC ANALYSIS METHOD FOR NATURAL LANGUAGE PHRASE TO FORMAL QUERY CONVERSION
