relational algebra
Recently Published Documents

Abstract: The success of NoSQL DBMSs has pushed the adoption of polyglot storage systems that take advantage of the best characteristics of different technologies and data models. While operational applications benefit greatly from this choice, analytical applications suffer from the absence of schema consistency, not only between different DBMSs but within a single NoSQL system as well. In this context, the discipline of data science is steering analysts away from traditional data warehousing and toward a more flexible and lightweight approach to data analysis. The idea is to perform OLAP analyses in a pay-as-you-go manner across heterogeneous schemas and data models, where the integration is progressively carried out by the user as the available data is explored. In this paper, we propose an approach to support data analysis within a high-variety multistore, with heterogeneous schemas and overlapping records. Our approach supports relational, document, wide-column, and key-value data models by automatically handling both data model and schema heterogeneity through a dataspace layer on top of the underlying DBMSs. The expressiveness we enable corresponds to GPSJ (generalized projection, selection, join) queries, which are the most common class of queries in OLAP applications. We rely on nested relational algebra to define a cross-database execution plan. The system has been prototyped on Apache Spark.
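
As a rough illustration of what a GPSJ query over heterogeneous schemas involves, the sketch below unifies records from two hypothetical stores, joins them with a dimension table, applies a selection, and aggregates per group. The field names, the `MAPPING` table, and the `unify` and `gpsj` helpers are all assumptions made for this example and are not taken from the paper. The prototype described above runs on Apache Spark and also handles overlapping records, which this plain-Python sketch does not attempt.

```python
from collections import defaultdict

# Hypothetical records from two stores that name the same attributes differently.
orders_relational = [
    {"order_id": 1, "cust_id": "c1", "amount": 10.0},
    {"order_id": 2, "cust_id": "c2", "amount": 5.0},
]
orders_document = [
    {"id": 3, "customer": "c1", "total": 7.5},
]
customers = [
    {"cust_id": "c1", "country": "IT"},
    {"cust_id": "c2", "country": "FR"},
]

# Assumed mapping from source-specific field names to unified names.
MAPPING = {"id": "order_id", "customer": "cust_id", "total": "amount"}


def unify(record):
    """Rename source-specific fields to the unified names."""
    return {MAPPING.get(key, key): value for key, value in record.items()}


def gpsj(facts, dims, key, predicate, group_by, measure):
    """Join facts with dims on `key`, filter rows, then sum `measure` per group."""
    index = {dim[key]: dim for dim in dims}
    totals = defaultdict(float)
    for fact in facts:
        dim = index.get(fact[key])
        if dim is None:
            continue  # inner join: drop facts without a matching dimension row
        row = {**fact, **dim}
        if predicate(row):
            totals[row[group_by]] += row[measure]
    return dict(totals)


all_orders = [unify(r) for r in orders_relational + orders_document]
result = gpsj(
    all_orders, customers, "cust_id",
    lambda row: row["amount"] > 6, "country", "amount",
)
```

Here `result` holds the total order amount per country for orders above the threshold, computed over both sources at once.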

The Tensor-Relational Algebra, and Other Ideas in Machine Learning System Design

33rd International Conference on Scientific and Statistical Database Management ◽ 10.1145/3468791.3472262 ◽ 2021

Author(s): Chris Jermaine

Keyword(s): Machine Learning ◽ System Design ◽ Learning System ◽ Relational Algebra

PATSQL

Proceedings of the VLDB Endowment ◽ 10.14778/3476249.3476253 ◽ 2021 ◽ Vol 14 (11) ◽ pp. 1937-1949

Author(s): Keita Takenouchi ◽ Takashi Ishio ◽ Joji Okada ◽ Yuji Sakata

Keyword(s): Data Analysis ◽ Projection Operator ◽ Computational Cost ◽ Relational Algebra ◽ Propagation Mechanism ◽ Prior Work ◽ Top Down ◽ Transformation Rules ◽ Input And Output ◽ Programming By Example

SQL is one of the most popular tools for data analysis, and it is now used by an increasing number of users without expertise in databases. Several studies have proposed programming-by-example approaches to help such non-experts write correct SQL queries. While existing methods support a variety of SQL features such as aggregation and nested queries, they suffer from a significant increase in computational cost as the scale of the example tables increases. In this paper, we propose an efficient algorithm that uses properties known in relational algebra to synthesize SQL queries from input and output tables. Our key insight is that a projection operator in a program sketch can be lifted above other operators by applying transformation rules in relational algebra, while preserving the semantics of the program. This enables quick inference of the appropriate columns in the projection operator, which is an essential component of synthesis but causes combinatorial explosion in prior work. We also introduce a novel form of constraints and its top-down propagation mechanism for efficient sketch completion. We implemented this algorithm in our tool PATSQL and evaluated it on 226 queries from prior benchmarks and Kaggle's tutorials. As a result, PATSQL solved 68% of the benchmarks and found 89% of the solutions within a second. Our tool is available at https://naist-se.github.io/patsql/.
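
The idea of lifting a projection above another operator can be illustrated with the standard rule that a projection commutes with a selection whose predicate only uses columns that survive the projection. The sketch below is only a demonstration of that general rule on bag-valued tables; it is not PATSQL's synthesis algorithm, and the `select` and `project` helpers and the sample table are hypothetical.

```python
def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Relational projection with bag semantics: keep only the given columns."""
    return [{col: row[col] for col in columns} for row in rows]


table = [
    {"name": "a", "dept": "x", "salary": 10},
    {"name": "b", "dept": "y", "salary": 30},
    {"name": "c", "dept": "x", "salary": 20},
]


def high_salary(row):
    return row["salary"] >= 20


# Plan 1: an early projection has to guess which columns later operators need.
early = project(
    select(project(table, ["name", "salary"]), high_salary),
    ["name"],
)

# Plan 2: the projection is lifted to the top, so the column choice is made last.
lifted = project(select(table, high_salary), ["name"])
```

Both plans return the same rows, which is why the column choice can be deferred until the rest of the sketch is fixed.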

Application of semantic models and criteria equivalence of data to increase efficiency functioning of economic systems

Vehicle and Electronics Innovative Technologies ◽ 10.30977/veit.2021.19.0.41 ◽ 2021 ◽ pp. 41-46

Author(s): Ganna Pliekhova ◽ Olena Alisejko ◽ Zoia Kochuieva

Keyword(s): Modern Society ◽ Subject Area ◽ Relational Algebra ◽ Relational Model ◽ Practical Significance ◽ Model Parameters ◽ Mathematical Methods ◽ Semantic Equivalence ◽ Subject Areas ◽ Modern Information

Problem. In modern society, the role of modeling as a way of understanding objects with complex structures is growing, owing to the impossibility or undesirability of experimenting on real objects. The paper considers the problem of developing models and criteria for the semantic equivalence of data in relational databases when the data are lexically ambiguous. Modeling was initially applied in "well" studied subject areas, for which the basic laws of object interaction were already known. This knowledge made it possible to fix a priori the class of models used for the subject area and to reduce the task to fitting the model parameters to the available experimental data. A fundamental change in the modeling scheme occurred during the transition to modeling systems for "weakly" formalized subject areas, where the structure itself and the class of applicable models must be refined in the course of research. The widespread use of relational databases in a wide variety of applications shows that the relational data model is sufficient for modeling domains. Results. The purpose of developing the criteria is to prevent relational algebra operations on attributes with lexical and semantic ambiguity. The methods and criteria are developed using mathematical methods and modern information technology. The scientific novelty lies in solving the problem of semantic comparability of the attributes of relations by means of the relational model, which makes it possible to prevent relational algebra operations that would destroy data because of ambiguity in the lexical and semantic meanings of attribute names. The practical significance lies in the development of methods for organizing access to data in large subject areas, which, together with the efficiency of their processing, serve as the foundation of the modern information industry; the approach also normalizes the vocabulary used to describe the subject area and coordinates management tasks within a single approach.

Workload Optimization by Horizontal Aggregation in SQL for Data Mining Analysis

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽ 10.32628/cseit217263 ◽ 2021 ◽ pp. 304-309

Author(s): Prasanna M. Rathod ◽ Prof. Dr. Anjali B. Raut

Keyword(s): Data Mining ◽ Relational Algebra ◽ Data Migration ◽ Data Sets ◽ Data Set ◽ Application Performance ◽ Data Mining Algorithms ◽ Mining Project ◽ Pivot Methods ◽ Mining Algorithms

Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. Existing SQL aggregations are limited for preparing data sets because they return one column per aggregated group. In general, significant manual effort is required to build data sets where a horizontal layout is required. We propose simple, yet powerful, methods to generate SQL code that returns aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets with a horizontal denormalized layout (e.g., point-dimension, observation-variable, instance-feature), which is the standard layout required by most data mining algorithms. We propose three fundamental methods to evaluate horizontal aggregations: (1) CASE, which exploits the programming CASE construct; (2) SPJ, which is based on standard relational algebra operators (SPJ queries); and (3) PIVOT, which uses the PIVOT operator offered by some DBMSs. Experiments with large tables compare the proposed query evaluation methods. Our CASE method has similar speed to the PIVOT operator and is much faster than the SPJ method. In general, the CASE and PIVOT methods exhibit linear scalability, whereas the SPJ method does not. For query optimization, the distance computation and nearest-cluster assignment in k-means are implemented in SQL. Workload balancing is the assignment of work to processors in a way that maximizes application performance. The process of load balancing can be generalized into four basic steps: 1. Monitoring processor load and state; 2. Exchanging workload and state information between processors; 3. Decision making; 4. Data migration. The decision phase is triggered when a load imbalance is detected, in order to calculate the optimal data redistribution. In the fourth and last phase, data migrates from overloaded processors to under-loaded ones.
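
The CASE method described above can be sketched as a small SQL generator that emits one `SUM(CASE ...)` column per pivot value. This is a minimal sketch under stated assumptions, not the authors' code: the `horizontal_sum_sql` helper, the `sales` table, and its columns are hypothetical, SQLite is used only because it ships with Python, and `ELSE 0` is an assumed choice for groups with no matching rows.

```python
import sqlite3


def horizontal_sum_sql(table, group_col, pivot_col, value_col, pivot_values):
    """Build a SELECT with one SUM(CASE ...) column per pivot value.

    Identifiers and pivot values are interpolated directly, so they must come
    from trusted input; this is acceptable only for a sketch.
    """
    cases = ", ".join(
        f"SUM(CASE WHEN {pivot_col} = '{value}' THEN {value_col} ELSE 0 END) "
        f"AS {value_col}_{value}"
        for value in pivot_values
    )
    return (
        f"SELECT {group_col}, {cases} FROM {table} "
        f"GROUP BY {group_col} ORDER BY {group_col}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("s1", "q1", 10), ("s1", "q2", 20), ("s2", "q1", 5), ("s1", "q1", 1)],
)

sql = horizontal_sum_sql("sales", "store", "quarter", "amount", ["q1", "q2"])
rows = conn.execute(sql).fetchall()  # one row per store, one column per quarter
```

Each result row now carries all quarterly totals for one store side by side, which is the horizontal layout the abstract refers to.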

Tensor relational algebra for distributed machine learning system design

Proceedings of the VLDB Endowment ◽ 10.14778/3457390.3457399 ◽ 2021 ◽ Vol 14 (8) ◽ pp. 1338-1350

Author(s): Binhang Yuan ◽ Dimitrije Jankov ◽ Jia Zou ◽ Yuxin Tang ◽ Daniel Bourgeois ◽ ...

Keyword(s): Machine Learning ◽ Empirical Study ◽ System Design ◽ High Efficiency ◽ Learning Systems ◽ Learning System ◽ Relational Algebra ◽ Activation Functions ◽ Distributed Environment ◽ Distributed Machine Learning

We consider the question: what is the abstraction that should be implemented by the computational engine of a machine learning system? Current machine learning systems typically push whole tensors through a series of compute kernels such as matrix multiplications or activation functions, where each kernel runs on an AI accelerator (ASIC) such as a GPU. This implementation abstraction provides little built-in support for ML systems to scale past a single machine, or for handling large models with matrices or tensors that do not easily fit into the RAM of an ASIC. In this paper, we present an alternative implementation abstraction called the tensor relational algebra (TRA). The TRA is a set-based algebra based on the relational algebra. Expressions in the TRA operate over binary tensor relations, where keys are multi-dimensional arrays and values are tensors. The TRA is easily executed with high efficiency in a parallel or distributed environment, and amenable to automatic optimization. Our empirical study shows that the optimized TRA-based back-end can significantly outperform alternatives for running ML workflows in distributed clusters.
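
To make the idea of a binary tensor relation more concrete, the sketch below stores a matrix as a mapping from block keys to small dense blocks and expresses matrix multiplication as a join on the shared index followed by an aggregation. It is only an illustration of the general idea: the `tr_matmul` name, the use of integer tuples as keys, and the pure-Python kernels are assumptions of this example, not the TRA operators defined in the paper.

```python
def matmul(a, b):
    """Dense kernel: multiply two small matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Dense kernel: element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def tr_matmul(left, right):
    """Join blocks where left's column key equals right's row key,
    multiply each matched pair, and sum the products per output key (i, j)."""
    out = {}
    for (i, k), left_block in left.items():
        for (k2, j), right_block in right.items():
            if k != k2:
                continue  # join condition on the shared block index
            product = matmul(left_block, right_block)
            out[(i, j)] = add(out[(i, j)], product) if (i, j) in out else product
    return out


# A 2x4 matrix split into two 2x2 blocks, keyed by (block_row, block_col).
A = {(0, 0): [[1, 2], [5, 6]], (0, 1): [[3, 4], [7, 8]]}
# A 4x2 matrix split into two 2x2 blocks.
B = {(0, 0): [[1, 0], [0, 1]], (1, 0): [[1, 0], [0, 1]]}

C = tr_matmul(A, B)
```

Because each matched pair of blocks is multiplied independently, the join step could in principle be distributed across workers, with the aggregation combining partial products per output key.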

Development of the structure of information system for supporting the activity of nonprofit horticultural partnerships

Программные системы и вычислительные методы ◽ 10.7256/2454-0714.2021.3.35834 ◽ 2021 ◽ pp. 25-39

Author(s): Ol'ga Viktorovna Sviridova ◽ Aleksandr Aleksandrovich Rybanov ◽ Evgeniya Mikhailovna Filippova

Keyword(s): Management Accounting ◽ Balance Sheet ◽ Relational Algebra ◽ Automated System ◽ Information Support ◽ Economic Activities ◽ Suggested Approach ◽ Software Products ◽ The Subject ◽ Management Reporting

The developed automated system of information support for nonprofit horticultural partnerships (NHP) is intended to automate management accounting of the economic activities of NHP. Effective management of nonprofit horticultural partnerships requires complete, accurate, objective, and timely economic information, which can be obtained through management accounting of the economic activity of NHP. The subject of this research is the methods for automating the control, monitoring, and preparation of management reporting of NHP. The object of this research is information systems operating within a "client-server" architecture. The research methods include the apparatus of relational algebra, set theory, optimization, and mathematical statistics. It is noted that many NHP operate in a so-called "manual mode": employees fill in all necessary documents by hand and perform all calculations with a calculator. This substantiates the relevance of the research. Based on a comparison of analogous software products using the Saaty method, the software "Info-Accountant for NHP" is chosen as a prototype. The authors determine and describe the main algorithms of the developed system, whose distinguishing features are the formation of the balance sheet and of reports on its implementation, the calculation of membership fees, and a subsystem of NHP reference books (owners of land plots, streets, tariffs, expenditures, etc.). The output data is presented as a chart on the display form of the report subsystem. The scientific novelty lies in the suggested approach to the automation of accounting: the development of a forecast of budget replenishment based on previous periods.

N-Tuple Algebra as a Generalized Theory of Relations

Encyclopedia of Information Science and Technology, Fifth Edition - Advances in Information Quality and Management ◽ 10.4018/978-1-7998-3479-3.ch048 ◽ 2021 ◽ pp. 685-700

Author(s): Boris A. Kulik ◽ Alexander Y. Fridman

Keyword(s): Artificial Intelligence ◽ Mathematical Model ◽ General Theory ◽ Relational Algebra ◽ Heterogeneous Information ◽ Mathematical Objects ◽ Knowledge Models ◽ Analysis Methods ◽ Modeling Uncertainties ◽ Universal Structure

In information technologies, the analysis of heterogeneous information often requires unifying the presentation forms and processing procedures for such data. To solve this problem, one needs a universal structure that allows various data and knowledge models to be reduced to a single mathematical model with unified analysis methods. Such a universal structure is the relation, which is mainly associated with relational algebra. However, relations can model mathematical objects that at first glance appear quite different, such as graphs, networks, artificial intelligence structures, predicates, and logical formulas. The representation and analysis of such structures and models requires more expressive means and methods than relational algebra provides. So, with a view to developing a general theory of relations, the authors propose n-tuple algebra (NTA), which allows a wide set of logical problems to be formalized (deductive, abductive, and modified reasoning; modeling uncertainties; and so on). This paper considers matters of metrization and clustering for NTA objects with ordered domains of attributes.

Model of indicator of linearization of user interfaces of automated workplaces of operational units of emergency services

Technology of technosphere safety ◽ 10.25257/tts.2021.3.93.29-41 ◽ 2021 ◽ Vol 93 ◽ pp. 29-41

Author(s): S. V. Sokolov ◽ A. N. Morozov

Keyword(s): User Interface ◽ User Interfaces ◽ Visual Information ◽ Emergency Services ◽ Human Perception ◽ Relational Algebra ◽ Interface Elements ◽ Emergency Rescue ◽ Pin Configuration

Introduction. The article presents a model of one possible quality indicator for the user interfaces (PIN) of automated workplaces in the operational units of emergency and emergency rescue services (EiASS): the linearization indicator, which takes into account the psychophysiological features of human perception of visual information. Goals and objectives. To reduce the response time of EiASS operational units by reducing their dispatching time, and to develop a mathematical model and an algorithm for calculating the linearization indicator of PIN elements (EPIN), which makes it possible to estimate the dispatch time from the relative location of the elements on the monitor. Methods. Methods of set theory and relational algebra were used to construct the PIN model and the algorithm for calculating the linearization indicator. To describe the PIN configuration, the concepts of an archipelago and a frame of interface elements are introduced. Results and discussion. The success of EiASS actions largely depends on the dispatch time, during which the required number of units is determined and sent. The time spent on solving dispatching tasks is therefore one of the most common and objective indicators of PIN quality: the best of the investigated automated workplaces is the one for which this time is shortest. However, the time indicator conveys only relative time, that is, the operating time of one automated workplace relative to another. It gives no idea of the absolute time that an abstract automated workplace with an optimal EPIN configuration would need to solve the problem. For this reason, an indicator that is sensitive to the PIN configuration has been developed. The indicator explains why one automated workplace is better than another and can be used to optimize the layout of the PIN. Conclusions. Based on the proposed model, an algorithm is developed for calculating numerical values of the linearization indicator of user interface elements, which is sensitive to their size and relative position on the monitor of the EiASS operator. This makes it possible to optimize the user interface by the criterion of task-solving time and, accordingly, to reduce the dispatching time of EiASS operational units. Keywords: operational units, user interface, quality, linearization indicator, navigation, sets, archipelago, frame

Unleashing the power of querying streaming data in a temporal database world: A relational algebra approach

A dataspace-based framework for OLAP analyses in a high-variety multistore

The Tensor-Relational Algebra, and Other Ideas in Machine Learning System Design

PATSQL

Application of semantic models and criteria equivalence of data to increase efficiency functioning of economic systems

Workload Optimization by Horizontal Aggregation in SQL for Data Mining Analysis

Tensor relational algebra for distributed machine learning system design

Development of the structure of information system for supporting the activity of nonprofit horticultural partnerships

N-Tuple Algebra as a Generalized Theory of Relations

Model of indicator of linearization of user interfaces of automated workplaces of operational units of emergency services
