Data Dependencies
Recently Published Documents


TOTAL DOCUMENTS: 243 (five years: 74)
H-INDEX: 22 (five years: 2)

2022 ◽  
Author(s):  
Andrey Chusov
Keyword(s):  

The paper presents algorithms for parallel and vectorized full-word addition of big unsigned integers with carry propagation. Because of that propagation, software parallelization and vectorization of non-polynomial big-integer addition have long been considered impractical due to data dependencies between the digits of the operands. The presented algorithms are based on parallel and vectorized detection of carry origins within the elements of vector operands, masking the bits that correspond to those elements, and a subsequent scalar addition of the resulting integers. The acquired bits can then be taken into account to adjust the sum using the Kogge-Stone method.

Essentially, the paper formalizes and experimentally verifies a parallel and vectorized implementation of carry-lookahead adders applied at arbitrary data granularity. This approach is noticeably beneficial for manycore and CUDA implementations, and for vectorized implementations using AVX-512 with masked instructions.
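The carry-resolution scheme the abstract describes can be sketched in a few lines. The following is a simplified scalar Python illustration, not the paper's vectorized implementation: per-limb sums and generate/propagate flags are computed independently (the part that maps to SIMD lanes or CUDA threads), and a Kogge-Stone parallel prefix then combines the flags to resolve the carries.

```python
MASK = (1 << 64) - 1  # 64-bit limbs

def add_bignum(a_limbs, b_limbs):
    """Add two big integers given as equal-length lists of 64-bit limbs
    (least significant first). Carries are resolved with a Kogge-Stone
    style parallel prefix instead of a serial ripple."""
    n = len(a_limbs)
    # Step 1: independent per-limb sums (the vectorizable part).
    s = [(a + b) & MASK for a, b in zip(a_limbs, b_limbs)]
    g = [int(a + b > MASK) for a, b in zip(a_limbs, b_limbs)]  # limb generates a carry
    p = [int(si == MASK) for si in s]                          # limb propagates an incoming carry
    # Step 2: Kogge-Stone parallel prefix over (generate, propagate) pairs.
    d = 1
    while d < n:
        g = [g[i] | (p[i] & g[i - d]) if i >= d else g[i] for i in range(n)]
        p = [p[i] & p[i - d] if i >= d else p[i] for i in range(n)]
        d *= 2
    # Step 3: the carry into limb i is the combined generate of limbs 0..i-1.
    carry_in = [0] + g[:-1]
    return [(si + ci) & MASK for si, ci in zip(s, carry_in)], g[-1]
```

Each prefix round doubles the distance over which carry information travels, so only ceil(log2 n) sequential steps remain; everything else is elementwise and maps directly onto masked AVX-512 instructions or CUDA threads.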




2022 ◽  
Vol 18, Issue 1 ◽  
Author(s):  
Batya Kenig ◽  
Dan Suciu

Integrity constraints such as functional dependencies (FD) and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Then, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Finally, we show how some of the results in the paper can be derived using the I-measure theory, which relates information-theoretic measures to set theory. Our results recover, and sometimes extend, previously known results about the implication problem: the implication of MVDs and FDs can be checked by considering only 2-tuple relations.
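The information-theoretic degree of satisfaction can be made concrete with a small sketch. Under the empirical distribution of a relation, an FD X → Y holds exactly iff the conditional entropy H(Y | X) is zero, and larger values quantify how badly it is violated. The relation below is a hypothetical example, not taken from the paper:

```python
from collections import Counter
from math import log2

def cond_entropy(rows, X, Y):
    """H(Y | X) over the empirical distribution of a relation given as a
    list of dicts; X and Y are lists of attribute names. The FD X -> Y
    holds exactly iff this is 0; larger values measure its violation."""
    n = len(rows)
    xy = Counter((tuple(r[a] for a in X), tuple(r[a] for a in Y)) for r in rows)
    x = Counter(tuple(r[a] for a in X) for r in rows)
    # H(Y|X) = sum over (x,y) of p(x,y) * log2(p(x) / p(x,y))
    return sum(c / n * log2(x[k[0]] / c) for k, c in xy.items())

# city -> zip holds exactly, while zip -> city is only approximately satisfied:
rows = [{"city": "a", "zip": 1}, {"city": "a", "zip": 1},
        {"city": "b", "zip": 2}, {"city": "c", "zip": 2}]
```

An exact implication with antecedent degrees all zero then forces the consequent's degree to zero; the relaxation question is what linear bound survives when the antecedent entropies are merely small.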


2021 ◽  
Author(s):  
Van Tran Bao Le

A database is said to be C-Armstrong for a finite set Σ of data dependencies in a class C if the database satisfies all data dependencies in Σ and violates all data dependencies in C that are not implied by Σ. Therefore, Armstrong databases are concise, user-friendly representations of abstract data dependencies that can be used to judge, justify, convey, and test the understanding of database design choices. Indeed, an Armstrong database satisfies exactly those data dependencies that are considered meaningful by the current design choice Σ. Structural and computational properties of Armstrong databases have been deeply investigated in Codd's Turing Award winning relational model of data. Armstrong databases have been incorporated in approaches towards relational database design. They have also been found useful for the elicitation of requirements, the semantic sampling of existing databases, and the specification of schema mappings.

This research establishes a toolbox of Armstrong databases for SQL data. This is challenging, as SQL data can contain null marker occurrences in columns declared NULL and may contain duplicate rows. Thus, the existing theory of Armstrong databases only applies to idealized instances of SQL data, that is, instances without null marker occurrences and without duplicate rows. For the thesis, two popular interpretations of null markers are considered: the no-information interpretation used in SQL, and the exists-but-unknown interpretation by Codd. Furthermore, the study is limited to the popular class C of functional dependencies. However, the presence of duplicate rows means that the class of uniqueness constraints is no longer subsumed by the class of functional dependencies, in contrast to the relational model of data.

As a first contribution, a provably correct algorithm is developed that computes Armstrong databases for an arbitrarily given finite set of uniqueness constraints and functional dependencies. This contribution is based on axiomatic, algorithmic and logical characterizations of the associated implication problem that are also established in this thesis. While the problem of deciding whether a given database is Armstrong for a given set of such constraints is precisely exponential, our algorithm computes an Armstrong database with a number of rows that is at most quadratic in the number of rows of a minimum-sized Armstrong database.

As a second contribution, the algorithms are implemented in the form of a design tool. Users of the tool can therefore inspect Armstrong databases to analyze their current design choice Σ. Intuitively, Armstrong databases are useful for the acquisition of semantically meaningful constraints if users can recognize the actual meaningfulness of constraints that they incorrectly perceived as meaningless before inspecting an Armstrong database. As a final contribution, measures are introduced that formalize the term "useful", and it is shown by detailed experiments that Armstrong tables, as computed by the tool, are indeed useful.

In summary, this research establishes a toolbox of Armstrong databases that can be applied by database designers to concisely visualize constraints on SQL data. Such support can lead to database designs that guarantee efficient data management in practice.
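The point that duplicate rows decouple uniqueness constraints from functional dependencies can be seen with a toy satisfaction check. This is a hedged sketch with illustrative names, using one common reading of the no-information interpretation (a null never agrees with anything), not the thesis's formal definitions:

```python
def agree(r1, r2, cols):
    # Under the no-information interpretation, a null (None) agrees with nothing.
    return all(r1[c] is not None and r1[c] == r2[c] for c in cols)

def satisfies_fd(rows, lhs, rhs):
    """FD lhs -> rhs: every pair of rows agreeing on lhs also agrees on rhs."""
    return all(agree(r1, r2, rhs)
               for i, r1 in enumerate(rows) for r2 in rows[i + 1:]
               if agree(r1, r2, lhs))

def satisfies_uc(rows, cols):
    """Uniqueness constraint on cols: no two rows agree on all of cols.
    Row multiplicity matters, so duplicates violate any UC they agree on."""
    return not any(agree(r1, r2, cols)
                   for i, r1 in enumerate(rows) for r2 in rows[i + 1:])

# Duplicate rows satisfy the FD a -> b yet violate the uniqueness constraint on {a}:
table = [{"a": 1, "b": 2}, {"a": 1, "b": 2}]
```

In the relational model (set semantics) a key constraint is expressible as an FD; with SQL's bag semantics, as the example shows, the two constraint classes must be handled separately.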


2021 ◽  
Author(s):  
Anna Arestova ◽  
Wojciech Baron

The rapid development of information and communication technology confronts designers of real-time systems with new challenges arising from the increasing amount of data and the intensified interconnection of functions. This is driven, for example, by recent trends such as automated driving in the automotive field and digitization in factory automation. For distributed safety-critical systems, this progression means that the complexity of scheduling tasks with precedence constraints, organized in so-called task chains, grows with the amount of data exchanged between tasks and the number of functions involved. Especially when data has to be transmitted over an Ethernet-based communication network, the processing tasks running on different end devices must be coordinated with the communication network to meet the strict end-to-end deadlines of task chains. In this work, we present a heuristic approach that computes schedules for distributed, data-dependent task chains consisting of preemptive and periodic tasks, taking into account the communication delays of time-sensitive networks. Our algorithm solves large problems for synthetic network topologies with randomized data dependencies in a few seconds. It achieves a high success rate, which can be further improved by relaxing the deadline conditions.
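The coordination problem can be illustrated with a toy earliest-start computation for a single task chain. This is only a sketch under simplifying assumptions (one chain, a fixed network delay, no preemption or contention), not the paper's heuristic:

```python
def chain_schedule(chain, duration, device, net_delay):
    """Earliest finish times for a task chain (list of task names).
    duration and device map each task to its execution time and end device;
    a fixed net_delay is charged whenever consecutive tasks run on
    different devices. End-to-end latency is the last task's finish time."""
    t = 0.0
    finish = {}
    for prev, cur in zip([None] + chain[:-1], chain):
        if prev is not None and device[prev] != device[cur]:
            t += net_delay  # data must cross the network before cur can start
        t += duration[cur]
        finish[cur] = t
    return finish
```

Comparing the final finish time against the chain's end-to-end deadline is exactly the feasibility check that a real scheduler must pass for every chain simultaneously, while also sharing devices and network links among chains.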




Energies ◽  
2021 ◽  
Vol 14 (21) ◽  
pp. 7083
Author(s):  
Olga Gaidukova ◽  
Pavel Strizhak

A model was developed to study the critical conditions and time characteristics of the ignition of gel fuels under conductive, convective, radiant and mixed heat transfer. MATLAB was used for the numerical modeling; original MATLAB code was written to implement the developed mathematical model. For gel fuel ignition at initial temperatures corresponding to cryogenic storage conditions, a numerical analysis was conducted, for different heating schemes, of the interconnected heat and mass transfer processes in the presence of chemical reactions and of exothermic and endothermic phase transitions. The model was tested by comparing the theoretical results with experimental data. Dependencies were established between the key process characteristic, the ignition delay time, and the ambient temperature when the following parameters were varied: emissivity, heat transfer coefficient, activation energy, and the pre-exponential factor of the fuel vapor oxidation reaction. The critical values of the main parameters of the energy source were determined; for these values, gel fuel ignition was consistently realized for each heating scheme. The critical heat fluxes necessary and sufficient for the ignition of typical gel fuels were also determined.
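The reported dependence of the ignition delay on ambient temperature and on the Arrhenius parameters follows the familiar thermal-explosion trend: the delay falls exponentially as temperature rises and grows with activation energy. A minimal sketch of that trend only, with illustrative placeholder values, not the paper's full heat-and-mass-transfer model:

```python
from math import exp

R = 8.314  # universal gas constant, J/(mol*K)

def ignition_delay(T_ambient, E_a=1.0e5, A=1.0e-6):
    """Arrhenius-style induction-time estimate, tau ~ A * exp(E_a / (R*T)).
    E_a (activation energy, J/mol) and A (pre-exponential scale, s) are
    illustrative placeholders, not fitted gel-fuel parameters."""
    return A * exp(E_a / (R * T_ambient))
```

Sweeping `T_ambient` with such a relation reproduces the qualitative shape of the delay-versus-temperature dependencies the abstract refers to; the quantitative curves require the full coupled model with phase transitions.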


2021 ◽  
Vol 15 (1) ◽  
pp. 72-84
Author(s):  
Jiayi Wang ◽  
Chengliang Chai ◽  
Jiabin Liu ◽  
Guoliang Li

Cardinality estimation is one of the most important problems in query optimization. Recently, machine learning based techniques have been proposed to effectively estimate cardinality; they can be broadly classified into query-driven and data-driven approaches. Query-driven approaches learn a regression model from a query to its cardinality, while data-driven approaches learn a distribution of tuples, select some samples that satisfy a SQL query, and use the data distributions of these selected tuples to estimate the cardinality of the SQL query. Because query-driven methods rely on training queries, their estimation quality is not reliable when no high-quality training queries are available; data-driven methods have no such limitation and adapt well. In this work, we focus on data-driven methods. A good data-driven model should achieve three optimization goals. First, the model needs to capture data dependencies between columns and support large domain sizes (achieving high accuracy). Second, the model should achieve high inference efficiency, because many data samples are needed to estimate the cardinality (achieving low inference latency). Third, the model should not be too large (achieving a small model size). However, existing data-driven methods cannot simultaneously optimize the three goals. To address these limitations, we propose a novel cardinality estimator, FACE, which leverages a Normalizing Flow based model to learn a continuous joint distribution for relational data. FACE can transform a complex distribution over continuous random variables into a simple distribution (e.g., a multivariate normal distribution), and use the probability density to estimate the cardinality. First, we design a dequantization method to make data more "continuous". Second, we propose encoding and indexing techniques to handle LIKE predicates for string data. Third, we propose a Monte Carlo method to efficiently estimate the cardinality. Experimental results show that our method significantly outperforms existing approaches in terms of estimation accuracy while keeping similar latency and model size.
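The Monte Carlo step can be sketched: given a learned joint density over a box, the cardinality of a predicate is the table size times the probability mass the density assigns to the predicate region, which uniform sampling estimates. The snippet below is a toy stand-in that accepts any density function in place of a trained normalizing flow:

```python
import random

def estimate_cardinality(pdf, n_rows, lo, hi, predicate, samples=20000, seed=0):
    """Estimate |{t : predicate(t)}| as n_rows * P(predicate), where P is
    integrated by Monte Carlo over the box [lo, hi] with a uniform proposal.
    pdf is any joint density over the box, e.g. the output of a flow model."""
    rng = random.Random(seed)
    vol = 1.0
    for a, b in zip(lo, hi):
        vol *= b - a
    acc = 0.0
    for _ in range(samples):
        t = [rng.uniform(a, b) for a, b in zip(lo, hi)]
        if predicate(t):
            acc += pdf(t)  # density weight of a satisfying sample
    return n_rows * vol * acc / samples
```

With a uniform density over the unit square and the predicate `x < 0.5`, the estimate converges to half the table size; a flow model replaces the uniform `pdf` with one that captures column correlations, which is where the accuracy gain comes from.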


Electronics ◽  
2021 ◽  
Vol 10 (17) ◽  
pp. 2050
Author(s):  
Włodzimierz Bielecki ◽  
Piotr Błaszyński

In this article, we present a technique that allows us to generate parallel tiled code for computing general linear recursion equations (GLREs). That code deals with multidimensional data and is compute-intensive. We demonstrate that the data dependencies in an original code computing GLREs do not allow any parallel code to be generated, because there is only one solution to the time-partition constraints built for that program. We show how to transform the original code into another one that exposes dependencies such that there are two distinct linear solutions to the time-partition constraints derived from these dependencies. This allows us to generate parallel 2D tiled code computing GLREs. The wavefront technique is used to achieve parallelism, and the generated code conforms to the OpenMP C/C++ standard. The experiments that we conducted with the resulting parallel 2D tiled code show that it is much more efficient than the original serial code computing GLREs. The performance improvement comes from the exposed parallelism and the better locality of the target code.
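The wavefront idea behind the generated code can be sketched: after the transformation, iterations on the same anti-diagonal of the 2D iteration space carry no dependencies among themselves (for a stencil that reads (i-1, j) and (i, j-1)), so each diagonal forms one parallel step, e.g. one OpenMP parallel-for in the generated C/C++ code. A minimal illustration of the traversal order:

```python
def wavefront_order(n, m):
    """Yield the anti-diagonals of an n x m iteration space in dependence
    order: every cell on one diagonal is independent of the others on it,
    so each yielded list can be executed in parallel."""
    for k in range(n + m - 1):
        yield [(i, k - i) for i in range(max(0, k - m + 1), min(n, k + 1))]
```

Tiling groups these cells into blocks before applying the same wavefront sweep over tiles, which is what yields both the parallelism and the improved locality reported in the experiments.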

