In-Memory Interval Joins

2021 ◽  
Author(s):  
Panagiotis Bouros ◽  
Nikos Mamoulis ◽  
Dimitrios Tsitsigkos ◽  
Manolis Terrovitis

Abstract: The interval join is a popular operation in temporal, spatial, and uncertain databases. The majority of interval join algorithms assume that input data reside on disk, so they focus on minimizing I/O accesses. Recently, an in-memory approach based on plane sweep (PS) was proposed for modern hardware; it greatly outperforms previous work. However, this approach relies on a complex data structure and its parallelization has not been adequately studied. In this article, we investigate in-memory interval joins in two directions. First, we explore the applicability of a largely ignored forward scan (FS)-based plane sweep algorithm for single-threaded join evaluation. We propose four optimizations for FS that greatly reduce its cost, making it competitive with, or even faster than, the state-of-the-art. Second, we study in depth the parallel computation of interval joins. We design a non-partitioning-based approach that determines independent tasks of the join algorithm to run in parallel. Then, we address the drawbacks of the previously proposed hash-based partitioning and suggest a domain-based partitioning approach that does not produce duplicate results. Within our approach, we propose a novel breakdown of the partition-joins into mini-joins to be scheduled on the available CPU threads, and we propose an adaptive domain partitioning that aims at load balancing. We also investigate how the partitioning phase can benefit from modern parallel hardware. Our thorough experimental analysis demonstrates the advantage of our novel partitioning-based approach for parallel computation.
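To make the forward scan (FS) idea in the abstract above concrete, here is a minimal Python sketch of a basic forward-scan plane sweep over two lists of closed intervals. It is an illustration only: the function name `forward_scan_join` and the `(start, end)` tuple representation are assumptions made for this example, and none of the four optimizations or the partitioning schemes mentioned in the abstract are included.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval (start, end); representation assumed here


def forward_scan_join(
    R: List[Interval], S: List[Interval]
) -> List[Tuple[Interval, Interval]]:
    """Return all pairs (r, s) with r in R and s in S that overlap.

    Basic forward scan: both inputs are sorted by start point, the
    interval with the smaller start is picked, and the *other* input
    is scanned forward for as long as its start points fall inside it.
    """
    R = sorted(R)  # tuples sort by start first, then by end
    S = sorted(S)
    out = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][0] <= S[j][0]:
            # R[i] starts first: every S[k] (k >= j) that starts
            # no later than R[i] ends must overlap R[i].
            r = R[i]
            k = j
            while k < len(S) and S[k][0] <= r[1]:
                out.append((r, S[k]))
                k += 1
            i += 1
        else:
            # S[j] starts first: symmetric scan over R.
            s = S[j]
            k = i
            while k < len(R) and R[k][0] <= s[1]:
                out.append((R[k], s))
                k += 1
            j += 1
    return out
```

Each overlapping pair is reported exactly once, at the moment the interval with the smaller start point is processed, so this sketch needs no auxiliary structure holding the currently active intervals.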

2015 ◽  
Vol 26 (3) ◽  
pp. 1-20 ◽  
Author(s):  
Witold Andrzejewski ◽  
Pawel Boinski

This article tackles the problem of efficient construction of iCPI trees, frequently used in co-location pattern discovery in spatial databases. It discusses methods for parallelizing the iCPI-tree construction and plane-sweep algorithms used in state-of-the-art algorithms for co-location pattern mining. The main contribution of this paper is threefold: (1) a general algorithm for parallel iCPI-tree construction is presented, (2) two variants of a parallel plane-sweep algorithm (which can be used in conjunction with the aforementioned iCPI-tree construction algorithm) are introduced, and (3) all three algorithms are implemented on the CUDA GPU platform and their performance is tested against an efficient multithreaded parallel implementation of iCPI-tree construction on the CPU. Experiments show that our solutions allow for large speedups over the CPU version of the algorithm. This paper is an extension of the conference paper (Andrzejewski & Boinski, 2014).


2021 ◽  
Author(s):  
Danila Piatov ◽  
Sven Helmer ◽  
Anton Dignös ◽  
Fabio Persia

Abstract: We develop a family of efficient plane-sweeping interval join algorithms for evaluating a wide range of interval predicates such as Allen’s relationships and parameterized relationships. Our technique is based on a framework, components of which can be flexibly combined in different manners to support the required interval relation. In temporal databases, our algorithms can exploit a well-known and flexible access method, the Timeline Index, thus expanding the set of operations it supports even further. Additionally, employing a compact data structure, the gapless hash map, we utilize the CPU cache efficiently. In an experimental evaluation, we show that our approach is several times faster and scales better than state-of-the-art techniques, while being much better suited for real-time event processing.


2017 ◽  
pp. 1594-1597
Author(s):  
Jordan Wood ◽  
Sangho Kim

Computers ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 37 ◽  
Author(s):  
Luca Cappelletti ◽  
Tommaso Fontana ◽  
Guido Walter Di Donato ◽  
Lorenzo Di Tucci ◽  
Elena Casiraghi ◽  
...  

Missing data imputation has been a hot topic in the past decade, and many state-of-the-art works have proposed novel, interesting solutions that have been applied in a variety of fields. Over the same period, the successful results achieved by deep learning techniques have opened the way to their application to difficult problems where human skill is not able to provide a reliable solution. Not surprisingly, some deep learners, mainly exploiting encoder-decoder architectures, have also been designed and applied to the task of missing data imputation. However, most of the proposed imputation techniques have not been designed to tackle “complex data”, that is, high-dimensional data belonging to datasets with huge cardinality and describing complex problems. Specifically, they often need critical parameters to be set manually, or they exploit complex architectures and/or training phases that make their computational load impractical. In this paper, after clustering the state-of-the-art imputation techniques into three broad categories, we briefly review the most representative methods and then describe our data imputation proposals, which exploit deep learning techniques specifically designed to handle complex data. Comparative tests on genome sequences show that our deep learning imputers outperform the state-of-the-art KNN-imputation method when filling gaps in human genome sequences.


2001 ◽  
Vol 40 (03) ◽  
pp. 253-258 ◽  
Author(s):  
E. Eigenbauer ◽  
S. Rasoul-Rockenschaub ◽  
W. Gall

Abstract: Computerized clinical forms are subject to a wide variety of different requirements. They have to allow detailed documentation and must be user-friendly. State-of-the-art applications for design permit clinicians themselves to create their own forms as needed, with the various variables presented in different ways depending on their intended use. Often, however, only aspects of clinical documentation are considered, with no thought being given to subsequent data retrieval. This article presents guidelines for the retrieval-oriented design of clinical forms. It discusses where anticipatory measures for structuring forms are easier to accomplish than complex data linkage at the time of retrieval and analysis.


2005 ◽  
Vol 98 (6) ◽  
pp. 2298-2303 ◽  
Author(s):  
Michele R. Norton ◽  
Richard P. Sloan ◽  
Emilia Bagiella

Fourier-based approaches to analysis of variability of R-R intervals or blood pressure typically compute power in a given frequency band (e.g., 0.01–0.07 Hz) by aggregating the power at each constituent frequency within that band. This paper describes a new approach to the analysis of these data. We propose to partition the blood pressure variability spectrum into more narrow components by computing power in 0.01-Hz-wide bands. Therefore, instead of a single measure of variability in a specific frequency interval, we obtain several measurements. The approach generates a more complex data structure that requires a careful account of the nested repeated measures. We briefly describe a statistical methodology based on generalized estimating equations that suitably handles this more complex data structure. To illustrate the methods, we consider systolic blood pressure data collected during psychological and orthostatic challenge. We compare the results with those obtained using the conventional methods to compute blood pressure variability, and we show that our approach yields more efficient results and more powerful statistical tests. We conclude that this approach may allow a more thorough analysis of cardiovascular parameters that are measured under different experimental conditions, such as blood pressure or heart rate variability.
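The partitioning step described in the abstract above can be sketched in a few lines of Python. This is a hedged illustration only: it assumes a power spectrum has already been estimated and is given as parallel lists of frequencies (Hz) and power values, the function name `narrow_band_powers` and the small tolerance `eps` are assumptions of this example, and the generalized estimating equations analysis is not shown.

```python
import math
from typing import List, Sequence


def narrow_band_powers(
    freqs: Sequence[float],
    power: Sequence[float],
    lo: float,
    hi: float,
    width: float = 0.01,
    eps: float = 1e-9,
) -> List[float]:
    """Split the band [lo, hi) into sub-bands of `width` Hz and sum power in each.

    The conventional single measure for the whole band is the sum of the
    returned list; the narrow-band approach keeps the components separate.
    """
    n_bins = round((hi - lo) / width)
    bins = [0.0] * n_bins
    for f, p in zip(freqs, power):
        # eps guards against floating-point error at sub-band boundaries.
        idx = math.floor((f - lo) / width + eps)
        if 0 <= idx < n_bins:  # ignore frequencies outside [lo, hi)
            bins[idx] += p
    return bins


# Hypothetical spectrum values, one per 0.01-Hz step, chosen only for illustration.
freqs = [0.015, 0.025, 0.035, 0.045, 0.055, 0.065, 0.075]
power = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
components = narrow_band_powers(freqs, power, lo=0.01, hi=0.07)
total = sum(components)  # conventional aggregate power for 0.01-0.07 Hz
```

With the 0.01–0.07 Hz band from the abstract, this yields six components per spectrum instead of one aggregate value, which is the nested repeated-measures structure the authors refer to.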


2018 ◽  
Vol 51 (1) ◽  
pp. 79-82
Author(s):  
Yohsuke Matsushita ◽  
Tomoyuki Katayama ◽  
Tatsuya Soma ◽  
Shota Akaotsu ◽  
Yasuhiro Saito ◽  
...  

Water ◽  
2021 ◽  
Vol 13 (24) ◽  
pp. 3545
Author(s):  
Soon-Ho Kwon ◽  
Joong-Hoon Kim

In the last decade, machine learning (ML) technology has been transforming daily lives, industries, and various scientific/engineering disciplines. In particular, ML technology has resulted in significant progress in neural network models; these enable the automatic computation of problem-relevant features and rapid capture of highly complex data distributions. We believe that ML approaches can address several significant new and/or old challenges in urban drainage systems (UDSs). This review paper provides a state-of-the-art review of ML-based UDS modeling/application based on three categories: (1) operation (real-time operation control), (2) management (flood-inundation prediction) and (3) maintenance (pipe defect detection). The review reveals that ML is utilized extensively in UDSs to advance model performance and efficiency, extract complex data distribution patterns, and obtain scientific/engineering insights. Additionally, some potential issues and future directions are recommended for three research topics defined in this study to extend UDS modeling/applications based on ML technology. Furthermore, it is suggested that ML technology can promote developments in UDSs. The new paradigm of ML-based UDS modeling/applications summarized here is in its early stages and should be considered in future studies.

