In-Memory Interval Joins

The VLDB Journal ◽

10.1007/s00778-020-00639-0 ◽

2021 ◽

Author(s):

Panagiotis Bouros ◽

Nikos Mamoulis ◽

Dimitrios Tsitsigkos ◽

Manolis Terrovitis

Keyword(s):

Parallel Computation ◽

State Of The Art ◽

Complex Data ◽

Plane Sweep ◽

Join Algorithm ◽

Sweep Algorithm ◽

Join Algorithms ◽

Domain Partitioning ◽

Complex Data Structure ◽

Independent Tasks

Abstract: The interval join is a popular operation in temporal, spatial, and uncertain databases. The majority of interval join algorithms assume that input data reside on disk and therefore focus on minimizing I/O accesses. Recently, an in-memory approach based on plane sweep (PS) for modern hardware was proposed which greatly outperforms previous work. However, this approach relies on a complex data structure and its parallelization has not been adequately studied. In this article, we investigate in-memory interval joins in two directions. First, we explore the applicability of a largely ignored forward scan (FS)-based plane sweep algorithm for single-threaded join evaluation. We propose four optimizations for FS that greatly reduce its cost, making it competitive with, or even faster than, the state of the art. Second, we study in depth the parallel computation of interval joins. We design a non-partitioning-based approach that determines independent tasks of the join algorithm to run in parallel. Then, we address the drawbacks of the previously proposed hash-based partitioning and suggest a domain-based partitioning approach that does not produce duplicate results. Within our approach, we propose a novel breakdown of the partition-joins into mini-joins to be scheduled in the available CPU threads and propose an adaptive domain partitioning, aiming at load balancing. We also investigate how the partitioning phase can benefit from modern parallel hardware. Our thorough experimental analysis demonstrates the advantage of our novel partitioning-based approach for parallel computation.
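The forward scan idea mentioned in this abstract can be illustrated with a minimal sketch. This is only the textbook form of a forward-scan plane sweep over closed intervals, not the authors' optimized variant: the four optimizations and the partitioning schemes are not reproduced, and the name `forward_scan_join` and the tuple representation of intervals are assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), treated as closed on both sides


def forward_scan_join(R: List[Interval], S: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (r, s) pair with r in R, s in S whose intervals overlap."""
    R = sorted(R)  # sort both inputs by start point
    S = sorted(S)
    out = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][0] <= S[j][0]:
            # R[i] starts first: scan forward in S while S[k] starts
            # no later than R[i] ends; each such S[k] overlaps R[i].
            r = R[i]
            k = j
            while k < len(S) and S[k][0] <= r[1]:
                out.append((r, S[k]))
                k += 1
            i += 1
        else:
            # S[j] starts first: symmetric forward scan over R.
            s = S[j]
            k = i
            while k < len(R) and R[k][0] <= s[1]:
                out.append((R[k], s))
                k += 1
            j += 1
    return out


if __name__ == "__main__":
    print(forward_scan_join([(1, 5), (6, 8)], [(2, 3), (5, 7), (9, 10)]))
```

Each interval is paired only with intervals from the other input that start at or after its own start, so no pair is reported twice and no auxiliary data structure is needed.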

Parallel GPU-based Plane-Sweep Algorithm for Construction of iCPI-Trees

Journal of Database Management ◽

10.4018/jdm.2015070101 ◽

2015 ◽

Vol 26 (3) ◽

pp. 1-20 ◽

Cited By ~ 7

Author(s):

Witold Andrzejewski ◽

Pawel Boinski

Keyword(s):

Spatial Databases ◽

Pattern Mining ◽

State Of The Art ◽

Parallel Implementation ◽

Conference Paper ◽

Location Pattern ◽

Plane Sweep ◽

Tree Construction ◽

Sweep Algorithm ◽

Efficient Construction

This article tackles the problem of efficient construction of iCPI-trees, frequently used in co-location pattern discovery in spatial databases. It discusses the methods for parallelization of iCPI-tree construction and plane-sweep algorithms used in state-of-the-art algorithms for co-location pattern mining. The main contribution of this paper is threefold: (1) a general algorithm for parallel iCPI-tree construction is presented, (2) two variants of a parallel plane-sweep algorithm (which can be used in conjunction with the aforementioned iCPI-tree construction algorithm) are introduced and (3) all three algorithms are implemented on the CUDA GPU platform and their performance is tested against an efficient multithreaded parallel implementation of iCPI-tree construction on the CPU. Experiments prove that our solutions allow for large speedups over the CPU version of the algorithm. This paper is an extension of the conference paper (Andrzejewski & Boinski, 2014).

Cache-efficient sweeping-based interval joins for extended Allen relation predicates

The VLDB Journal ◽

10.1007/s00778-020-00650-5 ◽

2021 ◽

Author(s):

Danila Piatov ◽

Sven Helmer ◽

Anton Dignös ◽

Fabio Persia

Keyword(s):

Data Structure ◽

Experimental Evaluation ◽

State Of The Art ◽

Temporal Databases ◽

Access Method ◽

Wide Range ◽

Interval Relation ◽

Cache Efficient ◽

Join Algorithms ◽

Better Than

Abstract: We develop a family of efficient plane-sweeping interval join algorithms for evaluating a wide range of interval predicates such as Allen’s relationships and parameterized relationships. Our technique is based on a framework, components of which can be flexibly combined in different manners to support the required interval relation. In temporal databases, our algorithms can exploit a well-known and flexible access method, the Timeline Index, thus expanding the set of operations it supports even further. Additionally, employing a compact data structure, the gapless hash map, we utilize the CPU cache efficiently. In an experimental evaluation, we show that our approach is several times faster and scales better than state-of-the-art techniques, while being much better suited for real-time event processing.
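For readers unfamiliar with the predicates this abstract refers to, the thirteen basic Allen relations can be stated as endpoint comparisons. The sketch below is illustrative only: it classifies a single pair of intervals and says nothing about the sweeping framework, the Timeline Index, or the gapless hash map. The function name `allen_relation`, the relation labels, and the assumption that every interval satisfies start < end are choices made here, not taken from the paper.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end) with start < end assumed


def allen_relation(a: Interval, b: Interval) -> str:
    """Name the Allen relation that holds between interval a and interval b."""
    a_start, a_end = a
    b_start, b_end = b
    # Disjoint or merely touching intervals.
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    # From here on the two intervals share interior points.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    return "overlaps" if a_start < b_start else "overlapped-by"


if __name__ == "__main__":
    print(allen_relation((1, 5), (3, 8)))
```

A join for one of these predicates reports exactly the pairs for which the corresponding label is returned; an efficient algorithm avoids testing every pair this way.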

Plane Sweep Algorithm

Encyclopedia of GIS ◽

10.1007/978-3-319-17885-1_989 ◽

2017 ◽

pp. 1594-1597

Author(s):

Jordan Wood ◽

Sangho Kim

Keyword(s):

Plane Sweep ◽

Sweep Algorithm

Plane Sweep Algorithm

10.1007/springerreference_62645 ◽

2011 ◽

Keyword(s):

Plane Sweep ◽

Sweep Algorithm

Complex Data Imputation by Auto-Encoders and Convolutional Neural Networks—A Case Study on Genome Gap-Filling

Computers ◽

10.3390/computers9020037 ◽

2020 ◽

Vol 9 (2) ◽

pp. 37 ◽

Cited By ~ 1

Author(s):

Luca Cappelletti ◽

Tommaso Fontana ◽

Guido Walter Di Donato ◽

Lorenzo Di Tucci ◽

Elena Casiraghi ◽

...

Keyword(s):

Deep Learning ◽

Missing Data ◽

State Of The Art ◽

The State ◽

Complex Data ◽

Data Imputation ◽

Genome Sequences ◽

Missing Data Imputation ◽

The Past ◽

Learning Techniques

Missing data imputation has been a hot topic in the past decade, and many state-of-the-art works have been presented to propose novel, interesting solutions that have been applied in a variety of fields. In the past decade, the successful results achieved by deep learning techniques have opened the way to their application for solving difficult problems where human skill is not able to provide a reliable solution. Not surprisingly, some deep learners, mainly exploiting encoder-decoder architectures, have also been designed and applied to the task of missing data imputation. However, most of the proposed imputation techniques have not been designed to tackle “complex data”, that is, high-dimensional data belonging to datasets with huge cardinality and describing complex problems. Specifically, they often need critical parameters to be manually set or exploit complex architectures and/or training phases that make their computational load impracticable. In this paper, after clustering the state-of-the-art imputation techniques into three broad categories, we briefly review the most representative methods and then describe our data imputation proposals, which exploit deep learning techniques specifically designed to handle complex data. Comparative tests on genome sequences show that our deep learning imputers outperform the state-of-the-art KNN-imputation method when filling gaps in human genome sequences.

A plane-sweep algorithm for finding a closest pair among convex planar objects

STACS 92 - Lecture Notes in Computer Science ◽

10.1007/3-540-55210-3_186 ◽

1992 ◽

pp. 219-232 ◽

Cited By ~ 3

Author(s):

Frank Bartling ◽

Klaus Hinrichs

Keyword(s):

Plane Sweep ◽

Closest Pair ◽

Sweep Algorithm

Retrieval-Oriented Design of Clinical Research Forms

Methods of Information in Medicine ◽

10.1055/s-0038-1634162 ◽

2001 ◽

Vol 40 (03) ◽

pp. 253-258 ◽

Cited By ~ 1

Author(s):

E. Eigenbauer ◽

S. Rasoul-Rockenschaub ◽

W. Gall

Keyword(s):

Clinical Research ◽

Data Linkage ◽

State Of The Art ◽

Data Retrieval ◽

Complex Data ◽

Clinical Documentation ◽

Clinical Forms ◽

User Friendly ◽

Intended Use

Abstract: Computerized clinical forms are subject to a wide variety of different requirements. They have to allow detailed documentation and must be user-friendly. State-of-the-art applications for design permit clinicians themselves to create their own forms as needed, with the various variables presented in different ways depending on their intended use. Often, however, only aspects of clinical documentation are considered, with no thought being given to subsequent data retrieval. This article presents guidelines for the retrieval-oriented design of clinical forms. It discusses where anticipatory measures for structuring forms are easier to accomplish than complex data linkage at the time of retrieval and analysis.

New approach to the statistical analysis of cardiovascular data

Journal of Applied Physiology ◽

10.1152/japplphysiol.00772.2004 ◽

2005 ◽

Vol 98 (6) ◽

pp. 2298-2303 ◽

Cited By ~ 3

Author(s):

Michele R. Norton ◽

Richard P. Sloan ◽

Emilia Bagiella

Keyword(s):

Blood Pressure ◽

Data Structure ◽

Repeated Measures ◽

Blood Pressure Variability ◽

Statistical Tests ◽

Frequency Interval ◽

Complex Data ◽

Single Measure ◽

New Approach ◽

Complex Data Structure

Fourier-based approaches to analysis of variability of R-R intervals or blood pressure typically compute power in a given frequency band (e.g., 0.01–0.07 Hz) by aggregating the power at each constituent frequency within that band. This paper describes a new approach to the analysis of these data. We propose to partition the blood pressure variability spectrum into more narrow components by computing power in 0.01-Hz-wide bands. Therefore, instead of a single measure of variability in a specific frequency interval, we obtain several measurements. The approach generates a more complex data structure that requires a careful account of the nested repeated measures. We briefly describe a statistical methodology based on generalized estimating equations that suitably handles this more complex data structure. To illustrate the methods, we consider systolic blood pressure data collected during psychological and orthostatic challenge. We compare the results with those obtained using the conventional methods to compute blood pressure variability, and we show that our approach yields more efficient results and more powerful statistical tests. We conclude that this approach may allow a more thorough analysis of cardiovascular parameters that are measured under different experimental conditions, such as blood pressure or heart rate variability.
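The difference between a single band-power value and the narrow-band measurements described in this abstract can be shown with a small sketch. It assumes a power spectrum is already available as (frequency, power) pairs and, purely to avoid floating-point bin-edge problems, expresses frequencies as integer millihertz. The function name `narrow_band_powers`, the half-open band convention, and the sample values are illustrative assumptions; the generalized estimating equations step is not covered.

```python
from typing import List, Tuple


def narrow_band_powers(
    spectrum: List[Tuple[int, float]],  # (frequency in mHz, power) pairs
    lo_mhz: int = 10,                   # 0.01 Hz, lower edge of the band
    hi_mhz: int = 70,                   # 0.07 Hz, upper edge (exclusive here)
    width_mhz: int = 10,                # 0.01-Hz-wide sub-bands
) -> List[float]:
    """Sum power within each sub-band [lo + k*width, lo + (k+1)*width)."""
    n_bands = (hi_mhz - lo_mhz) // width_mhz
    bands = [0.0] * n_bands
    for freq, power in spectrum:
        if lo_mhz <= freq < hi_mhz:
            bands[(freq - lo_mhz) // width_mhz] += power
    return bands


if __name__ == "__main__":
    # Made-up spectrum values, for illustration only.
    spectrum = [(5, 6.0), (10, 1.0), (15, 2.0), (20, 3.0), (69, 4.0), (70, 5.0)]
    per_band = narrow_band_powers(spectrum)
    print(per_band)       # several measurements, one per 0.01-Hz band
    print(sum(per_band))  # the conventional single measure for the whole band
```

The per-band list is the nested, repeated-measures structure the abstract refers to, while its sum corresponds to the conventional single aggregate for the band.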

Efficient Communication Strategy in Parallel Computation Based on Domain Partitioning

Journal of Chemical Engineering of Japan ◽

10.1252/jcej.17we155 ◽

2018 ◽

Vol 51 (1) ◽

pp. 79-82

Author(s):

Yohsuke Matsushita ◽

Tomoyuki Katayama ◽

Tatsuya Soma ◽

Shota Akaotsu ◽

Yasuhiro Saito ◽

...

Keyword(s):

Parallel Computation ◽

Communication Strategy ◽

Efficient Communication ◽

Domain Partitioning

Machine Learning and Urban Drainage Systems: State-of-the-Art Review

Water ◽

10.3390/w13243545 ◽

2021 ◽

Vol 13 (24) ◽

pp. 3545

Author(s):

Soon-Ho Kwon ◽

Joong-Hoon Kim

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Urban Drainage ◽

Complex Data ◽

New Paradigm ◽

Neural Network Models ◽

Drainage Systems ◽

Time Operation ◽

Urban Drainage Systems ◽

Scientific Engineering

In the last decade, machine learning (ML) technology has been transforming daily lives, industries, and various scientific/engineering disciplines. In particular, ML technology has resulted in significant progress in neural network models; these enable the automatic computation of problem-relevant features and rapid capture of highly complex data distributions. We believe that ML approaches can address several significant new and/or old challenges in urban drainage systems (UDSs). This review paper provides a state-of-the-art review of ML-based UDS modeling/application based on three categories: (1) operation (real-time operation control), (2) management (flood-inundation prediction) and (3) maintenance (pipe defect detection). The review reveals that ML is utilized extensively in UDSs to advance model performance and efficiency, extract complex data distribution patterns, and obtain scientific/engineering insights. Additionally, some potential issues and future directions are recommended for three research topics defined in this study to extend UDS modeling/applications based on ML technology. Furthermore, it is suggested that ML technology can promote developments in UDSs. The new paradigm of ML-based UDS modeling/applications summarized here is in its early stages and should be considered in future studies.