Query co-planning for shared execution in key-value stores

Josué Ttito; Renato Marroquín; Sergio Lifschitz; Lewis McGibbney; José Talavera

doi:10.5753/jidm.2021.1946

Journal of Information and Data Management ◽

2021 ◽

Vol 12 (5) ◽

Author(s):

Josué Ttito ◽

Renato Marroquín ◽

Sergio Lifschitz ◽

Lewis McGibbney ◽

José Talavera

Keyword(s):

Data Structure ◽

Data Structures ◽

Data Model ◽

Range Queries ◽

Model Data ◽

Data Movement ◽

Segment Tree ◽

Interval Tree ◽

Simple Interface ◽

Arbitrary Objects

Key-value stores offer a straightforward yet powerful data model. Data is modeled as key-value pairs, where values can be arbitrary objects that are written and read using their associated keys. In addition to this simple interface, such data stores also provide read operations such as full and range scans. However, the simplicity of the interface makes it challenging to optimize data accesses. This work aims to enable the shared execution of concurrent range and point queries on key-value stores, thereby reducing the overall data movement when executing a complete workload. To accomplish this, we analyze different possible data structures and propose our variation of a segment tree, the Updatable Interval Tree. Our data structure helps us co-plan and co-execute multiple range queries together and reduces redundant work. This results in more efficient workload execution and higher overall throughput, as we show in our evaluation.
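
The idea of sharing work between overlapping range queries can be illustrated with a minimal sketch. It is not the authors' Updatable Interval Tree: it swaps in plain interval merging as a stand-in, and the names `co_plan` and `shared_execute`, the inclusive `(lo, hi)` query tuples, and the in-memory `dict` playing the role of the store are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def co_plan(queries):
    """Merge overlapping inclusive (lo, hi) ranges into disjoint scan ranges.

    A point query for key k is written as (k, k). This is a simple
    stand-in for planning, not the Updatable Interval Tree.
    """
    merged = []
    for lo, hi in sorted(queries):
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous scan range: extend it.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(r) for r in merged]


def shared_execute(store, queries):
    """Run all queries with one scan per merged range.

    `store` is a dict standing in for a key-value store (hypothetical).
    Returns (results per query, number of keys read from the store).
    """
    keys = sorted(store)
    results = {q: [] for q in queries}
    scanned = 0
    for lo, hi in co_plan(queries):
        start = bisect_left(keys, lo)
        end = bisect_right(keys, hi)
        for k in keys[start:end]:
            scanned += 1  # each key is read once per merged range
            # Naive routing of the key to every query that covers it;
            # an interval or segment tree would make this step faster.
            for q in queries:
                if q[0] <= k <= q[1]:
                    results[q].append((k, store[k]))
    return results, scanned
```

For the ranges (1, 4) and (3, 6) plus the point query (8, 8) over integer keys 0 to 9, the plan contains two scans, (1, 6) and (8, 8), and reads 7 keys instead of the 9 read when each query runs on its own.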

Query co-planning for shared execution in Key-Value Stores

10.5753/sbbd.2020.13643 ◽

2020 ◽

Author(s):

Josue Joel Ttito ◽

Renato Marroquin ◽

Sergio Lifschitz

Keyword(s):

Data Structure ◽

Data Structures ◽

Data Model ◽

Range Queries ◽

Model Data ◽

Data Movement ◽

Segment Tree ◽

Interval Tree ◽

Simple Interface ◽

Arbitrary Objects

Key-value stores offer a very simple yet powerful data model. Data is modeled as key-value pairs, where values can be arbitrary objects that are written and read using their associated keys. In addition to this simple interface, such data stores also provide read operations such as full and range scans. However, the simplicity of the interface makes it challenging to optimize data accesses. This work aims to enable the shared execution of concurrent range and point queries on key-value stores, thereby reducing the overall data movement when executing a complete workload. To accomplish this, we analyze different possible data structures and propose our variation of a segment tree, the Updatable Interval Tree. This data structure helps us co-plan and co-execute multiple range queries together, as we show in our evaluation.

A Data Structure on Interval Graphs and Its Applications

Journal of Circuits System and Computers ◽

10.1142/s0218126697000127 ◽

1997 ◽

Vol 07 (03) ◽

pp. 165-175 ◽

Cited By ~ 10

Author(s):

Madhumangal Pal ◽

G. P. Bhattacharjee

Keyword(s):

Data Structure ◽

Data Structures ◽

Interval Graph ◽

Interval Graphs ◽

Point Of View ◽

Interval Tree

In this paper, a new data structure, the interval tree (IT), is introduced for an interval graph. Some important properties of the IT are studied from the algorithmic point of view. It has many advantages over the data structures commonly used to solve problems on interval graphs. Using the properties of the IT, the following problems are solved on interval graphs: (i) shortest distances between any two vertices, and (ii) the diameter of the graph.
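
To make the two problems concrete, the following is a baseline sketch only. It does not implement the IT structure of the paper. It builds the interval graph explicitly (one vertex per closed interval, one edge per overlapping pair) and uses ordinary breadth-first search. The function names are assumptions chosen for illustration.

```python
from collections import deque


def interval_graph(intervals):
    """Adjacency lists of the interval graph of closed intervals (a, b)."""
    n = len(intervals)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            (a, b), (c, d) = intervals[i], intervals[j]
            if a <= d and c <= b:  # the two closed intervals intersect
                adj[i].append(j)
                adj[j].append(i)
    return adj


def bfs_distances(adj, src):
    """Shortest edge-count distance from src to every vertex (None if unreachable)."""
    dist = [None] * len(adj)
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Largest shortest-path distance; raises if the graph is disconnected."""
    best = 0
    for s in range(len(adj)):
        dist = bfs_distances(adj, s)
        if any(d is None for d in dist):
            raise ValueError("graph is disconnected")
        best = max(best, max(dist))
    return best
```

This brute-force version takes quadratic time to build the graph and runs one search per vertex, which is the kind of cost a dedicated structure is meant to avoid.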

GENERALIZATION TECHNIQUE FOR 2D+SCALE DHE DATA MODEL

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-2-w1-61-2016 ◽

2016 ◽

Vol XLII-2/W1 ◽

pp. 61-67

Author(s):

Hairi Karim ◽

Alias Abdul Rahman ◽

Pawel Boguslawski

Keyword(s):

Data Structure ◽

Data Structures ◽

Data Model ◽

Data Models ◽

Computer Application ◽

Scale Model ◽

Scale Dimension ◽

Local Modification ◽

Support Variety ◽

Or Applications

Different users or applications need different scale models, especially in computer applications such as game visualization and GIS modelling. Some issues have been raised about fulfilling the GIS requirement of retaining detail while minimizing the redundancy of the scale datasets. Previous researchers suggested and attempted to add another dimension, such as scale and/or time, to a 3D model, but the implementation of a scale dimension faces problems due to the limitations and availability of data structures and data models. Various data structures and data models have been proposed to support a variety of applications and dimensionalities, but little research has been conducted on supporting a scale dimension. The Dual Half Edge (DHE) data structure was designed to work with any perfect 3D spatial object, such as buildings. In this paper, we attempt to extend the capability of the DHE data structure toward integration with a scale dimension. The description of the concept and implementation of generating 3D-scale (2D spatial + scale dimension) for the DHE data structure forms the major discussion of this paper. We strongly believe that advantages such as local modification and topological elements (navigation, query and semantic information) in the scale dimension could be used in future 3D-scale applications.

External-Storage Data Structures for Plane-Sweep Algorithms

BRICS Report Series ◽

10.7146/brics.v1i16.21651 ◽

1994 ◽

Vol 1 (16) ◽

Cited By ~ 2

Author(s):

Lars Arge

Keyword(s):

Data Structure ◽

Data Structures ◽

Priority Queue ◽

Optimal Number ◽

Range Searching ◽

Dynamic Data ◽

Plane Sweep ◽

Internal Memory ◽

Segment Tree ◽

Range Tree

In this paper we develop a technique for transforming an internal memory data structure into an external storage data structure suitable for plane-sweep algorithms. We use this technique to develop external storage versions of the range tree and the segment tree. We also obtain an external priority queue. Using the first two structures, we solve the orthogonal segment intersection, the isothetic rectangle intersection, and the batched range searching problem in the optimal number of I/O operations. Unlike previously known I/O algorithms, the developed algorithms are straightforward generalizations of the ordinary internal memory plane-sweep algorithms. Previously, almost no dynamic data structures were known for the model we are working in.

A Comparative Study about Data Structures Used for Efficient Management of Voxelised Full-Waveform Airborne LiDAR Data during 3D Polygonal Model Creation

Remote Sensing ◽

10.3390/rs13040559 ◽

2021 ◽

Vol 13 (4) ◽

pp. 559

Author(s):

Milto Miltiadou ◽

Neill D. F. Campbell ◽

Darren Cosker ◽

Michael G. Grant

Keyword(s):

Data Structure ◽

Data Structures ◽

Airborne Lidar ◽

Memory Allocation ◽

Lidar Data ◽

Full Waveform ◽

Waveform Lidar ◽

Airborne Lidar Data ◽

Full Waveform Lidar ◽

Polygonal Model

In this paper, we investigate the performance of six data structures for managing voxelised full-waveform airborne LiDAR data during 3D polygonal model creation. While full-waveform LiDAR data has been available for over a decade, extraction of peak points is the most widely used approach to interpreting it. The increased information stored within the waveform data makes interpretation and handling difficult. It is, therefore, important to research which data structures are more appropriate for storing and interpreting the data. The data structures are tested in terms of time efficiency and memory consumption at run time while voxelising and interpreting the data, and are the following: (1) 1D-Array, which guarantees coherent memory allocation; (2) Voxel Hashing, which uses a hash table for storing the intensity values; (3) Octree; (4) Integral Volumes, which allows finding the sum of any cuboid area in constant time; (5) Octree Max/Min, which is an upgraded octree; and (6) Integral Octree, which is proposed here and is an attempt to combine the benefits of octrees and Integral Volumes. It is shown that Integral Volumes is the most time-efficient data structure but requires the most memory. Furthermore, 1D-Array and Integral Volumes require the allocation of coherent space in memory, including for the empty voxels, while Voxel Hashing and the octree-related data structures do not need to allocate memory for empty voxels. These data structures therefore allocate less memory, as shown in the tests conducted. To sum up, there is a need to investigate how LiDAR data is stored in memory. Each tested data structure has different benefits and downsides; therefore, each application should be examined individually.
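
The constant-time cuboid sum attributed to Integral Volumes can be sketched as a 3D prefix-sum table queried by inclusion-exclusion. This is a generic illustration of the technique, not the implementation evaluated in the paper. The function names, the nested-list layout `vol[x][y][z]`, and the half-open box convention are assumptions.

```python
def build_integral_volume(vol):
    """Return table I with I[x][y][z] = sum of vol[i][j][k] for i < x, j < y, k < z.

    The table has one extra zero-filled layer per axis, which is why this
    structure also stores entries for empty voxels.
    """
    X, Y, Z = len(vol), len(vol[0]), len(vol[0][0])
    I = [[[0] * (Z + 1) for _ in range(Y + 1)] for _ in range(X + 1)]
    for x in range(1, X + 1):
        for y in range(1, Y + 1):
            for z in range(1, Z + 1):
                I[x][y][z] = (
                    vol[x - 1][y - 1][z - 1]
                    + I[x - 1][y][z] + I[x][y - 1][z] + I[x][y][z - 1]
                    - I[x - 1][y - 1][z] - I[x - 1][y][z - 1] - I[x][y - 1][z - 1]
                    + I[x - 1][y - 1][z - 1]
                )
    return I


def cuboid_sum(I, x0, y0, z0, x1, y1, z1):
    """Sum of the half-open box [x0, x1) x [y0, y1) x [z0, z1) using eight lookups."""
    return (
        I[x1][y1][z1]
        - I[x0][y1][z1] - I[x1][y0][z1] - I[x1][y1][z0]
        + I[x0][y0][z1] + I[x0][y1][z0] + I[x1][y0][z0]
        - I[x0][y0][z0]
    )
```

The query cost is independent of the box size, while the table covers every voxel of the bounding volume, which matches the time and memory trade-off described in the abstract.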

Shape Neutral Analysis of Graph-based Data-structures

Theory and Practice of Logic Programming ◽

10.1017/s147106841800025x ◽

2018 ◽

Vol 18 (3-4) ◽

pp. 470-483 ◽

Cited By ~ 1

Author(s):

GREGORY J. DUCK ◽

JOXAN JAFFAR ◽

ROLAND H. C. YAP

Keyword(s):

Data Structure ◽

Data Structures ◽

Program Analysis ◽

Constraint Handling ◽

C Programs ◽

Wide Range ◽

Target Program ◽

Target Data ◽

Structure Graph ◽

Structure Properties

Malformed data-structures can lead to runtime errors such as arbitrary memory access or corruption. Despite this, reasoning over data-structure properties for low-level heap manipulating programs remains challenging. In this paper we present a constraint-based program analysis that checks data-structure integrity, w.r.t. given target data-structure properties, as the heap is manipulated by the program. Our approach is to automatically generate a solver for properties using the type definitions from the target program. The generated solver is implemented using a Constraint Handling Rules (CHR) extension of built-in heap, integer and equality solvers. A key property of our program analysis is that the target data-structure properties are shape neutral, i.e., the analysis does not check for properties relating to a given data-structure graph shape, such as doubly-linked-lists versus trees. Nevertheless, the analysis can detect errors in a wide range of data-structure manipulating programs, including those that use lists, trees, DAGs, graphs, etc. We present an implementation that uses the Satisfiability Modulo Constraint Handling Rules (SMCHR) system. Experimental results show that our approach works well for real-world C programs.

Integrating Image Computation in Undergraduate Level Data-Structure Education

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001498000609 ◽

1998 ◽

Vol 12 (08) ◽

pp. 1071-1080 ◽

Cited By ~ 9

Author(s):

Sudeep Sarkar ◽

Dmitry Goldgof

Keyword(s):

Image Processing ◽

Image Analysis ◽

Data Structure ◽

Data Structures ◽

Science Curriculum ◽

Early Stage ◽

Real Life ◽

South Florida ◽

The University ◽

Structure Education

There is a growing need for expertise both in image analysis and in software engineering. To date, these two areas have been taught separately in an undergraduate computer and information science curriculum. However, we have found that introduction to image analysis can be easily integrated in data-structure courses without detracting from the original goal of teaching data structures. Some of the image processing tasks offer a natural way to introduce basic data structures such as arrays, queues, stacks, trees and hash tables. Not only does this integrated strategy expose the students to image related manipulations at an early stage of the curriculum but it also imparts cohesiveness to the data-structure assignments and brings them closer to real life. In this paper we present a set of programming assignments that integrates undergraduate data-structure education with image processing tasks. These assignments can be incorporated in existing data-structure courses with low time and software overheads. We have used these assignment sets thrice: once in a 10-week duration data-structure course at the University of California, Santa Barbara and the other two times in 15-week duration courses at the University of South Florida, Tampa.

Designing an Engineering Database for Telephone Networks

ASME 1991 5th Annual Database Symposium: Engineering Databases — An Enterprise Resource ◽

10.1115/edm1991-0179 ◽

1991 ◽

Author(s):

Scott G. Danielson

Keyword(s):

Data Structure ◽

Data Model ◽

Database Modeling ◽

Telephone Networks ◽

Semantic Data ◽

Engineering Data ◽

Engineering Database ◽

Engineering Work ◽

Entity Relationship ◽

Semantic Data Model

An engineering database modeling telephone outside plant networks is developed. Semantic and relational database design methodologies are used with the semantic data model developed based on an extended entity-relationship approach. This logical model is used to generate a normalized relational data structure. This database holds engineering data supporting engineering analyses, engineering work order generation procedures, and network planning activities. The database has been linked to separate network analysis programs and CAD-based network maps by a database application.

DenseZDD: A Compact and Fast Index for Families of Sets

Algorithms ◽

10.3390/a11080128 ◽

2018 ◽

Vol 11 (8) ◽

pp. 128 ◽

Cited By ~ 1

Author(s):

Shuhei Denzumi ◽

Jun Kawahara ◽

Koji Tsuda ◽

Hiroki Arimura ◽

Shin-ichi Minato ◽

...

Keyword(s):

Data Structure ◽

Data Structures ◽

Information Integration ◽

Decision Diagrams ◽

Web Information Retrieval ◽

Binary Decision ◽

Web Information ◽

Set Operations ◽

Succinct Data Structure ◽

The Family

In this article, we propose a succinct data structure for zero-suppressed binary decision diagrams (ZDDs). A ZDD represents sets of combinations efficiently, and we can perform various set operations on the ZDD without explicitly extracting combinations. Thanks to these features, ZDDs have been applied to web information retrieval, information integration, and data mining. However, to support rich manipulation of sets of combinations and future updates, ZDDs need a large amount of space, which means that there is still room for compression. The paper introduces a new succinct data structure, called DenseZDD, for further compressing a ZDD when we do not need to conduct set operations on the ZDD but want to examine whether a given set is included in the family represented by the ZDD, and count the number of elements in the family. We also propose a hybrid method, which combines DenseZDDs with ordinary ZDDs. By numerical experiments, we show that the sizes of our data structures are three times smaller than those of ordinary ZDDs, and that membership operations and random sampling on DenseZDDs are about ten times and three times faster, respectively, than those on ordinary ZDDs for some datasets.

Spaced Seed Data Structures for De Novo Assembly

International Journal of Genomics ◽

10.1155/2015/196591 ◽

2015 ◽

Vol 2015 ◽

pp. 1-8 ◽

Cited By ~ 3

Author(s):

Inanç Birol ◽

Justin Chu ◽

Hamid Mohamadi ◽

Shaun D. Jackman ◽

Karthika Raghavan ◽

...

Keyword(s):

Data Structure ◽

Data Structures ◽

De Novo ◽

Bloom Filters ◽

De Bruijn Graph ◽

Sequence Specificity ◽

Sequencing Errors ◽

Spaced Seeds ◽

Read Error Correction ◽

Seed Data

De novo assembly of the genome of a species is essential in the absence of a reference genome sequence. Many scalable assembly algorithms use the de Bruijn graph (DBG) paradigm to reconstruct genomes, where a table of subsequences of a certain length is derived from the reads, and their overlaps are analyzed to assemble sequences. Although longer subsequences unlock longer genomic features for assembly, the associated increase in compute resources limits the practicability of DBG compared with other assembly archetypes already designed for longer reads. Here, we revisit the DBG paradigm to adapt it to the changing sequencing technology landscape and introduce three data structure designs for spaced seeds in the form of paired subsequences. These data structures address memory and run time constraints imposed by longer reads. We observe that when a fixed distance separates seed pairs, it provides increased sequence specificity with increased gap length. Further, we note that Bloom filters would be suitable to implicitly store spaced seeds and be tolerant to sequencing errors. Building on this concept, we describe a data structure for tracking the frequencies of observed spaced seeds. These data structure designs will have applications in genome, transcriptome and metagenome assemblies, and read error correction.
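
The notion of storing paired subsequences in a Bloom filter can be sketched as follows. This is a simplified illustration, not one of the three designs from the paper. The `BloomFilter` class, the `spaced_seeds` helper, the `"|"` separator, and the choice of SHA-256 for hashing are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def spaced_seeds(read, k, gap):
    """Yield seeds made of two k-mers separated by `gap` ignored bases."""
    span = 2 * k + gap
    for i in range(len(read) - span + 1):
        # Bases inside the gap are not part of the stored seed.
        yield read[i:i + k] + "|" + read[i + k + gap:i + span]
```

In this sketch a substitution that falls inside the gap leaves the seed unchanged, so the seed from a read with such an error still matches the filter. Only the seeds are hashed, never stored, so membership answers can include false positives but not false negatives.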
