Query-Optimal Partially Persistent B-Trees with Constant Worst-Case Update Time

2017 ◽  
Vol 28 (02) ◽  
pp. 141-169
Author(s):  
George Lagogiannis

In this paper we present the first data structure for partially persistent B-trees with constant worst-case update time, where deletions are handled symmetrically to insertions. Our structure matches the query times of optimal partially persistent B-trees that do not support constant update time; thus, we have reduced the worst-case update time to a constant without any penalty in query times. The new data structure is produced by combining two other data structures: (a) the partially persistent B-tree and (b) the balanced search tree with constant worst-case update time.
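
For readers encountering the term for the first time, the sketch below illustrates partial persistence with the classic fat-node technique: every field update is version-stamped, updates are permitted only on the newest version, and queries may read any past version. This is a minimal, hypothetical illustration of the concept only; it is not the paper's B-tree construction or its constant-worst-case update machinery.

```python
import bisect

class FatField:
    """One field of a "fat node": a version-stamped history of values."""

    def __init__(self, version, value):
        self.versions = [version]    # sorted list of version stamps
        self.values = [value]

    def set(self, version, value):
        # Partial persistence: writes go only to the newest version.
        assert version >= self.versions[-1]
        self.versions.append(version)
        self.values.append(value)

    def get(self, version):
        # Value visible at `version`: the latest write at or before it.
        i = bisect.bisect_right(self.versions, version) - 1
        return self.values[i] if i >= 0 else None

key = FatField(version=0, value=42)
key.set(version=3, value=99)
assert key.get(2) == 42 and key.get(3) == 99   # old versions stay readable
```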

Author(s):  
F. Çetin ◽  
M. O. Kulekci

Abstract. This paper presents a study that compares three space-partitioning and spatial-indexing techniques: the KD Tree, the Quad KD Tree, and the PR Tree. The KD Tree is a data structure proposed by Bentley (Bentley and Friedman, 1979) that clusters objects according to their spatial location. The Quad KD Tree is a data structure proposed by Bereczky (Bereczky et al., 2014) that partitions objects using heuristic methods. Unlike Bereczky's partitioning technique, a new partitioning technique based on dividing objects in a space-driven manner is presented in the context of this study. The PR Tree, proposed by Arge (Arge et al., 2008), is an asymptotically optimal R-Tree variant that enables data-driven segmentation. This study mainly aimed at searching and rendering big spatial data in a real-time, safety-critical avionics navigation map application. Such a real-time system needs to reach the required records inside a specific boundary efficiently, so performing range queries at runtime (such as finding the closest neighbors) is performance-critical. The most crucial purpose of these data structures is to reduce the number of comparisons needed to solve the range-searching problem. In this study, each technique's data structure is created and indexed, and worst-case analyses covering the whole area are made to measure range-search performance. The techniques' performance is also benchmarked in terms of elapsed time and memory usage. As a result of these experiments, the Quad KD Tree outperformed the other techniques in range-search analysis, especially when the data set is massive and consists of different geometry types.
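
As a concrete picture of the range-search primitive being benchmarked, here is a hedged sketch of a plain 2D KD Tree answering an axis-aligned rectangle query. The median-split construction and point layout are generic textbook choices, not the Quad KD Tree or PR Tree variants compared in the study.

```python
from collections import namedtuple

Node = namedtuple("Node", ["point", "axis", "left", "right"])

def build(points, depth=0):
    """Median-split KD-tree over 2D points, alternating x/y axes."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def range_search(node, lo, hi, out):
    """Collect points p with lo[i] <= p[i] <= hi[i] on both axes."""
    if node is None:
        return
    p = node.point
    if all(lo[i] <= p[i] <= hi[i] for i in (0, 1)):
        out.append(p)
    # Prune subtrees that cannot intersect the query rectangle.
    if lo[node.axis] <= p[node.axis]:
        range_search(node.left, lo, hi, out)
    if hi[node.axis] >= p[node.axis]:
        range_search(node.right, lo, hi, out)

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
hits = []
range_search(tree, lo=(3, 1), hi=(8, 5), out=hits)
print(hits)   # [(7, 2), (5, 4), (8, 1)]
```

The pruning tests are where the comparison count, the quantity such benchmarks measure, is saved: whole subtrees falling outside the rectangle are never visited.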


1996 ◽  
Vol 07 (02) ◽  
pp. 137-149 ◽  
Author(s):  
RUDOLF FLEISCHER

In this paper we show how a slight modification of (a, 2b)-trees allows us to perform member and neighbor queries in O(log n) time and updates in O(1) worst-case time (once the position of the inserted or deleted key is known). Our data structure is quite natural and much simpler than previous worst-case optimal solutions. It is based on two techniques: 1) bucketing, i.e., storing an ordered list of 2 log n keys in each leaf of an (a, 2b)-tree, and 2) preventive splitting, i.e., splitting nodes before they can grow bigger than allowed. If only insertions are allowed, it can also be used as a finger search tree with O(log* n) worst-case update time.
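
A rough sketch of the bucketing half of the recipe, under simplified assumptions: a leaf's keys sit in an ordered doubly linked list, so once the neighbor of a new key is known, insertion is a constant number of pointer updates, and the capacity check stands in for the 2 log n bound and tells the caller to split preventively. The names and split policy here are illustrative, not Fleischer's exact scheme.

```python
class Key:
    def __init__(self, value):
        self.value = value
        self.prev = self.next = None

class LeafBucket:
    def __init__(self, capacity):
        self.capacity = capacity      # stands in for the 2 log n bound
        self.size = 0
        self.head = None

    def insert_after(self, pos, key):
        """O(1) insert of `key` right after node `pos` (None = new front)."""
        if pos is None:               # insert at the front of the bucket
            key.next = self.head
            self.head = key
        else:
            key.prev, key.next = pos, pos.next
            pos.next = key
        if key.next is not None:
            key.next.prev = key
        self.size += 1
        # Preventive splitting: report fullness *before* overflow occurs,
        # so the caller can split the leaf ahead of time.
        return self.size >= self.capacity
```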


2001 ◽  
Vol 8 (36) ◽  
Author(s):  
Gerth Stølting Brodal ◽  
Rolf Fagerberg ◽  
Riko Jacob

We propose a version of cache-oblivious search trees which is simpler than the previous proposal of Bender, Demaine and Farach-Colton and has the same complexity bounds. In particular, our data structure avoids the use of weight-balanced B-trees, and can be implemented as just a single array of data elements, without the use of pointers. The structure also improves space utilization.

For storing n elements, our proposal uses (1+epsilon)n times the element size of memory, and performs searches in worst case O(log_B n) memory transfers, updates in amortized O((log^2 n)/(epsilon B)) memory transfers, and range queries in worst case O(log_B n + k/B) memory transfers, where k is the size of the output.

The basic idea of our data structure is to maintain a dynamic binary tree of height log n + O(1) using existing methods, embed this tree in a static binary tree, which in turn is embedded in an array in a cache-oblivious fashion, using the van Emde Boas layout of Prokop.

We also investigate the practicality of cache obliviousness in the area of search trees, by providing an empirical comparison of different methods for laying out a search tree in memory.

The source code of the programs, our scripts and tools, and the data we present here are available online under ftp.brics.dk/RS/01/36/Experiments/.
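
The van Emde Boas layout mentioned above can be computed with a short recursion: split a tree of height h into a top tree of roughly half the height and its bottom subtrees, lay out the top first, then each bottom tree contiguously. The sketch below is a hedged illustration, with BFS node numbering and a floor-halving split as assumptions; it is not the authors' code.

```python
def veb_order(height):
    """Nodes of a complete binary tree with `height` levels, listed in
    van Emde Boas order. Nodes carry BFS numbers (root = 1, children of
    v are 2v and 2v + 1)."""
    def layout(root, h):
        if h == 1:
            return [root]
        top_h = h // 2                # height of the top recursive tree
        bot_h = h - top_h             # height of each bottom tree
        order = layout(root, top_h)
        # Leaves of the top tree, left to right; each hangs two bottom trees.
        first = root << (top_h - 1)
        for leaf in range(first, first + (1 << (top_h - 1))):
            order += layout(2 * leaf, bot_h)      # left bottom tree
            order += layout(2 * leaf + 1, bot_h)  # right bottom tree
        return order
    return layout(1, height)

print(veb_order(4))   # [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

At every scale the top tree and each bottom tree occupy contiguous array segments, which is what makes a root-to-leaf walk cheap for every (unknown) cache-line size B.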


Author(s):  
Chumphol Bunkhumpornpat ◽  
Varin Chouvatut ◽  
Saowaluk Rattanaudomsawat

A search tree is a data structure constructed from a minimum spanning tree. It is used for determining the cluster membership of a query instance clustered by a similarity-guaranteed clustering algorithm. In the worst case, a search tree degenerates into a line topology, so tree traversal takes O(n) time, where n is the number of nodes in the tree. Unfortunately, the AVL tree algorithm cannot solve this problem because it is unable to maintain the hierarchical structure of a search tree. Based on the definition of the balance factor, our proposed algorithm rotates nodes until the tree becomes balanced while maintaining the hierarchical structure. Consequently, the balanced search tree achieves the optimal time complexity of O(log n) for searching.
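
As a generic illustration of the balance-factor test and the node rotations such rebalancing relies on, here is a hedged AVL-style sketch; it is not the authors' hierarchy-preserving procedure.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    """height(left) - height(right); outside [-1, 1] means unbalanced."""
    return height(node.left) - height(node.right)

def rotate_right(y):
    """Lift y.left above y; returns the new subtree root."""
    x = y.left
    y.left, x.right = x.right, y
    return x

# A left-leaning chain 3 -> 2 -> 1 has balance factor 2 at the root;
# one right rotation restores balance.
root = Node(3, left=Node(2, left=Node(1)))
assert balance_factor(root) == 2
root = rotate_right(root)
assert balance_factor(root) == 0
```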


2021 ◽  
Vol 13 (4) ◽  
pp. 559
Author(s):  
Milto Miltiadou ◽  
Neill D. F. Campbell ◽  
Darren Cosker ◽  
Michael G. Grant

In this paper, we investigate the performance of six data structures for managing voxelised full-waveform airborne LiDAR data during 3D polygonal model creation. While full-waveform LiDAR data have been available for over a decade, extraction of peak points remains the most widely used approach to interpreting them. The increased information stored within the waveform data makes interpretation and handling difficult, so it is important to research which data structures are more appropriate for storing and interpreting the data. The data structures are tested in terms of time efficiency and memory consumption during run-time and are the following: (1) 1D-Array, which guarantees coherent memory allocation; (2) Voxel Hashing, which uses a hash table for storing the intensity values; (3) Octree; (4) Integral Volumes, which allows finding the sum of any cuboid area in constant time; (5) Octree Max/Min, which is an upgraded octree; and (6) Integral Octree, which is proposed here as an attempt to combine the benefits of octrees and Integral Volumes. We show that Integral Volumes is the most time-efficient data structure, but it requires the most memory allocation. Furthermore, 1D-Array and Integral Volumes require the allocation of coherent space in memory, including the empty voxels, while Voxel Hashing and the octree-related data structures do not need to allocate memory for empty voxels and, as shown in the tests conducted, therefore allocate less memory. To sum up, there is a need to investigate how LiDAR data are stored in memory; each tested data structure has different benefits and downsides, so each application should be examined individually.
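
To make the Integral Volumes idea concrete, here is a hedged sketch of a 3D summed-area table: after one prefix-sum pass, the sum of any axis-aligned cuboid of voxel values is obtained from eight lookups via inclusion-exclusion. The NumPy layout below is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def build_integral_volume(vox):
    """3D prefix sums ("integral volume") of a voxel grid, padded with a
    zero border so cuboid queries need no boundary checks."""
    iv = np.zeros(tuple(s + 1 for s in vox.shape), dtype=np.int64)
    iv[1:, 1:, 1:] = vox.cumsum(0).cumsum(1).cumsum(2)
    return iv

def cuboid_sum(iv, x0, y0, z0, x1, y1, z1):
    """Sum of vox[x0:x1, y0:y1, z0:z1] in O(1) via inclusion-exclusion."""
    return (iv[x1, y1, z1]
            - iv[x0, y1, z1] - iv[x1, y0, z1] - iv[x1, y1, z0]
            + iv[x0, y0, z1] + iv[x0, y1, z0] + iv[x1, y0, z0]
            - iv[x0, y0, z0])

rng = np.random.default_rng(0)
vox = rng.integers(0, 256, size=(64, 64, 64))
iv = build_integral_volume(vox)
assert cuboid_sum(iv, 4, 5, 6, 20, 30, 40) == vox[4:20, 5:30, 6:40].sum()
```

The trade-off reported above is visible here: the padded prefix-sum array must cover the whole grid, empty voxels included, in exchange for constant-time sums.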


2018 ◽  
Vol 18 (3-4) ◽  
pp. 470-483 ◽  
Author(s):  
GREGORY J. DUCK ◽  
JOXAN JAFFAR ◽  
ROLAND H. C. YAP

Abstract. Malformed data structures can lead to runtime errors such as arbitrary memory access or corruption. Despite this, reasoning over data-structure properties for low-level heap-manipulating programs remains challenging. In this paper we present a constraint-based program analysis that checks data-structure integrity, with respect to given target data-structure properties, as the heap is manipulated by the program. Our approach is to automatically generate a solver for the properties using the type definitions from the target program. The generated solver is implemented using a Constraint Handling Rules (CHR) extension of built-in heap, integer and equality solvers. A key property of our program analysis is that the target data-structure properties are shape neutral, i.e., the analysis does not check for properties relating to a given data-structure graph shape, such as doubly linked lists versus trees. Nevertheless, the analysis can detect errors in a wide range of data-structure manipulating programs, including those that use lists, trees, DAGs, graphs, etc. We present an implementation that uses the Satisfiability Modulo Constraint Handling Rules (SMCHR) system. Experimental results show that our approach works well for real-world C programs.
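
The flavor of a shape-neutral property can be shown with a toy example: the check below validates only that every pointer field references a live node of the declared type, regardless of whether the structure is a list, tree, DAG or graph. This is a hypothetical Python mock-up of the idea; the paper's analysis works on C heaps through generated CHR solvers, not on dictionaries.

```python
# A toy heap: address -> object with a type tag and fields.
HEAP = {
    0x10: {"type": "node", "next": 0x20, "val": 1},
    0x20: {"type": "node", "next": None, "val": 2},
}
# For each type, its pointer fields and their expected target types
# (hypothetical stand-in for the type definitions the paper derives
# solvers from).
POINTER_FIELDS = {"node": {"next": "node"}}

def check_integrity(heap, pointer_fields):
    """Shape-neutral check: every non-NULL pointer must hit a live node
    of the declared type. No list/tree/graph shape is assumed."""
    errors = []
    for addr, obj in heap.items():
        for field, target_type in pointer_fields.get(obj["type"], {}).items():
            ptr = obj.get(field)
            if ptr is None:
                continue                      # NULL is always allowed here
            target = heap.get(ptr)
            if target is None:
                errors.append(f"{hex(addr)}.{field} dangles: {hex(ptr)}")
            elif target["type"] != target_type:
                errors.append(f"{hex(addr)}.{field} points to wrong type")
    return errors

assert check_integrity(HEAP, POINTER_FIELDS) == []
```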


Author(s):  
Sudeep Sarkar ◽  
Dmitry Goldgof

There is a growing need for expertise both in image analysis and in software engineering. To date, these two areas have been taught separately in the undergraduate computer and information science curriculum. However, we have found that an introduction to image analysis can easily be integrated into data-structure courses without detracting from the original goal of teaching data structures. Some image processing tasks offer a natural way to introduce basic data structures such as arrays, queues, stacks, trees and hash tables. Not only does this integrated strategy expose students to image-related manipulations at an early stage of the curriculum, but it also imparts cohesiveness to the data-structure assignments and brings them closer to real life. In this paper we present a set of programming assignments that integrates undergraduate data-structure education with image processing tasks. These assignments can be incorporated into existing data-structure courses with low time and software overheads. We have used these assignment sets three times: once in a 10-week data-structure course at the University of California, Santa Barbara, and twice in 15-week courses at the University of South Florida, Tampa.


2021 ◽  
Author(s):  
ZEGOUR Djamel Eddine

Abstract. Today, Red-Black trees are a popular data structure, typically used to implement dictionaries, associative arrays and symbol tables within some compilers (C++, Java …) and many other systems. In this paper, we present an improvement of the delete algorithm for this kind of binary search tree. The proposed algorithm is very promising since it colors the tree differently while reducing color changes by about 29%. Moreover, the maintenance operations that re-establish the Red-Black tree balance properties are reduced by about 11%. As a consequence, the proposed algorithm saves about 4% of running time when insert and delete operations are used together, while preserving the search performance of the standard algorithm.
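
For context, the sketch below checks the two Red-Black balance properties that any deletion fix-up, whether the standard one or the variant proposed here, must re-establish: no red node has a red child, and every root-to-leaf path carries the same number of black nodes. The node layout is a generic assumption, and this is a property validator, not the paper's improved algorithm.

```python
class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color         # color: "red" / "black"
        self.left, self.right = left, right

def black_height(node):
    """Black nodes on every path below `node`, or None if a property
    is violated somewhere in the subtree."""
    if node is None:
        return 1                                   # NIL leaves count black
    lh, rh = black_height(node.left), black_height(node.right)
    if lh is None or rh is None or lh != rh:
        return None                                # unequal black heights
    if node.color == "red":
        for child in (node.left, node.right):
            if child is not None and child.color == "red":
                return None                        # red node with red child
    return lh + (node.color == "black")

def is_red_black(root):
    return (root is not None and root.color == "black"
            and black_height(root) is not None)

tree = RBNode(2, "black", RBNode(1, "red"), RBNode(3, "red"))
assert is_red_black(tree)
```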


Algorithms ◽  
2018 ◽  
Vol 11 (8) ◽  
pp. 128 ◽  
Author(s):  
Shuhei Denzumi ◽  
Jun Kawahara ◽  
Koji Tsuda ◽  
Hiroki Arimura ◽  
Shin-ichi Minato ◽  
...  

In this article, we propose a succinct data structure for zero-suppressed binary decision diagrams (ZDDs). A ZDD represents sets of combinations efficiently, and we can perform various set operations on the ZDD without explicitly extracting combinations. Thanks to these features, ZDDs have been applied to web information retrieval, information integration, and data mining. However, to support rich manipulation of sets of combinations and future updates, ZDDs consume considerable space, which means there is still room for compression. This article introduces a new succinct data structure, called DenseZDD, for further compressing a ZDD when we do not need to conduct set operations on it but want to examine whether a given set is included in the family it represents, and to count the number of elements in that family. We also propose a hybrid method which combines DenseZDDs with ordinary ZDDs. Numerical experiments show that the sizes of our data structures are three times smaller than those of ordinary ZDDs, and that membership operations and random sampling on DenseZDDs are about ten times and three times faster, respectively, than those on ordinary ZDDs for some datasets.
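
As a minimal picture of the membership operation that DenseZDD accelerates, here is a hedged sketch of membership on an ordinary ZDD; the node layout and terminal encoding are illustrative assumptions, not the succinct DenseZDD representation.

```python
from collections import namedtuple

# A ZDD node tests one variable; LO = variable absent, HI = present.
Node = namedtuple("Node", ["var", "lo", "hi"])
EMPTY, BASE = "0", "1"        # terminals: empty family, and {{}}

def member(node, combo):
    """Return True iff the set `combo` is in the family rooted at `node`."""
    while node not in (EMPTY, BASE):
        if node.var in combo:
            combo = combo - {node.var}    # consume the variable, go HI
            node = node.hi
        else:
            node = node.lo                # zero-suppressed: absent, go LO
    # Every chosen variable must have been consumed along HI edges.
    return node == BASE and not combo

# Family {{a}, {a, b}}: the root tests "a"; its HI child tests "b".
nb = Node("b", BASE, BASE)    # represents {{}, {b}}
na = Node("a", EMPTY, nb)     # prepends "a" to every set in nb
assert member(na, {"a"}) and member(na, {"a", "b"})
assert not member(na, {"b"})
```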


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Inanç Birol ◽  
Justin Chu ◽  
Hamid Mohamadi ◽  
Shaun D. Jackman ◽  
Karthika Raghavan ◽  
...  

De novo assembly of the genome of a species is essential in the absence of a reference genome sequence. Many scalable assembly algorithms use the de Bruijn graph (DBG) paradigm to reconstruct genomes, where a table of subsequences of a certain length is derived from the reads, and their overlaps are analyzed to assemble sequences. Although longer subsequences unlock longer genomic features for assembly, the associated increase in compute resources limits the practicability of DBGs compared to other assembly archetypes already designed for longer reads. Here, we revisit the DBG paradigm to adapt it to the changing sequencing technology landscape, and introduce three data structure designs for spaced seeds in the form of paired subsequences. These data structures address the memory and run time constraints imposed by longer reads. We observe that when a fixed distance separates seed pairs, sequence specificity increases with increased gap length. Further, we note that Bloom filters are suitable for implicitly storing spaced seeds while remaining tolerant to sequencing errors. Building on this concept, we describe a data structure for tracking the frequencies of observed spaced seeds. These designs will have applications in genome, transcriptome and metagenome assemblies, as well as read error correction.
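
A hedged sketch of the Bloom filter idea for spaced seeds: each seed pairs two k-mers separated by a fixed gap, and the filter stores seeds implicitly in a bit array, trading a small false-positive rate for large memory savings. Sizes, the hash choice and the seed shape below are illustrative assumptions, and a plain Bloom filter only records presence; frequency tracking as in the paper would need a counting variant.

```python
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1 << 20, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        # k salted hashes of the same item -> k bit positions.
        for i in range(self.k):
            h = hashlib.blake2b(item.encode(), salt=bytes([i]) * 8).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

def spaced_seeds(read, k=4, gap=3):
    """Yield paired k-mers: k bases, a skipped gap, then k more bases."""
    span = 2 * k + gap
    for i in range(len(read) - span + 1):
        yield read[i:i + k] + read[i + k + gap:i + span]

bf = BloomFilter()
for seed in spaced_seeds("ACGTACGTACGTT"):
    bf.add(seed)
assert "ACGTTACG" in bf    # first seed: bases 0-3 paired with bases 7-10
```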

