Shape Neutral Analysis of Graph-based Data-structures

Malformed data-structures can lead to runtime errors such as arbitrary memory access or corruption. Despite this, reasoning over data-structure properties for low-level heap manipulating programs remains challenging. In this paper we present a constraint-based program analysis that checks data-structure integrity, w.r.t. given target data-structure properties, as the heap is manipulated by the program. Our approach is to automatically generate a solver for properties using the type definitions from the target program. The generated solver is implemented using a Constraint Handling Rules (CHR) extension of built-in heap, integer and equality solvers. A key property of our program analysis is that the target data-structure properties are shape neutral, i.e., the analysis does not check for properties relating to a given data-structure graph shape, such as doubly-linked-lists versus trees. Nevertheless, the analysis can detect errors in a wide range of data-structure manipulating programs, including those that use lists, trees, DAGs, graphs, etc. We present an implementation that uses the Satisfiability Modulo Constraint Handling Rules (SMCHR) system. Experimental results show that our approach works well for real-world C programs.

R3D3: A Doubly Opportunistic Data Structure for Compressing and Indexing Massive Data

Infocommunications journal ◽

10.36244/icj.2019.2.7 ◽

2019 ◽

pp. 58-66

Author(s):

Máté Nagy ◽

János Tapolcai ◽

Gábor Rétvári

Keyword(s):

Data Structure ◽

Data Structures ◽

Real Data ◽

Small Error ◽

Data Sets ◽

Space Reduction ◽

Wide Range ◽

Arbitrary Position ◽

Efficient Data ◽

Space Requirements

Opportunistic data structures are used extensively in big data practice to break down the massive storage space requirements of processing large volumes of information. A data structure is called (singly) opportunistic if it takes advantage of the redundancy in the input in order to store it in information-theoretically minimum space. Yet, efficient data processing requires a separate index alongside the data, whose size often substantially exceeds that of the compressed information. In this paper, we introduce doubly opportunistic data structures to attain the best possible compression not only on the input data but also on the index. We present R3D3, which encodes a bitvector of length n and Shannon entropy H0 to nH0 bits and the accompanying index to nH0(1/2 + O(log C/C)) bits, thus attaining provably minimum space (up to small error terms) on both the data and the index, and supports a rich set of queries at arbitrary positions in the compressed bitvector in O(C) time when C = o(log n). Our R3D3 prototype attains several times space reduction beyond known compression techniques on a wide range of synthetic and real data sets, while it supports operations on the compressed data at comparable speed.
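
To make the nH0 space bound concrete, the following minimal sketch computes the empirical zero-order entropy of a bitvector and the resulting bound in bits. It illustrates the quantity only and is not the R3D3 encoding; the function names `zero_order_entropy` and `entropy_bound_bits` are hypothetical and do not come from the paper.

```python
import math


def zero_order_entropy(bits):
    """Empirical zero-order entropy H0, in bits per symbol, of a 0/1 sequence."""
    n = len(bits)
    if n == 0:
        return 0.0
    p1 = sum(bits) / n  # fraction of set bits
    h = 0.0
    for p in (p1, 1.0 - p1):
        if p > 0.0:  # the term p * log2(p) tends to 0 as p tends to 0
            h -= p * math.log2(p)
    return h


def entropy_bound_bits(bits):
    """The n * H0 quantity that the abstract states as the size of the encoded data."""
    return len(bits) * zero_order_entropy(bits)


# A sparse bitvector: one set bit in every 16 positions.
sparse = [1 if i % 16 == 0 else 0 for i in range(1024)]
print(len(sparse), "raw bits;", round(entropy_bound_bits(sparse), 1), "bits at n*H0")
```

A skewed bitvector such as `sparse` has H0 well below 1 bit per symbol, which is the redundancy an opportunistic structure exploits; a balanced one (half ones, half zeros) has H0 = 1 and leaves nothing to gain at zero order.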

From Ada to Platinum SPARK

ACM SIGAda Ada Letters ◽

10.1145/3463478.3463488 ◽

2021 ◽

Vol 40 (2) ◽

pp. 76-91

Author(s):

Patrick Rogers

Keyword(s):

Data Structure ◽

Programming Language ◽

Data Structures ◽

Computer Programming ◽

Multiple Forms ◽

Approach To Learning ◽

Wide Range

An effective approach to learning a new programming language is to implement data structures common to computer programming. The approach is effective because the problem to be solved is well understood, allowing one to focus on the language details. Moreover, several different forms of a given data structure are often possible: bounded versus unbounded, sequential versus thread-safe, and so on. These multiple forms likely require a wide range of language features.

Cache-efficient sweeping-based interval joins for extended Allen relation predicates

The VLDB Journal ◽

10.1007/s00778-020-00650-5 ◽

2021 ◽

Author(s):

Danila Piatov ◽

Sven Helmer ◽

Anton Dignös ◽

Fabio Persia

Keyword(s):

Data Structure ◽

Experimental Evaluation ◽

State Of The Art ◽

Temporal Databases ◽

Access Method ◽

Wide Range ◽

Interval Relation ◽

Cache Efficient ◽

Join Algorithms ◽

Better Than

We develop a family of efficient plane-sweeping interval join algorithms for evaluating a wide range of interval predicates such as Allen’s relationships and parameterized relationships. Our technique is based on a framework, components of which can be flexibly combined in different manners to support the required interval relation. In temporal databases, our algorithms can exploit a well-known and flexible access method, the Timeline Index, thus expanding the set of operations it supports even further. Additionally, employing a compact data structure, the gapless hash map, we utilize the CPU cache efficiently. In an experimental evaluation, we show that our approach is several times faster and scales better than state-of-the-art techniques, while being much better suited for real-time event processing.
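
As a rough illustration of the plane-sweeping idea, the sketch below evaluates a plain overlap join between two lists of closed intervals by sweeping over sorted endpoint events. It is a textbook-style sweep written for this listing, not the authors' algorithm: it uses neither the Timeline Index nor the gapless hash map, and the name `overlap_join` is hypothetical.

```python
def overlap_join(r, s):
    """Return all pairs (a, b), a in r and b in s, whose closed intervals overlap.

    Intervals are (start, end) tuples with start <= end.
    """
    START, END = 0, 1  # START sorts first, so touching intervals count as overlapping
    events = []
    for side, relation in (("r", r), ("s", s)):
        for start, end in relation:
            events.append((start, START, side, (start, end)))
            events.append((end, END, side, (start, end)))
    events.sort()

    active = {"r": [], "s": []}  # intervals currently crossed by the sweep line
    result = []
    for _, kind, side, interval in events:
        if kind == START:
            other = "s" if side == "r" else "r"
            # Every interval still active on the other side overlaps the new one.
            for partner in active[other]:
                result.append((interval, partner) if side == "r" else (partner, interval))
            active[side].append(interval)
        else:
            active[side].remove(interval)
    return result


r = [(1, 5), (6, 9)]
s = [(4, 7), (10, 12)]
print(sorted(overlap_join(r, s)))
```

The active lists here are ordinary Python lists, so removal is linear in their size; the abstract's point about a compact, cache-friendly structure concerns exactly this part of a sweep.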

A Comparative Study about Data Structures Used for Efficient Management of Voxelised Full-Waveform Airborne LiDAR Data during 3D Polygonal Model Creation

Remote Sensing ◽

10.3390/rs13040559 ◽

2021 ◽

Vol 13 (4) ◽

pp. 559

Author(s):

Milto Miltiadou ◽

Neill D. F. Campbell ◽

Darren Cosker ◽

Michael G. Grant

Keyword(s):

Data Structure ◽

Data Structures ◽

Airborne Lidar ◽

Memory Allocation ◽

Lidar Data ◽

Full Waveform ◽

Waveform Lidar ◽

Airborne Lidar Data ◽

Full Waveform Lidar ◽

Polygonal Model

In this paper, we investigate the performance of six data structures for managing voxelised full-waveform airborne LiDAR data during 3D polygonal model creation. While full-waveform LiDAR data has been available for over a decade, extraction of peak points is the most widely used approach to interpreting it. The increased information stored within the waveform data makes interpretation and handling difficult. It is, therefore, important to research which data structures are more appropriate for storing and interpreting the data. The data structures are tested in terms of time efficiency and memory consumption during run-time and are the following: (1) 1D-Array, which guarantees coherent memory allocation; (2) Voxel Hashing, which uses a hash table for storing the intensity values; (3) Octree; (4) Integral Volumes, which allows finding the sum of any cuboid area in constant time; (5) Octree Max/Min, which is an upgraded octree; and (6) Integral Octree, which is proposed here as an attempt to combine the benefits of octrees and Integral Volumes. It is shown that Integral Volumes is the most time-efficient data structure, but it requires the most memory allocation. Furthermore, 1D-Array and Integral Volumes require the allocation of coherent space in memory, including for the empty voxels, while Voxel Hashing and the octree-related data structures do not need to allocate memory for empty voxels. The latter data structures therefore allocate less memory, as shown in the tests conducted. To sum up, there is a need to investigate how the LiDAR data are stored in memory. Each tested data structure has different benefits and downsides; therefore, each application should be examined individually.
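
The constant-time cuboid sum attributed to Integral Volumes can be sketched with a 3D prefix-sum table and inclusion-exclusion over the eight corners of the query box. This is a generic illustration under the assumption of a dense nested-list voxel grid; the function names are hypothetical and the paper's own implementation may differ.

```python
def build_integral_volume(vol):
    """Build a 3D prefix-sum table for vol[x][y][z].

    table[x][y][z] holds the sum of all voxels with indices below (x, y, z),
    so the table is one cell larger than vol along every axis.
    """
    nx, ny, nz = len(vol), len(vol[0]), len(vol[0][0])
    table = [[[0] * (nz + 1) for _ in range(ny + 1)] for _ in range(nx + 1)]
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                table[x + 1][y + 1][z + 1] = (
                    vol[x][y][z]
                    + table[x][y + 1][z + 1] + table[x + 1][y][z + 1] + table[x + 1][y + 1][z]
                    - table[x][y][z + 1] - table[x][y + 1][z] - table[x + 1][y][z]
                    + table[x][y][z]
                )
    return table


def cuboid_sum(table, x0, y0, z0, x1, y1, z1):
    """Sum of voxels in the half-open box [x0, x1) x [y0, y1) x [z0, z1), using 8 lookups."""
    return (
        table[x1][y1][z1]
        - table[x0][y1][z1] - table[x1][y0][z1] - table[x1][y1][z0]
        + table[x0][y0][z1] + table[x0][y1][z0] + table[x1][y0][z0]
        - table[x0][y0][z0]
    )


# A 3 x 4 x 5 grid of ones: any box sum equals the number of voxels in the box.
ones = [[[1] * 5 for _ in range(4)] for _ in range(3)]
table = build_integral_volume(ones)
print(cuboid_sum(table, 0, 0, 0, 3, 4, 5))  # 60 voxels in the whole grid
```

The table stores a value for every voxel, empty or not, which matches the abstract's observation that this structure is fast to query but has the largest memory footprint.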

Integrating Image Computation in Undergraduate Level Data-Structure Education

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001498000609 ◽

1998 ◽

Vol 12 (08) ◽

pp. 1071-1080 ◽

Cited By ~ 9

Author(s):

Sudeep Sarkar ◽

Dmitry Goldgof

Keyword(s):

Image Processing ◽

Image Analysis ◽

Data Structure ◽

Data Structures ◽

Science Curriculum ◽

Early Stage ◽

Real Life ◽

South Florida ◽

The University ◽

Structure Education

There is a growing need for expertise both in image analysis and in software engineering. To date, these two areas have been taught separately in an undergraduate computer and information science curriculum. However, we have found that introduction to image analysis can be easily integrated in data-structure courses without detracting from the original goal of teaching data structures. Some of the image processing tasks offer a natural way to introduce basic data structures such as arrays, queues, stacks, trees and hash tables. Not only does this integrated strategy expose the students to image related manipulations at an early stage of the curriculum but it also imparts cohesiveness to the data-structure assignments and brings them closer to real life. In this paper we present a set of programming assignments that integrates undergraduate data-structure education with image processing tasks. These assignments can be incorporated in existing data-structure courses with low time and software overheads. We have used these assignment sets thrice: once in a 10-week duration data-structure course at the University of California, Santa Barbara and the other two times in 15-week duration courses at the University of South Florida, Tampa.

DenseZDD: A Compact and Fast Index for Families of Sets

Algorithms ◽

10.3390/a11080128 ◽

2018 ◽

Vol 11 (8) ◽

pp. 128 ◽

Cited By ~ 1

Author(s):

Shuhei Denzumi ◽

Jun Kawahara ◽

Koji Tsuda ◽

Hiroki Arimura ◽

Shin-ichi Minato ◽

...

Keyword(s):

Data Structure ◽

Data Structures ◽

Information Integration ◽

Decision Diagrams ◽

Web Information Retrieval ◽

Binary Decision ◽

Web Information ◽

Set Operations ◽

Succinct Data Structure ◽

The Family

In this article, we propose a succinct data structure of zero-suppressed binary decision diagrams (ZDDs). A ZDD represents sets of combinations efficiently and we can perform various set operations on the ZDD without explicitly extracting combinations. Thanks to these features, ZDDs have been applied to web information retrieval, information integration, and data mining. However, to support rich manipulation of sets of combinations and update ZDDs in the future, ZDDs need too much space, which means that there is still room to be compressed. The paper introduces a new succinct data structure, called DenseZDD, for further compressing a ZDD when we do not need to conduct set operations on the ZDD but want to examine whether a given set is included in the family represented by the ZDD, and count the number of elements in the family. We also propose a hybrid method, which combines DenseZDDs with ordinary ZDDs. By numerical experiments, we show that the sizes of our data structures are three times smaller than those of ordinary ZDDs, and membership operations and random sampling on DenseZDDs are about ten times and three times faster than those on ordinary ZDDs for some datasets, respectively.

Spaced Seed Data Structures for De Novo Assembly

International Journal of Genomics ◽

10.1155/2015/196591 ◽

2015 ◽

Vol 2015 ◽

pp. 1-8 ◽

Cited By ~ 3

Author(s):

Inanç Birol ◽

Justin Chu ◽

Hamid Mohamadi ◽

Shaun D. Jackman ◽

Karthika Raghavan ◽

...

Keyword(s):

Data Structure ◽

Data Structures ◽

De Novo ◽

Bloom Filters ◽

De Bruijn Graph ◽

Sequence Specificity ◽

Sequencing Errors ◽

Spaced Seeds ◽

Read Error Correction ◽

Seed Data

De novo assembly of the genome of a species is essential in the absence of a reference genome sequence. Many scalable assembly algorithms use the de Bruijn graph (DBG) paradigm to reconstruct genomes, where a table of subsequences of a certain length is derived from the reads, and their overlaps are analyzed to assemble sequences. Despite longer subsequences unlocking longer genomic features for assembly, associated increase in compute resources limits the practicability of DBG over other assembly archetypes already designed for longer reads. Here, we revisit the DBG paradigm to adapt it to the changing sequencing technology landscape and introduce three data structure designs for spaced seeds in the form of paired subsequences. These data structures address memory and run time constraints imposed by longer reads. We observe that when a fixed distance separates seed pairs, it provides increased sequence specificity with increased gap length. Further, we note that Bloom filters would be suitable to implicitly store spaced seeds and be tolerant to sequencing errors. Building on this concept, we describe a data structure for tracking the frequencies of observed spaced seeds. These data structure designs will have applications in genome, transcriptome and metagenome assemblies, and read error correction.
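
The idea of storing paired subsequences implicitly in a Bloom filter can be sketched as follows. This is a toy illustration, not one of the paper's three designs: the `BloomFilter` class, the `spaced_seeds` helper, the SHA-256-based hashing, and the parameter values are all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter over strings: no false negatives, some false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing the item with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def spaced_seeds(read, k, gap):
    """Yield seeds made of two k-mers separated by a fixed gap of ignored bases."""
    span = 2 * k + gap
    for i in range(len(read) - span + 1):
        yield read[i:i + k] + "|" + read[i + k + gap:i + span]


seen = BloomFilter(num_bits=1 << 16, num_hashes=3)
read = "ACGTACGTTGCA"
for seed in spaced_seeds(read, k=3, gap=2):
    seen.add(seed)

# A substitution inside the gap of the first seed leaves that seed unchanged.
mutated = "ACGAACGTTGCA"
print(next(spaced_seeds(mutated, k=3, gap=2)) in seen)
```

Because bases inside the gap are not part of the stored key, an error that falls there does not disturb the lookup, which is one reading of the abstract's remark on tolerance to sequencing errors.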

Program Verification using Constraint Handling Rules and Array Constraint Generalizations

10.29007/dkxs ◽

2018 ◽

Author(s):

Emanuele De Angelis ◽

Fabio Fioravanti ◽

Alberto Pettorossi ◽

Maurizio Proietti

Keyword(s):

Data Structures ◽

Program Verification ◽

Linear Constraints ◽

Constraint Handling ◽

Logic Programs ◽

Replacement Strategy ◽

Partial Correctness ◽

Transformation Rules ◽

Constraint Logic Programs ◽

Additional Constraints

The transformation of constraint logic programs (CLP programs) has been shown to be an effective methodology for verifying properties of imperative programs. By following this methodology, we encode the negation of a partial correctness property of an imperative program prog as a predicate incorrect defined by a CLP program P, and we show that prog is correct by transforming P into the empty program through the application of semantics preserving transformation rules. Some of these rules perform replacements of constraints that encode properties of the data structures manipulated by the program prog. In this paper we show that Constraint Handling Rules (CHR) are a suitable formalism for representing and applying constraint replacements during the transformation of CLP programs. In particular, we consider programs that manipulate integer arrays and we present a CHR encoding of a constraint replacement strategy based on the theory of arrays. We also propose a novel generalization strategy for constraints on integer arrays that combines the CHR constraint replacement strategy with various generalization operators for linear constraints, such as widening and convex hull. Generalization is controlled by additional constraints that relate the variable identifiers in the imperative program and the CLP representation of their values. The method presented in this paper has been implemented and we have demonstrated its effectiveness on a set of benchmark programs taken from the literature.

Characterising RDF data sets

Journal of Information Science ◽

10.1177/0165551516677945 ◽

2017 ◽

Vol 44 (2) ◽

pp. 203-229 ◽

Cited By ~ 6

Author(s):

Javier D Fernández ◽

Miguel A Martínez-Prieto ◽

Pablo de la Fuente Redondo ◽

Claudio Gutiérrez

Keyword(s):

Data Structures ◽

Large Scale ◽

Open Data ◽

Structural Features ◽

Data Sets ◽

Data Set ◽

Wide Range ◽

Rdf Data ◽

Description Framework ◽

Resource Description

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.

DSI: an evidence-based approach to identify dynamic data structures in C programs

Proceedings of the 25th International Symposium on Software Testing and Analysis - ISSTA 2016 ◽

10.1145/2931037.2931071 ◽

2016 ◽

Cited By ~ 5

Author(s):

David H. White ◽

Thomas Rupprecht ◽

Gerald Lüttgen

Keyword(s):

Data Structures ◽

Evidence Based ◽

Dynamic Data Structures ◽

Dynamic Data ◽

C Programs
