Succinct Encoding of Binary Strings Representing Triangulations

Algorithmica ◽

10.1007/s00453-021-00861-4 ◽

2021 ◽

Author(s):

José Fuentes-Sepúlveda ◽

Diego Seco ◽

Raquel Viaña

Keyword(s):

Information Theory ◽

Data Structure ◽

Experimental Evaluation ◽

Special Class ◽

Spanning Trees ◽

State Of The Art ◽

Succinct Data Structure ◽

Planar Embeddings ◽

Specific Sequences ◽

Binary Strings

We consider the problem of designing a succinct data structure for representing the connectivity of planar triangulations. The main result is a new succinct encoding achieving the information-theoretically optimal bound of 3.24 bits per vertex, while allowing efficient navigation. Our representation is based on the bijection of Poulalhon and Schaeffer (Algorithmica, 46(3):505–527, 2006) that defines a mapping between planar triangulations and a special class of spanning trees, called PS-trees. The proposed solution differs from previous approaches in that operations in planar triangulations are reduced to operations in particular parentheses sequences encoding PS-trees. Existing methods to handle balanced parentheses sequences have to be combined and extended to operate on such specific sequences, essentially for retrieving matching elements. The new encoding supports extracting the d neighbors of a query vertex in O(d) time and testing adjacency between two vertices in O(1) time. Additionally, we provide an implementation of our proposed data structure. In the experimental evaluation, our representation reaches up to 7.35 bits per vertex, improving the space usage of state-of-the-art implementations for planar embeddings.
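The abstract above reduces navigation to retrieving matching elements in parentheses sequences. As a rough illustration of that primitive only, here is a minimal sketch on an ordinary balanced parentheses string. It assumes a naive linear scan, not the succinct constant-time structures the paper builds on, and `find_close` is a hypothetical name chosen for this example.

```python
def find_close(seq: str, i: int) -> int:
    """Return the index of the ')' that matches the '(' at position i.

    Naive O(n) scan: track nesting depth and stop when it returns to zero.
    """
    if seq[i] != "(":
        raise ValueError("position i must hold an opening parenthesis")
    depth = 0
    for j in range(i, len(seq)):
        depth += 1 if seq[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("sequence is not balanced")
```

Succinct representations answer the same query with small auxiliary indexes on top of the bit sequence instead of scanning it.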

Cache-efficient sweeping-based interval joins for extended Allen relation predicates

The VLDB Journal ◽

10.1007/s00778-020-00650-5 ◽

2021 ◽

Author(s):

Danila Piatov ◽

Sven Helmer ◽

Anton Dignös ◽

Fabio Persia

Keyword(s):

Data Structure ◽

Experimental Evaluation ◽

State Of The Art ◽

Temporal Databases ◽

Access Method ◽

Wide Range ◽

Interval Relation ◽

Cache Efficient ◽

Join Algorithms ◽

Better Than

We develop a family of efficient plane-sweeping interval join algorithms for evaluating a wide range of interval predicates such as Allen’s relationships and parameterized relationships. Our technique is based on a framework, components of which can be flexibly combined in different manners to support the required interval relation. In temporal databases, our algorithms can exploit a well-known and flexible access method, the Timeline Index, thus expanding the set of operations it supports even further. Additionally, employing a compact data structure, the gapless hash map, we utilize the CPU cache efficiently. In an experimental evaluation, we show that our approach is several times faster and scales better than state-of-the-art techniques, while being much better suited for real-time event processing.
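To make the plane-sweeping idea concrete, the following is a minimal sketch of a sweep-based join for a single predicate, interval overlap on closed intervals. It is a generic textbook sweep under stated assumptions: plain Python lists stand in for the active sets, and neither the Timeline Index nor the gapless hash map from the abstract is modelled. The function name `sweep_overlap_join` is hypothetical.

```python
def sweep_overlap_join(r, s):
    """Return all pairs (a, b), a in r and b in s, whose closed intervals intersect.

    Intervals are (start, end) tuples with start <= end.
    """
    # Event = (time, kind, relation, interval); kind 0 = start, 1 = end.
    # Sorting puts starts before ends at equal times, so touching
    # closed intervals are reported as overlapping.
    events = []
    for rel, intervals in ((0, r), (1, s)):
        for iv in intervals:
            events.append((iv[0], 0, rel, iv))
            events.append((iv[1], 1, rel, iv))
    events.sort()

    active = ([], [])  # currently open intervals of r and of s
    out = []
    for _, kind, rel, iv in events:
        if kind == 0:
            # A starting interval overlaps every open interval of the other relation.
            for other in active[1 - rel]:
                out.append((iv, other) if rel == 0 else (other, iv))
            active[rel].append(iv)
        else:
            active[rel].remove(iv)
    return out
```

Other predicates would change which events trigger output and which active set is scanned; that variation is what the framework in the abstract is described as supporting.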

Prime Implicate Generation in Equational Logic

Journal of Artificial Intelligence Research ◽

10.1613/jair.5481 ◽

2017 ◽

Vol 60 ◽

pp. 827-880 ◽

Cited By ~ 1

Author(s):

Mnacho Echenim ◽

Nicolas Peltier ◽

Sophie Tourret

Keyword(s):

Data Structure ◽

Experimental Evaluation ◽

State Of The Art ◽

Equational Logic ◽

First Order ◽

Correctness Proofs ◽

Tree Data ◽

Tree Data Structure

We present an algorithm for the generation of prime implicates in equational logic, that is, of the most general consequences of formulæ containing equations and disequations between first-order terms. This algorithm is defined by a calculus that is proved to be correct and complete. We then focus on the case where the considered clause set is ground, i.e., contains no variables, and devise a specialized tree data structure that is designed to efficiently detect and delete redundant implicates. The corresponding algorithms are presented along with their termination and correctness proofs. Finally, an experimental evaluation of this prime implicate generation method is conducted in the ground case, including a comparison with state-of-the-art propositional and first-order prime implicate generation tools.

Succinct range filters

Communications of the ACM ◽

10.1145/3450262 ◽

2021 ◽

Vol 64 (4) ◽

pp. 166-173

Author(s):

Huanchen Zhang ◽

Hyeontaek Lim ◽

Viktor Leis ◽

David G. Andersen ◽

Michael Kaminsky ◽

...

Keyword(s):

Information Theory ◽

Data Structure ◽

State Of The Art ◽

Bloom Filters ◽

Range Queries ◽

Database Storage

We present the Succinct Range Filter (SuRF), a fast and compact data structure for approximate membership tests. Unlike traditional Bloom filters, SuRF supports both single-key lookups and common range queries, such as range counts. SuRF is based on a new data structure called the Fast Succinct Trie (FST) that matches the performance of state-of-the-art order-preserving indexes, while consuming only 10 bits per trie node, a space close to the minimum required by information theory. Our experiments show that SuRF speeds up range queries in a widely used database storage engine by up to 5×.
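The sketch below illustrates what an approximate range filter with one-sided error looks like. It is a simplification, not SuRF itself: it assumes keys are truncated to a fixed prefix length and kept in a sorted list, whereas the abstract describes a succinct trie. The class and method names are hypothetical.

```python
from bisect import bisect_left


class PrefixRangeFilter:
    """Approximate membership and range filter over truncated string keys.

    Answers may be false positives but never false negatives, because
    truncating strings to a fixed length preserves lexicographic order.
    """

    def __init__(self, keys, prefix_len):
        self.prefix_len = prefix_len
        self.prefixes = sorted({k[:prefix_len] for k in keys})

    def may_contain_range(self, lo, hi):
        """True if some stored key may lie in the closed range [lo, hi]."""
        lo_p, hi_p = lo[: self.prefix_len], hi[: self.prefix_len]
        i = bisect_left(self.prefixes, lo_p)
        return i < len(self.prefixes) and self.prefixes[i] <= hi_p

    def may_contain(self, key):
        """Single-key lookup expressed as a degenerate range query."""
        return self.may_contain_range(key, key)
```

A storage engine would consult such a filter before touching disk and skip the read whenever the filter answers `False`.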

Frequent Multi-Byte Character Substring Extraction using a Succinct Data Structure

Proceedings of the 2016 ACM Symposium on Document Engineering - DocEng '16 ◽

10.1145/2960811.2967161 ◽

2016 ◽

Author(s):

Phanucheep Chotnithi ◽

Atsuhiro Takasu

Keyword(s):

Data Structure ◽

Succinct Data Structure

All-gather Algorithms Resilient to Imbalanced Process Arrival Patterns

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3460122 ◽

2021 ◽

Vol 18 (4) ◽

pp. 1-22

Author(s):

Jerzy Proficz

Keyword(s):

Experimental Evaluation ◽

Data Exchange ◽

State Of The Art ◽

Monitoring And Evaluation ◽

The Other ◽

Early Data ◽

Cluster Architecture ◽

Novel Algorithms

Two novel algorithms for the all-gather operation resilient to imbalanced process arrival patterns (PAPs) are presented. The first one, Background Disseminated Ring (BDR), is based on the regular parallel ring algorithm often supplied in MPI implementations and exploits an auxiliary background thread for early data exchange from faster processes to accelerate the performed all-gather operation. The other algorithm, Background Sorted Linear synchronized tree with Broadcast (BSLB), is built upon the already existing PAP-aware gather algorithm, that is, Background Sorted Linear Synchronized tree (BSLS), followed by a regular broadcast distributing gathered data to all participating processes. The background of the imbalanced PAP subject is described, along with the PAP monitoring and evaluation topics. An experimental evaluation of the algorithms based on a proposed mini-benchmark is presented. The mini-benchmark was performed over 2,000 times in a typical HPC cluster architecture with homogeneous compute nodes. The obtained results are analyzed according to different PAPs, data sizes, and process numbers, showing that the proposed optimization works well for various configurations, is scalable, and can significantly reduce the all-gather elapsed times, in our case by up to a factor of 1.9, or 47%, in comparison with the best state-of-the-art solution.
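For orientation, here is a minimal single-process simulation of the regular ring all-gather that the abstract names as the basis of BDR. It is a sketch of the baseline only: there is no MPI, no background thread, and no handling of process arrival patterns, and the function name `ring_allgather` is hypothetical.

```python
def ring_allgather(blocks):
    """Simulate a ring all-gather among len(blocks) ranks.

    blocks[r] is the block contributed by rank r. Returns one buffer per
    rank; after p - 1 steps every buffer holds all blocks in rank order.
    """
    p = len(blocks)
    # buffers[r][i] holds the block contributed by rank i, as seen by rank r.
    buffers = [[None] * p for _ in range(p)]
    for r in range(p):
        buffers[r][r] = blocks[r]

    for step in range(p - 1):
        # In each step, rank r forwards block (r - step) mod p to rank r + 1.
        # All sends are collected first so the step behaves as simultaneous.
        sends = [
            ((r + 1) % p, (r - step) % p, buffers[r][(r - step) % p])
            for r in range(p)
        ]
        for dst, idx, payload in sends:
            buffers[dst][idx] = payload
    return buffers
```

In this lock-step model, every rank waits for its left neighbour at each step, which is why a single late process delays the whole ring.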

Suffix array for multi-pattern matching with variable length wildcards

Intelligent Data Analysis ◽

10.3233/ida-205087 ◽

2021 ◽

Vol 25 (2) ◽

pp. 283-303

Author(s):

Na Liu ◽

Fei Xie ◽

Xindong Wu

Keyword(s):

Dynamic Programming ◽

Data Structure ◽

Pattern Matching ◽

Edit Distance ◽

State Of The Art ◽

Suffix Array ◽

Variable Length ◽

Distance Method ◽

Efficient Data ◽

Comparison Algorithms

Approximate multi-pattern matching in which the patterns contain variable-length wildcards is an important and widely used problem. In this paper, two suffix array-based algorithms are proposed to solve it. The suffix array is an efficient data structure for exact string matching in existing studies, as well as for approximate pattern matching and multi-pattern matching. One algorithm, called MMSA-S, handles the short exact characters in a pattern by dynamic programming, while the other, called MMSA-L, deals with the long exact characters by the edit distance method. Experimental results on the Pizza & Chili corpus demonstrate that these two newly proposed algorithms are, in most cases, more time-efficient than the state-of-the-art comparison algorithms.
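As background for the suffix array mentioned above, the following minimal sketch shows exact matching only: a naive construction by sorting suffixes, followed by two binary searches. It does not handle wildcards and is not MMSA-S or MMSA-L; the function names are hypothetical.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order.

    Naive construction by sorting suffix strings; fine for small inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text using suffix array sa."""
    m = len(pattern)

    # First suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

All occurrences of a pattern form one contiguous block of the suffix array, which is what makes the two binary searches sufficient.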

Cholinium amino acid-based ionic liquids

Biophysical Reviews ◽

10.1007/s12551-021-00782-0 ◽

2021 ◽

Author(s):

Andrea Le Donne ◽

Enrico Bodo

Keyword(s):

Ionic Liquids ◽

Amino Acid ◽

Computational Chemistry ◽

Special Class ◽

State Of The Art ◽

Short Review ◽

Research Activity ◽

Intensive Research ◽

Low Toxicity ◽

Molecular Components

Boosted by the simplicity of their synthesis and low toxicity, cholinium and amino acid-based ionic liquids have attracted the attention of researchers in many different fields ranging from computational chemistry to electrochemistry and medicine. Among the countless IL variations, these substances occupy a space of their own due to their exceptional biocompatibility, which stems from their being made entirely of metabolic molecular components. These substances have been the subject of rather intensive research activity because of the possibility of using them as greener replacements for traditional ionic liquids. We present here a short review in an attempt to provide a compendium of the state-of-the-art scientific research about this special class of ionic liquids based on the combination of amino acid anions and cholinium cations.

DenseZDD: A Compact and Fast Index for Families of Sets

Algorithms ◽

10.3390/a11080128 ◽

2018 ◽

Vol 11 (8) ◽

pp. 128 ◽

Cited By ~ 1

Author(s):

Shuhei Denzumi ◽

Jun Kawahara ◽

Koji Tsuda ◽

Hiroki Arimura ◽

Shin-ichi Minato ◽

...

Keyword(s):

Data Structure ◽

Data Structures ◽

Information Integration ◽

Decision Diagrams ◽

Web Information Retrieval ◽

Binary Decision ◽

Web Information ◽

Set Operations ◽

Succinct Data Structure ◽

The Family

In this article, we propose a succinct data structure of zero-suppressed binary decision diagrams (ZDDs). A ZDD represents sets of combinations efficiently and we can perform various set operations on the ZDD without explicitly extracting combinations. Thanks to these features, ZDDs have been applied to web information retrieval, information integration, and data mining. However, to support rich manipulation of sets of combinations and update ZDDs in the future, ZDDs need too much space, which means that there is still room to be compressed. The paper introduces a new succinct data structure, called DenseZDD, for further compressing a ZDD when we do not need to conduct set operations on the ZDD but want to examine whether a given set is included in the family represented by the ZDD, and count the number of elements in the family. We also propose a hybrid method, which combines DenseZDDs with ordinary ZDDs. By numerical experiments, we show that the sizes of our data structures are three times smaller than those of ordinary ZDDs, and membership operations and random sampling on DenseZDDs are about ten times and three times faster than those on ordinary ZDDs for some datasets, respectively.
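The two queries singled out in the abstract, membership and counting, can be illustrated on an ordinary pointer-style ZDD. The sketch below is not DenseZDD: it assumes nodes are plain tuples `(var, lo, hi)` with terminals `False` and `True`, variables increasing along every path, and hypothetical function names.

```python
from functools import lru_cache


def contains(root, items):
    """Return True if the set `items` belongs to the family encoded by the ZDD."""
    remaining = sorted(items)
    pos = 0
    node = root
    while node is not True and node is not False:
        var, lo, hi = node
        if pos < len(remaining) and remaining[pos] == var:
            node, pos = hi, pos + 1      # variable is in the query set
        elif pos < len(remaining) and remaining[pos] < var:
            return False                 # variable was skipped, so it is absent
        else:
            node = lo                    # variable is not in the query set
    return node is True and pos == len(remaining)


@lru_cache(maxsize=None)
def count(node):
    """Return the number of sets in the family rooted at node."""
    if node is True:
        return 1
    if node is False:
        return 0
    _, lo, hi = node
    return count(lo) + count(hi)
```

For example, with `only_two = (2, False, True)` and `maybe_two = (2, True, True)`, the root `(1, only_two, maybe_two)` encodes the family {{2}, {1}, {1, 2}}.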

ConnectIt

Proceedings of the VLDB Endowment ◽

10.14778/3436905.3436923 ◽

2020 ◽

Vol 14 (4) ◽

pp. 653-667

Author(s):

Laxman Dhulipala ◽

Changwan Hong ◽

Julian Shun

Keyword(s):

Experimental Evaluation ◽

Comprehensive Evaluation ◽

State Of The Art ◽

Graph Connectivity ◽

Connected Components ◽

Sampling Strategies ◽

Spanning Forest ◽

Speed Up ◽

Minimum Spanning Forest ◽

Edge Sampling

Connected components is a fundamental kernel in graph applications. The fastest existing multicore algorithms for solving graph connectivity are based on some form of edge sampling and/or linking and compressing trees. However, many combinations of these design choices have been left unexplored. In this paper, we design the ConnectIt framework, which provides different sampling strategies as well as various tree linking and compression schemes. ConnectIt enables us to obtain several hundred new variants of connectivity algorithms, most of which extend to computing spanning forest. In addition to static graphs, we also extend ConnectIt to support mixes of insertions and connectivity queries in the concurrent setting. We present an experimental evaluation of ConnectIt on a 72-core machine, which we believe is the most comprehensive evaluation of parallel connectivity algorithms to date. Compared to a collection of state-of-the-art static multicore algorithms, we obtain an average speedup of 12.4x (2.36x average speedup over the fastest existing implementation for each graph). Using ConnectIt, we are able to compute connectivity on the largest publicly-available graph (with over 3.5 billion vertices and 128 billion edges) in under 10 seconds using a 72-core machine, providing a 3.1x speedup over the fastest existing connectivity result for this graph, in any computational setting. For our incremental algorithms, we show that our algorithms can ingest graph updates at up to several billion edges per second. To guide the user in selecting the best variants in ConnectIt for different situations, we provide a detailed analysis of the different strategies. Finally, we show how the techniques in ConnectIt can be used to speed up two important graph applications: approximate minimum spanning forest and SCAN clustering.
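The tree linking and compression schemes mentioned above have a familiar sequential form, union-find. The following minimal sketch assumes a single thread, linking by smaller root id, and path halving as the compression rule; it does not model sampling or ConnectIt's concurrent variants, and `connected_components` is a hypothetical name.

```python
def connected_components(n, edges):
    """Label each of n vertices with the smallest vertex id in its component."""
    parent = list(range(n))

    def find(x):
        # Path halving: point every visited vertex at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Link the root with the larger id under the smaller one.
            parent[max(ru, rv)] = min(ru, rv)

    return [find(v) for v in range(n)]
```

Choosing a different linking rule or compression rule yields a different variant of the same scheme, which gives a sense of how a framework can enumerate many algorithm combinations.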

Hardware-oriented succinct-data-structure based on block-size-constrained compression

2015 7th International Conference of Soft Computing and Pattern Recognition (SoCPaR) ◽

10.1109/socpar.2015.7492797 ◽

2015 ◽

Author(s):

Hasitha Muthumala Waidyasooriya ◽

Daisuke Ono ◽

Masanori Hariyama

Keyword(s):

Data Structure ◽

Block Size ◽

Succinct Data Structure
