Smaller Compressed Suffix Arrays†

The Computer Journal ◽

10.1093/comjnl/bxaa016 ◽

2020 ◽

Author(s):

Ekaterina Benza ◽

Shmuel T Klein ◽

Dana Shapira

Keyword(s):

State Of The Art ◽

Suffix Array ◽

Space Complexity ◽

Suffix Arrays ◽

Processing Times ◽

Empirical Tests ◽

Space Requirements

An alternative to compressed suffix arrays is introduced, based on representing a sequence of integers using Fibonacci encodings, thereby reducing the space requirements of state-of-the-art implementations of the suffix array, while retaining the searching functionalities. Empirical tests support the theoretical space complexity improvements and show that there is no deterioration in the processing times.
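As a rough illustration of the Fibonacci encodings mentioned in the abstract, the sketch below codes each positive integer as a sum of non-consecutive Fibonacci numbers followed by a terminating 1 bit, so that the pair "11" marks the end of every codeword. How the paper applies this to the suffix array is not shown here; the function names `fib_encode` and `fib_decode_stream` are hypothetical and chosen only for this example.

```python
from typing import List


def fib_encode(n: int) -> str:
    """Return the Fibonacci code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for positive integers")
    # Fibonacci weights 1, 2, 3, 5, ... not exceeding n
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    while fibs[-1] > n:
        fibs.pop()
    # Greedy Zeckendorf decomposition, largest weight first
    bits = ["0"] * len(fibs)
    remainder = n
    for i in reversed(range(len(fibs))):
        if fibs[i] <= remainder:
            bits[i] = "1"
            remainder -= fibs[i]
    # The appended 1 creates the unique "11" terminator
    return "".join(bits) + "1"


def fib_decode_stream(bits: str) -> List[int]:
    """Decode a concatenation of Fibonacci codewords back into integers."""
    values = []
    total, prev, weight, next_weight = 0, "0", 1, 2
    for bit in bits:
        if bit == "1" and prev == "1":
            # Second consecutive 1: end of the current codeword
            values.append(total)
            total, prev, weight, next_weight = 0, "0", 1, 2
            continue
        if bit == "1":
            total += weight
        prev = bit
        weight, next_weight = next_weight, weight + next_weight
    return values
```

Because no codeword contains "11" except at its end, a sequence of integers can be concatenated without explicit length fields and still be decoded unambiguously; for example, `fib_encode(4)` yields `"1011"`.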


Suffix array for multi-pattern matching with variable length wildcards

Intelligent Data Analysis ◽

10.3233/ida-205087 ◽

2021 ◽

Vol 25 (2) ◽

pp. 283-303

Author(s):

Na Liu ◽

Fei Xie ◽

Xindong Wu

Keyword(s):

Dynamic Programming ◽

Data Structure ◽

Pattern Matching ◽

Edit Distance ◽

State Of The Art ◽

Suffix Array ◽

Variable Length ◽

Distance Method ◽

Efficient Data ◽

Comparison Algorithms

Approximate multi-pattern matching is an important and widely used task, particularly when patterns contain variable-length wildcards. In this paper, two suffix array-based algorithms are proposed to solve this problem. The suffix array is an efficient data structure used in existing studies for exact string matching as well as for approximate pattern matching and multi-pattern matching. The first algorithm, MMSA-S, handles short exact character runs in a pattern by dynamic programming, while the second, MMSA-L, handles long exact character runs by the edit distance method. Experimental results on the Pizza & Chili corpus demonstrate that the two proposed algorithms are, in most cases, more time-efficient than the state-of-the-art comparison algorithms.
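To make the setting concrete, the following is a minimal sketch of matching a single pattern with variable-length wildcards using a suffix array. It is not MMSA-S or MMSA-L: it uses neither dynamic programming nor edit distance, and only locates the exact segments by binary search and then checks the gap constraints. The pattern representation (a list of exact `segments` plus `(min, max)` gap lengths) and all function names are assumptions made for this example.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def build_suffix_array(text: str) -> List[int]:
    # Naive construction by sorting suffixes; adequate only for small inputs
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(text: str, sa: List[int], segment: str) -> List[int]:
    """Sorted start positions of an exact segment, via binary search on sa."""
    m = len(segment)
    lo, hi = 0, len(sa)
    while lo < hi:  # leftmost suffix whose m-character prefix is >= segment
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < segment:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # leftmost suffix whose m-character prefix is > segment
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= segment:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def match_with_gaps(text: str, segments: List[str],
                    gaps: List[Tuple[int, int]]) -> List[int]:
    """Start positions of segments[0] that extend to a full match.

    gaps[i] = (min_len, max_len) is the number of wildcard characters
    allowed between segments[i] and segments[i + 1].
    """
    sa = build_suffix_array(text)
    occ = [occurrences(text, sa, s) for s in segments]
    # Work right to left: `ok` holds starts of segment i from which
    # the remainder of the pattern can still be matched.
    ok = occ[-1]
    for i in range(len(segments) - 2, -1, -1):
        min_gap, max_gap = gaps[i]
        end = len(segments[i])
        nxt, ok = ok, []
        for p in occ[i]:
            left = bisect_left(nxt, p + end + min_gap)
            right = bisect_right(nxt, p + end + max_gap)
            if left < right:  # some valid start of the next segment exists
                ok.append(p)
    return ok
```

For instance, `match_with_gaps("abxxcdabcd", ["ab", "cd"], [(0, 2)])` returns `[0, 6]`, since both occurrences of "ab" are followed by "cd" after at most two wildcard characters.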


Beyond equi-joins

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476306 ◽

2021 ◽

Vol 14 (11) ◽

pp. 2599-2612

Author(s):

Nikolaos Tziavelis ◽

Wolfgang Gatterbauer ◽

Mirek Riedewald

Keyword(s):

Experimental Study ◽

State Of The Art ◽

Database Systems ◽

Ranking Function ◽

Space Complexity ◽

Time And Space ◽

Running Time ◽

Join Queries ◽

Time And Space Complexity ◽

Memory Efficient

We study theta-joins in general and join predicates with conjunctions and disjunctions of inequalities in particular, focusing on ranked enumeration where the answers are returned incrementally in an order dictated by a given ranking function. Our approach achieves strong time and space complexity properties: with n denoting the number of tuples in the database, we guarantee for acyclic full join queries with inequality conditions that for every value of k, the k top-ranked answers are returned in O(n polylog n + k log k) time. This is within a polylogarithmic factor of O(n + k log k), i.e., the best known complexity for equi-joins, and even of O(n + k), i.e., the time it takes to look at the input and return k answers in any order. Our guarantees extend to join queries with selections and many types of projections (namely those called "free-connex" queries and those that use bag semantics). Remarkably, they hold even when the number of join results is n^ℓ for a join of ℓ relations. The key ingredient is a novel O(n polylog n)-size factorized representation of the query output, which is constructed on-the-fly for a given query and database. In addition to providing the first nontrivial theoretical guarantees beyond equi-joins, we show in an experimental study that our ranked-enumeration approach is also memory-efficient and fast in practice, beating the running time of state-of-the-art database systems by orders of magnitude.
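The notion of ranked enumeration can be illustrated with a small sketch for a single inequality join between two relations, ranked by the sum of tuple weights. This is only a simple heap-based baseline, not the factorized representation described in the abstract, and its worst-case cost is quadratic rather than O(n polylog n + k log k). The tuple layout `(value, weight)`, the predicate `r.value < s.value`, and the function name are assumptions for illustration.

```python
import heapq
from itertools import islice
from typing import Iterator, List, Optional, Tuple

Row = Tuple[int, int]  # (join value, weight); layout assumed for this sketch


def ranked_inequality_join(R: List[Row], S: List[Row]) -> Iterator[Tuple[int, Row, Row]]:
    """Yield (total_weight, r, s) for all pairs with r[0] < s[0],
    in non-decreasing order of r[1] + s[1]."""
    s_by_weight = sorted(S, key=lambda s: s[1])

    def next_partner(r: Row, start: int) -> Optional[int]:
        # Linear scan for the next-lightest S tuple satisfying the predicate
        for pos in range(start, len(s_by_weight)):
            if r[0] < s_by_weight[pos][0]:
                return pos
        return None

    # The heap holds, for each R tuple, its best not-yet-emitted join partner
    heap = []
    for i, r in enumerate(R):
        pos = next_partner(r, 0)
        if pos is not None:
            heapq.heappush(heap, (r[1] + s_by_weight[pos][1], i, pos))

    while heap:
        weight, i, pos = heapq.heappop(heap)
        yield weight, R[i], s_by_weight[pos]
        nxt = next_partner(R[i], pos + 1)
        if nxt is not None:
            heapq.heappush(heap, (R[i][1] + s_by_weight[nxt][1], i, nxt))


def top_k(R: List[Row], S: List[Row], k: int) -> List[Tuple[int, Row, Row]]:
    # Answers are produced incrementally, so only the first k are computed
    return list(islice(ranked_inequality_join(R, S), k))
```

The generator form reflects the "incremental" aspect: a caller can stop after any number of answers without the full join result ever being materialized.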


Efficient repeat finding in sets of strings via suffix arrays

Discrete Mathematics & Theoretical Computer Science ◽

10.46298/dmtcs.597 ◽

2013 ◽

Vol 15 (2) (Discrete Algorithms) ◽

Author(s):

Pablo Barenbaum ◽

Verónica Becher ◽

Alejandro Deymonnaz ◽

Melisa Halsband ◽

Pablo Ariel Heiber

Keyword(s):

Suffix Array ◽

Input String ◽

Experimental Results ◽

Suffix Arrays ◽

Input Size ◽

Discrete Algorithms ◽

International Audience

We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O(m) memory, where m is the length of the longest input string, and O(n log m) time, where n is the whole input size (the sum of the length of each string in the input). The most expensive part of our algorithms is the computation of several suffix arrays. We give an implementation and experimental results that evidence the efficiency of our algorithms in practice, even for very large inputs.
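Problem (a) can be sketched with a generalized suffix array over the concatenated inputs. The version below is a simplified illustration, not the authors' algorithm: it builds one suffix array over the whole input with a naive sort, so it meets neither the O(m) memory nor the O(n log m) time bound stated above, and it returns only one longest common substring. The function name and the integer encoding with distinct separator codes are assumptions for this example.

```python
from typing import List, Optional


def longest_common_substring(strings: List[str]) -> str:
    """Return one longest substring occurring in every input string."""
    k = len(strings)
    if k == 0:
        raise ValueError("need at least one string")
    if k == 1:
        return strings[0]
    # Encode characters above k; separators get the distinct codes 0..k-1,
    # so no common prefix can ever extend across a string boundary.
    seq, owner = [], []
    for idx, s in enumerate(strings):
        seq.extend(ord(c) + k for c in s)
        owner.extend([idx] * len(s))
        seq.append(idx)
        owner.append(-1)  # separators belong to no string
    n = len(seq)
    sa = sorted(range(n), key=lambda i: seq[i:])  # naive construction

    # Kasai's algorithm: lcp[r] is the common prefix length of the
    # suffixes at ranks r - 1 and r.
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp, h = [0] * n, 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and seq[i + h] == seq[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0

    def witness(length: int) -> Optional[int]:
        # Scan runs of adjacent suffixes sharing >= length symbols and
        # report a position once a run covers all k strings.
        seen = set()
        for r in range(n):
            if r == 0 or lcp[r] < length:
                seen = set()
            if owner[sa[r]] >= 0:
                seen.add(owner[sa[r]])
            if len(seen) == k:
                return sa[r]
        return None

    # Binary search on the answer length (feasibility is monotone)
    lo, hi, best = 1, min(len(s) for s in strings), ""
    while lo <= hi:
        mid = (lo + hi) // 2
        pos = witness(mid)
        if pos is not None:
            best = "".join(chr(c - k) for c in seq[pos:pos + mid])
            lo = mid + 1
        else:
            hi = mid - 1
    return best
```

For example, `longest_common_substring(["banana", "ananas", "bandana"])` returns `"ana"`, the only substring of length three shared by all three inputs.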


Measuring the Gap: Algorithmic Approximation Bounds for the Space Complexity of Stream Specifications

10.29007/t3jg ◽

2018 ◽

Author(s):

David Cerna ◽

Wolfgang Schreiner

Keyword(s):

Real World ◽

Predicate Logic ◽

Space Complexity ◽

Runtime Monitoring ◽

Large Fragment ◽

Memory Efficiency ◽

Real World Applications ◽

Space Requirements ◽

Algorithmic Procedure

In previous work we presented an algorithmic procedure for analysing the space complexity of monitor specifications written in a fragment of predicate logic. These monitor specifications were developed for runtime monitoring of event streams. Our procedure provides accurate results for a large fragment of the possible specifications, but overestimates the space complexity of precisely those specifications which are more likely to be found in real world applications. Experiments hinted at a relationship between the extent to which our procedure over-approximates the space requirements of a specification and the quantifier structure of the specification. In this paper we provide a formalization of this relationship as approximation ratios, and are able to pinpoint "good" constructions, that is, specifications using less memory. These results are first steps towards categorizing specifications based on memory efficiency.


Dynamic Generalized Suffix Arrays

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.263-266.1398 ◽

2012 ◽

Vol 263-266 ◽

pp. 1398-1401

Author(s):

Song Feng Lu ◽

Hua Zhao

Keyword(s):

Data Structure ◽

Pattern Matching ◽

Time Complexity ◽

Document Retrieval ◽

Suffix Array ◽

Index Structure ◽

Suffix Arrays ◽

Basic Task ◽

Insertion And Deletion ◽

Dynamic Version

Document retrieval is the basic task of search engines and has attracted a sizeable amount of attention from the pattern matching community. In this paper, we focus on the dynamic version of this problem, in which text insertion and deletion are allowed. Using the generalized suffix array and other data structures, we propose a new index structure. Our scheme achieves better time complexity than the existing ones, at the cost of slightly more space overhead.


SUNNY: a Lazy Portfolio Approach for Constraint Solving

Theory and Practice of Logic Programming ◽

10.1017/s1471068414000179 ◽

2014 ◽

Vol 14 (4-5) ◽

pp. 509-524 ◽

Cited By ~ 11

Author(s):

ROBERTO AMADINI ◽

MAURIZIO GABBRIELLI ◽

JACOPO MAURO

Keyword(s):

Logic Programming ◽

Constraint Satisfaction ◽

State Of The Art ◽

Constraint Satisfaction Problem ◽

Answer Set Programming ◽

Constraint Solving ◽

Explicit Model ◽

Actual Performance ◽

Portfolio Approach ◽

Empirical Tests

Within the context of constraint solving, a portfolio approach allows one to exploit the synergy between different solvers in order to create a globally better solver. In this paper we present SUNNY: a simple and flexible algorithm that takes advantage of a portfolio of constraint solvers in order to compute, without learning an explicit model, a schedule of them for solving a given Constraint Satisfaction Problem (CSP). Motivated by the performance reached by SUNNY vs. different simulations of other state-of-the-art approaches, we developed sunny-csp, an effective portfolio solver that exploits the underlying SUNNY algorithm in order to solve a given CSP. Empirical tests conducted on exhaustive benchmarks of MiniZinc models show that the actual performance of sunny-csp conforms to the predictions. This is encouraging both for improving the power of CSP portfolio solvers and for trying to export them to fields such as Answer Set Programming and Constraint Logic Programming.


Designing efficient algorithms for querying large corpora

Oslo Studies in Language ◽

10.5617/osla.8504 ◽

2021 ◽

Vol 11 (2) ◽

pp. 283-302

Author(s):

Paul Meurer

Keyword(s):

Regular Expression ◽

Linear Time ◽

Suffix Array ◽

Efficient Algorithms ◽

Regular Expressions ◽

Efficient Treatment ◽

Suffix Arrays ◽

Regular Expression Matching ◽

Finite State ◽

Query System

I describe several new efficient algorithms for querying large annotated corpora. The search algorithms as they are implemented in several popular corpus search engines are less than optimal in two respects: regular expression string matching in the lexicon is done in linear time, and regular expressions over corpus positions are evaluated starting in those corpus positions that match the constraints of the initial edges of the corresponding network. To address these shortcomings, I have developed an algorithm for regular expression matching on suffix arrays that allows fast lexicon lookup, and a technique for running finite state automata from edges with lowest corpus counts. The implementation of the lexicon as suffix array also lends itself to an elegant and efficient treatment of multi-valued and set-valued attributes. The described techniques have been implemented in a fully functional corpus management system and are also used in a treebank query system.


Object gripping algorithm for robotic assistance by means of deep learning

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v10i6.pp6292-6299 ◽

2020 ◽

Vol 10 (6) ◽

pp. 6292-6299

Author(s):

Robinson Jimenez-Moreno ◽

Astrid Rubiano Fonseca ◽

Jose Luis Ramirez

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Robotic Assistance ◽

Average Precision ◽

Convolutional Networks ◽

Processing Times ◽

Learning Techniques ◽

Robotic Applications ◽

Test Database ◽

Multiple Stages

This paper presents the use of recent state-of-the-art deep learning techniques, so far little addressed in robotic applications, through a new algorithm based on Faster R-CNN and CNN regression. Machine vision systems tend to require multiple stages to locate an object and allow a robot to take it, which increases the noise in the system and the processing times. Region-based convolutional networks make it possible to solve this problem; two convolutional architectures are used, one for classification and location of three types of objects and one to determine the grip angle for a robotic gripper. In the established virtual environment, the grip algorithm works at up to 5 frames per second with 100% object classification. With the implementation of Faster R-CNN, it obtains 100% accuracy in the classifications of the test database and over 97% average precision in locating the generated boxes in each element, gripping the objects successfully.


Symbolic and Numeric Kernel Division for GPU-based FEA Assembly of Regular Meshes with Modified Sparse Storage Formats

Journal of Computing and Information Science in Engineering ◽

10.1115/1.4051123 ◽

2021 ◽

pp. 1-35

Author(s):

Subhajit Sanfui ◽

Deepak Sharma

Keyword(s):

Graphics Processing Units ◽

Degrees Of Freedom ◽

State Of The Art ◽

General Purpose ◽

Storage Space ◽

Element Analysis ◽

Race Condition ◽

Inherent Problem ◽

Space Requirements ◽

Graphics Processing

This paper presents an efficient strategy to perform the assembly stage of finite element analysis (FEA) on general-purpose graphics processing units (GPUs). This strategy involves dividing the assembly task by using symbolic and numeric kernels, thereby reducing the complexity of the standard single-kernel assembly approach. Two sparse storage formats based on the proposed strategy are also developed by modifying the existing sparse storage formats with the intention of removing the degrees of freedom-based redundancies in the global matrix. The inherent problem of race conditions is resolved through the implementation of coloring and atomics. The proposed strategy is compared with the state-of-the-art GPU-based and CPU-based assembly techniques. These comparisons reveal a significant number of benefits in terms of reducing storage space requirements and execution time and increasing performance (GFLOPS). Moreover, using the proposed strategy, it is found that the coloring method is more effective than the atomics-based method for the existing as well as the modified storage formats.


THE VIRTUAL SUFFIX TREE

International Journal of Foundations of Computer Science ◽

10.1142/s0129054109007066 ◽

2009 ◽

Vol 20 (06) ◽

pp. 1109-1133 ◽

Cited By ~ 2

Author(s):

JIE LIN ◽

YUE JIANG ◽

DON ADJEROH

Keyword(s):

Suffix Tree ◽

Linear Time ◽

Suffix Array ◽

Intermediate Step ◽

Suffix Trees ◽

String Length ◽

Space Requirement ◽

Suffix Arrays ◽

Tree Construction ◽

Efficient Data

We introduce the VST (virtual suffix tree), an efficient data structure for suffix trees and suffix arrays. Starting from the suffix array, we construct the suffix tree, from which we derive the virtual suffix tree. Later, we remove the intermediate step of suffix tree construction, and build the VST directly from the suffix array. The VST provides the same functionality as the suffix tree, including suffix links, but at a much smaller space requirement. It has the same linear time construction even for large alphabets, Σ, requires O(n) space to store (n is the string length), and allows searching for a pattern of length m to be performed in O(m log |Σ|) time, the same time needed for a suffix tree. Given the VST, we show an algorithm that computes all the suffix links in linear time, independent of Σ. The VST requires less space than other recently proposed data structures for suffix trees and suffix arrays, such as the enhanced suffix array [1], and the linearized suffix tree [17]. On average, the space requirement (including that for suffix arrays and suffix links) is 13.8n bytes for the regular VST, and 12.05n bytes in its compact form.
