Efficient repeat finding in sets of strings via suffix arrays

Discrete Algorithms

We consider two repeat finding problems relative to sets of strings: (a) find the largest substrings that occur in every string of a given set; (b) find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O(m) memory, where m is the length of the longest input string, and O(n log m) time, where n is the whole input size (the sum of the lengths of all input strings). The most expensive part of our algorithms is the computation of several suffix arrays. We give an implementation and experimental results that demonstrate the efficiency of our algorithms in practice, even for very large inputs.
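
Problem (a) can be illustrated with the textbook generalized-suffix-array method. This is a minimal sketch, not the authors' O(m)-memory algorithm. All suffixes of all strings are sorted together, adjacent longest-common-prefix (LCP) values are computed, and a sliding window finds runs of adjacent suffixes that come from every input string. The function name is hypothetical.

```python
from collections import defaultdict


def longest_common_substrings(strings):
    """Return the set of longest substrings (length >= 1) occurring in every string.

    Textbook generalized-suffix-array method (NOT the paper's memory-frugal one):
    sort all suffixes of all strings together, then slide a window over runs of
    adjacent suffixes that cover every input string.
    """
    k = len(strings)
    if k == 0:
        return set()
    if k == 1:
        return {strings[0]} if strings[0] else set()

    # Generalized suffix array: (suffix text, id of the string it comes from).
    suffixes = sorted(
        (s[i:], doc) for doc, s in enumerate(strings) for i in range(len(s))
    )

    def lcp(a, b):
        # Length of the longest common prefix of a and b.
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        return n

    # lcps[i] = LCP of suffixes[i - 1] and suffixes[i] (lcps[0] is unused).
    lcps = [0] + [lcp(suffixes[i - 1][0], suffixes[i][0]) for i in range(1, len(suffixes))]

    best, result = 0, set()
    counts = defaultdict(int)  # string id -> number of its suffixes in the window
    left = 0
    for right, (_, doc) in enumerate(suffixes):
        counts[doc] += 1
        # Drop leftmost suffixes whose string is still represented later in the window.
        while counts[suffixes[left][1]] > 1:
            counts[suffixes[left][1]] -= 1
            left += 1
        if len(counts) == k:
            # Common prefix of all suffixes in the window = minimum adjacent LCP.
            length = min(lcps[left + 1 : right + 1])
            if length > best:
                best, result = length, set()
            if length == best and length > 0:
                result.add(suffixes[right][0][:length])
    return result
```

For example, `longest_common_substrings(["abXcd", "cdYab"])` yields both `"ab"` and `"cd"`. Sorting full suffix copies costs O(n² log n) in the worst case, so this sketch only suits small inputs. Avoiding that cost is the point of a real suffix array construction.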

Smaller Compressed Suffix Arrays

The Computer Journal

10.1093/comjnl/bxaa016

2020

Author(s):

Ekaterina Benza

Shmuel T Klein

Dana Shapira

Keyword(s):

State Of The Art

Suffix Array

Space Complexity

Suffix Arrays

Processing Times

Empirical Tests

Space Requirements

An alternative to compressed suffix arrays is introduced, based on representing a sequence of integers using Fibonacci encodings. This reduces the space requirements of state-of-the-art suffix array implementations while retaining their search functionality. Empirical tests support the theoretical improvements in space complexity and show no deterioration in processing time.
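
The Fibonacci code itself is easy to sketch. It is a minimal illustration, and how the paper applies it to the suffix array's integer sequence is not reproduced here. A positive integer is written as a sum of non-consecutive Fibonacci numbers 1, 2, 3, 5, 8, … (its Zeckendorf representation), with bits listed from the smallest term upward. A terminating `1` is then appended. Because the representation never contains two adjacent 1s, every codeword ends in `11`, so concatenated codewords can be split without separators. The function names are hypothetical.

```python
def fib_encode(n):
    """Fibonacci codeword of a positive integer n, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for positive integers")
    # Fibonacci numbers 1, 2, 3, 5, ... up to the largest one <= n.
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # the last one exceeds n
    bits = ["0"] * len(fibs)
    # Greedy choice of the largest Fibonacci number gives the Zeckendorf form.
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= n:
            bits[i] = "1"
            n -= fibs[i]
    return "".join(bits) + "1"  # terminating 1 makes the codeword end in "11"


def fib_decode(bits):
    """Decode a concatenation of Fibonacci codewords back into integers."""
    values = []
    total, fib, next_fib, prev = 0, 1, 2, "0"
    for bit in bits:
        if bit == "1" and prev == "1":
            # "11" marks the end of a codeword; reset for the next one.
            values.append(total)
            total, fib, next_fib, prev = 0, 1, 2, "0"
            continue
        if bit == "1":
            total += fib
        fib, next_fib = next_fib, fib + next_fib
        prev = bit
    return values
```

For example, 11 = 8 + 3 encodes as `001011`. Small integers get short codewords, which is what makes such codes attractive for sequences dominated by small values.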

On substitution tilings of the plane with n-fold rotational symmetry

Discrete Mathematics & Theoretical Computer Science

10.46298/dmtcs.2108

2015

Vol. 17 no. 1 (Discrete Algorithms)

Author(s):

Gregory R. Maloney

Keyword(s):

Rotational Symmetry

Substitution Tiling

Computer Assistance

Discrete Algorithms

Special Cases

International Audience

Substitution Tilings

A method is described for constructing, with computer assistance, planar substitution tilings that have n-fold rotational symmetry. This method uses as prototiles the set of rhombs with angles that are integer multiples of π/n. It includes as special cases various tilings that have already been constructed by hand for low values of n. An example constructed by this method for n = 11 is exhibited; this is the first substitution tiling with elevenfold symmetry to appear in the literature.
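
The prototile set is small and can be enumerated directly. This is a minimal sketch of which rhombs qualify; it does not construct a substitution rule. A rhomb has angles α and π − α. With α = kπ/n, the choices k and n − k give the same shape, so k only needs to run from 1 to ⌊n/2⌋. The helper names are hypothetical.

```python
import math


def rhomb_prototiles(n):
    """Distinct rhombs whose angles are integer multiples of pi/n.

    Each rhomb is returned as a pair (k, n - k): its two angles, k*pi/n and
    (n - k)*pi/n, expressed as multiples of pi/n.
    """
    return [(k, n - k) for k in range(1, n // 2 + 1)]


def rhomb_vertices(n, k):
    """Vertices of a unit-side rhomb with angle k*pi/n at the origin."""
    a = k * math.pi / n
    return [(0.0, 0.0), (1.0, 0.0), (1.0 + math.cos(a), math.sin(a)), (math.cos(a), math.sin(a))]
```

For n = 11 this gives five rhomb shapes. For even n, the case k = n/2 is the square.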

Output sensitive algorithms for covering many points

Discrete Mathematics & Theoretical Computer Science

10.46298/dmtcs.2102

2015

Vol. 17 no. 1 (Discrete Algorithms)

Author(s):

Hossein Ghasemalizadeh

Mohammadreza Razzazi

Keyword(s):

Positive Integer

Time Algorithm

Covering Problems

Discrete Algorithms

International Audience

Previous Algorithm

Set Of Points

In this paper we devise output-sensitive algorithms for the following problem: given a set of n points and a positive integer m, cover as many of these points as possible with m disks. We introduce a parameter ρ, the maximum number of points that one disk can cover, and analyse the algorithms in terms of this parameter. First, we solve the problem for m = 1 in O(nρ) time, which improves on the previous O(n^2)-time algorithm for this problem. Then we solve the problem for m = 2 in O(nρ + ρ^3 log ρ) time, which improves on the previous O(n^3 log n) algorithm. Our algorithms outperform the previous ones because ρ is much smaller than n in many cases. Finally, we extend the algorithm to any value of m and solve the problem in O(mnρ + (mρ)^(2m-1) log mρ) time. The previous algorithm for this problem runs in O(n^(2m-1) log n) time; ours usually runs faster because mρ is smaller than n in many cases. We obtain output-sensitive algorithms by confining the regions in which we search for the result. The techniques used in this paper may be applicable to other covering problems to obtain faster algorithms.
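
For the case m = 1, a naive baseline shows what is being computed. This is a sketch, not the output-sensitive algorithm, and it assumes all disks have a fixed radius r, which the abstract does not state explicitly. Some optimal disk can be chosen in one of two ways: centered at an input point, or with two input points on its boundary. Trying every such candidate center gives an O(n³) method. The function name and the tolerance `eps` are hypothetical.

```python
import math
from itertools import combinations


def max_points_one_disk(points, r, eps=1e-9):
    """Maximum number of points coverable by one disk of radius r (O(n^3) brute force)."""
    if not points:
        return 0
    candidates = list(points)  # disks centered at an input point
    for (x1, y1), (x2, y2) in combinations(points, 2):
        dx, dy = x2 - x1, y2 - y1
        d2 = dx * dx + dy * dy
        if d2 == 0 or d2 > 4 * r * r:
            continue  # coincident points, or no radius-r circle passes through both
        # The two radius-r circles through both points have centers on the perpendicular bisector.
        mx, my = (x1 + x2) / 2, (y1 + y2) / 2
        h = math.sqrt(max(0.0, r * r - d2 / 4))
        d = math.sqrt(d2)
        ux, uy = -dy / d, dx / d  # unit vector perpendicular to the segment
        candidates.append((mx + h * ux, my + h * uy))
        candidates.append((mx - h * ux, my - h * uy))
    # Count covered points for each candidate center (eps absorbs rounding error).
    return max(
        sum(1 for (px, py) in points if (px - cx) ** 2 + (py - cy) ** 2 <= r * r + eps)
        for cx, cy in candidates
    )
```

Each candidate costs O(n) to evaluate, and there are O(n²) candidates. The paper's bounds improve on this kind of baseline by exploiting that one disk covers at most ρ points.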

A randomized algorithm for finding a maximum clique in the visibility graph of a simple polygon

Discrete Mathematics & Theoretical Computer Science

10.46298/dmtcs.2126

2015

Vol. 17 no. 1 (Discrete Algorithms)

Author(s):

Sergio Cabello

Maria Saumell

Keyword(s):

Hamiltonian Cycle

Randomized Algorithm

Maximum Clique

Simple Polygon

Maximum Size

Visibility Graph

Probability Of Error

Discrete Algorithms

International Audience

Previous Algorithm

We present a randomized algorithm to compute a clique of maximum size in the visibility graph G of the vertices of a simple polygon P. The input of the problem consists of the visibility graph G, a Hamiltonian cycle describing the boundary of P, and a parameter δ ∈ (0,1) controlling the probability of error of the algorithm. The algorithm does not require the coordinates of the vertices of P. With probability at least 1-δ, the algorithm runs in O(|E(G)|^2 / ω(G) · log(1/δ)) time and returns a maximum clique, where ω(G) is the number of vertices in a maximum clique in G. A deterministic variant of the algorithm takes O(|E(G)|^2) time and always outputs a maximum-size clique. This compares well with the best previous algorithm for the problem, by Ghosh et al. (2007), which is deterministic and runs in O(|V(G)|^2 |E(G)|) time.

Dynamic Generalized Suffix Arrays

Applied Mechanics and Materials

10.4028/www.scientific.net/amm.263-266.1398

2012

Vol. 263-266

pp. 1398-1401

Author(s):

Song Feng Lu

Hua Zhao

Keyword(s):

Data Structure

Pattern Matching

Time Complexity

Document Retrieval

Suffix Array

Index Structure

Suffix Arrays

Basic Task

Insertion And Deletion

Dynamic Version

Document retrieval is a basic task of search engines and has attracted considerable attention from the pattern matching community. In this paper, we focus on the dynamic version of this problem, in which text insertions and deletions are allowed. Using the generalized suffix array together with other data structures, we propose a new index structure. Our scheme achieves better time complexity than existing ones, at the cost of slightly more space overhead.
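
The static core of the problem, listing the documents that contain a pattern, can be sketched with a generalized suffix array. This is a minimal illustration under stated assumptions. It is static, so it supports no insertion or deletion, and it is not the paper's dynamic index. Every suffix of every document is stored with its document id. The suffixes starting with a pattern form one contiguous range, located by two binary searches. All names are hypothetical.

```python
def build_generalized_suffix_array(docs):
    """All suffixes of all documents, tagged with their document id, sorted naively."""
    return sorted((d[i:], doc_id) for doc_id, d in enumerate(docs) for i in range(len(d)))


def _bound(gsa, pattern, upper):
    # Binary search on the first len(pattern) characters of each suffix.
    # upper=False: index of the first suffix whose prefix is >= pattern;
    # upper=True:  index of the first suffix whose prefix is >  pattern.
    lo, hi, m = 0, len(gsa), len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        head = gsa[mid][0][:m]
        if head < pattern or (upper and head == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def documents_containing(gsa, pattern):
    """Ids of the documents that contain `pattern` (document listing)."""
    lo, hi = _bound(gsa, pattern, False), _bound(gsa, pattern, True)
    # Every suffix in gsa[lo:hi] starts with pattern; collect their distinct document ids.
    return {doc_id for _, doc_id in gsa[lo:hi]}
```

A dynamic version must keep this sorted order up to date as documents are inserted and deleted, which is the harder part that the abstract addresses.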

Designing efficient algorithms for querying large corpora

Oslo Studies in Language

10.5617/osla.8504

2021

Vol. 11 (2)

pp. 283-302

Author(s):

Paul Meurer

Keyword(s):

Regular Expression

Linear Time

Suffix Array

Efficient Algorithms

Regular Expressions

Efficient Treatment

Suffix Arrays

Regular Expression Matching

Finite State

Query System

I describe several new efficient algorithms for querying large annotated corpora. The search algorithms as they are implemented in several popular corpus search engines are less than optimal in two respects: regular expression string matching in the lexicon is done in linear time, and regular expressions over corpus positions are evaluated starting in those corpus positions that match the constraints of the initial edges of the corresponding network. To address these shortcomings, I have developed an algorithm for regular expression matching on suffix arrays that allows fast lexicon lookup, and a technique for running finite state automata from edges with lowest corpus counts. The implementation of the lexicon as suffix array also lends itself to an elegant and efficient treatment of multi-valued and set-valued attributes. The described techniques have been implemented in a fully functional corpus management system and are also used in a treebank query system.
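
The idea of starting evaluation from the edge with the lowest corpus count can be sketched for the simplest case, a linear query. This is a hypothetical simplification of a finite-state network, in which position j must hold a token from a given set. It does not reproduce the author's suffix-array lexicon lookup. The sketch anchors on the constraint with the fewest matching positions and checks the remaining constraints around each hit. In a real system the posting lists would come from an index; here they are built by a scan for self-containment.

```python
def match_sequence(corpus, constraints):
    """Start positions p such that corpus[p + j] is in constraints[j] for every j.

    corpus: list of tokens; constraints: list of sets of allowed tokens
    (a hypothetical stand-in for a linear finite-state query).
    """
    if not constraints:
        raise ValueError("query must have at least one constraint")
    # Posting lists: corpus positions matching each constraint (normally read from an index).
    postings = [[i for i, tok in enumerate(corpus) if tok in allowed] for allowed in constraints]
    # Anchor on the constraint with the lowest corpus count.
    anchor = min(range(len(constraints)), key=lambda j: len(postings[j]))
    results = []
    for pos in postings[anchor]:
        start = pos - anchor
        if start < 0 or start + len(constraints) > len(corpus):
            continue
        # Verify the remaining constraints around the anchor hit.
        if all(corpus[start + j] in constraints[j] for j in range(len(constraints))):
            results.append(start)
    return results
```

Starting from a frequent first constraint, such as a common determiner, would visit many more positions than starting from the rarest one. That difference is the inefficiency the abstract identifies.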

Isomorphism of graph classes related to the circular-ones property

Discrete Mathematics & Theoretical Computer Science

10.46298/dmtcs.625

2013

Vol. 15 no. 1 (Discrete Algorithms)

Author(s):

Andrew R. Curtis

Min Chih Lin

Ross M. McConnell

Yahav Nussbaum

Francisco Juan Soulignac

...

Keyword(s):

Linear Time

Time Algorithm

Linear Time Algorithm

Circular Arc

Graph Classes

Discrete Algorithms

International Audience

Related Graph

Circular Arc Graphs

Proper Circular Arc Graphs

We give a linear-time algorithm that checks for isomorphism between two 0-1 matrices that obey the circular-ones property. Our algorithm is similar to the isomorphism algorithm for interval graphs of Lueker and Booth, but works on PC trees, which are unrooted and have a cyclic nature, rather than on PQ trees, which are rooted. This algorithm leads to linear-time isomorphism algorithms for related graph classes, including Helly circular-arc graphs, Γ circular-arc graphs, proper circular-arc graphs and convex-round graphs.
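
The circular-ones property can be made concrete with a small checker. A 0-1 matrix has the property if some ordering of its columns makes the 1s of every row consecutive when the row is read cyclically. This is a minimal sketch that tries every column order, so it is exponential and only suitable for tiny examples. Linear-time recognition and isomorphism testing, as in the abstract, rely on PC trees, which are not reproduced here. The function names are hypothetical.

```python
from itertools import permutations


def is_circular_run(row):
    """True if the 1s in `row` are consecutive when the row is read cyclically."""
    n = len(row)
    if sum(row) in (0, n):
        return True
    # Count 0 -> 1 transitions around the cycle (row[-1] wraps to the last entry);
    # a single circular run of 1s has exactly one.
    starts = sum(1 for i in range(n) if row[i] == 1 and row[i - 1] == 0)
    return starts == 1


def has_circular_ones_property(matrix):
    """Brute force over all column orders (exponential; tiny matrices only)."""
    if not matrix:
        return True
    n = len(matrix[0])
    for order in permutations(range(n)):
        if all(is_circular_run([row[c] for c in order]) for row in matrix):
            return True
    return False
```

For instance, the rows {0,1}, {0,2}, {0,3} on four columns fail the property. Column 0 would need three cyclic neighbours, but a cyclic order gives each column only two.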

An exact algorithm for the generalized list T-coloring problem

Discrete Mathematics & Theoretical Computer Science

10.46298/dmtcs.2095

2014

Vol. 16 no. 3 (Discrete Algorithms)

Author(s):

Konstanty Junosza-Szaniawski

Pawel Rzazewski

Keyword(s):

Graph Coloring

Channel Assignment

Maximum Degree

Perfect Matching

Exact Algorithm

Special Structure

Input Graph

Discrete Algorithms

International Audience

The Difference

The generalized list T-coloring is a common generalization of many graph coloring models, including classical coloring, L(p,q)-labeling, channel assignment and T-coloring. Every vertex of the input graph has a list of permitted labels, and every edge has a set of forbidden differences. We ask for a labeling of the vertices of the input graph with natural numbers in which every vertex gets a label from its list of permitted labels, and the difference between the labels of the endpoints of each edge does not belong to that edge's set of forbidden differences. In this paper we present an exact algorithm that solves this problem in time O*((τ+2)^n), where τ is the maximum forbidden difference over all edges of the input graph and n is the number of its vertices. Moreover, we show how to improve this bound if the input graph has some special structure, e.g. a bounded maximum degree, no large induced stars, or a perfect matching.
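
The definition translates directly into a validity check, and trying every combination of permitted labels gives a naive exact solver. This is a minimal sketch, not the paper's O*((τ+2)^n) algorithm. It assumes the difference of labels means the absolute difference, as in standard T-coloring. Classical proper coloring is the special case in which every edge forbids only the difference 0. The function names are hypothetical.

```python
from itertools import product


def is_valid_list_t_coloring(lists, forbidden, labeling):
    """Check a labeling against the generalized list T-coloring constraints.

    lists:     vertex -> set of permitted labels
    forbidden: (u, v) edge -> set of forbidden label differences
    labeling:  vertex -> natural-number label
    """
    if any(labeling[v] not in allowed for v, allowed in lists.items()):
        return False
    return all(abs(labeling[u] - labeling[v]) not in diffs for (u, v), diffs in forbidden.items())


def find_list_t_coloring(lists, forbidden):
    """Naive exact search over all combinations of permitted labels (exponential)."""
    vertices = list(lists)
    for labels in product(*(sorted(lists[v]) for v in vertices)):
        labeling = dict(zip(vertices, labels))
        if is_valid_list_t_coloring(lists, forbidden, labeling):
            return labeling
    return None  # no valid labeling exists
```

With T = {0} on each edge of a triangle, two labels per vertex admit no solution, while three labels do. This is the familiar fact that a triangle needs three colors.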

Blocks in Constrained Random Graphs with Fixed Average Degree

Discrete Mathematics & Theoretical Computer Science

10.46298/dmtcs.2710

2009

DMTCS Proceedings vol. AK,... (Proceedings)

Author(s):

Konstantinos Panagiotou

Keyword(s):

Random Graph

Random Graphs

High Probability

Average Degree

Additional Restriction

Structural Constraints

Analytic Framework

Discrete Algorithms

International Audience

Sharp Concentration

This work is devoted to the study of typical properties of random graphs from classes with structural constraints, such as planar graphs, with the additional restriction that the average degree is fixed. More precisely, within a general analytic framework, we provide sharp concentration results for the number of blocks (maximal biconnected subgraphs) in a random graph from the class in question. Among other results, we find that such a random graph essentially belongs, with high probability, to one of only two possible types: either it has blocks of at most logarithmic size, or there is a giant block that contains linearly many vertices and all other blocks are significantly smaller. This extends and generalizes the results in the previous work [K. Panagiotou and A. Steger. Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '09), pp. 432-440, 2009], where similar statements were shown without the restriction on the average degree.
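
The quantity being studied, the blocks of a graph, can be computed with the classic depth-first-search method of Hopcroft and Tarjan. This is a minimal sketch for a single given graph and does not model the random graph classes or the analytic framework. It assumes a simple undirected graph given as an adjacency dict. Bridges count as two-vertex blocks, and isolated vertices are not reported. The function name is hypothetical, and the recursion limits it to modest graph sizes.

```python
def blocks(adj):
    """Blocks (maximal biconnected subgraphs) of a simple undirected graph.

    adj: dict mapping each vertex to an iterable of its neighbours.
    Returns a list of vertex sets, one per block that has at least one edge.
    """
    index, low = {}, {}
    edge_stack, result = [], []
    counter = [0]

    def dfs(u, parent):
        index[u] = low[u] = counter[0]
        counter[0] += 1
        for w in adj[u]:
            if w == parent:
                continue
            if w not in index:
                edge_stack.append((u, w))  # tree edge
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= index[u]:
                    # u separates w's subtree: the edges pushed since (u, w) form one block.
                    block = set()
                    while True:
                        e = edge_stack.pop()
                        block.update(e)
                        if e == (u, w):
                            break
                    result.append(block)
            elif index[w] < index[u]:
                edge_stack.append((u, w))  # back edge to an ancestor
                low[u] = min(low[u], index[w])

    for v in adj:
        if v not in index:
            dfs(v, None)
    return result
```

For example, two triangles sharing one vertex, plus a pendant edge, give three blocks.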

A linear time algorithm for finding an Euler walk in a strongly connected 3-uniform hypergraph

Discrete Mathematics & Theoretical Computer Science

10.46298/dmtcs.568

2012

Vol. 14 no. 1 (Discrete Algorithms)

Author(s):

Zbigniew Lonc

Pawel Naroski

Keyword(s):

Linear Time

Natural Extension

Time Algorithm

Linear Time Algorithm

Uniform Hypergraph

Graph Theoretic

Uniform Hypergraphs

Discrete Algorithms

Strongly Connected

International Audience

By an Euler walk in a 3-uniform hypergraph H we mean an alternating sequence v(0), ε(1), v(1), ε(2), v(2), ..., v(m-1), ε(m), v(m) of vertices and edges in H. Each edge of H must appear in this sequence exactly once, and v(i-1), v(i) ∈ ε(i) with v(i-1) ≠ v(i) for every i = 1, 2, ..., m. This concept is a natural extension of the graph-theoretic notion of an Euler walk to 3-uniform hypergraphs. We say that a 3-uniform hypergraph H is strongly connected if it has no isolated vertices and, for every two edges e and f in H, there is a sequence of edges starting with e and ending with f in which every two consecutive edges have two vertices in common. In this paper we give an algorithm that constructs an Euler walk in a strongly connected 3-uniform hypergraph (it is known that such a walk always exists in such a hypergraph). The algorithm runs in time O(m), where m is the number of edges in the input hypergraph.
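
Both definitions can be checked mechanically. This is a minimal sketch that verifies a proposed walk and tests strong connectivity; it does not construct the walk, so the paper's O(m) algorithm is not reproduced. It assumes edges are given as a list of 3-element frozensets and that a walk names each edge by its index in that list. Vertices are taken to be the union of the edges, so the "no isolated vertices" condition holds automatically. The function names are hypothetical.

```python
from collections import deque


def is_euler_walk(edges, walk):
    """Check an alternating sequence [v0, e1, v1, ..., em, vm] against the definition.

    edges: list of 3-element frozensets; each e_i in `walk` is an index into `edges`.
    """
    if len(walk) % 2 == 0:
        return False  # must start and end with a vertex
    vertices, used = walk[0::2], walk[1::2]
    if sorted(used) != list(range(len(edges))):
        return False  # every edge must appear exactly once
    for i, e in enumerate(used):
        a, b = vertices[i], vertices[i + 1]
        # Consecutive vertices must be distinct and both lie in the edge between them.
        if a == b or a not in edges[e] or b not in edges[e]:
            return False
    return True


def is_strongly_connected(edges):
    """BFS over edges, treating two edges as adjacent when they share two vertices."""
    if not edges:
        return True
    seen, queue = {0}, deque([0])
    while queue:
        i = queue.popleft()
        for j in range(len(edges)):
            if j not in seen and len(edges[i] & edges[j]) >= 2:
                seen.add(j)
                queue.append(j)
    return len(seen) == len(edges)
```

For the two edges {1,2,3} and {2,3,4}, the sequence 1, {1,2,3}, 2, {2,3,4}, 3 is an Euler walk.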
