Text Indexing for Regular Expression Matching

Algorithms ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 133
Author(s):  
Daniel Gibney ◽  
Sharma V. Thankachan

Finding substrings of a text T that match a regular expression p is a fundamental problem. Despite being the subject of extensive research, no solution with a time complexity significantly better than O(|T||p|) has been found. Backurs and Indyk (FOCS 2016) established conditional lower bounds for the algorithmic problem, based on the Strong Exponential Time Hypothesis, that help explain this difficulty. A natural question is whether we can improve the time complexity for matching the regular expression by preprocessing the text T. We show that, conditioned on the Online Matrix–Vector Multiplication (OMv) conjecture, even with arbitrary polynomial preprocessing time, a regular expression query on a text cannot be answered in strongly sublinear time, i.e., O(|T|^(1−ε)) for any ε > 0. Furthermore, if we extend the OMv conjecture to a plausible conjecture regarding Boolean matrix multiplication with polynomial preprocessing time, which we call Online Matrix–Matrix Multiplication (OMM), we can strengthen this hardness result to rule out any solution with query time O(|T|^(3/2−ε)). These results hold for alphabet sizes three or greater. We then provide data structures that answer queries in O(|T||p|/τ) time, where τ ∈ [1, |T|] is fixed at construction. These include a solution that works for all regular expressions with exp(τ)·|T| preprocessing time and space. For patterns containing only 'concatenation' and 'or' operators (the same type used in the hardness result), we provide (1) a deterministic solution which requires exp(τ)·|T|·log²|T| preprocessing time and space, and (2) when |p| ≤ |T|^z for z = 2^(o(√(log|T|))), a randomized solution with amortized query time which answers queries correctly with high probability, requiring exp(τ)·|T|·2^(O(√(log|T|))) preprocessing time and space.
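The O(|T||p|)-style baseline for the 'concatenation'/'or' pattern class discussed above can be sketched as a set-of-positions simulation. The tuple-based AST encoding ('lit'/'cat'/'or') is an illustrative assumption, not the paper's data structure:

```python
def match_ends(node, text, starts):
    """Given a set of start positions, return the positions where `node` can end."""
    kind = node[0]
    if kind == 'lit':                       # single literal character
        ch = node[1]
        return {i + 1 for i in starts if i < len(text) and text[i] == ch}
    if kind == 'cat':                       # thread the position set through each factor
        for child in node[1]:
            starts = match_ends(child, text, starts)
        return starts
    if kind == 'or':                        # union of the alternatives' end sets
        out = set()
        for child in node[1]:
            out |= match_ends(child, text, starts)
        return out

def find_matches(pattern, text):
    """All (i, j) such that text[i:j] matches the pattern (naive baseline)."""
    hits = []
    for i in range(len(text) + 1):
        for j in match_ends(pattern, text, {i}):
            hits.append((i, j))
    return sorted(hits)
```

For example, the pattern (a|b)c encoded as `('cat', [('or', [('lit','a'), ('lit','b')]), ('lit','c')])` matches "acbc" at offsets (0, 2) and (2, 4).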

Algorithms ◽  
2021 ◽  
Vol 14 (12) ◽  
pp. 347
Author(s):  
Anne Berry ◽  
Geneviève Simonet

The atom graph of a graph is a graph whose vertices are the atoms obtained by clique minimal separator decomposition of this graph, and whose edges are the edges of all possible atom trees of this graph. We provide two efficient algorithms for computing this atom graph, running in O(min(n^ω·log n, nm, n(n+m̄))) time, where n is the number of vertices of G, m is the number of its edges, m̄ is the number of edges of the complement of G, and ω, also denoted by α in the literature, is a real number such that O(n^ω) is the best known time complexity for matrix multiplication, whose current value is 2.3728596. This time complexity is no more than the time complexity of computing the atoms in the general case. We extend our results to α-acyclic hypergraphs, which are hypergraphs having at least one join tree, a join tree of a hypergraph being defined by its hyperedges in the same way as an atom tree of a graph is defined by its atoms. We introduce the notion of union join graph, which is the union of all possible join trees; we apply our algorithms for atom graphs to efficiently compute union join graphs.


Mathematics ◽  
2019 ◽  
Vol 7 (9) ◽  
pp. 805 ◽  
Author(s):  
Monther Rashed Alfuraidan ◽  
Ibrahim Nabeel Joudah

In this work, we obtain a new formula for Fibonacci's family of m-step sequences. We use our formula to find the nth term with lower time complexity than the matrix multiplication method. We then extend our results to all linear homogeneous m-step recurrence relations with constant coefficients by using the last few terms of the corresponding Fibonacci family m-step sequence. As a computational number theory application, we develop a method to estimate square roots.
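The matrix-multiplication baseline that the new formula is compared against computes the nth m-step term by exponentiating the m×m companion matrix, using O(m³ log n) arithmetic operations; a minimal sketch (the initial conditions F_0 = … = F_{m−2} = 0, F_{m−1} = 1 are one common convention, assumed here):

```python
def mat_mult(A, B):
    """Plain cubic-time matrix product over Python integers."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def mat_pow(A, e):
    """Binary exponentiation: O(log e) matrix products."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while e:
        if e & 1:
            R = mat_mult(R, A)
        A = mat_mult(A, A)
        e >>= 1
    return R

def mstep_fib(n, m):
    """nth term of the m-step Fibonacci sequence (F_0..F_{m-2} = 0, F_{m-1} = 1)."""
    if n < m - 1:
        return 0
    if n == m - 1:
        return 1
    # Companion matrix: a top row of ones, then a shifted identity below.
    C = [[1] * m] + [[int(j == i) for j in range(m)] for i in range(m - 1)]
    P = mat_pow(C, n - (m - 1))
    # The state vector is (F_{m-1}, ..., F_0) = (1, 0, ..., 0), so the
    # answer is the top-left entry of C^(n-m+1).
    return P[0][0]
```

With m = 2 this reproduces the ordinary Fibonacci numbers (mstep_fib(10, 2) = 55); with m = 3 it gives the tribonacci sequence.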


PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0251297
Author(s):  
Pinaki Bhattacharya ◽  
Qiao Li ◽  
Damien Lacroix ◽  
Visakan Kadirkamanathan ◽  
Marco Viceconti

Throughout engineering there are problems where it is required to predict a quantity based on the measurement of another, but where the two quantities possess characteristic variations over vastly different ranges of time and space. Among the many challenges posed by such ‘multiscale’ problems, that of defining a ‘scale’ remains poorly addressed. This fundamental problem has led to much confusion in the field of biomedical engineering in particular. The present study proposes a definition of scale based on measurement limitations of existing instruments, available computational power, and on the ranges of time and space over which quantities of interest vary characteristically. The definition is used to construct a multiscale modelling methodology from start to finish, beginning with a description of the system (portion of reality of interest) and ending with an algorithmic orchestration of mathematical models at different scales within the system. The methodology is illustrated for a specific but well-researched problem. The concept of scale and the multiscale modelling approach introduced are shown to be easily adaptable to other closely related problems. Although out of the scope of this paper, we believe that the proposed methodology can be applied widely throughout engineering.


T-Comm ◽  
2021 ◽  
Vol 15 (1) ◽  
pp. 4-10
Author(s):  
Vitaly B. Kreyndelin ◽  
Elena D. Grigorieva

Algorithms for implementing vector–matrix multiplication in banks (sets) of digital filters are presented. These algorithms provide significant savings in computational cost over traditional algorithms, and the reduction in computational complexity is achieved without any loss in filter-bank performance. The proposed algorithms are built on the previously known Winograd method for multiplying real matrices and vectors and on two versions of the 3M method for multiplying complex matrices and vectors; ways of combining these known methods to build digital filter banks are considered. An analysis of their computational complexity shows that, compared with a traditional filter-bank algorithm, complexity can be reduced by a factor of about 2.66 on a processor without a hardware multiplier, and by a factor of 1.33 on a processor with a hardware multiplier; these figures are markedly better than those of known algorithms. The sensitivity of the proposed algorithms to the rounding errors arising in digital signal processing was also analysed. Based on this analysis, an algorithm is selected whose computational complexity is lower than that of the traditional algorithm while its sensitivity to rounding errors remains the same. Recommendations are given for its practical application in the development of a bank (set) of digital filters.
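The 3M trick mentioned above trades one real multiplication for three extra additions when multiplying complex numbers, which pays off on processors where multiplication is expensive; a minimal sketch of the scalar case (the filter-bank formulation in the article applies the same identity to matrices and vectors):

```python
def complex_mult_3m(a, b, c, d):
    """(a + bi)(c + di) with 3 real multiplications instead of 4.

    Schoolbook: real = ac - bd, imag = ad + bc  (4 multiplications).
    3M:         real = t1 - t2, imag = t3 - t1 - t2, where
                t1 = ac, t2 = bd, t3 = (a + b)(c + d).
    """
    t1 = a * c
    t2 = b * d
    t3 = (a + b) * (c + d)
    return t1 - t2, t3 - t1 - t2
```

For example, (1 + 2i)(3 + 4i) = −5 + 10i. Note that the extra additions slightly change the rounding-error profile in floating point, which is why the article analyses sensitivity to rounding separately.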


2016 ◽  
Author(s):  
Dogan Corus ◽  
Duc-Cuong Dang ◽  
Anton V. Eremeev ◽  
Per Kristian Lehre

Understanding how the time complexity of evolutionary algorithms (EAs) depends on their parameter settings and on characteristics of fitness landscapes is a fundamental problem in evolutionary computation. Most rigorous results were derived using a handful of key analytic techniques, including drift analysis. However, since few of these techniques apply effortlessly to population-based EAs, most time-complexity results concern simplified EAs, such as the (1+1) EA. This paper describes the level-based theorem, a new technique tailored to population-based processes. It applies to any non-elitist process where offspring are sampled independently from a distribution depending only on the current population. Given conditions on this distribution, our technique provides upper bounds on the expected time until the process reaches a target state. We demonstrate the technique on several pseudo-Boolean functions, the sorting problem, and approximation of optimal solutions in combinatorial optimisation. The conditions of the theorem are often straightforward to verify, even for Genetic Algorithms and Estimation of Distribution Algorithms, which were considered highly non-trivial to analyse. Finally, we prove that the theorem is nearly optimal for the processes considered: given the information the theorem requires about the process, a much tighter bound cannot be proved.
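As a point of reference for the simplified algorithms mentioned above, the (1+1) EA on the OneMax benchmark can be sketched in a few lines (the iteration cap and seed are illustrative assumptions; the expected optimisation time on OneMax is O(n log n)):

```python
import random

def one_plus_one_ea(n, fitness, max_iters=100_000, seed=0):
    """(1+1) EA: flip each bit independently with probability 1/n and
    keep the offspring if it is not worse. Assumes the optimum has
    fitness n (true for OneMax); stops there or after max_iters."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = fitness(x)
    for t in range(max_iters):
        if fx == n:                      # optimum reached
            return x, t
        y = [b ^ (rng.random() < 1 / n) for b in x]   # standard bit mutation
        fy = fitness(y)
        if fy >= fx:                     # elitist acceptance
            x, fx = y, fy
    return x, max_iters

# OneMax simply counts the one-bits.
best, iters = one_plus_one_ea(20, sum)
```

In contrast, the level-based theorem of the paper targets non-elitist, population-based processes, where this kind of direct drift argument is harder to apply.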


1997 ◽  
Vol 08 (04) ◽  
pp. 443-467 ◽  
Author(s):  
Glenn K. Manacher ◽  
Terrance A. Mankus

A maximum clique is sought in a set of n proper circular arcs (PCAS). By means of several passes, each O(n) in time and space, a PCAS is transformed initially into a set of circle chords and finally into a set of intervals. This interval model inherits a special property from the PCAS which ensures the discovery of a maximum overlap clique in time O(n). The one-to-one arc/interval correspondence guarantees the identification of the maximum clique in the PCAS in O(n) time and space. The present paper gives new, simpler proofs for the lemmas first outlined by us in Ref. [9], extending the methods outlined in that paper so that the time bound is improved from O(n log n) to O(n). The method depends only on certain interconnections between constructions related to the computation of longest increasing subsequences. Independently, Hell, Huang and Bhattacharya [5] recently discovered a completely different approach that achieves the same complexity and can moreover be applied to the weighted case and to the coloring problem on proper circular arcs. The previous best result, due to Apostolico and Hambrusch [2], applies to general circular arc models and has time complexity O(n² log log n) and space complexity O(n). As applications of the method, we show that a maximum weight clique of a set of weighted proper circular arcs can be found in time O(n²) and space O(n); the previous best result was O(n² log log n) for dense general circular arc graphs [13]. We also show that, for n chords with randomly placed endpoints, (1) the average cardinality of a maximum clique is cn^(1/2) ± o(n^(1/2)), where 2^(1/2) < c < e·2^(1/2), and (2) a maximum clique may be found in average time O(n^(3/2)) and space Θ(n). The previous best average time complexity, derived from Ref. [1], was O(n^(3/2) log n).
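The longest-increasing-subsequence computation that the method builds on can be done generically in O(n log n) with the standard patience-sorting technique (this is a generic sketch, not the paper's specialized O(n) passes over the interval model):

```python
from bisect import bisect_left

def lis_length(seq):
    """Length of a longest strictly increasing subsequence in O(n log n).

    tails[k] holds the smallest possible tail value of an increasing
    subsequence of length k + 1; each element either extends the longest
    subsequence found so far or improves an existing tail.
    """
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)
```

For example, [3, 1, 4, 1, 5, 9, 2, 6] has LIS length 4 (e.g. 1, 4, 5, 6).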


2001 ◽  
Vol 11 (06) ◽  
pp. 707-735 ◽  
Author(s):  
J.-M. CHAMPARNAUD ◽  
D. ZIADI

Two classical non-deterministic automata recognize the language denoted by a regular expression: the position automaton, which is deduced from the position sets defined by Glushkov and McNaughton–Yamada, and the equation automaton, which can be computed via Mirkin's prebases or Antimirov's partial derivatives. Let |E| be the size of the expression and ‖E‖ be its alphabetic width, i.e. the number of symbol occurrences. The number of states in the equation automaton is less than or equal to the number of states in the position automaton, which is ‖E‖+1. On the other hand, the worst-case time complexity of Antimirov's algorithm is O(‖E‖³·|E|²), while it is only O(‖E‖·|E|) for the most efficient implementations yielding the position automaton (Brüggemann-Klein; Chang and Paige; Champarnaud et al.). We present an O(|E|²) space and time algorithm to compute the equation automaton. It is based on the notion of canonical derivative, which makes it possible to handle sets of word derivatives efficiently. Along the way, canonical derivatives also lead to a new O(|E|²) space and time algorithm to construct the position automaton.
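Antimirov's partial derivatives are the non-deterministic refinement of Brzozowski's word derivatives; the deterministic variant is compact enough to sketch here (the tuple-based AST is an illustrative encoding, and no simplification of derivatives is performed, so terms grow on long inputs):

```python
# Regex AST: ('empty',) denotes the empty language, ('eps',) the empty word,
# ('sym', c) a symbol, plus ('cat', r, s), ('or', r, s), ('star', r).

def nullable(r):
    """Does r accept the empty word?"""
    k = r[0]
    if k in ('eps', 'star'):
        return True
    if k in ('empty', 'sym'):
        return False
    if k == 'cat':
        return nullable(r[1]) and nullable(r[2])
    if k == 'or':
        return nullable(r[1]) or nullable(r[2])

def deriv(r, c):
    """Brzozowski derivative of r with respect to symbol c."""
    k = r[0]
    if k in ('eps', 'empty'):
        return ('empty',)
    if k == 'sym':
        return ('eps',) if r[1] == c else ('empty',)
    if k == 'or':
        return ('or', deriv(r[1], c), deriv(r[2], c))
    if k == 'star':
        return ('cat', deriv(r[1], c), r)
    if k == 'cat':
        d = ('cat', deriv(r[1], c), r[2])
        return ('or', d, deriv(r[2], c)) if nullable(r[1]) else d

def matches(r, word):
    """Accept iff the word is in the language of r."""
    for c in word:
        r = deriv(r, c)
    return nullable(r)
```

For example, (a|b)c accepts "ac" and "bc" but not "cc". Antimirov's construction keeps a *set* of such derivative terms instead of a single expression, which is what bounds the equation automaton's state count by ‖E‖+1.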


1978 ◽  
Vol 11 (1) ◽  
Author(s):  
Leonard Adleman ◽  
Kellogg S. Booth ◽  
Franco P. Preparata ◽  
Walter L. Ruzzo

2007 ◽  
Vol DMTCS Proceedings vol. AH,... (Proceedings) ◽  
Author(s):  
Maxime Crochemore ◽  
Costas S. Iliopoulos ◽  
M. Sohel Rahman

In this paper, we study a restricted version of the position-restricted pattern matching problem introduced and studied by Mäkinen and Navarro [Position-Restricted Substring Searching, LATIN 2006]. In the problem handled in this paper, we are interested in those occurrences of the pattern that lie in a suffix or in a prefix of the given text. We achieve optimal query time for our problem with a data structure that extends the classic suffix tree; its time and space complexity is dominated by that of the suffix tree. Notably, the (best) algorithm of Mäkinen and Navarro, if applied to our problem, gives sub-optimal query time, and the corresponding data structure also requires more time and space.
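The problem statement can be illustrated with a naive check (the paper's suffix-tree extension achieves optimal query time instead; the parameter k, the length of the prefix/suffix region, is an assumption of this sketch):

```python
def prefix_suffix_occurrences(text, pattern, k):
    """Occurrences of `pattern` lying entirely within the length-k prefix,
    and within the length-k suffix, of `text`. Naive O(k * |pattern|) scan,
    shown only to pin down the restricted problem being solved."""
    n, m = len(text), len(pattern)
    # Occurrences text[i:i+m] fully inside text[:k].
    in_prefix = [i for i in range(max(0, min(k, n) - m + 1))
                 if text[i:i + m] == pattern]
    # Occurrences fully inside text[n-k:].
    start = max(0, n - k)
    in_suffix = [i for i in range(start, n - m + 1)
                 if text[i:i + m] == pattern]
    return in_prefix, in_suffix
```

For example, in text "abcabcab" with pattern "ab" and k = 4, only the occurrence at offset 0 lies in the prefix and only the one at offset 6 lies in the suffix; the occurrence at offset 3 straddles neither region and is correctly excluded.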


2013 ◽  
Vol 347-350 ◽  
pp. 3094-3098 ◽  
Author(s):  
Jian Li

This paper puts forward an improved dynamic programming algorithm for the bitonic TSP and proves it correct. The whole loop is divided into left and right parts by analyzing the key point that connects directly to the last one; a new optimal substructure and recursion are then constructed. The time complexity of the new algorithm is O(n²) and its space complexity is O(n), while both the time and space complexities of the classical algorithm are O(n²). Experimental results show that the new algorithm not only greatly reduces the space requirement but also increases the computing speed 2–3 times compared with the classical algorithm.
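The classical O(n²)-time, O(n²)-space DP that the abstract compares against can be sketched as follows (CLRS-style formulation over points assumed sorted by x-coordinate; the paper's space reduction to O(n) is not reproduced here):

```python
from math import hypot

def bitonic_tour_length(points):
    """Length of the shortest bitonic tour: classical O(n^2) time and space DP.

    B[i][j] (i < j) is the minimum total length of two disjoint x-monotone
    paths that together cover points 0..j, one ending at i, the other at j.
    """
    pts = sorted(points)                  # left-to-right order
    n = len(pts)
    if n < 2:
        return 0.0
    d = lambda i, j: hypot(pts[i][0] - pts[j][0], pts[i][1] - pts[j][1])
    B = [[0.0] * n for _ in range(n)]
    B[0][1] = d(0, 1)
    for j in range(2, n):
        # Point j extends the path that ends at j-1 ...
        for i in range(j - 1):
            B[i][j] = B[i][j - 1] + d(j - 1, j)
        # ... or starts a new leg from some earlier endpoint i.
        B[j - 1][j] = min(B[i][j - 1] + d(i, j) for i in range(j - 1))
    # Close the tour by joining the two path ends n-2 and n-1.
    return B[n - 2][n - 1] + d(n - 2, n - 1)
```

On the four corners of the unit square this returns the perimeter, 4.0. The improvement described in the paper keeps only the column of B needed for the next j, which is what brings the space down to O(n).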

