regular expression matching
Recently Published Documents


TOTAL DOCUMENTS

236
(FIVE YEARS 17)

H-INDEX

20
(FIVE YEARS 1)

Algorithmica ◽  
2022 ◽  
Author(s):  
José Arturo Gil ◽  
Simone Santini

AbstractIn this paper we study regular expression matching in cases in which the identity of the symbols received is subject to uncertainty. We develop a model of symbol emission and uses a modification of the shortest path algorithm to find optimal matches on the Cartesian Graph of an expression provided that the input is a finite list. In the case of infinite streams, we show that the problem is in general undecidable but, if each symbols is received with probability 0 infinitely often, then with probability 1 the problem is decidable.


2021 ◽  
Vol 20 (5s) ◽  
pp. 1-24
Author(s):  
Daniele Parravicini ◽  
Davide Conficconi ◽  
Emanuele Del Sozzo ◽  
Christian Pilato ◽  
Marco D. Santambrogio

Regular Expression (RE) matching is a computational kernel used in several applications. Since RE complexity and data volumes are steadily increasing, hardware acceleration is gaining attention also for this problem. Existing approaches have limited flexibility as they require a different implementation for each RE. On the other hand, it is complex to map efficient RE representations like non-deterministic finite-state automata onto software-programmable engines or parallel architectures. In this work, we present CICERO , an end-to-end framework composed of a domain-specific architecture and a companion compilation framework for RE matching. Our solution is suitable for many applications, such as genomics/proteomics and natural language processing. CICERO aims at exploiting the intrinsic parallelism of non-deterministic representations of the REs. CICERO can trade-off accelerators’ efficiency and processors’ flexibility thanks to its programmable architecture and the compilation framework. We implemented CICERO prototypes on embedded FPGA achieving up to 28.6× and 20.8× more energy efficiency than embedded and mainstream processors, respectively. Since it is a programmable architecture, it can be implemented as a custom ASIC that is orders of magnitude more energy-efficient than mainstream processors.


2021 ◽  
Author(s):  
Nan Jiang ◽  
Ping Lin ◽  
Yulong He ◽  
Zhuozhi Tan ◽  
Jin Hu

2021 ◽  
Vol 11 (2) ◽  
pp. 283-302
Author(s):  
Paul Meurer

I describe several new efficient algorithms for querying large annotated corpora. The search algorithms as they are implemented in several popular corpus search engines are less than optimal in two respects: regular expression string matching in the lexicon is done in linear time, and regular expressions over corpus positions are evaluated starting in those corpus positions that match the constraints of the initial edges of the corresponding network. To address these shortcomings, I have developed an algorithm for regular expression matching on suffix arrays that allows fast lexicon lookup, and a technique for running finite state automata from edges with lowest corpus counts. The implementation of the lexicon as suffix array also lends itself to an elegant and efficient treatment of multi-valued and set-valued attributes. The described techniques have been implemented in a fully functional corpus management system and are also used in a treebank query system.


2021 ◽  
Vol 31 ◽  
Author(s):  
ANDRZEJ FILINSKI

Abstract We show how to systematically derive an efficient regular expression (regex) matcher using a variety of program transformation techniques, but very little specialized formal language and automata theory. Starting from the standard specification of the set-theoretic semantics of regular expressions, we proceed via a continuation-based backtracking matcher, to a classical, table-driven state machine. All steps of the development are supported by self-contained (and machine-verified) equational correctness proofs.


2020 ◽  
Vol 168 ◽  
pp. 106996
Author(s):  
Xiuwen Sun ◽  
Hao Li ◽  
Dan Zhao ◽  
Xingxing Lu ◽  
Zheng Peng ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document