regular expression
Recently Published Documents


Total documents: 668 (five years: 101)

H-index: 31 (five years: 3)

Author(s):  
Rifiana Arief ◽  
Achmad Benny Mutiara ◽  
Tubagus Maulana Kusuma ◽  
Hustinawaty Hustinawaty

This research proposes automated hierarchical classification of scanned documents whose content contains unstructured text and special patterns (specific, short strings), using a convolutional neural network (CNN) and a regular expression method (REM). The research data are digital correspondence documents in PDF image format from the Pusat Data Teknologi dan Informasi (Technology and Information Data Center). The document hierarchy covers the type of letter, type of manuscript letter, origin of letter, and subject of letter. The method consists of preprocessing, classification, and storage in a database. Preprocessing covers text extraction using Tesseract optical character recognition (OCR) and the formation of word document vectors with Word2Vec. Hierarchical classification uses the CNN to classify 5 types of letters and regular expressions to classify 4 types of manuscript letter, 15 origins of letter, and 25 subjects of letter. The classified documents are stored in a Hive database in a Hadoop big data architecture. The data set comprises 5200 documents: 4000 for training, 1000 for testing, and 200 for classification prediction. In the trial on the 200 new documents, 188 were classified correctly and 12 incorrectly, so the accuracy of the automated hierarchical classification is 94%. As future work, content-based search of the classified scanned documents can be developed.
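The regular-expression stage described in this abstract can be pictured as a table of patterns tried in order against the OCR text. The sketch below is only an illustration: the abstract does not list the actual patterns or class labels, so the rule table `MANUSCRIPT_RULES` and the helper `classify_by_regex` are hypothetical.

```python
import re

# Hypothetical rules: the abstract does not give the real patterns or labels.
MANUSCRIPT_RULES = [
    ("official memo", re.compile(r"\bNOTA\s+DINAS\b", re.IGNORECASE)),
    ("invitation", re.compile(r"\bUNDANGAN\b", re.IGNORECASE)),
    ("assignment letter", re.compile(r"\bSURAT\s+TUGAS\b", re.IGNORECASE)),
]


def classify_by_regex(text, rules, default="unknown"):
    """Return the label of the first rule whose pattern occurs in the text."""
    for label, pattern in rules:
        if pattern.search(text):
            return label
    return default


if __name__ == "__main__":
    ocr_text = "NOTA  DINAS\nNomor: 12/2020"
    print(classify_by_regex(ocr_text, MANUSCRIPT_RULES))
```

Ordering the rules from most to least specific keeps a short, distinctive string from being shadowed by a more general pattern.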


Algorithmica ◽  
2022 ◽  
Author(s):  
José Arturo Gil ◽  
Simone Santini

In this paper we study regular expression matching in cases in which the identity of the symbols received is subject to uncertainty. We develop a model of symbol emission and use a modification of the shortest path algorithm to find optimal matches on the Cartesian Graph of an expression, provided that the input is a finite list. In the case of infinite streams, we show that the problem is in general undecidable but, if each symbol is received with probability 0 infinitely often, then with probability 1 the problem is decidable.
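For the finite-input case, the idea of matching under uncertainty as a shortest-path problem can be sketched as follows. This is a simplified illustration and not the Cartesian Graph construction of the paper: it assumes each input position comes with an independent probability distribution over symbols, uses negative log probabilities as edge costs, and searches the layered graph of (position, state) pairs. The function name `best_match` and the NFA encoding are assumptions made for the example.

```python
import math


def best_match(nfa, start, accepting, observations):
    """Most probable accepted reading of an uncertain finite input.

    nfa: dict mapping (state, symbol) -> set of next states (assumed encoding)
    observations: one dict per position, mapping symbol -> probability
    Returns (probability, reading), or None if no reading is accepted.
    """
    # cost[state] = (negative log probability, symbols chosen so far)
    cost = {start: (0.0, [])}
    for dist in observations:
        nxt = {}
        for state, (c, path) in cost.items():
            for sym, p in dist.items():
                if p <= 0:
                    continue  # impossible symbol at this position
                for target in nfa.get((state, sym), ()):
                    cand = c - math.log(p)
                    if target not in nxt or cand < nxt[target][0]:
                        nxt[target] = (cand, path + [sym])
        cost = nxt
    finals = [v for s, v in cost.items() if s in accepting]
    if not finals:
        return None
    c, path = min(finals, key=lambda v: v[0])
    return math.exp(-c), "".join(path)


if __name__ == "__main__":
    # NFA for the expression ab*
    nfa = {(0, "a"): {1}, (1, "b"): {1}}
    obs = [{"a": 0.9, "b": 0.1}, {"a": 0.4, "b": 0.6}]
    print(best_match(nfa, 0, {1}, obs))
```

Because the graph is layered by input position, the shortest path reduces to a single forward pass that keeps the cheapest way of reaching each state.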


2022 ◽  
pp. 670-694
Author(s):  
Bartłomiej Dudek ◽  
Paweł Gawrychowski ◽  
Garance Gourdel ◽  
Tatiana Starikovskaya

2021 ◽  
Vol 24 (3) ◽  
Author(s):  
Elton Cardoso ◽  
Maycon Amaro ◽  
Samuel Feitosa ◽  
Leonardo Reis ◽  
André Du Bois ◽  
...  

We describe the formalization of Brzozowski and Antimirov derivative-based algorithms for regular expression parsing in the dependently typed language Agda. The formalization produces a proof that either an input string matches a given regular expression or that no match exists. A tool for regular-expression-based search, in the style of the well-known GNU grep, has been developed with the certified algorithms. Practical experiments conducted with this tool are reported.
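The Brzozowski derivative mentioned in this abstract can be illustrated with a small, unverified matcher. The sketch is in Python rather than Agda, returns only a Boolean instead of a proof or parse tree, and applies no simplification to the derived expressions; the class and function names are chosen for the example and are not taken from the paper.

```python
from dataclasses import dataclass


class Regex:
    """Base class for regular expression syntax trees."""


@dataclass(frozen=True)
class Empty(Regex):
    """Matches no string."""


@dataclass(frozen=True)
class Eps(Regex):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Chr(Regex):
    char: str


@dataclass(frozen=True)
class Alt(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Cat(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Star(Regex):
    inner: Regex


def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    return False  # Empty and Chr


def deriv(r, c):
    """Brzozowski derivative of r with respect to the character c."""
    if isinstance(r, Chr):
        return Eps() if r.char == c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, c), deriv(r.right, c))
    if isinstance(r, Cat):
        head = Cat(deriv(r.left, c), r.right)
        return Alt(head, deriv(r.right, c)) if nullable(r.left) else head
    if isinstance(r, Star):
        return Cat(deriv(r.inner, c), r)
    return Empty()  # Empty and Eps


def matches(r, s):
    """Derive r by each character of s, then test for the empty string."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)


if __name__ == "__main__":
    ab_star = Cat(Chr("a"), Star(Chr("b")))  # the expression ab*
    print(matches(ab_star, "abbb"), matches(ab_star, "ba"))
```

A string matches exactly when the expression left after deriving by every character is nullable, which is the property a certified version would prove rather than test.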


2021 ◽  
Vol 68 (5) ◽  
pp. 1-43
Author(s):  
Michael Blondin ◽  
Matthias Englert ◽  
Alain Finkel ◽  
Stefan Göller ◽  
Christoph Haase ◽  
...  

We prove that the reachability problem for two-dimensional vector addition systems with states is NL-complete or PSPACE-complete, depending on whether the numbers in the input are encoded in unary or binary. As a key underlying technical result, we show that, if a configuration is reachable, then there exists a witnessing path whose sequence of transitions is contained in a bounded language defined by a regular expression of pseudo-polynomially bounded length. This, in turn, enables us to prove that the lengths of minimal reachability witnesses are pseudo-polynomially bounded.


2021 ◽  
Vol 20 (5s) ◽  
pp. 1-24
Author(s):  
Daniele Parravicini ◽  
Davide Conficconi ◽  
Emanuele Del Sozzo ◽  
Christian Pilato ◽  
Marco D. Santambrogio

Regular Expression (RE) matching is a computational kernel used in several applications. Since RE complexity and data volumes are steadily increasing, hardware acceleration is gaining attention for this problem as well. Existing approaches have limited flexibility, as they require a different implementation for each RE. On the other hand, it is complex to map efficient RE representations, such as non-deterministic finite-state automata, onto software-programmable engines or parallel architectures. In this work, we present CICERO, an end-to-end framework composed of a domain-specific architecture and a companion compilation framework for RE matching. Our solution is suitable for many applications, such as genomics/proteomics and natural language processing. CICERO aims at exploiting the intrinsic parallelism of non-deterministic representations of the REs. CICERO can trade off accelerators’ efficiency and processors’ flexibility thanks to its programmable architecture and the compilation framework. We implemented CICERO prototypes on an embedded FPGA, achieving up to 28.6× and 20.8× higher energy efficiency than embedded and mainstream processors, respectively. Since it is a programmable architecture, it can be implemented as a custom ASIC that is orders of magnitude more energy-efficient than mainstream processors.
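The "intrinsic parallelism" of a non-deterministic automaton can be seen in a plain software simulation that advances the whole set of active states on each input symbol. The sketch below is a generic textbook simulation, not CICERO's architecture or instruction set; the function `nfa_accepts` and the transition encoding are assumptions made for the example.

```python
def nfa_accepts(transitions, start, accepting, text):
    """Simulate an NFA by advancing the set of active states per symbol.

    transitions: dict mapping (state, symbol) -> set of next states
    """
    active = {start}
    for symbol in text:
        # Each active state advances independently of the others; this is
        # the step a parallel engine could evaluate concurrently.
        active = {
            target
            for state in active
            for target in transitions.get((state, symbol), ())
        }
        if not active:
            return False  # no state survives, so no match is possible
    return bool(active & accepting)


if __name__ == "__main__":
    # NFA for the expression (a|b)*abb
    transitions = {
        (0, "a"): {0, 1},
        (0, "b"): {0},
        (1, "b"): {2},
        (2, "b"): {3},
    }
    print(nfa_accepts(transitions, 0, {3}, "aababb"))
    print(nfa_accepts(transitions, 0, {3}, "abab"))
```

In software the inner loop visits the active states one after another; a hardware engine can instead follow the alternatives side by side.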

