regular expression Latest Research Papers

Automated hierarchical classification of scanned documents using convolutional neural network and regular expression

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v12i1.pp1018-1029 ◽

2022 ◽

Vol 12 (1) ◽

pp. 1018-1029

Author(s):

Rifiana Arief ◽

Achmad Benny Mutiara ◽

Tubagus Maulana Kusuma ◽

Hustinawaty Hustinawaty

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Character Recognition ◽

Optical Character Recognition ◽

Regular Expression ◽

Hierarchical Classification ◽

Document Vector ◽

Classification Prediction ◽

Scanned Documents

This research proposes automated hierarchical classification of scanned documents whose content combines unstructured text with special patterns (specific, short strings), using a convolutional neural network (CNN) and the regular expression method (REM). The research data are digital correspondence documents in PDF image format from Pusat Data Teknologi dan Informasi (Technology and Information Data Center). The document hierarchy covers the type of letter, type of manuscript letter, origin of letter, and subject of letter. The method consists of preprocessing, classification, and storage in a database. Preprocessing covers text extraction using Tesseract optical character recognition (OCR) and formation of word document vectors with Word2Vec. Hierarchical classification uses the CNN to classify 5 types of letters and regular expressions to classify 4 types of manuscript letter, 15 origins of letter, and 25 subjects of letter. The classified documents are stored in a Hive database in a Hadoop big data architecture. The data set comprises 5200 documents: 4000 for training, 1000 for testing, and 200 for classification prediction. In the trial on the 200 new documents, 188 were classified correctly and 12 incorrectly, giving an accuracy of 94% for automated hierarchical classification. As future work, content-based search of the classified scanned documents can be developed.
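The regular-expression stage described in this abstract can be pictured with a minimal sketch. The patterns and labels below are hypothetical (the abstract does not list the actual expressions), and the input is assumed to be plain text already extracted by OCR. The last function only reproduces the reported accuracy arithmetic.

```python
import re

# Hypothetical patterns: reference codes that might identify a letter's origin.
# The real expressions used in the paper are not given in the abstract.
ORIGIN_PATTERNS = {
    "finance": re.compile(r"\bFIN/\d{3}/\d{4}\b"),
    "personnel": re.compile(r"\bHR/\d{3}/\d{4}\b"),
}


def classify_origin(ocr_text: str) -> str:
    """Return the first origin label whose pattern occurs in the OCR text."""
    for label, pattern in ORIGIN_PATTERNS.items():
        if pattern.search(ocr_text):
            return label
    return "unknown"


def accuracy(correct: int, total: int) -> float:
    """Fraction of correctly classified documents."""
    return correct / total


if __name__ == "__main__":
    print(classify_origin("Ref: FIN/012/2021 regarding the annual budget"))
    print(f"{accuracy(188, 200):.0%}")  # figures reported in the abstract
```

Short, rigid strings such as reference codes suit regular expressions well, which is presumably why the CNN is reserved for the coarser letter-type decision.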

Matching Regular Expressions on uncertain data

Algorithmica ◽

10.1007/s00453-021-00906-8 ◽

2022 ◽

Author(s):

José Arturo Gil ◽

Simone Santini

Keyword(s):

Shortest Path ◽

Regular Expression ◽

Uncertain Data ◽

Regular Expressions ◽

Shortest Path Algorithm ◽

Regular Expression Matching

In this paper we study regular expression matching in cases in which the identity of the symbols received is subject to uncertainty. We develop a model of symbol emission and use a modification of the shortest path algorithm to find optimal matches on the Cartesian Graph of an expression, provided that the input is a finite list. In the case of infinite streams, we show that the problem is in general undecidable but, if each symbol is received with probability 0 infinitely often, then with probability 1 the problem is decidable.
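To make the shortest-path idea concrete, here is a simplified sketch, not the paper's model. It assumes each input position comes with a probability for every candidate symbol, uses a hand-built epsilon-free NFA for the hypothetical expression a(b|c)*d, and runs Dijkstra's algorithm over nodes (position, state) with edge weight -log p, so the cheapest accepting path is the most probable accepted reading.

```python
import heapq
import math

# Hypothetical epsilon-free NFA for a(b|c)*d: state -> [(symbol, next_state)].
NFA = {
    0: [("a", 1)],
    1: [("b", 1), ("c", 1), ("d", 2)],
    2: [],
}
START, ACCEPT = 0, {2}


def best_match(observations):
    """observations[i] maps candidate symbols at position i to probabilities.

    Returns (probability, string) for the most likely reading accepted by
    the NFA, or None if no reading with nonzero probability is accepted.
    """
    n = len(observations)
    heap = [(0.0, 0, START, "")]  # (cost, position, state, text so far)
    done = set()
    while heap:
        cost, pos, state, text = heapq.heappop(heap)
        if (pos, state) in done:
            continue
        done.add((pos, state))
        if pos == n:
            if state in ACCEPT:
                return math.exp(-cost), text
            continue
        for symbol, nxt in NFA[state]:
            p = observations[pos].get(symbol, 0.0)
            if p > 0.0:
                # -log p is non-negative, so Dijkstra's algorithm applies.
                heapq.heappush(
                    heap, (cost - math.log(p), pos + 1, nxt, text + symbol)
                )
    return None


if __name__ == "__main__":
    obs = [{"a": 0.9, "x": 0.1}, {"b": 0.4, "d": 0.6}, {"d": 0.8, "c": 0.2}]
    print(best_match(obs))
```

Note that the locally most probable symbol at the second position ("d") is not part of the best match, because choosing it leaves no accepting continuation.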

Streaming Regular Expression Membership and Pattern Matching

10.1137/1.9781611977073.30 ◽

2022 ◽

pp. 670-694

Author(s):

Bartłomiej Dudek ◽

Paweł Gawrychowski ◽

Garance Gourdel ◽

Tatiana Starikovskaya

Keyword(s):

Pattern Matching ◽

Regular Expression

The Design of a Verified Derivative-Based Parsing Tool for Regular Expressions

CLEI electronic journal ◽

10.19153/cleiej.24.3.2 ◽

2021 ◽

Vol 24 (3) ◽

Author(s):

Elton Cardoso ◽

Maycon Amaro ◽

Samuel Feitosa ◽

Leonardo Reis ◽

André Du Bois ◽

...

Keyword(s):

Regular Expression ◽

Input String ◽

Regular Expressions

We describe the formalization of Brzozowski and Antimirov derivative-based algorithms for regular expression parsing in the dependently typed language Agda. The formalization produces a proof that either an input string matches a given regular expression or that no match exists. A tool for regular-expression-based search in the style of the well-known GNU grep has been developed with the certified algorithms. Practical experiments conducted with this tool are reported.
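For readers unfamiliar with Brzozowski derivatives, the following is a minimal unverified matcher in Python. It is only an illustration of the technique: the paper's Agda development additionally produces proofs and parse evidence, which this sketch does not attempt, and all class and function names here are illustrative choices.

```python
from dataclasses import dataclass


class Regex:
    """Base class for regular expression syntax trees."""


@dataclass(frozen=True)
class Empty(Regex):  # matches no string
    pass


@dataclass(frozen=True)
class Eps(Regex):  # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Regex):
    c: str


@dataclass(frozen=True)
class Alt(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Seq(Regex):
    first: Regex
    second: Regex


@dataclass(frozen=True)
class Star(Regex):
    inner: Regex


def nullable(r: Regex) -> bool:
    """True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.first) and nullable(r.second)
    return False  # Empty and Chr


def deriv(r: Regex, c: str) -> Regex:
    """Brzozowski derivative: the expression matching the rest after c."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, c), deriv(r.right, c))
    if isinstance(r, Seq):
        head = Seq(deriv(r.first, c), r.second)
        return Alt(head, deriv(r.second, c)) if nullable(r.first) else head
    if isinstance(r, Star):
        return Seq(deriv(r.inner, c), r)
    raise TypeError(f"unknown regex node: {r!r}")


def matches(r: Regex, s: str) -> bool:
    """Differentiate by each character, then test for nullability."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)


if __name__ == "__main__":
    ab_star_or_c = Alt(Star(Seq(Chr("a"), Chr("b"))), Chr("c"))  # (ab)*|c
    print(matches(ab_star_or_c, "abab"), matches(ab_star_or_c, "aba"))
```

The sketch applies no simplification to intermediate expressions, so they can grow with input length; practical implementations normalize derivatives as they go.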

Reinhardt: Real-time Reconfigurable Hardware Architecture for Regular Expression Matching in DPI

10.1145/3485832.3485878 ◽

2021 ◽

Author(s):

Taejune Park ◽

Jaehyun Nam ◽

Seung Ho Na ◽

Jaewoong Chung ◽

Seungwon Shin

Keyword(s):

Real Time ◽

Regular Expression ◽

Reconfigurable Hardware ◽

Hardware Architecture ◽

Regular Expression Matching ◽

Reconfigurable Hardware Architecture

NFA Based Regular Expression Matching on FPGA

10.1109/cits52676.2021.9618426 ◽

2021 ◽

Author(s):

Kamil Sert ◽

Cuneyt F. Bazlamacci

Keyword(s):

Regular Expression ◽

Regular Expression Matching

Arext: Automatic Regular Expression Testing Tool Based on Generating Strings With Full Coverage

10.1109/kse53942.2021.9648604 ◽

2021 ◽

Author(s):

Nguyen Van Hoan ◽

Pham Ngoc Hung

Keyword(s):

Regular Expression ◽

Full Coverage ◽

Testing Tool

Regular expression length via arithmetic formula complexity

Journal of Computer and System Sciences ◽

10.1016/j.jcss.2021.10.004 ◽

2021 ◽

Author(s):

Ehud Cseresnyes ◽

Hannes Seiwert

Keyword(s):

Regular Expression ◽

Formula Complexity

The Reachability Problem for Two-Dimensional Vector Addition Systems with States

Journal of the ACM ◽

10.1145/3464794 ◽

2021 ◽

Vol 68 (5) ◽

pp. 1-43

Author(s):

Michael Blondin ◽

Matthias Englert ◽

Alain Finkel ◽

Stefan Göller ◽

Christoph Haase ◽

...

Keyword(s):

Regular Expression ◽

Two Dimensional ◽

Dimensional Vector ◽

Reachability Problem ◽

Technical Result ◽

Vector Addition ◽

Bounded Language

We prove that the reachability problem for two-dimensional vector addition systems with states is NL-complete or PSPACE-complete, depending on whether the numbers in the input are encoded in unary or binary. As a key underlying technical result, we show that, if a configuration is reachable, then there exists a witnessing path whose sequence of transitions is contained in a bounded language defined by a regular expression of pseudo-polynomially bounded length. This, in turn, enables us to prove that the lengths of minimal reachability witnesses are pseudo-polynomially bounded.

CICERO: A Domain-Specific Architecture for Efficient Regular Expression Matching

ACM Transactions on Embedded Computing Systems ◽

10.1145/3476982 ◽

2021 ◽

Vol 20 (5s) ◽

pp. 1-24

Author(s):

Daniele Parravicini ◽

Davide Conficconi ◽

Emanuele Del Sozzo ◽

Christian Pilato ◽

Marco D. Santambrogio

Keyword(s):

Language Processing ◽

Regular Expression ◽

Hardware Acceleration ◽

Domain Specific ◽

Programmable Architecture ◽

Regular Expression Matching ◽

Intrinsic Parallelism ◽

Finite State ◽

Computational Kernel ◽

Compilation Framework

Regular Expression (RE) matching is a computational kernel used in several applications. Since RE complexity and data volumes are steadily increasing, hardware acceleration is gaining attention for this problem as well. Existing approaches have limited flexibility, as they require a different implementation for each RE. On the other hand, it is complex to map efficient RE representations like non-deterministic finite-state automata onto software-programmable engines or parallel architectures. In this work, we present CICERO, an end-to-end framework composed of a domain-specific architecture and a companion compilation framework for RE matching. Our solution is suitable for many applications, such as genomics/proteomics and natural language processing. CICERO aims at exploiting the intrinsic parallelism of non-deterministic representations of the REs. CICERO can trade off accelerators’ efficiency and processors’ flexibility thanks to its programmable architecture and the compilation framework. We implemented CICERO prototypes on embedded FPGA, achieving up to 28.6× and 20.8× more energy efficiency than embedded and mainstream processors, respectively. Since it is a programmable architecture, it can be implemented as a custom ASIC that is orders of magnitude more energy-efficient than mainstream processors.
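The "intrinsic parallelism" of a non-deterministic automaton can be seen in a plain software simulation: every active state advances independently on each input symbol. The sketch below is a generic state-set simulation of a hypothetical hand-built NFA for (a|b)*abb. It says nothing about CICERO's actual instruction set or compiler, which the abstract does not detail.

```python
# Hypothetical epsilon-free NFA for (a|b)*abb:
# TRANSITIONS[state][symbol] is the set of possible next states.
TRANSITIONS = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START, ACCEPTING = {0}, {3}


def nfa_accepts(text: str) -> bool:
    """Simulate the NFA by tracking the set of all active states."""
    active = set(START)
    for ch in text:
        nxt = set()
        # Each active state steps independently; hardware can do this in parallel.
        for state in active:
            nxt |= TRANSITIONS[state].get(ch, set())
        active = nxt
        if not active:
            return False  # no state survived, so no match is possible
    return bool(active & ACCEPTING)


if __name__ == "__main__":
    print(nfa_accepts("aabb"), nfa_accepts("abba"))
```

In software the inner loop runs sequentially, costing time proportional to the number of active states per symbol; a parallel engine can in principle update them all at once.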
