String pattern searching algorithm based on characters indices

Author(s):  
Ivan Markic ◽  
Maja Stula ◽  
Marija Zoric
1977 ◽  
Vol 12 (6) ◽  
pp. 144-152 ◽  
Author(s):  
R. J.W. Housden ◽  
N. Kotarski

2012 ◽  
Vol 25 (1) ◽  
pp. 45-50 ◽  
Author(s):  
M. Baena-García ◽  
R. Morales-Bueno

2022 ◽  
Vol 12 (1) ◽  
pp. 1-18
Author(s):  
Umamageswari Kumaresan ◽  
Kalpana Ramanujam

The intent of this research is to develop an automated web scraping system capable of extracting structured data records embedded in semi-structured web pages. Most automated extraction techniques in the literature capture repeated patterns among a set of similarly structured web pages, deduce the template used to generate those pages, and then extract the data records. All of these techniques rely on computationally intensive operations such as string pattern matching or DOM tree matching, followed by manual labeling of the extracted data records. The technique discussed in this paper departs from the state-of-the-art approaches by identifying informative sections of a web page through repetition of informative content rather than syntactic structure. In the reported experiments, the system identified the data-rich region with 100% precision for web sites belonging to different domains. Experiments conducted on real-world web sites demonstrate the effectiveness and versatility of the proposed approach.


2010 ◽  
Vol 7 (2) ◽  
pp. 331-357 ◽  
Author(s):  
Tomás Flouri ◽  
Jan Janousek ◽  
Bořivoj Melichar

Subtree matching is an important problem in computer science on which a number of tasks, such as mechanical theorem proving, term rewriting, symbolic computation, and nonprocedural programming languages, are based. A systematic approach to the construction of subtree pattern matchers by deterministic pushdown automata, which read subject trees in prefix and postfix notation, is presented. The method is analogous to the construction of string pattern matchers: for a given pattern, a nondeterministic pushdown automaton is created and then determinised. In addition, it is shown that the size of the resulting deterministic pushdown automata directly corresponds to the size of the existing string pattern matchers based on finite automata.
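
The link between subtree matching and string matching mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pushdown automaton construction: a naive sliding comparison stands in for the automaton. The nested-tuple tree representation and the function names `prefix_notation` and `subtree_occurrences` are assumptions made for illustration. The sketch relies on the fact that, when each symbol carries its arity, every subtree appears as a contiguous run in the prefix notation of the subject tree.

```python
from typing import List, Tuple

# Hypothetical representation: a tree is a nested tuple (label, child, child, ...).
# A leaf is a one-element tuple such as ("c",).


def prefix_notation(tree: tuple) -> List[Tuple[str, int]]:
    """Return the prefix (preorder) notation as a list of (label, arity) pairs."""
    label, *children = tree
    out = [(label, len(children))]
    for child in children:
        out.extend(prefix_notation(child))
    return out


def subtree_occurrences(subject: tuple, pattern: tuple) -> List[int]:
    """Return the prefix-notation positions where `pattern` occurs as a subtree.

    Naive O(n * m) scan, used here in place of a deterministic pushdown
    automaton. Because each symbol is paired with its arity, a contiguous
    match of the pattern's prefix notation is exactly a subtree occurrence.
    """
    s = prefix_notation(subject)
    p = prefix_notation(pattern)
    return [i for i in range(len(s) - len(p) + 1) if s[i:i + len(p)] == p]


# Subject tree a(b(c, d), b(c, d)) and pattern b(c, d).
subject = ("a", ("b", ("c",), ("d",)), ("b", ("c",), ("d",)))
pattern = ("b", ("c",), ("d",))
print(subtree_occurrences(subject, pattern))  # [1, 4]
```

Pairing each label with its arity keeps a node such as `b` with one child from matching a `b` with two children, which is what makes the plain substring comparison sufficient in this sketch.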

