Pattern Matching and Regular Expressions

1997 ◽  
pp. 44-48
Author(s):  
Dexter C. Kozen
2017 ◽  
Vol 110 ◽  
pp. 250-257 ◽  
Author(s):  
Yeim-Kuan Chang ◽  
Ching-Hsuan Shih

2019 ◽  
Vol 20 (4) ◽  
pp. 1289-1302 ◽  
Author(s):  
Hyo-Sang Shin ◽  
Dario Turchi ◽  
Shaoming He ◽  
Antonios Tsourdos

2006 ◽  
Vol 16 (6) ◽  
pp. 711-750 ◽  
Author(s):  
HARUO HOSOYA

XML data are described by types involving regular expressions. This raises the question of what language feature is convenient for manipulating such data. Previously, we have given an answer to this question by proposing regular expression pattern matching. However, since this construct is derived from ML pattern matching, it does not have an iteration functionality in itself, which makes it cumbersome to process data typed by Kleene stars. In this paper, we propose a novel programming feature, regular expression filters. This construct extends the previous proposal by permitting pattern clauses to be closed under arbitrary regular expression operators. This yields many convenient programming idioms such as non-uniform processing of sequences and almost-copying of trees. We further develop a type inference mechanism that obtains (1) types for pattern variables that are locally precise with respect to the type of input values and (2) a type for the result of the whole filter expression that is also locally precise with respect to the types of the body expressions. We discuss how our construct is useful in the practice of XML processing and, in particular, how our type inference is crucial for avoiding changes to programs when the types of the data to be processed evolve frequently.


2018 ◽  
Vol 51 (4) ◽  
pp. 1007-1021
Author(s):  
Jason W. Karl

Location information in published studies represents an untapped resource for literature discovery, applicable to a range of domains. The ability to easily discover scientific articles from specific places, nearby locales, or similar (but geographically separate) areas worldwide is important for advancing science and addressing global sustainability challenges. However, the thematic rather than geographic nature of current search tools makes location-based searches challenging and inefficient. Manually geolocating studies is labor intensive, and place-name recognition algorithms have performed poorly due to the prevalence of irrelevant place names in scientific articles. These challenges have hindered past efforts to create map-based literature search tools. Thus, automated approaches are needed to sustain article georeferencing efforts. Common pattern-matching algorithms (parsers) can be used to identify and extract geographic coordinates from the text of published articles. Pattern-matching algorithms (geoparsers) were developed using regular expressions and lexical parsing, and their performance was tested against sets of full-text articles from multiple journals that were manually scanned for coordinates. Both geoparsers performed well at recognizing and extracting coordinates from articles, with accuracy ranging from 85.1% to 100% and the lexical geoparser performing marginally better. Omission errors (i.e. missed coordinates) were 0% to 14.9% for the regular expression geoparser and 0% to 10.3% for the lexical geoparser. Only a single commission error (i.e. erroneous coordinate) was encountered with the lexical geoparser. The ability to automatically identify and extract location information from published studies opens new possibilities for transforming scientific literature discovery and supporting novel research.
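
As a rough illustration of the regular-expression approach described in this abstract, the following Python sketch pulls decimal-degree coordinate pairs out of free text. It is not the geoparser from the study: the pattern `DECIMAL_PAIR` and the function `extract_decimal_coordinates` are hypothetical names chosen here, and the sketch assumes coordinates written as signed decimal pairs in latitude, longitude order. Degree-minute-second notation and hemisphere letters, which real articles often use, are not handled.

```python
import re

# Hypothetical pattern (an assumption, not the study's): a signed decimal
# latitude, a comma, and a signed decimal longitude, e.g. "32.6170, -106.7400".
# The lookarounds stop the match from starting or ending inside a longer number.
DECIMAL_PAIR = re.compile(
    r"(?<![\d.])(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)(?!\d)"
)


def extract_decimal_coordinates(text):
    """Return (lat, lon) float pairs whose values fall inside valid ranges."""
    pairs = []
    for match in DECIMAL_PAIR.finditer(text):
        lat = float(match.group(1))
        lon = float(match.group(2))
        # The range check discards number pairs that cannot be coordinates,
        # which is one simple guard against commission errors.
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            pairs.append((lat, lon))
    return pairs


if __name__ == "__main__":
    sample = "Plots were at 32.6170, -106.7400 and 45.5, 200.25 respectively."
    print(extract_decimal_coordinates(sample))
```

In the sample sentence, the second pair is rejected because its longitude lies outside the valid range. A pattern this narrow would miss coordinates written in other notations, which is the kind of omission error the abstract quantifies.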


2012 ◽  
Vol E95.D (7) ◽  
pp. 1847-1857 ◽  
Author(s):  
Yusaku KANETA ◽  
Shingo YOSHIZAWA ◽  
Shin-ichi MINATO ◽  
Hiroki ARIMURA ◽  
Yoshikazu MIYANAGA

Author(s):  
G.F. HARISH REDDY ◽  
S. INDIRA PRIYADARSINI ◽  
J. PRATIBHA

Advances in computer networks and storage subsystems continue to push the rate at which data streams must be processed between and within computer systems. Meanwhile, the content of such data streams is subjected to ever-increasing scrutiny, as components at all levels mine the streams for patterns that can trigger time-sensitive action. The problem of discovering credit card numbers, currency values, or telephone numbers requires a more general specification mechanism. While there is a well-developed theory for regular expressions and their implementation via Finite-State Machines (FSMs), the use of regular expressions for high-performance pattern matching is more difficult and is an area of ongoing research. This paper proposes a memory-efficient pattern-matching algorithm that significantly reduces the number of states and transitions by merging pseudo-equivalent states while maintaining the correctness of string matching. In addition, the new algorithm is complementary to other memory reduction approaches and provides further reductions in memory needs.
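
To make concrete what specifying credit card numbers, currency values, or telephone numbers as regular expressions might look like, here is a minimal Python sketch. The three patterns and the names `TOKEN_PATTERNS`, `SCANNER`, and `scan` are assumptions made for illustration; they are deliberately loose and are not taken from the paper. Note also that Python's `re` module uses a backtracking matcher, so this shows only the specification side and not the FSM implementation or the state-merging algorithm the abstract describes.

```python
import re

# Illustrative, deliberately loose patterns (assumptions, not from the paper).
# Real formats vary by issuer, currency, and region.
TOKEN_PATTERNS = {
    "card": r"\b(?:\d{4}[ -]?){3}\d{4}\b",            # 16 digits in groups of 4
    "currency": r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?",  # e.g. $1,250.00
    "phone": r"\b\d{3}-\d{3}-\d{4}\b",                # e.g. 555-867-5309
}

# Combine the patterns into one alternation of named groups so that the
# input is scanned in a single pass rather than once per pattern.
SCANNER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS.items())
)


def scan(stream_text):
    """Return (kind, matched_text) pairs in order of appearance."""
    return [(m.lastgroup, m.group()) for m in SCANNER.finditer(stream_text)]


if __name__ == "__main__":
    for kind, text in scan("Paid $1,250.00 with 4111 1111 1111 1111; call 555-867-5309."):
        print(kind, text)
```

Combining several patterns into one automaton is where state counts tend to grow, which is the memory cost that state-merging techniques such as the one in this abstract aim to reduce.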

