regular expressions
Recently Published Documents


Total documents: 898 (five years: 184)
H-index: 35 (five years: 4)

2022 · Vol 109 · pp. 124-143
Author(s): Stefan Kahrs, Colin Runciman

2022
Author(s): Stephanie Hu, Steven Horng, Seth J. Berkowitz, Ruizhi Liao, Rahul G. Krishnan, ...

Accurately assessing the severity of pulmonary edema is critical for making treatment decisions in congestive heart failure patients. However, the current scale for quantifying pulmonary edema based on chest radiographs does not have well-characterized severity levels, with substantial inter-radiologist disagreement. In this study, we investigate whether comparisons documented in radiology reports can provide accurate characterizations of pulmonary edema progression. We propose a rules-based natural language processing approach to assess the change in a patient's pulmonary edema status (e.g. better, worse, no change) by performing pairwise comparisons of consecutive radiology reports, using regular expressions and heuristics derived from clinical knowledge. Evaluated against ground-truth labels from expert radiologists, our labeler extracts comparisons describing the progression of pulmonary edema with 0.875 precision and 0.891 recall. We also demonstrate the potential utility of comparison labels in providing additional fine-grained information over noisier labels produced by models that directly estimate severity level.
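The comparison labeler is described only at a high level (regular expressions plus clinical heuristics). As a rough illustration of how such a rules-based pairwise labeler might be organised, here is a minimal sketch. The edema pattern, the three rule patterns and the function name `label_comparison` are hypothetical choices for this example, not the authors' rules.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; not the labeler from the study.
EDEMA = re.compile(r"\b(pulmonary\s+)?(edema|vascular\s+congestion)\b", re.I)
RULES = [
    # Checked in order: an explicit "no change" phrase wins over change verbs.
    ("no change", re.compile(r"\b(unchanged|stable|no\s+(significant\s+)?change)\b", re.I)),
    ("better", re.compile(r"\b(improv\w*|decreas\w*|resolv\w*)\b", re.I)),
    ("worse", re.compile(r"\b(wors\w*|increas\w*|progress\w*)\b", re.I)),
]


def label_comparison(report: str) -> Optional[str]:
    """Return 'better', 'worse' or 'no change' for the first sentence that
    mentions edema and contains a comparison cue; None if there is none."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        if not EDEMA.search(sentence):
            continue  # only sentences about edema are considered
        for label, pattern in RULES:
            if pattern.search(sentence):
                return label
    return None
```

A real labeler would also need negation handling and the clinical heuristics mentioned in the abstract; this sketch only shows the pattern-matching skeleton.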


2022 · Vol 6 (POPL) · pp. 1-31
Author(s): Taolue Chen, Alejandro Flores-Lamas, Matthew Hague, Zhilei Han, Denghang Hu, ...

Regular expressions are a classical concept in formal language theory. Regular expressions in programming languages (RegEx), such as those of JavaScript, feature non-standard semantics of operators (e.g., greedy/lazy Kleene star), as well as additional features such as capturing groups and references. Symbolic execution of programs containing RegExes relies on string solvers that natively support the important features of RegEx, but such a string solver has so far been missing. In this paper, we propose the first string theory and string solver that natively provide such support. The key idea of our string solver is to introduce a new automata model, called prioritized streaming string transducers (PSSTs), to formalize the semantics of RegEx-dependent string functions. PSSTs combine priorities, which were previously introduced in prioritized finite-state automata to capture greedy/lazy semantics, with string variables, as in streaming string transducers, to model capturing groups. We validate the consistency of the formal semantics with the actual JavaScript semantics through extensive experiments. Furthermore, to solve the string constraints, we show that PSSTs enjoy nice closure and algorithmic properties, in particular the regularity-preserving property (i.e., pre-images of regular constraints under PSSTs are regular), and we introduce a sound sequent calculus that exploits these properties and propagates regular constraints by taking post-images or pre-images. Although the satisfiability of the string constraint language is undecidable in general, we show that our approach is complete for the so-called straight-line fragment. We evaluate the performance of our string solver on over 195,000 string constraints generated from an open-source RegEx library. The experimental results show the efficacy of our approach, which drastically improves on existing methods (via symbolic execution) in both precision and efficiency.
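To make the "non-standard semantics" concrete: two patterns can accept exactly the same strings yet bind capturing groups differently, depending only on whether a star is greedy or lazy. The sketch below uses Python's `re` module rather than JavaScript; for these simple patterns the backtracking semantics of the two engines should agree, but that equivalence is an assumption of this example, not a claim from the paper.

```python
import re

# Both patterns accept exactly the strings a*, so as formal regular
# expressions they denote the same language. The captures differ, because
# the first group's star is greedy in one pattern and lazy in the other.
greedy = re.fullmatch(r"(a*)(a*)", "aaa")
lazy = re.fullmatch(r"(a*?)(a*)", "aaa")

print(greedy.groups())  # ('aaa', '')  greedy group 1 takes everything
print(lazy.groups())    # ('', 'aaa')  lazy group 1 takes as little as possible

# A backreference: \1 must repeat exactly what group 1 captured.
backref = re.compile(r"(a+)b\1")
print(bool(backref.fullmatch("aabaa")))   # True
print(bool(backref.fullmatch("aabaaa")))  # False
```

A string solver that models only the accepted language cannot distinguish `greedy` from `lazy`. Priorities in PSSTs are what capture this difference.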


Algorithmica · 2022
Author(s): José Arturo Gil, Simone Santini

In this paper we study regular expression matching in cases in which the identity of the received symbols is subject to uncertainty. We develop a model of symbol emission and use a modification of the shortest-path algorithm to find optimal matches on the Cartesian Graph of an expression, provided that the input is a finite list. For infinite streams, we show that the problem is undecidable in general but that, if each symbol is received with probability 0 infinitely often, then with probability 1 the problem is decidable.
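The shortest-path idea for a finite input list can be sketched as Dijkstra's algorithm over the product of automaton states and input positions. This sketch makes several assumptions that the abstract does not state: the expression is already compiled to an NFA given as a transition dictionary, and each input position carries an independent probability distribution over possible symbols. Edge weights are `-log(p)`, so the shortest path is the most probable accepted reading. The function `best_match` and its encoding are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical NFA encoding: transitions[(state, symbol)] -> set of next states.
NFA = Dict[Tuple[int, str], Set[int]]


def best_match(transitions: NFA, start: int, accepting: Set[int],
               observations: List[Dict[str, float]]) -> Optional[Tuple[float, str]]:
    """Return (probability, symbols) of the most probable reading of the
    input that the NFA accepts, or None if no reading is accepted.

    observations[i] maps each candidate symbol to the (assumed) probability
    that it was the i-th symbol received.
    """
    n = len(observations)
    # Dijkstra over product nodes (NFA state, input position); costs are
    # -log(p) >= 0, so minimising cost maximises the product of probabilities.
    heap: List[Tuple[float, int, int, str]] = [(0.0, start, 0, "")]
    settled = set()
    while heap:
        cost, state, pos, word = heapq.heappop(heap)
        if (state, pos) in settled:
            continue
        settled.add((state, pos))
        if pos == n:
            if state in accepting:
                return math.exp(-cost), word
            continue  # input exhausted in a non-accepting state
        for symbol, p in observations[pos].items():
            if p <= 0:
                continue  # impossible symbols contribute no edges
            for nxt in transitions.get((state, symbol), ()):
                if (nxt, pos + 1) not in settled:
                    heapq.heappush(heap, (cost - math.log(p), nxt, pos + 1, word + symbol))
    return None


# Example: the expression a(b|c) as a hand-built NFA.
nfa = {(0, "a"): {1}, (1, "b"): {2}, (1, "c"): {2}}
obs = [{"a": 0.9, "o": 0.1}, {"b": 0.3, "c": 0.6, "d": 0.1}]
print(best_match(nfa, 0, {2}, obs))  # most probable accepted reading "ac", about 0.54
```

The paper's emission model and its treatment of infinite streams are more general. This only shows why a shortest-path search on the product graph solves the finite case.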


2021 · Vol 24 (3)
Author(s): Elton Cardoso, Maycon Amaro, Samuel Feitosa, Leonardo Reis, André Du Bois, ...

We describe the formalization of Brzozowski and Antimirov derivative-based algorithms for regular expression parsing in the dependently typed language Agda. The formalization produces a proof either that an input string matches a given regular expression or that no match exists. A tool for regular-expression-based search in the style of the well-known GNU grep has been developed with the certified algorithms. Practical experiments conducted with this tool are reported.
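For readers unfamiliar with Brzozowski derivatives, here is a minimal, unverified Python sketch of the underlying matching algorithm: the derivative of `r` by a character `a` matches exactly the suffixes `w` such that `a·w` is matched by `r`. A string matches `r` when the derivative by all of its characters accepts the empty string. This is not the authors' Agda code. It returns only a boolean rather than a proof or parse tree, and it omits the term simplifications a practical implementation would use to keep derivatives small.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:  # ∅: matches no string
    pass


@dataclass(frozen=True)
class Eps:  # ε: matches only the empty string
    pass


@dataclass(frozen=True)
class Chr:  # a single literal character
    c: str


@dataclass(frozen=True)
class Alt:  # left | right
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Seq:  # left followed by right
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:  # inner*
    inner: "Regex"


Regex = Union[Empty, Eps, Chr, Alt, Seq, Star]


def nullable(r: Regex) -> bool:
    """True iff r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.left) and nullable(r.right)
    return False  # Empty and Chr


def deriv(r: Regex, a: str) -> Regex:
    """Brzozowski derivative of r with respect to the character a."""
    if isinstance(r, Chr):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, a), deriv(r.right, a))
    if isinstance(r, Seq):
        head = Seq(deriv(r.left, a), r.right)
        # If left can be empty, a may also be consumed by right.
        return Alt(head, deriv(r.right, a)) if nullable(r.left) else head
    if isinstance(r, Star):
        return Seq(deriv(r.inner, a), r)
    return Empty()  # Empty and Eps match no non-empty string


def matches(r: Regex, s: str) -> bool:
    # Take one derivative per input character, then test for ε.
    for ch in s:
        r = deriv(r, ch)
    return nullable(r)
```

For example, `a(b|c)*` is `Seq(Chr("a"), Star(Alt(Chr("b"), Chr("c"))))`, and `matches` accepts `"abcb"` but rejects `"ba"`.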


2021
Author(s): Jackson Woodruff, Michael F.P. O'Boyle

2021 · Vol 4 · pp. 1-4
Author(s): Mátyás Gede, Lola Varga

The authors developed a pipeline for the automatic georeferencing of older 1 : 25 000 topographic map sheets of Hungary. The first step is the detection of the corners of the map content, followed by the recognition of the sheet identifier. These maps depict geographic quadrangles whose extent can be derived from the sheet ID. The sheet corners are used as ground control points (GCPs) for the georeferencing. The whole process is implemented in Python, using various open-source libraries: OpenCV for image processing, Tesseract for OCR, and GDAL for georeferencing. A total of 1,147 map sheets were processed, at an average speed of 4 seconds per sheet. False corner detections are automatically filtered out by geometric analysis of the detected GCPs, while the sheet IDs are validated using regular expressions. The corner-detection error is under 1% of the sheet size for 89% of the sheets and under 2% for 99% of them. The sheet ID recognition success rate is 75.9%. Although the system is fine-tuned to a specific map series, it can easily be adapted to any other map series with an approximately rectangular frame.
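The two validation steps mentioned above, regex checking of OCR'd sheet IDs and geometric filtering of detected corners, can be sketched as follows. The sheet-ID format is an assumption: it is modelled on Gauss-Krüger-style nomenclature (e.g. `L-34-1-A-a`), and the actual identifiers of this map series may differ. The angle test is a simple stand-in for the authors' geometric analysis. All names and thresholds are hypothetical.

```python
import math
import re
from typing import List, Tuple

# Hypothetical ID format: row letter, zone, sheet number, quarter, sub-quarter.
SHEET_ID = re.compile(r"([A-V])-(\d{1,2})-(\d{1,3})-([A-D])-([a-d])")


def valid_sheet_id(text: str) -> bool:
    """Accept an OCR result only if it fits the assumed ID pattern and ranges."""
    m = SHEET_ID.fullmatch(text.strip())
    if not m:
        return False
    zone, sheet = int(m.group(2)), int(m.group(3))
    return 1 <= zone <= 60 and 1 <= sheet <= 144


def corners_look_rectangular(corners: List[Tuple[float, float]],
                             max_cos: float = 0.05) -> bool:
    """corners: four (x, y) pixel points in order around the frame.
    Reject detections whose interior angles stray too far from 90 degrees."""
    for i in range(4):
        px, py = corners[i - 1]          # previous corner (wraps around)
        cx, cy = corners[i]              # current corner
        nx, ny = corners[(i + 1) % 4]    # next corner
        v1 = (px - cx, py - cy)
        v2 = (nx - cx, ny - cy)
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        if n1 == 0 or n2 == 0:
            return False  # degenerate: two corners coincide
        cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
        if abs(cos_angle) > max_cos:
            return False
    return True
```

A regex check like this catches typical OCR confusions (e.g. the letter `l` read in place of the digit `1`), which is one plausible reason a pipeline would validate IDs before deriving sheet extents from them.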


2021
Author(s): Cleyton M. O. Rodrigues, Bruno J. T. Fernandes, Leandro H. S. Silva, David J. Barrientos, Allana L. S. Rocha, ...

Electronic legal proceedings are a worldwide legal phenomenon, allowing computerized systems to be used for the creation and monitoring of procedural acts in the most diverse legal bodies. On the one hand, they allow greater transparency in the conduct of procedural acts; on the other, they have contributed to a growing bottleneck of open but unresolved lawsuits each year. Nowadays, information retrieval for automating the processing of these procedural objects is at the forefront of computer systems for law. In this study, we present MISLA2, a system that retrieves orders and preliminaries from judicial labour sentences through ontological models built from previous cases. Instead of rigid, difficult-to-maintain domain specification models, we demonstrate how light ontologies, in conjunction with regular expressions for extracting significant portions of the text, can achieve the desired results. In addition, empirical experiments carried out on real labour lawsuits show that the results are quite promising.
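To illustrate how a light ontology can be combined with regular expressions, here is a minimal sketch. A dictionary maps each concept to regex cues, and only sentences that look like orders are scanned. The concepts, the English patterns and the "order" cue are all hypothetical and chosen for this example. MISLA2's ontological models are built from previous cases and are not reproduced here.

```python
import re
from typing import Dict, List

# Hypothetical light ontology: concept -> regex cues that signal it.
ONTOLOGY: Dict[str, List[str]] = {
    "overtime": [r"\bovertime\b", r"\bextra\s+hours\b"],
    "severance": [r"\bseverance\s+pay\b"],
    "moral_damages": [r"\bmoral\s+damages\b"],
}
COMPILED = {c: [re.compile(p, re.I) for p in ps] for c, ps in ONTOLOGY.items()}

# Hypothetical cue marking an operative (order) sentence.
ORDER_CUE = re.compile(r"\b(I\s+order|the\s+defendant\s+is\s+ordered)\b", re.I)


def extract_orders(text: str) -> Dict[str, List[str]]:
    """Map each ontology concept to the order sentences that mention it."""
    found: Dict[str, List[str]] = {}
    # Naive split: break after '.' or ';' followed by whitespace.
    for sentence in re.split(r"(?<=[.;])\s+", text):
        if not ORDER_CUE.search(sentence):
            continue  # skip sentences that do not state an order
        for concept, patterns in COMPILED.items():
            if any(p.search(sentence) for p in patterns):
                found.setdefault(concept, []).append(sentence.strip())
    return found
```

Keeping the domain knowledge in a small, editable table rather than in code is the maintainability argument the abstract makes for light ontologies.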

