Quantifying Substitutability

10.26686/wgtn.17009885.v1 ◽

2021 ◽

Author(s):

David X. Wang

Keyword(s):

String Matching ◽

Evaluation Process ◽

Approximate String Matching ◽

Keyphrase Extraction ◽

Human Volunteers ◽

Matching Criteria ◽

The Cost ◽

Generic Design ◽

Matching Techniques ◽

Generic System

In this thesis, we tackle the problem of how keyphrase extraction systems can be evaluated to reveal their true efficacy. The aim is to develop a new semantically-oriented approximate string matching criterion, one that is comparable to human judgements but without the cost and energy associated with manual evaluation. This matching criterion can also be adapted for any information retrieval (IR) system where the evaluation process involves comparing candidate strings (produced by the IR system) to a gold standard (created by humans). Our contributions are threefold. First, we define a new semantic relationship called substitutability (how suitable a phrase is when used in place of another) and then design a generic system which quantifies this relationship by exploiting the interlinking structure of external knowledge sources. Second, we develop two concrete substitutability systems based on our generic design: WordSub, which is backed by WordNet, and WikiSub, which is backed by Wikipedia. Third, we construct a dataset, with the help of human volunteers, that isolates the task of measuring substitutability. This dataset is then used to evaluate the performance of our substitutability systems, along with existing approximate string matching techniques, by comparing them using a set of agreement metrics. Our results clearly demonstrate that WordSub and WikiSub comfortably outperform current approaches to approximate string matching, including both lexical-based methods, such as R-precision, and semantically-oriented techniques, such as METEOR. In fact, WikiSub’s performance comes sensibly close to that of an average human volunteer when compared against the optimistic (best-case) interhuman agreement.
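The evaluation setting described above, comparing candidate strings against a human-made gold standard, can be illustrated with a minimal sketch. It is not WordSub or WikiSub: it only contrasts exact matching with a simple lexical approximate match built on Python's standard `difflib`. The function names, the 0.8 threshold, and the sample phrases are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def exact(a: str, b: str) -> bool:
    """Case-insensitive exact match."""
    return a.lower() == b.lower()


def lexical(a: str, b: str, threshold: float = 0.8) -> bool:
    """Lexical approximate match: character-level similarity ratio
    at or above an (assumed) threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def precision(candidates, gold, is_match) -> float:
    """Fraction of candidates that match at least one gold-standard phrase."""
    if not candidates:
        return 0.0
    hits = sum(1 for c in candidates if any(is_match(c, g) for g in gold))
    return hits / len(candidates)


# Hypothetical data: one near-duplicate and one semantic substitute.
candidates = ["neural networks", "automobile"]
gold = ["neural network", "car"]

exact_precision = precision(candidates, gold, exact)
lexical_precision = precision(candidates, gold, lexical)
```

In this toy case exact matching credits neither candidate, and the lexical match credits only the near-duplicate "neural networks". The pair "automobile" and "car" stays unmatched under both, which is the kind of gap a semantically-oriented criterion is meant to close.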


Approximate String Matching Techniques for Effective CLIR Among Indian Languages

Applications of Fuzzy Sets Theory - Lecture Notes in Computer Science ◽

10.1007/978-3-540-73400-0_54 ◽

2007 ◽

pp. 430-437 ◽

Cited By ~ 9

Author(s):

Ranbeer Makin ◽

Nikita Pandey ◽

Prasad Pingali ◽

Vasudeva Varma

Keyword(s):

String Matching ◽

Approximate String Matching ◽

Indian Languages ◽

Matching Techniques


Approximate Chinese String Matching Techniques Based on Pinyin Input Method

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.1017 ◽

2014 ◽

Vol 513-517 ◽

pp. 1017-1020

Author(s):

Bing Liu ◽

Dan Han ◽

Shuang Zhang

Keyword(s):

Computer Science ◽

Rapid Development ◽

String Matching ◽

Approximate String Matching ◽

Chinese Characters ◽

Matching Problem ◽

Input Method ◽

Large Size ◽

Matching Techniques ◽

Research And Design

String matching is one of the most typical problems in computer science. Previous studies mainly focused on the exact string matching problem. However, with the rapid development of computers and the Internet, and with the continual emergence of new issues, researching and designing efficient approximate string matching algorithms has gained considerable theoretical value and practical significance. Approximate string matching, also called string matching that allows errors, aims to find a pattern string in a text or database while allowing k differences between the pattern string and its occurrences in the text. Although a number of algorithms have been proposed for approximate string matching, few studies focus on large alphabets; most researchers concentrate on small or medium-sized alphabets. For large alphabets, especially Chinese characters and Asian phonetics, there are few efficient algorithms. For these reasons, this paper focuses on the approximate Chinese string matching problem based on the pinyin input method.
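The "k differences" formulation above can be sketched with the classic dynamic-programming approach (often attributed to Sellers), shown here as a minimal illustration only. It is not the pinyin-based method of the paper, and the function name and sample strings are assumptions. Because Python strings are Unicode, the same code handles Chinese characters, but its running time is O(len(pattern) × len(text)) regardless of alphabet size.

```python
def find_approximate_matches(pattern: str, text: str, k: int) -> list[tuple[int, int]]:
    """Return (end_index, distance) for each text position where some
    substring ending there is within k edits of the pattern."""
    m = len(pattern)
    # column[i] = fewest edits turning pattern[:i] into some substring
    # of the text that ends at the current text position.
    column = list(range(m + 1))
    matches = []
    for j, ch in enumerate(text):
        prev_diag = column[0]  # value for (i - 1, j - 1)
        column[0] = 0          # a match may start anywhere in the text
        for i in range(1, m + 1):
            prev_row = column[i]  # value for (i, j - 1)
            cost = 0 if pattern[i - 1] == ch else 1
            column[i] = min(
                prev_diag + cost,   # match or substitution
                prev_row + 1,       # extra character in the text
                column[i - 1] + 1,  # pattern character missing from the text
            )
            prev_diag = prev_row
        if column[m] <= k:
            matches.append((j, column[m]))
    return matches


# Exact occurrence (k = 0) in a Chinese string.
exact_hits = find_approximate_matches("北京", "我在北京工作", 0)

# One difference allowed: "mach" is one edit away from "match".
fuzzy_hits = find_approximate_matches("match", "aproximate mach", 1)
```

Each reported pair gives the index of the last matched text character and the edit distance of the best substring ending there, so neighbouring positions can appear as separate hits when k > 0.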


Optimizing the cost matrix for approximate string matching using genetic algorithms

Pattern Recognition ◽

10.1016/s0031-3203(97)00058-7 ◽

1998 ◽

Vol 31 (4) ◽

pp. 431-440 ◽

Cited By ~ 10

Author(s):

Marc Parizeau ◽

Nadia Ghazzali ◽

Jean-François Hébert

Keyword(s):

Genetic Algorithms ◽

String Matching ◽

Approximate String Matching ◽

Cost Matrix ◽

The Cost


Association of Patient-Centered Medical Home designation and quality indicators within HRSA-funded community health center delivery sites

BMC Health Services Research ◽

10.1186/s12913-020-05826-x ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Nathaniel Bell ◽

Rebecca Wilkerson ◽

Kathy Mayfield-Smith ◽

Ana Lòpez-De Fede

Keyword(s):

Community Health ◽

Medical Home ◽

Clinical Performance ◽

String Matching ◽

Care Quality ◽

Approximate String Matching ◽

Patient Centered Medical Home ◽

Patient Centered ◽

Matching Algorithm ◽

Matching Techniques

Background: Patient-Centered Medical Home (PCMH) adoption is an important strategy to help improve primary care quality within Health Resources and Services Administration (HRSA) community health centers (CHCs), but evidence of its effect thus far remains mixed. A limitation of previous evaluations has been the inability to account for the proportion of CHC delivery sites that are designated medical homes. Methods: Retrospective cross-sectional study using the HRSA Uniform Data System (UDS) and certification files from the National Committee for Quality Assurance (NCQA) and the Joint Commission (JC). Datasets were linked through geocoding and an approximate string-matching algorithm. Predicted probability scores were regressed onto 11 clinical performance measures using 10% increments in site-level designation using beta logistic regression. Results: The geocoding and approximate string-matching algorithm identified 2615 of the 6851 (41.8%) delivery sites included in the analyses as having been designated through the NCQA and/or JC. In total, 74.7% (n = 777) of the 1039 CHCs that met the inclusion criteria for the analysis managed at least one NCQA- and/or JC-designated site. A proportional increase in site-level designation showed a positive association with adherence scores for the majority of indicators, but primarily among CHCs that designated at least 50% of their delivery sites. Once this threshold was achieved, there was a stepwise percentage point increase in adherence scores, ranging from 1.9 to 11.8% improvement, depending on the measure. Conclusion: Geocoding and approximate string-matching techniques offer a more reliable and nuanced approach for monitoring the association between site-level PCMH designation and clinical performance within HRSA’s CHC delivery sites. Our findings suggest that transformation does in fact matter, but that it may not appear until half of the delivery sites become designated. There also appears to be a continued stepwise increase in adherence scores once this threshold is achieved.
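Linking two datasets by approximate string matching, as in the Methods above, might look roughly like the following sketch. The abstract does not specify the algorithm used, so this is an assumption-laden illustration built on Python's standard `difflib.get_close_matches`; the site names, the normalization rules, and the 0.8 cutoff are all hypothetical, and the geocoding step is omitted.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def link_sites(site_names, registry_names, cutoff=0.8):
    """Map each site name to its closest registry name, or None when
    no registry entry reaches the (assumed) similarity cutoff."""
    normalized_registry = {normalize(r): r for r in registry_names}
    links = {}
    for site in site_names:
        best = difflib.get_close_matches(
            normalize(site), list(normalized_registry), n=1, cutoff=cutoff
        )
        links[site] = normalized_registry[best[0]] if best else None
    return links


# Hypothetical delivery-site names and certification-registry names.
sites = ["Eastside Family Health Center", "Riverbend Pediatrics"]
registry = ["Eastside Family Health Ctr.", "Northgate Community Clinic"]

links = link_sites(sites, registry)
```

In practice a cutoff like this trades false links against missed links, which is one reason a study might combine name matching with an independent signal such as geocoded location.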
