Optimizing the cost matrix for approximate string matching using genetic algorithms

1998 ◽  
Vol 31 (4) ◽  
pp. 431-440 ◽  
Author(s):  
Marc Parizeau ◽  
Nadia Ghazzali ◽  
Jean-François Hébert
2021 ◽  
Author(s):  
David X. Wang

<p>In this thesis, we tackle the problem of how keyphrase extraction systems can be evaluated to reveal their true efficacy. The aim is to develop a new semantically-oriented approximate string matching criterion, one that is comparable to human judgements but without the cost and effort associated with manual evaluation. This matching criterion can also be adapted for any information retrieval (IR) system where the evaluation process involves comparing candidate strings (produced by the IR system) to a gold standard (created by humans). Our contributions are threefold. First, we define a new semantic relationship called substitutability (how suitable a phrase is when used in place of another) and then design a generic system that quantifies this relationship by exploiting the interlinking structure of external knowledge sources. Second, we develop two concrete substitutability systems based on our generic design: WordSub, which is backed by WordNet, and WikiSub, which is backed by Wikipedia. Third, we construct a dataset, with the help of human volunteers, that isolates the task of measuring substitutability. This dataset is then used to evaluate the performance of our substitutability systems, along with existing approximate string matching techniques, by comparing them using a set of agreement metrics. Our results demonstrate that WordSub and WikiSub comfortably outperform current approaches to approximate string matching, including both lexical methods, such as R-precision, and semantically-oriented techniques, such as METEOR. In fact, WikiSub’s performance comes sensibly close to that of an average human volunteer when compared against the optimistic (best-case) inter-human agreement.</p>


Author(s):  
A. I. Belousov

The main objective of this paper is to prove a theorem stating that the method of successive elimination of unknowns, applied to systems of linear equations in semi-rings with iteration, yields the least solution of the system. The proof is based on a graph interpretation of the system. It establishes a relationship between the method of successive elimination of unknowns and the method of calculating the cost matrix of a labeled oriented graph by sequentially calculating cost matrices along paths of increasing rank. In preparation for the proof of the main theorem, we consider the following important properties of closed semi-rings and semi-rings with iteration. We prove the properties of an infinite sum (the supremum of a sequence in the natural ordering of an idempotent semi-ring). In particular, our proof of the continuity of the addition operation, which underlies the well-known algorithm for solving a linear equation in a semi-ring with iteration, is much simpler than those in the known literature. Next, we prove a theorem on the closedness of semi-rings with iteration with respect to solutions of systems of linear equations. We also give a detailed proof of the theorem that the cost matrix of an oriented graph labeled over a semi-ring is the iteration of the matrix of arc labels. The concept of an automaton over a semi-ring is introduced, which, unlike the usual labeled oriented graph, has a distinguished "final" vertex with zero out-degree. All of the foregoing provides a basis for the proof of the main theorem, in which the concept of an automaton over a semi-ring plays the main role. The article's results are of scientific and methodological value. The proposed proof of the main theorem relates two alternative methods for calculating the cost matrix of a labeled oriented graph, and the proposed proofs of already known statements can be useful in presenting the elements of semi-ring theory, which plays an important role in the mathematical education of students majoring in software technologies and theoretical computer science.
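
As an illustration of the cost-matrix idea in the abstract above, the following is a minimal sketch that assumes one specific semi-ring, the tropical (min, +) semi-ring, which the abstract does not name. Under that assumption the cost matrix is the closure of the arc-label matrix, computed stage by stage by admitting one more intermediate vertex at a time (the familiar Floyd-Warshall/Kleene scheme). The function name `cost_matrix` and the sample graph are hypothetical, and this is not the paper's own procedure.

```python
import math

INF = math.inf  # zero of the (min, +) semi-ring: "no arc"


def cost_matrix(labels):
    """Closure A* of an arc-label matrix over the (min, +) semi-ring.

    labels[i][j] is the weight of arc i -> j, or INF when there is no arc.
    After stage k the matrix covers every path whose interior vertices
    all lie in {0, ..., k}.
    """
    n = len(labels)
    c = [row[:] for row in labels]
    for k in range(n):
        # The iteration (star) of the loop label at k is min(0, w, 2w, ...),
        # which equals 0 only when w >= 0; negative cycles are out of scope.
        if c[k][k] < 0:
            raise ValueError("negative cycle: iteration is undefined here")
        for i in range(n):
            for j in range(n):
                # semi-ring sum is min, semi-ring product is +
                c[i][j] = min(c[i][j], c[i][k] + c[k][j])
    # A* = I + A + A^2 + ...; the unit matrix puts 0 on the diagonal.
    for i in range(n):
        c[i][i] = min(c[i][i], 0)
    return c


# Hypothetical 3-vertex graph: 0 -> 1 (3), 0 -> 2 (10), 1 -> 2 (2), 2 -> 0 (1)
example = [
    [INF, 3, 10],
    [INF, INF, 2],
    [1, INF, INF],
]
closure = cost_matrix(example)
```

Replacing `min` and `+` with the operations of another semi-ring with iteration (and the diagonal step with that semi-ring's star) would give the corresponding cost matrix; the sketch fixes (min, +) only to stay concrete.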


Algorithmica ◽  
1994 ◽  
Vol 12 (4-5) ◽  
pp. 327-344 ◽  
Author(s):  
W. I. Chang ◽  
E. L. Lawler

2011 ◽  
Vol 50-51 ◽  
pp. 386-390
Author(s):  
Mao Yan Fang ◽  
Min Le Wang ◽  
Yi Ming Bi

The No Balance Assignment Problem (NBAP) is currently solved mainly by converting it into a Balance Assignment Problem (BAP) and then applying a classical algorithm. This paper proposes the Searching Best Strategies Algorithm (SBSA), which solves the NBAP without converting it into a BAP. SBSA solves the NBAP by searching the cost matrix for the best answer. The theory behind the algorithm is simple, and it is easy to operate. The results of the research indicate that the algorithm can handle not only the NBAP but also the BAP and other problems such as the translation problem.
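
The abstract above does not describe SBSA itself, so the sketch below shows only the classical route it contrasts against: pad a rectangular cost matrix with zero-cost dummy rows or columns to make it square, then solve the balanced problem. The helper names and the brute-force solver are assumptions chosen for brevity; a practical implementation would use a polynomial-time method such as the Hungarian algorithm.

```python
from itertools import permutations


def pad_to_square(cost, fill=0):
    """Add dummy rows or columns (cost `fill`) until the matrix is square."""
    rows, cols = len(cost), len(cost[0])
    n = max(rows, cols)
    padded = [list(row) + [fill] * (n - cols) for row in cost]
    padded += [[fill] * n for _ in range(n - rows)]
    return padded


def solve_balanced(cost):
    """Exhaustive search over all assignments; O(n!), illustration only."""
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm


def solve_unbalanced(cost):
    """Classical reduction: pad to a balanced problem, then drop dummy pairs."""
    rows, cols = len(cost), len(cost[0])
    total, perm = solve_balanced(pad_to_square(cost))
    pairs = [(i, j) for i, j in enumerate(perm) if i < rows and j < cols]
    return total, pairs


# Hypothetical instance: 2 workers, 3 tasks
total, pairs = solve_unbalanced([[4, 1, 3], [2, 3, 5]])
```

Because the dummy entries cost zero, the optimum of the padded problem equals the optimum of the original rectangular one, and pairs that involve a dummy row or column are simply discarded.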

