Communication vs Synchronisation in Parallel String Comparison

This paper presents a real-time constraint-free handprinted character recognition system based on a structural approach. After the preprocessing operation, a chain code is extracted to represent the character. The classification is based on the use of a processor dedicated to string comparison. The average computation time to recognize a character is about 0.07 seconds. During the learning step, the user can define any set of characters or symbols to be recognized by the system. Thus there are no constraints on the handprinting. The experimental tests show a high degree of accuracy (96%) for writer-dependent applications. Comparisons with other systems and methods are discussed. We also present a comparison between the processor used in this system and the Wagner and Fischer algorithm. Finally, we describe some applications of the system.
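
For orientation, the Wagner and Fischer algorithm named in the abstract is the classic dynamic-programming computation of edit distance. The sketch below is a minimal software version under unit costs, applied to chain-code strings. The prototype strings, the labels, and the `classify` helper are hypothetical illustrations and are not taken from the paper, whose dedicated processor and cost model are not described here.

```python
def wagner_fischer(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between a and b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def classify(code: str, prototypes: dict) -> str:
    """Return the label whose prototype chain code is nearest to `code`.

    `prototypes` maps a label to a chain-code string (hypothetical data).
    """
    return min(prototypes, key=lambda label: wagner_fischer(code, prototypes[label]))


if __name__ == "__main__":
    # Hypothetical 8-direction chain codes learned for two symbols.
    learned = {"L": "66660000", "I": "66666666"}
    print(classify("66600000", learned))
```

The table costs O(|a|·|b|) time per comparison, which is the baseline a dedicated string-comparison processor would be measured against.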

New linear systolic arrays for the string comparison algorithm

Parallel Computing ◽

10.1016/0167-8191(93)90025-g ◽

1993 ◽

Vol 19 (10) ◽

pp. 1177-1193 ◽

Cited By ~ 3

Author(s):

Marjan Gušev ◽

David J Evans

Keyword(s):

Systolic Arrays ◽

String Comparison

Privacy-Preserving Speaker Identification as String Comparison

Privacy-Preserving Machine Learning for Speech Processing - Springer Theses ◽

10.1007/978-1-4614-4639-2_9 ◽

2012 ◽

pp. 89-95

Author(s):

Manas A. Pathak

Keyword(s):

Speaker Identification ◽

Privacy Preserving ◽

String Comparison

Free Text Customer Requests Analysis: Information Extraction Based on Fuzzy String Comparison

Product Lifecycle Management Enabling Smart X - IFIP Advances in Information and Communication Technology ◽

10.1007/978-3-030-62807-9_16 ◽

2020 ◽

pp. 193-202

Author(s):

Alexander Smirnov ◽

Nikolay Shilov ◽

Kathrin Evers ◽

Dirk Weidig

Keyword(s):

Information Extraction ◽

Free Text ◽

String Comparison

Minimum Common String Partition Problem: Hardness and Approximations

The Electronic Journal of Combinatorics ◽

10.37236/1947 ◽

2005 ◽

Vol 12 (1) ◽

Cited By ~ 12

Author(s):

Avraham Goldstein ◽

Petr Kolman ◽

Jie Zheng

Keyword(s):

Genome Rearrangement ◽

Linear Time ◽

Fundamental Problem ◽

Text Processing ◽

Partition Problem ◽

Sorting By Reversals ◽

String Comparison ◽

Minimum Number ◽

Tight Connection ◽

Minimum Common String Partition

String comparison is a fundamental problem in computer science, with applications in areas such as computational biology, text processing and compression. In this paper we address the minimum common string partition problem, a string comparison problem with a tight connection to the problem of sorting by reversals with duplicates, a key problem in genome rearrangement. A partition of a string $A$ is a sequence ${\cal P} = (P_1,P_2,\dots,P_m)$ of strings, called the blocks, whose concatenation is equal to $A$. Given a partition ${\cal P}$ of a string $A$ and a partition ${\cal Q}$ of a string $B$, we say that the pair $\langle{{\cal P},{\cal Q}}\rangle$ is a common partition of $A$ and $B$ if ${\cal Q}$ is a permutation of ${\cal P}$. The minimum common string partition problem (MCSP) is to find a common partition of two strings $A$ and $B$ with the minimum number of blocks. The restricted version of MCSP in which each letter occurs at most $k$ times in each input string is denoted by $k$-MCSP. In this paper, we show that $2$-MCSP (and therefore MCSP) is NP-hard and, moreover, even APX-hard. We describe a $1.1037$-approximation for $2$-MCSP and a linear time $4$-approximation algorithm for $3$-MCSP. We are not aware of any better approximations.
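
The definitions above can be made concrete with a small sketch. It checks whether a pair of partitions is a common partition and finds a minimum one by exhaustive search. This is an illustration of the problem statement only, not the approximation algorithms of the paper; the function names are assumptions, inputs are assumed non-empty, and the search is exponential, so it is usable only on very short strings.

```python
from collections import Counter
from itertools import combinations


def is_common_partition(P, Q, A, B):
    """True if P partitions A, Q partitions B, and Q is a permutation of P."""
    return "".join(P) == A and "".join(Q) == B and Counter(P) == Counter(Q)


def partitions_with_blocks(s, m):
    """Yield every partition of s into exactly m non-empty blocks."""
    for cuts in combinations(range(1, len(s)), m - 1):
        bounds = (0, *cuts, len(s))
        yield [s[i:j] for i, j in zip(bounds, bounds[1:])]


def min_common_partition(A, B):
    """Return (P, Q) with the fewest blocks, or None if no common partition exists."""
    if Counter(A) != Counter(B):
        return None  # different letter multisets: no common partition
    for m in range(1, len(A) + 1):  # try the smallest block counts first
        for P in partitions_with_blocks(A, m):
            target = Counter(P)
            for Q in partitions_with_blocks(B, m):
                if Counter(Q) == target:
                    return P, Q
    return None


if __name__ == "__main__":
    # Each letter occurs at most twice, so this is a 2-MCSP instance.
    P, Q = min_common_partition("abcab", "ababc")
    print(P, Q)
```

For $A = abcab$ and $B = ababc$ the minimum is two blocks: $A = abc \cdot ab$ and $B = ab \cdot abc$.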

Algorithms for String Comparison in DNA Sequences

Proceedings of International Joint Conference on Computational Intelligence - Algorithms for Intelligent Systems ◽

10.1007/978-981-13-7564-4_29 ◽

2019 ◽

pp. 327-343

Author(s):

Dhiman Goswami ◽

Nishat Sultana ◽

Warda Ruheen Bristi

Keyword(s):

DNA Sequences ◽

String Comparison

Binding SNOMED CT Terms to Archetype Elements

Methods of Information in Medicine ◽

10.3414/me13-02-0022 ◽

2015 ◽

Vol 54 (01) ◽

pp. 45-49 ◽

Cited By ~ 4

Author(s):

J. Bermudez ◽

A. Illarramendi ◽

I. Berges

Keyword(s):

Electronic Health Records ◽

Health Systems ◽

The Other ◽

SNOMED CT ◽

Health Records ◽

Large Size ◽

Level Of Confidence ◽

Comparison Methods ◽

String Comparison ◽

Text Information

Summary. Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Managing Interoperability and Complexity in Health Systems”. Background: The proliferation of archetypes as a means to represent information of Electronic Health Records has raised the need to bind terminological codes – such as SNOMED CT codes – to their elements, in order to identify them univocally. However, the large size of the terminologies makes it difficult to perform this task manually. Objectives: To establish a baseline of results for the aforementioned problem by using off-the-shelf string comparison-based techniques against which results from more complex techniques could be evaluated. Methods: Nine Typed Comparison Methods were evaluated for binding using a set of 487 archetype elements. Their recall was calculated and Friedman and Nemenyi tests were applied in order to assess whether any of the methods outperformed the others. Results: Using the qGrams method along with the ‘Text’ information piece of archetype elements outperforms the other methods if a level of confidence of 90% is considered. A recall of 25.26% is obtained if just one SNOMED CT term is retrieved for each archetype element. This recall rises to 50.51% and 75.56% if 10 and 100 elements are retrieved respectively, that being a reduction of more than 99.99% on the SNOMED CT code set. Conclusions: The baseline has been established following the above-mentioned results. Moreover, it has been observed that although string comparison-based methods do not outperform more sophisticated techniques, they can still be an alternative for providing a reduced set of candidate terms for each archetype element from which the ultimate term can be chosen later in the more-than-likely manual supervision task.
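
As a rough illustration of how a q-gram method can rank candidate terms, the sketch below scores two strings by the Dice coefficient of their padded q-gram multisets and returns the top-k candidates. The abstract does not give the exact qGrams definition or implementation used in the study, so the padding character, the Dice scoring, the default q = 3, and the sample terms are all assumptions made for the example.

```python
from collections import Counter


def qgrams(text, q=3):
    """Multiset of q-grams of the lower-cased text, padded with '#' at both ends."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def qgram_similarity(a, b, q=3):
    """Dice coefficient over q-gram multisets: 1.0 for identical, 0.0 for disjoint."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 1.0  # both strings empty and no padding (q == 1)
    shared = sum((ga & gb).values())  # multiset intersection
    return 2 * shared / total


def top_k(query, candidates, k=10, q=3):
    """Return the k candidates most similar to `query`, best first."""
    ranked = sorted(candidates, key=lambda c: qgram_similarity(query, c, q), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical term descriptions, not actual SNOMED CT content.
    terms = ["Systolic blood pressure", "Diastolic blood pressure", "Body temperature"]
    print(top_k("systolic pressure", terms, k=2))
```

Retrieving the top 10 or top 100 candidates per element, as in the reported recall figures, corresponds to calling `top_k` with `k=10` or `k=100` and leaving the final choice to a human reviewer.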
