inexact matching
Recently Published Documents


TOTAL DOCUMENTS

39
(FIVE YEARS 6)

H-INDEX

9
(FIVE YEARS 1)

2021 ◽  
Author(s):  
Anas Al-okaily ◽  
Abdelghani Tbakhi

Pattern matching is a fundamental process in almost every scientific domain. The problem involves finding the positions of a given pattern (usually short) in a reference stream of data (usually long). Matching can be exact or approximate (inexact). Exact matching searches for the pattern without allowing mismatches, insertions, or deletions of one or more characters in the pattern, while approximate matching allows them. For exact matching, several data structures that can be built in linear time and space are in practical use today. For approximate matching, the solutions proposed so far are non-linear and currently impractical. In this paper, we designed and implemented a structure that can be built in linear time and space and that solves the approximate matching problem with a search cost of $O(m + \frac{\log_\Sigma^k n}{k!} + occ)$, where m is the length of the pattern, n is the length of the reference, and k is the number of tolerated mismatches (and insertions and deletions).
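To make the problem statement concrete, the following is a minimal sketch of approximate matching with at most k mismatches, insertions, or deletions. It uses the classic dynamic-programming approach (often attributed to Sellers), which costs O(mn) time; it is not the linear-space structure the abstract describes. The function name `approx_find` is hypothetical and chosen only for illustration.

```python
def approx_find(pattern: str, text: str, k: int) -> list[int]:
    """Return the end positions (exclusive) in `text` of substrings that
    match `pattern` with at most k edits (substitution, insertion, deletion).

    Illustrative O(m*n) dynamic programming, not the indexed structure
    from the paper.
    """
    m = len(pattern)
    # col[i] = minimum edits needed to align pattern[:i] with some
    # substring of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # value of col[i-1] from the previous text position
        # col[0] stays 0 because a match may start anywhere in the text.
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(
                prev_diag + cost,  # match or substitution
                col[i] + 1,        # extra character in the text
                col[i - 1] + 1,    # pattern character missing from the text
            )
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_find("abc", "xxabcxx", 0))
    print(approx_find("abc", "xxabdxx", 1))
```

With k = 0 the function reduces to exact matching; raising k admits more end positions, which is why the number of candidates grows quickly with the number of tolerated edits.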


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2175
Author(s):  
Raphaël Abelé ◽  
Jean-Luc Damoiseaux ◽  
Redouane El Moubtahij ◽  
Jean-Marc Boi ◽  
Daniele Fronte ◽  
...  

In this paper, we present an infrared-microscopy-based approach for locating structures in integrated circuits, in order to automate their secure characterization. An infrared sensor is the key device for inspecting the inside of an integrated circuit. Two main issues are addressed. The first concerns scanning integrated circuits with a motorized optical system composed of an uncooled infrared camera combined with an optical microscope. An automated system is required to focus on the conductive tracks under the silicon layer. This is solved by an autofocus system that analyzes the infrared images through a discrete polynomial image transform, which allows accurate feature detection and yields a focus metric that is robust against the specific image degradation inherent to the acquisition context. The second issue concerns locating the structures to be characterized on the conductive tracks. To deal with a large amount of redundancy and noise, a graph-matching method is presented: discriminating graph labels are developed to overcome the redundancy, while a flexible assignment optimizer solves the inexact matching that arises from noise in the graphs. The resulting automated location system brings reproducibility to the secure characterization of integrated systems, as well as gains in accuracy and speed.


Author(s):  
Shashika R Muramudalige ◽  
Benjamin W. K. Hung ◽  
Anura P Jayasumana ◽  
Indrakshi Ray ◽  
Jytte Klausen

2019 ◽  
Vol 26 (1) ◽  
pp. 21-33 ◽  
Author(s):  
Qiang Wei ◽  
Yaoyun Zhang ◽  
Muhammad Amith ◽  
Rebecca Lin ◽  
Jenay Lapeyrolerie ◽  
...  

Software tools are now essential to research and applications in the biomedical domain. However, existing software repositories are mainly built through manual curation, which is time-consuming and does not scale. This study took the initiative to manually annotate software names in 1,120 MEDLINE abstracts and titles and used this corpus to develop and evaluate machine learning–based named entity recognition systems for biomedical software. Specifically, two strategies were proposed for feature engineering: (1) domain knowledge features and (2) unsupervised word representation features of clustered and binarized word embeddings. Our best system achieved an F-measure of 91.79% for recognizing software from titles and an F-measure of 86.35% for recognizing software from both titles and abstracts using inexact matching criteria. We then created a biomedical software catalog with 19,557 entries using the developed system. This study demonstrates the feasibility of using natural language processing methods to automatically build a high-quality software index from biomedical literature.
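As a rough illustration of how an F-measure can be computed under inexact matching criteria, the sketch below counts a predicted entity span as correct when it overlaps a gold span at all. The abstract does not define its criteria, so the overlap rule, the `(start, end)` span representation, and the function names are all assumptions made for this example.

```python
def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if half-open spans a=(start, end) and b=(start, end) share a character."""
    return a[0] < b[1] and b[0] < a[1]


def f_measure(gold, predicted, inexact=True) -> float:
    """F1 over entity spans.

    inexact=True  : a span counts as a hit if it overlaps any span on the
                    other side (assumed criterion, not taken from the paper).
    inexact=False : a span counts only if its boundaries match exactly.
    """
    def hit(span, pool):
        return any(overlaps(span, other) if inexact else span == other
                   for other in pool)

    matched_pred = sum(hit(p, gold) for p in predicted)
    matched_gold = sum(hit(g, predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 5), (10, 15)]
    pred = [(1, 5), (20, 25)]
    print(f_measure(gold, pred, inexact=True))
    print(f_measure(gold, pred, inexact=False))
```

Under this assumed rule, a prediction with a slightly wrong boundary still earns credit, so inexact scores are never lower than exact-boundary scores on the same data.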


Author(s):  
Marisol Flores-Garrido ◽  
J. Ariel Carrasco-Ochoa ◽  
José Fco. Martínez-Trinidad

Most algorithms for mining graph patterns require, during the search process, that a pattern be identical to its occurrences, and so rely on solving the graph isomorphism problem. In recent years, however, there has been interest in the case in which some differences between a pattern and its occurrences are acceptable, whether in labels or in structure. Allowing some differences and using inexact matching to measure the similarity between graphs leads to the discovery of new patterns, but it also raises important challenges: for example, the increase in the number of patterns found makes post-mining analysis harder. In this work we focus on two extensions of the AGraP algorithm, which mines inexact patterns, addressing the issue of reducing the output pattern set while trying to retain the useful information gained through the use of inexact matching. First, exploring a traditional approach, we propose the CloseAFG algorithm, which focuses on closed patterns. Then, we propose the IntAFG algorithm to find a subset of patterns that covers the original pattern set while lessening redundancy among the selected patterns. We show the performance of our approaches through experiments on synthetic databases; we also show the usefulness of the reduced pattern sets for image classification.


2015 ◽  
Author(s):  
Richard Beal ◽  

The parameterized string (p-string), a generalization of the traditional string, is composed of constant and parameter symbols. A parameterized match (p-match) exists between two p-strings if the constants match exactly and there exists a bijection between the parameter symbols. Historically, p-strings have been employed in source code cloning, plagiarism detection, and structural similarity between biological sequences. By handling the intricacies of the parameterized suffix, we can efficiently address complex applications with data structures also reusable in traditional matching scenarios. In this dissertation, we extend data structures for p-strings (and variants) to address sophisticated string computations.

We introduce a taxonomy of classes for longest factor problems. Using this taxonomy, we show an interesting connection between the parameterized longest previous factor (pLPF) and familiar data structures in string theory, including the border array, prefix array, longest common prefix array, and analogous p-string data structures. Exploiting this connection, we construct a multitude of data structures using the same general pLPF framework.

Before this dissertation, the p-match was defined predominantly by the matching between uncompressed p-strings. Here, we introduce the compressed parameterized pattern match to find all p-matches between a pattern and a text, using only the pattern and a compressed form of the text. We present parameterized compression (p-compression) as a new way to losslessly compress data to support p-matching. Experimentally, it is shown that p-compression is competitive with standard compression schemes. Using p-compression, we address the compressed p-match independent of the underlying compression routine.

Currently, p-string theory lacks the capability to support indeterminate symbols, which are essential for applications involving inexact matching, such as music analysis. In this work, we propose and efficiently address two new types of p-matching with indeterminate symbols. (1) We introduce the indeterminate parameterized match (ip-match) to permit matching with indeterminate holes in a p-string. We support the ip-match by introducing data structures that extend the prefix array. (2) From a different perspective, the equivalence parameterized match (e-match) evolves the p-match to consider intra-alphabet symbol classes as equivalence classes. We propose a method to perform the e-match using the p-string suffix array framework, i.e. the parameterized suffix array (pSA) and parameterized longest common prefix array (pLCP). Historically, direct constructions of the pSA and pLCP have suffered from quadratic time bounds in the worst case. Here, we introduce new p-string theory to efficiently construct the pSA/pLCP and break the theoretical worst-case time barrier.

Biological applications have become a classical use of p-string theory. Here, we introduce the structural border array to provide a lightweight solution to the biologically oriented variant of the p-match, i.e. the structural match (s-match) on structural strings (s-strings). Following the s-match, we show how to use s-string suffix structures to support various pattern matching problems involving RNA secondary structures. Finally, we propose and construct the forward stem matrix (FSM), a data structure to access RNA stem structures, and we apply the FSM to the detection of hairpins and pseudoknots in an RNA sequence.

This dissertation advances the state of the art in p-string theory by developing data structures for p-strings/s-strings and using p-string/s-string theory in new and old contexts to address various applications. Due to the flexibility of the p-string/s-string, the data structures and algorithms in this work are also applicable to the myriad of problems in the string community that involve traditional strings.
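The p-match definition given at the start of this abstract can be checked directly. The sketch below is a straightforward linear-time test for two equal-length p-strings, written only to illustrate the definition; it is not one of the dissertation's data structures, and the function name `p_match` and the choice to pass the parameter alphabet as a set are assumptions.

```python
def p_match(s: str, t: str, params: set[str]) -> bool:
    """Return True if p-strings s and t p-match.

    Symbols in `params` are parameter symbols; all others are constants.
    Constants must agree exactly, and the parameter symbols must be
    related by a one-to-one renaming (a bijection on the symbols used).
    """
    if len(s) != len(t):
        return False
    forward: dict[str, str] = {}   # parameter in s -> parameter in t
    backward: dict[str, str] = {}  # parameter in t -> parameter in s
    for a, b in zip(s, t):
        a_is_param = a in params
        if a_is_param != (b in params):
            return False           # a constant aligned with a parameter
        if not a_is_param:
            if a != b:
                return False       # constants must match exactly
            continue
        # setdefault records the pairing the first time a symbol is seen
        # and returns the earlier pairing on later occurrences.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


if __name__ == "__main__":
    parameters = {"x", "y", "z"}
    print(p_match("axby", "aybx", parameters))  # x<->y renaming
    print(p_match("axby", "azbz", parameters))  # x and y both map to z
```

Keeping both a forward and a backward map is what enforces the bijection: the forward map alone would accept two different parameters being renamed to the same symbol.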

