String Algorithms: Latest Research Papers

The analysis of biological sequencing data has been one of the biggest applications of string algorithms. Many such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are diverse, storing and querying a k-mer set has emerged as a shared underlying component. A k-mer set has unique features and applications that, over the past 10 years, have led to many specialized approaches for its representation. In this survey, we give a unified presentation and comparison of the data structures that have been proposed to store and query a k-mer set. We hope this survey will serve as a resource for researchers in the field and make the area more accessible to researchers outside it.
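
To make the shared component concrete, here is a minimal Python sketch of the naive baseline that specialized representations improve on: a plain hash set of k-mers supporting membership queries. The function names, the restriction to uppercase A, C, G, T, and the choice to store each k-mer and its reverse complement under one canonical form (the lexicographically smaller of the two) are assumptions for illustration; this is not any of the data structures the survey compares.

```python
# Minimal baseline sketch: a Python set of canonical k-mers.
# Assumes reads use only the uppercase DNA alphabet A, C, G, T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def build_kmer_set(reads: list[str], k: int) -> set[str]:
    """Collect every length-k window of every read, in canonical form."""
    if k < 1:
        raise ValueError("k must be positive")
    kmers: set[str] = set()
    for read in reads:
        # Reads shorter than k contribute no windows.
        for i in range(len(read) - k + 1):
            kmers.add(canonical(read[i:i + k]))
    return kmers


def contains_kmer(kmers: set[str], query: str) -> bool:
    """Membership query: a k-mer and its reverse complement count as the same."""
    return canonical(query) in kmers
```

For example, `build_kmer_set(["ACGTA"], 3)` yields `{"ACG", "GTA"}` (CGT is the reverse complement of ACG), and `contains_kmer` reports TAC as present because it is the reverse complement of GTA. A set like this stores every k-mer explicitly, which is the memory cost that more compact representations try to reduce.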

A Survey on Shortest Unique Substring Queries

Algorithms, 2020, Vol 13 (9), pp. 224. DOI: 10.3390/a13090224

Author(s): Paniz Abedin, M. Oğuzhan Külekci, Shama V. Thankachan

Keyword(s): Information Retrieval, String Algorithms, Active Line, Recent Developments

The shortest unique substring (SUS) problem is an active line of research in string algorithms, with several applications in bioinformatics and information retrieval. The initial version of the problem was proposed by Pei et al. [ICDE’13]. Over the years, many variants and extensions have been pursued, such as positional-SUS, interval-SUS, approximate-SUS, palindromic-SUS, and range-SUS. In this article, we highlight some of the key results and summarize the recent developments in this area.
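
As an illustration of a position-covering SUS query, the following Python sketch answers it by brute force: for a query position p, it returns the leftmost shortest substring that covers p and occurs exactly once in the string, counting overlapping occurrences. The function names and the 0-based, half-open interval convention are hypothetical. This naive search is far slower than the indexed algorithms the survey covers, which typically rely on suffix trees or suffix arrays.

```python
def count_occurrences(s: str, t: str) -> int:
    """Count occurrences of t in s, including overlapping ones."""
    count = 0
    start = s.find(t)
    while start != -1:
        count += 1
        start = s.find(t, start + 1)  # step by one so overlaps are counted
    return count


def positional_sus(s: str, p: int) -> tuple[int, int]:
    """Brute-force SUS covering position p (0-based).

    Returns a half-open interval (i, j) such that s[i:j] contains position p,
    occurs exactly once in s, and is as short as possible (leftmost on ties).
    """
    n = len(s)
    if not 0 <= p < n:
        raise ValueError("position out of range")
    for length in range(1, n + 1):
        # Candidate windows of this length that still cover position p.
        for i in range(max(0, p - length + 1), min(p, n - length) + 1):
            if count_occurrences(s, s[i:i + length]) == 1:
                return (i, i + length)
    # Unreachable: the whole string always occurs exactly once.
    raise AssertionError("no unique substring found")
```

In "abcab", position 2 is covered by "c", which is already unique, so the query returns (2, 3); position 0 needs "abc", so it returns (0, 3).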

Tube-Based Taut String Algorithms for Total Variation Regularization

Mathematics, 2020, Vol 8 (7), pp. 1141. DOI: 10.3390/math8071141

Author(s): Artyom Makovetskii, Sergei Voronin, Vitaly Kober, Aleksei Voronin

Keyword(s): Exact Solutions, Total Variation, Total Variation Regularization, Geometric Description, String Method, String Algorithms, Practical Applications, Taut String, TV Regularization, Regularization Problem

Removing noise from signals using total variation (TV) regularization is a challenging signal processing problem that arises in many practical applications. The taut string method is one of the most efficient approaches for solving the 1D TV regularization problem. In this paper, we propose a geometric description of the linearized taut string method. This geometric description leads to the notion of a “tube”. We propose three tube-based taut string algorithms for total variation regularization. Different weight functionals can be used in 1D TV regularization, leading to different types of tubes; we consider uniform, vertically nonuniform, and vertically and horizontally nonuniform tubes. The proposed geometric approach is used to speed up TV regularization by dividing the tubes into subtubes and processing them in parallel. We introduce the concept of a relatively convex tube and describe the relationship between the geometric characteristics of tubes and exact solutions to TV regularization. The properties of exact solutions can also be used to design efficient algorithms for solving the TV regularization problem. The performance of the proposed algorithms is discussed and illustrated by computer simulation.
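
For the uniform tube, the taut string idea can be sketched in Python. This sketch assumes the common unweighted formulation min over x of ½Σ(y_i − x_i)² + λΣ|x_{i+1} − x_i|. Under that assumption, the solution is the discrete derivative of the shortest path (the taut string) from (0, 0) to (n, R_n) that stays within vertical distance λ of the cumulative sums R_k = y_1 + ... + y_k. The sketch finds that path with a greedy sweep: it keeps a window of feasible slopes from the last contact point and bends the string at the tightest wall point whenever the window closes. The function name is hypothetical. The sketch handles only the uniform tube (no nonuniform weights, subtubes, or parallel processing), has a quadratic worst case, and is not the authors' algorithm.

```python
import math


def tv_denoise_taut_string(y: list[float], lam: float) -> list[float]:
    """Sketch: minimize 0.5*sum((y_i - x_i)^2) + lam*sum(|x_{i+1} - x_i|)
    by pulling a string taut through the uniform tube of radius lam
    around the cumulative sums of y."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    n = len(y)
    if n == 0:
        return []
    # Cumulative sums R_0..R_n; the tube is [R_k - lam, R_k + lam],
    # pinned to 0 at k = 0 and to R_n at k = n.
    R = [0.0]
    for v in y:
        R.append(R[-1] + v)
    lo = [r - lam for r in R]
    hi = [r + lam for r in R]
    lo[0] = hi[0] = 0.0
    lo[n] = hi[n] = R[n]

    X = [0.0] * (n + 1)      # the taut string, sampled at k = 0..n
    k0, v0 = 0, 0.0          # current apex (last contact point)
    while k0 < n:
        smax, smin = math.inf, -math.inf   # window of feasible slopes
        kmax = kmin = k0
        j = k0 + 1
        while j <= n:
            up = (hi[j] - v0) / (j - k0)
            dn = (lo[j] - v0) / (j - k0)
            if dn > smax:
                # Lower wall demands a steeper slope: bend at the
                # tightest upper-wall point seen so far.
                end, slope, v_end = kmax, smax, hi[kmax]
                break
            if up < smin:
                # Upper wall demands a shallower slope: bend at the
                # tightest lower-wall point seen so far.
                end, slope, v_end = kmin, smin, lo[kmin]
                break
            if up < smax:
                smax, kmax = up, j
            if dn > smin:
                smin, kmin = dn, j
            j += 1
        else:
            # Reached the pinned end point without another bend.
            end, slope, v_end = n, (R[n] - v0) / (n - k0), R[n]
        for i in range(k0 + 1, end):
            X[i] = v0 + slope * (i - k0)
        X[end] = v_end
        k0, v0 = end, v_end
    # The denoised signal is the discrete derivative of the string.
    return [X[i + 1] - X[i] for i in range(n)]
```

For example, y = [0, 0, 3, 3] with λ = 1 gives [0.5, 0.5, 2.5, 2.5]: each flat piece moves toward the other by λ divided by its length. With λ = 0 the tube collapses onto the cumulative sums and the input is returned unchanged.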

Fuzzy String Matching Procedure

The Open Bioinformatics Journal, 2020, Vol 13 (1), pp. 50-56. DOI: 10.2174/1875036202013010050

Author(s): Zekâi Şen

Keyword(s): Probability Distribution, Fuzzy Number, String Matching, Distribution Functions, Number Representation, String Algorithms, Collective Behaviors, Text String, Probability Distribution Functions, Random Variability

Background: Existing methodologies for comparing two DNA strings rely on string algorithms built on crisp logical principles, which leave no room for verbal (linguistic) uncertainty. These procedures have been applied successfully in DNA bioinformatics research, including variants that account for random variability through probability distribution functions of various types. Objective: The main purpose of this paper is first to briefly review the available DNA string matching methodologies based on crisp logic and then to suggest a new method based on fuzzy logic rules and their application. Results: Representing each gene by a fuzzy number expresses some degree of uncertainty or unhealthiness in some or all of the genes. Genes can be identified better by using fuzzy numbers with different membership degrees, which indicate the unhealthiness or healthiness of the genes and their collective behaviors. Conclusion: After the text string is represented by fuzzy numbers and paired with a crisp pattern string, their relationships are searched at different shifts, and possible defaulters in the text string are thereby identified with a certain degree of membership.
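
The shift-based search described in the Conclusion can be illustrated with a small Python sketch. In this sketch, each position of the text (a single base, for simplicity) is a fuzzy set over the bases, written as a mapping from base to membership degree, and the pattern is crisp. At each shift, the match degree is the minimum membership of the pattern's bases at the aligned positions, using the standard min operator for fuzzy AND. This representation, the min aggregation, the threshold, and the function names are assumptions for illustration. The abstract does not give the exact form of the paper's fuzzy numbers, so this is not the author's procedure.

```python
def fuzzy_match_degrees(text: list[dict[str, float]], pattern: str) -> list[float]:
    """Degree to which the crisp pattern matches the fuzzy text at each shift.

    Each text position maps bases to membership degrees in [0, 1]; a base
    missing from the mapping has membership 0. The degree at a shift is the
    minimum membership over the aligned positions (fuzzy AND via min).
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    return [
        min(text[shift + j].get(pattern[j], 0.0) for j in range(m))
        for shift in range(len(text) - m + 1)
    ]


def matching_shifts(text: list[dict[str, float]], pattern: str,
                    threshold: float) -> list[tuple[int, float]]:
    """Shifts whose match degree reaches a (hypothetical) threshold."""
    degrees = fuzzy_match_degrees(text, pattern)
    return [(s, d) for s, d in enumerate(degrees) if d >= threshold]
```

For a four-position text whose second position is C with membership 0.7 and whose third is G with membership 0.9, the crisp pattern "CG" matches at shift 1 with degree 0.7 and nowhere else.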
