A Character Frequency based Approach to Search for Substrings of a Circular Pattern and its Conjugates in an Online Text

A fundamental problem in computational biology is dealing with circular patterns. The problem consists of finding substrings of at least a certain length of a pattern and its rotations in the database. In this paper, a novel method is presented to deal with circular patterns. The problem is solved in two incremental steps. First, an algorithm is provided that reports all substrings of a given linear pattern in an online text. Next, without losing efficiency, the algorithm is extended to process all circular rotations of the pattern. For a given pattern P of size M and a text T of size N, the algorithm reports all locations in the text where a substring of P_c is found, where P_c is one of the rotations of P. For an alphabet of size σ and using O(M) space, these goals are achieved in O(MN/σ) average time, which is O(N) for all patterns of length M ≤ σ. Traditional string processing algorithms make use of advanced data structures such as suffix trees and automata. We show that basic data structures such as arrays can be used in text processing algorithms without compromising efficiency.
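The character-frequency idea can be illustrated with a small sketch of our own; it is not the paper's algorithm. Every rotation of P is a length-M window of PP, so a text window can match a substring of some rotation of P only if no character occurs in it more often than in P. That count test needs only a table of per-character counts (a `Counter` here; the abstract indicates plain arrays suffice). The function name `circular_substring_hits` and the parameter `min_len` (the required match length, at most M) are hypothetical. Surviving windows are confirmed with Python's built-in substring search, so the paper's O(MN/σ) average bound is not claimed for this code. Reporting windows of exactly `min_len` characters is enough, because any longer match contains such a window.

```python
from collections import Counter


def circular_substring_hits(pattern: str, text: str, min_len: int) -> list[int]:
    """Return start positions of every length-`min_len` window of `text` that is
    a substring of some rotation of `pattern` (hypothetical helper)."""
    m = len(pattern)
    if not 1 <= min_len <= m:
        raise ValueError("min_len must satisfy 1 <= min_len <= len(pattern)")
    # Every rotation of pattern is a length-m window of pattern + pattern, so any
    # substring (of length <= m) of any rotation is a substring of `doubled`.
    doubled = pattern + pattern
    budget = Counter(pattern)  # how often each character occurs in the pattern
    window = Counter()         # character counts of the current text window
    over = 0                   # distinct characters whose window count exceeds budget
    hits = []
    for i, ch in enumerate(text):
        window[ch] += 1
        if window[ch] == budget[ch] + 1:  # ch has just gone over its budget
            over += 1
        if i >= min_len:                  # slide: drop the character leaving the window
            out = text[i - min_len]
            if window[out] == budget[out] + 1:
                over -= 1                 # out is back within its budget
            window[out] -= 1
        start = i - min_len + 1
        # O(1) frequency filter first; exact verification only for survivors.
        if start >= 0 and over == 0 and text[start:i + 1] in doubled:
            hits.append(start)
    return hits


print(circular_substring_hits("abcd", "xcdabx", 3))  # [1, 2]: "cda" and "dab"
```

The filter never rejects a true match, since a substring of a rotation can only use characters of P as often as P does; it only saves verification work on windows that cannot match.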

Minimum Common String Partition Problem: Hardness and Approximations

The Electronic Journal of Combinatorics ◽

10.37236/1947 ◽

2005 ◽

Vol 12 (1) ◽

Cited By ~ 12

Author(s):

Avraham Goldstein ◽

Petr Kolman ◽

Jie Zheng

Keyword(s):

Genome Rearrangement ◽

Linear Time ◽

Fundamental Problem ◽

Text Processing ◽

Partition Problem ◽

Sorting By Reversals ◽

String Comparison ◽

Minimum Number ◽

Tight Connection ◽

Minimum Common String Partition

String comparison is a fundamental problem in computer science, with applications in areas such as computational biology, text processing and compression. In this paper we address the minimum common string partition problem, a string comparison problem with tight connection to the problem of sorting by reversals with duplicates, a key problem in genome rearrangement. A partition of a string $A$ is a sequence ${\cal P} = (P_1,P_2,\dots,P_m)$ of strings, called the blocks, whose concatenation is equal to $A$. Given a partition ${\cal P}$ of a string $A$ and a partition ${\cal Q}$ of a string $B$, we say that the pair $\langle{{\cal P},{\cal Q}}\rangle$ is a common partition of $A$ and $B$ if ${\cal Q}$ is a permutation of ${\cal P}$. The minimum common string partition problem (MCSP) is to find a common partition of two strings $A$ and $B$ with the minimum number of blocks. The restricted version of MCSP in which each letter occurs at most $k$ times in each input string is denoted by $k$-MCSP. In this paper, we show that $2$-MCSP (and therefore MCSP) is NP-hard and, moreover, even APX-hard. We describe a $1.1037$-approximation for $2$-MCSP and a linear time $4$-approximation algorithm for $3$-MCSP. We are not aware of any better approximations.
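To make the definition concrete, the following brute-force sketch (our own; all function names are hypothetical) finds a minimum common partition by trying partitions of $A$ into 1, 2, ... blocks and checking, by backtracking, whether $B$ can be cut into the same multiset of blocks. It takes exponential time, as one would expect given the NP-hardness result, so it is only a reference for tiny inputs and not the paper's approximation algorithm.

```python
from collections import Counter
from itertools import combinations


def partitions(s: str, k: int):
    """Yield every way to cut s into k non-empty consecutive blocks."""
    for cuts in combinations(range(1, len(s)), k - 1):
        bounds = (0, *cuts, len(s))
        yield [s[bounds[i]:bounds[i + 1]] for i in range(k)]


def can_tile(b: str, blocks: Counter) -> bool:
    """Can b be written as a concatenation of exactly the blocks in `blocks`?"""
    if not b:
        return True  # total lengths agree, so every block has been used
    for blk in list(blocks):
        if blocks[blk] > 0 and b.startswith(blk):
            blocks[blk] -= 1
            ok = can_tile(b[len(blk):], blocks)
            blocks[blk] += 1  # undo before trying the next block
            if ok:
                return True
    return False


def min_common_partition(a: str, b: str):
    """Return (number of blocks, blocks of A) for a minimum common partition,
    or None if A is empty or A and B are not anagrams (no common partition)."""
    if not a or Counter(a) != Counter(b):
        return None
    for k in range(1, len(a) + 1):
        for p in partitions(a, k):
            if can_tile(b, Counter(p)):
                return k, p
    return None


print(min_common_partition("abcd", "cdab"))  # (2, ['ab', 'cd'])
```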

Efficient Web Mining for Traversal Path Patterns

Web Mining ◽

10.4018/978-1-59140-414-9.ch015 ◽

2011 ◽

pp. 322-338 ◽

Cited By ~ 1

Author(s):

Zhixiang Chen ◽

Richard H. Fowler ◽

Ada Wai-Chee Fu ◽

Chunyue Wang

Keyword(s):

Web Mining ◽

Linear Time ◽

Fundamental Problem ◽

A Priori ◽

Web Pages ◽

Suffix Trees ◽

Web Logs ◽

Large Alphabet ◽

Optimal Linear ◽

Linear Time Algorithms

A maximal forward reference of a Web user is a longest consecutive sequence of Web pages visited by the user in a session without revisiting any previously visited page in the sequence. Efficient mining of frequent traversal path patterns, that is, large reference sequences of maximal forward references, from very large Web logs is a fundamental problem in Web mining. This chapter aims at designing algorithms for this problem with the best possible efficiency. First, two optimal linear-time algorithms are designed for finding maximal forward references in Web logs. Second, two algorithms for mining frequent traversal path patterns are devised with the help of a fast construction of shallow generalized suffix trees over a very large alphabet. These two algorithms have provable linear and sublinear time complexity, respectively, and their performance is analyzed in comparison with Apriori-like algorithms and Ukkonen's algorithm. It is shown that the two new algorithms are substantially more efficient than both the Apriori-like algorithms and Ukkonen's algorithm.
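A sketch of extracting maximal forward references from a single session, assuming the common backward-reference reading of the definition: the current path grows with each new page, is emitted when the user first moves back to a page already on the path, and is then truncated to that page. The function name and the `pos` index are our own choices, and this is not necessarily either of the chapter's two algorithms. With the index, each visit pushes at most one page and each pushed page is popped at most once, so the work is linear in the session length plus the size of the output.

```python
def maximal_forward_references(session: list[str]) -> list[list[str]]:
    """Split one session (a list of page ids) into maximal forward references."""
    refs = []
    path = []         # current forward path
    pos = {}          # page -> index in path, for O(1) "already on the path?" checks
    forward = False   # was the previous move a forward (new-page) move?
    for page in session:
        if page in pos:
            # Backward move: the path just completed is maximal if we were moving forward.
            if forward:
                refs.append(path.copy())
            cut = pos[page] + 1
            for p in path[cut:]:
                del pos[p]
            del path[cut:]
            forward = False
        else:
            pos[page] = len(path)
            path.append(page)
            forward = True
    if forward:  # the session ended on a forward move
        refs.append(path.copy())
    return refs


refs = maximal_forward_references(list("ABCDCBEGHGWAOUOV"))
print(["".join(r) for r in refs])  # ['ABCD', 'ABEGH', 'ABEGW', 'AOU', 'AOV']
```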

TACTICAL ROUTE PLANNING: NEW ALGORITHMS FOR DECOMPOSING THE MAP

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213096000146 ◽

1996 ◽

Vol 05 (01n02) ◽

pp. 199-218 ◽

Cited By ~ 4

Author(s):

J.R. BENTON ◽

S.S. IYENGAR ◽

W. DENG ◽

N. BRENER ◽

V.S. SUBRAHMANIAN

Keyword(s):

Data Structures ◽

Fundamental Problem ◽

Route Planning ◽

Decomposition Methods ◽

Search Algorithms ◽

Experimental Results ◽

New Approach ◽

Robotic Vehicles ◽

New Algorithms ◽

Large Grid

This paper defines a new approach to, and investigates, a fundamental problem in route planning. Route planning is important for robotic vehicles (Martian rovers, etc.) and for planning off-road military maneuvers. The emphasis throughout this paper is on the design, analysis, and hierarchical implementation of our route planner. This work was motivated by the anticipated need to search a grid of a trillion points for optimum routes. This cannot be done simply by scaling up the algorithms used to search a grid of 10,000 points: algorithms sufficient for the small grid are totally inadequate for the large one. Soon, the challenge will be to compute off-road routes more than 100 km long on a one- or two-meter grid. Previous efforts are reviewed; their data structures, decomposition methods, and search algorithms are analyzed, and their limitations are discussed. A detailed discussion of a hierarchical implementation is provided, and the experimental results are analyzed.

The longest common substring problem

Mathematical Structures in Computer Science ◽

10.1017/s0960129515000110 ◽

2015 ◽

Vol 27 (2) ◽

pp. 277-295 ◽

Cited By ~ 1

Author(s):

MAXIME CROCHEMORE ◽

COSTAS S. ILIOPOULOS ◽

ALESSIO LANGIU ◽

FILIPPO MIGNOSI

Keyword(s):

Data Structures ◽

Dna Sequences ◽

Simple Algorithm ◽

Suffix Trees ◽

Simple Method ◽

Suffix Arrays ◽

Lowest Common Ancestor ◽

Wide Range ◽

Efficient Data ◽

Longest Common Substring

Given a set $\mathcal{D}$ of $q$ documents, the Longest Common Substring (LCS) problem asks, for every integer $2 \le k \le q$, for the longest substring that appears in $k$ documents. LCS is a well-studied problem with a wide range of applications in bioinformatics, from microarrays to DNA sequence alignment and analysis. This problem was solved by Hui (2000, International Journal of Computer Science and Engineering 15, 73–76) using a famous constant-time solution to the Lowest Common Ancestor (LCA) problem in trees coupled with suffix trees. In this article, we present a simple method for solving the LCS problem using suffix trees (STs) and classical union-find data structures. In turn, we show how this simple algorithm can be adapted to work with other space-efficient data structures such as enhanced suffix arrays (ESA) and the compressed suffix tree.
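The problem statement can be pinned down with a brute-force reference implementation (our own sketch; the function name is hypothetical). It records, for every distinct substring, the set of documents containing it, and keeps the longest one for each $k$. It assumes "appears in $k$ documents" means "in at least $k$ documents" and breaks ties lexicographically. It is far too slow for real data and is neither Hui's LCA-based method nor the article's union-find method, but it can serve as a test oracle for either.

```python
from collections import defaultdict


def longest_common_substrings(docs: list[str]) -> dict[int, str]:
    """Map each k (2 <= k <= q) to a longest substring occurring in at least k
    documents; ties are broken lexicographically. Brute force, small inputs only."""
    where = defaultdict(set)  # substring -> indices of the documents containing it
    for d, doc in enumerate(docs):
        for i in range(len(doc)):
            for j in range(i + 1, len(doc) + 1):
                where[doc[i:j]].add(d)
    best = {}
    for sub, ds in where.items():
        # A substring found in c documents is a candidate for every k <= c.
        for k in range(2, len(ds) + 1):
            if k not in best or (-len(sub), sub) < (-len(best[k]), best[k]):
                best[k] = sub
    return best


print(longest_common_substrings(["abcde", "xbcdy", "zcdq"]))  # {2: 'bcd', 3: 'cd'}
```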

Detection of Nunation Vowelization Types in The Quran Diacritical Marks Using Automated Text-Processing Algorithms (Arabic title, translated: Detecting Composition Nunation and Sequence Nunation in Quranic Diacritical Marking Using Automated Text-Processing Algorithms)

Journal of Engineering Sciences and Information Technology ◽

10.26389/ajsrp.r170620 ◽

2020 ◽

Vol 4 (3) ◽

Author(s):

Amir Adel Mabrouk Eldeib ◽

Moulay Ibrahim El-Khalil Ghembaza

Keyword(s):

Text Processing ◽

Arabic Language ◽

Software Applications ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Holy Quran ◽

The Holy Quran ◽

Processing Algorithms ◽

Training Examples ◽

Automated Text Processing

The science of diacritical marks is closely related to the Holy Quran, where the marks were used to remove confusion and error from the reader's pronunciation. Introducing any technique for processing Quranic texts will therefore make the work of researchers in Quranic studies easier, whether by helping readers of the Quran recite accurately and correctly or by helping tutors compile a set of examples suitable for training. The importance of this research lies in employing automated text-processing algorithms to determine the locations of the Nunation vowelization types in the Holy Quran, and in the possibility of computerizing them, in order to facilitate accurate recitation of the Holy Quran and, at the same time, to collect training examples in a database or build a corpus for future use in research and software applications for the Holy Quran and its sciences. This research presents a new idea through a proposed framework architecture that automatically identifies the locations and types of Nunation in the Holy Quran: it first applies a part-of-speech tagging algorithm for the Arabic language to determine the type of each word, then uses a knowledge base to discover the Nunation words and their locations, and finally determines the type of Nunation in order to fix the vowelization of the last letter of each Nunation word according to the science of Quran diacritical marks. A further benefit, arising from the science of Quran diacritical marks, is linking search over Quranic texts to the extraction of composition Nunation and sequence Nunation in the Holy Quran, and displaying them as data according to a set of options selected by the user through suitable application interfaces.
The basic elements that Quranic text search results should display in order to extract the positions and types of Nunation vowelizations are highlighted. In addition, a template is given for the results of searching all types of Nunation in a specific Quranic chapter, with several options for retrieving all data in detail.
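For a Uthmani-script text that already carries its diacritics, the two written forms can at least be located by a plain Unicode scan. The sketch below is our own and rests on an assumption not checked against the paper: that composition Nunation is written with the standard stacked tanween marks U+064B–U+064D (fathatan, dammatan, kasratan) and sequence Nunation with the staggered "open" tanween marks U+08F0–U+08F2. It does not reproduce the paper's framework, which relies on part-of-speech tagging and a knowledge base rather than on marks already present in the text. The function name `find_nunation` is hypothetical.

```python
# Assumed encodings (see above): standard stacked tanween vs. "open" staggered tanween.
STACKED = {"\u064B", "\u064C", "\u064D"}    # fathatan, dammatan, kasratan
STAGGERED = {"\u08F0", "\u08F1", "\u08F2"}  # open fathatan, open dammatan, open kasratan


def find_nunation(text: str) -> list[tuple[int, str]]:
    """Return (word index, assumed Nunation type) for every whitespace-separated
    word carrying a tanween mark; the word's last such mark decides the type."""
    hits = []
    for i, word in enumerate(text.split()):
        marks = [ch for ch in word if ch in STACKED or ch in STAGGERED]
        if marks:
            kind = "sequence" if marks[-1] in STAGGERED else "composition"
            hits.append((i, kind))
    return hits


sample = "\u0643\u062A\u0627\u0628\u064C \u0641\u064A \u0639\u0644\u064A\u0645\u08F2"
print(find_nunation(sample))  # [(0, 'composition'), (2, 'sequence')]
```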

bioSyntax: Syntax Highlighting For Computational Biology

10.1101/235820 ◽

2017 ◽

Author(s):

Artem Babaian ◽

Anicet Ebou ◽

Alyssa Fegen ◽

Ho Yin (Jeffrey) Kam ◽

German E. Novakovsky ◽

...

Keyword(s):

Computational Biology ◽

Data Structures ◽

Biological Data ◽

Plain Text ◽

Critical Information ◽

Link Type ◽

Data Files

Computational biology requires reading and comprehending biological data files. Plain-text formats such as SAM, VCF, GTF, PDB and FASTA often contain critical information that is obscured by the complexity of the data structures. bioSyntax (http://bioSyntax.org) is a freely available suite of syntax highlighting packages for vim, gedit, Sublime, and less, which helps computational scientists parse and work with their data more efficiently.

From Suffix Trees to Suffix Vectors

International Journal of Foundations of Computer Science ◽

10.1142/s0129054106004479 ◽

2006 ◽

Vol 17 (06) ◽

pp. 1385-1402 ◽

Cited By ~ 1

Author(s):

Élise Prieur ◽

Thierry Lecroq

Keyword(s):

Data Structures ◽

Suffix Tree ◽

Suffix Trees ◽

Linear Algorithms ◽

Economical Alternative

We present a first formal setting for suffix vectors, which are space-economical alternatives to suffix trees. We give two linear algorithms for converting a suffix tree into a suffix vector and vice versa. We enrich suffix vectors with formulas for counting the number of occurrences of repeated substrings. We also propose an alternative implementation of suffix vectors that should outperform the existing one.

OPTIMAL PARALLEL CONSTRUCTION OF MINIMAL SUFFIX AND FACTOR AUTOMATA

Parallel Processing Letters ◽

10.1142/s0129626496000054 ◽

1996 ◽

Vol 06 (01) ◽

pp. 35-44 ◽

Cited By ~ 5

Author(s):

DANY BRESLAUER ◽

RAMESH HARIHARAN

Keyword(s):

Parallel Algorithms ◽

Data Structures ◽

Suffix Tree ◽

Finite Automata ◽

Suffix Trees ◽

Deterministic Finite Automata ◽

Tree Construction ◽

Parallel Construction ◽

Construction Algorithms

This paper gives optimal parallel algorithms for the construction of the smallest deterministic finite automata recognizing all the suffixes and the factors of a string. The algorithms use recently discovered optimal parallel suffix tree construction algorithms together with data structures for the efficient manipulation of trees, exploiting the well-known relation between suffix and factor automata and suffix trees.
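For comparison with the parallel setting, the minimal automaton for the suffixes of a string can be built sequentially and online by the classic suffix-automaton (DAWG) construction, in linear time for a fixed alphabet. The sketch below is that standard sequential algorithm, not the paper's parallel one; the class and method names are our own. Making every state accepting gives an automaton for all factors, but that automaton is not always the minimal factor automaton, which can have fewer states.

```python
class SuffixAutomaton:
    """Online suffix-automaton construction; accepting states = terminal states."""

    def __init__(self, s: str):
        self.next = [{}]   # outgoing transitions of each state
        self.link = [-1]   # suffix links; state 0 is the initial state
        self.length = [0]  # length of the longest string reaching each state
        last = 0
        for ch in s:
            cur = self._new_state(self.length[last] + 1, {}, -1)
            p = last
            # Add ch-transitions to cur along the suffix-link path.
            while p != -1 and ch not in self.next[p]:
                self.next[p][ch] = cur
                p = self.link[p]
            if p == -1:
                self.link[cur] = 0
            else:
                q = self.next[p][ch]
                if self.length[p] + 1 == self.length[q]:
                    self.link[cur] = q
                else:
                    # Split q: the clone takes the shorter strings of q's class.
                    clone = self._new_state(self.length[p] + 1, dict(self.next[q]), self.link[q])
                    while p != -1 and self.next[p].get(ch) == q:
                        self.next[p][ch] = clone
                        p = self.link[p]
                    self.link[q] = clone
                    self.link[cur] = clone
            last = cur
        # States on the suffix-link path from `last` are exactly those reached by suffixes.
        self.terminal = set()
        p = last
        while p != -1:
            self.terminal.add(p)
            p = self.link[p]

    def _new_state(self, length: int, trans: dict, link: int) -> int:
        self.next.append(trans)
        self.length.append(length)
        self.link.append(link)
        return len(self.next) - 1

    def _walk(self, w: str):
        state = 0
        for ch in w:
            state = self.next[state].get(ch)
            if state is None:
                return None
        return state

    def is_factor(self, w: str) -> bool:
        return self._walk(w) is not None

    def is_suffix(self, w: str) -> bool:
        state = self._walk(w)
        return state is not None and state in self.terminal


sa = SuffixAutomaton("abcbc")
print(sa.is_factor("cbc"), sa.is_suffix("bc"), sa.is_suffix("cb"))  # True True False
```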

A novel method of Post processing algorithms for image and VP8 video codec's

2013 International Conference on Signal Processing, Image Processing & Pattern Recognition ◽

10.1109/icsipr.2013.6497984 ◽

2013 ◽

Author(s):

S. Basavaraju ◽

C. R. Geetha ◽

H. D. GiriPrakash

Keyword(s):

Post Processing ◽

Novel Method ◽

Processing Algorithms

Exact Solution for Relativistic Trajectories Using Modal Transseries

Symmetry ◽

10.3390/sym12091505 ◽

2020 ◽

Vol 12 (9) ◽

pp. 1505

Author(s):

Luis Acedo ◽

Abraham J. Arenas ◽

Nicolas De La Espriella

Keyword(s):

Exact Solution ◽

Angular Momentum ◽

High Precision ◽

Limit Cycles ◽

Fundamental Problem ◽

Orbital Plane ◽

Relativistic Effects ◽

Geodesic Equation ◽

Novel Method ◽

Analytical Expressions

In this article, we design a novel method for finding the exact solution of the geodesic equation in Schwarzschild spacetime, whose solutions represent the trajectories of particles. This is a fundamental problem in astrophysics and astrodynamics if relativistic effects are to be incorporated into high-precision calculations. Here, we show that exact analytical expressions can be given, in terms of modal transseries, for the spiral orbits as they approach the limit cycles given by the two circular orbits that appear for each value of the angular momentum. The solution is expressed in terms of transseries generated by transmonomials of the form $e^{-n\theta}$, $n = 1, 2, \dots$, where $\theta$ is the angle measured in the orbital plane. Examples are presented that verify the solutions.
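As a hedged illustration of where such transmonomials can come from, the derivation below starts from the standard relativistic Binet equation for Schwarzschild geodesics in geometric units ($G = c = 1$), with $u = 1/r$, central mass $M$ and specific angular momentum $L$. The expansion around the inner circular orbit $u_+$ and the coefficient recursion are our own worked sketch, not reproduced from the article; we assume the article's angle variable absorbs the rate $\lambda$, so that its $e^{-n\theta}$ corresponds to $e^{-n\lambda\theta}$ here.

$$\frac{d^2u}{d\theta^2} + u = \frac{M}{L^2} + 3Mu^2, \qquad u_\pm = \frac{1 \pm \sqrt{1 - 12M^2/L^2}}{6M} \quad (L^2 > 12M^2)$$

Writing $u = u_+ + w$ and using $u_+ = M/L^2 + 3Mu_+^2$ gives

$$w'' - \lambda^2 w = 3Mw^2, \qquad \lambda^2 = 6Mu_+ - 1 = \sqrt{1 - 12M^2/L^2},$$

and substituting $w = \sum_{n \ge 1} a_n e^{-n\lambda\theta}$ and matching powers of $e^{-\lambda\theta}$ yields

$$\lambda^2 (n^2 - 1)\, a_n = 3M \sum_{k=1}^{n-1} a_k a_{n-k} \qquad (n \ge 2),$$

with $a_1$ free (it fixes the angular offset of the spiral). Every other coefficient is then determined by $a_1$, and $w \to 0$ as $\theta \to \infty$, so the orbit spirals asymptotically onto the circular orbit $r_+ = 1/u_+$. The two roots $u_\pm$ exist exactly when $L^2 > 12M^2$, matching the two circular orbits per angular momentum value mentioned above.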
