Merging Multi-Version Texts: a Generic Solution to the Overlap Problem

Author(s):  
Desmond Schmidt

Multi-Version Documents, or MVDs, as described by Schmidt and Colomb (Schm09), provide a simple format for representing overlapping structures in digital text. They permit the reuse of existing technologies, such as XML, to encode the content of individual versions, while allowing overlapping hierarchies (separate, partial or conditional) and textual variation (insertions, deletions, alternatives and transpositions) to exist within the same document. Most desired operations on MVDs can be performed by simple algorithms in linear time. Creating and editing MVDs, however, is a much harder operation, one that resembles the multiple sequence alignment problem in biology. The inclusion of transposition in the alignment process makes this a hard problem, with no known solutions that are both optimal and practical. A suitable heuristic algorithm can nevertheless be devised, based in part on recent biological alignment programs; its time complexity is quadratic in the worst case and often much better in practice. The results are satisfactory both in terms of speed and alignment quality. This means that MVDs can be considered a practical and editable format suitable for representing many cases of overlapping structure in digital text.
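
A minimal sketch in Python of the idea behind the format (the pair-list layout below is an assumption for exposition, not Schmidt and Colomb's exact serialization): an MVD can be viewed as an ordered list of (set-of-versions, fragment) pairs, from which any single version is recovered in one linear pass.

    # Hypothetical MVD-like structure: each fragment is tagged with the
    # set of versions it belongs to; list order is global document order.
    mvd = [
        ({1, 2}, "The quick "),
        ({1},    "brown "),   # insertion present only in version 1
        ({2},    "red "),     # alternative reading in version 2
        ({1, 2}, "fox."),
    ]

    def read_version(mvd, v):
        # One linear pass: keep the fragments whose version set contains v.
        return "".join(text for versions, text in mvd if v in versions)

    print(read_version(mvd, 1))  # The quick brown fox.
    print(read_version(mvd, 2))  # The quick red fox.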

2020
Author(s):
Ahsan Sanaullah
Degui Zhi
Shaojie Zhang

Durbin's PBWT, a scalable data structure for haplotype matching, has been successfully applied to identical-by-descent (IBD) segment identification and genotype imputation. Once the PBWT of a haplotype panel is constructed, it supports efficient retrieval of all shared long segments among all individuals (long matches) and efficient query between an external haplotype and the panel. However, the standard PBWT is an array-based static data structure and does not support dynamic updates of the panel. Here, we generalize the static PBWT to a dynamic data structure, d-PBWT, where the reverse prefix sorting at each position is represented by linked lists. We developed efficient algorithms for insertion and deletion of individual haplotypes. In addition, we verified that d-PBWT can support all algorithms of PBWT. In doing so, we systematically investigated variations of the set-maximal match and long match query algorithms: while they all have average-case time complexity independent of database size, they differ in worst-case complexity, in whether they are linear in the size of the genome, and in their dependency on additional data structures.
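
For context, a minimal sketch of the static, array-based construction that d-PBWT generalizes, following Durbin's positional sort; the function name and the 0/1 panel encoding are illustrative assumptions.

    def build_pbwt_orders(haps):
        # haps: list of M binary haplotypes (sequences of 0/1 alleles),
        # all of length N. orders[k] is Durbin's positional prefix array
        # a_k: haplotype indices sorted by reversed prefix before site k.
        M, N = len(haps), len(haps[0])
        a = list(range(M))
        orders = [a[:]]
        for k in range(N):
            # A stable partition by the allele at site k preserves the
            # reverse-prefix order; d-PBWT performs the same partition on
            # linked lists so single haplotypes can be inserted/deleted.
            a = [i for i in a if haps[i][k] == 0] + \
                [i for i in a if haps[i][k] == 1]
            orders.append(a[:])
        return orders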


Algorithms
2019
Vol 12 (6)
pp. 124
Author(s):  
Sukhpal Ghuman
Emanuele Giaquinta
Jorma Tarhio

We present two modifications of Duval's algorithm for computing the Lyndon factorization of a string. The first is designed for strings containing runs of the smallest character: it works best on small alphabets, is able to skip a significant number of characters of the string, and can be engineered to have linear time complexity in the worst case. The second computes the Lyndon factorization of a run-length encoded string R of length ρ in O(ρ) time and constant space. Experimental results show that the new variations are faster than Duval's original algorithm in many scenarios.
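
For reference, a standard textbook rendering of the unmodified Duval's algorithm that both variants build on (not the authors' engineered versions):

    def duval(s):
        # Lyndon factorization in O(n) time and O(1) extra space:
        # returns factors w1 >= w2 >= ... >= wk, each a Lyndon word.
        n, i = len(s), 0
        factors = []
        while i < n:
            j, k = i + 1, i
            while j < n and s[k] <= s[j]:
                k = i if s[k] < s[j] else k + 1
                j += 1
            while i <= k:
                factors.append(s[i:i + j - k])
                i += j - k
        return factors

    print(duval("banana"))  # ['b', 'an', 'an', 'a']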


2017
Vol 27 (01n02)
pp. 85-119
Author(s):  
Karl Bringmann
Marvin Künnemann

The Fréchet distance is a well-studied and very popular measure of similarity of two curves. The best known algorithms have quadratic time complexity, which has recently been shown to be optimal assuming the Strong Exponential Time Hypothesis (SETH) [Bringmann, FOCS'14]. To overcome the worst-case quadratic time barrier, restricted classes of curves have been studied that attempt to capture realistic input curves. The most popular such class is c-packed curves, for which the Fréchet distance has a (1+ε)-approximation in time O(cn/ε + cn log n) [Driemel et al., DCG'12]. In dimension d ≥ 5 this cannot be improved to O((cn/√ε)^(1−δ)) for any δ > 0 unless SETH fails [Bringmann, FOCS'14]. In this paper, exploiting properties that prevent stronger lower bounds, we present an improved algorithm with time complexity Õ(cn/√ε). This improves upon the algorithm by Driemel et al. for any ε < 1. Moreover, our algorithm's dependence on c, n, and ε is optimal in high dimensions apart from lower-order factors, unless SETH fails. Our main new ingredients are as follows: for filling the classical free-space diagram we project short subcurves onto a line, which yields one-dimensional separated curves with roughly the same pairwise distances between vertices. Then we tackle this special case in near-linear time by carefully extending a greedy algorithm for the Fréchet distance of one-dimensional separated curves.
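
To make the quadratic baseline concrete, here is the classical O(nm) dynamic program for the discrete Fréchet distance; it is a simplified stand-in for the continuous free-space computation discussed above, not the paper's improved algorithm.

    from math import dist

    def discrete_frechet(P, Q):
        # D[i][j] = discrete Fréchet distance of prefixes P[:i+1], Q[:j+1];
        # this quadratic table is the barrier the c-packed analysis beats.
        n, m = len(P), len(Q)
        D = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                d = dist(P[i], Q[j])
                if i == 0 and j == 0:
                    D[i][j] = d
                elif i == 0:
                    D[i][j] = max(D[i][j - 1], d)
                elif j == 0:
                    D[i][j] = max(D[i - 1][j], d)
                else:
                    D[i][j] = max(d, min(D[i - 1][j - 1],
                                         D[i - 1][j], D[i][j - 1]))
        return D[n - 1][m - 1]

    print(discrete_frechet([(0, 0), (1, 0)], [(0, 1), (1, 1)]))  # 1.0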


2005
Vol 03 (01)
pp. 1-18
Author(s):  
Francis Y. L. Chin
N. L. Ho
T. W. Lam
Prudence W. H. Wong

The constrained multiple sequence alignment problem is to align a set of sequences of maximum length n subject to a given constraint sequence, which arises from some knowledge of the structure of the sequences. This paper presents new algorithms for this problem that are more efficient in both time and space (memory) than the previous algorithms [15], and that carry a worst-case guarantee on the quality of the alignment. Reducing the space requirement by a quadratic factor is particularly significant, as the previous O(n^4)-space algorithm has limited application due to its huge memory requirement. Experiments on real data sets confirm that the new algorithms improve both alignment quality and resource requirements.
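
As background for the space trade-off, the unconstrained pairwise building block is the standard quadratic-time, quadratic-space alignment dynamic program sketched below; this is an illustrative baseline, not the authors' constrained algorithm, which in addition tracks how much of the constraint sequence has been matched.

    def align_score(a, b, match=1, mismatch=-1, gap=-1):
        # Needleman-Wunsch: O(|a||b|) time and space for two sequences.
        # Constrained variants layer extra dimensions over this table,
        # which is one way earlier algorithms reached O(n^4) space.
        n, m = len(a), len(b)
        D = [[j * gap for j in range(m + 1)] for _ in range(n + 1)]
        for i in range(1, n + 1):
            D[i][0] = i * gap
            for j in range(1, m + 1):
                s = match if a[i - 1] == b[j - 1] else mismatch
                D[i][j] = max(D[i - 1][j - 1] + s,
                              D[i - 1][j] + gap,
                              D[i][j - 1] + gap)
        return D[n][m]

    print(align_score("GATTACA", "GCATGCU"))  # 0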


2018
Author(s):  
Edgar Garriga
Paolo Di Tommaso
Cedrik Magis
Ionas Erb
Hafid Laayouni
...  

Inferences derived from large multiple alignments of biological sequences are critical to many areas of biology, including evolution, genomics, biochemistry, and structural biology. However, the complexity of the alignment problem imposes the use of approximate solutions. The most common is the progressive algorithm, which starts by aligning the most similar sequences and incorporates the remaining ones in the order imposed by a guide tree. We developed, and validated on protein sequences, a regressive algorithm that works the other way around, aligning the most dissimilar sequences first. Our algorithm produces more accurate alignments than non-regressive methods, especially on datasets larger than 10,000 sequences. By design, it can run any existing alignment method in linear time, thus allowing the scale-up required for extremely large genomic analyses.

One Sentence Summary: Initiating alignments with the most dissimilar sequences allows slow and accurate methods to be used on large datasets.
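
A schematic sketch of the regressive decomposition (the Node class and job layout are hypothetical, intended only to show the top-down order; the merging of child alignments through their shared representatives is omitted):

    class Node:
        def __init__(self, children=(), seq_id=None):
            self.children = list(children)  # internal node: child clades
            self.seq_id = seq_id            # leaf: one sequence identifier

    def representative(node):
        # One representative per clade; here simply the leftmost leaf.
        return node.seq_id if not node.children \
            else representative(node.children[0])

    def alignment_jobs(node):
        # The root job aligns representatives of the deepest splits, i.e.
        # the most dissimilar sequences, before any similar pair is touched.
        if not node.children:
            return []
        jobs = [[representative(c) for c in node.children]]
        for c in node.children:
            jobs += alignment_jobs(c)
        return jobs

    tree = Node([Node([Node(seq_id="s1"), Node(seq_id="s2")]),
                 Node([Node(seq_id="s3"), Node(seq_id="s4")])])
    print(alignment_jobs(tree))  # [['s1', 's3'], ['s1', 's2'], ['s3', 's4']]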


Author(s):  
Nirmal K. Nair ◽  
James H. Oliver

An efficient algorithm is presented to determine the blank shape necessary to manufacture a surface by press forming. The technique is independent of material properties; instead, it uses surface geometry and an area-conservation constraint to generate a geometrically feasible blank shape. The algorithm is formulated as an approximate geometric interpretation of the reversal of the forming process. The primary applications of this technique are in preliminary surface design, assessment of manufacturability, and location of binder wrap. Since the algorithm exhibits linear time complexity, it is amenable to implementation as an interactive design aid. The algorithm is applied to two example surfaces and the results are discussed.
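
The area-conservation ingredient can be made concrete with a small helper (an illustrative fragment, not the authors' algorithm): a geometrically feasible flat blank must enclose the same area as the formed surface, which for a polygonal boundary reduces to the shoelace formula, itself computable in linear time.

    def polygon_area(pts):
        # Shoelace formula: area enclosed by a simple planar polygon,
        # given as a list of (x, y) vertices in boundary order.
        s = 0.0
        for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
            s += x1 * y2 - x2 * y1
        return abs(s) / 2.0

    print(polygon_area([(0, 0), (2, 0), (2, 1), (0, 1)]))  # 2.0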

