Improving the time and space complexity of the WFA algorithm and generalizing its scoring

2022 ◽  
Author(s):  
Jordan M Eizenga ◽  
Benedict Paten

Modern genomic sequencing data is trending toward longer sequences with higher accuracy. Many analyses using these data will center on alignments, but classical exact alignment algorithms are infeasible for long sequences. The recently proposed WFA algorithm demonstrated how to perform exact alignment for long, similar sequences in O(sN) time and O(s^2) memory, where s is a score that is low for similar sequences (Marco-Sola et al., 2021). However, this algorithm still has infeasible memory requirements for longer sequences. Also, it uses an alternate scoring system that is unfamiliar to many bioinformaticians. We describe variants of WFA that improve its asymptotic memory use from O(s^2) to O(s^{3/2}) and its asymptotic run time from O(sN) to O(s^2 + N). We expect the reduction in memory use to be particularly impactful, as it makes it practical to perform highly multithreaded megabase-scale exact alignments in common compute environments. In addition, we show how to fold WFA's alternate scoring into the broader literature on alignment scores.

2009 ◽  
Vol 3 (1) ◽  
pp. 90
Author(s):  
D. El Baz ◽  
M. Elkihel ◽  
L. Gely ◽  
G. Plateau

2021 ◽  
Vol 14 (11) ◽  
pp. 2599-2612
Author(s):  
Nikolaos Tziavelis ◽  
Wolfgang Gatterbauer ◽  
Mirek Riedewald

We study theta-joins in general and join predicates with conjunctions and disjunctions of inequalities in particular, focusing on ranked enumeration where the answers are returned incrementally in an order dictated by a given ranking function. Our approach achieves strong time and space complexity properties: with n denoting the number of tuples in the database, we guarantee for acyclic full join queries with inequality conditions that for every value of k, the k top-ranked answers are returned in O(n polylog n + k log k) time. This is within a polylogarithmic factor of O(n + k log k), i.e., the best known complexity for equi-joins, and even of O(n + k), i.e., the time it takes to look at the input and return k answers in any order. Our guarantees extend to join queries with selections and many types of projections (namely those called "free-connex" queries and those that use bag semantics). Remarkably, they hold even when the number of join results is n^ℓ for a join of ℓ relations. The key ingredient is a novel O(n polylog n)-size factorized representation of the query output, which is constructed on-the-fly for a given query and database. In addition to providing the first nontrivial theoretical guarantees beyond equi-joins, we show in an experimental study that our ranked-enumeration approach is also memory-efficient and fast in practice, beating the running time of state-of-the-art database systems by orders of magnitude.


Author(s):  
Anshita Garg

This is a research-based project motivated by learning and implementing algorithms that reduce time and space complexity. In the first part of the project, we reduce the time taken to search for a given record by using a B/B+ tree rather than indexing and traditional sequential access. It is concluded that disk access times are much slower than main memory access times. Typical seek times and rotational delays are of the order of 5 to 6 milliseconds, and typical data transfer rates are in the range of 5 to 10 million bytes per second; main memory access is therefore likely to be at least 4 or 5 orders of magnitude faster than disk access on any given system. The objective is thus to minimize the number of disk accesses, and this project is concerned with techniques for achieving that objective, i.e., techniques for arranging the data on a disk so that any required piece of data, say some specific record, can be located in as few I/Os as possible. In the second part of the project, Dynamic Programming problems were solved in four variations: Recursion, Recursion with Storage, Iteration with Storage, and Iteration with Smaller Storage. The problems solved in these four variations are Fibonacci, Count Maze Path, Count Board Path, and Longest Common Subsequence. Each variation improves on the previous one, so time and space complexity are reduced significantly as we go from Recursion to Iteration with Smaller Storage.
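The four variations named above can be illustrated on the Fibonacci problem. The following is a minimal sketch, not the project's own code; the function names are hypothetical and chosen only for illustration.

```python
def fib_recursive(n):
    """Plain recursion: exponential time, O(n) call-stack space."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_memo(n, memo=None):
    """Recursion with storage: each subproblem is solved once, O(n) time and space."""
    if memo is None:
        memo = {}
    if n < 2:
        return n
    if n not in memo:
        memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]


def fib_table(n):
    """Iteration with storage: fill a table bottom-up, O(n) time and space."""
    if n < 2:
        return n
    table = [0] * (n + 1)
    table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]


def fib_two_vars(n):
    """Iteration with smaller storage: keep only the last two values, O(1) space."""
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev


if __name__ == "__main__":
    for f in (fib_recursive, fib_memo, fib_table, fib_two_vars):
        print(f.__name__, f(20))
```

All four functions return the same value for a given n; only the time and memory they need differ.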


2015 ◽  
Vol. 17 no. 2 (Graph Theory) ◽  
Author(s):  
Martiniano Eguia ◽  
Francisco Soulignac

In this article we deal with the problems of finding the disimplicial arcs of a digraph and recognizing some interesting graph classes defined by their existence. A diclique of a digraph is a pair $V \to W$ of sets of vertices such that $v \to w$ is an arc for every $v \in V$ and $w \in W$. An arc $v \to w$ is disimplicial when it belongs to a unique maximal diclique. We show that the problem of finding the disimplicial arcs is equivalent, in terms of time and space complexity, to that of locating the transitive vertices. As a result, an efficient algorithm to find the bisimplicial edges of bipartite graphs is obtained. Then, we develop simple algorithms to build disimplicial elimination schemes, which can be used to generate bisimplicial elimination schemes for bipartite graphs. Finally, we study two classes related to perfect disimplicial elimination digraphs, namely weakly diclique irreducible digraphs and diclique irreducible digraphs. The former class is associated with finite posets, while the latter corresponds to Dedekind complete finite posets.
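The two definitions above can be checked directly on a small digraph. The sketch below is a naive illustration, not the efficient algorithm the article describes. It assumes the digraph is given as a set of (tail, head) pairs, and it relies on an observation that follows from the definition: every diclique containing the arc v → w has its sources among the in-neighbours of w and its targets among the out-neighbours of v, so the arc lies in a unique maximal diclique exactly when those two neighbourhoods themselves form a diclique. The function names are hypothetical.

```python
def is_diclique(arcs, sources, targets):
    """True when every source has an arc to every target."""
    return all((v, w) in arcs for v in sources for w in targets)


def is_disimplicial(arcs, v, w):
    """Naive test: does the arc v -> w belong to a unique maximal diclique?"""
    if (v, w) not in arcs:
        return False
    in_neighbours_of_w = {x for (x, y) in arcs if y == w}
    out_neighbours_of_v = {y for (x, y) in arcs if x == v}
    # The arc is disimplicial exactly when these neighbourhoods form a diclique.
    return is_diclique(arcs, in_neighbours_of_w, out_neighbours_of_v)


if __name__ == "__main__":
    # Small example digraph (an assumption for illustration): 1->3, 1->4, 2->3.
    arcs = {(1, 3), (1, 4), (2, 3)}
    for arc in sorted(arcs):
        print(arc, is_disimplicial(arcs, *arc))
```

In this example the arc 1 → 3 is not disimplicial, because it lies in both the maximal diclique {1} → {3, 4} and the maximal diclique {1, 2} → {3}, while the other two arcs are.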


Algorithms ◽  
2018 ◽  
Vol 11 (7) ◽  
pp. 104 ◽  
Author(s):  
Igor Gribanov ◽  
Rocky Taylor ◽  
Robert Sarracino

Computation of the distance between point and triangle in 3D is a common task in numerical analysis. The input values of the algorithm are coordinates of three points of the triangle and one point from which the distance is determined. An existing algorithm is extended to compute the gradient and the Hessian of that distance with respect to coordinates of involved points. Derivation of exact expressions for gradient and Hessian is presented, and numerical accuracy is evaluated for various cases. The algorithm has O(1) time and space complexity. The included open-source code may be used in applications where derivatives of point-triangle distance are required.


2011 ◽  
Vol 48-49 ◽  
pp. 753-756
Author(s):  
Xin Quan Chen

To address the shortcomings of the Affinity Propagation (AP) algorithm, we present two extended and improved AP algorithms. The AP algorithm based on Grid Cell (APGC) is an effective extension of the AP algorithm to the level of grid cells, and the AP clustering algorithm based on Near neighbour Sampling (APNS) aims to improve time and space complexity. Simulated comparison experiments on the three algorithms show that APGC and APNS clearly improve on AP in time and space complexity. They not only achieve good clustering quality on massive data sets, but also filter out noise and isolated points well. They are therefore two effective clustering algorithms with promising application prospects. Finally, several research directions are presented.

