Are all global alignment algorithms and implementations correct?

2015 ◽  
Author(s):  
Tomáš Flouri ◽  
Kassian Kobert ◽  
Torbjørn Rognes ◽  
Alexandros Stamatakis

While implementing the algorithm, we discovered two mathematical mistakes in Gotoh's paper that induce sub-optimal sequence alignments. First, there are minor indexing mistakes in the dynamic programming algorithm which become apparent immediately when implementing the procedure; we report on these for the sake of completeness. Second, there is a more profound problem with the initialization of the dynamic programming matrices. This initialization issue can easily be missed and find its way into actual implementations, and it is also present in standard textbooks, namely the widely used books by Gusfield and by Waterman. To obtain an initial estimate of the extent to which this error has been propagated, we scrutinized freely available undergraduate lecture slides: 8 out of 31 slide sets contained the mistake, while 16 out of 31 simply omit parts of the initialization, thus giving an incomplete description of the algorithm. Finally, by inspecting ten source codes and running the respective tests, we found that five implementations were incorrect; note that not all of the bugs we identified are due to the mistake in Gotoh's paper. Three further implementations rely on additional constraints that limit generality. Thus, only two out of ten yield correct results. We show that the error introduced by Gotoh is straightforward to resolve, and we provide a correct open-source reference implementation. We believe that raising awareness of these errors is critical, since incorrect pairwise sequence alignments, which typically form one of the very first stages of any bioinformatics data analysis pipeline, can have a detrimental impact on downstream analyses such as multiple sequence alignment, orthology assignment, phylogenetic analyses, and divergence time estimates.
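The initialization issue can be made concrete with a small sketch of Gotoh's affine-gap global alignment (this is not the authors' reference implementation; the cost-minimization formulation and the penalty values are illustrative). The key point is that the first row of the vertical-gap matrix P and the first column of the horizontal-gap matrix Q must be initialized to infinity, so that a gap extension can never be "continued" from a cell where no gap can exist:

```python
import math

def gotoh(a, b, match=0, mismatch=1, gap_open=2, gap_extend=1):
    """Affine-gap global alignment, cost minimization.
    A gap of length k costs gap_open + k * gap_extend."""
    n, m = len(a), len(b)
    INF = math.inf
    D = [[0.0] * (m + 1) for _ in range(n + 1)]   # best overall cost
    P = [[INF] * (m + 1) for _ in range(n + 1)]   # ends with a gap in b
    Q = [[INF] * (m + 1) for _ in range(n + 1)]   # ends with a gap in a
    # Correct initialization: the first row of P and the first column of Q
    # stay at infinity; only D's border gets the affine gap costs.
    for i in range(1, n + 1):
        D[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        D[0][j] = gap_open + j * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            P[i][j] = min(D[i - 1][j] + gap_open + gap_extend,  # open a gap
                          P[i - 1][j] + gap_extend)             # extend it
            Q[i][j] = min(D[i][j - 1] + gap_open + gap_extend,
                          Q[i][j - 1] + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = min(D[i - 1][j - 1] + sub, P[i][j], Q[i][j])
    return D[n][m]
```

Initializing those borders with finite values instead (as in the flawed descriptions) lets an alignment open a gap at zero extra cost, which is exactly what produces the sub-optimal alignments described above.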


2002 ◽  
Vol 1802 (1) ◽  
pp. 263-270 ◽  
Author(s):  
Xuesong Zhou ◽  
Hani S. Mahmassani

An optimization framework for online flow propagation adjustment in a freeway context is proposed. Instead of adjusting individual links locally and separately, the framework accounts for the interconnectivity of links in a traffic network. In particular, the dynamic behavior of the mesoscopic simulation is approximated at a macroscopic level by a finite-difference method. The model seeks to minimize the deviation between simulated density and anticipated density. By exploiting the serial structure of a freeway, an efficient dynamic programming algorithm was developed and tested. Experimental results, with analytic results as the base case, showed the superior performance of the dynamic programming method over the classical proportional control method. The effect of varying update intervals was also examined. The simulation results suggest that a greedy method that considers the impact of inconsistency propagation achieves the best trade-off between computational effort and solution quality.



2015 ◽  
Vol 77 (20) ◽  
Author(s):  
F. N. Muhamad ◽  
R. B. Ahmad ◽  
S. Mohd. Asi ◽  
M. N. Murad

The fundamental procedure for analyzing sequence content is sequence comparison: determining which parts of two sequences are similar and which parts differ. A typical approach to this problem is to find a good alignment between the two sequences. The main goal of this project is to align DNA sequences using the Needleman-Wunsch algorithm for global alignment and the Smith-Waterman algorithm for local alignment, both based on dynamic programming. The dynamic programming algorithm is guaranteed to find an optimal alignment by exploring all possible alignments and choosing the best through scoring and traceback. The algorithms proposed and evaluated aim to reduce the number of gaps in the aligned sequences, as well as the length of the alignment, without compromising the quality or correctness of the results. To verify the accuracy and consistency of the Needleman-Wunsch and Smith-Waterman measurements, the results are compared with EMBOSS (global) and EMBOSS (local) on test data of 600 strands.
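The scoring-and-traceback scheme for global alignment can be sketched as follows (a minimal Needleman-Wunsch with a linear gap penalty; the score values are illustrative, not those used in the study):

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment (Needleman-Wunsch), linear gap penalty.
    Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # border: leading gaps
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,  # substitution
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
    # Traceback from the bottom-right corner recovers one optimal alignment.
    i, j, top, bot = n, m, [], []
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bot.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bot.append("-"); i -= 1
        else:
            top.append("-"); bot.append(b[j - 1]); j -= 1
    return H[n][m], "".join(reversed(top)), "".join(reversed(bot))
```

Smith-Waterman differs only in clamping each cell at zero and starting the traceback from the maximum-scoring cell rather than the corner.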



Author(s):  
Yin-Wen Chang ◽  
Michael Collins

Decoding of phrase-based translation models in the general case is known to be NP-complete, by a reduction from the traveling salesman problem (Knight, 1999). In practice, phrase-based systems often impose a hard distortion limit that restricts the movement of phrases during translation. However, the impact of such a constraint on complexity is not well studied. In this paper, we describe a dynamic programming algorithm for phrase-based decoding with a fixed distortion limit. The runtime of the algorithm is O(nd!lh^{d+1}), where n is the sentence length, d is the distortion limit, l is a bound on the number of phrases starting at any position in the sentence, and h is related to the maximum number of target-language translations for any source word. The algorithm makes use of a novel representation that gives a new perspective on decoding of phrase-based models.



2019 ◽  
Vol 28 (13) ◽  
pp. 1950227
Author(s):  
Talal Bonny ◽  
Ridhwan Al Debsi ◽  
Mohamed Basel Almourad

Although dynamic programming (DP) is an optimization approach for solving complex problems exactly, its running time grows polynomially with the size of the input and can still be inefficient in practice. In this contribution, we improve the computation time of dynamic-programming-based algorithms with a novel technique called SDP (Segmented Dynamic Programming). SDP finds the best way of splitting the compared sequences into segments and then applies the dynamic programming algorithm to each segment individually, which reduces the computation time dramatically. SDP may be applied to any dynamic-programming-based algorithm. As case studies, we apply SDP to two such algorithms: Needleman–Wunsch (NW), the widely used program for optimal sequence alignment, and the algorithm that finds the longest common subsequence (LCS) of two input strings. The results show that applying SDP in conjunction with the DP-based algorithms improves the computation time by up to 80% compared to the plain DP algorithms, with only small or negligible degradation of the comparison results. This degradation is controllable via the number of segments, which is an input parameter. We also compare our results with GGSEARCH, a well-known heuristic FASTA sequence alignment program, and show that our results are much closer to the optimum than those of GGSEARCH. The results hold independently of the sequence lengths and their level of similarity. To demonstrate the technique in hardware and to verify the results, we implement it on a Xilinx Zynq-7000 FPGA.
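The segmentation idea can be illustrated on the LCS case study (a minimal sketch: it splits both strings into k equal parts rather than searching for the best split points as SDP does, so it only shows why per-segment DP is cheaper and why the result is an approximation):

```python
def lcs_len(a, b):
    """Classic O(len(a) * len(b)) dynamic program for LCS length."""
    m = len(b)
    prev = [0] * (m + 1)
    for x in a:
        cur = [0] * (m + 1)
        for j, y in enumerate(b, 1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[m]

def segmented_lcs(a, b, k):
    """Approximate LCS by summing per-segment DPs over k equal splits.
    Each of the k sub-DPs has roughly (n/k)*(m/k) cells, so the total
    work is about 1/k of the full DP's. The sum is a lower bound on the
    true LCS length, since the in-order segment matches concatenate
    into one common subsequence of a and b."""
    total = 0
    for i in range(k):
        sa = a[i * len(a) // k:(i + 1) * len(a) // k]
        sb = b[i * len(b) // k:(i + 1) * len(b) // k]
        total += lcs_len(sa, sb)
    return total
```

With k = 1 this degenerates to the exact DP; larger k trades accuracy for speed, which mirrors the controllable degradation described above.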



2019 ◽  
Author(s):  
Jessica R. Creveling ◽  
Carling C. Hay ◽  
Cedric J. Hagen


Author(s):  
Yufei Ma ◽  
Ping-an Zhong ◽  
Bin Xu ◽  
Feilin Zhu ◽  
Jieyu Li ◽  
...  


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i884-i894
Author(s):  
Jose Barba-Montoya ◽  
Qiqing Tao ◽  
Sudhir Kumar

Abstract
Motivation: As the number and diversity of species and genes in contemporary datasets grow, two assumptions made in all molecular dating methods, the time-reversibility and the stationarity of the substitution process, become untenable. No software tools for molecular dating allow researchers to relax these two assumptions in their data analyses. Frequently, the same General Time Reversible (GTR) model is applied across lineages, together with gamma (+Γ)-distributed rates across sites, in relaxed-clock analyses, which assumes time-reversibility and stationarity of the substitution process. Many reports have quantified the impact of violating these underlying assumptions on molecular phylogeny, but none have systematically analyzed their impact on divergence time estimates.
Results: We quantified the bias in time estimates that results from using the GTR + Γ model to analyze computer-simulated nucleotide sequence alignments evolved under non-stationary (NS) and non-reversible (NR) substitution models. We tested Bayesian and RelTime approaches that do not require a molecular clock for estimating divergence times. Divergence times obtained with the GTR + Γ model differed only slightly (∼3% on average) from the expected times for NR datasets, but the difference was larger for NS datasets (∼10% on average). The use of only a few calibrations reduced these biases considerably (to ∼5%). Confidence and credibility intervals from the GTR + Γ analyses usually contained the correct times. Therefore, the bias introduced by using the GTR + Γ model on datasets in which the time-reversibility and stationarity assumptions are violated is likely not large and can be reduced by applying multiple calibrations.
Availability and implementation: All datasets are deposited in Figshare: https://doi.org/10.6084/m9.figshare.12594638.


