Co-estimation of Phylogeny-aware Alignment and Phylogenetic Tree

10.1101/077503 ◽

2016 ◽

Cited By ~ 1

Author(s):

Chunxiang Li ◽

Alan Medlar ◽

Ari Löytynoja

Keyword(s):

Phylogenetic Trees ◽

Comparative Sequence Analysis ◽

Alignment Algorithm ◽

Sequence Alignments ◽

Iterative Search ◽

Guide Tree ◽

Alignment Algorithms ◽

Alignment Process ◽

Comparable Accuracy ◽

The Impact

Abstract. The phylogeny-aware alignment algorithm implemented in both PRANK and PAGAN has been found to produce highly accurate alignments for comparative sequence analysis. However, the algorithm’s reliance on a guide tree during the alignment process can bias the resulting alignment, rendering it unsuitable for phylogenetic inference. To overcome these issues, we have developed a new tool, Canopy, for parallelized iterative search of the optimal alignment. Using Canopy, we studied the impact of the guide tree as well as the number and relative divergence of sequences on the accuracy of the alignment and inferred phylogeny. We find that PAGAN is the more robust of the two phylogeny-aware alignment methods to errors in the guide tree, but Canopy largely resolves the guide tree-related biases in PRANK. We demonstrate that, for all experimental settings tested, Canopy produces the most accurate sequence alignments and, further, that the inferred phylogenetic trees are of comparable accuracy to those obtained with the leading alternative method, SATé. Our analyses also show that, unlike traditional alignment algorithms, the phylogeny-aware algorithm effectively uses the information from denser sequence sampling and produces more accurate alignments when additional closely related sequences are included. All methods are available for download at http://wasabiapp.org/software.

Plastid phylogenomics of the Gynoxoid group (Senecioneae, Asteraceae) highlights the importance of motif-based sequence alignment amid low genetic distances

10.1101/2021.04.23.441144 ◽

2021 ◽

Author(s):

Belen Escobari ◽

Thomas Borsch ◽

Taylor S. Quedensley ◽

Michael Gruenstaeudl

Keyword(s):

DNA Sequence ◽

Phylogenetic Trees ◽

Plastid Genome ◽

Intergenic Spacer ◽

Genetic Distances ◽

Sequence Alignments ◽

Multiple Sequence ◽

Plastid Genomes ◽

Tree Inference ◽

The Impact

Abstract. Premise: The genus Gynoxys and relatives form a species-rich lineage of Andean shrubs and trees with low genetic distances within the sunflower subtribe Tussilaginineae. Previous molecular phylogenetic investigations of the Tussilaginineae have included few, if any, representatives of this Gynoxoid group or reconstructed ambiguous patterns of relationships for it. Methods: We sequenced complete plastid genomes of 21 species of the Gynoxoid group and related Tussilaginineae and conducted detailed comparisons of the phylogenetic relationships supported by the gene, intron, and intergenic spacer partitions of these genomes. We also evaluated the impact of manual, motif-based adjustments of automatic DNA sequence alignments on phylogenetic tree inference. Results: Our results indicate that the inclusion of all plastid genome partitions is needed to infer fully resolved phylogenetic trees of the Gynoxoid group. Whole plastome-based tree inference suggests that the genera Gynoxys and Nordenstamia are polyphyletic and form the core clade of the Gynoxoid group. This clade is sister to a clade of Aequatorium and Paragynoxys and also includes some but not all representatives of Paracalia. Conclusions: The concatenation and combined analysis of all plastid genome partitions and the construction of manually curated, motif-based DNA sequence alignments are found to be instrumental in the recovery of strongly supported relationships of the Gynoxoid group. We demonstrate that the correct assessment of homology in genome-level plastid sequence datasets is crucial for subsequent phylogeny reconstruction and that the manual post-processing of multiple sequence alignments improves the reliability of such reconstructions amid low genetic distances between taxa.

A novel sequence alignment algorithm based on deep learning of the protein folding code

Bioinformatics ◽

10.1093/bioinformatics/btaa810 ◽

2020 ◽

Cited By ~ 1

Author(s):

Mu Gao ◽

Jeffrey Skolnick

Keyword(s):

Protein Folding ◽

Deep Learning ◽

Sequence Alignment ◽

Protein Sequence ◽

Protein Structures ◽

Supplementary Information ◽

Alignment Algorithm ◽

Sequence Alignments ◽

Alignment Algorithms ◽

Structural Alignments

Abstract. Motivation: From evolutionary inference and function annotation to structural prediction, protein sequence comparison has provided crucial biological insights. While many sequence alignment algorithms have been developed, existing approaches often cannot detect hidden structural relationships in the ‘twilight zone’ of low sequence identity. To address this critical problem, we introduce a computational algorithm that performs protein Sequence Alignments from deep-Learning of Structural Alignments (SAdLSA, silent ‘d’). The key idea is to implicitly learn the protein folding code from many thousands of structural alignments using experimentally determined protein structures. Results: To demonstrate that the folding code was learned, we first show that SAdLSA trained on pure α-helical proteins successfully recognizes pairs of structurally related pure β-sheet protein domains. Subsequent training and benchmarking on larger, highly challenging datasets show significant improvement over established approaches. For challenging cases, SAdLSA is ∼150% better than HHsearch for generating pairwise alignments and ∼50% better for identifying the proteins with the best alignments in a sequence library. The time complexity of SAdLSA is O(N) thanks to GPU acceleration. Availability and implementation: Datasets and source code of SAdLSA are available free of charge for academic users at http://sites.gatech.edu/cssb/sadlsa/. Supplementary information: Supplementary data are available at Bioinformatics online.

Are all global alignment algorithms and implementations correct?

10.1101/031500 ◽

2015 ◽

Cited By ~ 3

Author(s):

Tomáš Flouri ◽

Kassian Kobert ◽

Torbjørn Rognes ◽

Alexandros Stamatakis

Keyword(s):

Dynamic Programming ◽

Phylogenetic Analyses ◽

Dynamic Programming Algorithm ◽

Programming Algorithm ◽

Global Alignment ◽

Sequence Alignments ◽

Optimal Sequence ◽

Divergence Time Estimates ◽

Alignment Algorithms ◽

The Impact

While implementing the algorithm, we discovered two mathematical mistakes in Gotoh's paper that induce sub-optimal sequence alignments. First, there are minor indexing mistakes in the dynamic programming algorithm, which become apparent immediately when implementing the procedure; we report these for the sake of completeness. Second, there is a more profound problem with the initialization of the dynamic programming matrix. This initialization issue can easily be missed and find its way into actual implementations. The error is also present in standard textbooks, namely the widely used books by Gusfield and Waterman. To obtain an initial estimate of the extent to which this error has been propagated, we scrutinized freely available undergraduate lecture slides. We found that 8 out of 31 lecture slides contained the mistake, while 16 out of 31 simply omit parts of the initialization, thus giving an incomplete description of the algorithm. Finally, by inspecting ten source codes and running respective tests, we found that five implementations were incorrect. Note that not all bugs we identified are due to the mistake in Gotoh's paper. Three implementations rely on additional constraints that limit generality. Thus, only two out of ten yield correct results. We show that the error introduced by Gotoh is straightforward to resolve and provide a correct open-source reference implementation. We believe, though, that raising awareness of these errors is critical, since incorrect pairwise sequence alignments, which typically represent one of the very first stages in any bioinformatics data analysis pipeline, can have a detrimental impact on downstream analyses such as multiple sequence alignment, orthology assignment, phylogenetic analyses, and divergence time estimates.
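To make the initialization issue concrete, the following is a minimal sketch of a three-matrix affine-gap global alignment score in Python. It is not the authors' reference implementation. The function name, the scoring parameters, and the convention that a gap of length k scores `gap_open + k * gap_extend` are all assumptions chosen for illustration. The point it shows is that border cells which cannot be reached in a given state are set to negative infinity rather than to zero or a finite placeholder.

```python
import math

NEG_INF = -math.inf


def affine_global_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Optimal global alignment score with affine gaps (illustrative sketch).

    Assumed convention: a gap of length k scores gap_open + k * gap_extend.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]
    # X: a[i-1] aligned to a gap (gap in b)
    # Y: b[j-1] aligned to a gap (gap in a)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    # Initialization: only states that are actually reachable on the border
    # get finite values; everything else stays at negative infinity.
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend

    open_cost = gap_open + gap_extend  # score of the first gap position
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] + open_cost,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + open_cost)
            Y[i][j] = max(M[i][j - 1] + open_cost,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + open_cost)
    return max(M[n][m], X[n][m], Y[n][m])
```

With these assumed parameters, aligning `"ACGT"` to `"AT"` is best served by two matches around a single gap of length two, and a finite or zero value in the unreachable border cells would let the recurrence pick up gap scores that no real alignment can achieve.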

Molecular dating for phylogenies containing a mix of populations and species

10.1101/536656 ◽

2019 ◽

Cited By ~ 1

Author(s):

Beatriz Mello ◽

Qiqing Tao ◽

Sudhir Kumar

Keyword(s):

Bayesian Approach ◽

Phylogenetic Trees ◽

Molecular Dating ◽

Rate Variation ◽

Species Variation ◽

Sequence Alignments ◽

Clock Model ◽

Divergence Time Estimates ◽

The Impact ◽

The Bayesian Approach

Abstract. Concurrent molecular dating of population and species divergences is essential in many biological investigations, including phylogeography, phylodynamics, and species delimitation studies. Multiple sequence alignments used in these investigations frequently consist of both intra- and inter-species samples (mixed samples). As a result, the phylogenetic trees contain inter-species, inter-population, and within-population divergences. To date these sequence divergences, Bayesian relaxed clock methods are often employed, but they assume the same tree prior for both inter- and intra-species branching processes and require specification of a clock model for branch rates (independent vs. autocorrelated rates models). We evaluated the impact of using the same tree prior on the Bayesian divergence time estimates by analyzing computer-simulated datasets. We also examined the effect of the assumption of independence of evolutionary rate variation among branches when the branch rates are autocorrelated. The Bayesian approach with Skyline-coalescent tree priors generally produced excellent molecular dates, with some tree priors (e.g., Yule) performing the best when evolutionary rates were autocorrelated and lineage sorting was incomplete. We compared the performance of the Bayesian approach with a non-Bayesian method, RelTime, which does not require specification of a tree prior or selection of a clock model. We found that RelTime performed as well as the Bayesian approach, and when the clock model was mis-specified, RelTime performed slightly better. These results suggest that the computationally efficient RelTime approach is also suitable for analyzing datasets containing both population and species variation.

Fast gap-affine pairwise alignment using the wavefront algorithm

Bioinformatics ◽

10.1093/bioinformatics/btaa777 ◽

2020 ◽

Author(s):

Santiago Marco-Sola ◽

Juan Carlos Moure ◽

Miquel Moreto ◽

Antonio Espinosa

Keyword(s):

Pairwise Alignment ◽

Read Length ◽

Alignment Algorithm ◽

Data Dependencies ◽

Sequencing Technologies ◽

Alignment Algorithms ◽

Oxford Nanopore ◽

Modern Molecular Biology ◽

Alignment Process ◽

Programming Algorithms

Abstract. Motivation: Pairwise alignment of sequences is a fundamental method in modern molecular biology, implemented within multiple bioinformatics tools and libraries. Current advances in sequencing technologies press for the development of faster pairwise alignment algorithms that can scale with increasing read lengths and production yields. Results: In this paper, we present the wavefront alignment algorithm (WFA), an exact gap-affine algorithm that takes advantage of homologous regions between the sequences to accelerate the alignment process. As opposed to traditional dynamic programming algorithms that run in quadratic time, the WFA runs in time O(ns), proportional to the read length n and the alignment score s, using O(s^2) memory. Furthermore, our algorithm exhibits simple data dependencies that can be easily vectorized, even by the automatic features of modern compilers, for different architectures, without the need to adapt the code. We evaluate the performance of our algorithm, together with other state-of-the-art implementations. As a result, we demonstrate that the WFA runs 20-300x faster than other methods aligning short Illumina-like sequences, and 10-100x faster using long noisy reads like those produced by Oxford Nanopore Technologies. Availability: The WFA algorithm is implemented within the wavefront-aligner library, and it is publicly available at https://github.com/smarco/WFA
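The wavefront idea can be illustrated with a much simpler scoring scheme than the one in the paper. The sketch below computes plain unit-cost edit distance, not the gap-affine WFA, and is not taken from the wavefront-aligner library. The function name and data layout are assumptions made for illustration. For each score s it stores, per diagonal, only the furthest position reached, and it extends along runs of matching characters for free, which is where the work proportional to n times s comes from.

```python
def wavefront_edit_distance(a, b):
    """Unit-cost edit distance computed with furthest-reaching wavefronts.

    Illustrative sketch only: diagonal k is defined as j - i, and wf[k]
    holds the largest i (position in a) reachable on that diagonal with
    the current score.
    """
    n, m = len(a), len(b)

    def extend(i, k):
        # Slide along diagonal k while characters match (no cost).
        j = i + k
        while i < n and j < m and a[i] == b[j]:
            i += 1
            j += 1
        return i

    final_diag = m - n
    wf = {0: extend(0, 0)}
    score = 0
    while wf.get(final_diag) != n:
        nxt = {}
        for k in range(min(wf) - 1, max(wf) + 2):
            candidates = []
            if k in wf:                      # mismatch: stay on diagonal k
                i = wf[k] + 1
                if i <= n and i + k <= m:
                    candidates.append(i)
            if k - 1 in wf:                  # consume one character of b
                i = wf[k - 1]
                if i + k <= m:
                    candidates.append(i)
            if k + 1 in wf:                  # consume one character of a
                i = wf[k + 1] + 1
                if i <= n:
                    candidates.append(i)
            if candidates:
                nxt[k] = extend(max(candidates), k)
        wf = nxt
        score += 1
    return score
```

For two similar sequences the loop runs only a few times, and each wavefront touches a number of diagonals that grows with the score rather than with the full dynamic programming matrix.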

A new robust quaternion-based initial alignment algorithm for stationary strapdown inertial navigation systems

Proceedings of the Institution of Mechanical Engineers Part G Journal of Aerospace Engineering ◽

10.1177/0954410020920473 ◽

2020 ◽

Vol 234 (12) ◽

pp. 1913-1925

Author(s):

Habib Ghanbarpourasl

Keyword(s):

Kalman Filter ◽

Inertial Navigation ◽

Bias Error ◽

Alignment Algorithm ◽

Navigation Systems ◽

Initial Alignment ◽

Unknown Parameters ◽

Alignment Algorithms ◽

Alignment Process ◽

Strapdown Inertial Navigation

A new robust quaternion Kalman filter is developed for accurate alignment of a stationary strapdown inertial navigation system. Most fine alignment algorithms have tried to estimate the biases of gyroscopes and accelerometers to reduce the errors of the alignment process. In stationary platforms, because the sensor inputs are fixed, the sum of various errors such as fixed bias, misalignment, scale factor, and nonlinear errors acts like a single bias error, so identifying each error individually is impossible. The observability of the gyroscopes’ and accelerometers’ biases has also been studied, and it is now known that not all of these unknown parameters are observable. This problem increases the complexity of the alignment algorithm. The accelerometers’ errors mainly affect the errors of the roll and pitch angles, but a large portion of the heading error results from the gyroscopes’ errors. Modeling all errors as additional states without considering their observability brings no benefit and increases the filter’s dimension, so the filter’s performance decreases. In this study, because of the observability problem, a new robust multiplicative quaternion Kalman filter is designed for the alignment of a stationary platform. The presented algorithm does not estimate the sensors’ errors, but it is robust to uncertainty in the sensors’ errors. In the proposed scheme, the bounds of the parameters’ errors are introduced to the filter, and the filter tries to remain robust with respect to these uncertainties. The method uses the benefits of quaternions in attitude modeling, and the robust filter is adapted to work with quaternions. The ability of the new algorithm is evaluated with MATLAB simulations. The outcomes show that the presented algorithm is more accurate than other traditional methods. The extended Kalman filter with accelerometers’ outputs and the horizontal velocities as the measurement equations and the additive quaternion Kalman filter are used for comparisons.
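As a rough illustration of what "multiplicative" means for a quaternion filter, the sketch below applies a small rotation correction to an attitude quaternion by quaternion multiplication and renormalization, instead of adding a correction component-wise. It is not the filter described in the abstract. The function names, the scalar-first (w, x, y, z) layout, and the choice to apply the correction on the left are assumptions, and conventions differ between references.

```python
import math


def quat_mul(p, q):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )


def normalize(q):
    """Scale a quaternion to unit norm."""
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def apply_small_rotation(q, dtheta):
    """Multiplicative update: q_new = dq * q, with dq built from a small
    rotation vector dtheta (radians) as dq ~ (1, dtheta / 2), renormalized.
    """
    dq = normalize((1.0, dtheta[0] / 2.0, dtheta[1] / 2.0, dtheta[2] / 2.0))
    return normalize(quat_mul(dq, q))
```

The design reason usually given for this form is that the product of two unit quaternions is again a unit quaternion, so the estimate stays a valid attitude, whereas an additive correction does not preserve the unit-norm constraint.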

Qudaich: A smart sequence aligner

10.1101/060509 ◽

2016 ◽

Author(s):

Sajia Akhter ◽

Robert A Edwards

Keyword(s):

Sequence Alignment ◽

DNA Sequences ◽

High Throughput Sequencing ◽

Query Sequence ◽

Metagenomic Data ◽

Alignment Algorithm ◽

Next Generation ◽

Sequence Alignments ◽

Alignment Algorithms ◽

Local Sequence

Abstract. Next generation sequencing (NGS) technology produces massive amounts of data in a reasonable time and at low cost. Analyzing and annotating these data requires sequence alignments to compare them with genes, proteins and genomes in different databases. Sequence alignment is the first step in metagenomics analysis, and pairwise comparisons of sequence reads provide a measure of similarity between environments. Most of the current aligners focus on aligning NGS datasets against long reference sequences rather than comparing between datasets. As the number of metagenomes and other genomic data increases each year, there is a demand for more sophisticated, faster sequence alignment algorithms. Here, we introduce a novel sequence aligner, Qudaich, which can efficiently process large volumes of data and is suited to de novo comparisons of next generation read datasets. Qudaich can handle both DNA and protein sequences and attempts to provide the best possible alignment for each query sequence. Qudaich can produce more useful alignments more quickly than other contemporary alignment algorithms.

Author Summary. Recent developments in sequencing technology provide high-throughput sequencing data and have resulted in large volumes of genomic and metagenomic data being available in public databases. Sequence alignment is an important step in annotating these data. Many sequence aligners have been developed in the last few years for efficient analysis of these data; however, most of them are only able to align DNA sequences and mainly focus on aligning NGS data against long reference genomes. Therefore, in this study we have designed a new sequence aligner, Qudaich, which can generate pairwise local sequence alignments (at both the DNA and protein level) between two NGS datasets and can efficiently handle large volumes of NGS data. In Qudaich, we introduce a unique sequence alignment algorithm, which outperforms the traditional approaches. Qudaich not only takes less time to execute, but also finds more useful alignments than contemporary aligners.

Hubsm: A Novel Amino Acid Substitution Matrix for Comparing Hub Proteins

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i8.53 ◽

2017 ◽

Vol 7 (8) ◽

pp. 212

Author(s):

Renganayaki G. ◽

Achuthsankar S. Nair

Keyword(s):

Amino Acid ◽

Amino Acid Substitution ◽

Low Complexity ◽

Database Search ◽

Substitution Matrix ◽

Compositional Bias ◽

Sequence Alignments ◽

Amino Acid Substitution Matrix ◽

Alignment Algorithms ◽

Hub Proteins

Sequence alignment algorithms and database search methods use BLOSUM and PAM substitution matrices constructed from general proteins. These de facto matrices are not optimal for accurately aligning sequences of proteins with markedly different amino acid compositional bias. In this work, a new amino acid substitution matrix is calculated for the disorder-rich and low-complexity-rich regions of Hub proteins, based on residue characteristics. Insights into the amino acid background frequencies and the substitution scores obtained from Hubsm unveil residue substitution patterns which differ from those of commonly used scoring matrices. When comparing Hub protein sequences to detect homologs, the use of the Hubsm matrix yields better results than the PAM and BLOSUM matrices. The Hubsm matrix can be optimal in database search and for the construction of more accurate sequence alignments of Hub proteins.
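The abstract does not spell out how Hubsm scores are derived from the frequencies it mentions. For orientation only, the sketch below shows the generic BLOSUM-style log-odds calculation that relates background frequencies and observed pair frequencies to integer substitution scores. It is an assumption that a similar form is relevant here. The function name, the two-letter toy alphabet, and the half-bit scaling are hypothetical choices for illustration.

```python
import math


def log_odds_scores(pair_freq, background, scale=2.0):
    """BLOSUM-style log-odds scores (illustrative sketch).

    pair_freq maps unordered residue pairs (a, b) to observed frequencies;
    background maps each residue to its background frequency.
    score = round(scale * log2(observed / expected)); scale=2 gives half-bits.
    """
    scores = {}
    for (a, b), observed in pair_freq.items():
        expected = background[a] * background[b]
        if a != b:
            expected *= 2.0  # an unordered pair (a, b) can arise in two orders
        scores[(a, b)] = round(scale * math.log2(observed / expected))
    return scores


# Toy two-letter alphabet; the frequencies are made up for illustration.
toy_background = {"A": 0.5, "B": 0.5}
toy_pairs = {("A", "A"): 0.4, ("A", "B"): 0.2, ("B", "B"): 0.4}
toy_scores = log_odds_scores(toy_pairs, toy_background)
```

A positive score means the pair is observed more often than the background composition alone would predict, which is why a matrix built from proteins with a different compositional bias can assign noticeably different scores.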

Faculty Opinions recommendation of Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1163036.623685 ◽

2009 ◽

Author(s):

Oliver Pybus

Keyword(s):

Phylogenetic Trees ◽

Large Scale ◽

Sequence Alignments

An Improved Initial Alignment Method of Strap-Down Inertial Navigation System on a Swaying Base

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.415.143 ◽

2013 ◽

Vol 415 ◽

pp. 143-148

Author(s):

Li Hua Zhu ◽

Xiang Hong Cheng

Keyword(s):

Navigation System ◽

Inertial Navigation System ◽

Alignment Algorithm ◽

Alignment Method ◽

Initial Alignment ◽

Vehicle Simulation ◽

Robust Property ◽

Simulation Results ◽

Lever Arm Effect ◽

The Impact

The design of an improved alignment method for SINS on a swaying base is presented in this paper. An FIR filter is used to reduce the impact of the lever arm effect. The system also includes online estimation of the gyroscopes’ drift with a Kalman filter for compensation, and an inertial freezing alignment algorithm, chosen for its speed and robustness, which resolves the attitude matrix to provide the mathematical platform for the vehicle. Simulation results show that the proposed method is efficient for the initial alignment of the swaying base navigation system.