Co-estimation of Phylogeny-aware Alignment and Phylogenetic Tree

2016
Author(s): Chunxiang Li, Alan Medlar, Ari Löytynoja

Abstract
The phylogeny-aware alignment algorithm implemented in both PRANK and PAGAN has been found to produce highly accurate alignments for comparative sequence analysis. However, the algorithm's reliance on a guide tree during the alignment process can bias the resulting alignment, rendering it unsuitable for phylogenetic inference. To overcome these issues, we have developed a new tool, Canopy, for a parallelized iterative search for the optimal alignment. Using Canopy, we studied the impact of the guide tree, as well as the number and relative divergence of sequences, on the accuracy of the alignment and the inferred phylogeny. We find that PAGAN is the more robust of the two phylogeny-aware alignment methods to errors in the guide tree, but Canopy largely resolves the guide-tree-related biases in PRANK. We demonstrate that, for all experimental settings tested, Canopy produces the most accurate sequence alignments and that the inferred phylogenetic trees are of comparable accuracy to those obtained with the leading alternative method, SATé. Our analyses also show that, unlike traditional alignment algorithms, the phylogeny-aware algorithm effectively uses the information from denser sequence sampling and produces more accurate alignments when additional closely related sequences are included. All methods are available for download at http://wasabiapp.org/software.


2021
Author(s): Belen Escobari, Thomas Borsch, Taylor S. Quedensley, Michael Gruenstaeudl

Abstract
Premise: The genus Gynoxys and relatives form a species-rich lineage of Andean shrubs and trees with low genetic distances within the sunflower subtribe Tussilaginineae. Previous molecular phylogenetic investigations of the Tussilaginineae have included few, if any, representatives of this Gynoxoid group or have reconstructed ambiguous patterns of relationships for it.
Methods: We sequenced complete plastid genomes of 21 species of the Gynoxoid group and related Tussilaginineae and conducted detailed comparisons of the phylogenetic relationships supported by the gene, intron, and intergenic spacer partitions of these genomes. We also evaluated the impact of manual, motif-based adjustments of automatic DNA sequence alignments on phylogenetic tree inference.
Results: Our results indicate that the inclusion of all plastid genome partitions is needed to infer fully resolved phylogenetic trees of the Gynoxoid group. Whole-plastome-based tree inference suggests that the genera Gynoxys and Nordenstamia are polyphyletic and form the core clade of the Gynoxoid group. This clade is sister to a clade of Aequatorium and Paragynoxys and also includes some, but not all, representatives of Paracalia.
Conclusions: The concatenation and combined analysis of all plastid genome partitions, together with the construction of manually curated, motif-based DNA sequence alignments, were found to be instrumental in recovering strongly supported relationships within the Gynoxoid group. We demonstrate that the correct assessment of homology in genome-level plastid sequence datasets is crucial for subsequent phylogeny reconstruction and that manual post-processing of multiple sequence alignments improves the reliability of such reconstructions when genetic distances between taxa are low.



Author(s): Mu Gao, Jeffrey Skolnick

Abstract
Motivation: From evolutionary inference and function annotation to structural prediction, protein sequence comparison has provided crucial biological insights. While many sequence alignment algorithms have been developed, existing approaches often cannot detect hidden structural relationships in the ‘twilight zone’ of low sequence identity. To address this critical problem, we introduce a computational algorithm that performs protein Sequence Alignments from deep-Learning of Structural Alignments (SAdLSA, silent ‘d’). The key idea is to implicitly learn the protein folding code from many thousands of structural alignments using experimentally determined protein structures.
Results: To demonstrate that the folding code was learned, we first show that SAdLSA trained on pure α-helical proteins successfully recognizes pairs of structurally related pure β-sheet protein domains. Subsequent training and benchmarking on larger, highly challenging datasets show significant improvement over established approaches. For challenging cases, SAdLSA is ∼150% better than HHsearch for generating pairwise alignments and ∼50% better for identifying the proteins with the best alignments in a sequence library. The time complexity of SAdLSA is O(N) thanks to GPU acceleration.
Availability and implementation: Datasets and source codes of SAdLSA are available free of charge for academic users at http://sites.gatech.edu/cssb/sadlsa/.
Supplementary information: Supplementary data are available at Bioinformatics online.



2015
Author(s): Tomáš Flouri, Kassian Kobert, Torbjørn Rognes, Alexandros Stamatakis

While implementing Gotoh's algorithm, we discovered two mathematical mistakes in Gotoh's paper that induce sub-optimal sequence alignments. First, there are minor indexing mistakes in the dynamic programming algorithm, which become apparent immediately when implementing the procedure; we nonetheless report them for the sake of completeness. Second, there is a more profound problem with the initialization of the dynamic programming matrices. This initialization issue can easily be missed and find its way into actual implementations. The error is also present in standard textbooks, namely the widely used books by Gusfield and Waterman. To obtain an initial estimate of the extent to which this error has propagated, we scrutinized freely available undergraduate lecture slides. Of 31 sets of lecture slides, 8 contained the mistake, while 16 simply omitted parts of the initialization, thus giving an incomplete description of the algorithm. Finally, by inspecting ten source code implementations and running tests on them, we found that five were incorrect. Note that not all bugs we identified are due to the mistake in Gotoh's paper. Three implementations rely on additional constraints that limit generality. Thus, only two out of ten yield correct results. We show that the error introduced by Gotoh is straightforward to resolve and provide a correct open-source reference implementation. We nevertheless believe that raising awareness of these errors is critical, since incorrect pairwise sequence alignments, which typically represent one of the very first stages of any bioinformatics data analysis pipeline, can have a detrimental impact on downstream analyses such as multiple sequence alignment, orthology assignment, phylogenetic analyses, and divergence time estimation.
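To make the role of the initialization concrete, here is a minimal Python sketch of Gotoh-style affine-gap global alignment scoring (score maximization, three matrices). It is our own illustration, not the authors' reference implementation, and it does not reproduce the exact erroneous initialization discussed in the paper; the scoring parameters are hypothetical defaults. A gap of length k scores `gap_open + (k - 1) * gap_extend`. The part that matters is the boundary: the gap matrix that consumes residues of `a` (`P`) stays at minus infinity along row 0, and the one that consumes residues of `b` (`Q`) stays at minus infinity along column 0, so no cell can "extend" a gap that was never opened.

```python
import math

NEG_INF = -math.inf


def gotoh_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap_open: int = -3, gap_extend: int = -1) -> int:
    """Global alignment score with affine gaps (Gotoh-style, maximization).

    A gap of length k scores gap_open + (k - 1) * gap_extend.
    Scoring parameters are illustrative defaults, not values from the paper.
    """
    n, m = len(a), len(b)
    # D[i][j]: best score of a[:i] vs b[:j] over all ending states.
    # P[i][j]: best score ending with a[i-1] aligned to a gap.
    # Q[i][j]: best score ending with b[j-1] aligned to a gap.
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    P = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Q = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    # Boundary rule: P is never assigned on row 0 and Q is never assigned on
    # column 0, so both stay -inf there and a gap must be opened from D.
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                P[i][j] = max(D[i - 1][j] + gap_open, P[i - 1][j] + gap_extend)
            if j > 0:
                Q[i][j] = max(D[i][j - 1] + gap_open, Q[i][j - 1] + gap_extend)
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                D[i][j] = max(D[i - 1][j - 1] + s, P[i][j], Q[i][j])
            elif i > 0 or j > 0:
                # First row or column: only gap states are possible.
                D[i][j] = max(P[i][j], Q[i][j])
    return D[n][m]
```

For example, `gotoh_score("AAAA", "AA")` returns -2: two matches plus a single length-2 gap (-3 + -1) beats two separate length-1 gaps.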



2019
Author(s): Beatriz Mello, Qiqing Tao, Sudhir Kumar

Abstract
Concurrent molecular dating of population and species divergences is essential in many biological investigations, including phylogeography, phylodynamics, and species delimitation studies. Multiple sequence alignments used in these investigations frequently consist of both intra- and inter-species samples (mixed samples). As a result, the phylogenetic trees contain inter-species, inter-population, and within-population divergences. To date these sequence divergences, Bayesian relaxed clock methods are often employed, but they assume the same tree prior for both inter- and intra-species branching processes and require specification of a clock model for branch rates (independent vs. autocorrelated rates models). We evaluated the impact of using the same tree prior on Bayesian divergence time estimates by analyzing computer-simulated datasets. We also examined the effect of assuming independent rate variation among branches when the branch rates are in fact autocorrelated. The Bayesian approach with Skyline-coalescent tree priors generally produced excellent molecular dates, with some tree priors (e.g., Yule) performing best when evolutionary rates were autocorrelated and lineage sorting was incomplete. We compared the performance of the Bayesian approach with that of a non-Bayesian method, RelTime, which does not require specification of a tree prior or selection of a clock model. We found that RelTime performed as well as the Bayesian approach and, when the clock model was mis-specified, slightly better. These results suggest that the computationally efficient RelTime approach is also suitable for analyzing datasets containing both population and species variation.



Author(s): Santiago Marco-Sola, Juan Carlos Moure, Miquel Moreto, Antonio Espinosa

Abstract
Motivation: Pairwise alignment of sequences is a fundamental method in modern molecular biology, implemented within multiple bioinformatics tools and libraries. Current advances in sequencing technologies demand faster pairwise alignment algorithms that can scale with increasing read lengths and production yields.
Results: In this paper, we present the wavefront alignment algorithm (WFA), an exact gap-affine algorithm that takes advantage of homologous regions between the sequences to accelerate the alignment process. Unlike traditional dynamic programming algorithms, which run in quadratic time, the WFA runs in O(ns) time, proportional to the read length n and the alignment score s, using O(s²) memory. Furthermore, our algorithm exhibits simple data dependencies that can be easily vectorized, even by the auto-vectorization features of modern compilers, for different architectures and without the need to adapt the code. We evaluate the performance of our algorithm against other state-of-the-art implementations and demonstrate that the WFA runs 20-300x faster than other methods when aligning short Illumina-like sequences, and 10-100x faster on long noisy reads like those produced by Oxford Nanopore Technologies.
Availability: The WFA algorithm is implemented within the wavefront-aligner library and is publicly available at https://github.com/smarco/WFA
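The core idea behind the WFA can be illustrated on the simpler unit-cost edit distance, where it reduces to the classic diagonal-transition scheme: for each score s, keep only the furthest-reaching offset on each diagonal, slide along exact matches at no cost, then derive the next wavefront from neighboring diagonals. The Python sketch below is a simplification under that assumption, not the gap-affine WFA from the paper and not the wavefront-aligner library API; it tracks one wavefront instead of the separate match, insertion, and deletion wavefronts the gap-affine version needs, and it returns only the distance, not the alignment.

```python
def wavefront_edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via furthest-reaching wavefronts (sketch).

    Diagonal k = j - i; the stored offset is j (position in b), so i = j - k.
    """
    n, m = len(a), len(b)
    target_k = m - n  # the diagonal that ends at cell (n, m)

    def extend(k: int, j: int) -> int:
        # Slide along diagonal k while characters match (cost-free step).
        i = j - k
        while i < n and j < m and a[i] == b[j]:
            i += 1
            j += 1
        return j

    wf = {0: extend(0, 0)}  # wavefront for score 0
    s = 0
    while wf.get(target_k) != m:
        s += 1
        new = {}
        for k in range(min(wf) - 1, max(wf) + 2):
            candidates = []
            if k in wf:          # mismatch: advance in both sequences
                candidates.append(wf[k] + 1)
            if k + 1 in wf:      # deletion: consume a[i], j unchanged
                candidates.append(wf[k + 1])
            if k - 1 in wf:      # insertion: consume b[j]
                candidates.append(wf[k - 1] + 1)
            # Keep only offsets that stay inside the n x m matrix.
            valid = [j for j in candidates if j <= m and 0 <= j - k <= n]
            if valid:
                new[k] = extend(k, max(valid))
        wf = new
    return s
```

Each iteration touches at most 2s + 1 diagonals, so the work grows with sequence length times score rather than with the full n × m matrix, which is the property the abstract attributes to the WFA.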



Author(s): Habib Ghanbarpourasl

A new robust quaternion Kalman filter is developed for accurate alignment of a stationary strapdown inertial navigation system. Most fine alignment algorithms have tried to estimate the biases of gyroscopes and accelerometers to reduce the errors of the alignment process. On stationary platforms, because the sensor inputs are fixed, the sum of the various errors (fixed bias, misalignment, scale factor, and nonlinear errors) acts like a single bias error, so identifying each error individually is impossible. The observability of gyroscope and accelerometer biases has also been studied, and it is now known that not all of these unknown parameters are observable. This problem can increase the complexity of the alignment algorithm. Accelerometer errors mainly affect the roll and pitch angle errors, whereas a large portion of the heading error results from gyroscope errors. Modeling all errors as additional states without considering observability brings no benefit; it only increases the filter's dimension and thereby degrades its performance. In this study, because of the observability problem, a new robust multiplicative quaternion Kalman filter is designed for the alignment of a stationary platform. The proposed algorithm does not estimate the sensor errors but is robust to uncertainty in them. In the proposed scheme, the bounds of the parameter errors are supplied to the filter, which is designed to remain robust with respect to these uncertainties. The method exploits the advantages of quaternions for attitude modeling, and the robust filter is adapted to work with quaternions. The performance of the new algorithm is evaluated with MATLAB simulations, which show that it is more accurate than other traditional methods. For comparison, an extended Kalman filter using the accelerometer outputs and the horizontal velocities as measurements and an additive quaternion Kalman filter are used.
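The abstract does not give the filter equations. In the usual sense of the term, the "multiplicative" part of a multiplicative quaternion filter refers to how the attitude estimate is corrected: the filter estimates a small three-component rotation error and applies it to the quaternion by quaternion multiplication followed by renormalization, instead of adding a correction to the four quaternion components as an additive filter does. The Python sketch below shows only that correction step under a small-angle assumption; the robust filtering, the error bounds, and the right-multiplication (body-frame error) convention are assumptions, not the author's design.

```python
import math


def quat_mul(p, q):
    """Hamilton product p ⊗ q for quaternions stored as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )


def normalize(q):
    """Scale a quaternion to unit norm so it stays a valid rotation."""
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def multiplicative_correction(q_est, dtheta):
    """Apply a small attitude-error estimate dtheta (radians, 3-vector).

    Small-angle assumption: the error quaternion is approximately
    (1, dtheta / 2). Right-multiplication (body-frame error) is an assumed
    convention; a navigation-frame error would multiply on the left.
    """
    dq = (1.0, 0.5 * dtheta[0], 0.5 * dtheta[1], 0.5 * dtheta[2])
    return normalize(quat_mul(q_est, dq))
```

Applying `multiplicative_correction((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.1))` yields a unit quaternion representing a rotation of about 0.1 rad about the z axis; because of renormalization the result always remains a valid attitude, which an additive update does not guarantee.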



2016
Author(s): Sajia Akhter, Robert A Edwards

Abstract
Next-generation sequencing (NGS) technology produces massive amounts of data in a reasonable time and at low cost. Analyzing and annotating these data requires sequence alignments to compare them with genes, proteins, and genomes in different databases. Sequence alignment is the first step in metagenomic analysis, and pairwise comparisons of sequence reads provide a measure of similarity between environments. Most current aligners focus on aligning NGS datasets against long reference sequences rather than comparing datasets with each other. As the volume of metagenomic and other genomic data grows each year, there is a demand for more sophisticated, faster sequence alignment algorithms. Here, we introduce a novel sequence aligner, Qudaich, which can efficiently process large volumes of data and is suited to de novo comparisons of next-generation read datasets. Qudaich can handle both DNA and protein sequences and attempts to provide the best possible alignment for each query sequence. Qudaich can produce more useful alignments more quickly than other contemporary alignment algorithms.
Author Summary
Recent developments in sequencing technology provide high-throughput sequencing data and have resulted in large volumes of genomic and metagenomic data available in public databases. Sequence alignment is an important step in annotating these data. Many sequence aligners have been developed in the last few years for efficient analysis of these data; however, most of them can only align DNA sequences and mainly focus on aligning NGS data against long reference genomes. Therefore, in this study we have designed a new sequence aligner, Qudaich, which can generate pairwise local sequence alignments (at both the DNA and protein level) between two NGS datasets and can efficiently handle large NGS datasets. In Qudaich, we introduce a unique sequence alignment algorithm that outperforms traditional approaches. Qudaich not only takes less time to execute but also finds more useful alignments than contemporary aligners.



Author(s): Renganayaki G., Achuthsankar S. Nair

Sequence alignment algorithms and database search methods use BLOSUM and PAM substitution matrices constructed from general proteins. These de facto standard matrices are not optimal for accurately aligning proteins whose amino acid compositional bias differs markedly from that of general proteins. In this work, a new amino acid substitution matrix, Hubsm, is calculated for the disorder- and low-complexity-rich regions of Hub proteins, based on residue characteristics. The amino acid background frequencies and substitution scores obtained for Hubsm reveal residue substitution patterns that differ from those of commonly used scoring matrices. When comparing Hub protein sequences to detect homologs, the Hubsm matrix yields better results than the PAM and BLOSUM matrices. Hubsm may therefore be the better choice for database searches and for constructing more accurate sequence alignments of Hub proteins.
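The abstract does not specify how Hubsm's scores are derived. As a point of reference, the sketch below computes BLOSUM-style log-odds scores from aligned residue pairs: observed pair frequencies q_ab, background frequencies p_a derived from them, and scores round(2 · log2(q_ab / e_ab)) in half-bit units, where e_ab = p_a² for identical residues and 2 · p_a · p_b otherwise. The input format (a list of alignment columns) and the function name are hypothetical, and BLOSUM's sequence-clustering step is omitted.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(columns, scale=2.0):
    """BLOSUM-style log-odds scores from alignment columns (sketch).

    columns: iterable of strings, each holding the residues in one column.
    Returns {(x, y): score} with x <= y; scale=2.0 gives half-bit units.
    """
    # Count every unordered residue pair within each column.
    pair_counts = Counter()
    for col in columns:
        for x, y in combinations(col, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())
    q = {pair: c / total for pair, c in pair_counts.items()}

    # Background frequency p_x = q_xx + (sum over y != x of q_xy) / 2.
    p = Counter()
    for (x, y), f in q.items():
        if x == y:
            p[x] += f
        else:
            p[x] += f / 2
            p[y] += f / 2

    # Expected pair frequency under independence, then rounded log-odds.
    scores = {}
    for (x, y), f in q.items():
        expected = p[x] * p[x] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(scale * math.log2(f / expected))
    return scores
```

For the toy input `["AA", "AA", "AB", "BB"]`, the sketch gives a positive score for A/A and B/B and a negative score for A/B; a compositionally biased training set, such as disorder-rich regions, would shift both the background frequencies and the resulting scores.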



2013
Vol 415, pp. 143-148
Author(s): Li Hua Zhu, Xiang Hong Cheng

This paper presents an improved alignment method for a strapdown inertial navigation system (SINS) on a swaying base. An FIR filter is used to reduce the disturbance caused by the lever-arm effect. The system also includes online estimation of gyroscope drift with a Kalman filter so that the drift can be compensated, and an inertial freezing alignment algorithm, chosen for its speed and robustness, which resolves the attitude matrix to provide the mathematical platform for the vehicle. Simulation results show that the proposed method is effective for the initial alignment of a navigation system on a swaying base.
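The FIR design used in the paper is not described in the abstract. The sketch below is a generic causal FIR filter, y[n] = sum over k of h[k] · x[n-k], with hypothetical equal-weight (moving-average) taps, shown only to illustrate how such a filter attenuates a periodic sway-induced disturbance superimposed on a constant signal. Real lever-arm compensation would need taps designed for the actual sway frequencies and sampling rate.

```python
def fir_filter(x, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n - k], zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += h * x[n - k]
        y.append(acc)
    return y


def moving_average_taps(length):
    """Hypothetical equal-weight taps; they sum to 1 so DC gain is 1."""
    return [1.0 / length] * length
```

For example, `fir_filter([1.5, 0.5, 1.5, 0.5], moving_average_taps(2))` returns `[0.75, 1.0, 1.0, 1.0]`: the two-sample oscillation around 1.0 is removed from the second output onward, and the first output is low only because of the zero initial state.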


