Alignment-Integrated Reconstruction of Ancestral Sequences Improves Accuracy

2020 ◽  
Vol 12 (9) ◽  
pp. 1549-1565
Author(s):  
Kelsey Aadland ◽  
Bryan Kolaczkowski

Abstract Ancestral sequence reconstruction (ASR) uses an alignment of extant protein sequences, a phylogeny describing the history of the protein family and a model of the molecular-evolutionary process to infer the sequences of ancient proteins, allowing researchers to directly investigate the impact of sequence evolution on protein structure and function. Like all statistical inferences, ASR can be sensitive to violations of its underlying assumptions. Previous studies have shown that, whereas phylogenetic uncertainty has only a very weak impact on ASR accuracy, uncertainty in the protein sequence alignment can more strongly affect inferred ancestral sequences. Here, we show that errors in sequence alignment can produce errors in ASR across a range of realistic and simplified evolutionary scenarios. Importantly, sequence reconstruction errors can lead to errors in estimates of structural and functional properties of ancestral proteins, potentially undermining the reliability of analyses relying on ASR. We introduce an alignment-integrated ASR approach that combines information from many different sequence alignments. We show that integrating alignment uncertainty improves ASR accuracy and the accuracy of downstream structural and functional inferences, often performing as well as highly accurate structure-guided alignment. Given the growing evidence that sequence alignment errors can impact the reliability of ASR studies, we recommend that future studies incorporate approaches to mitigate the impact of alignment uncertainty. Probabilistic modeling of insertion and deletion events has the potential to radically improve ASR accuracy when the model reflects the true underlying evolutionary history, but further studies are required to thoroughly evaluate the reliability of these approaches under realistic conditions.


2008 ◽  
Vol 363 (1512) ◽  
pp. 4041-4047 ◽  
Author(s):  
Steffen Klaere ◽  
Tanja Gesell ◽  
Arndt von Haeseler

We introduce another view of sequence evolution. Contrary to other approaches, we model the substitution process in two steps. First, we assume (arbitrary) scaled branch lengths on a given phylogenetic tree. Second, we allocate a Poisson-distributed number of substitutions on the branches. The probability of placing a mutation on a branch is proportional to its relative branch length. More importantly, the action of a single mutation on an alignment column is described by a doubly stochastic matrix, the so-called one-step mutation matrix. This matrix leads to analytical formulae for the posterior probability distribution of the number of substitutions for an alignment column.
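
The two-step allocation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Poisson mean parameter `mean`, and the example branch lengths are assumptions made for illustration, and the one-step mutation matrix is not modeled here.

```python
import math
import random
from collections import Counter


def sample_poisson(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method (suited to small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def allocate_substitutions(branch_lengths, mean, rng):
    """Place a Poisson-distributed number of substitutions on branches.

    branch_lengths: dict mapping a (hypothetical) branch name to its scaled length.
    mean: assumed Poisson mean for the total number of substitutions.
    Each substitution lands on a branch with probability equal to that
    branch's length divided by the total tree length.
    """
    names = list(branch_lengths)
    total = sum(branch_lengths.values())
    weights = [branch_lengths[name] / total for name in names]
    n_substitutions = sample_poisson(mean, rng)
    counts = Counter({name: 0 for name in names})
    for name in rng.choices(names, weights=weights, k=n_substitutions):
        counts[name] += 1
    return dict(counts)


if __name__ == "__main__":
    rng = random.Random(1)
    # Illustrative branch lengths, not taken from the paper.
    example_tree = {"a": 0.1, "b": 0.3, "c": 0.6}
    print(allocate_substitutions(example_tree, mean=4.0, rng=rng))
```

Over many draws, the number of substitutions on each branch is expected to scale with its relative length, which is the proportionality the abstract states.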


2021 ◽  
Author(s):  
Robert M. Hubley ◽  
Travis J. Wheeler ◽  
Arian F.A. Smit

The construction of a high-quality multiple sequence alignment (MSA) from copies of a transposable element (TE) is a critical step in the characterization of a new TE family. Most studies of MSA accuracy have been conducted on protein or RNA sequence families where structural features and strong signals of selection may assist with alignment. Less attention has been given to the quality of sequence alignments involving neutrally evolving DNA sequences such as those resulting from TE replication. Such alignments play an important role in understanding and representing TE family history. Transposable element sequences are challenging to align due to their wide divergence ranges, fragmentation, and predominantly neutral mutation patterns. To gain insight into the effects of these properties on MSA accuracy, we developed a simulator of TE sequence evolution and used it to generate a benchmark with which we evaluated the MSA predictions produced by several popular aligners, along with Refiner, a method we developed in the context of our RepeatModeler software. We find that MAFFT and Refiner generally outperform other aligners for low- to medium-divergence simulated sequences, while Refiner is uniquely effective when tasked with aligning highly divergent and fragmented instances of a family. As a result, consensus sequences derived from Refiner-based MSAs are more similar to the true consensus.


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i884-i894
Author(s):  
Jose Barba-Montoya ◽  
Qiqing Tao ◽  
Sudhir Kumar

Abstract Motivation: As the number and diversity of species and genes grow in contemporary datasets, two common assumptions made in all molecular dating methods, namely the time-reversibility and stationarity of the substitution process, become untenable. No software tools for molecular dating allow researchers to relax these two assumptions in their data analyses. Frequently the same General Time Reversible (GTR) model across lineages along with a gamma (+Γ) distributed rates across sites is used in relaxed clock analyses, which assumes time-reversibility and stationarity of the substitution process. Many reports have quantified the impact of violations of these underlying assumptions on molecular phylogeny, but none have systematically analyzed their impact on divergence time estimates. Results: We quantified the bias on time estimates that resulted from using the GTR + Γ model for the analysis of computer-simulated nucleotide sequence alignments that were evolved with non-stationary (NS) and non-reversible (NR) substitution models. We tested Bayesian and RelTime approaches that do not require a molecular clock for estimating divergence times. Divergence times obtained using a GTR + Γ model differed only slightly (∼3% on average) from the expected times for NR datasets, but the difference was larger for NS datasets (∼10% on average). The use of only a few calibrations reduced these biases considerably (∼5%). Confidence and credibility intervals from GTR + Γ analysis usually contained correct times. Therefore, the bias introduced by the use of the GTR + Γ model to analyze datasets, in which the time-reversibility and stationarity assumptions are violated, is likely not large and can be reduced by applying multiple calibrations. Availability and implementation: All datasets are deposited in Figshare: https://doi.org/10.6084/m9.figshare.12594638.


2015 ◽  
Vol 28 (1) ◽  
pp. 46 ◽  
Author(s):  
David A. Morrison ◽  
Matthew J. Morgan ◽  
Scot A. Kelchner

Sequence alignment is just as much a part of phylogenetics as is tree building, although it is often viewed solely as a necessary tool to construct trees. However, alignment for the purpose of phylogenetic inference is primarily about homology, as it is the procedure that expresses homology relationships among the characters, rather than the historical relationships of the taxa. Molecular homology is rather vaguely defined and understood, despite its importance in the molecular age. Indeed, homology has rarely been evaluated with respect to nucleotide sequence alignments, in spite of the fact that nucleotides are the only data that directly represent genotype. All other molecular data represent phenotype, just as do morphology and anatomy. Thus, efforts to improve sequence alignment for phylogenetic purposes should involve a more refined use of the homology concept at a molecular level. To this end, we present examples of molecular-data levels at which homology might be considered, and arrange them in a hierarchy. The concept that we propose has many levels, which link directly to the developmental and morphological components of homology. Of note, there is no simple relationship between gene homology and nucleotide homology. We also propose terminology with which to better describe and discuss molecular homology at these levels. Our over-arching conceptual framework is then used to shed light on the multitude of automated procedures that have been created for multiple-sequence alignment. Sequence alignment needs to be based on aligning homologous nucleotides, without necessary reference to homology at any other level of the hierarchy. In particular, inference of nucleotide homology involves deriving a plausible scenario for molecular change among the set of sequences. 
Our clarifications should allow the development of a procedure that specifically addresses homology, which is required when performing alignment for phylogenetic purposes, but which does not yet exist.


Paleobiology ◽  
1994 ◽  
Vol 20 (3) ◽  
pp. 362-367 ◽  
Author(s):  
William I. Ausich ◽  
David L. Meyer

Potential hybrid fossil crinoids, Eretmocrinus magnificus x Eretmocrinus praegravis, are identified from the Lower Mississippian Fort Payne Formation of south-central Kentucky. These are the first fossil hybrid crinoids identified, and one of very few examples of hybrids recognized in the fossil record. Eretmocrinus magnificus x E. praegravis specimens have shapes and calyx plate sculpturing that are morphologically intermediate between well-defined, distinct parent species. Suspected hybrids occur at localities where parent species co-occur and where the parent species are the most abundant; the hybrids occur at what may have been the distributional margins of the parent species; and the mixture of characters on suspected hybrids seems to be morphogenetically partitioned. Parent species are derived from separate lineages within Eretmocrinus, and hybridization is the most probable explanation for these morphologically intermediate specimens. This example highlights the need to consider hybridization as a potential interpretation of intermediate morphologies among fossils and raises questions concerning the impact of hybridization for our interpretation of the fossil record and the role of hybridization in the evolutionary process.


2018 ◽  
Author(s):  
Michael Nute ◽  
Ehsan Saleh ◽  
Tandy Warnow

Abstract The estimation of multiple sequence alignments of protein sequences is a basic step in many bioinformatics pipelines, including protein structure prediction, protein family identification, and phylogeny estimation. Statistical co-estimation of alignments and trees under stochastic models of sequence evolution has long been considered the most rigorous technique for estimating alignments and trees, but little is known about the accuracy of such methods on biological benchmarks. We report the results of an extensive study evaluating the most popular protein alignment methods as well as the statistical co-estimation method BAli-Phy on 1192 protein data sets from established benchmarks as well as on 120 simulated data sets. Our study (which used more than 230 CPU years for the BAli-Phy analyses alone) shows that BAli-Phy is dramatically more accurate than the other alignment methods on the simulated data sets, but is among the least accurate on the biological benchmarks. There are several potential causes for this discordance, including model misspecification, errors in the reference alignments, and conflicts between structural alignment and evolutionary alignments; future research is needed to understand the most likely explanation for our observations. Keywords: multiple sequence alignment, BAli-Phy, protein sequences, structural alignment, homology


Author(s):  
Nayana

Hierarchical integrated energy systems (HIESs), which consolidate energy consumers and producers, often form coalitions, and their evolutionary process is driven by the benefits of stakeholders. Several studies have failed to analyze the operation of an HIES under the impact of multiple coalitions. Here, HIES operation with a trading scheme is analyzed using a three-level structure: multiple users at the lower level, multiple distributed energy stations (DESs) at the middle level, and one natural gas utility company and one electricity utility company at the upper level. The Lagrange function is used to derive an analytical function, based on the optimal operation strategy, for each probable coalition and each market participant (users and DESs). The results show that, within a single coalition, technological enhancements that increase the profit of one DES decrease the profits of the other DESs. Users show an aversion towards DESs with a high generation coefficient and are attracted to those that enable a reduction in heat and electricity prices. At the same energy price, DESs with high heat and electricity consumption prefer to remain isolated. A change in the parameters of one coalition does not affect other coalitions or their operation.


Author(s):  
Sharon Smaldino ◽  
Lara Luetkehans

All educational endeavors in higher education have a transformative element that advances academic program development. Teacher education is no exception to this aspect of the evolutionary process. The authors' story of that transformation and the impact of creative endeavors in teacher education offer a sense of moving beyond the traditional to the transformative in teacher education. Carter (1993) suggests that story can offer a perspective on our work and inform teacher education on the directions we might take to bring about improvement in our efforts to prepare educators for the future. The authors' story begins with a strong foundation and commitment to understanding the critical elements of successful partnerships. This foundation has served them for 15 years, across two distinct eras of partnership work that delineate the transformation. The authors explore each era: “The Professional Development School (PDS) Story” followed by “10 Years Later.”

