Perturbative formulation of general continuous-time Markov model of sequence evolution via insertions/deletions, Part I: Theoretical basis

2015 ◽  
Author(s):  
Kiyoshi Ezawa ◽  
Dan Graur ◽  
Giddy Landan

Abstract
Background: Insertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of sequence evolution through indel processes. Recent probabilistic models of this kind are mostly based on either hidden Markov models (HMMs) or transducer theories, both of which give the indel component of the probability of a given sequence alignment as a product of either probabilities of column-to-column transitions or block-wise contributions along the alignment. However, it is not a priori clear how these models are related to any genuine stochastic evolutionary model, which describes the stochastic evolution of an entire sequence along the time axis. Moreover, none of these models can fully accommodate biologically realistic features, such as overlapping indels, power-law indel-length distributions, and indel rate variation across regions.
Results: Here, we theoretically tackle the ab initio calculation of the probability of a given sequence alignment under a genuine evolutionary model, more specifically, a general continuous-time Markov model of the evolution of an entire sequence via insertions and deletions. Our model allows general indel rate parameters, including length distributions, and does not impose any unrealistic restrictions on indels. Using techniques of perturbation theory from physics, we expand the probability into a series over different numbers of indels. Our derivation of this perturbation expansion elegantly bridges the gap between Gillespie’s (1977) intuitive derivation of his own stochastic simulation method, which is now widely used in evolutionary simulators, and Feller’s (1940) mathematically rigorous theorems that underpin Gillespie’s method. We find a sufficient and nearly necessary set of conditions under which the probability can be expressed as the product of an overall factor and the contributions from regions separated by gapless columns of the alignment. The indel models satisfying these conditions include those with some kind of rate variation across regions, as well as space-homogeneous models. We also prove that, though with a caveat, pairwise probabilities calculated by the method of Miklós et al. (2004) are equivalent to those calculated by our ab initio formulation, at least under a space-homogeneous model.
Conclusions: Our ab initio perturbative formulation provides a firm theoretical ground that other indel models can rest on.
[This paper and three other papers (Ezawa, Graur and Landan 2015a,b,c) describe a series of our efforts to develop, apply, and extend the ab initio perturbative formulation of a general continuous-time Markov model of indels.]
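The Gillespie-style simulation that the abstract refers to can be illustrated with a toy sketch. This is not the authors' model or code: it tracks only the sequence length, and the function name, the per-site rates, the truncated power-law exponent, and the maximum indel length are all assumptions made for illustration.

```python
import random

def simulate_indel_lengths(length, ins_rate, del_rate, total_time,
                           max_indel=10, exponent=1.6, seed=0):
    """Toy Gillespie-style simulation of indels, tracking sequence length only.

    Assumptions (hypothetical, not from the paper): insertions occur at any of
    length + 1 positions, deletions start at any of the length sites, and indel
    sizes follow a power law truncated at max_indel.
    """
    rng = random.Random(seed)
    sizes = list(range(1, max_indel + 1))
    weights = [k ** -exponent for k in sizes]
    t = 0.0
    history = []
    while True:
        total_ins = ins_rate * (length + 1)
        total_del = del_rate * length
        total = total_ins + total_del
        if total <= 0.0:
            break  # no event can occur from this state
        # Waiting time to the next event is exponential with the total exit rate.
        t += rng.expovariate(total)
        if t > total_time:
            break
        size = rng.choices(sizes, weights=weights)[0]
        # Choose the event type in proportion to its share of the total rate.
        if rng.random() < total_ins / total:
            length += size
            history.append((t, "ins", size))
        else:
            size = min(size, length)  # a deletion cannot exceed the sequence
            length -= size
            history.append((t, "del", size))
    return length, history

final_length, events = simulate_indel_lengths(100, 0.01, 0.01, 5.0, seed=1)
```

Because the total exit rate depends on the current length, the waiting-time distribution changes after every event, which is the state dependence that a whole-sequence model has to account for.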


2015 ◽  
Author(s):  
Kiyoshi Ezawa ◽  
Dan Graur ◽  
Giddy Landan

Abstract
Background: Insertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of sequence evolution through indel processes. In a separate paper (Ezawa, Graur and Landan 2015a), we established a theoretical basis of our ab initio perturbative formulation of a genuine evolutionary model, more specifically, a continuous-time Markov model of the evolution of an entire sequence via insertions and deletions. We also showed that, under some conditions, the ab initio probability of an alignment can be factorized into the product of an overall factor and contributions from regions (or local alignments) separated by gapless columns.
Results: This paper describes how our ab initio perturbative formulation can be concretely used to approximately calculate the probabilities of all types of local pairwise alignments (PWAs) and some typical types of local multiple sequence alignments (MSAs). For each local alignment type, we calculated the fewest-indel contribution and the next-fewest-indel contribution to its probability, and we compared them under various conditions. We also derived a system of integral equations that can be numerically solved to give “exact solutions” for some common types of local PWAs, and we compared the obtained “exact solutions” with the fewest-indel contributions. The results indicated that even the fewest-indel terms alone can quite accurately approximate the probabilities of local alignments, as long as the segments and the branches in the tree are of modest lengths. Moreover, in the light of our formulation, we examined parameter regions where other indel models can safely approximate the correct evolutionary probabilities. The analyses also suggested some modifications necessary for these models to improve the accuracy of their probability estimations.
Conclusions: At least under modest conditions, our ab initio perturbative formulation can quite accurately calculate alignment probabilities under biologically realistic indel models. It also provides a sound reference point that other indel models can be compared to.
[This paper and three other papers (Ezawa, Graur and Landan 2015a,b,c) describe a series of our efforts to develop, apply, and extend the ab initio perturbative formulation of a general continuous-time Markov model of indels.]
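As a schematic illustration of what a fewest-indel term looks like, consider the simplest case in which a single indel turns the ancestral state into the descendant state. The notation here is assumed for illustration and is not taken from the paper: $s_0$ and $s_1$ are the sequence states before and after the indel, $r(s_0 \to s_1)$ is the rate of that indel, $R(s)$ is the total rate of leaving state $s$, and $t$ is the elapsed time. For a continuous-time Markov model, the probability that exactly this one event occurs in $[0, t]$ and nothing else does is:

```latex
P^{(1)}(s_0 \to s_1;\, t)
  = \int_0^t d\tau \; e^{-R(s_0)\,\tau} \; r(s_0 \to s_1) \; e^{-R(s_1)\,(t-\tau)}
  = r(s_0 \to s_1)\,\frac{e^{-R(s_1)\,t} - e^{-R(s_0)\,t}}{R(s_0) - R(s_1)},
  \qquad R(s_0) \neq R(s_1),
```

```latex
P^{(1)}(s_0 \to s_1;\, t) = r(s_0 \to s_1)\; t \; e^{-R\,t},
  \qquad R(s_0) = R(s_1) = R .
```

Terms with more events involve nested integrals of the same form, so for small $R\,t$ each additional indel adds a further small factor. This is consistent with the finding that the fewest-indel terms dominate when segments and branches are of modest length.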



2015 ◽  
Author(s):  
Kiyoshi Ezawa ◽  
Dan Graur ◽  
Giddy Landan

Abstract
Background: Insertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of sequence evolution through indel processes. In a separate paper (Ezawa, Graur and Landan 2015a), we established the theoretical basis of our ab initio perturbative formulation of a continuous-time Markov model of the evolution of an entire sequence via insertions and deletions along the time axis. In other separate papers (Ezawa, Graur and Landan 2015b,c), we also developed various analytical and computational methods to concretely calculate alignment probabilities via our formulation. In terms of frequencies, however, substitutions are usually more common than indels. Moreover, many experiments suggest that other mutations, such as genomic rearrangements and recombination, also play important roles in sequence evolution.
Results: Here, we extend our ab initio perturbative formulation of a genuine evolutionary model so that it can incorporate other mutations. We give a sufficient set of conditions under which the probability of evolution via both indels and substitutions is factorable into the product of an overall factor and local contributions. We also show that, under a set of conditions, the probability can be factorized into two sub-probabilities, one via indels alone and the other via substitutions alone. Moreover, we show that our formulation can be extended so that it can also incorporate genomic rearrangements, such as inversions and duplications. We also discuss how to accommodate some other types of mutations within our formulation.
Conclusions: Our ab initio perturbative formulation, thus extended, could in principle describe the stochastic evolution of an entire sequence along the time axis via major types of mutations.
[This paper and three other papers (Ezawa, Graur and Landan 2015a,b,c) describe a series of our efforts to develop, apply, and extend the ab initio perturbative formulation of a general continuous-time Markov model of indels.]



2015 ◽  
Author(s):  
Kiyoshi Ezawa ◽  
Dan Graur ◽  
Giddy Landan

Abstract
Background: Insertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of sequence evolution through indel processes. In a separate paper (Ezawa, Graur and Landan 2015a), we established an ab initio perturbative formulation of a continuous-time Markov model of the evolution of an entire sequence via insertions and deletions. We also showed that, under a certain set of conditions, the ab initio probability of an alignment can be factorized into the product of an overall factor and contributions from regions (or local alignments) separated by gapless columns. Moreover, in another separate paper (Ezawa, Graur and Landan 2015b), we performed concrete perturbation analyses on all types of local pairwise alignments (PWAs) and some typical types of local multiple sequence alignments (MSAs). The analyses indicated that even the fewest-indel terms alone can quite accurately approximate the probabilities of local alignments, as long as the segments and the branches in the tree are of modest lengths.
Results: To examine whether the fewest-indel terms alone can also approximate well the alignment probabilities of more general types of local MSAs, and as a first step toward the automatic application of our ab initio perturbative formulation, we developed an algorithm that calculates the first approximation of the probability of a given MSA under a given parameter setting, including a phylogenetic tree. The algorithm first chops the MSA into gapped and gapless segments, second enumerates all parsimonious indel histories potentially responsible for each gapped segment, and finally calculates their contributions to the MSA probability. We performed validation analyses using more than ten million local MSAs. The results indicated that even the first approximation can quite accurately estimate the probability of each local MSA, as long as the gaps and tree branches are at most moderately long.
Conclusions: The newly developed algorithm, called LOLIPOG, brought our ab initio perturbative formulation at least one step closer to a practically useful method for quite accurately calculating the probability of an MSA under a given biologically realistic parameter setting.
[This paper and three other papers (Ezawa, Graur and Landan 2015a,b,c) describe a series of our efforts to develop, apply, and extend the ab initio perturbative formulation of a general continuous-time Markov model of indels.]
List of abbreviations: HMM, hidden Markov model; indel, insertion/deletion; LHS, local history set; MSA, multiple sequence alignment; PAS, preserved ancestral site; PWA, pairwise alignment.
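The first step described above, chopping an MSA into gapped and gapless segments, can be sketched as follows. This is a minimal illustration and not the LOLIPOG implementation: the function name `chop_msa`, the use of `-` as the gap character, and the rule that a column counts as gapped if any row has a gap there are assumptions. The later steps (enumerating parsimonious indel histories and computing their contributions) are not attempted here.

```python
def chop_msa(rows, gap="-"):
    """Split alignment columns into maximal runs of gapless or gapped columns.

    rows: list of equal-length aligned sequences (strings).
    Returns a list of (kind, start, end) tuples with half-open column ranges,
    where kind is "gapless" or "gapped".
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have equal length")
    segments = []
    for col in range(width):
        # Assumption: a column is "gapped" if at least one row has a gap in it.
        gapped = any(r[col] == gap for r in rows)
        if segments and segments[-1][0] == gapped:
            segments[-1][2] = col + 1  # extend the current run
        else:
            segments.append([gapped, col, col + 1])  # start a new run
    return [("gapped" if g else "gapless", start, end)
            for g, start, end in segments]

example = ["ACG--TA",
           "ACGTTTA",
           "A-GTTTA"]
segments = chop_msa(example)
# segments == [("gapless", 0, 1), ("gapped", 1, 2), ("gapless", 2, 3),
#              ("gapped", 3, 5), ("gapless", 5, 7)]
```

Each gapped segment bounded by gapless columns would then be treated as an independent local alignment, in line with the factorization described in the background.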



2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Massimo Maiolo ◽  
Lorenzo Gatti ◽  
Diego Frei ◽  
Tiziano Leidi ◽  
Manuel Gil ◽  
...  

Abstract
Background: Current alignment tools typically lack an explicit model of indel evolution, leading to artificially short inferred alignments (i.e., over-alignment) due to inconsistencies between the indel history and the phylogeny relating the input sequences.
Results: We present a new progressive multiple sequence alignment tool, ProPIP. The process of insertions and deletions is described using an explicit evolutionary model, the Poisson Indel Process (PIP). The method is based on dynamic programming and is implemented in a frequentist framework. The algorithm is implemented in C++ as a standalone program, and the source code can be compiled on Linux, macOS and Microsoft Windows platforms. The source code is freely available on GitHub at https://github.com/acg-team/ProPIP and is distributed under the terms of the GNU GPL v3 license.
Conclusions: The use of an explicit indel evolution model makes it possible to avoid over-alignment, to infer gaps in a phylogenetically consistent way, and to make inferences about the rates of insertions and deletions. Instead of arbitrary gap penalties, the parameters used by ProPIP are the insertion and deletion rates, which have a biological interpretation and are contextualized in a probabilistic environment. As a result, indel rate settings may be optimised in order to infer phylogenetically meaningful gap patterns.



2012 ◽  
Vol 2 (6) ◽  
pp. 208-211
Author(s):  
Navjot Kaur ◽  
Rajbir Singh Cheema ◽  
Harmandeep Singh





