Poisson Approximation for the Number of Repeats in a Stationary Markov Chain

2008 ◽  
Vol 45 (02) ◽  
pp. 440-455
Author(s):  
Narjiss Touyar ◽  
Sophie Schbath ◽  
Dominique Cellier ◽  
Hélène Dauchel

Detection of repeated sequences within complete genomes is a powerful tool to help understand genome dynamics and the evolutionary history of species. To distinguish significant repeats from those that can be obtained just by chance, statistical methods have to be developed. In this paper we show that the distribution of the number of long repeats in long sequences generated by stationary Markov chains can be approximated by a Poisson distribution with an explicit parameter. Using the Chen-Stein method, we provide a bound for the approximation error; this bound converges to 0 as the length $n$ of the sequence tends to $\infty$ and the length $t$ of the repeats satisfies $n^2 \rho^t = O(1)$ for some $0 < \rho < 1$. Using this Poisson approximation, p-values can then easily be calculated to determine whether a given genome is significantly enriched in repeats of length $t$.
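Once the Poisson parameter is known, the p-value mentioned above is an upper Poisson tail, $P(N \ge k)$. A minimal Python sketch, assuming the parameter `lam` has already been computed from the paper's explicit formula (not reproduced here) and that `observed` is the observed number of repeats of length $t$; the function name and the numbers in the usage line are illustrative only.

```python
import math


def poisson_upper_pvalue(observed: int, lam: float) -> float:
    """Return P(N >= observed) for N ~ Poisson(lam).

    Sums the lower tail P(N <= observed - 1) term by term and takes the
    complement. Intended for moderate lam: exp(-lam) underflows for very
    large lam, and the subtraction loses precision for tiny p-values.
    """
    if observed <= 0:
        return 1.0
    term = math.exp(-lam)  # P(N = 0)
    lower_tail = term
    for j in range(1, observed):
        term *= lam / j  # P(N = j) obtained from P(N = j - 1)
        lower_tail += term
    return max(0.0, 1.0 - lower_tail)


# Hypothetical numbers: 7 repeats observed where about 2.5 are expected by chance.
print(poisson_upper_pvalue(7, 2.5))
```

For very small p-values, summing the upper tail directly (or using a library survival function) avoids the cancellation in `1.0 - lower_tail`.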


2007 ◽  
Vol 39 (01) ◽  
pp. 128-140 ◽  
Author(s):  
Etienne Roquain ◽  
Sophie Schbath

We derive a new compound Poisson distribution with explicit parameters to approximate the number of overlapping occurrences of any set of words in a Markovian sequence. Using the Chen-Stein method, we provide a bound for the approximation error. This error converges to 0 under the rare event condition, even for overlapping families, which improves previous results. As a consequence, we also propose Poisson approximations for the declumped count and the number of competing renewals.
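The distinction between the overlapping count and the declumped count can be made concrete on a plain string. A minimal sketch, assuming a clump is a maximal run of mutually overlapping occurrences of words from the family; the function names are hypothetical, and nothing here reproduces the paper's compound Poisson parameters.

```python
from typing import List, Tuple


def occurrences(text: str, words: List[str]) -> List[Tuple[int, int]]:
    """All (start, end) pairs, end inclusive, where a word of the family occurs.

    Overlapping occurrences are all kept, e.g. "AA" occurs 3 times in "AAAA".
    """
    occ = []
    for w in set(words):
        for start in range(len(text) - len(w) + 1):
            if text.startswith(w, start):
                occ.append((start, start + len(w) - 1))
    return sorted(occ)


def count_clumps(text: str, words: List[str]) -> int:
    """Declumped count: number of maximal runs of overlapping occurrences."""
    clumps = 0
    current_end = -1
    for start, end in occurrences(text, words):
        if start > current_end:  # no overlap with the current clump: open a new one
            clumps += 1
            current_end = end
        else:  # overlaps the current clump: extend it
            current_end = max(current_end, end)
    return clumps


text = "AAAACAA"
print(len(occurrences(text, ["AA"])), count_clumps(text, ["AA"]))  # prints: 4 2
```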


2000 ◽  
Vol 37 (01) ◽  
pp. 101-117
Author(s):  
Torkel Erhardsson

We consider the uncovered set (i.e. the complement of the union of growing random intervals) in the one-dimensional Johnson-Mehl model. Let S(z,L) be the number of components of this set at time z > 0 which intersect (0, L]. An explicit bound is known for the total variation distance between the distribution of S(z,L) and a Poisson distribution, but due to clumping of the components the bound can be rather large. We here give a bound for the total variation distance between the distribution of S(z,L) and a simple compound Poisson distribution (a Pólya-Aeppli distribution). The bound is derived by interpreting S(z,L) as the number of visits to a ‘rare’ set by a Markov chain, and applying results on compound Poisson approximation for Markov chains by Erhardsson. It is shown that under a mild condition, if z→∞ and L→∞ in a proper fashion, then both the Pólya-Aeppli and the Poisson approximation error bounds converge to 0, but the convergence of the former is much faster.
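The Pólya-Aeppli distribution is the compound Poisson law whose cluster sizes are geometric. A minimal sketch of its probability mass function, assuming the parameterization $S = Y_1 + \dots + Y_M$ with $M \sim \mathrm{Poisson}(\lambda)$ and i.i.d. $Y_i$ on $\{1, 2, \dots\}$ with $P(Y = y) = (1-p)\,p^{y-1}$; other sources parameterize it differently, and the parameters derived from the Johnson-Mehl model in the paper are not reproduced here.

```python
import math


def polya_aeppli_pmf(k: int, lam: float, p: float) -> float:
    """P(S = k) for S = Y_1 + ... + Y_M, M ~ Poisson(lam),
    Y_i i.i.d. geometric on {1, 2, ...} with P(Y = y) = (1 - p) * p**(y - 1).

    Given M = j >= 1, S is negative binomial:
    P(S = k | M = j) = C(k - 1, j - 1) * (1 - p)**j * p**(k - j).
    """
    if k == 0:
        return math.exp(-lam)  # no clusters at all
    total = 0.0
    poisson_j = math.exp(-lam)  # P(M = 0)
    for j in range(1, k + 1):
        poisson_j *= lam / j  # P(M = j) obtained from P(M = j - 1)
        negbin = math.comb(k - 1, j - 1) * (1 - p) ** j * p ** (k - j)
        total += poisson_j * negbin
    return total
```

With $p = 0$ every cluster has size 1 and the function reduces to the ordinary Poisson pmf; the mean is $\lambda / (1 - p)$, which is how clumping inflates the count relative to the number of clumps.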


2001 ◽  
Vol 10 (4) ◽  
pp. 293-308 ◽  
Author(s):  
Ourania Chryssaphinou ◽
Stavros Papastavridis ◽
Eutichia Vaggelatou

Let $X_1, \dots, X_n$ be a sequence of r.v.s produced by a stationary Markov chain with state space an alphabet $\Omega = \{\omega_1, \dots, \omega_q\}$, $q \ge 2$. We consider a set of words $\{A_1, \dots, A_r\}$, $r \ge 2$, with letters from the alphabet $\Omega$. We allow the words to have self-overlaps as well as overlaps between them. Let $\mathcal{E}$ denote the event of the appearance of a word from the set $\{A_1, \dots, A_r\}$ at a given position. Moreover, let $N$ denote the number of non-overlapping (competing renewal) appearances of $\mathcal{E}$ in the sequence $X_1, \dots, X_n$. We derive a bound on the total variation distance between the distribution of $N$ and a Poisson distribution with parameter $\mathbb{E}N$. The Stein–Chen method and combinatorial arguments concerning the structure of words are employed. As a corollary, we obtain an analogous result for the i.i.d. case. Furthermore, we prove that, under quite general conditions, the r.v. $N$ converges in distribution to a Poisson r.v. A numerical example is presented to illustrate the performance of the bound in the Markov case.
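The competing renewal count $N$ can be computed on an observed sequence by a single left-to-right scan: once an appearance of $\mathcal{E}$ is counted, counting starts afresh after its last letter. A minimal sketch, assuming the sequence is given as a string and that "non-overlapping" means an appearance is counted only if it starts after the end of the previously counted one; the function name is hypothetical.

```python
from typing import List


def renewal_count(text: str, words: List[str]) -> int:
    """Number of non-overlapping (competing renewal) appearances of a word family.

    Scans end positions left to right. An appearance of any word ending at
    position i is counted if it starts after the end of the last counted one.
    """
    count = 0
    last_end = -1  # end index of the last counted appearance
    for i in range(len(text)):
        for w in set(words):
            start = i - len(w) + 1
            if start > last_end and text.startswith(w, start):
                count += 1
                last_end = i  # renewal: later appearances must start after i
                break
    return count


print(renewal_count("ABABAB", ["AB", "BA"]))
```

Counting by earliest end position is the usual renewal convention; it differs from the overlapping count, which would also include every overlapping appearance.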


1990 ◽  
Vol 27 (03) ◽  
pp. 545-556 ◽  
Author(s):  
S. Kalpazidou

The asymptotic behaviour of the sequence $(\mathcal{C}_n(\omega), w_{c,n}(\omega)/n)$ is studied, where $\mathcal{C}_n(\omega)$ is the class of all cycles $c$ occurring along the trajectory $\omega$ of a recurrent strictly stationary Markov chain $(\xi_n)$ until time $n$, and $w_{c,n}(\omega)$ is the number of occurrences of the cycle $c$ until time $n$. This sequence of sample weighted classes converges almost surely to a class of directed weighted cycles $(\mathcal{C}_\infty, \omega_c)$, which uniquely represents the chain $(\xi_n)$ as a circuit chain, and $\omega_c$ is given a probabilistic interpretation.
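One common way to read off the cycles completed along a finite trajectory is to erase loops as they close; the counts then give sample weights $w_{c,n}(\omega)/n$. A minimal sketch, assuming this loop-erasure construction, orderable state labels, and a trajectory $\xi_0, \dots, \xi_n$ (so $n$ transitions); the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def canonical(cycle: List) -> Tuple:
    """Rotate a directed cycle so it starts at its smallest state."""
    k = cycle.index(min(cycle))
    return tuple(cycle[k:] + cycle[:k])


def cycle_counts(trajectory: List) -> Counter:
    """Count cycles completed along a trajectory by loop erasure.

    States are pushed on a stack; when a state already on the stack reappears,
    the states from its earlier position onward form a completed cycle, which
    is recorded and erased before the walk continues.
    """
    stack: List = []
    position = {}  # state -> index in stack
    counts: Counter = Counter()
    for state in trajectory:
        if state in position:
            i = position[state]
            cycle = stack[i:]
            for s in cycle:
                del position[s]
            del stack[i:]
            counts[canonical(cycle)] += 1
        position[state] = len(stack)
        stack.append(state)
    return counts


traj = [1, 2, 1, 3, 2, 1, 1]
n = len(traj) - 1  # number of transitions
weights = {c: w / n for c, w in cycle_counts(traj).items()}
print(weights)
```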


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Nikolaos Halidias

In this note we study the probability of absorption and the mean time to absorption for discrete-time Markov chains. In particular, we are interested in estimating the mean time to absorption when absorption is not certain, and we connect it with some other known results. By computing a suitable probability generating function, we are able to estimate the mean time to absorption when absorption is not certain, and we give some applications concerning the random walk. Furthermore, we investigate the probability for a Markov chain to reach a set $A$ before reaching a set $B$, and we generalize this result to a sequence of sets $A_1, A_2, \dots, A_k$.
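For a finite chain, the probability of reaching $A$ before $B$ can also be obtained by first-step analysis: $h = 1$ on $A$, $h = 0$ on $B$, and $h(x) = \sum_y P(x, y)\, h(y)$ elsewhere. The note itself works with probability generating functions; the sketch below swaps in this standard linear-system approach instead. It assumes a finite state space, disjoint $A$ and $B$, and that $A \cup B$ is reached with probability 1 from every state (so the solution is unique); the function name is hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence


def hit_a_before_b(P: Sequence[Sequence], A: set, B: set) -> List[Fraction]:
    """h[x] = P(chain started at x reaches A before B), via first-step analysis.

    Solves h = 1 on A, h = 0 on B, and h(x) = sum_y P[x][y] h(y) elsewhere,
    using Gauss-Jordan elimination in exact rational arithmetic.
    """
    n = len(P)
    M = [[Fraction(0)] * n for _ in range(n)]
    rhs = [Fraction(0)] * n
    for x in range(n):
        M[x][x] = Fraction(1)
        if x in A:
            rhs[x] = Fraction(1)
        elif x not in B:
            for y in range(n):
                M[x][y] -= Fraction(P[x][y])
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        rhs[col] /= scale
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
                rhs[r] -= f * rhs[col]
    return rhs


# Symmetric random walk on {0, ..., 4}, absorbed at 0 and 4 (gambler's ruin).
half = Fraction(1, 2)
P = [[0] * 5 for _ in range(5)]
P[0][0] = P[4][4] = 1
for i in range(1, 4):
    P[i][i - 1] = P[i][i + 1] = half
print(hit_a_before_b(P, A={4}, B={0}))  # h = [0, 1/4, 1/2, 3/4, 1]
```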


2021 ◽  
Author(s):  
Andrea Marin ◽  
Carla Piazza ◽  
Sabina Rossi

In this paper, we deal with the lumpability approach to cope with the state space explosion problem inherent to the computation of the stationary performance indices of large stochastic models. The lumpability method is based on a state aggregation technique and applies to Markov chains exhibiting some structural regularity. Moreover, it allows one to efficiently compute the exact values of the stationary performance indices when the model is actually lumpable. The notion of quasi-lumpability is based on the idea that a Markov chain can be altered by relatively small perturbations of the transition rates in such a way that the resulting Markov chain is lumpable. In this case, only upper and lower bounds on the performance indices can be derived. Here, we introduce a novel notion of quasi-lumpability, named proportional lumpability, which extends the original definition of lumpability but, unlike the general definition of quasi-lumpability, allows one to derive exact stationary performance indices for the original process. We then introduce the notion of proportional bisimilarity for the terms of the performance process algebra PEPA. Proportional bisimilarity induces a proportional lumpability on the underlying continuous-time Markov chains. Finally, we prove some compositionality results and show the applicability of our theory through examples.
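The original (ordinary) lumpability condition that proportional lumpability extends can be checked directly on a generator matrix: for every pair of distinct blocks $S, T$ of the partition, the total rate $q(x, T) = \sum_{y \in T} q(x, y)$ must be the same for every $x \in S$. A minimal sketch of that ordinary check only; the proportional variant relaxes this condition, and its precise definition is in the paper and is not reproduced here. The function name and example generators are hypothetical.

```python
from fractions import Fraction
from typing import List, Optional, Sequence


def lump(Q: Sequence[Sequence], partition: List[List[int]]) -> Optional[List[List[Fraction]]]:
    """Lumped CTMC generator if Q is ordinarily lumpable w.r.t. partition, else None.

    For every pair of distinct blocks S, T, every state x in S must have the
    same total rate q(x, T) into T; that common rate is the lumped rate S -> T.
    """
    k = len(partition)
    lumped = [[Fraction(0)] * k for _ in range(k)]
    for a, S in enumerate(partition):
        for b, T in enumerate(partition):
            if a == b:
                continue
            rates = {sum(Fraction(Q[x][y]) for y in T) for x in S}
            if len(rates) != 1:
                return None  # states in S disagree on their total rate into T
            lumped[a][b] = rates.pop()
        # Generator rows sum to zero, which fixes the diagonal entry.
        lumped[a][a] = -sum(lumped[a][b] for b in range(k) if b != a)
    return lumped


Q = [[-2, 1, 1],
     [3, -4, 1],
     [3, 1, -4]]
print(lump(Q, [[0], [1, 2]]))  # states 1 and 2 behave alike, so they can be merged
```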


2004 ◽  
Vol 2004 (8) ◽  
pp. 421-429 ◽  
Author(s):  
Souad Assoudou ◽  
Belkheir Essebbar

This note is concerned with Bayesian estimation of the transition probabilities of a binary Markov chain observed from heterogeneous individuals. The model is founded on Jeffreys' prior, which allows the transition probabilities to be correlated. The Bayesian estimator is approximated by means of Markov chain Monte Carlo (MCMC) techniques. The performance of the Bayesian estimates is illustrated by analyzing a small simulated data set.
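For orientation, the simplest conjugate version of this problem needs no MCMC. A minimal sketch under simplifying assumptions that differ from the note's model: a single homogeneous individual, and independent Jeffreys $\mathrm{Beta}(1/2, 1/2)$ priors on $p_{01} = P(1 \mid 0)$ and $p_{10} = P(0 \mid 1)$ instead of the correlated Jeffreys prior used in the note. Under these assumptions the posterior of $p_{01}$ is $\mathrm{Beta}(n_{01} + 1/2,\; n_{00} + 1/2)$, with mean $(n_{01} + 1/2)/(n_{00} + n_{01} + 1)$. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def posterior_means(sequence: List[int]) -> Dict[Tuple[int, int], float]:
    """Posterior means of p01 = P(1 | 0) and p10 = P(0 | 1) for one binary chain.

    Assumes independent Beta(1/2, 1/2) priors on p01 and p10, which are
    conjugate to the transition counts n_ij (a simplification, not the
    correlated prior of the note).
    """
    counts = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for a, b in zip(sequence, sequence[1:]):
        counts[(a, b)] += 1  # transition a -> b
    p01 = (counts[(0, 1)] + 0.5) / (counts[(0, 0)] + counts[(0, 1)] + 1)
    p10 = (counts[(1, 0)] + 0.5) / (counts[(1, 0)] + counts[(1, 1)] + 1)
    return {(0, 1): p01, (1, 0): p10}


print(posterior_means([0, 0, 1, 1, 1, 0, 0, 0, 1]))
```

With heterogeneous individuals and a correlated prior the posterior is no longer conjugate, which is why the note resorts to MCMC.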

