Improved compound Poisson approximation for the number of occurrences of any rare word family in a stationary Markov chain

2007 ◽  
Vol 39 (01) ◽  
pp. 128-140 ◽  
Author(s):  
Etienne Roquain ◽  
Sophie Schbath

We derive a new compound Poisson distribution with explicit parameters to approximate the number of overlapping occurrences of any set of words in a Markovian sequence. Using the Chen-Stein method, we provide a bound for the approximation error. This error converges to 0 under the rare event condition, even for overlapping families, which improves previous results. As a consequence, we also propose Poisson approximations for the declumped count and the number of competing renewals.
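To make the counted quantity concrete, the following is a minimal sketch of counting overlapping occurrences of a word family in a sequence. The function name `count_overlapping` and the example sequence are illustrative assumptions, not taken from the paper; the sketch only computes the count and says nothing about its compound Poisson approximation.

```python
from typing import Iterable


def count_overlapping(sequence: str, words: Iterable[str]) -> int:
    """Count every start position of every word, overlaps included.

    Hypothetical helper for illustration only.
    """
    count = 0
    for word in words:
        start = sequence.find(word)
        while start != -1:
            count += 1
            # Advance by one position so overlapping occurrences are kept.
            start = sequence.find(word, start + 1)
    return count


n_occurrences = count_overlapping("AAAACGAAA", {"AAA", "ACG"})
```

Advancing the search by one position rather than by the word length is what makes the count "overlapping": in `AAAA`, the word `AA` is counted three times.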


2008 ◽  
Vol 45 (02) ◽  
pp. 440-455
Author(s):  
Narjiss Touyar ◽  
Sophie Schbath ◽  
Dominique Cellier ◽  
Hélène Dauchel

Detection of repeated sequences within complete genomes is a powerful tool to help understand genome dynamics and species evolutionary history. To distinguish significant repeats from those that can be obtained just by chance, statistical methods have to be developed. In this paper we show that the distribution of the number of long repeats in long sequences generated by stationary Markov chains can be approximated by a Poisson distribution with explicit parameter. Thanks to the Chen-Stein method we provide a bound for the approximation error; this bound converges to 0 as soon as the length n of the sequence tends to ∞ and the length t of the repeats satisfies n^2 ρ^t = O(1) for some 0 < ρ < 1. Using this Poisson approximation, p-values can then be easily calculated to determine if a given genome is significantly enriched in repeats of length t.
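As a rough illustration of the p-value step, the sketch below computes the upper tail of a Poisson distribution. The Poisson mean `lam` is an assumed input here: the paper's explicit parameter is not given in this abstract, and the function name `poisson_upper_tail` is hypothetical.

```python
import math


def poisson_upper_tail(observed: int, lam: float) -> float:
    """Return P(N >= observed) for N ~ Poisson(lam).

    `lam` is an assumed, externally supplied mean; it is not the
    paper's explicit parameter.
    """
    if observed <= 0:
        return 1.0
    term = math.exp(-lam)  # P(N = 0)
    cdf = term
    for k in range(1, observed):
        term *= lam / k  # P(N = k) from P(N = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Example: 5 repeats observed where about 1.2 are expected by chance.
p_value = poisson_upper_tail(5, 1.2)
```

A small value of `p_value` would suggest that the observed number of repeats is larger than chance alone explains, under the assumed mean.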


2000 ◽  
Vol 37 (01) ◽  
pp. 101-117
Author(s):  
Torkel Erhardsson

We consider the uncovered set (i.e. the complement of the union of growing random intervals) in the one-dimensional Johnson-Mehl model. Let S(z,L) be the number of components of this set at time z > 0 which intersect (0, L]. An explicit bound is known for the total variation distance between the distribution of S(z,L) and a Poisson distribution, but due to clumping of the components the bound can be rather large. We here give a bound for the total variation distance between the distribution of S(z,L) and a simple compound Poisson distribution (a Pólya-Aeppli distribution). The bound is derived by interpreting S(z,L) as the number of visits to a ‘rare’ set by a Markov chain, and applying results on compound Poisson approximation for Markov chains by Erhardsson. It is shown that under a mild condition, if z→∞ and L→∞ in a proper fashion, then both the Pólya-Aeppli and the Poisson approximation error bounds converge to 0, but the convergence of the former is much faster.
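For readers unfamiliar with the Pólya-Aeppli distribution, here is a minimal sketch of its probability mass function, assuming the standard description as a Poisson number of clumps with geometric clump sizes on {1, 2, …}. The parameter names `lam` and `theta` and the function name are illustrative assumptions; the sketch uses the Panjer recursion for compound Poisson distributions, not the paper's argument.

```python
import math


def polya_aeppli_pmf(lam: float, theta: float, n_max: int) -> list:
    """Return [P(W = 0), ..., P(W = n_max)] for a Polya-Aeppli law.

    Assumed parametrisation: Poisson(lam) clumps, each of size j with
    probability (1 - theta) * theta**(j - 1), j = 1, 2, ...
    """
    clump = [0.0] + [(1 - theta) * theta ** (j - 1) for j in range(1, n_max + 1)]
    pmf = [math.exp(-lam)]  # no clumps at all
    for n in range(1, n_max + 1):
        # Panjer recursion: P(n) = (lam / n) * sum_j j * f_j * P(n - j)
        total = sum(j * clump[j] * pmf[n - j] for j in range(1, n + 1))
        pmf.append(lam / n * total)
    return pmf


pmf = polya_aeppli_pmf(lam=2.0, theta=0.3, n_max=200)
```

With `theta = 0` every clump has size one and the distribution reduces to an ordinary Poisson law, which is one way to see why clumping makes the plain Poisson approximation less accurate.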


2000 ◽  
Vol 9 (6) ◽  
pp. 529-548 ◽  
Author(s):  
Marianne Månsson

Consider sequences {X_i}_{i=1}^m and {Y_j}_{j=1}^n of independent random variables, taking values in a finite alphabet, and assume that the variables X_1, X_2, … and Y_1, Y_2, … follow the distributions μ and ν, respectively. Two variables X_i and Y_j are said to match if X_i = Y_j. Let the number of matching subsequences of length k between the two sequences, when r, 0 ≤ r < k, mismatches are allowed, be denoted by W. In this paper we use Stein's method to bound the total variation distance between the distribution of W and a suitably chosen compound Poisson distribution. To derive rates of convergence, the case where E[W] stays bounded away from infinity, and the case where E[W] → ∞ as m, n → ∞, have to be treated separately. Under the assumption that ln n/ln(mn) → ρ ∈ (0, 1), we give conditions on the rate at which k → ∞, and on the distributions μ and ν, for which the variation distance tends to zero.
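One naive reading of the count W can be sketched as follows: count all pairs of length-k windows, one from each sequence, that differ in at most r positions. This is an assumption made for illustration; the paper's exact definition of W (for instance, how overlapping matches are handled) may differ, and the function name is hypothetical.

```python
def count_approximate_matches(x: str, y: str, k: int, r: int) -> int:
    """Count window pairs (i, j) with at most r mismatches.

    Naive O(m * n * k) illustration; not necessarily the paper's W.
    """
    count = 0
    for i in range(len(x) - k + 1):
        for j in range(len(y) - k + 1):
            mismatches = sum(a != b for a, b in zip(x[i:i + k], y[j:j + k]))
            if mismatches <= r:
                count += 1
    return count


exact = count_approximate_matches("ACGT", "ACGA", k=3, r=0)
relaxed = count_approximate_matches("ACGT", "ACGA", k=3, r=1)
```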


2000 ◽  
Vol 32 (1) ◽  
pp. 19-38 ◽  
Author(s):  
A. D. Barbour ◽  
Marianne Månsson

Let n random points be uniformly and independently distributed in the unit square, and count the number W of subsets of k of the points which are covered by some translate of a small square C. If n|C| is small, the number of such clusters is approximately Poisson distributed, but the quality of the approximation is poor. In this paper, we show that the distribution of W can be much more closely approximated by an appropriate compound Poisson distribution CP(λ_1, λ_2, …). The argument is based on Stein's method, and is far from routine, largely because the approximating distribution does not satisfy the simplifying condition that iλ_i be decreasing.


1994 ◽  
Vol 31 (A) ◽  
pp. 271-281 ◽  
Author(s):  
Joseph Glaz ◽  
Joseph Naus ◽  
Malgorzata Roos ◽  
Sylvan Wallenstein

This article investigates the accuracy of approximations for the distribution of ordered m-spacings for i.i.d. uniform observations in the interval (0, 1). Several Poisson approximations and a compound Poisson approximation are studied. The result of a simulation study is included to assess the accuracy of these approximations. A numerical procedure for evaluating the moments of the ordered m-spacings is developed and evaluated for the most accurate approximation.


2001 ◽  
Vol 38 (2) ◽  
pp. 449-463 ◽  
Author(s):  
Ourania Chryssaphinou ◽  
Eutichia Vaggelatou

Consider a sequence X_1, …, X_n of independent random variables with the same continuous distribution and the event X_{i-r+1} < ⋯ < X_i of the appearance of an increasing sequence with length r, for i = r, …, n. Denote by W the number of overlapping appearances of the above event in the sequence of n trials. In this work, we derive bounds for the total variation and Kolmogorov distances between the distribution of W and a suitable compound Poisson distribution. Via these bounds, an associated theorem concerning the limit distribution of W is obtained. Moreover, using the previous results we study the asymptotic behaviour of the length of the longest increasing sequence. Finally, we suggest a non-parametric test based on W for checking randomness against local increasing trend.
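The statistic W described above can be computed directly from its definition. The sketch below counts the positions at which a strictly increasing run of length r ends, overlaps included; the function name `count_increasing_runs` and the sample data are assumptions for illustration.

```python
from typing import Sequence


def count_increasing_runs(xs: Sequence[float], r: int) -> int:
    """Count positions where the last r values are strictly increasing.

    Overlapping appearances are all counted, as in the definition of W.
    """
    count = 0
    for i in range(r - 1, len(xs)):
        window = xs[i - r + 1 : i + 1]  # the r values ending at position i
        if all(a < b for a, b in zip(window, window[1:])):
            count += 1
    return count


w = count_increasing_runs([1, 2, 3, 4, 0, 5, 6], r=3)
```

In the example, the run 1, 2, 3, 4 contributes two overlapping appearances of length 3, which is the clumping that motivates a compound Poisson rather than a Poisson approximation.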

