On Compound Poisson Approximation for Sequence Matching

2000 ◽  
Vol 9 (6) ◽  
pp. 529-548 ◽  
Author(s):  
MARIANNE MÅNSSON

Consider sequences $\{X_i\}_{i=1}^m$ and $\{Y_j\}_{j=1}^n$ of independent random variables, taking values in a finite alphabet, and assume that the variables $X_1, X_2, \ldots$ and $Y_1, Y_2, \ldots$ follow the distributions $\mu$ and $\nu$, respectively. Two variables $X_i$ and $Y_j$ are said to match if $X_i = Y_j$. Let the number of matching subsequences of length $k$ between the two sequences, when $r$ mismatches ($0 \le r < k$) are allowed, be denoted by $W$. In this paper we use Stein's method to bound the total variation distance between the distribution of $W$ and a suitably chosen compound Poisson distribution. To derive rates of convergence, the case where $E[W]$ stays bounded away from infinity and the case where $E[W] \to \infty$ as $m, n \to \infty$ have to be treated separately. Under the assumption that $\ln n / \ln(mn) \to \rho \in (0, 1)$, we give conditions on the rate at which $k \to \infty$, and on the distributions $\mu$ and $\nu$, for which the variation distance tends to zero.
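To make the statistic concrete, here is a minimal Python sketch of one reading of $W$: it counts pairs of contiguous length-$k$ windows, one from each sequence, that disagree in at most $r$ aligned positions. The contiguous-window convention, the "at most $r$" reading, and the helper names `count_k_matches` and `expected_w` are assumptions for illustration, not taken from the paper. Because all $X_i$ and $Y_j$ are independent, each aligned position matches with probability $p = \sum_a \mu(a)\nu(a)$, so a fixed window pair has at most $r$ mismatches with probability $\sum_{s=0}^{r} \binom{k}{s} p^{k-s}(1-p)^s$, and $E[W]$ follows by linearity.

```python
from math import comb
import random


def count_k_matches(x, y, k, r):
    """Count pairs (i, j) of windows x[i:i+k], y[j:j+k] with at most r mismatches.

    Assumption: windows are contiguous and compared position by position (no gaps).
    """
    if not 0 <= r < k:
        raise ValueError("need 0 <= r < k")
    w = 0
    for i in range(len(x) - k + 1):
        for j in range(len(y) - k + 1):
            mismatches = sum(x[i + t] != y[j + t] for t in range(k))
            if mismatches <= r:
                w += 1
    return w


def expected_w(m, n, k, r, mu, nu):
    """E[W] under the same window convention; mu, nu are probability lists over one alphabet."""
    p = sum(a * b for a, b in zip(mu, nu))  # P(X_i = Y_j)
    # Probability that a single window pair has at most r mismatches (binomial tail).
    p_window = sum(comb(k, s) * p ** (k - s) * (1 - p) ** s for s in range(r + 1))
    return (m - k + 1) * (n - k + 1) * p_window


if __name__ == "__main__":
    rng = random.Random(1)
    alphabet = "ACGT"
    mu = [0.25, 0.25, 0.25, 0.25]  # hypothetical distributions
    nu = [0.4, 0.1, 0.1, 0.4]
    x = rng.choices(alphabet, weights=mu, k=300)
    y = rng.choices(alphabet, weights=nu, k=300)
    print("W =", count_k_matches(x, y, k=8, r=1))
    print("E[W] =", expected_w(300, 300, 8, 1, mu, nu))
```

Comparing a simulated $W$ with `expected_w` gives a quick feel for the regimes the abstract separates, with $E[W]$ bounded or growing as $m$ and $n$ increase.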

2000 ◽  
Vol 37 (01) ◽  
pp. 101-117
Author(s):  
Torkel Erhardsson

We consider the uncovered set (i.e. the complement of the union of growing random intervals) in the one-dimensional Johnson-Mehl model. Let S(z,L) be the number of components of this set at time z > 0 which intersect (0, L]. An explicit bound is known for the total variation distance between the distribution of S(z,L) and a Poisson distribution, but due to clumping of the components the bound can be rather large. We here give a bound for the total variation distance between the distribution of S(z,L) and a simple compound Poisson distribution (a Pólya-Aeppli distribution). The bound is derived by interpreting S(z,L) as the number of visits to a ‘rare’ set by a Markov chain, and applying results on compound Poisson approximation for Markov chains by Erhardsson. It is shown that under a mild condition, if z→∞ and L→∞ in a proper fashion, then both the Pólya-Aeppli and the Poisson approximation error bounds converge to 0, but the convergence of the former is much faster.
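The Pólya-Aeppli law is the compound Poisson distribution of a sum of a Poisson($\lambda$) number of independent Geometric($p$) cluster sizes on $\{1, 2, \ldots\}$, which is how it models clumping. The Python sketch below evaluates its probabilities with the Panjer recursion for compound Poisson laws and compares it in total variation with a Poisson law of the same mean $\lambda/p$. The parameterization, the truncation at `n_max`, and all function names are illustrative assumptions; the paper's parameters for S(z,L) are not reproduced here.

```python
import math


def compound_poisson_pmf(lam, severity, n_max):
    """P(N = n) for n = 0..n_max, where N = G_1 + ... + G_M, M ~ Poisson(lam), G_i iid.

    severity(j) gives P(G = j) for j >= 1 (assumes P(G = 0) = 0). Uses the Panjer
    recursion g_n = (lam / n) * sum_{j=1}^{n} j * f_j * g_{n-j}.
    """
    f = [0.0] + [severity(j) for j in range(1, n_max + 1)]
    g = [math.exp(-lam)]
    for n in range(1, n_max + 1):
        g.append(lam / n * sum(j * f[j] * g[n - j] for j in range(1, n + 1)))
    return g


def polya_aeppli_pmf(lam, p, n_max):
    """Pólya-Aeppli: Geometric(p) cluster sizes on {1, 2, ...}; the mean is lam / p."""
    return compound_poisson_pmf(lam, lambda j: p * (1 - p) ** (j - 1), n_max)


def poisson_pmf(mean, n_max):
    """Poisson(mean), obtained as the special case where every cluster has size 1."""
    return compound_poisson_pmf(mean, lambda j: 1.0 if j == 1 else 0.0, n_max)


def tv_distance(p, q):
    """Total variation distance between two pmfs given on the same truncated support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    lam, p, n_max = 2.0, 0.4, 120  # hypothetical parameters
    pa = polya_aeppli_pmf(lam, p, n_max)
    po = poisson_pmf(lam / p, n_max)  # same mean as the Pólya-Aeppli law
    print("mass kept:", sum(pa), "mean:", sum(n * x for n, x in enumerate(pa)))
    print("d_TV(Pólya-Aeppli, Poisson) ~", tv_distance(pa, po))
```

Setting $p = 1$ makes every cluster a singleton, so the Pólya-Aeppli law collapses to Poisson($\lambda$), which is a convenient sanity check on the recursion.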


2001 ◽  
Vol 38 (2) ◽  
pp. 449-463 ◽  
Author(s):  
Ourania Chryssaphinou ◽  
Eutichia Vaggelatou

Consider a sequence $X_1, \ldots, X_n$ of independent random variables with the same continuous distribution and the event $X_{i-r+1} < \cdots < X_i$ of the appearance of an increasing sequence with length $r$, for $i = r, \ldots, n$. Denote by $W$ the number of overlapping appearances of the above event in the sequence of $n$ trials. In this work, we derive bounds for the total variation and Kolmogorov distances between the distribution of $W$ and a suitable compound Poisson distribution. Via these bounds, an associated theorem concerning the limit distribution of $W$ is obtained. Moreover, using the previous results we study the asymptotic behaviour of the length of the longest increasing sequence. Finally, we suggest a non-parametric test based on $W$ for checking randomness against local increasing trend.
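Under the stated assumptions every ordering of $r$ consecutive values is equally likely, so each window is increasing with probability $1/r!$ and $E[W] = (n-r+1)/r!$. The Python sketch below counts $W$ directly; the helper names are hypothetical. It also checks the identity linking $W$ to run lengths: the longest increasing run is shorter than $r$ exactly when $W = 0$. Reading "increasing sequence" as a run of consecutive values is an assumption about the paper's usage.

```python
from math import factorial
import random


def count_increasing_windows(x, r):
    """W = number of (overlapping) windows x[i], ..., x[i+r-1] that are strictly increasing."""
    return sum(
        all(x[i + t] < x[i + t + 1] for t in range(r - 1))
        for i in range(len(x) - r + 1)
    )


def longest_increasing_run(x):
    """Length of the longest block of consecutive, strictly increasing values."""
    if not x:
        return 0
    best = cur = 1
    for a, b in zip(x, x[1:]):
        cur = cur + 1 if a < b else 1
        best = max(best, cur)
    return best


def expected_increasing_windows(n, r):
    """E[W] for iid continuous variables: each window is increasing with probability 1/r!."""
    return (n - r + 1) / factorial(r)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.random() for _ in range(1000)]
    r = 4
    w = count_increasing_windows(x, r)
    print("W =", w, "E[W] =", expected_increasing_windows(len(x), r))
    # The longest increasing run is shorter than r exactly when no window of length r increases.
    assert (longest_increasing_run(x) < r) == (w == 0)
```

The identity $\{\text{longest run} < r\} = \{W = 0\}$ is what lets a distributional approximation for $W$ say something about the longest increasing run.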


2007 ◽  
Vol 39 (01) ◽  
pp. 128-140 ◽  
Author(s):  
Etienne Roquain ◽  
Sophie Schbath

We derive a new compound Poisson distribution with explicit parameters to approximate the number of overlapping occurrences of any set of words in a Markovian sequence. Using the Chen-Stein method, we provide a bound for the approximation error. This error converges to 0 under the rare event condition, even for overlapping families, which improves previous results. As a consequence, we also propose Poisson approximations for the declumped count and the number of competing renewals.
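Counting overlapping occurrences of a family of words is mechanical. The Python sketch below does it for a string, groups overlapping occurrences into clumps, and simulates a first-order Markov sequence to count in. Taking the declumped count to be the number of clumps is one common reading and may differ in detail from the paper's definition; the transition probabilities and helper names are hypothetical.

```python
import random


def occurrences(seq, words):
    """Sorted (start, end) positions of all occurrences of any word, overlaps included."""
    hits = []
    for w in set(words):
        for i in range(len(seq) - len(w) + 1):
            if seq[i:i + len(w)] == w:
                hits.append((i, i + len(w) - 1))
    return sorted(hits)


def count_clumps(hits):
    """Number of maximal groups of occurrences chained together by shared positions."""
    clumps, end = 0, -1
    for start, stop in hits:
        if start > end:  # shares no position with the current clump
            clumps += 1
        end = max(end, stop)
    return clumps


def simulate_markov(transitions, start, length, rng):
    """First-order Markov chain; transitions[a] maps each next letter to its probability."""
    seq = [start]
    for _ in range(length - 1):
        nxt = transitions[seq[-1]]
        seq.append(rng.choices(list(nxt), weights=list(nxt.values()))[0])
    return "".join(seq)


if __name__ == "__main__":
    transitions = {
        "A": {"A": 0.4, "T": 0.6},
        "T": {"A": 0.7, "T": 0.3},
    }
    seq = simulate_markov(transitions, "A", 10_000, random.Random(5))
    hits = occurrences(seq, ["ATA", "TAT"])
    print("occurrences:", len(hits), "clumps:", count_clumps(hits))
```

Self-overlapping words such as ATA and TAT make occurrences arrive in clumps, which is why a compound Poisson law, rather than a Poisson law, is the natural approximation for the raw count.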


2000 ◽  
Vol 32 (1) ◽  
pp. 19-38 ◽  
Author(s):  
A. D. Barbour ◽  
Marianne Månsson

Let $n$ random points be uniformly and independently distributed in the unit square, and count the number $W$ of subsets of $k$ of the points which are covered by some translate of a small square $C$. If $n|C|$ is small, the number of such clusters is approximately Poisson distributed, but the quality of the approximation is poor. In this paper, we show that the distribution of $W$ can be much more closely approximated by an appropriate compound Poisson distribution $\mathrm{CP}(\lambda_1, \lambda_2, \ldots)$. The argument is based on Stein's method, and is far from routine, largely because the approximating distribution does not satisfy the simplifying condition that $i\lambda_i$ be decreasing.
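Whether $k$ points fit in some translate of an axis-parallel square of side $s$ depends only on their coordinate ranges: both the $x$-range and the $y$-range must be at most $s$. The brute-force Python sketch below uses this to compute $W$ for small $n$. It assumes $C$ is axis-parallel and that the points live in the ordinary unit square with no wrap-around at the edges; the paper's exact conventions may differ, and the function name is hypothetical.

```python
import random
from itertools import combinations


def count_covered_k_subsets(points, k, side):
    """W = number of k-subsets that fit inside some translate of an axis-parallel square.

    A finite point set fits in such a square iff its x-extent and y-extent are both <= side.
    """
    w = 0
    for subset in combinations(points, k):
        xs = [pt[0] for pt in subset]
        ys = [pt[1] for pt in subset]
        if max(xs) - min(xs) <= side and max(ys) - min(ys) <= side:
            w += 1
    return w


if __name__ == "__main__":
    rng = random.Random(7)
    n, k, side = 60, 3, 0.05  # hypothetical parameters; n * |C| = 0.15
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    print("W =", count_covered_k_subsets(pts, k, side))
```

The enumeration costs $\binom{n}{k}$ extent checks, so it only suits small examples; it is meant to make the definition of $W$ concrete, not to reproduce the paper's computations.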


2010 ◽  
Vol 47 (3) ◽  
pp. 826-840 ◽  
Author(s):  
Katarzyna Rybarczyk ◽  
Dudley Stark

A random intersection graph $G(n, m, p)$ is defined on a set $V$ of $n$ vertices. There is an auxiliary set $W$ consisting of $m$ objects, and each vertex $v \in V$ is assigned a random subset of objects $W_v \subseteq W$ such that $w \in W_v$ with probability $p$, independently for all $v \in V$ and all $w \in W$. Given two vertices $v_1, v_2 \in V$, we set $v_1 \sim v_2$ if and only if $W_{v_1} \cap W_{v_2} \neq \emptyset$. We use Stein's method to obtain an upper bound on the total variation distance between the distribution of the number of $h$-cliques in $G(n, m, p)$ and a related Poisson distribution for any fixed integer $h$.
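The construction of $G(n, m, p)$ translates directly into code. The Python sketch below samples the object sets $W_v$ and counts $h$-cliques by brute force, treating an $h$-clique as any set of $h$ pairwise adjacent vertices. The helper names and the brute-force counting are illustrative choices, not the paper's method.

```python
import random
from itertools import combinations


def random_intersection_graph(n, m, p, rng):
    """Sample the object sets W_v of G(n, m, p): object w lies in W_v with probability p."""
    return [{w for w in range(m) if rng.random() < p} for _ in range(n)]


def count_h_cliques(subsets, h):
    """Number of h-vertex sets whose object sets intersect pairwise (copies of K_h)."""
    n = len(subsets)
    # v1 ~ v2 iff their object sets share at least one object.
    adjacent = [[bool(subsets[a] & subsets[b]) for b in range(n)] for a in range(n)]
    return sum(
        all(adjacent[a][b] for a, b in combinations(group, 2))
        for group in combinations(range(n), h)
    )


if __name__ == "__main__":
    rng = random.Random(11)
    g = random_intersection_graph(n=30, m=50, p=0.02, rng=rng)  # hypothetical parameters
    print("triangles:", count_h_cliques(g, 3))
```

Because all vertices sharing one object are pairwise adjacent, cliques in this model tend to come from common objects, which is the dependence structure a Stein's-method bound has to control.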

