Analysis of the multiplicity matching parameter in suffix trees

Mark Daniel Ward; Wojciech Szpankowski

doi:10.46298/dmtcs.3387

Discrete Mathematics & Theoretical Computer Science ◽

10.46298/dmtcs.3387 ◽

2005 ◽

Vol DMTCS Proceedings vol. AD,... (Proceedings) ◽

Cited By ~ 1

Author(s):

Mark Daniel Ward ◽

Wojciech Szpankowski

Keyword(s):

Suffix Tree ◽

Recurrence Relations ◽

Complex Analysis ◽

Analysis Of Algorithms ◽

Suffix Trees ◽

Logarithmic Series Distribution ◽

International Audience ◽

Number Of Leaves ◽

Branching Point ◽

Logarithmic Series

In a suffix tree, the multiplicity matching parameter (MMP) $M_n$ is the number of leaves in the subtree rooted at the branching point of the $(n+1)$st insertion. Equivalently, the MMP is the number of pointers into the database in the Lempel-Ziv '77 data compression algorithm. We prove that the MMP asymptotically follows the logarithmic series distribution plus some fluctuations. In the proof we compare the distribution of the MMP in suffix trees to its distribution in tries built over independent strings. Our results are derived by both probabilistic and analytic techniques of the analysis of algorithms. In particular, we utilize combinatorics on words, bivariate generating functions, pattern matching, recurrence relations, analytical poissonization and depoissonization, the Mellin transform, and complex analysis.
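
The definition of the MMP can be illustrated with a brute-force sketch that avoids building a suffix tree at all. It assumes that the branching point of the $(n+1)$st insertion sits at the depth of the longest prefix shared between the new suffix and any of the first $n$ suffixes, so that $M_n$ counts the earlier suffixes attaining that longest match. The names `lcp` and `mmp`, the 0-based indexing, and the finite example text are illustrative assumptions, not taken from the paper.

```python
def lcp(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def mmp(text, n):
    """Brute-force multiplicity matching parameter M_n (illustrative sketch).

    The first n suffixes start at positions 0..n-1; the (n+1)st suffix
    starts at position n. Assumes 1 <= n < len(text) and a text long
    enough that no comparison runs off the end of a suffix.
    """
    new_suffix = text[n:]
    matches = [lcp(text[i:], new_suffix) for i in range(n)]
    depth = max(matches)          # string depth of the branching point
    return matches.count(depth)   # leaves below the branching point


print(mmp("abcabdabe", 6))
```

In this example the new suffix `abe` shares the prefix `ab` with the suffixes starting at positions 0 and 3, so both would lie below the branching point. The quadratic running time is acceptable for illustration only.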

Complete k-ary trees and generalized meta-Fibonacci sequences

Discrete Mathematics & Theoretical Computer Science ◽

10.46298/dmtcs.3514 ◽

2006 ◽

Vol DMTCS Proceedings vol. AG,... (Proceedings) ◽

Author(s):

Chris Deugau ◽

Frank Ruskey

Keyword(s):

Generating Functions ◽

Recurrence Relations ◽

International Audience ◽

Number Of Leaves ◽

Fibonacci Sequences ◽

Infinite Sequences

We show that a family of generalized meta-Fibonacci sequences arises when counting the number of leaves at the largest level in certain infinite sequences of k-ary trees and restricted compositions of an integer. For this family of generalized meta-Fibonacci sequences and two families of related sequences we derive ordinary generating functions and recurrence relations.

On size-biased Consul distribution

Mathematica Slovaca ◽

10.2478/s12175-011-0036-z ◽

2011 ◽

Vol 61 (4) ◽

Author(s):

Khurshid Mir

Keyword(s):

Comparative Analysis ◽

Recurrence Relations ◽

Estimation Methods ◽

Central Moments ◽

Proposed Model ◽

Generalized Logarithmic Series Distribution ◽

Logarithmic Series Distribution ◽

Logarithmic Series

In this paper, a size-biased Consul distribution (SBCOND) is defined. Recurrence relations for central moments and the moments about the origin are obtained. Different estimation methods for the parameters of the model are discussed. A comparative analysis is done among the three different estimation methods and the proposed model is compared with the generalized logarithmic series distribution (GLSD) and simple Consul distribution.

Analysis of the average depth in a suffix tree under a Markov model

Discrete Mathematics & Theoretical Computer Science ◽

10.46298/dmtcs.3371 ◽

2005 ◽

Vol DMTCS Proceedings vol. AD,... (Proceedings) ◽

Author(s):

Julien Fayolle ◽

Mark Daniel Ward

Keyword(s):

Asymptotic Behavior ◽

Markov Model ◽

Generating Functions ◽

Suffix Tree ◽

Average Depth ◽

Markovian Model ◽

Suffix Trees ◽

Digital Trees ◽

International Audience ◽

The Difference

In this report, we prove that under a Markovian model of order one, the average depth of suffix trees of index n is asymptotically similar to the average depth of tries (a.k.a. digital trees) built on n independent strings. This leads to an asymptotic behavior of $(\log{n})/h + C$ for the average depth of the suffix tree, where $h$ is the entropy of the Markov model and $C$ is a constant. Our proof compares the generating functions for the average depth in tries and in suffix trees; the difference between these generating functions is shown to be asymptotically small. We conclude by using the asymptotic behavior of the average depth in a trie under the Markov model found by Jacquet and Szpankowski ([JaSz91]).
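
The leading term $(\log{n})/h$ can be made concrete with a small sketch. It assumes a two-state order-one Markov chain with transition matrix [[1-p, p], [q, 1-q]], takes $h$ to be the usual entropy rate of that chain, and uses natural logarithms throughout. The function names are hypothetical, and the constant $C$ is left out because the abstract does not give it.

```python
import math


def entropy_rate_two_state(p, q):
    """Entropy rate (in nats) of the chain with P = [[1-p, p], [q, 1-q]].

    Assumes 0 <= p, q <= 1 and p + q > 0.
    """
    # Stationary distribution of the two-state chain.
    pi0, pi1 = q / (p + q), p / (p + q)

    def row_entropy(x):
        # Entropy of one transition row (x, 1 - x); 0 * log 0 counts as 0.
        return -sum(t * math.log(t) for t in (x, 1 - x) if t > 0)

    return pi0 * row_entropy(p) + pi1 * row_entropy(q)


def leading_depth_term(n, h):
    """Leading term log(n) / h of the average depth; omits the constant C."""
    return math.log(n) / h


h = entropy_rate_two_state(0.5, 0.5)   # memoryless, unbiased case: h = ln 2
print(leading_depth_term(1024, h))
```

In the unbiased memoryless case the leading term reduces to the base-2 logarithm of n, which is the familiar depth of a balanced binary trie.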

A Note on the Multivariate Logarithmic Series Distribution

Communications in Statistics - Simulation and Computation ◽

10.1080/03610917408548394 ◽

1974 ◽

Vol 3 (5) ◽

pp. 469-472

Author(s):

Andreas Philippou ◽

George Roussas

Keyword(s):

Logarithmic Series Distribution ◽

Logarithmic Series

Moments of the bivariate logarithmic series distribution

Scandinavian Actuarial Journal ◽

10.1080/03461238.1970.10405640 ◽

1970 ◽

Vol 1970 (1-2) ◽

pp. 1-5

Author(s):

J. K. Wani

Keyword(s):

Logarithmic Series Distribution ◽

Logarithmic Series

Logarithmic Series Distribution

Handbook of Statistical Distributions with Applications ◽

10.1201/9781420011371-10 ◽

2006 ◽

pp. 131-138

Keyword(s):

Logarithmic Series Distribution ◽

Logarithmic Series

Meta-Fibonacci Sequences, Binary Trees and Extremal Compact Codes

The Electronic Journal of Combinatorics ◽

10.37236/1052 ◽

2006 ◽

Vol 13 (1) ◽

Cited By ~ 8

Author(s):

Brad Jackson ◽

Frank Ruskey

Keyword(s):

Generating Functions ◽

Recurrence Relations ◽

Binary Trees ◽

Integer Sequences ◽

Number Of Leaves ◽

Online Encyclopedia ◽

Fibonacci Sequences ◽

Infinite Sequences

We consider a family of meta-Fibonacci sequences which arise in studying the number of leaves at the largest level in certain infinite sequences of binary trees, restricted compositions of an integer, and binary compact codes. For this family of meta-Fibonacci sequences and two families of related sequences we derive ordinary generating functions and recurrence relations. Included in these families of sequences are several well-known sequences in the Online Encyclopedia of Integer Sequences (OEIS).
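
As an illustration of what a meta-Fibonacci recurrence looks like, the sketch below generates Conolly's sequence, a(n) = a(n - a(n-1)) + a(n-1 - a(n-2)) with a(1) = a(2) = 1. Choosing this particular recurrence as a representative is an assumption made here for illustration; the abstract does not state the recurrences of the family it studies. The function name `conolly` is hypothetical.

```python
def conolly(count):
    """First `count` terms of Conolly's meta-Fibonacci sequence.

    The indices used by the recurrence depend on earlier terms of the
    sequence itself, which is what makes it "meta".
    """
    a = [0, 1, 1]  # 1-indexed; a[0] is an unused placeholder
    for n in range(3, count + 1):
        a.append(a[n - a[n - 1]] + a[n - 1 - a[n - 2]])
    return a[1:count + 1]


print(conolly(16))
# [1, 1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 8, 8, 8]
```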

On Methods of Estimation for Generalized Logarithmic Series Distribution and Its Application to Counts of Red Mites on Apple Leaves

Indian Journal of Pure & Applied Biosciences ◽

10.18782/2582-2845.8689 ◽

2021 ◽

Vol 9 (3) ◽

pp. 151-155

Author(s):

Fehim J Wani ◽

Keyword(s):

Maximum Likelihood ◽

Estimation Of Parameters ◽

Chi Square ◽

Estimation Techniques ◽

Apple Leaves ◽

Generalized Logarithmic Series Distribution ◽

Logarithmic Series Distribution ◽

Logarithmic Series ◽

Extra Parameter

The Generalized Logarithmic Series Distribution (GLSD) adds an extra parameter to the usual logarithmic series distribution and was introduced by Jain and Gupta (1973). This distribution has found applications in various fields. The estimation of the parameters of the generalized logarithmic series distribution was studied by the methods of maximum likelihood, moments, minimum chi-square and weighted discrepancies. The GLSD was fitted to counts of red mites on apple leaves, and it was observed that all the estimation techniques perform well in estimating the parameters of the generalized logarithmic series distribution, but with varying degrees of non-significance.
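
For reference, the usual (one-parameter) logarithmic series distribution that the GLSD extends has probability mass function P(X = k) = -θ^k / (k ln(1 - θ)) for k = 1, 2, ... and 0 < θ < 1. The sketch below evaluates that base distribution only; it does not implement the GLSD or any of the estimation methods from the paper, and the function names are hypothetical.

```python
import math


def log_series_pmf(k, theta):
    """P(X = k) for the logarithmic series distribution with parameter theta."""
    if k < 1 or not 0 < theta < 1:
        raise ValueError("need k >= 1 and 0 < theta < 1")
    return -theta ** k / (k * math.log(1 - theta))


def log_series_mean(theta):
    """Mean of the distribution: -theta / ((1 - theta) * ln(1 - theta))."""
    return -theta / ((1 - theta) * math.log(1 - theta))


theta = 0.5
total = sum(log_series_pmf(k, theta) for k in range(1, 200))
print(total, log_series_mean(theta))
```

Summing the mass function over a long enough range should give a value very close to 1, which is a quick sanity check on the normalizing constant.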

On the Number of 2-Protected Nodes in Tries and Suffix Trees

Discrete Mathematics & Theoretical Computer Science ◽

10.46298/dmtcs.3008 ◽

2012 ◽

Vol DMTCS Proceedings vol. AQ,... (Proceedings) ◽

Author(s):

Jeffrey Gaither ◽

Yushi Homma ◽

Mark Sellke ◽

Mark Daniel Ward

Keyword(s):

Suffix Trees ◽

First Order ◽

International Audience

We use probabilistic and combinatorial tools on strings to discover the average number of 2-protected nodes in tries and in suffix trees. Our analysis covers both the uniform and non-uniform cases. For instance, in a uniform trie with $n$ leaves, the number of 2-protected nodes is approximately 0.803$n$, plus small first-order fluctuations. The 2-protected nodes are an emerging way to distinguish the interior of a tree from the fringe.

Suffix Tree Data Structures for Matrices

Pattern Matching Algorithms ◽

10.1093/oso/9780195113679.003.0013 ◽

1997 ◽

Author(s):

R. Giancarlo ◽

R. Grossi

Keyword(s):

Linear Space ◽

Suffix Tree ◽

Linear Time ◽

Suffix Trees ◽

Construction Time ◽

Matching Problems ◽

Tree Construction ◽

The Matrix ◽

Visual Databases ◽

Efficient Construction

We discuss the suffix tree generalization to matrices in this chapter. We extend the suffix tree notion (described in Chapter 3) from text strings to text matrices whose entries are taken from an ordered alphabet with the aim of solving pattern-matching problems. This suffix tree generalization can be efficiently used to implement low-level routines for Computer Vision, Data Compression, Geographic Information Systems and Visual Databases. We examine the submatrices in the form of the text’s contiguous parts that still have a matrix shape. Representing these text submatrices as “suitably formatted” strings stored in a compacted trie is the rationale behind suffix trees for matrices. The choice of the format inevitably influences suffix tree construction time and space complexity. We first deal with square matrices and show that many suffix tree families can be defined for the same input matrix according to the matrix’s string representations. We can store each suffix tree in linear space and give an efficient construction algorithm whose input is both the matrix and the string representation chosen. We then treat rectangular matrices and define their corresponding suffix trees by means of some general rules which we list formally. We show that there is a super-linear lower bound to the space required (in contrast with the linear space required by suffix trees for square matrices). We give a simple example of one of these suffix trees. The last part of the chapter illustrates some technical results regarding suffix trees for square matrices: we show how to achieve an expected linear-time suffix tree construction for a constant-size alphabet under some mild probabilistic assumptions about the input distribution. We begin by defining a wide class of string representations for square matrices. We let Σ denote an ordered alphabet of characters and introduce another alphabet of five special characters, called shapes. A shape is one of the special characters taken from set {IN,SW,NW,SE,NE}. Shape IN encodes the 1x1 matrix generated from the empty matrix by creating a square.
