Strings with Maximally Many Distinct Subsequences and Substrings

10.37236/1761 ◽  
2004 ◽  
Vol 11 (1) ◽  
Author(s):  
Abraham Flaxman ◽  
Aram W. Harrow ◽  
Gregory B. Sorkin

A natural problem in extremal combinatorics is to maximize the number of distinct subsequences for any length-$n$ string over a finite alphabet $\Sigma$; this value grows exponentially, but slower than $2^n$. We use the probabilistic method to determine the maximizing string, which is a cyclically repeating string. The number of distinct subsequences is exactly enumerated by a generating function, from which we also derive asymptotic estimates. For the alphabet $\Sigma=\{1,2\}$, $\,(1,2,1,2,\dots)$ has the maximum number of distinct subsequences, namely ${\rm Fib}(n+3)-1 \sim \left((1+\sqrt5)/2\right)^{n+3} \! / \sqrt{5}$. We also consider the same problem with substrings in lieu of subsequences. Here, we show that an appropriately truncated de Bruijn word attains the maximum. For both problems, we compare the performance of random strings with that of the optimal ones.
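The binary-alphabet claim above can be checked numerically for small $n$. The following is a minimal sketch, not the authors' method: it uses the standard "subtract the contribution of the previous occurrence" recurrence to count distinct subsequences (the empty subsequence included), and the helper names `count_distinct_subsequences`, `fib`, and `alternating` are illustrative assumptions.

```python
from itertools import product


def count_distinct_subsequences(s):
    """Count the distinct subsequences of s, including the empty one."""
    total = 1            # only the empty subsequence so far
    before_last = {}     # symbol -> count just before its previous occurrence
    for ch in s:
        prev = total
        # Appending ch doubles the count, minus the subsequences that
        # were already created when ch was appended the last time.
        total = 2 * total - before_last.get(ch, 0)
        before_last[ch] = prev
    return total


def fib(k):
    """Fibonacci numbers with fib(1) = fib(2) = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def alternating(n):
    """The length-n string (1, 2, 1, 2, ...)."""
    return tuple((i % 2) + 1 for i in range(n))


if __name__ == "__main__":
    for n in range(1, 11):
        best = max(count_distinct_subsequences(w)
                   for w in product((1, 2), repeat=n))
        print(n, count_distinct_subsequences(alternating(n)),
              fib(n + 3) - 1, best)
```

For each small $n$ the three printed values can be compared directly: the count for the alternating string, ${\rm Fib}(n+3)-1$, and the brute-force maximum over all binary strings of that length.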

1979 ◽  
Vol 31 (3) ◽  
pp. 488-495 ◽  
Author(s):  
D. M. Jackson ◽  
I. P. Goulden

The de Bruijn–van Aardenne-Ehrenfest–Smith–Tutte theorem [1] connects the number of Eulerian dicircuits in a directed graph with the number of rooted spanning arborescences. In this paper we obtain a proof of this theorem by considering sequences over a finite alphabet, and we show that the theorem emerges from the generating function for a certain type of sequence. The generating function for the set of sequences is obtained as the solution of a linear system of equations in Section 2. The power series expansion for the solution of this system is obtained by means of the multivariate form of the Lagrange theorem for implicit functions, and is given in Section 3, together with a restatement of the theorem as a matrix identity.
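To make the statement of the theorem concrete, here is a minimal sketch that applies it in its usual form: for a connected Eulerian digraph, the number of Eulerian circuits equals the number of spanning arborescences rooted at a fixed vertex times $\prod_v (\deg(v)-1)!$. This illustrates the statement only, not the generating-function proof of the paper. The arborescence count is taken from a Laplacian minor (matrix-tree theorem), the test graph is a de Bruijn digraph, and all function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def de_bruijn_adjacency(k, n):
    """Adjacency matrix of the de Bruijn digraph for n >= 2: nodes are the
    words of length n-1 over k symbols, with an edge u -> u[1:] + (c,)."""
    nodes = list(product(range(k), repeat=n - 1))
    index = {w: i for i, w in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]
    for w in nodes:
        for c in range(k):
            adj[index[w]][index[w[1:] + (c,)]] += 1
    return adj


def determinant(matrix):
    """Exact determinant of an integer matrix by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size = len(m)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)


def count_eulerian_circuits(adj):
    """Eulerian circuit count, assuming adj describes a connected digraph
    in which every vertex has equal in-degree and out-degree."""
    size = len(adj)
    out_deg = [sum(row) for row in adj]
    # Laplacian D_out - A with row and column 0 deleted (loops cancel).
    minor = [[(out_deg[i] if i == j else 0) - adj[i][j]
              for j in range(1, size)] for i in range(1, size)]
    result = determinant(minor)  # spanning arborescences rooted at vertex 0
    for d in out_deg:
        result *= factorial(d - 1)
    return result


if __name__ == "__main__":
    print(count_eulerian_circuits(de_bruijn_adjacency(2, 3)))
```

Eulerian circuits of this digraph correspond to cyclic de Bruijn sequences, so the result can be compared with the known count $(k!)^{k^{n-1}}/k^n$.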


2005 ◽  
Vol 15 (04) ◽  
pp. 717-724 ◽  
Author(s):  
JASON P. BELL

A set of words over a finite alphabet is called an unavoidable set if every sufficiently long word must contain some word from this set as a subword. Motivated by a theorem from automata theory, we introduce the notion of an almost unavoidable set and prove certain asymptotic estimates for the size of almost unavoidable sets of uniform length.


10.37236/681 ◽  
2011 ◽  
Vol 18 (1) ◽  
Author(s):  
Dustin A. Cartwright ◽  
María Angélica Cueto ◽  
Enrique A. Tobis

The nodes of the de Bruijn graph $B(d,3)$ consist of all strings of length $3$, taken from an alphabet of size $d$, with edges between words which are distinct substrings of a word of length $4$. We give an inductive characterization of the maximum independent sets of the de Bruijn graphs $B(d,3)$ and for the de Bruijn graph of diameter three with loops removed, for arbitrary alphabet size. We derive a recurrence relation and an exponential generating function for their number. This recurrence allows us to construct exponentially many comma-free codes of length 3 with maximal cardinality.


2012 ◽  
Vol 22 (06) ◽  
pp. 1250055
Author(s):  
IGOR DOLINKA

A semigroup S is said to be ℓ-threshold k-testable if it satisfies all identities u = v where u, v is an arbitrary pair of words over a finite alphabet Σ such that they simultaneously belong or fail to belong to any ℓ-threshold k-testable (regular) language. We give an asymptotic formula for the free spectrum of the variety [Formula: see text] of all ℓ-threshold k-testable semigroups, thereby providing an asymptotic upper bound on the size of an arbitrary finitely generated locally threshold testable semigroup. The combinatorial interpretation of this task yields an enumeration problem for particular edge labelings of de Bruijn graphs.


Integers ◽  
2011 ◽  
Vol 11 (6) ◽  
Author(s):  
Stefan Gerhold

We investigate the number of sets of words that can be formed from a finite alphabet, counted by the total length of the words in the set. An explicit expression for the counting sequence is derived from the generating function, and asymptotics for large alphabet size and large total word length are discussed. Moreover, we derive a Gaussian limit law for the number of words in a random finite language.
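The counting sequence can be computed for small cases. The sketch below assumes that the sets consist of distinct non-empty words, so that for an alphabet of size $m$ the generating function is the product $\prod_{k\ge1}(1+z^k)^{m^k}$ (each of the $m^k$ words of length $k$ is either in the set or not). That product form and the name `count_finite_languages` are assumptions made for illustration, not taken from the abstract.

```python
from math import comb


def count_finite_languages(m, max_total):
    """Coefficients of prod_{k>=1} (1 + z^k)^(m^k) up to z^max_total.

    Entry n is the number of sets of distinct non-empty words over an
    m-letter alphabet whose lengths sum to n.
    """
    coeffs = [1] + [0] * max_total
    for k in range(1, max_total + 1):
        num_words = m ** k                      # words of length k
        new = [0] * (max_total + 1)
        for j in range(max_total // k + 1):     # pick j words of length k
            ways = comb(num_words, j)
            for t in range(max_total - j * k + 1):
                new[t + j * k] += ways * coeffs[t]
        coeffs = new
    return coeffs


if __name__ == "__main__":
    print(count_finite_languages(2, 4))
```

For the binary alphabet and total length 2, the five sets are $\{aa\}$, $\{ab\}$, $\{ba\}$, $\{bb\}$ and $\{a,b\}$, which matches the coefficient of $z^2$.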


10.37236/4581 ◽  
2015 ◽  
Vol 22 (4) ◽  
Author(s):  
Boris L. Granovsky ◽  
Dudley Stark

A theorem of Meinardus provides asymptotics of the number of weighted partitions under certain assumptions on associated ordinary and Dirichlet generating functions. The ordinary generating functions are closely related to Euler's generating function $\prod_{k=1}^\infty S(z^k)$ for partitions, where $S(z)=(1-z)^{-1}$. By applying a method due to Khintchine, we extend Meinardus' theorem to find the asymptotics of the Taylor coefficients of generating functions of the form $\prod_{k=1}^\infty S(a_kz^k)^{b_k}$ for sequences $a_k$, $b_k$ and general $S(z)$. We also reformulate the hypotheses of the theorem in terms of the above generating functions. This allows novel applications of the method. In particular, we prove rigorously the asymptotics of Gentile statistics and derive the asymptotics of combinatorial objects with distinct components.
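As a small illustration of the products involved, the following sketch expands $\prod_{k\ge1} S(z^k)$ as a truncated power series for a given $S$ with $S(0)=1$. With $S(z)=(1-z)^{-1}$ it yields the partition numbers; with $S(z)=1+z$ it yields partitions into distinct parts. It covers only the unweighted case $a_k=b_k=1$, and the function name `euler_product` is an assumption for illustration.

```python
def euler_product(s, n_max):
    """Taylor coefficients up to z^n_max of prod_{k>=1} S(z^k).

    s(j) must return the coefficient of z^j in S(z), with s(0) == 1.
    """
    coeffs = [1] + [0] * n_max
    for k in range(1, n_max + 1):
        new = [0] * (n_max + 1)
        for j in range(n_max // k + 1):   # term s(j) * z^(j*k) of S(z^k)
            sj = s(j)
            if sj == 0:
                continue
            for t in range(n_max - j * k + 1):
                new[t + j * k] += sj * coeffs[t]
        coeffs = new
    return coeffs


if __name__ == "__main__":
    # S(z) = 1/(1 - z): all Taylor coefficients equal 1.
    print(euler_product(lambda j: 1, 10))
    # S(z) = 1 + z: partitions into distinct parts.
    print(euler_product(lambda j: 1 if j <= 1 else 0, 10))
```

Factors with $k > n_{\max}$ do not affect the coefficients up to $z^{n_{\max}}$, which is why the product can be truncated.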


Author(s):  
Margaret Archibald ◽  
Aubrey Blecher ◽  
Charlotte Brennan ◽  
Arnold Knopfmacher ◽  
Toufik Mansour

We consider compositions of n represented as bargraphs and subject these to repeated impulses which start from the left at the top level and destroy horizontally connected parts. This is repeated while moving to the right first and then downwards to the next row and the statistic of interest is the number of impulses needed to annihilate the whole composition. We achieve this by conceptualizing a generating function that tracks compositions as well as the number of impulses used. This conceptualization is repeated for words (over a finite alphabet) represented by bargraphs.


2011 ◽  
Vol 5 (2) ◽  
pp. 271-282 ◽  
Author(s):  
Charlotte Brennan ◽  
Arnold Knopfmacher ◽  
Toufik Mansour ◽  
Stephan Wagner

We consider samples of $n$ geometric random variables $W_1, W_2, \dots, W_n$ where $P\{W_j = i\} = pq^{i-1}$, for $1 \le j \le n$, with $p + q = 1$. For each fixed integer $d > 0$, we study the probability that the distance between the consecutive maxima in these samples is at least $d$. We derive a probability generating function for such samples and from it we obtain an exact formula for the probability as a double sum. Using Rice's method we obtain asymptotic estimates for these probabilities. As a consequence of these results, we determine the average minimum separation of the maxima, in a sample of $n$ geometric random variables with at least two maxima.


2012 ◽  
Vol 23 (06) ◽  
pp. 1189-1206 ◽  
Author(s):  
F. BLANCHET-SADRI

Algorithmic combinatorics on partial words, or sequences of symbols over a finite alphabet that may have some do-not-know symbols or holes, has been developing in the past few years. Applications can be found, for instance, in molecular biology for the sequencing and analysis of DNA, in bio-inspired computing where partial words have been considered for identifying good encodings for DNA computations, and in data compression. In this paper, we focus on two areas of algorithmic combinatorics on partial words, namely, pattern avoidance and subword complexity. We discuss recent contributions as well as a number of open problems. In relation to pattern avoidance, we classify all binary patterns with respect to partial word avoidability, we classify all unary patterns with respect to hole sparsity, and we discuss avoiding abelian powers in partial words. In relation to subword complexity, we generate and count minimal Sturmian partial words, we construct de Bruijn partial words, and we construct partial words with subword complexities not achievable by full words (those without holes).


1998 ◽  
Vol 7 (1) ◽  
pp. 89-110 ◽  
Author(s):  
HSIEN-KUEI HWANG

Given a class of combinatorial structures $\mathcal{C}$, we consider the quantity $N(n, m)$, the number of multiset constructions $\mathcal{P}$ (of $\mathcal{C}$) of size $n$ having exactly $m$ $\mathcal{C}$-components. Under general analytic conditions on the generating function of $\mathcal{C}$, we derive precise asymptotic estimates for $N(n, m)$, as $n\to\infty$ and $m$ varies through all possible values (in general $1 \le m \le n$). In particular, we show that the number of $\mathcal{C}$-components in a random (assuming a uniform probability measure) $\mathcal{P}$-structure of size $n$ obeys asymptotically a convolution law of the Poisson and the geometric distributions. Applications of the results include random mapping patterns, polynomials in finite fields, parameters in additive arithmetical semigroups, etc. This work develops the ‘additive’ counterpart of our previous work on the distribution of the number of prime factors of an integer [20].

