Strings with Maximally Many Distinct Subsequences and Substrings

A natural problem in extremal combinatorics is to maximize the number of distinct subsequences over all length-$n$ strings on a finite alphabet $\Sigma$; this value grows exponentially, but more slowly than $2^n$. We use the probabilistic method to determine the maximizing string, which is a cyclically repeating string. The number of distinct subsequences is exactly enumerated by a generating function, from which we also derive asymptotic estimates. For the alphabet $\Sigma=\{1,2\}$, the alternating string $(1,2,1,2,\dots)$ has the maximum number of distinct subsequences, namely ${\rm Fib}(n+3)-1 \sim \left((1+\sqrt5)/2\right)^{n+3} \! / \sqrt{5}$. We also consider the same problem with substrings in lieu of subsequences. Here, we show that an appropriately truncated de Bruijn word attains the maximum. For both problems, we compare the performance of random strings with that of the optimal ones.
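
The binary claim in this abstract can be checked numerically for small $n$. The sketch below is not the authors' method: it uses a standard linear-time dynamic programme for counting distinct subsequences, plus a brute-force search over all binary strings. It assumes that the empty subsequence is counted and that ${\rm Fib}(1)={\rm Fib}(2)=1$, which is the reading that matches small cases such as $n=1$. All function names are illustrative.

```python
from itertools import product


def count_distinct_subsequences(s):
    """Count distinct subsequences of s, including the empty one (assumed convention)."""
    total = 1          # only the empty subsequence so far
    before_last = {}   # symbol -> value of `total` just before its previous occurrence
    for symbol in s:
        previous = before_last.get(symbol, 0)
        before_last[symbol] = total
        # Appending `symbol` doubles the count, minus the subsequences
        # already produced by the previous occurrence of the same symbol.
        total = 2 * total - previous
    return total


def fib(m):
    """Fibonacci number with Fib(1) = Fib(2) = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a


def alternating(n):
    """The string 1212... of length n."""
    return "".join("12"[i % 2] for i in range(n))


def best_binary(n):
    """Maximum number of distinct subsequences over all binary strings of length n."""
    return max(count_distinct_subsequences(w) for w in product("12", repeat=n))


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, count_distinct_subsequences(alternating(n)), fib(n + 3) - 1, best_binary(n))
```

For each $n$ the three printed values should coincide if the stated formula holds under the conventions assumed above.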

Sequence Enumeration and the de Bruijn-Van Aardenne Ehrenfest-Smith-Tutte Theorem

Canadian Journal of Mathematics ◽

10.4153/cjm-1979-054-x ◽

1979 ◽

Vol 31 (3) ◽

pp. 488-495 ◽

Cited By ~ 4

Author(s):

D. M. Jackson ◽

I. P. Goulden

Keyword(s):

Linear System ◽

Generating Function ◽

Power Series Expansion ◽

Finite Alphabet ◽

System Of Equations ◽

Implicit Functions ◽

Linear System Of Equations ◽

Lagrange Theorem ◽

Matrix Identity ◽

De Bruijn

The de Bruijn-van Aardenne Ehrenfest-Smith-Tutte theorem [1] connects the number of Eulerian dicircuits in a directed graph with the number of rooted spanning arborescences. In this paper we obtain a proof of this theorem by considering sequences over a finite alphabet, and we show that the theorem emerges from the generating function for a certain type of sequence. The generating function for the set of sequences is obtained as the solution of a linear system of equations in Section 2. The power series expansion for the solution of this system is obtained by means of the multivariate form of the Lagrange theorem for implicit functions, and is given in Section 3, together with a restatement of the theorem as a matrix identity.
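
For orientation, the theorem is commonly stated as follows: in a connected Eulerian digraph, the number of Eulerian circuits equals the number of spanning arborescences rooted at any fixed vertex, multiplied by $\prod_v (\deg(v)-1)!$. The sketch below illustrates that statement on de Bruijn graphs, counting arborescences with the matrix-tree determinant. It does not reproduce the generating-function proof of the paper, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def de_bruijn_graph(k, n):
    """Adjacency counts of the de Bruijn digraph on words of length n-1 over k letters."""
    nodes = list(product(range(k), repeat=n - 1))
    index = {w: i for i, w in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]
    for w in nodes:
        for a in range(k):
            adj[index[w]][index[(w + (a,))[1:]]] += 1
    return adj


def determinant(matrix):
    """Exact determinant of an integer matrix by Gaussian elimination over fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size = len(m)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)


def eulerian_circuits(adj):
    """Eulerian circuits (up to rotation) of a connected Eulerian digraph."""
    size = len(adj)
    out_deg = [sum(row) for row in adj]
    laplacian = [[(out_deg[i] if i == j else 0) - adj[i][j] for j in range(size)]
                 for i in range(size)]
    # Matrix-tree theorem: delete the row and column of the chosen root (vertex 0).
    arborescences = determinant([row[1:] for row in laplacian[1:]])
    result = arborescences
    for d in out_deg:
        result *= factorial(d - 1)
    return result


if __name__ == "__main__":
    # Eulerian circuits of this graph correspond to cyclic binary de Bruijn sequences of order 4.
    print(eulerian_circuits(de_bruijn_graph(2, 4)))
```

Loops contribute equally to the degree and the adjacency entry, so they cancel in the Laplacian but still count towards the factorial term.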

UNAVOIDABLE AND ALMOST UNAVOIDABLE SETS OF WORDS

International Journal of Algebra and Computation ◽

10.1142/s0218196705002463 ◽

2005 ◽

Vol 15 (04) ◽

pp. 717-724 ◽

Cited By ~ 1

Author(s):

JASON P. BELL

Keyword(s):

Automata Theory ◽

Finite Alphabet ◽

Asymptotic Estimates

A set of words over a finite alphabet is called an unavoidable set if every word of sufficiently long length must contain some word from this set as a subword. Motivated by a theorem from automata theory, we introduce the notion of an almost unavoidable set and prove certain asymptotic estimates for the size of almost unavoidable sets of uniform length.
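
The definition of an unavoidable set can be tested mechanically for small finite sets. The sketch below is a standard graph argument, not the method of the paper, and the function name is hypothetical. It builds a graph on the words of length $L-1$ that avoid the set, where $L$ is the longest word in the set. The set is unavoidable exactly when that graph has no cycle, since a cycle would yield arbitrarily long avoiding words. "Subword" is read as a contiguous factor.

```python
from itertools import product


def is_unavoidable(words, alphabet):
    """True if every sufficiently long word over `alphabet` contains a member of `words`."""
    words = list(words)
    if not words:
        return False  # nothing to contain, so no word can contain a member

    def avoids(text):
        return not any(w in text for w in words)

    span = max(len(w) for w in words) - 1
    nodes = ["".join(t) for t in product(alphabet, repeat=span)]
    nodes = [u for u in nodes if avoids(u)]
    # Edge u -> (u + a)[1:] whenever the window u + a still avoids the set.
    successors = {u: [(u + a)[1:] for a in alphabet if avoids(u + a)] for u in nodes}

    # Repeatedly delete nodes with no surviving successor; a cycle exists
    # if and only if some node survives this pruning.
    alive = set(nodes)
    changed = True
    while changed:
        changed = False
        for u in list(alive):
            if not any(v in alive for v in successors[u]):
                alive.discard(u)
                changed = True
    return not alive


if __name__ == "__main__":
    print(is_unavoidable(["aa", "ab", "bb"], "ab"))  # no long word avoids all three
    print(is_unavoidable(["aa", "bb"], "ab"))        # abab... avoids both
```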

The Maximum Independent Sets of de Bruijn Graphs of Diameter 3

The Electronic Journal of Combinatorics ◽

10.37236/681 ◽

2011 ◽

Vol 18 (1) ◽

Author(s):

Dustin A. Cartwright ◽

María Angélica Cueto ◽

Enrique A. Tobis

Keyword(s):

Recurrence Relation ◽

Generating Function ◽

Independent Sets ◽

De Bruijn Graph ◽

Alphabet Size ◽

De Bruijn Graphs ◽

Exponential Generating Function ◽

De Bruijn

The nodes of the de Bruijn graph $B(d,3)$ consist of all strings of length $3$, taken from an alphabet of size $d$, with edges between words which are distinct substrings of a word of length $4$. We give an inductive characterization of the maximum independent sets of the de Bruijn graphs $B(d,3)$ and for the de Bruijn graph of diameter three with loops removed, for arbitrary alphabet size. We derive a recurrence relation and an exponential generating function for their number. This recurrence allows us to construct exponentially many comma-free codes of length 3 with maximal cardinality.

ON FREE SPECTRA OF VARIETIES OF LOCALLY THRESHOLD TESTABLE SEMIGROUPS

International Journal of Algebra and Computation ◽

10.1142/s0218196712500555 ◽

2012 ◽

Vol 22 (06) ◽

pp. 1250055

Author(s):

IGOR DOLINKA

Keyword(s):

Asymptotic Formula ◽

Regular Language ◽

Finite Alphabet ◽

Combinatorial Interpretation ◽

De Bruijn Graphs ◽

Free Spectrum ◽

Enumeration Problem ◽

De Bruijn ◽

Asymptotic Upper Bound ◽

Free Spectra

A semigroup S is said to be ℓ-threshold k-testable if it satisfies all identities u = v where u, v is an arbitrary pair of words over a finite alphabet Σ such that they simultaneously belong or fail to belong to any ℓ-threshold k-testable (regular) language. We give an asymptotic formula for the free spectrum of the variety [Formula: see text] of all ℓ-threshold k-testable semigroups, thereby providing an asymptotic upper bound on the size of an arbitrary finitely generated locally threshold testable semigroup. The combinatorial interpretation of this task yields an enumeration problem for particular edge labelings of de Bruijn graphs.

Counting Finite Languages by Total Word Length

Integers ◽

10.1515/integ.2011.068 ◽

2011 ◽

Vol 11 (6) ◽

Cited By ~ 3

Author(s):

Stefan Gerhold

Keyword(s):

Explicit Expression ◽

Generating Function ◽

Word Length ◽

Finite Alphabet ◽

Alphabet Size ◽

Total Length ◽

Large Alphabet ◽

Finite Language

We investigate the number of sets of words that can be formed from a finite alphabet, counted by the total length of the words in the set. An explicit expression for the counting sequence is derived from the generating function, and asymptotics for large alphabet size and large total word length are discussed. Moreover, we derive a Gaussian limit law for the number of words in a random finite language.
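
The abstract does not state the generating function. A natural candidate, assuming the sets consist of distinct nonempty words over a $k$-letter alphabet, is $\prod_{m\ge 1}(1+z^m)^{k^m}$, because each of the $k^m$ words of length $m$ is either in the set or not. The sketch below expands that product to a given order. The function name and the exclusion of the empty word are assumptions, not taken from the paper.

```python
from math import comb


def finite_language_counts(k, max_total):
    """coeffs[n] = number of sets of distinct nonempty words over k letters with total length n.

    Expands prod_{m >= 1} (1 + z^m)^(k^m), truncated at z^max_total
    (an assumed form of the generating function).
    """
    coeffs = [1] + [0] * max_total
    for length in range(1, max_total + 1):
        n_words = k ** length
        # Coefficients of (1 + z^length)^n_words up to z^max_total.
        factor = [0] * (max_total + 1)
        for j in range(max_total // length + 1):
            factor[j * length] = comb(n_words, j)
        # Truncated polynomial product.
        coeffs = [sum(coeffs[i] * factor[n - i] for i in range(n + 1))
                  for n in range(max_total + 1)]
    return coeffs


if __name__ == "__main__":
    print(finite_language_counts(2, 6))
```

With $k=1$ the product reduces to Euler's generating function for partitions into distinct parts, which gives a quick sanity check.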

Developments in the Khintchine-Meinardus Probabilistic Method for Asymptotic Enumeration

The Electronic Journal of Combinatorics ◽

10.37236/4581 ◽

2015 ◽

Vol 22 (4) ◽

Cited By ~ 2

Author(s):

Boris L. Granovsky ◽

Dudley Stark

Keyword(s):

Generating Function ◽

Generating Functions ◽

Probabilistic Method ◽

Asymptotic Enumeration ◽

Combinatorial Objects ◽

Taylor Coefficients ◽

Gentile Statistics ◽

Novel Applications

A theorem of Meinardus provides asymptotics of the number of weighted partitions under certain assumptions on associated ordinary and Dirichlet generating functions. The ordinary generating functions are closely related to Euler's generating function $\prod_{k=1}^\infty S(z^k)$ for partitions, where $S(z)=(1-z)^{-1}$. By applying a method due to Khintchine, we extend Meinardus' theorem to find the asymptotics of the Taylor coefficients of generating functions of the form $\prod_{k=1}^\infty S(a_kz^k)^{b_k}$ for sequences $a_k$, $b_k$ and general $S(z)$. We also reformulate the hypotheses of the theorem in terms of the above generating functions. This allows novel applications of the method. In particular, we prove rigorously the asymptotics of Gentile statistics and derive the asymptotics of combinatorial objects with distinct components.

Impulse Propagation in Compositions and Words

International Journal of Mathematics and Mathematical Sciences ◽

10.1155/2021/8811261 ◽

2021 ◽

Vol 2021 ◽

pp. 1-6

Author(s):

Margaret Archibald ◽

Aubrey Blecher ◽

Charlotte Brennan ◽

Arnold Knopfmacher ◽

Toufik Mansour

Keyword(s):

Generating Function ◽

Finite Alphabet ◽

Impulse Propagation ◽

The Right

We consider compositions of n represented as bargraphs and subject these to repeated impulses which start from the left at the top level and destroy horizontally connected parts. This is repeated while moving to the right first and then downwards to the next row and the statistic of interest is the number of impulses needed to annihilate the whole composition. We achieve this by conceptualizing a generating function that tracks compositions as well as the number of impulses used. This conceptualization is repeated for words (over a finite alphabet) represented by bargraphs.

Separation of the maxima in samples of geometric random variables

Applicable Analysis and Discrete Mathematics ◽

10.2298/aadm110817019b ◽

2011 ◽

Vol 5 (2) ◽

pp. 271-282 ◽

Cited By ~ 2

Author(s):

Charlotte Brennan ◽

Arnold Knopfmacher ◽

Toufik Mansour ◽

Stephan Wagner

Keyword(s):

Generating Function ◽

Probability Generating Function ◽

Random Variables ◽

Exact Formula ◽

Asymptotic Estimates ◽

Fixed Integer ◽

Minimum Separation ◽

Double Sum

We consider samples of $n$ geometric random variables $W_1, W_2, \ldots, W_n$, where $P\{W_j = i\} = pq^{i-1}$ for $1 \le j \le n$, with $p + q = 1$. For each fixed integer $d > 0$, we study the probability that the distance between the consecutive maxima in these samples is at least $d$. We derive a probability generating function for such samples and from it we obtain an exact formula for the probability as a double sum. Using Rice's method we obtain asymptotic estimates for these probabilities. As a consequence of these results, we determine the average minimum separation of the maxima, in a sample of $n$ geometric random variables with at least two maxima.

ALGORITHMIC COMBINATORICS ON PARTIAL WORDS

International Journal of Foundations of Computer Science ◽

10.1142/s0129054112400473 ◽

2012 ◽

Vol 23 (06) ◽

pp. 1189-1206 ◽

Cited By ~ 1

Author(s):

F. BLANCHET-SADRI

Keyword(s):

Molecular Biology ◽

Data Compression ◽

Pattern Avoidance ◽

Finite Alphabet ◽

Open Problems ◽

The Past ◽

Partial Word ◽

De Bruijn ◽

Subword Complexity ◽

Partial Words

Algorithmic combinatorics on partial words, or sequences of symbols over a finite alphabet that may have some do-not-know symbols or holes, has been developing in the past few years. Applications can be found, for instance, in molecular biology for the sequencing and analysis of DNA, in bio-inspired computing where partial words have been considered for identifying good encodings for DNA computations, and in data compression. In this paper, we focus on two areas of algorithmic combinatorics on partial words, namely, pattern avoidance and subword complexity. We discuss recent contributions as well as a number of open problems. In relation to pattern avoidance, we classify all binary patterns with respect to partial word avoidability, we classify all unary patterns with respect to hole sparsity, and we discuss avoiding abelian powers in partial words. In relation to subword complexity, we generate and count minimal Sturmian partial words, we construct de Bruijn partial words, and we construct partial words with subword complexities not achievable by full words (those without holes).

A Poisson * Geometric Convolution Law for the Number of Components in Unlabelled Combinatorial Structures

Combinatorics Probability Computing ◽

10.1017/s0963548397003295 ◽

1998 ◽

Vol 7 (1) ◽

pp. 89-110 ◽

Cited By ~ 1

Author(s):

HSIEN-KUEI HWANG

Keyword(s):

Probability Measure ◽

Finite Fields ◽

Generating Function ◽

Asymptotic Estimates ◽

Combinatorial Structures ◽

Number Of Components ◽

Random Mapping ◽

Uniform Probability ◽

Arithmetical Semigroups ◽

Precise Asymptotic

Given a class of combinatorial structures $\mathcal{C}$, we consider the quantity $N(n, m)$, the number of multiset constructions $\mathcal{P}$ (of $\mathcal{C}$) of size $n$ having exactly $m$ $\mathcal{C}$-components. Under general analytic conditions on the generating function of $\mathcal{C}$, we derive precise asymptotic estimates for $N(n, m)$, as $n \to \infty$ and $m$ varies through all possible values (in general $1 \le m \le n$). In particular, we show that the number of $\mathcal{C}$-components in a random (assuming a uniform probability measure) $\mathcal{P}$-structure of size $n$ obeys asymptotically a convolution law of the Poisson and the geometric distributions. Applications of the results include random mapping patterns, polynomials in finite fields, parameters in additive arithmetical semigroups, etc. This work develops the ‘additive’ counterpart of our previous work on the distribution of the number of prime factors of an integer [20].
