Average Redundancy for Known Sources: Ubiquitous Trees in Source Coding

2008 ◽  
Vol DMTCS Proceedings vol. AI,... (Proceedings) ◽  
Author(s):  
Wojciech Szpankowski

Analytic information theory aims at studying problems of information theory using analytic techniques of computer science and combinatorics. Following Hadamard's precept, these problems are tackled by complex-analysis methods such as generating functions, the Mellin transform, Fourier series, the saddle point method, analytic poissonization and depoissonization, and singularity analysis. This approach lies at the crossroads of computer science and information theory. In this survey we concentrate on one facet of information theory (i.e., source coding, better known as data compression), namely the $\textit{redundancy rate}$ problem, which determines by how much the actual code length exceeds the optimal code length. We further restrict our interest to the $\textit{average}$ redundancy for $\textit{known}$ sources, that is, when the statistics of the information source are known. We present precise analyses of three types of lossless data compression schemes: fixed-to-variable (FV) length codes, variable-to-fixed (VF) length codes, and variable-to-variable (VV) length codes. In particular, we investigate the average redundancy of Huffman, Tunstall, and Khodak codes. These codes have succinct representations as $\textit{trees}$, either coding or parsing trees, and we analyze here some of their parameters (e.g., the average path from the root to a leaf).
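The average-redundancy notion above is easy to make concrete. As a minimal sketch (not the paper's analysis), the following builds a Huffman code for a known source distribution and reports the average redundancy, i.e., average code length minus the source entropy in bits per symbol:

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Build a Huffman code for a known distribution; return codeword lengths."""
    # Heap entries: (probability, unique tiebreaker, symbol indices in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # every symbol in the merged subtree sinks one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def average_redundancy(probs):
    """Average Huffman code length minus source entropy (bits/symbol)."""
    lengths = huffman_lengths(probs)
    avg_len = sum(p * l for p, l in zip(probs, lengths))
    entropy = -sum(p * log2(p) for p in probs if p > 0)
    return avg_len - entropy
```

For a dyadic source such as (1/2, 1/4, 1/4) the redundancy vanishes; for a skewed binary source such as (0.9, 0.1) the Huffman code must still spend one bit per symbol, so the redundancy is about 0.53 bits.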

2020 ◽  
Author(s):  
Anas Al-okaily ◽  
Abdelghani Tbakhi

Data compression is a fundamental problem in the fields of computer science, information theory, and coding theory. Compressing data reduces its size so that storage and transmission become more efficient. Motivated by the problem of compressing DNA data, we introduce a novel encoding algorithm that works for any textual data, including DNA data. Moreover, the design of this algorithm opens a novel approach that researchers can build on to better address the compression of DNA and other textual data.


2010 ◽  
Vol DMTCS Proceedings vol. AM,... (Proceedings) ◽  
Author(s):  
Mark Daniel Ward

The webpage of Herbert Wilf describes eight Unsolved Problems. Here, we completely resolve the third of these eight problems. The task seems innocent: find the first term of the asymptotic behavior of the coefficients of an ordinary generating function whose coefficients naturally yield rational approximations to $\pi$. Upon closer examination, however, the analysis is fraught with difficulties. For instance, the function is the composition of three functions, but the innermost function has a non-zero constant term, so many standard techniques for analyzing function compositions completely fail. Additionally, the signs of the coefficients are neither all positive nor alternating in a regular manner. The generating function involves both a square root and an arctangent. The complex-valued square root and arctangent functions each rely on complex logarithms, which are multivalued and fundamentally depend on branch cuts. These multiple values and branch cuts make the function extremely tedious to visualize using Maple. We provide a complete asymptotic analysis of the coefficients of Wilf's generating function. The asymptotic expansion is naturally additive (not multiplicative); each term of the expansion contains oscillations, which we precisely characterize. The proofs rely on complex analysis, in particular singularity analysis (which, in turn, relies on a Hankel contour and transfer theorems).
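The transfer-theorem machinery mentioned above can be illustrated on a textbook case (Wilf's actual generating function is not reproduced in this abstract). For $f(x) = (1-x)^{-1/2}$, singularity analysis gives $[x^n] f(x) \sim 1/\sqrt{\pi n}$, and the exact coefficients are $\binom{2n}{n}/4^n$, so the first-order estimate can be checked directly:

```python
from math import comb, pi, sqrt

def coeff_inv_sqrt(n):
    """Exact coefficient [x^n] (1 - x)^(-1/2) = C(2n, n) / 4^n."""
    return comb(2 * n, n) / 4 ** n

def singularity_estimate(n):
    """First-order transfer-theorem asymptotic: n^(-1/2) / Gamma(1/2) = 1 / sqrt(pi*n)."""
    return 1 / sqrt(pi * n)
```

Already at $n = 1000$ the relative error is on the order of $1/(8n)$, i.e. about $10^{-4}$, which is the typical quality of a first-order singularity-analysis estimate.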


2010 ◽  
Vol DMTCS Proceedings vol. AM,... (Proceedings) ◽  
Author(s):  
Philippe Jacquet ◽  
Charles Knessl ◽  
Wojciech Szpankowski

The method of types is one of the most popular techniques in information theory and combinatorics. Two sequences of equal length have the same type if they have identical empirical distributions. In this paper, we focus on Markov types, that is, sequences generated by a Markov source (of order one). We note that sequences having the same Markov type share the same so-called $\textit{balanced frequency matrix}$ that counts the occurrences of each pair of symbols. We enumerate the number of Markov types for sequences of length $n$ over an alphabet of size $m$. This turns out to coincide with the number of balanced frequency matrices, as well as with the number of solutions of certain $\textit{linear Diophantine equations}$, and also with the number of balanced directed multigraphs. For fixed $m$ we prove that the number of Markov types is asymptotically equal to $d(m) \frac{n^{m^2-m}}{(m^2-m)!}$, where $d(m)$ is a constant for which we give an integral representation. For $m \to \infty$ we conclude that asymptotically the number of types is equivalent to $\frac{\sqrt{2}m^{3m/2} e^{m^2}}{m^{2m^2} 2^m \pi^{m/2}} n^{m^2-m}$ provided that $m=o(n^{1/4})$ (however, our techniques work for $m=o(\sqrt{n})$). These findings are derived by analytical techniques ranging from multidimensional generating functions to the saddle point method.
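For small parameters the Markov types can be enumerated by brute force, which also exhibits the (near-)balance of the frequency matrices. A minimal sketch, identifying the type of a sequence with its empirical pair-count matrix:

```python
from itertools import product

def markov_types(n, m):
    """Distinct Markov types (empirical pair-count matrices) of all
    sequences of length n over an alphabet of size m, by brute force."""
    types = set()
    for seq in product(range(m), repeat=n):
        counts = [[0] * m for _ in range(m)]
        for a, b in zip(seq, seq[1:]):  # count each consecutive pair of symbols
            counts[a][b] += 1
        types.add(tuple(tuple(row) for row in counts))
    return types
```

For each resulting matrix, the row sum and column sum of any symbol differ by at most one (the defect comes only from the first and last symbols of the sequence), which is the balance property the paper exploits. For example, $n=3$, $m=2$ yields 7 types.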


2012 ◽  
Vol DMTCS Proceedings vol. AQ,... (Proceedings) ◽  
Author(s):  
Philippe Jacquet ◽  
Wojciech Szpankowski

String complexity is defined as the cardinality of the set of all distinct words (factors) of a given string. For two strings, we define the $\textit{joint string complexity}$ as the cardinality of the set of words that are common to both strings. We also relax this definition and introduce the $\textit{joint semi-complexity}$, restricted to the common words appearing at least twice in both strings. String complexity finds a number of applications, from capturing the richness of a language to finding similarities between two genome sequences. In this paper we analyze joint complexity and joint semi-complexity when both strings are generated by a Markov source. The problem turns out to be quite challenging, requiring subtle singularity analysis and a saddle point method over infinitely many saddle points, leading to novel oscillatory phenomena with single and double periodicities.
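The definitions above are straightforward to compute for short strings (the paper's contribution is the asymptotic analysis, not the computation). A direct sketch using substring sets, counting occurrences with overlaps allowed:

```python
def factors(s):
    """Set of all distinct non-empty substrings (factors) of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

def complexity(s):
    """String complexity: number of distinct factors."""
    return len(factors(s))

def joint_complexity(s, t):
    """Joint string complexity: number of factors common to both strings."""
    return len(factors(s) & factors(t))

def occurrences(s, w):
    """Number of (possibly overlapping) occurrences of w in s."""
    return sum(1 for i in range(len(s) - len(w) + 1) if s.startswith(w, i))

def joint_semi_complexity(s, t):
    """Common factors appearing at least twice in each string."""
    rep_s = {w for w in factors(s) if occurrences(s, w) >= 2}
    rep_t = {w for w in factors(t) if occurrences(t, w) >= 2}
    return len(rep_s & rep_t)
```

For instance, "abab" has 7 distinct factors, shares 6 of them with "baba", but only "a" and "b" occur at least twice in both, so the joint semi-complexity is 2.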


2010 ◽  
Vol 56 (4) ◽  
pp. 351-355
Author(s):  
Marcin Rodziewicz

Joint Source-Channel Coding in Dictionary Methods of Lossless Data Compression

Limitations on the memory and resources of communications systems require powerful data compression methods. Decompression of a compressed data stream is very sensitive to errors that arise during transmission over noisy channels, so error-correction coding is also required. One of the solutions to this problem is the application of joint source and channel coding. This paper contains a description of methods of joint source-channel coding based on the popular data compression algorithms LZ77 and LZSS. These methods are capable of introducing some error resiliency into a compressed stream of data without degrading the compression ratio. We analyze joint source and channel coding algorithms based on these compression methods and present novel extensions of them. We also present simulation results showing the usefulness and achievable quality of the analyzed algorithms.
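The paper's error-resilient extensions are not reproduced in this abstract, but the LZSS baseline they build on is simple: scan the input, and at each position either emit a literal symbol or an (offset, length) reference into a sliding window of recent output. A minimal sketch (token format and parameters are illustrative, not the paper's):

```python
def lzss_encode(data, window=255, min_match=3):
    """Minimal LZSS-style encoder: literals ('L', char) or matches ('M', offset, length)."""
    out, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):  # search the sliding window
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1  # matches may overlap the cursor (classic LZ77 behavior)
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_match:
            out.append(('M', best_off, best_len))
            i += best_len
        else:
            out.append(('L', data[i]))
            i += 1
    return out

def lzss_decode(tokens):
    out = []
    for t in tokens:
        if t[0] == 'L':
            out.append(t[1])
        else:
            _, off, length = t
            for _ in range(length):  # copy byte-by-byte so overlapping matches work
                out.append(out[-off])
    return ''.join(out)
```

A repetitive input such as "abracadabra abracadabra" round-trips through encode/decode with far fewer tokens than input symbols; the joint source-channel schemes the paper analyzes embed redundancy into exactly this kind of token stream.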

