Average Redundancy for Known Sources: Ubiquitous Trees in Source Coding

2008 ◽  
Vol DMTCS Proceedings vol. AI,... (Proceedings) ◽  
Author(s):  
Wojciech Szpankowski

Analytic information theory aims at studying problems of information theory using analytic techniques of computer science and combinatorics. Following Hadamard's precept, these problems are tackled by complex-analysis methods such as generating functions, the Mellin transform, Fourier series, the saddle point method, analytic poissonization and depoissonization, and singularity analysis. This approach lies at the crossroads of computer science and information theory. In this survey we concentrate on one facet of information theory (i.e., source coding, better known as data compression), namely the $\textit{redundancy rate}$ problem, which determines by how much the actual code length exceeds the optimal code length. We further restrict our interest to the $\textit{average}$ redundancy for $\textit{known}$ sources, that is, when the statistics of the information source are known. We present precise analyses of three types of lossless data compression schemes: fixed-to-variable (FV) length codes, variable-to-fixed (VF) length codes, and variable-to-variable (VV) length codes. In particular, we investigate the average redundancy of Huffman, Tunstall, and Khodak codes. These codes have succinct representations as $\textit{trees}$, either coding or parsing trees, and we analyze here some of their parameters (e.g., the average path from the root to a leaf).
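The average-redundancy notion above is easy to make concrete. As a minimal sketch (not the paper's analysis), the following builds a Huffman code for a known source distribution and reports the average redundancy, i.e., average code length minus the source entropy in bits per symbol:

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Build a Huffman code for a known distribution; return codeword lengths."""
    # Heap entries: (probability, unique tiebreaker, symbol indices in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # every symbol in the merged subtree sinks one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def average_redundancy(probs):
    """Average Huffman code length minus source entropy (bits/symbol)."""
    lengths = huffman_lengths(probs)
    avg_len = sum(p * l for p, l in zip(probs, lengths))
    entropy = -sum(p * log2(p) for p in probs if p > 0)
    return avg_len - entropy
```

For a dyadic source such as (1/2, 1/4, 1/4) the redundancy vanishes; for a skewed binary source such as (0.9, 0.1) the Huffman code must still spend one bit per symbol, so the redundancy is about 0.53 bits.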

2020 ◽  
Author(s):  
Anas Al-okaily ◽  
Abdelghani Tbakhi

Data compression is a fundamental problem in the fields of computer science, information theory, and coding theory. Compressing data reduces its size so that storage and transmission become more efficient. Motivated by the problem of compressing DNA data, we introduce a novel encoding algorithm that works for any textual data, including DNA data. Moreover, the design of this algorithm opens a novel approach that researchers can build on to better address the compression of DNA and other textual data.


2010 ◽  
Vol DMTCS Proceedings vol. AM,... (Proceedings) ◽  
Author(s):  
Mark Daniel Ward

The webpage of Herbert Wilf describes eight Unsolved Problems. Here, we completely resolve the third of these eight problems. The task seems innocent: find the first term of the asymptotic behavior of the coefficients of an ordinary generating function whose coefficients naturally yield rational approximations to $\pi$. Upon closer examination, however, the analysis is fraught with difficulties. For instance, the function is the composition of three functions, but the innermost function has a non-zero constant term, so many standard techniques for analyzing function compositions completely fail. Additionally, the signs of the coefficients are neither all positive nor alternating in a regular manner. The generating function involves both a square root and an arctangent. The complex-valued square root and arctangent functions each rely on complex logarithms, which are multivalued and fundamentally depend on branch cuts. These multiple values and branch cuts make the function extremely tedious to visualize using Maple. We provide a complete asymptotic analysis of the coefficients of Wilf's generating function. The asymptotic expansion is naturally additive (not multiplicative); each term of the expansion contains oscillations, which we precisely characterize. The proofs rely on complex analysis, in particular singularity analysis (which, in turn, relies on a Hankel contour and transfer theorems).
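The transfer-theorem machinery mentioned above can be illustrated on a textbook case (Wilf's actual generating function is not reproduced in this abstract). For $f(x) = (1-x)^{-1/2}$, singularity analysis gives $[x^n] f(x) \sim 1/\sqrt{\pi n}$, and the exact coefficients are $\binom{2n}{n}/4^n$, so the first-order estimate can be checked directly:

```python
from math import comb, pi, sqrt

def coeff_inv_sqrt(n):
    """Exact coefficient [x^n] (1 - x)^(-1/2) = C(2n, n) / 4^n."""
    return comb(2 * n, n) / 4 ** n

def singularity_estimate(n):
    """First-order transfer-theorem asymptotic: n^(-1/2) / Gamma(1/2) = 1 / sqrt(pi*n)."""
    return 1 / sqrt(pi * n)
```

Already at $n = 1000$ the relative error is on the order of $1/(8n)$, i.e. about $10^{-4}$, which is the typical quality of a first-order singularity-analysis estimate.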


2010 ◽  
Vol DMTCS Proceedings vol. AM,... (Proceedings) ◽  
Author(s):  
Philippe Jacquet ◽  
Charles Knessl ◽  
Wojciech Szpankowski

The method of types is one of the most popular techniques in information theory and combinatorics. Two sequences of equal length have the same type if they have identical empirical distributions. In this paper, we focus on Markov types, that is, sequences generated by a Markov source (of order one). We note that sequences having the same Markov type share the same so-called $\textit{balanced frequency matrix}$ that counts the occurrences of each pair of symbols. We enumerate the number of Markov types for sequences of length $n$ over an alphabet of size $m$. This turns out to coincide with the number of balanced frequency matrices, as well as with the number of solutions of certain $\textit{linear Diophantine equations}$, and also with the number of balanced directed multigraphs. For fixed $m$ we prove that the number of Markov types is asymptotically equal to $d(m) \frac{n^{m^2-m}}{(m^2-m)!}$, where $d(m)$ is a constant for which we give an integral representation. For $m \to \infty$ we conclude that asymptotically the number of types is equivalent to $\frac{\sqrt{2}m^{3m/2} e^{m^2}}{m^{2m^2} 2^m \pi^{m/2}} n^{m^2-m}$ provided that $m=o(n^{1/4})$ (however, our techniques work for $m=o(\sqrt{n})$). These findings are derived by analytical techniques ranging from multidimensional generating functions to the saddle point method.
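For small parameters the Markov types can be enumerated by brute force, which also exhibits the (near-)balance of the frequency matrices. A minimal sketch, identifying the type of a sequence with its empirical pair-count matrix:

```python
from itertools import product

def markov_types(n, m):
    """Distinct Markov types (empirical pair-count matrices) of all
    sequences of length n over an alphabet of size m, by brute force."""
    types = set()
    for seq in product(range(m), repeat=n):
        counts = [[0] * m for _ in range(m)]
        for a, b in zip(seq, seq[1:]):  # count each consecutive pair of symbols
            counts[a][b] += 1
        types.add(tuple(tuple(row) for row in counts))
    return types
```

For each resulting matrix, the row sum and column sum of any symbol differ by at most one (the defect comes only from the first and last symbols of the sequence), which is the balance property the paper exploits. For example, $n=3$, $m=2$ yields 7 types.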


2012 ◽  
Vol DMTCS Proceedings vol. AQ,... (Proceedings) ◽  
Author(s):  
Philippe Jacquet ◽  
Wojciech Szpankowski

String complexity is defined as the cardinality of the set of all distinct words (factors) of a given string. For two strings, we define the $\textit{joint string complexity}$ as the cardinality of the set of words that are common to both strings. We also relax this definition and introduce the $\textit{joint semi-complexity}$, restricted to the common words appearing at least twice in both strings. String complexity finds a number of applications, from capturing the richness of a language to finding similarities between two genome sequences. In this paper we analyze joint complexity and joint semi-complexity when both strings are generated by a Markov source. The problem turns out to be quite challenging, requiring subtle singularity analysis and a saddle point method over infinitely many saddle points, leading to novel oscillatory phenomena with single and double periodicities.
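The definitions above are straightforward to compute for short strings (the paper's contribution is the asymptotic analysis, not the computation). A direct sketch using substring sets, counting occurrences with overlaps allowed:

```python
def factors(s):
    """Set of all distinct non-empty substrings (factors) of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

def complexity(s):
    """String complexity: number of distinct factors."""
    return len(factors(s))

def joint_complexity(s, t):
    """Joint string complexity: number of factors common to both strings."""
    return len(factors(s) & factors(t))

def occurrences(s, w):
    """Number of (possibly overlapping) occurrences of w in s."""
    return sum(1 for i in range(len(s) - len(w) + 1) if s.startswith(w, i))

def joint_semi_complexity(s, t):
    """Common factors appearing at least twice in each string."""
    rep_s = {w for w in factors(s) if occurrences(s, w) >= 2}
    rep_t = {w for w in factors(t) if occurrences(t, w) >= 2}
    return len(rep_s & rep_t)
```

For instance, "abab" has 7 distinct factors, shares 6 of them with "baba", but only "a" and "b" occur at least twice in both, so the joint semi-complexity is 2.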


2010 ◽  
Vol 56 (4) ◽  
pp. 351-355
Author(s):  
Marcin Rodziewicz

Joint Source-Channel Coding in Dictionary Methods of Lossless Data Compression

Limitations on the memory and resources of communications systems require powerful data compression methods. Decompression of a compressed data stream is very sensitive to errors that arise during transmission over noisy channels, so error-correction coding is also required. One of the solutions to this problem is the application of joint source and channel coding. This paper contains a description of methods of joint source-channel coding based on the popular data compression algorithms LZ77 and LZSS. These methods are capable of introducing some error resiliency into a compressed stream of data without degrading the compression ratio. We analyze joint source and channel coding algorithms based on these compression methods and present novel extensions of them. We also present simulation results showing the usefulness and achievable quality of the analyzed algorithms.
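The paper's error-resilient extensions are not reproduced in this abstract, but the LZSS baseline they build on is simple: scan the input, and at each position either emit a literal symbol or an (offset, length) reference into a sliding window of recent output. A minimal sketch (token format and parameters are illustrative, not the paper's):

```python
def lzss_encode(data, window=255, min_match=3):
    """Minimal LZSS-style encoder: literals ('L', char) or matches ('M', offset, length)."""
    out, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):  # search the sliding window
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1  # matches may overlap the cursor (classic LZ77 behavior)
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_match:
            out.append(('M', best_off, best_len))
            i += best_len
        else:
            out.append(('L', data[i]))
            i += 1
    return out

def lzss_decode(tokens):
    out = []
    for t in tokens:
        if t[0] == 'L':
            out.append(t[1])
        else:
            _, off, length = t
            for _ in range(length):  # copy byte-by-byte so overlapping matches work
                out.append(out[-off])
    return ''.join(out)
```

A repetitive input such as "abracadabra abracadabra" round-trips through encode/decode with far fewer tokens than input symbols; the joint source-channel schemes the paper analyzes embed redundancy into exactly this kind of token stream.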

