finite alphabet Latest Research Papers

An Efficient Coding Technique for Stochastic Processes

Entropy ◽

10.3390/e24010065 ◽

2021 ◽

Vol 24 (1) ◽

pp. 65

Author(s):

Jesús E. García ◽

Verónica A. González-López ◽

Gustavo H. Tasca ◽

Karina Y. Yaginuma

Keyword(s):

Dna Sequences ◽

Coding Theory ◽

Transition Probabilities ◽

Real Data ◽

Finite Alphabet ◽

Real Problem ◽

Huffman Code ◽

Efficient Coding ◽

Codeword Length ◽

Hand Modeling

In the framework of coding theory, under the assumption of a Markov process (Xt) on a finite alphabet A, the compressed representation of the data consists of a description of the model used to code the data together with the encoded data. Given the model, Huffman's algorithm is optimal with respect to the number of bits needed to encode the data. On the other hand, modeling (Xt) through a Partition Markov Model (PMM) reduces the number of transition probabilities needed to define the model. This paper shows how using a Huffman code with a PMM reduces the number of bits needed in this process. We prove that estimating a PMM allows the entropy of (Xt) to be estimated, providing an estimator of the minimum expected codeword length per symbol. We show the efficiency of the new methodology in a simulation study and on a real problem, the compression of DNA sequences of SARS-CoV-2, where a reduction of at least 10.4% is obtained on the real data.
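As a rough illustration of the relationship between Huffman codeword lengths and entropy mentioned in this abstract, the sketch below builds a Huffman code for a single distribution over a four-letter alphabet and compares its expected codeword length with the entropy. It is not the paper's PMM method: the function names and the probabilities are assumptions chosen for illustration, and a Markov model would apply the same idea to each conditional (transition) distribution.

```python
import heapq
import math


def huffman_lengths(probs):
    """Return {symbol: codeword length} for a Huffman code over `probs`."""
    if len(probs) == 1:
        # A single symbol still needs one bit to be written down.
        return {s: 1 for s in probs}
    # Heap entries: (probability, unique tie-breaker, {symbol: depth so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]


def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def expected_length(probs):
    """Expected Huffman codeword length in bits per symbol."""
    lengths = huffman_lengths(probs)
    return sum(probs[s] * lengths[s] for s in probs)


# Hypothetical distribution over a DNA-like alphabet (illustrative values only).
probs = {"a": 0.5, "c": 0.25, "g": 0.125, "t": 0.125}
print(huffman_lengths(probs))
print(expected_length(probs), entropy(probs))
```

For this dyadic example the expected length equals the entropy; for general distributions the Huffman expected length lies between the entropy and the entropy plus one bit.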

Language Modeling with Reduced Densities

Compositionality ◽

10.32408/compositionality-3-4 ◽

2021 ◽

Vol 3 ◽

pp. 4

Author(s):

Tai-Danae Bradley ◽

Yiannis Vlassopoulos

Keyword(s):

Mathematical Structure ◽

Positive Semidefinite ◽

Fundamental Question ◽

Language Models ◽

Finite Alphabet ◽

Text Data ◽

Enriched Category ◽

Unstructured Text ◽

Statistical Language Models ◽

Categorical Structure

This work originates from the observation that today's state-of-the-art statistical language models are impressive not only for their performance, but also, and quite crucially, because they are built entirely from correlations in unstructured text data. The latter observation prompts a fundamental question that lies at the heart of this paper: What mathematical structure exists in unstructured text data? We put forth enriched category theory as a natural answer. We show that sequences of symbols from a finite alphabet, such as those found in a corpus of text, form a category enriched over probabilities. We then address a second fundamental question: How can this information be stored and modeled in a way that preserves the categorical structure? We answer this by constructing a functor from our enriched category of text to a particular enriched category of reduced density operators. The latter leverages the Loewner order on positive semidefinite operators, which can further be interpreted as a toy example of entailment.

The influence of the cardinality of the alphabet on the quality of reconstruction of a symbolic periodic sequence from a sequence with noise

Вычислительные технологии ◽

10.25743/ict.2021.26.5.008 ◽

2021 ◽

pp. 95-105

Author(s):

Галина Николаевна Жукова ◽

Михаил Васильевич Ульянов

Keyword(s):

Real World ◽

Periodic Sequence ◽

Level Of Detail ◽

Finite Alphabet ◽

Real World Data ◽

Periodic Sequences ◽

World Data ◽

Wide Range ◽

Subject Areas

The relevance of this study stems from the wide range of applied problems in real-world data processing and analysis in which it is sensible to encode information using symbols from a finite alphabet. By varying the cardinality of the alphabet, the symbolic description of a process can be given a level of detail sufficient for real-world data analysis. However, in a number of subject areas where symbolic coding of the trajectories of the examined processes is possible, researchers face distortions, noise, and fragmentation of information. This occurs in bioinformatics, medicine, the digital economy, time series forecasting, and the analysis of business processes. Periodic processes are widely represented in these subject areas. Without noise, these processes correspond to periodic symbolic sequences, i.e. words over a finite alphabet. As experimental data, however, a researcher often receives a sequence distorted by noise of various origins instead of the expected periodic symbolic sequence.
Under these conditions, identifying the periodicity, which includes determining both the periodically repeating symbolic fragment and its length (hereinafter called the period), requires reducing the effect of noise on the experimental results. The article deals with the problem of recovering symbolic periodic sequences distorted by insertion noise as well as by the replacement and deletion of symbols. Since the level of detail in the description of the process depends on the cardinality of the alphabet, it is of interest to study the influence of the level of detail in the symbolic description on the possibility of recovering complete information about the original periodic sequence. The article experimentally examines how the quality characteristics of the period recovery method proposed by the authors depend on the cardinality of the alphabet. For alphabets of different cardinalities, the proportion of sequences with a satisfactorily reconstructed period and the relative error in determining the length of the period are given. The quality of reconstruction of a periodically repeating fragment is estimated by the ratio of the edit distance from the reconstructed periodic sequence to the original sequence distorted by noise
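The edit distance referred to in this abstract can be sketched as follows. This is a minimal illustration, assuming the standard Levenshtein distance (unit cost for insertion, deletion, and replacement); the helper names and the example strings are hypothetical, and the authors' period recovery method itself is not reproduced here.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b (unit costs)."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace or match
        prev = curr
    return prev[-1]


def periodic(fragment, length):
    """Strictly periodic sequence of the given length built from `fragment`."""
    reps = length // len(fragment) + 1
    return (fragment * reps)[:length]


clean = periodic("abc", 7)   # "abcabca"
noisy = "abcbca"             # the same word with one symbol deleted
print(edit_distance(clean, noisy))
```

A candidate reconstructed period can then be scored by how far the periodic sequence it generates lies from the observed noisy sequence.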

Intermittent estimation for finite alphabet finitarily Markovian processes with exponential tails

Kybernetika ◽

10.14736/kyb-2021-4-0628 ◽

2021 ◽

pp. 628-646

Author(s):

Gusztáv Morvai ◽

Benjamin Weiss

Keyword(s):

Finite Alphabet ◽

Exponential Tails ◽

Markovian Processes

A Refutation of Finite-State Language Models through Zipf’s Law for Factual Knowledge

Entropy ◽

10.3390/e23091148 ◽

2021 ◽

Vol 23 (9) ◽

pp. 1148

Author(s):

Łukasz Dębowski

Keyword(s):

Computational Linguistics ◽

Language Models ◽

Finite Alphabet ◽

Factual Knowledge ◽

Statistical Language Modeling ◽

Finite State ◽

Data Processing Inequality ◽

Hidden States ◽

Semantic Properties ◽

Zipf Law

We present a hypothetical argument against finite-state processes in statistical language modeling that is based on semantics rather than syntax. In this theoretical model, we suppose that the semantic properties of texts in a natural language could be approximately captured by a recently introduced concept of a perigraphic process. Perigraphic processes are a class of stochastic processes that satisfy a Zipf-law accumulation of a subset of factual knowledge, which is time-independent, compressed, and effectively inferrable from the process. We show that the classes of finite-state processes and of perigraphic processes are disjoint, and we present a new simple example of perigraphic processes over a finite alphabet called Oracle processes. The disjointness result makes use of the Hilberg condition, i.e., the almost sure power-law growth of algorithmic mutual information. Using a strongly consistent estimator of the number of hidden states, we show that finite-state processes do not satisfy the Hilberg condition whereas Oracle processes satisfy the Hilberg condition via the data-processing inequality. We discuss the relevance of these mathematical results for theoretical and computational linguistics.

Timed Trace Alignment with Metric Temporal Logic over Finite Traces

10.24963/kr.2021/22 ◽

2021 ◽

Author(s):

Giuseppe De Giacomo ◽

Aniello Murano ◽

Fabio Patrizi ◽

Giuseppe Perelli

Keyword(s):

Temporal Logic ◽

Process Mining ◽

Linear Time ◽

Timed Automata ◽

Finite Alphabet ◽

Minimal Set ◽

Metric Temporal Logic ◽

Finite State ◽

Linear Time Temporal Logic ◽

Infinite State

Trace Alignment is a prominent problem in Declarative Process Mining, which consists in identifying a minimal set of modifications that a log trace (produced by a system under execution) requires in order to be made compliant with a temporal specification. In its simplest form, log traces are sequences of events from a finite alphabet and specifications are written in DECLARE, a strict sublanguage of linear-time temporal logic over finite traces (LTLf). The best approach for trace alignment has been developed in AI, using cost-optimal planning, and handles the whole LTLf. In this paper, we study the timed version of trace alignment, where events are paired with timestamps and specifications are provided in metric temporal logic over finite traces (MTLf), essentially a superlanguage of LTLf. Due to the infiniteness of timestamps, this variant is substantially more challenging than the basic version, as the structures involved in the search are (uncountably) infinite-state, and calls for a more sophisticated machinery based on alternating (timed) automata, as opposed to the standard finite-state automata sufficient for the untimed version. The main contribution of the paper is a provably correct, effective technique for Timed Trace Alignment that takes advantage of results on MTLf decidability as well as on reachability for well-structured transition systems.

On Supervised Classification of Feature Vectors with Independent and Non-Identically Distributed Elements

Entropy ◽

10.3390/e23081045 ◽

2021 ◽

Vol 23 (8) ◽

pp. 1045

Author(s):

Farzad Shahrivari ◽

Nikola Zlatanov

Keyword(s):

Error Probability ◽

Supervised Classification ◽

Feature Vector ◽

Training Data ◽

Finite Alphabet ◽

Feature Vectors ◽

Optimal Classifier ◽

Mutually Independent ◽

Distributed Elements

In this paper, we investigate the problem of classifying feature vectors with mutually independent but non-identically distributed elements that take values from a finite alphabet set. First, we show the importance of this problem. Next, we propose a classifier and derive an analytical upper bound on its error probability. We show that the error probability tends to zero as the length of the feature vectors grows, even when there is only one training feature vector per label available. Thereby, we show that for this important problem at least one asymptotically optimal classifier exists. Finally, we provide numerical examples where we show that the proposed classifier outperforms conventional classification algorithms when the amount of training data is small and the length of the feature vectors is sufficiently high.

Packing dimensions of basins generated by distributions on a finite alphabet

Journal of the Belarusian State University. Mathematics and Informatics ◽

10.33581/2520-6508-2021-2-6-16 ◽

2021 ◽

pp. 6-16

Author(s):

Victor I. Bakhtin ◽

Bruno Sadok

Keyword(s):

Packing Dimension ◽

Finite Alphabet ◽

Limit Set ◽

Empirical Measures ◽

Limit Sets ◽

Packing Dimensions ◽

Limit Behaviour

We consider a space of infinite signals composed of letters from a finite alphabet. Each signal generates a sequence of empirical measures on the alphabet and the limit set corresponding to this sequence. The space of signals is partitioned into narrow basins consisting of signals with identical limit sets for the sequence of empirical measures, and for each narrow basin its packing dimension is computed. Furthermore, we compute packing dimensions for two other types of basins defined in terms of limit behaviour of the empirical measures.
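To make the notion of a sequence of empirical measures concrete, the sketch below computes, for each prefix of a finite signal, the relative frequency of every letter. It is only an illustration: the abstract concerns infinite signals and their limit sets, whereas this code handles a finite prefix, and the function name and example signal are assumptions.

```python
from collections import Counter
from fractions import Fraction


def empirical_measures(signal, alphabet):
    """Return the list of empirical measures of the prefixes of `signal`.

    The n-th entry maps each letter of `alphabet` to its exact relative
    frequency among the first n letters of the signal.
    """
    counts = Counter()
    measures = []
    for n, letter in enumerate(signal, start=1):
        counts[letter] += 1
        measures.append({a: Fraction(counts[a], n) for a in alphabet})
    return measures


# Hypothetical binary signal prefix.
measures = empirical_measures("0110", "01")
for n, m in enumerate(measures, start=1):
    print(n, m)
```

The limit set in the abstract is the set of accumulation points of this sequence as the prefix length grows without bound.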

Secure Communication for Multi-user Massive MIMO System With Finite Alphabet Inputs

10.1109/iccc52777.2021.9580294 ◽

2021 ◽

Author(s):

Xianyu Zhang ◽

Tao Liang ◽

Kang An ◽

Yifu Sun

Keyword(s):

Secure Communication ◽

Massive Mimo ◽

Mimo System ◽

Finite Alphabet

Equalization of Finite-Alphabet MMSE for All-Digital Massive MU-MIMO mm-Wave Communication

Journal of Physics Conference Series ◽

10.1088/1742-6596/1964/6/062049 ◽

2021 ◽

Vol 1964 (6) ◽

pp. 062049

Author(s):

B Paulchamy ◽

S Chidambaram ◽

S Vairaprakash ◽

A N Duraivel

Keyword(s):

Finite Alphabet
