letter alphabet Latest Research Papers

Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet

2021 ◽

Vol 3 (4) ◽

Author(s):

Guilherme de Sena Brandine ◽

Andrew D. Smith

Keyword(s):

Cytosine Methylation ◽

Bisulfite Sequencing ◽

Software Tool ◽

Read Mapping ◽

Mapping Algorithm ◽

Letter Alphabet ◽

Mapping Software ◽

Wide Range ◽

Range Of Functions ◽

Similar Accuracy

DNA cytosine methylation is an important epigenomic mark with a wide range of functions in many organisms. Whole genome bisulfite sequencing is the gold standard to interrogate cytosine methylation genome-wide. Algorithms used to map bisulfite-converted reads often encode the four-base DNA alphabet with three letters by reducing two bases to a common letter. This encoding substantially reduces the entropy of nucleotide frequencies in the resulting reference genome. Within the paradigm of read mapping by first filtering possible candidate alignments, reduced entropy in the sequence space can increase the required computing effort. We introduce another bisulfite mapping algorithm (abismal), based on the idea of encoding a four-letter DNA sequence as only two letters, one for purines and one for pyrimidines. We show that this encoding can lead to greater specificity than the existing encodings used to map bisulfite sequencing reads. Through the two-letter encoding, the abismal software tool maps reads in less time and with less memory than most bisulfite sequencing read mapping tools, while attaining similar accuracy. This allows in silico methylation analysis to be performed on a wider range of computing machines, including those with limited hardware.
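
The two encodings discussed above can be illustrated with a short Python sketch. It is not abismal's implementation. The translation tables and the use of normalized Shannon entropy (entropy divided by log2 of the alphabet size) as a uniformity measure are assumptions made for this example. The sketch shows that a fully converted read still matches its reference under both encodings. It also shows how evenly letters are distributed in each encoded alphabet for a toy sequence with equal base frequencies.

```python
import math
from collections import Counter

# Three-letter encoding often used for bisulfite reads (C -> T strand): C and T collapse to T.
THREE_LETTER = str.maketrans("C", "T")
# Two-letter encoding: purines (A, G) -> R, pyrimidines (C, T) -> Y.
TWO_LETTER = str.maketrans("ACGT", "RYRY")


def encode(seq: str, table: dict) -> str:
    """Apply a reduced-alphabet encoding to a DNA string."""
    return seq.upper().translate(table)


def normalized_entropy(seq: str, alphabet_size: int) -> float:
    """Shannon entropy of letter frequencies divided by its maximum, log2(alphabet_size)."""
    counts = Counter(seq)
    n = len(seq)
    h = -sum(c / n * math.log2(c / n) for c in counts.values())
    return h / math.log2(alphabet_size)


reference = "ACGCGTTACG"
read = "ATGTGTTATG"  # the reference with every C converted to T (fully unmethylated)

# Both encodings hide the C->T conversion, so the converted read still matches.
assert encode(reference, THREE_LETTER) == encode(read, THREE_LETTER)
assert encode(reference, TWO_LETTER) == encode(read, TWO_LETTER)

balanced = "ACGT" * 25  # toy sequence with equal base frequencies
print(normalized_entropy(encode(balanced, THREE_LETTER), 3))  # below 1: T absorbs C
print(normalized_entropy(encode(balanced, TWO_LETTER), 2))    # 1.0: R and Y equally frequent
```

Real genomes do not have equal base frequencies, so the numbers above only illustrate the effect of collapsing letters. They are not a measurement of any reference genome.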

The Equation of the Set of Natural Numbers Just to Sum

International Journal of Research and Review ◽

10.52403/ijrr.20210547 ◽

2021 ◽

Vol 8 (5) ◽

pp. 379-388

Author(s):

Tulus Nadapdap ◽

Tulus . ◽

Opim Salim

Keyword(s):

Natural Number ◽

Unique Solution ◽

Systems Of Equations ◽

Natural Numbers ◽

Similar Construction ◽

Language Equations ◽

Letter Alphabet ◽

Periodic Constant ◽

Recursive Set

Systems of equations of the form X = Y + Z and X = C are considered, in which the unknowns are sets of integers, “+” denotes the pairwise sum of sets S + T = {m + n | m ∈ S, n ∈ T}, and C is an ultimately periodic constant. When restricted to sets of natural numbers, such equations can equally be seen as language equations over a one-letter alphabet with concatenation and regular constants, and it is shown that such systems are computationally universal, in the sense that for every recursive set S ⊆ N there exists a system with a unique solution containing T with S = {n | 16n + 13 ∈ T}. For systems over sets of all integers, both positive and negative, there is a similar construction of a system with a unique solution S = {n | 16n ∈ T} representing any hyper-arithmetical set S ⊆ N. Keywords: Language equations, Natural numbers, Equations of natural number.
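
The definitions in this abstract can be checked on small finite examples. The following Python sketch uses illustrative function names that do not come from the paper. It computes the pairwise sum S + T and shows that this sum corresponds to concatenation of languages over the one-letter alphabet {a}. It also decodes a set {n | 16n + 13 ∈ T} from a finite T. The sketch works only on finite sets, whereas the unknowns and ultimately periodic constants in the paper are generally infinite.

```python
def set_sum(S, T):
    """Pairwise sum S + T = {m + n | m in S, n in T} of finite sets of integers."""
    return {m + n for m in S for n in T}


def to_unary(S):
    """Encode a finite set of naturals as a language over the one-letter alphabet {a}."""
    return {"a" * n for n in S}


def concat(L1, L2):
    """Concatenation of two finite languages."""
    return {u + v for u in L1 for v in L2}


def decode(T, bound):
    """Recover {n | 16n + 13 in T}, restricted to n < bound."""
    return {n for n in range(bound) if 16 * n + 13 in T}


S, T = {0, 2, 5}, {1, 3}
# a^m a^n = a^(m+n): concatenation of unary languages mirrors the sum of sets.
assert to_unary(set_sum(S, T)) == concat(to_unary(S), to_unary(T))
print(sorted(set_sum(S, T)))            # [1, 3, 5, 6, 8]
print(decode({13, 29, 61, 100}, 10))    # {0, 1, 3}
```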

On Special k-Spectra, k-Locality, and Collapsing Prefix Normal Words

10.21941/kcss/2021/3 ◽

2021 ◽

Author(s):

Pamela Fleischmann

Keyword(s):

Complete Information ◽

Optimal Time ◽

Reconstruction Algorithms ◽

Matching Problem ◽

Open Problems ◽

Letter Alphabet ◽

Binary Word ◽

Np Complete ◽

Critical Words ◽

Time Bounds

The domain of Combinatorics on Words, first introduced by Axel Thue in 1906, now covers many subdomains. In this work we investigate scattered factors as a representation of non-complete information, and two measurements for words, namely the locality of a word and prefix normality, which have applications in pattern matching. In the first part of the thesis we investigate scattered factors: a word u is a scattered factor of w if u can be obtained from w by deleting some of its letters. That is, there exist (potentially empty) words u_1, u_2, …, u_n and v_0, v_1, …, v_n such that u = u_1u_2⋯u_n and w = v_0u_1v_1u_2v_2⋯u_nv_n. First, we consider the set of length-k scattered factors of a given word w, called the k-spectrum of w and denoted by ScatFact_k(w). We prove a series of properties of the sets ScatFact_k(w) for binary weakly-0-balanced and, respectively, weakly-c-balanced words w, i.e., words over a two-letter alphabet where the number of occurrences of each letter is the same, or, respectively, one letter has c occurrences more than the other. In particular, we consider the question of which cardinalities n = |ScatFact_k(w)| are obtainable, for a positive integer k, when w is either a weakly-0-balanced binary word of length 2k, or a weakly-c-balanced binary word of length 2k − c. Second, we investigate k-spectra that contain all possible words of length k, i.e., k-spectra of so-called k-universal words. We present an algorithm, running in optimal time, that decides whether the k-spectra of two words for a given k are equal. Moreover, we present several results regarding k-universal words and extend this notion to circular universality, which helps in investigating how the universality of repetitions of a given word can be determined.
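
For small words, both notions can be computed directly. The Python sketch below enumerates ScatFact_k(w) by brute force. That enumeration is exponential in general, so it is suitable only for tiny inputs. The sketch also computes the universality index through the greedy arch factorization, which cuts w into consecutive arches that each contain every letter of the alphabet. It relies on the known characterization that w is k-universal exactly when it has at least k arches, and it cross-checks that characterization against brute force on short binary words. It is not the optimal-time algorithm from the thesis, and the function names are illustrative.

```python
from itertools import combinations, product


def scat_fact(w, k):
    """k-spectrum of w: the set of all length-k scattered factors (subsequences)."""
    # combinations() yields increasing index tuples, so letter order is preserved.
    return {"".join(w[i] for i in idx) for idx in combinations(range(len(w)), k)}


def universality_index(w, alphabet):
    """Number of arches in the greedy arch factorization of w over the given alphabet."""
    letters = set(alphabet)
    arches, seen = 0, set()
    for ch in w:
        seen.add(ch)
        if seen == letters:  # the current arch now contains every letter
            arches += 1
            seen = set()
    return arches


def is_k_universal(w, k, alphabet):
    """Brute-force check that ScatFact_k(w) contains every word of length k."""
    return len(scat_fact(w, k)) == len(alphabet) ** k


w = "abba"
print(sorted(scat_fact(w, 2)))      # ['aa', 'ab', 'ba', 'bb']
print(universality_index(w, "ab"))  # 2

# Cross-check the arch characterization on all binary words of length < 7.
for n in range(7):
    for letters in product("ab", repeat=n):
        word = "".join(letters)
        for k in range(1, 4):
            assert is_k_universal(word, k, "ab") == (universality_index(word, "ab") >= k)
```
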
We conclude the part about scattered factors with results on the reconstruction problem of words from scattered factors. This problem asks for the minimal information necessary to uniquely determine a word, such as multisets of scattered factors of a given length or the number of occurrences of scattered factors from a given set. We show that a word w ∈ {a, b}* can be reconstructed from the number of occurrences of at most min(|w|_a, |w|_b) + 1 scattered factors of the form a^i b, where |w|_a is the number of occurrences of the letter a in w. Moreover, we generalise the result to alphabets of the form {1, …, q} by showing that at most ∑_{i=1}^{q−1} |w|_i (q − i + 1) scattered factors suffice to reconstruct w. Both results improve on the upper bounds known so far. Time-complexity bounds for reconstruction algorithms are also considered here. In the second part we consider patterns, i.e., words consisting not only of letters but also of variables, and in particular their locality. A pattern is called k-local if, on marking the pattern in a given order, never more than k marked blocks occur. We start with the proof that determining the minimal k for which a given pattern is k-local is NP-complete. Afterwards we present results on the behaviour of the locality of repetitions and palindromes. We end this part with the proof that the matching problem also becomes NP-hard if we consider not a regular pattern, for which the matching problem is efficiently solvable, but repetitions of regular patterns. In the last part we investigate prefix normal words, which are binary words in which each prefix has at least the same number of 1s as any factor of the same length. Prefix normal words were first introduced in 2011 by Fici and Lipták, and the problem of determining the index (the number of equivalence classes for a given word length) of the prefix normal equivalence relation is still open.
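
The reconstruction result uses |w|_u, the number of occurrences of u as a scattered factor of w. This quantity can be computed with a standard dynamic program. For factors of the form a^i b over {a, b} there is also a closed form: each b preceded by x letters a contributes C(x, i) occurrences. The Python sketch below implements both and checks that they agree. It illustrates the counting only and is not the reconstruction procedure from the thesis. The closed form assumes that every letter other than a is a b.

```python
from itertools import product
from math import comb


def occurrences(w, u):
    """|w|_u: number of embeddings of u as a scattered factor of w."""
    dp = [1] + [0] * len(u)  # dp[j] = embeddings of u[:j] into the prefix of w read so far
    for ch in w:
        # Iterate j downwards so each letter of w extends an embedding at most once.
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[len(u)]


def occurrences_aib(w, i):
    """Closed form for |w|_(a^i b): sum of C(x, i) over each b with x letters a before it."""
    total, a_seen = 0, 0
    for ch in w:
        if ch == "a":
            a_seen += 1
        else:  # assumed to be b
            total += comb(a_seen, i)  # comb returns 0 when i > a_seen
    return total


print(occurrences("aabab", "ab"))   # 5
print(occurrences("aabab", "aab"))  # 4

# The DP and the closed form agree on all binary words of length < 8.
for n in range(8):
    for letters in product("ab", repeat=n):
        word = "".join(letters)
        for i in range(4):
            assert occurrences(word, "a" * i + "b") == occurrences_aib(word, i)
```
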
Here, we investigate two aspects of the problem, namely prefix normal palindromes and so-called collapsing words (extending the notion of critical words). We prove characterizations for both the palindromes and the collapsing words and show their connection. Based on this, we show that the remaining open problems regarding prefix normal words can be split into certain subproblems.
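
The definition of prefix normality translates directly into a quadratic-time check using prefix sums of 1s. The Python sketch below is a straightforward reading of the definition, not an algorithm from the thesis. Each prefix normal equivalence class contains exactly one prefix normal word. Counting prefix normal words of length n therefore gives the index for that n, which the sketch computes by brute force for small n.

```python
from itertools import product


def is_prefix_normal(w):
    """True if no factor of the binary word w has more 1s than the prefix of the same length."""
    n = len(w)
    ones = [0] * (n + 1)  # ones[i] = number of 1s in w[:i]
    for i, ch in enumerate(w):
        ones[i + 1] = ones[i] + (ch == "1")
    for length in range(1, n + 1):
        best_factor = max(ones[i + length] - ones[i] for i in range(n - length + 1))
        if best_factor > ones[length]:
            return False
    return True


def count_prefix_normal(n):
    """Brute-force count of prefix normal words of length n."""
    return sum(is_prefix_normal("".join(bits)) for bits in product("01", repeat=n))


print(is_prefix_normal("110100"))  # True
print(is_prefix_normal("101100"))  # False: factor 11 beats prefix 10
print([count_prefix_normal(n) for n in range(1, 5)])  # [2, 3, 5, 8]
```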

Increased accuracy and speed in whole genome bisulfite read mapping using a two-letter alphabet

10.1101/2020.12.21.423849 ◽

2020 ◽

Author(s):

Guilherme de Sena Brandine ◽

Andrew D. Smith

Keyword(s):

Dna Sequence ◽

Cytosine Methylation ◽

Software Tool ◽

Whole Genome ◽

Mapping Algorithm ◽

Letter Alphabet ◽

Wide Range ◽

Genome Bisulfite Sequencing ◽

Range Of Functions ◽

Using Data

DNA methylation, characterized by the presence of a methyl group at cytosines in a DNA sequence, is an important epigenomic mark with a wide range of functions across diverse organisms. Whole genome bisulfite sequencing (WGBS) has emerged as the gold standard to interrogate cytosine methylation. Accurately mapping WGBS reads to a reference genome allows reconstruction of tissue methylomes at single-base resolution. Algorithms used to map WGBS reads often encode the four-base DNA alphabet with three letters by reducing two bases to a common letter. We introduce another bisulfite mapping algorithm (abismal), based on the novel idea of encoding a four-letter DNA sequence as two letters, one for purines and one for pyrimidines. We show theoretically that this encoding benefits from higher uniformity and specificity when subsequences are selected from reads for filtration. In our implementation, this leads to a decreased mapping time relative to the three-letter encoding. We demonstrate, using data from multiple public studies, that the abismal software tool improves mapping accuracy with significantly lower mapping times compared to commonly used mappers. The most notable improvements were observed in samples originating from the random priming post-bisulfite adapter tagging protocol.
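
The filtration step described here can be sketched in Python with the two-letter encoding: take a subsequence of the read, look it up in an index, then verify the candidates. This is a simplified illustration, not abismal's index, and the function names are hypothetical. It rests on several assumptions:

- a single seed is taken from the start of the read;
- only the C-to-T converted strand is handled;
- reads contain only A, C, G and T.

Each two-letter k-mer is packed into a k-bit integer, with 0 for a purine and 1 for a pyrimidine.

```python
from collections import defaultdict

PURINES = frozenset("AG")


def ry_key(kmer):
    """Pack a k-mer into a k-bit integer: 0 for a purine (A/G), 1 for a pyrimidine (C/T)."""
    key = 0
    for base in kmer:
        key = (key << 1) | (0 if base in PURINES else 1)
    return key


def build_index(ref, k):
    """Map each two-letter k-mer key to all reference positions where it starts."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ry_key(ref[i:i + k])].append(i)
    return index


def bs_mismatches(read, ref, pos):
    """Count mismatches, treating read T over reference C as bisulfite conversion."""
    return sum(
        1
        for r, g in zip(read, ref[pos:pos + len(read)])
        if r != g and not (r == "T" and g == "C")
    )


def map_read(read, ref, index, k):
    """Seed with the read's first k bases, then verify every candidate position in full."""
    hits = index.get(ry_key(read[:k]), [])
    scored = [(bs_mismatches(read, ref, p), p) for p in hits if p + len(read) <= len(ref)]
    return min(scored) if scored else None  # (mismatches, position) of the best candidate


ref = "GGATCCGATACGTTAGCCGA"
read = ref[5:15].replace("C", "T")  # simulate a fully unmethylated read
print(map_read(read, ref, build_index(ref, 4), 4))  # (0, 5)
```

In this toy reference the seed key matches two positions, 5 and 13. Only position 5 leaves room for the whole read, so the verification step keeps it.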

Synchronizing Almost-Group Automata

International Journal of Foundations of Computer Science ◽

10.1142/s0129054120420058 ◽

2020 ◽

pp. 1-22

Author(s):

Mikhail V. Berlinkov ◽

Cyril Nicaud

Keyword(s):

Lower Bound ◽

Efficient Algorithm ◽

High Probability ◽

Worst Case ◽

Average Case ◽

Model Of Computation ◽

Letter Alphabet ◽

Strongly Connected ◽

Small Change ◽

Random Automata

In this paper we address the question of synchronizing random automata in the critical setting of almost-group automata. Group automata are automata in which all letters act as permutations on the set of states; they are not synchronizing (unless they have only one state). In almost-group automata, one of the letters acts as a permutation on [Formula: see text] states, and the others act as permutations. We prove that this small change is enough for automata to become synchronizing with high probability. More precisely, we establish that the probability that a strongly connected almost-group automaton is not synchronizing is [Formula: see text], for a [Formula: see text]-letter alphabet. We also present an efficient algorithm that decides whether a strongly connected almost-group automaton is synchronizing. For a natural model of computation, we establish a [Formula: see text] worst-case lower bound for this problem ([Formula: see text] for the average case), which is almost matched by our algorithm.
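
Whether a given automaton is synchronizing can be decided with the classical pair criterion. A complete deterministic automaton is synchronizing if and only if some word maps every pair of states to a single state. The Python sketch below implements that criterion with a backward breadth-first search over pairs of states. It runs in roughly O(k·n²) time for k letters and n states. This is a textbook method, not the authors' algorithm. The transition-table layout, in which delta[a][q] is the image of state q under letter a, is an assumption of this example.

```python
from collections import deque


def is_synchronizing(n, delta):
    """Pair criterion: synchronizing iff every pair {p, q} can be merged by some word."""
    pairs = [(p, q) for p in range(n) for q in range(p + 1, n)]
    preimages = {pq: [] for pq in pairs}  # pair -> pairs that some letter maps onto it
    mergeable = set()
    queue = deque()
    for p, q in pairs:
        for act in delta:
            a, b = act[p], act[q]
            if a == b:  # this letter merges p and q directly
                if (p, q) not in mergeable:
                    mergeable.add((p, q))
                    queue.append((p, q))
            else:
                preimages[(min(a, b), max(a, b))].append((p, q))
    # Backward BFS: a pair is mergeable if some letter maps it onto a mergeable pair.
    while queue:
        pq = queue.popleft()
        for pred in preimages[pq]:
            if pred not in mergeable:
                mergeable.add(pred)
                queue.append(pred)
    return len(mergeable) == len(pairs)


cerny_4 = [[1, 2, 3, 0], [1, 1, 2, 3]]  # cyclic shift, plus a letter sending 0 to 1
group_4 = [[1, 2, 3, 0], [1, 0, 2, 3]]  # two permutations: a group automaton
print(is_synchronizing(4, cerny_4))  # True
print(is_synchronizing(4, group_4))  # False
```

The second example reflects the statement in the abstract: when every letter is a permutation, no pair of distinct states can ever be merged.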

Partial Information Decomposition and the Information Delta: A Geometric Unification Disentangling Non-Pairwise Information

Entropy ◽

10.3390/e22121333 ◽

2020 ◽

Vol 22 (12) ◽

pp. 1333

Author(s):

James Kunert-Graf ◽

Nikita Sakhanenko ◽

David Galas

Keyword(s):

Information Theory ◽

Linkage Disequilibrium ◽

Partial Information ◽

Geometric Interpretation ◽

The Other ◽

Letter Alphabet ◽

Open Questions ◽

Robust Measures ◽

Geometric Picture ◽

Discrete Functions

Information theory provides robust measures of multivariable interdependence, but classically does little to characterize the multivariable relationships it detects. The Partial Information Decomposition (PID) characterizes the mutual information between variables by decomposing it into unique, redundant, and synergistic components. This has been usefully applied, particularly in neuroscience, but there is currently no generally accepted method for its computation. Independently, the Information Delta framework characterizes non-pairwise dependencies in genetic datasets. This framework has developed an intuitive geometric interpretation for how discrete functions encode information, but lacks some important generalizations. This paper shows that the PID and Delta frameworks are largely equivalent. We equate their key expressions, allowing for results in one framework to apply towards open questions in the other. For example, we find that the approach of Bertschinger et al. is useful for the open Information Delta question of how to deal with linkage disequilibrium. We also show how PID solutions can be mapped onto the space of delta measures. Using Bertschinger et al. as an example solution, we identify a specific plane in delta-space on which this approach’s optimization is constrained, and compute it for all possible three-variable discrete functions of a three-letter alphabet. This yields a clear geometric picture of how a given solution decomposes information.
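
The quantities that a PID decomposes can be computed exactly for small discrete functions. The Python sketch below, with illustrative names, evaluates I(X;Z), I(Y;Z) and I(X,Y;Z) in bits for Z = f(X, Y). X and Y are assumed independent and uniform over a three-letter alphabet. The sketch does not implement the Bertschinger et al. optimization or the delta measures; it only shows the inputs to a decomposition. For Z = (X + Y) mod 3, both pairwise terms vanish while the joint term is log2 3. Any nonnegative decomposition satisfying the usual consistency equations must therefore assign all of that information to the synergistic component.

```python
import math
from collections import Counter
from itertools import product


def mutual_information(samples):
    """I(A;B) in bits for a list of equally likely (a, b) outcomes."""
    n = len(samples)
    p_ab = Counter(samples)
    p_a = Counter(a for a, _ in samples)
    p_b = Counter(b for _, b in samples)
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_a * count_b).
    return sum((c / n) * math.log2(c * n / (p_a[a] * p_b[b])) for (a, b), c in p_ab.items())


def information_terms(f, alphabet=range(3)):
    """Return (I(X;Z), I(Y;Z), I(X,Y;Z)) for Z = f(X, Y), with X and Y i.i.d. uniform."""
    outcomes = [(x, y, f(x, y)) for x, y in product(alphabet, repeat=2)]
    i_xz = mutual_information([(x, z) for x, _, z in outcomes])
    i_yz = mutual_information([(y, z) for _, y, z in outcomes])
    i_joint = mutual_information([((x, y), z) for x, y, z in outcomes])
    return i_xz, i_yz, i_joint


print(information_terms(lambda x, y: (x + y) % 3))  # pairwise terms 0, joint term about 1.585
print(information_terms(lambda x, y: x))            # Z copies X: no information from Y
```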

Impedance Matching and the Choice Between Alternative Pathways for the Origin of Genetic Coding

International Journal of Molecular Sciences ◽

10.3390/ijms21197392 ◽

2020 ◽

Vol 21 (19) ◽

pp. 7392

Author(s):

Peter R. Wills ◽

Charles W. Carter

Keyword(s):

Information Transfer ◽

Impedance Matching ◽

Electronic Energy ◽

Alphabet Size ◽

Alternative Pathways ◽

Gene Replication ◽

Letter Alphabet ◽

Energy Flows ◽

Physical Energy ◽

Chemical Free Energy

We recently observed that errors in gene replication and translation could be seen qualitatively to behave analogously to the impedances in acoustical and electronic energy transducing systems. We develop here quantitative relationships necessary to confirm that analogy and to place it into the context of the minimization of dissipative losses of both chemical free energy and information. The formal developments include expressions for the information transferred from a template to a new polymer, I_σ; an impedance parameter, Z; and an effective alphabet size, n_eff; all of which have non-linear dependences on the fidelity parameter, q, and the alphabet size, n. Surfaces of these functions over the {n,q} plane reveal key new insights into the origin of coding. Our conclusion is that the emergence and evolutionary refinement of information transfer in biology follow principles previously identified to govern physical energy flows, strengthening analogies (i) between chemical self-organization and biological natural selection, and (ii) between the course of evolutionary trajectories and the most probable pathways for time-dependent transitions in physics. Matching the informational impedance of translation to the four-letter alphabet of genes uncovers a pivotal role for the redundancy of triplet codons in preserving as much intrinsic genetic information as possible, especially in early stages when the coding alphabet size was small.
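
To make the joint dependence on fidelity q and alphabet size n concrete, the Python sketch below uses a simple stand-in model. This model is an assumption; it is not the authors' information, impedance or effective-alphabet expressions. In it, each template letter is drawn uniformly and copied correctly with probability q. Otherwise it is replaced by one of the other n − 1 letters, chosen uniformly at random. The information per copied letter is then the mutual information of an n-ary symmetric channel, I = log2 n + q log2 q + (1 − q) log2((1 − q)/(n − 1)). A hypothetical effective alphabet size is taken as 2^I.

```python
import math


def copy_information(n, q):
    """Bits per letter through an n-ary symmetric copying channel with fidelity q.

    Stand-in model (assumption): uniform template letters; each error picks one
    of the other n - 1 letters uniformly at random.
    """
    if n < 2 or not 0.0 <= q <= 1.0:
        raise ValueError("need n >= 2 and 0 <= q <= 1")
    info = math.log2(n)
    if q > 0:
        info += q * math.log2(q)
    err = 1.0 - q
    if err > 0:
        info += err * math.log2(err / (n - 1))
    return info


def effective_alphabet_size(n, q):
    """Hypothetical effective alphabet size: an error-free alphabet carrying the same bits."""
    return 2 ** copy_information(n, q)


# Information per letter for several alphabet sizes and fidelities.
for n in (2, 4, 20):
    print(n, [round(copy_information(n, q), 3) for q in (0.6, 0.8, 0.99)])
```

Under this model, perfect copying (q = 1) transfers the full log2 n bits. Random copying (q = 1/n) transfers nothing.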

Partial Information Decomposition and the Information Delta: A Geometric Unification Disentangling Non-Pairwise Information

10.20944/preprints202009.0661.v1 ◽

2020 ◽

Author(s):

James Kunert-Graf ◽

Nikita Sakhanenko ◽

David Galas

Keyword(s):

Information Theory ◽

Linkage Disequilibrium ◽

Partial Information ◽

Geometric Interpretation ◽

The Other ◽

Letter Alphabet ◽

Open Questions ◽

Robust Measures ◽

Geometric Picture ◽

Discrete Functions

Information theory provides robust measures of multivariable interdependence, but classically does little to characterize the multivariable relationships it detects. The Partial Information Decomposition (PID) characterizes the mutual information between variables by decomposing it into unique, redundant, and synergistic components. This has been usefully applied, particularly in neuroscience, but there is currently no generally accepted method for its computation. Independently, the Information Delta framework characterizes non-pairwise dependencies in genetic datasets. This framework has developed an intuitive geometric interpretation for how discrete functions encode information, but lacks some important generalizations. This paper shows that the PID and Delta frameworks are largely equivalent. We equate their key expressions, allowing for results in one framework to apply towards open questions in the other. For example, we find that the approach of Bertschinger et al. is useful for the open Information Delta question of how to deal with linkage disequilibrium. We also show how PID solutions can be mapped onto the space of delta measures. Using Bertschinger et al. as an example solution, we identify a specific plane in delta-space on which this approach’s optimization is constrained, and compute it for all possible three-variable discrete functions of a three-letter alphabet. This yields a clear geometric picture of how a given solution decomposes information.

Lower Bounds, and Exact Enumeration in Particular Cases, for the Probability of Existence of a Universal Cycle or a Universal Word for a Set of Words

Mathematics ◽

10.3390/math8050778 ◽

2020 ◽

Vol 8 (5) ◽

pp. 778

Author(s):

Herman Z. Q. Chen ◽

Sergey Kitaev ◽

Brian Y. Sun

Keyword(s):

Lower Bounds ◽

Exact Enumeration ◽

Letter Alphabet ◽

De Bruijn Sequences ◽

De Bruijn

A universal cycle, or u-cycle, for a given set of words is a circular word that contains each word from the set exactly once as a contiguous subword. The celebrated de Bruijn sequences are a particular case of such a u-cycle, where the set in question is the set A^n of all words of length n over a k-letter alphabet A. A universal word, or u-word, is a linear, i.e., non-circular, version of the notion of a u-cycle, and it is defined similarly. Removing some words from A^n may or may not result in a set of words for which a u-cycle, or u-word, exists. The goal of this paper is to study the probability of existence of the universal objects in such a situation. We give lower bounds for the probability in general cases, and also derive explicit answers for the case of removing up to two words from A^n, or the case when k = 2 and n ≤ 4.
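
Whether a given set of words has a u-cycle or u-word can be decided with the standard de Bruijn graph argument. Each word w of length n ≥ 2 becomes an edge from its length-(n − 1) prefix to its length-(n − 1) suffix. A u-cycle then corresponds to an Eulerian circuit of the resulting graph, and a u-word to an Eulerian trail. The Python sketch below checks the usual degree and connectivity conditions. It illustrates that reduction rather than the paper's probabilistic analysis. It assumes distinct words of a common length n ≥ 2 and does not treat degenerate cases, such as very small sets, specially.

```python
from collections import defaultdict
from itertools import product


def _word_graph(words):
    """Edges w[:-1] -> w[1:] for each word; returns degree counts and undirected adjacency."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    adj = defaultdict(set)
    for w in words:
        u, v = w[:-1], w[1:]
        out_deg[u] += 1
        in_deg[v] += 1
        adj[u].add(v)
        adj[v].add(u)
    return out_deg, in_deg, adj


def _weakly_connected(adj):
    """True if all vertices that carry edges lie in one undirected component."""
    nodes = list(adj)
    if not nodes:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for y in adj[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(nodes)


def has_ucycle(words):
    """Eulerian circuit: connected, and in-degree equals out-degree at every vertex."""
    out_deg, in_deg, adj = _word_graph(words)
    nodes = set(out_deg) | set(in_deg)
    return _weakly_connected(adj) and all(out_deg[x] == in_deg[x] for x in nodes)


def has_uword(words):
    """Eulerian trail: connected, at most one vertex with out - in = 1 (and one with -1)."""
    out_deg, in_deg, adj = _word_graph(words)
    nodes = set(out_deg) | set(in_deg)
    diffs = [out_deg[x] - in_deg[x] for x in nodes]
    return (
        _weakly_connected(adj)
        and all(d in (-1, 0, 1) for d in diffs)
        and sum(d == 1 for d in diffs) <= 1
    )


all_words = {"".join(p) for p in product("01", repeat=3)}  # A^3 for A = {0, 1}
print(has_ucycle(all_words))            # True: a de Bruijn sequence exists
print(has_ucycle(all_words - {"001"}))  # False
print(has_uword(all_words - {"001"}))   # True
```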

From the four-color theorem to a generalizing “four-letter theorem”: A sketch for “human proof” and the philosophical interpretation

10.31235/osf.io/yf5x7 ◽

2020 ◽

Author(s):

Vasil Dinev Penchev

Keyword(s):

Mathematical Proof ◽

General Theorem ◽

Human Capabilities ◽

Letter Alphabet ◽

Philosophical Interpretation ◽

Four Color Theorem ◽

The Universe

The “four-color” theorem seems to be generalizable as follows. The four-letter alphabet is sufficient to encode unambiguously any set of well-orderings, including a geographical map, the “map” of any logic and thus that of all logics, or the DNA (RNA) plan(s) of any (all) living being(s). Then the corresponding maximally generalizing conjecture would state: anything in the universe or mind can be encoded unambiguously by four letters. That can be formulated as a “four-letter theorem”, and thus one can search for a properly mathematical proof of the statement. It would imply the “four colour theorem”. Many philosophers and mathematicians believe the proof of that theorem not to be entirely satisfactory, because it is not a “human proof”: it is unavoidably mediated by computers, since the necessary calculations fundamentally exceed human capabilities. It is furthermore rather unsatisfactory because it consists in enumerating and proving all cases one by one. Sometimes a more general theorem turns out to be much easier to prove, including by a general “human” method. The particular theorem, too difficult to prove directly, is then implied as a corollary under certain simple conditions. The same approach will be followed as to the four colour theorem, i.e., it is to be deduced more or less trivially from the “four-letter theorem” if the latter is proved. References are only to classical and thus very well-known papers: their complete bibliographic description is omitted.

letter alphabet
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet

The Equation of the Set of Natural Numbers Just to Sum

On Special k-Spectra, k-Locality, and Collapsing Prefix Normal Words

Increased accuracy and speed in whole genome bisulfite read mapping using a two-letter alphabet

Synchronizing Almost-Group Automata

Partial Information Decomposition and the Information Delta: A Geometric Unification Disentangling Non-Pairwise Information

Impedance Matching and the Choice Between Alternative Pathways for the Origin of Genetic Coding

Partial Information Decomposition and the Information Delta: A Geometric Unification Disentangling Non-Pairwise Information

Lower Bounds, and Exact Enumeration in Particular Cases, for the Probability of Existence of a Universal Cycle or a Universal Word for a Set of Words

From the four-color theorem to a generalizing “four-letter theorem”: A sketch for “human proof” and the philosophical interpretation
