Asymptotically optimal minimizers schemes

2018 ◽  
Author(s):  
Guillaume Marçais ◽  
Dan DeBlasio ◽  
Carl Kingsford

Abstract. Motivation: The minimizers technique is a method to sample k-mers that is used in many bioinformatics software tools to reduce computation, memory usage and run time. The number of applications using minimizers keeps growing steadily. Despite its many uses, the theoretical understanding of minimizers is still very limited. In many applications, selecting as few k-mers as possible (i.e. having a low density) is beneficial. The density is highly dependent on the choice of the order on the k-mers. Different applications use different orders, but none of these orders are optimal. A better understanding of minimizers schemes, and the related local and forward schemes, will allow designing schemes with lower density, thereby making existing and future bioinformatics tools even more efficient. Results: From the analysis of the asymptotic behavior of minimizers, forward and local schemes, we show that the previously believed lower bound on minimizers schemes does not hold, and that schemes with density lower than thought possible actually exist. The proof is constructive and leads to an efficient algorithm to compare k-mers. These orders are the first known orders that are asymptotically optimal. Additionally, we give improved bounds on the density achievable by the three types of schemes.

1983 ◽  
Vol 15 (03) ◽  
pp. 507-530 ◽  
Author(s):  
G. Bordes ◽  
B. Roehner

We are interested in obtaining bounds for the spectrum of the infinite Jacobi matrix of a birth and death process or of any process (with nearest-neighbour interactions) defined by a similar Jacobi matrix. To this aim we use some results of Stieltjes theory for S-fractions, after reviewing them. We prove a general theorem giving a lower bound of the spectrum. The theorem also gives sufficient conditions for the spectrum to be discrete. The expression for the lower bound is then worked out explicitly for several, fairly general, classes of birth and death processes. A conjecture about the asymptotic behavior of a special class of birth and death processes is presented.


2020 ◽  
Vol 156 (8) ◽  
pp. 1699-1717
Author(s):  
Li Lai ◽  
Pin Yu

Abstract. We prove that, for any small $\varepsilon > 0$, the number of irrationals among the following odd zeta values: $\zeta (3),\zeta (5),\zeta (7),\ldots ,\zeta (s)$ is at least $( c_0 - \varepsilon )({s^{1/2}}/{(\log s)^{1/2}})$, provided $s$ is a sufficiently large odd integer with respect to $\varepsilon$. The constant $c_0 = 1.192507\ldots$ can be expressed in closed form. Our work improves the lower bound $2^{(1-\varepsilon )({\log s}/{\log \log s})}$ of the previous work of Fischler, Sprang and Zudilin. We follow the same strategy as Fischler, Sprang and Zudilin. The main new ingredient is an asymptotically optimal design for the zeros of the auxiliary rational functions, which relates to the inverse totient problem.


Life ◽  
2020 ◽  
Vol 10 (6) ◽  
pp. 95
Author(s):  
Hee-Eun Lee ◽  
Jae-Won Huh ◽  
Heui-Soo Kim

Transposable elements (TEs) can insert into certain parts of the genome, and through such insertions TEs can generate new factors, among them microRNAs (miRNAs). miRNAs are non-coding RNAs of 19 to 24 nucleotides, and numerous miRNAs are derived from TEs. In this study, to support general knowledge on TEs and TE-derived miRNAs, several bioinformatics tools and databases were used to analyze TE-derived miRNAs in two respects: evolution and human disease. The distribution of TEs across diverse species shows that almost half of the genome is covered by TEs in mammals, and less than half in other vertebrates and invertebrates. Based on selected evolution-related miRNA studies, a total of 51 miRNAs derived from TEs were found and analyzed. For human disease-related miRNAs, a total of 34 TE-derived miRNAs were compiled from previous studies. In summary, abundant TE-derived miRNAs have been found; however, their function has not yet been well characterized. This study therefore provides a theoretical understanding of TE-derived miRNAs using various bioinformatics tools.


2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i119-i127
Author(s):  
Hongyu Zheng ◽  
Carl Kingsford ◽  
Guillaume Marçais

Abstract. Motivation: Minimizers are methods to sample k-mers from a string, with the guarantee that similar sets of k-mers will be chosen on similar strings. A minimizer is parameterized by the k-mer length k, a window length w and an order on the k-mers. Minimizers are used in a large number of software tools and pipelines to improve computational efficiency and decrease memory usage. Despite the method’s popularity, many theoretical questions regarding its performance remain open. The core metric for measuring the performance of a minimizer is the density, which measures the sparsity of the sampled k-mers. The theoretical optimal density for a minimizer is 1/w, provably not achievable in general. For given k and w, little is known about asymptotically optimal minimizers, that is, minimizers with density O(1/w). Results: We derive a necessary and sufficient condition for the existence of asymptotically optimal minimizers. We also provide a randomized algorithm, called the Miniception, to design minimizers with the best theoretical guarantee to date on density in practical scenarios. Constructing and using the Miniception is as easy as constructing and using a random minimizer, which allows the design of efficient minimizers that scale to the values of k and w used in current bioinformatics software programs. Availability and implementation: A reference implementation of the Miniception and the code for analysis can be found at https://github.com/kingsford-group/miniception. Supplementary information: Supplementary data are available at Bioinformatics online.
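To make the parameters k, w and the order concrete, here is a minimal Python sketch of a plain minimizer and its empirical density (selected k-mers divided by total k-mers). It is not the Miniception and is not taken from the authors' repository; the function names `minimizer_positions`, `density` and `hash_order`, the leftmost tie-breaking rule, and the use of SHA-256 as a stand-in for a random order are all assumptions made for illustration.

```python
import hashlib
import random


def minimizer_positions(seq, k, w, order):
    """Return sorted start positions of the k-mers picked by a minimizer.

    Every window of w consecutive k-mers contributes its smallest k-mer
    under `order`; ties go to the leftmost position (an assumed convention).
    """
    if len(seq) < k + w - 1:
        raise ValueError("sequence is shorter than one window (k + w - 1)")
    n_kmers = len(seq) - k + 1
    keys = [order(seq[i:i + k]) for i in range(n_kmers)]
    selected = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        selected.add(min(window, key=lambda i: (keys[i], i)))
    return sorted(selected)


def density(seq, k, w, order):
    """Fraction of k-mer positions in `seq` that the minimizer selects."""
    n_kmers = len(seq) - k + 1
    return len(minimizer_positions(seq, k, w, order)) / n_kmers


def hash_order(kmer):
    """Pseudo-random order on k-mers: compare by SHA-256 digest (illustrative)."""
    return hashlib.sha256(kmer.encode()).digest()


# Lexicographic order on a toy string: k-mers AC, CG, GT, TA, AC, CG, GT.
print(minimizer_positions("ACGTACGT", 2, 3, lambda kmer: kmer))  # → [0, 1, 4]

# Empirical density of the hash-based order on a random DNA string.
rng = random.Random(0)
dna = "".join(rng.choice("ACGT") for _ in range(2000))
print(density(dna, 8, 10, hash_order))
```

Because every window of w consecutive k-mers must contain a selected position, consecutive selected positions are never more than w apart, which is why the density cannot fall below roughly 1/w.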


2017 ◽  
Vol 13 (05) ◽  
pp. 1301-1315 ◽  
Author(s):  
Charles Burnette ◽  
Eric Schmutz

If [Formula: see text] is a polynomial of degree [Formula: see text] in [Formula: see text], let [Formula: see text] be the number of cycles of length [Formula: see text] in the directed graph on [Formula: see text] with edges [Formula: see text] For random polynomials, the numbers [Formula: see text] have asymptotic behavior resembling that for the cycle lengths of random functions [Formula: see text] However random polynomials differ from random functions in important ways. For example, given the set of cyclic (periodic) points, it is not necessarily true that all permutations of those cyclic points are equally likely to occur as the restriction of [Formula: see text]. This, and the limitations of Lagrange interpolation, together complicate research on [Formula: see text] the ultimate period of [Formula: see text] under compositional iteration. We prove a lower bound for the average value of [Formula: see text]: if [Formula: see text], but [Formula: see text], then the expected value of [Formula: see text] is [Formula: see text] where the sum is over all [Formula: see text] polynomials of degree [Formula: see text] in [Formula: see text]. Similar results are proved for rational functions.


Author(s):  
Hongyu Zheng ◽  
Carl Kingsford ◽  
Guillaume Marçais

Abstract. Motivation: Minimizers are methods to sample k-mers from a sequence, with the guarantee that similar sets of k-mers will be chosen on similar sequences. A minimizer is parameterized by the k-mer length k, a window length w and an order on the k-mers. Minimizers are used in a large number of software tools and pipelines to improve computational efficiency and decrease memory usage. Despite the method’s popularity, many theoretical questions regarding its performance remain open. The core metric for measuring the performance of a minimizer is the density, which measures the sparsity of the sampled k-mers. The theoretical optimal density for a minimizer is 1/w, provably not achievable in general. For given k and w, little is known about asymptotically optimal minimizers, that is, minimizers with density O(1/w). Results: We derive a necessary and sufficient condition for the existence of asymptotically optimal minimizers. We also provide a randomized algorithm, called the Miniception, to design minimizers with the best theoretical guarantee to date on density in practical scenarios. Constructing and using the Miniception is as easy as constructing and using a random minimizer, which allows the design of efficient minimizers that scale to the values of k and w used in current bioinformatics software programs. Availability: A reference implementation of the Miniception and the code for analysis can be found at https://github.com/kingsford-group/miniception.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2074 ◽  
Author(s):  
Kenzo-Hugo Hillion ◽  
Ivan Kuzmin ◽  
Anton Khodak ◽  
Eric Rasche ◽  
Michael Crusoe ◽  
...  

Workbench and workflow systems such as Galaxy, Taverna, Chipster, or Common Workflow Language (CWL)-based frameworks, facilitate the access to bioinformatics tools in a user-friendly, scalable and reproducible way. Still, the integration of tools in such environments remains a cumbersome, time consuming and error-prone process. A major consequence is the incomplete or outdated description of tools that are often missing important information, including parameters and metadata such as publication or links to documentation. ToolDog (Tool DescriptiOn Generator) facilitates the integration of tools - which have been registered in the ELIXIR tools registry (https://bio.tools) - into workbench environments by generating tool description templates. ToolDog includes two modules. The first module analyses the source code of the bioinformatics software with language-specific plugins, and generates a skeleton for a Galaxy XML or CWL tool description. The second module is dedicated to the enrichment of the generated tool description, using metadata provided by bio.tools. This last module can also be used on its own to complete or correct existing tool descriptions with missing metadata.


Author(s):  
Mikhail V. Berlinkov ◽  
Cyril Nicaud

In this paper we address the question of synchronizing random automata in the critical settings of almost-group automata. Group automata are automata where all letters act as permutations on the set of states, and they are not synchronizing (unless they have one state). In almost-group automata, one of the letters acts as a permutation on [Formula: see text] states, and the others as permutations. We prove that this small change is enough for automata to become synchronizing with high probability. More precisely, we establish that the probability that a strongly-connected almost-group automaton is not synchronizing is [Formula: see text], for a [Formula: see text]-letter alphabet. We also present an efficient algorithm that decides whether a strongly-connected almost-group automaton is synchronizing. For a natural model of computation, we establish a [Formula: see text] worst-case lower bound for this problem ([Formula: see text] for the average case), which is almost matched by our algorithm.

