Matchtigs: minimum plain text representation of kmer sets

2021 ◽  
Author(s):  
Sebastian Schmidt ◽  
Shahbaz Khan ◽  
Jarno Alanko ◽  
Alexandru I. Tomescu

Kmer-based methods are widely used in bioinformatics, which raises the question of what is the smallest practically usable representation (i.e. plain text) of a set of kmers. We propose a polynomial algorithm computing a minimum such representation (which was previously posed as a potentially NP-hard open problem), as well as an efficient near-minimum greedy heuristic. When compressing genomes of large model organisms, read sets thereof or bacterial pangenomes, with only a minor runtime increase, we decrease the size of the representation by up to 60% over unitigs and 27% over previous work. Additionally, the number of strings is decreased by up to 97% over unitigs and 91% over previous work. Finally we show that a small representation has advantages in downstream applications, as it speeds up queries on the popular kmer indexing tool Bifrost by 1.66x over unitigs and 1.29x over previous work.
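To make the notion of a plain text representation concrete, the following is a minimal sketch, not the authors' algorithm: it only checks whether a set of strings spells exactly a given kmer set and measures the total length of that representation. The function names are hypothetical, and reverse complements are ignored for simplicity (an assumption; the abstract does not say how they are handled).

```python
def kmers_of(strings, k):
    """Return the set of all length-k substrings occurring in the given strings."""
    return {s[i:i + k] for s in strings for i in range(len(s) - k + 1)}


def cumulative_length(strings):
    """Total number of characters used by a plain text representation."""
    return sum(len(s) for s in strings)


def represents(strings, kmer_set, k):
    """True if the strings contain exactly the kmers in kmer_set (assumed criterion)."""
    return kmers_of(strings, k) == set(kmer_set)


# Toy example: three overlapping 3-mers.
kmer_set = {"ACG", "CGT", "GTA"}
one_string_per_kmer = sorted(kmer_set)   # naive representation
merged = ["ACGTA"]                       # overlaps shared between kmers

print(represents(one_string_per_kmer, kmer_set, 3), cumulative_length(one_string_per_kmer))
print(represents(merged, kmer_set, 3), cumulative_length(merged))
```

Both string sets represent the same kmers, but the merged one uses fewer characters and fewer strings, which are the two quantities the abstract reports reductions in.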

2019 ◽  
Vol 29 (04) ◽  
pp. 289-299
Author(s):  
Alexander Pilz ◽  
Carlos Seara
Keyword(s):  
Np Hard ◽  

We consider quadrangulations of red and blue points in the plane where each face is convex and no edge connects two points of the same color. In particular, we show that the following problem is NP-hard: Given a finite set [Formula: see text] of points with each point either red or blue, does there exist a convex quadrangulation of [Formula: see text] in such a way that the predefined colors give a valid vertex 2-coloring of the quadrangulation? We consider this as a step towards solving the corresponding long-standing open problem on monochromatic point sets.


Antibiotics ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 224 ◽  
Author(s):  
Luis G. Alves ◽  
João F. Portel ◽  
Sílvia A. Sousa ◽  
Olga Ferreira ◽  
Stephanie Almada ◽  
...  

A series of cyclam- and cyclen-derived salts are described in the present work; they were designed specifically to gain insights into their structure and antibacterial activity towards Staphylococcus aureus and Escherichia coli, used respectively, as Gram-positive and Gram-negative model organisms. The newly synthesized compounds are monosubstituted and trans-disubstituted tetraazamacrocycles that display benzyl, methylbenzyl, trifluoromethylbenzyl, or trifluoroethylbenzyl substituents appended on the nitrogen atoms of the macrocyclic ring. The results obtained show that the chemical nature, polarity, and substitution patterns of the benzyl groups, as well as the number of pendant arms, are critical parameters for the antibacterial activity of the cyclam-based salts. The most active compounds against both bacterial strains were the trans-disubstituted cyclam salts displaying CF3 groups in the para-position of the aromatic rings of the macrocyclic pendant arms. The analogous cyclen species presents a lower activity, revealing that the size of the macrocyclic backbone is an important requirement for the antibacterial activity of the tetraazamacrocycles. The nature of the anionic counterparts present on the salts was found to play a minor role in the antibacterial activity.


2021 ◽  
Author(s):  
Xilin Yu ◽  
Thien Le ◽  
Sarah A. Christensen ◽  
Erin K. Molloy ◽  
Tandy Warnow

One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a "supertree method". Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. We also present GreedyRFS (a greedy heuristic that operates by repeatedly using Exact-RFS-2 on pairs of trees, until all the trees are merged into a single supertree). We evaluate Exact-RFS-2 and GreedyRFS, and show that they have better accuracy than the current leading heuristic for RFS. Exact-RFS-2 and GreedyRFS are available in open source form on GitHub at github.com/yuxilin51/GreedyRFS
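The RFS criterion builds on the Robinson-Foulds distance between trees. As an illustration only (this is not Exact-RFS-2 or GreedyRFS), the sketch below computes that distance for two trees on the same leaf set by comparing their non-trivial bipartitions. The nested-tuple tree encoding and the function names are assumptions made for this example.

```python
def leaf_set(tree):
    """Collect the leaf labels of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions of the tree, viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so the two sides of one split map to the same set.
    """
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if ref in below else below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(rf_distance(t1, t2))
```

The two example trees share the split separating d and e from the rest and differ in one split each, so the distance is 2.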


Author(s):  
Anthony Mansfield

The thickness of a graph is a measure of its nonplanarity and has applications in the theory of printed circuits. To determine the thickness of an arbitrary graph is a seemingly intractable problem. This is made precise in this paper where we answer an open problem of Garey and Johnson (2) by proving that it is NP-complete to decide whether a graph has thickness two.


Cells ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 334
Author(s):  
Jaume Forés-Martos ◽  
Anabel Forte ◽  
José García-Martínez ◽  
José E. Pérez-Ortín

The ultimate goal of gene expression regulation is on the protein level. However, because the amounts of mRNAs and proteins are controlled by their synthesis and degradation rates, the cellular amount of a given protein can be attained by following different strategies. By studying omics data for six expression variables (mRNA and protein amounts, plus their synthesis and decay rates), we previously demonstrated the existence of common expression strategies (CESs) for functionally related genes in the yeast Saccharomyces cerevisiae. Here we extend that study to two other eukaryotes: the yeast Schizosaccharomyces pombe and cultured human HeLa cells. We also use genomic data from the model prokaryote Escherichia coli as an external reference. We show that six-variable profiles (6VPs) can be constructed for every gene and that these 6VPs are similar for genes with similar functions in all the studied organisms. The differences in 6VPs between organisms can be used to establish their phylogenetic relationships. The analysis of the correlations among the six variables supports the hypothesis that most gene expression control occurs in actively growing organisms at the transcription rate level, and that translation plays a minor role. We propose that living organisms use CESs for the genes acting on the same physiological pathways, especially for those belonging to stable macromolecular complexes, but CESs have been modeled by evolution to adapt to the specific life circumstances of each organism.


2015 ◽  
Vol. 17 no. 2 (Discrete Algorithms) ◽  
Author(s):  
Gwenaël Joret ◽  
Adrian Vetta

We consider the rank reduction problem for matroids: Given a matroid $M$ and an integer $k$, find a minimum size subset of elements of $M$ whose removal reduces the rank of $M$ by at least $k$. When $M$ is a graphical matroid this problem is the minimum $k$-cut problem, which admits a 2-approximation algorithm. In this paper we show that the rank reduction problem for transversal matroids is essentially at least as hard to approximate as the densest $k$-subgraph problem. We also prove that, while the problem is easily solvable in polynomial time for partition matroids, it is NP-hard when considering the intersection of two partition matroids. Our proof shows, in particular, that the maximum vertex cover problem is NP-hard on bipartite graphs, which answers an open problem of B. Simeone.
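For partition matroids, one way to see why the problem is tractable is a small dynamic program. The sketch below is our own illustration under stated assumptions, not the method from the paper: each block is given as a hypothetical (size, capacity) pair, a set is independent if it takes at most capacity elements from each block, and so a block contributes min(size, capacity) to the rank. Lowering a block's contribution by j ≥ 1 then requires removing max(0, size − capacity) + j of its elements.

```python
def min_removals_partition_matroid(blocks, k):
    """Minimum number of elements to remove so the rank drops by at least k.

    blocks: list of (size, capacity) pairs, one per block (assumed encoding).
    Returns None if the rank is smaller than k.
    """
    INF = float("inf")
    # best[r] = fewest removals achieving a rank reduction of r (capped at k).
    best = [0] + [INF] * k
    for size, capacity in blocks:
        block_rank = min(size, capacity)
        slack = max(0, size - capacity)  # removals that do not lower the rank
        updated = best[:]
        for r in range(k + 1):
            if best[r] == INF:
                continue
            for j in range(1, block_rank + 1):
                target = min(k, r + j)
                cost = best[r] + slack + j
                if cost < updated[target]:
                    updated[target] = cost
        best = updated
    return None if best[k] == INF else best[k]


# Two blocks: 5 elements with capacity 2, and 3 elements with capacity 3.
print(min_removals_partition_matroid([(5, 2), (3, 3)], 2))
```

In the example the second block has no slack, so removing two of its elements already lowers the rank by 2. The table has k + 1 entries and each block is scanned once per entry, so the running time is polynomial in the input size.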


2013 ◽  
Vol. 15 no. 1 (Graph Theory) ◽  
Author(s):  
Olga Glebova ◽  
Yury Metelsky ◽  
Pavel Skums

A Krausz (k,m)-partition of a graph G is a decomposition of G into cliques, such that any vertex belongs to at most k cliques and any two cliques have at most m vertices in common. The m-Krausz dimension kdim_m(G) of the graph G is the minimum number k such that G has a Krausz (k,m)-partition. In particular, 1-Krausz dimension or simply Krausz dimension kdim(G) is a well-known graph-theoretical parameter. In this paper we prove that the problem "kdim(G)≤3" is polynomially solvable for chordal graphs, thus partially solving the open problem of P. Hlineny and J. Kratochvil. We solve another open problem of P. Hlineny and J. Kratochvil by proving that the problem of finding Krausz dimension is NP-hard for split graphs and complements of bipartite graphs. We show that the problem of finding m-Krausz dimension is NP-hard for every m≥1, but the problem "kdim_m(G)≤k" is fixed-parameter tractable when parameterized by k and m for (∞,1)-polar graphs. Moreover, the class of (∞,1)-polar graphs with kdim_m(G)≤k is characterized by a finite list of forbidden induced subgraphs for every k,m≥1.
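The definition of a Krausz (k,m)-partition can be checked mechanically for a candidate family of cliques. The following sketch is only a verifier, not an algorithm for computing the Krausz dimension, and it assumes that "decomposition into cliques" means every edge of G lies in at least one listed clique. The function name and the edge-list encoding are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_krausz_partition(edges, cliques, k, m):
    """Check a candidate Krausz (k,m)-partition of the graph given by edges."""
    edge_set = {frozenset(e) for e in edges}
    cliques = [frozenset(c) for c in cliques]

    # Every listed vertex set must induce a clique in G.
    covered = set()
    for clique in cliques:
        for u, v in combinations(clique, 2):
            pair = frozenset((u, v))
            if pair not in edge_set:
                return False
            covered.add(pair)

    # Assumed reading of "decomposition": every edge lies in some clique.
    if covered != edge_set:
        return False

    # Any vertex belongs to at most k cliques.
    membership = Counter(v for clique in cliques for v in clique)
    if any(count > k for count in membership.values()):
        return False

    # Any two cliques share at most m vertices.
    return all(len(a & b) <= m for a, b in combinations(cliques, 2))


# Triangle a-b-c with a pendant edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(is_krausz_partition(edges, [{"a", "b", "c"}, {"c", "d"}], k=2, m=1))
```

Here vertex c lies in two cliques, so the same family fails the check for k = 1.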


2020 ◽  
Author(s):  
Jaume Forés-Martos ◽  
Anabel Forte ◽  
José García-Martínez ◽  
José E. Pérez-Ortín

The ultimate goal of gene regulation should focus on the protein level. However, as mRNA is an obligate intermediary, and because the amounts of mRNAs and proteins are controlled by their synthesis and degradation rates, the cellular amount of a given protein can be attained following different strategies. By studying omics datasets for six expression variables (mRNA and protein amounts, plus their synthesis and decay rates), we previously demonstrated the existence of common expression strategies (CES) for functionally-related genes in the yeast Saccharomyces cerevisiae. Here we extend that study to two other eukaryotes: the distantly related yeast Schizosaccharomyces pombe and cultured human HeLa cells. We also use genomic datasets from the model prokaryote Escherichia coli as an external reference. We show that CES are also present in all the studied organisms and the differences in them between organisms can be used to establish their phylogenetic relationships. The phenogram based on six-variable profiles (6VP) has the expected topology for the phylogeny of these four organisms, but shows interesting branch length differences to DNA sequence-based trees. The analysis of the correlations among the six variables supports the hypothesis that most gene expression control occurs in actively growing organisms at the transcription rate level, and that translation plays a minor role in it. We propose that all living cells use CES for the genes acting on the same physiological pathways, especially for those belonging to stable macromolecular complexes, but CES have been modeled by evolution to adapt to the specific life circumstances of each organism. The obtained phenograms may reflect both evolutionary constraints in expression strategies, and lifestyle convergences.


2021 ◽  
Vol 12 ◽  
Author(s):  
Maksim Sysoev ◽  
Stefan W. Grötzinger ◽  
Dominik Renn ◽  
Jörg Eppinger ◽  
Magnus Rueping ◽  
...  

Extremophiles are remarkable organisms that thrive in the harshest environments on Earth, such as hydrothermal vents, hypersaline lakes and pools, alkaline soda lakes, deserts, cold oceans, and volcanic areas. These organisms have developed several strategies to overcome environmental stress and nutrient limitations. Thus, they are among the best model organisms to study adaptive mechanisms that lead to stress tolerance. Genetic and structural information derived from extremophiles and extremozymes can be used for bioengineering other nontolerant enzymes. Furthermore, extremophiles can be a valuable resource for novel biotechnological and biomedical products due to their biosynthetic properties. However, understanding life under extreme conditions is challenging due to the difficulties of in vitro cultivation and observation since >99% of organisms cannot be cultivated. Consequently, only a minor percentage of the potential extremophiles on Earth have been discovered and characterized. Herein, we present a review of culture-independent methods, sequence-based metagenomics (SBM), and single amplified genomes (SAGs) for studying enzymes from extremophiles, with a focus on prokaryotic (archaea and bacteria) microorganisms. Additionally, we provide a comprehensive list of extremozymes discovered via metagenomics and SAGs.


10.37236/5321 ◽  
2016 ◽  
Vol 23 (1) ◽  
Author(s):  
Daniel J. Harvey ◽  
David R. Wood

Mader first proved that high average degree forces a given graph as a minor. Often motivated by Hadwiger's Conjecture, much research has focused on the average degree required to force a complete graph as a minor. Subsequently, various authors have considered the average degree required to force an arbitrary graph $H$ as a minor. Here, we strengthen (under certain conditions) a recent result by Reed and Wood, giving better bounds on the average degree required to force an $H$-minor when $H$ is a sparse graph with many high degree vertices. This solves an open problem of Reed and Wood, and also generalises (to within a constant factor) known results when $H$ is an unbalanced complete bipartite graph.

