Matchtigs: minimum plain text representation of kmer sets

Kmer-based methods are widely used in bioinformatics, which raises the question of what is the smallest practically usable representation (i.e. plain text) of a set of kmers. We propose a polynomial algorithm computing a minimum such representation (which was previously posed as a potentially NP-hard open problem), as well as an efficient near-minimum greedy heuristic. When compressing genomes of large model organisms, read sets thereof or bacterial pangenomes, with only a minor runtime increase, we decrease the size of the representation by up to 60% over unitigs and 27% over previous work. Additionally, the number of strings is decreased by up to 97% over unitigs and 91% over previous work. Finally, we show that a small representation has advantages in downstream applications, as it speeds up queries on the popular kmer indexing tool Bifrost by 1.66x over unitigs and 1.29x over previous work.
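
To make the notion of a plain text representation concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes a representation is simply a list of strings whose length-k substrings are exactly the kmer set, and it ignores reverse complements. The helper names `kmers_of` and `cumulative_length` are hypothetical.

```python
def kmers_of(strings, k):
    """Return the set of all length-k substrings of the given strings."""
    return {s[i:i + k] for s in strings for i in range(len(s) - k + 1)}


def cumulative_length(strings):
    """Total number of characters needed to write the strings as plain text."""
    return sum(len(s) for s in strings)


k = 3
one_string_per_kmer = ["ACG", "CGT", "GTA"]  # naive: each kmer stored separately
merged = ["ACGTA"]                           # overlapping kmers stored once

# Both lists represent the same kmer set, but the merged one is shorter
# and uses fewer strings.
assert kmers_of(one_string_per_kmer, k) == kmers_of(merged, k)
print(cumulative_length(one_string_per_kmer), cumulative_length(merged))  # 9 5
```

The two quantities the abstract reports, the size of the representation and the number of strings, correspond here to `cumulative_length(...)` and `len(...)` of the list.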

Convex Quadrangulations of Bichromatic Point Sets

International Journal of Computational Geometry & Applications ◽

10.1142/s0218195919500109 ◽

2019 ◽

Vol 29 (04) ◽

pp. 289-299

Author(s):

Alexander Pilz ◽

Carlos Seara

Keyword(s):

Open Problem ◽

Np Hard ◽

Point Sets ◽

Finite Set

We consider quadrangulations of red and blue points in the plane where each face is convex and no edge connects two points of the same color. In particular, we show that the following problem is NP-hard: Given a finite set [Formula: see text] of points with each point either red or blue, does there exist a convex quadrangulation of [Formula: see text] in such a way that the predefined colors give a valid vertex 2-coloring of the quadrangulation? We consider this as a step towards solving the corresponding long-standing open problem on monochromatic point sets.

Investigations into the Structure/Antibacterial Activity Relationships of Cyclam and Cyclen Derivatives

Antibiotics ◽

10.3390/antibiotics8040224 ◽

2019 ◽

Vol 8 (4) ◽

pp. 224 ◽

Cited By ~ 1

Author(s):

Luis G. Alves ◽

João F. Portel ◽

Sílvia A. Sousa ◽

Olga Ferreira ◽

Stephanie Almada ◽

...

Keyword(s):

Antibacterial Activity ◽

Chemical Nature ◽

Critical Parameters ◽

Model Organisms ◽

Minor Role ◽

Bacterial Strains ◽

Aromatic Rings ◽

Gram Negative ◽

A Minor ◽

Pendant Arms

A series of cyclam- and cyclen-derived salts are described in the present work; they were designed specifically to gain insights into their structure and antibacterial activity towards Staphylococcus aureus and Escherichia coli, used, respectively, as Gram-positive and Gram-negative model organisms. The newly synthesized compounds are monosubstituted and trans-disubstituted tetraazamacrocycles that display benzyl, methylbenzyl, trifluoromethylbenzyl, or trifluoroethylbenzyl substituents appended on the nitrogen atoms of the macrocyclic ring. The results obtained show that the chemical nature, polarity, and substitution patterns of the benzyl groups, as well as the number of pendant arms, are critical parameters for the antibacterial activity of the cyclam-based salts. The most active compounds against both bacterial strains were the trans-disubstituted cyclam salts displaying CF3 groups in the para-position of the aromatic rings of the macrocyclic pendant arms. The analogous cyclen species presents a lower activity, revealing that the size of the macrocyclic backbone is an important requirement for the antibacterial activity of the tetraazamacrocycles. The nature of the anionic counterparts present on the salts was found to play a minor role in the antibacterial activity.

Using Robinson-Foulds Supertrees in Divide-and-Conquer Phylogeny Estimation

10.21203/rs.3.rs-174421/v1 ◽

2021 ◽

Author(s):

Xilin Yu ◽

Thien Le ◽

Sarah A. Christensen ◽

Erin K. Molloy ◽

Tandy Warnow

Keyword(s):

Optimization Problems ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Tree Of Life ◽

Divide And Conquer ◽

Greedy Heuristic ◽

Mcmc Methods ◽

Np Hard ◽

Phylogeny Estimation ◽

Source Form

One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a "supertree method". Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. We also present GreedyRFS (a greedy heuristic that operates by repeatedly using Exact-RFS-2 on pairs of trees, until all the trees are merged into a single supertree). We evaluate Exact-RFS-2 and GreedyRFS, and show that they have better accuracy than the current leading heuristic for RFS. Exact-RFS-2 and GreedyRFS are available in open source form on GitHub at github.com/yuxilin51/GreedyRFS
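
The RFS criterion builds on the Robinson-Foulds distance, which counts the leaf bipartitions found in one tree but not in the other. The sketch below illustrates only that distance for two trees on the same leaf set; it is not Exact-RFS-2 or GreedyRFS. It assumes trees are written as nested Python tuples with string leaf labels, and the function names are hypothetical. The count is the plain symmetric difference, without the halving or normalization some definitions apply.

```python
def leaves(tree):
    """Return the frozenset of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial leaf bipartitions induced by the tree's edges."""
    all_leaves = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        below = leaves(node)
        other = all_leaves - below
        # Skip trivial splits that separate zero or one leaf from the rest.
        if len(below) > 1 and len(other) > 1:
            found.add(frozenset([below, other]))
        for child in node:
            visit(child)

    visit(tree)
    return found


def rf_distance(t1, t2):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(bipartitions(t1) ^ bipartitions(t2))


t1 = (("A", "B"), ("C", "D"))  # split AB | CD
t2 = (("A", "C"), ("B", "D"))  # split AC | BD
print(rf_distance(t1, t2))  # 2
```

Because bipartitions are unordered pairs of leaf sets, two different rootings of the same unrooted tree give distance 0 under this sketch.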

Determining the thickness of graphs is NP-hard

Mathematical Proceedings of the Cambridge Philosophical Society ◽

10.1017/s030500410006028x ◽

1983 ◽

Vol 93 (1) ◽

pp. 9-23 ◽

Cited By ~ 47

Author(s):

Anthony Mansfield

Keyword(s):

Open Problem ◽

Np Hard ◽

Arbitrary Graph ◽

Printed Circuits ◽

Np Complete ◽

Intractable Problem

The thickness of a graph is a measure of its nonplanarity and has applications in the theory of printed circuits. To determine the thickness of an arbitrary graph is a seemingly intractable problem. This is made precise in this paper where we answer an open problem of Garey and Johnson (2) by proving that it is NP-complete to decide whether a graph has thickness two.

A Trans-Omics Comparison Reveals Common Gene Expression Strategies in Four Model Organisms and Exposes Similarities and Differences between Them

Cells ◽

10.3390/cells10020334 ◽

2021 ◽

Vol 10 (2) ◽

pp. 334

Author(s):

Jaume Forés-Martos ◽

Anabel Forte ◽

José García-Martínez ◽

José E. Pérez-Ortín

Keyword(s):

Gene Expression ◽

Gene Expression Regulation ◽

Decay Rates ◽

Model Organisms ◽

Minor Role ◽

Yeast Saccharomyces Cerevisiae ◽

Degradation Rates ◽

Gene Expression Control ◽

Living Organisms ◽

A Minor

The ultimate goal of gene expression regulation is on the protein level. However, because the amounts of mRNAs and proteins are controlled by their synthesis and degradation rates, the cellular amount of a given protein can be attained by following different strategies. By studying omics data for six expression variables (mRNA and protein amounts, plus their synthesis and decay rates), we previously demonstrated the existence of common expression strategies (CESs) for functionally related genes in the yeast Saccharomyces cerevisiae. Here we extend that study to two other eukaryotes: the yeast Schizosaccharomyces pombe and cultured human HeLa cells. We also use genomic data from the model prokaryote Escherichia coli as an external reference. We show that six-variable profiles (6VPs) can be constructed for every gene and that these 6VPs are similar for genes with similar functions in all the studied organisms. The differences in 6VPs between organisms can be used to establish their phylogenetic relationships. The analysis of the correlations among the six variables supports the hypothesis that most gene expression control occurs in actively growing organisms at the transcription rate level, and that translation plays a minor role. We propose that living organisms use CESs for the genes acting on the same physiological pathways, especially for those belonging to stable macromolecular complexes, but CESs have been modeled by evolution to adapt to the specific life circumstances of each organism.

Reducing the rank of a matroid

Discrete Mathematics & Theoretical Computer Science ◽

10.46298/dmtcs.2135 ◽

2015 ◽

Vol 17 no. 2 (Discrete Algorithms) ◽

Author(s):

Gwenaël Joret ◽

Adrian Vetta

Keyword(s):

Approximation Algorithm ◽

Polynomial Time ◽

Open Problem ◽

Vertex Cover ◽

Minimum Size ◽

Np Hard ◽

Rank Reduction ◽

Reduction Problem ◽

International Audience ◽

Cover Problem

We consider the rank reduction problem for matroids: Given a matroid $M$ and an integer $k$, find a minimum size subset of elements of $M$ whose removal reduces the rank of $M$ by at least $k$. When $M$ is a graphical matroid this problem is the minimum $k$-cut problem, which admits a 2-approximation algorithm. In this paper we show that the rank reduction problem for transversal matroids is essentially at least as hard to approximate as the densest $k$-subgraph problem. We also prove that, while the problem is easily solvable in polynomial time for partition matroids, it is NP-hard when considering the intersection of two partition matroids. Our proof shows, in particular, that the maximum vertex cover problem is NP-hard on bipartite graphs, which answers an open problem of B. Simeone.

Krausz dimension and its generalizations in special graph classes

Discrete Mathematics & Theoretical Computer Science ◽

10.46298/dmtcs.623 ◽

2013 ◽

Vol 15 no. 1 (Graph Theory) ◽

Author(s):

Olga Glebova ◽

Yury Metelsky ◽

Pavel Skums

Keyword(s):

Open Problem ◽

Chordal Graphs ◽

Np Hard ◽

Induced Subgraphs ◽

Fixed Parameter Tractable ◽

Graph Classes ◽

Fixed Parameter ◽

Minimum Number ◽

International Audience ◽

Split Graphs

A Krausz (k,m)-partition of a graph G is a decomposition of G into cliques, such that any vertex belongs to at most k cliques and any two cliques have at most m vertices in common. The m-Krausz dimension kdim_m(G) of the graph G is the minimum number k such that G has a Krausz (k,m)-partition. In particular, 1-Krausz dimension or simply Krausz dimension kdim(G) is a well-known graph-theoretical parameter. In this paper we prove that the problem "kdim(G)≤3" is polynomially solvable for chordal graphs, thus partially solving the open problem of P. Hlineny and J. Kratochvil. We solve another open problem of P. Hlineny and J. Kratochvil by proving that the problem of finding Krausz dimension is NP-hard for split graphs and complements of bipartite graphs. We show that the problem of finding m-Krausz dimension is NP-hard for every m≥1, but the problem "kdim_m(G)≤k" is fixed-parameter tractable when parameterized by k and m for (∞,1)-polar graphs. Moreover, the class of (∞,1)-polar graphs with kdim_m(G)≤k is characterized by a finite list of forbidden induced subgraphs for every k,m≥1.
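
The definition of a Krausz (k,m)-partition can be checked mechanically for a given list of cliques. The sketch below is only a verifier under one reading of the definition: it assumes that "decomposition into cliques" means every edge of G lies in at least one listed clique. The function name `is_krausz_partition` and the input format (edge list plus list of vertex sets) are hypothetical, and nothing here reflects the algorithms of the paper.

```python
from itertools import combinations


def is_krausz_partition(edges, cliques, k, m):
    """Check whether `cliques` is a Krausz (k, m)-partition of the graph."""
    edge_set = {frozenset(e) for e in edges}
    cliques = [frozenset(c) for c in cliques]

    # Every listed vertex set must induce a clique in the graph.
    for c in cliques:
        for u, v in combinations(c, 2):
            if frozenset((u, v)) not in edge_set:
                return False

    # Assumption: every edge must be covered by some clique.
    for e in edge_set:
        if not any(e <= c for c in cliques):
            return False

    # Any vertex belongs to at most k cliques.
    vertices = set().union(*edge_set, *cliques)
    for v in vertices:
        if sum(v in c for c in cliques) > k:
            return False

    # Any two cliques have at most m vertices in common.
    for a, b in combinations(cliques, 2):
        if len(a & b) > m:
            return False

    return True


# Triangle 1-2-3 with a pendant edge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(is_krausz_partition(edges, [{1, 2, 3}, {3, 4}], k=2, m=1))  # True
print(is_krausz_partition(edges, [{1, 2, 3}, {3, 4}], k=1, m=1))  # False
```

In the example, vertex 3 lies in both cliques, so the partition is valid for k=2 but not for k=1.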

A trans-omics comparison reveals common gene expression strategies in four model organisms and exposes similarities and differences between them

10.1101/2020.09.04.283143 ◽

2020 ◽

Author(s):

Jaume Forés-Martos ◽

Anabel Forte ◽

José García-Martínez ◽

José E. Pérez-Ortín

Keyword(s):

Gene Expression ◽

Branch Length ◽

Decay Rates ◽

Model Organisms ◽

Minor Role ◽

Expression Control ◽

External Reference ◽

Degradation Rates ◽

Gene Expression Control ◽

A Minor

The ultimate goal of gene regulation should focus on the protein level. However, as mRNA is an obligate intermediary, and because the amounts of mRNAs and proteins are controlled by their synthesis and degradation rates, the cellular amount of a given protein can be attained following different strategies. By studying omics datasets for six expression variables (mRNA and protein amounts, plus their synthesis and decay rates), we previously demonstrated the existence of common expression strategies (CES) for functionally-related genes in the yeast Saccharomyces cerevisiae. Here we extend that study to two other eukaryotes: the distantly related yeast Schizosaccharomyces pombe and cultured human HeLa cells. We also use genomic datasets from the model prokaryote Escherichia coli as an external reference. We show that CES are also present in all the studied organisms and the differences in them between organisms can be used to establish their phylogenetic relationships. The phenogram based on 6VP has the expected topology for the phylogeny of these four organisms, but shows interesting branch length differences to DNA sequence-based trees. The analysis of the correlations among the six variables supports that most gene expression control occurs in actively growing organisms at the transcription rate level, and that translation plays a minor role in it. We propose that all living cells use CES for the genes acting on the same physiological pathways, especially for those belonging to stable macromolecular complexes, but CES have been modeled by evolution to adapt to the specific life circumstances of each organism. The obtained phenograms may reflect both evolutionary constraints in expression strategies, and lifestyle convergences.

Bioprospecting of Novel Extremozymes From Prokaryotes—The Advent of Culture-Independent Methods

Frontiers in Microbiology ◽

10.3389/fmicb.2021.630013 ◽

2021 ◽

Vol 12 ◽

Author(s):

Maksim Sysoev ◽

Stefan W. Grötzinger ◽

Dominik Renn ◽

Jörg Eppinger ◽

Magnus Rueping ◽

...

Keyword(s):

Hydrothermal Vents ◽

Structural Information ◽

Soda Lakes ◽

Model Organisms ◽

Hypersaline Lakes ◽

Nutrient Limitations ◽

Adaptive Mechanisms ◽

Culture Independent ◽

A Minor

Extremophiles are remarkable organisms that thrive in the harshest environments on Earth, such as hydrothermal vents, hypersaline lakes and pools, alkaline soda lakes, deserts, cold oceans, and volcanic areas. These organisms have developed several strategies to overcome environmental stress and nutrient limitations. Thus, they are among the best model organisms to study adaptive mechanisms that lead to stress tolerance. Genetic and structural information derived from extremophiles and extremozymes can be used for bioengineering other nontolerant enzymes. Furthermore, extremophiles can be a valuable resource for novel biotechnological and biomedical products due to their biosynthetic properties. However, understanding life under extreme conditions is challenging due to the difficulties of in vitro cultivation and observation since > 99% of organisms cannot be cultivated. Consequently, only a minor percentage of the potential extremophiles on Earth have been discovered and characterized. Herein, we present a review of culture-independent methods, sequence-based metagenomics (SBM), and single amplified genomes (SAGs) for studying enzymes from extremophiles, with a focus on prokaryotic (archaea and bacteria) microorganisms. Additionally, we provide a comprehensive list of extremozymes discovered via metagenomics and SAGs.

Average Degree Conditions Forcing a Minor

The Electronic Journal of Combinatorics ◽

10.37236/5321 ◽

2016 ◽

Vol 23 (1) ◽

Cited By ~ 4

Author(s):

Daniel J. Harvey ◽

David R. Wood

Keyword(s):