set cover
Recently Published Documents


Total documents: 325 (five years: 79)
H-index: 25 (five years: 3)

2022 ◽  
Author(s):  
Luiz Carlos Irber ◽  
Phillip T Brooks ◽  
Taylor E Reiter ◽  
N Tessa Pierce-Ward ◽  
Mahmudur Rahman Hera ◽  
...  

The identification of reference genomes and taxonomic labels from metagenome data underlies many microbiome studies. Here we describe two algorithms for compositional analysis of metagenome sequencing data. We first investigate the FracMinHash sketching technique, a derivative of modulo hash that supports Jaccard containment estimation between sets of different sizes. We implement FracMinHash in the sourmash software, evaluate its accuracy, and demonstrate large-scale containment searches of metagenomes using 700,000 microbial reference genomes. We next frame shotgun metagenome compositional analysis as the problem of finding a minimum collection of reference genomes that "cover" the known k-mers in a metagenome, a minimum set cover problem. We implement a greedy approximate solution using FracMinHash sketches, and evaluate its accuracy for taxonomic assignment using a CAMI community benchmark. Finally, we show that the minimum metagenome cover can be used to guide the selection of reference genomes for read mapping. sourmash is available as open source software under the BSD 3-Clause license at github.com/dib-lab/sourmash/.
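The two ideas in this abstract can be sketched in a few lines. The following is a minimal illustration, not sourmash's implementation: the BLAKE2b hash, the function names, the toy sequences, and the `min_overlap` parameter are all assumptions made for the example. It keeps only k-mer hashes below `MAX_HASH // scaled` (the FracMinHash idea), estimates containment from the retained hashes, and then greedily picks the reference that covers the most still-uncovered hashes.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space used in this sketch


def hash_kmer(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(sequence: str, k: int = 5, scaled: int = 1) -> set[int]:
    """Keep the hashes that fall in the lowest 1/scaled of the hash space."""
    threshold = MAX_HASH // scaled
    hashes = (hash_kmer(sequence[i:i + k]) for i in range(len(sequence) - k + 1))
    return {h for h in hashes if h < threshold}


def containment(query: set[int], reference: set[int]) -> float:
    """Estimated fraction of the query's k-mers that occur in the reference."""
    return len(query & reference) / len(query) if query else 0.0


def greedy_min_cover(metagenome, references, min_overlap=1):
    """Greedy approximation to minimum set cover over sketch hashes.

    Repeatedly selects the reference sharing the most hashes with the
    still-uncovered part of the metagenome, then removes those hashes.
    """
    remaining = set(metagenome)
    chosen = []
    while remaining and references:
        name = max(references, key=lambda n: len(remaining & references[n]))
        overlap = len(remaining & references[name])
        if overlap < max(1, min_overlap):
            break  # no reference explains enough of what is left
        chosen.append((name, overlap))
        remaining -= references[name]
    return chosen, remaining


# Toy data (hypothetical): genome C is a fragment of genome A.
genomes = {"A": "ACGTACGTTGCAAGCT", "B": "TTTTGGGGCCCC", "C": "ACGTACGTT"}
refs = {name: frac_minhash(seq) for name, seq in genomes.items()}
metagenome = refs["A"] | refs["B"]
cover, uncovered = greedy_min_cover(metagenome, refs)
print([name for name, _ in cover], len(uncovered))
```

With `scaled=1` every hash is retained, which keeps the toy example deterministic; larger `scaled` values subsample the k-mers and make the containment value an estimate. Genome C is never selected because everything it contains is already covered once A is chosen.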


2022 ◽  
pp. 3496-3528
Author(s):  
Timothy M. Chan ◽  
Qizheng He ◽  
Subhash Suri ◽  
Jie Xue

Author(s):  
Felix Happach ◽  
Lisa Hellerstein ◽  
Thomas Lidbetter

We consider a large family of problems in which an ordering (or, more precisely, a chain of subsets) of a finite set must be chosen to minimize some weighted sum of costs. This family includes variations of min sum set cover, several scheduling and search problems, and problems in Boolean function evaluation. We define a new problem, called the min sum ordering problem (MSOP), which generalizes all these problems using a cost and a weight function defined on subsets of a finite set. Assuming a polynomial time α-approximation algorithm for the problem of finding a subset whose ratio of weight to cost is maximal, we show that under very minimal assumptions, there is a polynomial time [Formula: see text]-approximation algorithm for MSOP. This approximation result generalizes a proof technique used for several distinct problems in the literature. We apply this to obtain a number of new approximation results. Summary of Contribution: This paper provides a general framework for min sum ordering problems. Within the realm of theoretical computer science, these problems include min sum set cover and its generalizations, as well as problems in Boolean function evaluation. On the operations research side, they include problems in search theory and scheduling. We present and analyze a very general algorithm for these problems, unifying several previous results on various min sum ordering problems and resulting in new constant factor guarantees for others.
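As a concrete member of this family, the sketch below shows the classical greedy rule for min sum set cover: at each step choose the set that covers the most still-uncovered elements, and charge each element the step at which it is first covered. This is an illustration of the special case only, not the general MSOP algorithm of the paper; the function name and the toy instance are assumptions.

```python
def greedy_min_sum_order(universe, sets):
    """Order sets greedily and return (order, total cover time, uncovered).

    The cover time of an element is the 1-based position of the first
    chosen set that contains it; the objective sums these over elements.
    """
    uncovered = set(universe)
    available = dict(sets)
    order = []
    total = 0
    step = 0
    while uncovered and available:
        # Unit costs: the best weight-to-cost ratio is simply the largest gain.
        name = max(available, key=lambda n: len(available[n] & uncovered))
        gain = available.pop(name) & uncovered
        if not gain:
            break  # remaining sets cover nothing new
        step += 1
        total += step * len(gain)
        uncovered -= gain
        order.append(name)
    return order, total, uncovered


# Toy instance (hypothetical).
sets = {"a": {1, 2, 3, 4}, "b": {4, 5}, "c": {5, 6}, "d": {1}}
order, total, left = greedy_min_sum_order(range(1, 7), sets)
print(order, total, left)
```

Here set `a` covers four elements at step 1 and set `c` covers the remaining two at step 2, for a total cover time of 4·1 + 2·2 = 8.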


2021 ◽  
Author(s):  
Pooja Chaturvedi ◽  
Ajai Kumar Daniel ◽  
Vipul Narayan

Mathematical programming techniques are widely used to determine the optimal functional configuration of a wireless sensor network (WSN). However, these techniques usually have high computational complexity, and the underlying problems are often NP-complete (nondeterministic polynomial time). Machine learning (ML) techniques can therefore be used to predict WSN parameters with high accuracy and lower computational complexity than mathematical programming. This paper develops a prediction model that determines whether a node should be included in a set cover, based on the coverage probability and trust values of the nodes. A set cover is defined as a subset of nodes scheduled to monitor the region of interest at the desired coverage level. Several machine learning techniques are used to determine the node activation status, from which the set covers are obtained. The results show that the random forest based prediction model yields the highest accuracy for the considered network setting.


2021 ◽  
Vol 408 ◽  
pp. 126358
Author(s):  
Yingli Ran ◽  
Ying Zhang ◽  
Zhao Zhang

2021 ◽  
Vol 300 ◽  
pp. 25-35
Author(s):  
Claudio Contardo ◽  
Alain Hertz

Author(s):  
Mustafa C. Camur ◽  
Thomas Sharkey ◽  
Chrysafis Vogiatzis

We consider the problem of identifying the induced star with the largest cardinality open neighborhood in a graph. This problem, also known as the star degree centrality (SDC) problem, is shown to be NP-complete. In this work, we first propose a new integer programming (IP) formulation, which has fewer constraints and fewer nonzero coefficients in them than the existing formulation in the literature. We present classes of networks in which the problem is solvable in polynomial time and offer a new proof of NP-completeness that shows the problem remains NP-complete for both bipartite and split graphs. In addition, we propose a decomposition framework that is suitable for both the existing and our formulations. We implement several acceleration techniques in this framework, motivated by techniques used in Benders decomposition. We test our approaches on networks generated based on the Barabási–Albert, Erdős–Rényi, and Watts–Strogatz models. Our decomposition approach outperforms solving the IP formulations in most of the instances in terms of both solution time and quality; this is especially true for larger and denser graphs. We then test the decomposition algorithm on large-scale protein–protein interaction networks, for which SDC is shown to be an important centrality metric. Summary of Contribution: In this study, we first introduce a new integer programming (NIP) formulation for the star degree centrality (SDC) problem in which the goal is to identify the induced star with the largest open neighborhood. We then show that, although the SDC can be efficiently solved in tree graphs, it remains NP-complete in both split and bipartite graphs via a reduction performed from the set cover problem. In addition, we implement a decomposition algorithm motivated by Benders decomposition together with several acceleration techniques to both the NIP formulation and the existing formulation in the literature. Our experimental results indicate that the decomposition implementation on the NIP is the best solution method in terms of both solution time and quality.


2021 ◽  
Vol 23, no. 3 (Combinatorics) ◽  
Author(s):  
Nicolas Grelier ◽  
Saeed Gh. Ilchi ◽  
Tillmann Miltzow ◽  
Shakhar Smorodinsky

A family S of convex sets in the plane defines a hypergraph H = (S, E) as follows. Every subfamily S' of S defines a hyperedge of H if and only if there exists a halfspace h that fully contains S', and no other set of S is fully contained in h. In this case, we say that h realizes S'. We say a set S is shattered if all its subsets are realized. The VC-dimension of a hypergraph H is the size of the largest shattered set. We show that the VC-dimension for pairwise disjoint convex sets in the plane is bounded by 3, and this is tight. In contrast, we show the VC-dimension of convex sets in the plane (not necessarily disjoint) is unbounded. We provide a quadratic lower bound in the number of pairs of intersecting sets in a shattered family of convex sets in the plane. We also show that the VC-dimension is unbounded for pairwise disjoint convex sets in R^d, for d > 2. We then focus on (possibly intersecting) segments in the plane and determine that the VC-dimension is always at most 5; this is tight, as we construct a set of five segments that can be shattered. To motivate our findings, we give two exemplary applications: one to a geometric set cover problem and one to a range-query data structure problem.
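For a small, explicitly listed hypergraph, shattering and VC-dimension can be checked by brute force. The sketch below uses the standard trace-based definition (a subset T is shattered when the hyperedges, intersected with T, produce all 2^|T| subsets of T), which is an assumption here, and it does not model the geometric realization by halfspaces described above. The function names and the interval example are illustrative only.

```python
from itertools import combinations


def is_shattered(subset, hyperedges):
    """True if the hyperedges induce every subset of `subset` as a trace."""
    subset = frozenset(subset)
    traces = {frozenset(edge) & subset for edge in hyperedges}
    return len(traces) == 2 ** len(subset)


def vc_dimension(vertices, hyperedges):
    """Size of the largest shattered vertex subset (exponential-time search).

    Assumes at least one hyperedge exists. Every subset of a shattered set
    is shattered, so the search can stop at the first size with no
    shattered subset.
    """
    vertices = list(vertices)
    best = 0
    for size in range(1, len(vertices) + 1):
        if any(is_shattered(c, hyperedges) for c in combinations(vertices, size)):
            best = size
        else:
            break
    return best


# Example: hyperedges are all intervals of the points 1..4, plus the empty set.
points = [1, 2, 3, 4]
intervals = [set()] + [
    set(range(i, j + 1)) for i in points for j in points if i <= j
]
print(vc_dimension(points, intervals))
```

Intervals shatter any two points but no three, since no interval contains the outer two points of a triple without the middle one, so the search stops at size 3.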

