Approximation Algorithms for Submodular Data Summarization with a Knapsack Constraint

Author(s):  
Kai Han ◽  
Shuang Cui ◽  
Tianshuai Zhu ◽  
Enpei Zhang ◽  
Benwei Wu ◽  
...  

Data summarization, i.e., selecting representative subsets of manageable size out of massive data, is often modeled as a submodular optimization problem. Although many algorithms exist for submodular optimization, a large share of them incur heavy computational overheads and hence are not suitable for mining big data. In this work, we consider the fundamental problem of (non-monotone) submodular function maximization with a knapsack constraint, and propose simple yet effective and efficient algorithms for it. Specifically, we propose a deterministic algorithm with approximation ratio 6 and a randomized algorithm with approximation ratio 4, and show that both of them can be accelerated to achieve nearly linear running time at the cost of weakening the approximation ratio by an additive factor of ε. We then consider a more restrictive setting without full access to the whole dataset, and propose streaming algorithms with approximation ratios of 8+ε and 6+ε that make one pass and two passes over the data stream, respectively. As a by-product, we also propose a two-pass streaming algorithm with an approximation ratio of 2+ε when the considered submodular function is monotone. To the best of our knowledge, our algorithms achieve the best performance bounds among state-of-the-art approximation algorithms with efficient implementations for the same problem. Finally, we evaluate our algorithms in two concrete submodular data summarization applications, revenue maximization in social networks and image summarization, and the empirical results show that our algorithms outperform the existing ones in terms of both effectiveness and efficiency.

Author(s):  
Jing Tang ◽  
Xueyan Tang ◽  
Andrew Lim ◽  
Kai Han ◽  
Chongshou Li ◽  
...  

Monotone submodular maximization with a knapsack constraint is NP-hard. Various approximation algorithms have been devised to address this optimization problem. In this paper, we revisit the widely known modified greedy algorithm. First, we show that this algorithm can achieve an approximation factor of 0.405, which significantly improves the known factors of 0.357 given by Wolsey and $(1-1/e)/2 \approx 0.316$ given by Khuller et al. More importantly, our analysis closes a gap in Khuller et al.'s proof for the extensively mentioned approximation factor of $(1-1/\sqrt{e}) \approx 0.393$ in the literature to clarify a long-standing misconception on this issue. Second, we enhance the modified greedy algorithm to derive a data-dependent upper bound on the optimum. We empirically demonstrate the tightness of our upper bound with a real-world application. The bound enables us to obtain a data-dependent ratio typically much higher than 0.405 between the solution value of the modified greedy algorithm and the optimum. It can also be used to significantly improve the efficiency of algorithms such as branch and bound.
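
For readers unfamiliar with the modified greedy algorithm, the following is a minimal Python sketch of one common variant: run a density greedy (best marginal gain per unit cost, skipping items that no longer fit) and return the better of that set and the best single feasible item. The names `modified_greedy`, `cost`, `budget`, and `f` are illustrative assumptions, costs are assumed to be strictly positive, and the handling of items that do not fit may differ from the exact variant analyzed in the paper.

```python
from typing import Callable, Dict, Hashable, Iterable, Set


def modified_greedy(
    items: Iterable[Hashable],
    cost: Dict[Hashable, float],
    budget: float,
    f: Callable[[Set[Hashable]], float],
) -> Set[Hashable]:
    """Sketch of a modified greedy for monotone submodular f under a knapsack.

    Assumes every cost is strictly positive. Returns the better of the
    density-greedy solution and the best single feasible item.
    """
    items = list(items)
    chosen: Set[Hashable] = set()
    spent = 0.0
    remaining = set(items)

    while remaining:
        base = f(chosen)
        best_item, best_density = None, 0.0
        for x in remaining:
            if spent + cost[x] > budget:
                continue  # item no longer fits; skip it in this variant
            density = (f(chosen | {x}) - base) / cost[x]
            if density > best_density:
                best_item, best_density = x, density
        if best_item is None:
            break  # nothing feasible with positive marginal gain
        chosen.add(best_item)
        spent += cost[best_item]
        remaining.discard(best_item)

    # Compare against the best single item that fits on its own.
    singles = [x for x in items if cost[x] <= budget]
    if singles:
        top = max(singles, key=lambda x: f({x}))
        if f({top}) > f(chosen):
            return {top}
    return chosen


# Toy coverage objective (monotone submodular): value = number of covered elements.
covers = {"small": {0, 1}, "big": set(range(2, 12))}
costs = {"small": 1.0, "big": 10.0}


def coverage(selected: Set[Hashable]) -> float:
    covered = set()
    for s in selected:
        covered |= covers[s]
    return float(len(covered))


print(modified_greedy(covers, costs, 10.0, coverage))
```

With a budget of 10, the density greedy picks `small` first and can then no longer afford `big`, so the single-item comparison is what rescues the solution. This is the reason the plain density greedy alone has no constant-factor guarantee.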


2001 ◽  
Vol 12 (04) ◽  
pp. 533-550 ◽  
Author(s):  
WING-KAI HON ◽  
TAK-WAH LAM

The nearest neighbor interchange (nni) distance is a classical metric for measuring the distance (dissimilarity) between evolutionary trees. It has been known that computing the nni distance is NP-complete. Existing approximation algorithms can attain an approximation ratio log n for unweighted trees and 4 log n for weighted trees; yet these algorithms are limited to degree-3 trees. This paper extends the study of nni distance to trees with non-uniform degrees. We formulate the necessary and sufficient conditions for nni transformation and devise more topology-sensitive approximation algorithms to handle trees with non-uniform degrees. The approximation ratios are respectively [Formula: see text] and [Formula: see text] for unweighted and weighted trees, where d ≥ 4 is the maximum degree of the input trees.


2021 ◽  
Vol 1 (1) ◽  
pp. 59-77
Author(s):  
Russell Lee ◽  
Jessica Maghakian ◽  
Mohammad Hajiesmaili ◽  
Jian Li ◽  
Ramesh Sitaraman ◽  
...  

This paper studies the online energy scheduling problem in a hybrid model where the cost of energy is proportional to both the volume and peak usage, and where energy can be either locally generated or drawn from the grid. Inspired by recent advances in online algorithms with Machine Learned (ML) advice, we develop parameterized deterministic and randomized algorithms for this problem such that the level of reliance on the advice can be adjusted by a trust parameter. We then analyze the performance of the proposed algorithms using two performance metrics: robustness that measures the competitive ratio as a function of the trust parameter when the advice is inaccurate, and consistency for competitive ratio when the advice is accurate. Since the competitive ratio is analyzed in two different regimes, we further investigate the Pareto optimality of the proposed algorithms. Our results show that the proposed deterministic algorithm is Pareto-optimal, in the sense that no other online deterministic algorithms can dominate the robustness and consistency of our algorithm. Furthermore, we show that the proposed randomized algorithm dominates the Pareto-optimal deterministic algorithm. Our large-scale empirical evaluations using real traces of energy demand, energy prices, and renewable energy generations highlight that the proposed algorithms outperform worst-case optimized algorithms and fully data-driven algorithms.


Author(s):  
Zhicheng Liu ◽  
Hong Chang ◽  
Ran Ma ◽  
Donglei Du ◽  
Xiaoyan Zhang

We consider a two-stage submodular maximization problem subject to a cardinality constraint and k matroid constraints, where the objective function is the expected difference of a nonnegative monotone submodular function and a nonnegative monotone modular function. We give two bi-factor approximation algorithms for this problem. The first is a deterministic $\left(\frac{1}{k+1}\left(1-\frac{1}{e^{k+1}}\right), 1\right)$-approximation algorithm, and the second is a randomized $\left(\frac{1}{k+1}\left(1-\frac{1}{e^{k+1}}\right)-\varepsilon, 1\right)$-approximation algorithm with improved time efficiency.


Author(s):  
Ganquan Shi ◽  
Shuyang Gu ◽  
Weili Wu

[Formula: see text]-submodular maximization is a generalization of submodular maximization, which requires us to select [Formula: see text] disjoint subsets instead of one subset. Motivated by its practical value and applications, we consider [Formula: see text]-submodular maximization with two kinds of constraints. For total size and individual size difference constraints, we present a [Formula: see text]-approximation algorithm for maximizing a nonnegative k-submodular function, running in time [Formula: see text] in the worst case. In particular, if [Formula: see text] is a multiple of [Formula: see text], the approximation ratio can be reduced to [Formula: see text], with running time [Formula: see text] in the worst case. Moreover, this algorithm can be applied to [Formula: see text]-bisubmodular functions, achieving a [Formula: see text]-approximation in time [Formula: see text]. Furthermore, if [Formula: see text] is a multiple of 2, the approximation ratio can be reduced to [Formula: see text], with running time [Formula: see text] in the worst case. For the individual size constraint, there is a [Formula: see text]-approximation algorithm for maximizing a nonnegative [Formula: see text]-submodular function and a nonnegative [Formula: see text]-bisubmodular function, running in time [Formula: see text] and [Formula: see text], respectively, in the worst case.


2005 ◽  
Vol 1 (1) ◽  
pp. 11-14 ◽  
Author(s):  
Sanguthevar Rajasekaran

Given a weighted graph G(V, E), a minimum spanning tree for G can be obtained in linear time using a randomized algorithm or nearly linear time using a deterministic algorithm. Given n points in the plane, we can construct a graph with these points as nodes and an edge between every pair of nodes. The weight on any edge is the Euclidean distance between the two points. Finding a minimum spanning tree for this graph is known as the Euclidean minimum spanning tree problem (EMSTP). The minimum spanning tree algorithms alluded to before will run in time O(n^2) (or nearly O(n^2)) on this graph. In this note we point out that it is possible to devise simple algorithms for EMSTP in k dimensions (for any constant k) whose expected run time is O(n), under the assumption that the points are uniformly distributed in the space of interest.

CR Categories: F2.2 Nonnumerical Algorithms and Problems; G.3 Probabilistic Algorithms
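
To make the O(n^2) baseline concrete, here is a minimal sketch of dense Prim's algorithm run on the implicit complete Euclidean graph. It is only the quadratic baseline mentioned in the abstract, not the expected-linear-time method of the note, and the function name `euclidean_mst_prim` is an illustrative assumption.

```python
import math
from typing import List, Sequence, Tuple


def euclidean_mst_prim(
    points: Sequence[Sequence[float]],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Dense Prim on the complete Euclidean graph: O(n^2) time, O(n) extra space.

    Points may have any fixed dimension k. Returns (total weight, tree edges),
    with edges given as (parent index, child index) pairs.
    """
    n = len(points)
    if n == 0:
        return 0.0, []

    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0           # start the tree from point 0

    total = 0.0
    edges: List[Tuple[int, int]] = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax distances from the newly added point; edges are never stored.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return total, edges


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
weight, tree = euclidean_mst_prim(square)
print(weight, tree)
```

Because edge weights are computed on demand, the sketch never materializes the n(n-1)/2 edges, but it still examines every pair, which is the cost the note aims to avoid for uniformly distributed points.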


2022 ◽  
Vol 50 (1) ◽  
pp. 28-31
Author(s):  
Zhongzheng Tang ◽  
Chenhao Wang ◽  
Hau Chan

2022 ◽  
Vol 18 (1) ◽  
pp. 1-16
Author(s):  
Alessandra Graf ◽  
David G. Harris ◽  
Penny Haxell

An independent transversal (IT) in a graph with a given vertex partition is an independent set consisting of one vertex in each partition class. Several sufficient conditions are known for the existence of an IT in a given graph and vertex partition, which have been used over the years to solve many combinatorial problems. Some of these IT existence theorems have algorithmic proofs, but there remains a gap between the best existential bounds and the bounds obtainable by efficient algorithms. Recently, Graf and Haxell (2018) described a new (deterministic) algorithm that asymptotically closes this gap, but there are limitations on its applicability. In this article, we develop a randomized algorithm that is much more widely applicable, and demonstrate its use by giving efficient algorithms for two problems concerning the strong chromatic number of graphs.
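
The definition of an independent transversal can be illustrated with a short sketch: a checker that verifies a candidate set, plus an exhaustive search over one vertex per class. The search is exponential in the number of classes and is meant only to clarify the definition; it is not the randomized algorithm of the article. The function names and the assumption that partition classes are disjoint are illustrative.

```python
from itertools import product
from typing import Hashable, Iterable, Optional, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def is_independent_transversal(
    edges: Iterable[Edge],
    classes: Sequence[Sequence[Hashable]],
    chosen: Iterable[Hashable],
) -> bool:
    """Check that `chosen` has exactly one vertex per class and spans no edge.

    Assumes the partition classes are pairwise disjoint.
    """
    picked = set(chosen)
    # Exactly one chosen vertex in every class, and no stray vertices.
    if len(picked) != len(classes):
        return False
    if any(len(picked & set(cls)) != 1 for cls in classes):
        return False
    # Independence: no edge has both endpoints chosen.
    return not any(u in picked and v in picked for u, v in edges)


def find_independent_transversal(
    edges: Iterable[Edge],
    classes: Sequence[Sequence[Hashable]],
) -> Optional[Set[Hashable]]:
    """Brute-force search: try every way of picking one vertex per class."""
    edge_list = list(edges)
    for pick in product(*classes):
        if is_independent_transversal(edge_list, classes, pick):
            return set(pick)
    return None


classes = [[1, 2], [3, 4]]
edges = [(1, 3), (1, 4), (2, 3)]
print(find_independent_transversal(edges, classes))
```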

