Approximation Algorithms for Submodular Data Summarization with a Knapsack Constraint

Data summarization, i.e., selecting representative subsets of manageable size out of massive data, is often modeled as a submodular optimization problem. Although there exist extensive algorithms for submodular optimization, many of them incur large computational overheads and hence are not suitable for mining big data. In this work, we consider the fundamental problem of (non-monotone) submodular function maximization with a knapsack constraint, and propose simple yet effective and efficient algorithms for it. Specifically, we propose a deterministic algorithm with approximation ratio 6 and a randomized algorithm with approximation ratio 4, and show that both of them can be accelerated to achieve nearly linear running time at the cost of weakening the approximation ratio by an additive factor of ε. We then consider a more restrictive setting without full access to the whole dataset, and propose streaming algorithms with approximation ratios of 8+ε and 6+ε that make one pass and two passes over the data stream, respectively. As a by-product, we also propose a two-pass streaming algorithm with an approximation ratio of 2+ε when the considered submodular function is monotone. To the best of our knowledge, our algorithms achieve the best performance bounds compared to the state-of-the-art approximation algorithms with efficient implementation for the same problem. Finally, we evaluate our algorithms in two concrete submodular data summarization applications for revenue maximization in social networks and image summarization, and the empirical results show that our algorithms outperform the existing ones in terms of both effectiveness and efficiency.

Abstract Proceedings of the 2021 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems ◽

10.1145/3410220.3453922 ◽

2021 ◽

Author(s):

Kai Han ◽

Shuang Cui ◽

Tianshuai Zhu ◽

Enpei Zhang ◽

Benwei Wu ◽

...

Keyword(s):

Approximation Algorithms ◽

Data Summarization ◽

Knapsack Constraint

Revisiting Modified Greedy Algorithm for Monotone Submodular Maximization with a Knapsack Constraint

Proceedings of the ACM on Measurement and Analysis of Computing Systems ◽

10.1145/3447386 ◽

2021 ◽

Vol 5 (1) ◽

pp. 1-22

Author(s):

Jing Tang ◽

Xueyan Tang ◽

Andrew Lim ◽

Kai Han ◽

Chongshou Li ◽

...

Keyword(s):

Approximation Algorithms ◽

Greedy Algorithm ◽

Branch And Bound ◽

Upper Bound ◽

Optimization Problem ◽

Approximation Factor ◽

Real World Application ◽

Efficiency Of Algorithms ◽

Knapsack Constraint ◽

Submodular Maximization

Monotone submodular maximization with a knapsack constraint is NP-hard. Various approximation algorithms have been devised to address this optimization problem. In this paper, we revisit the widely known modified greedy algorithm. First, we show that this algorithm can achieve an approximation factor of 0.405, which significantly improves the known factors of 0.357 given by Wolsey and $(1-1/e)/2 \approx 0.316$ given by Khuller et al. More importantly, our analysis closes a gap in Khuller et al.'s proof for the extensively mentioned approximation factor of $(1-1/\sqrt{e}) \approx 0.393$ in the literature to clarify a long-standing misconception on this issue. Second, we enhance the modified greedy algorithm to derive a data-dependent upper bound on the optimum. We empirically demonstrate the tightness of our upper bound with a real-world application. The bound enables us to obtain a data-dependent ratio typically much higher than 0.405 between the solution value of the modified greedy algorithm and the optimum. It can also be used to significantly improve the efficiency of algorithms such as branch and bound.
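For readers unfamiliar with the modified greedy algorithm mentioned above, the following is a minimal sketch of one common formulation: pick elements by marginal gain per unit cost, skip any element that no longer fits, then return the better of the greedy set and the best single feasible element. The function name `modified_greedy`, the set-function interface `f`, and the toy coverage instance are assumptions made for illustration and are not taken from the paper; costs are assumed to be strictly positive.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Set


def modified_greedy(
    items: List[Hashable],
    cost: Dict[Hashable, float],
    f: Callable[[FrozenSet], float],
    budget: float,
) -> Set[Hashable]:
    """Density greedy plus best-singleton comparison (illustrative sketch)."""
    chosen: Set[Hashable] = set()
    spent = 0.0
    remaining = list(items)
    while remaining:
        base = f(frozenset(chosen))
        # Element with the largest marginal gain per unit cost.
        best = max(
            remaining,
            key=lambda e: (f(frozenset(chosen | {e})) - base) / cost[e],
        )
        remaining.remove(best)
        # Keep it only if it still fits in the budget; otherwise discard it.
        if spent + cost[best] <= budget:
            chosen.add(best)
            spent += cost[best]
    # Compare against the best single element that fits on its own.
    singles = [e for e in items if cost[e] <= budget]
    if singles:
        top = max(singles, key=lambda e: f(frozenset({e})))
        if f(frozenset({top})) > f(frozenset(chosen)):
            return {top}
    return chosen


# Toy coverage instance (hypothetical data): f(S) = number of covered points.
covers = {"x": {1, 2}, "y": set(range(3, 13))}
costs = {"x": 1, "y": 10}


def coverage(s: FrozenSet) -> float:
    covered = set()
    for e in s:
        covered |= covers[e]
    return float(len(covered))


result_tight = modified_greedy(["x", "y"], costs, coverage, budget=10)
result_loose = modified_greedy(["x", "y"], costs, coverage, budget=11)
```

With a budget of 10, plain density greedy would take the cheap element "x" and then be unable to afford "y"; the singleton comparison is what recovers "y". With a budget of 11 both elements fit and the greedy set is returned.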

APPROXIMATING THE NEAREST NEIGHBOR INTERCHANGE DISTANCE FOR NON-UNIFORM-DEGREE EVOLUTIONARY TREES

International Journal of Foundations of Computer Science ◽

10.1142/s0129054101000631 ◽

2001 ◽

Vol 12 (04) ◽

pp. 533-550 ◽

Cited By ~ 4

Author(s):

WING-KAI HON ◽

TAK-WAH LAM

Keyword(s):

Approximation Algorithms ◽

Maximum Degree ◽

Nearest Neighbor ◽

Sufficient Conditions ◽

Necessary And Sufficient Conditions ◽

Approximation Ratio ◽

Evolutionary Trees ◽

Weighted Trees ◽

Np Complete ◽

Necessary And Sufficient

The nearest neighbor interchange (nni) distance is a classical metric for measuring the distance (dissimilarity) between evolutionary trees. It has been known that computing the nni distance is NP-complete. Existing approximation algorithms can attain an approximation ratio log n for unweighted trees and 4 log n for weighted trees; yet these algorithms are limited to degree-3 trees. This paper extends the study of nni distance to trees with non-uniform degrees. We formulate the necessary and sufficient conditions for nni transformation and devise more topology-sensitive approximation algorithms to handle trees with non-uniform degrees. The approximation ratios are respectively [Formula: see text] and [Formula: see text] for unweighted and weighted trees, where d ≥ 4 is the maximum degree of the input trees.

Online peak-aware energy scheduling with untrusted advice

ACM SIGEnergy Energy Informatics Review ◽

10.1145/3508467.3508473 ◽

2021 ◽

Vol 1 (1) ◽

pp. 59-77

Author(s):

Russell Lee ◽

Jessica Maghakian ◽

Mohammad Hajiesmaili ◽

Jian Li ◽

Ramesh Sitaraman ◽

...

Keyword(s):

Competitive Ratio ◽

Energy Demand ◽

Large Scale ◽

Performance Metrics ◽

Randomized Algorithm ◽

Deterministic Algorithm ◽

Pareto Optimal ◽

Energy Prices ◽

Worst Case ◽

Cost Of Energy

This paper studies the online energy scheduling problem in a hybrid model where the cost of energy is proportional to both the volume and peak usage, and where energy can be either locally generated or drawn from the grid. Inspired by recent advances in online algorithms with Machine Learned (ML) advice, we develop parameterized deterministic and randomized algorithms for this problem such that the level of reliance on the advice can be adjusted by a trust parameter. We then analyze the performance of the proposed algorithms using two performance metrics: robustness that measures the competitive ratio as a function of the trust parameter when the advice is inaccurate, and consistency for competitive ratio when the advice is accurate. Since the competitive ratio is analyzed in two different regimes, we further investigate the Pareto optimality of the proposed algorithms. Our results show that the proposed deterministic algorithm is Pareto-optimal, in the sense that no other online deterministic algorithms can dominate the robustness and consistency of our algorithm. Furthermore, we show that the proposed randomized algorithm dominates the Pareto-optimal deterministic algorithm. Our large-scale empirical evaluations using real traces of energy demand, energy prices, and renewable energy generations highlight that the proposed algorithms outperform worst-case optimized algorithms and fully data-driven algorithms.

Two-stage submodular maximization problem beyond nonnegative and monotone

Mathematical Structures in Computer Science ◽

10.1017/s0960129521000372 ◽

2021 ◽

pp. 1-16

Author(s):

Zhicheng Liu ◽

Hong Chang ◽

Ran Ma ◽

Donglei Du ◽

Xiaoyan Zhang

Keyword(s):

Approximation Algorithms ◽

Approximation Algorithm ◽

Objective Function ◽

Modular Function ◽

Submodular Function ◽

Maximization Problem ◽

Cardinality Constraint ◽

Two Stage ◽

Time Efficiency ◽

Submodular Maximization

We consider a two-stage submodular maximization problem subject to a cardinality constraint and k matroid constraints, where the objective function is the expected difference of a nonnegative monotone submodular function and a nonnegative monotone modular function. We give two bi-factor approximation algorithms for this problem. The first is a deterministic $\left(\frac{1}{k+1}\left(1-\frac{1}{e^{k+1}}\right), 1\right)$-approximation algorithm, and the second is a randomized $\left(\frac{1}{k+1}\left(1-\frac{1}{e^{k+1}}\right)-\varepsilon, 1\right)$-approximation algorithm with improved time efficiency.

k-Submodular maximization with two kinds of constraints

Discrete Mathematics Algorithms and Applications ◽

10.1142/s1793830921500361 ◽

2020 ◽

pp. 2150036

Author(s):

Ganquan Shi ◽

Shuyang Gu ◽

Weili Wu

Keyword(s):

Approximation Algorithm ◽

Submodular Function ◽

Approximation Ratio ◽

Size Difference ◽

Total Size ◽

Difference Constraints ◽

Submodular Maximization ◽

Individual Size

[Formula: see text]-submodular maximization is a generalization of submodular maximization that requires selecting [Formula: see text] disjoint subsets instead of a single subset. Motivated by its practical value and applications, we consider [Formula: see text]-submodular maximization with two kinds of constraints. For total size and individual size difference constraints, we present a [Formula: see text]-approximation algorithm for maximizing a nonnegative k-submodular function, running in time [Formula: see text] in the worst case. In particular, if [Formula: see text] is a multiple of [Formula: see text], the approximation ratio reduces to [Formula: see text], with a worst-case running time of [Formula: see text]. In addition, this algorithm can be applied to [Formula: see text]-bisubmodular functions, achieving a [Formula: see text]-approximation in time [Formula: see text]. Furthermore, if [Formula: see text] is a multiple of 2, the approximation ratio reduces to [Formula: see text], with a worst-case running time of [Formula: see text]. For the individual size constraint, there is a [Formula: see text]-approximation algorithm for maximizing a nonnegative [Formula: see text]-submodular function and a nonnegative [Formula: see text]-bisubmodular function, with worst-case running times of [Formula: see text] and [Formula: see text], respectively.

On the Euclidean Minimum Spanning Tree Problem

Computing Letters ◽

10.1163/1574040053326325 ◽

2005 ◽

Vol 1 (1) ◽

pp. 11-14 ◽

Cited By ~ 7

Author(s):

Sanguthevar Rajasekaran

Keyword(s):

Spanning Tree ◽

Euclidean Distance ◽

Minimum Spanning Tree ◽

Linear Time ◽

Randomized Algorithm ◽

Weighted Graph ◽

Deterministic Algorithm ◽

Probabilistic Algorithms ◽

Tree Algorithms ◽

Minimum Spanning Tree Problem

Given a weighted graph G(V, E), a minimum spanning tree for G can be obtained in linear time using a randomized algorithm or nearly linear time using a deterministic algorithm. Given n points in the plane, we can construct a graph with these points as nodes and an edge between every pair of nodes. The weight on any edge is the Euclidean distance between the two points. Finding a minimum spanning tree for this graph is known as the Euclidean minimum spanning tree problem (EMSTP). The minimum spanning tree algorithms alluded to before will run in time $O(n^2)$ (or nearly $O(n^2)$) on this graph. In this note we point out that it is possible to devise simple algorithms for EMSTP in k dimensions (for any constant k) whose expected run time is O(n), under the assumption that the points are uniformly distributed in the space of interest.

CR Categories: F.2.2 Nonnumerical Algorithms and Problems; G.3 Probabilistic Algorithms
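As a point of comparison, the quadratic baseline described in the abstract (a standard MST algorithm run on the complete Euclidean graph) can be sketched as follows. This is Prim's algorithm with an array-based frontier, not the expected linear-time method proposed in the note; the function name `euclidean_mst` and the sample points are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def euclidean_mst(points: Sequence[Sequence[float]]) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on the implicit complete graph; O(n^2) distance evaluations."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known edge from the tree to each vertex
    parent = [-1] * n
    best_dist[0] = 0.0
    edges: List[Tuple[int, int, float]] = []
    for _ in range(n):
        # Closest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best_dist[u]))
        # Relax the implicit edges from u to every vertex still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    parent[v] = u
    return edges


sample_points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 1.0)]
mst_edges = euclidean_mst(sample_points)
mst_weight = sum(w for _, _, w in mst_edges)
```

Because the edge weights are computed on demand with `math.dist` (Python 3.8+), the complete graph is never stored, but every pair of points is still examined, which is where the quadratic cost comes from.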

On maximizing a monotone k-submodular function under a knapsack constraint

Operations Research Letters ◽

10.1016/j.orl.2021.11.010 ◽

2022 ◽

Vol 50 (1) ◽

pp. 28-31

Author(s):

Zhongzheng Tang ◽

Chenhao Wang ◽

Hau Chan

Keyword(s):

Submodular Function ◽

Knapsack Constraint

Approximation Algorithms for Scheduling Parallel Jobs: Breaking the Approximation Ratio of 2

Automata, Languages and Programming - Lecture Notes in Computer Science ◽

10.1007/978-3-540-70575-8_20 ◽

2008 ◽

pp. 234-245 ◽

Cited By ~ 3

Author(s):

Klaus Jansen ◽

Ralf Thöle

Keyword(s):

Approximation Algorithms ◽

Approximation Ratio ◽

Parallel Jobs

Algorithms for Weighted Independent Transversals and Strong Colouring

ACM Transactions on Algorithms ◽

10.1145/3474057 ◽

2022 ◽

Vol 18 (1) ◽

pp. 1-16

Author(s):

Alessandra Graf ◽

David G. Harris ◽

Penny Haxell

Keyword(s):

Existence Theorems ◽

Sufficient Conditions ◽

Randomized Algorithm ◽

Independent Set ◽

Deterministic Algorithm ◽

Efficient Algorithms ◽

Combinatorial Problems ◽

Vertex Partition ◽

Partition Class ◽

Strong Chromatic Number

An independent transversal (IT) in a graph with a given vertex partition is an independent set consisting of one vertex in each partition class. Several sufficient conditions are known for the existence of an IT in a given graph and vertex partition, which have been used over the years to solve many combinatorial problems. Some of these IT existence theorems have algorithmic proofs, but there remains a gap between the best existential bounds and the bounds obtainable by efficient algorithms. Recently, Graf and Haxell (2018) described a new (deterministic) algorithm that asymptotically closes this gap, but there are limitations on its applicability. In this article, we develop a randomized algorithm that is much more widely applicable, and demonstrate its use by giving efficient algorithms for two problems concerning the strong chromatic number of graphs.
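The definition of an independent transversal given at the start of this abstract can be made concrete with a small checker. This sketch only verifies a candidate set against the definition; it does not implement the randomized algorithm from the article. The function name `is_independent_transversal` and the example graph are assumptions for illustration, and the partition classes are assumed to be disjoint.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def is_independent_transversal(
    edges: Iterable[Tuple[Hashable, Hashable]],
    parts: Sequence[Set[Hashable]],
    candidate: Iterable[Hashable],
) -> bool:
    """Return True if candidate has one vertex per class and spans no edge."""
    cand = set(candidate)
    # Exactly one vertex from each partition class.
    for part in parts:
        if len(cand & set(part)) != 1:
            return False
    # No stray vertices outside the partition classes.
    if len(cand) != len(parts):
        return False
    # Independence: no edge may have both endpoints in the candidate set.
    for u, v in edges:
        if u in cand and v in cand:
            return False
    return True


# Path graph 1-2-3-4 with partition classes {1, 2} and {3, 4} (hypothetical data).
path_edges = [(1, 2), (2, 3), (3, 4)]
classes = [{1, 2}, {3, 4}]
```

Here `{1, 3}` is an independent transversal, while `{2, 3}` fails because 2 and 3 are adjacent, and `{1, 2}` fails because both vertices come from the same class.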
