Session details: Data summarization

Approximation Algorithms for Submodular Data Summarization with a Knapsack Constraint

Proceedings of the ACM on Measurement and Analysis of Computing Systems ◽

10.1145/3447383 ◽

2021 ◽

Vol 5 (1) ◽

pp. 1-31

Author(s):

Kai Han ◽

Shuang Cui ◽

Tianshuai Zhu ◽

Enpei Zhang ◽

Benwei Wu ◽

...

Keyword(s):

Approximation Algorithms ◽

Fundamental Problem ◽

Randomized Algorithm ◽

Deterministic Algorithm ◽

Submodular Function ◽

Approximation Ratio ◽

Performance Bounds ◽

Data Summarization ◽

Submodular Optimization ◽

Knapsack Constraint

Data summarization, i.e., selecting representative subsets of manageable size out of massive data, is often modeled as a submodular optimization problem. Although there exist extensive algorithms for submodular optimization, many of them incur large computational overheads and hence are not suitable for mining big data. In this work, we consider the fundamental problem of (non-monotone) submodular function maximization with a knapsack constraint, and propose simple yet effective and efficient algorithms for it. Specifically, we propose a deterministic algorithm with approximation ratio 6 and a randomized algorithm with approximation ratio 4, and show that both of them can be accelerated to achieve nearly linear running time at the cost of weakening the approximation ratio by an additive factor of ε. We then consider a more restrictive setting without full access to the whole dataset, and propose streaming algorithms with approximation ratios of 8+ε and 6+ε that make one pass and two passes over the data stream, respectively. As a by-product, we also propose a two-pass streaming algorithm with an approximation ratio of 2+ε when the considered submodular function is monotone. To the best of our knowledge, our algorithms achieve the best performance bounds compared to the state-of-the-art approximation algorithms with efficient implementation for the same problem. Finally, we evaluate our algorithms in two concrete submodular data summarization applications for revenue maximization in social networks and image summarization, and the empirical results show that our algorithms outperform the existing ones in terms of both effectiveness and efficiency.
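To make the problem setting in the abstract above concrete, here is a minimal sketch of selecting a summary under a knapsack (budget) constraint. It is not one of the paper's algorithms: it is the classic cost-benefit (density) greedy heuristic combined with the best single feasible item, applied to a simple monotone coverage objective. The function names, the coverage objective, and the toy data are all assumptions made for illustration, and item costs are assumed to be strictly positive.

```python
from typing import Dict, List, Optional, Set


def coverage(selected: List[str], items: Dict[str, Set[int]]) -> int:
    """Coverage objective (monotone submodular): number of distinct elements covered."""
    covered: Set[int] = set()
    for name in selected:
        covered |= items[name]
    return len(covered)


def density_greedy(items: Dict[str, Set[int]], costs: Dict[str, float],
                   budget: float) -> List[str]:
    """Repeatedly add the affordable item with the best marginal gain per unit cost."""
    chosen: List[str] = []
    covered: Set[int] = set()
    spent = 0.0
    remaining = set(items)
    while remaining:
        best: Optional[str] = None
        best_ratio = 0.0
        for name in sorted(remaining):  # sorted for deterministic tie-breaking
            if spent + costs[name] > budget:
                continue  # item does not fit in the remaining budget
            gain = len(items[name] - covered)  # marginal coverage gain
            ratio = gain / costs[name]         # assumes costs[name] > 0
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break  # nothing affordable adds value
        chosen.append(best)
        covered |= items[best]
        spent += costs[best]
        remaining.discard(best)
    return chosen


def best_single(items: Dict[str, Set[int]], costs: Dict[str, float],
                budget: float) -> Optional[str]:
    """Best single item that fits the budget on its own."""
    feasible = [n for n in sorted(items) if costs[n] <= budget]
    return max(feasible, key=lambda n: len(items[n]), default=None)


def summarize(items: Dict[str, Set[int]], costs: Dict[str, float],
              budget: float) -> List[str]:
    """Return the better of the density-greedy solution and the best single item."""
    greedy = density_greedy(items, costs, budget)
    single = best_single(items, costs, budget)
    single_set = [single] if single is not None else []
    return max(greedy, single_set, key=lambda s: coverage(s, items))


if __name__ == "__main__":
    # Hypothetical toy data: each item covers a set of element ids.
    items = {"a": {1, 2, 3, 4}, "b": {1, 2}, "c": {5}}
    costs = {"a": 4.0, "b": 1.0, "c": 1.0}
    print(summarize(items, costs, budget=4.0))
```

On this toy instance the density greedy picks the two cheap items and can no longer afford the large one, so comparing against the best single item is what recovers the better summary. Non-monotone objectives, which are the focus of the paper, need additional care that this sketch does not attempt.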

On Data Summarization for Machine Learning in Multi-organization Federations

2019 IEEE International Conference on Smart Computing (SMARTCOMP) ◽

10.1109/smartcomp.2019.00030 ◽

2019 ◽

Cited By ~ 1

Author(s):

Bong Jun Ko ◽

Shiqiang Wang ◽

Ting He ◽

Dave Conway-Jones

Keyword(s):

Machine Learning ◽

Data Summarization

Hierarchical Data Summarization

Encyclopedia of Database Systems ◽

10.1007/978-0-387-39940-9_536 ◽

2009 ◽

pp. 1300-1304

Author(s):

Egemen Tanin

Keyword(s):

Hierarchical Data ◽

Data Summarization

Walk-Based Diversification for Data Summarization

Advances in Intelligent Systems and Computing - Information Technology and Systems ◽

10.1007/978-3-030-40690-5_15 ◽

2020 ◽

pp. 152-161

Author(s):

Samuel Zanferdini Oliva ◽

Joaquim Cezar Felipe

Keyword(s):

Data Summarization

Data summarization for heterogeneous infrastructure using spike-based monitoring technique

Bridge Maintenance, Safety, Management and Life Extension ◽

10.1201/b17063-85 ◽

2014 ◽

pp. 585-590

Author(s):

G Sundaresan ◽

L Wu ◽

H Yun ◽

K Park ◽

J Kim

Keyword(s):

Data Summarization ◽

Monitoring Technique

Data Summarization Based Fast Hierarchical Clustering Method for Large Datasets

2009 International Conference on Information Management and Engineering ◽

10.1109/icime.2009.65 ◽

2009 ◽

Cited By ~ 3

Author(s):

Bidyut Kr. Patra ◽

Sukumar Nandi ◽

P. Viswanath

Keyword(s):

Hierarchical Clustering ◽

Large Datasets ◽

Clustering Method ◽

Data Summarization

Fast Machine Learning in Data Science with a Comprehensive Data Summarization

10.1109/bigdata52589.2021.9671356 ◽

2021 ◽

Author(s):

Sikder Tahsin Al-Amin ◽

Carlos Ordonez

Keyword(s):

Machine Learning ◽

Data Science ◽

Data Summarization ◽

Comprehensive Data

Data summarization for hyperspectral image analysis

Algorithms, Technologies, and Applications for Multispectral and Hyperspectral Imaging XXVII ◽

10.1117/12.2590762 ◽

2021 ◽

Author(s):

Maher Aldeghlawi ◽

Mohammed Q. Alkhatib ◽

Miguel Velez-Reyes

Keyword(s):

Image Analysis ◽

Hyperspectral Image ◽

Data Summarization ◽

Hyperspectral Image Analysis

On Deriving Data Summarization through Ontologies to Meet User Preferences

Advances in Data Management - Studies in Computational Intelligence ◽

10.1007/978-3-642-02190-9_4 ◽

2009 ◽

pp. 67-87

Author(s):

Troels Andreasen ◽

Henrik Bulskov

Keyword(s):

User Preferences ◽

Data Summarization

From User Requirements to Querying of Fuzzy Summaries

International Journal of Service Science Management Engineering and Technology ◽

10.4018/ijssmet.2014010105 ◽

2014 ◽

Vol 5 (1) ◽

pp. 84-102 ◽

Cited By ~ 1

Author(s):

Ines Benali Sougui ◽

Minyar Sassi Hidri ◽

Amel Grissa Touzi

Keyword(s):

User Requirements ◽

Fuzzy Data ◽

Huge Amount ◽

Data Summarization ◽

Precise Data

With the growing volume and continued evolution of fuzzy data, working with synthetic views has become a challenge for many researchers in the database (DB) community. Data summarization techniques are now considered accurate tools for handling huge databases, in particular when precise data are not needed. Formal approaches have been proposed that make it possible to generate a hierarchy of summaries from a database. The challenge lies in how to query these fuzzy views according to user requirements. In this work, we propose to address this challenge through query repairing and substitution. Two processes were studied. The first modifies the query using the best fuzzy summaries, i.e., those whose answers are nearest to the original query. The second generates all substitution queries over the hierarchy of fuzzy summaries. This would be not only expensive but also unjustified for some of the nodes in the search hierarchy.