EFFICIENT APPROXIMATION ALGORITHMS FOR PAIRWISE DATA CLUSTERING AND APPLICATIONS

Data clustering is an important theoretical topic and a practical tool for many applications, and it arises frequently in geometric computing. The main objective of data clustering is to partition a given data set into clusters such that the data items within the same cluster are more similar to each other, with respect to certain measures, than to items in other clusters. In this paper, we study the pairwise data clustering problem with pairwise similarity/dissimilarity measures that need not satisfy the triangle inequality. Using a criterion called the minimum normalized cut, we model the general pairwise data clustering problem as a graph partition problem. The graph partition problem based on minimizing the normalized cut is known to be NP-hard. For an undirected weighted graph of n vertices, we present a polynomial-time ((4+o(1)) ln n)-approximation algorithm for the minimum normalized cut problem; this is the first provably good polynomial-time approximation algorithm for the problem. We also give a more efficient algorithm for this problem by sacrificing the approximation ratio slightly. Further, our scheme yields a polynomial-time ((2+o(1)) ln n)-approximation algorithm for computing the sparsest cuts in edge-weighted and vertex-weighted undirected graphs, improving the previously best known approximation ratio by a constant factor. Some applications and implementation work of our approximate normalized cut algorithms are also discussed.
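
The abstract does not spell out the normalized cut criterion. As a rough illustration, the sketch below assumes the commonly used definition Ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B), where vol(S) is the total weighted degree of S. The function names and the example graph are hypothetical, and the exhaustive search is exponential; it is not the approximation algorithm of the paper.

```python
from itertools import combinations


def normalized_cut(w, side_a):
    """Normalized cut of the bipartition (side_a, rest) for a symmetric weight matrix w.

    Assumed definition: cut/vol(A) + cut/vol(B), with vol(S) the sum of weighted degrees in S.
    """
    n = len(w)
    a = sorted(set(side_a))
    b = [v for v in range(n) if v not in set(a)]
    cut = sum(w[u][v] for u in a for v in b)
    vol_a = sum(w[u][v] for u in a for v in range(n))
    vol_b = sum(w[u][v] for u in b for v in range(n))
    if vol_a == 0 or vol_b == 0:
        return float("inf")  # degenerate side: no incident weight
    return cut / vol_a + cut / vol_b


def brute_force_min_ncut(w):
    """Try every bipartition (exponential time); only sensible for tiny graphs."""
    n = len(w)
    best_value, best_side = float("inf"), None
    for size in range(1, n // 2 + 1):
        for side in combinations(range(n), size):
            value = normalized_cut(w, side)
            if value < best_value:
                best_value, best_side = value, set(side)
    return best_value, best_side


# Two unit-weight triangles {0,1,2} and {3,4,5} joined by one weak edge (2, 3).
w = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
best_value, best_side = brute_force_min_ncut(w)
```

On this toy graph the search separates the two triangles, since cutting the single weak edge is far cheaper relative to each side's volume than isolating any vertex.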

An approximation polynomial-time algorithm for a cardinality-weighted 2-clustering problem

2017 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON) ◽

10.1109/sibircon.2017.8109845 ◽

2017 ◽

Author(s):

Alexander Kel'manov ◽

Anna Motkova

Keyword(s):

Polynomial Time ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Clustering Problem ◽

Approximation Polynomial

An approximation polynomial-time algorithm for a sequence bi-clustering problem

Computational Mathematics and Mathematical Physics ◽

10.1134/s0965542515060068 ◽

2015 ◽

Vol 55 (6) ◽

pp. 1068-1076 ◽

Cited By ~ 7

Author(s):

A. V. Kel’manov ◽

S. A. Khamidullin

Keyword(s):

Polynomial Time ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Clustering Problem ◽

Approximation Polynomial

A Polynomial Time Algorithm for Rayleigh Ratio on Discrete Variables: Replacing Spectral Techniques for Expander Ratio, Normalized Cut, and Cheeger Constant

Operations Research ◽

10.1287/opre.1120.1126 ◽

2013 ◽

Vol 61 (1) ◽

pp. 184-198 ◽

Cited By ~ 16

Author(s):

Dorit S. Hochbaum

Keyword(s):

Polynomial Time ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Discrete Variables ◽

Normalized Cut ◽

Spectral Techniques ◽

Cheeger Constant

2-Approximation Polynomial-Time Algorithm for a Cardinality-Weighted 2-Partitioning Problem of a Sequence

Lecture Notes in Computer Science - Numerical Computations: Theory and Algorithms ◽

10.1007/978-3-030-40616-5_34 ◽

2020 ◽

pp. 386-393 ◽

Cited By ~ 1

Author(s):

Alexander Kel’manov ◽

Sergey Khamidullin ◽

Anna Panasenko

Keyword(s):

Polynomial Time ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Partitioning Problem ◽

Approximation Polynomial

Order Statistics for Probabilistic Graphical Models

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/645 ◽

2017 ◽

Author(s):

David Smith ◽

Sara Rouhani ◽

Vibhav Gogate

Keyword(s):

Order Statistics ◽

Graphical Models ◽

Polynomial Time ◽

Graphical Model ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Partition Problem ◽

Polynomial Time Approximation Algorithm ◽

Sampling Algorithms ◽

Pseudo Polynomial Time

We consider the problem of computing r-th order statistics, namely finding an assignment having rank r in a probabilistic graphical model. We show that the problem is NP-hard even when the graphical model has no edges (zero-treewidth models) via a reduction from the partition problem. We use this reduction, specifically a pseudo-polynomial time algorithm for number partitioning, to yield a pseudo-polynomial time approximation algorithm for solving the r-th order statistics problem in zero-treewidth models. We then extend this algorithm to arbitrary graphical models by generalizing it to tree decompositions, and demonstrate via experimental evaluation on various datasets that our proposed algorithm is more accurate than sampling algorithms.
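
For readers unfamiliar with the pseudo-polynomial algorithm for number partitioning mentioned above, here is a minimal sketch of the classic subset-sum dynamic program, assuming non-negative integer inputs. It is the textbook technique, not the authors' order-statistics algorithm, and the function name is hypothetical.

```python
def can_partition(nums):
    """Decide whether nums (non-negative ints) splits into two subsets of equal sum.

    Runs in time polynomial in len(nums) and in the numeric value sum(nums),
    which is what "pseudo-polynomial" refers to.
    """
    total = sum(nums)
    if total % 2:
        return False  # an odd total can never be split evenly
    target = total // 2
    reachable = 1  # bitset: bit s is set when some subset sums to s
    for x in nums:
        reachable |= reachable << x  # either skip x or add it to a reachable sum
    return bool((reachable >> target) & 1)
```

For example, `can_partition([3, 1, 1, 2, 2, 1])` holds because 3 + 2 = 5 is half of the total 10, while `[1, 2, 5]` has no subset summing to 4.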

An improved primal-dual approximation algorithm for the k-means problem with penalties

Mathematical Structures in Computer Science ◽

10.1017/s0960129521000104 ◽

2021 ◽

pp. 1-13

Author(s):

Chunying Ren ◽

Dachuan Xu ◽

Donglei Du ◽

Min Li

Keyword(s):

Approximation Algorithm ◽

Polynomial Time ◽

Best Approximation ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Approximation Ratio ◽

Data Set ◽

Penalty Cost ◽

Primal Dual ◽

Total Penalty

In the k-means problem with penalties, we are given a data set $\mathcal{D} \subseteq \mathbb{R}^\ell$ of $n$ points, where each point $j \in \mathcal{D}$ is associated with a penalty cost $p_j$, and an integer $k$. The goal is to choose a set $CS \subseteq \mathbb{R}^\ell$ with $|CS| \le k$ and a penalized subset $\mathcal{D}_p \subseteq \mathcal{D}$ to minimize the sum of the total squared distance from the points in $\mathcal{D} \setminus \mathcal{D}_p$ to $CS$ and the total penalty cost of the points in $\mathcal{D}_p$, namely $\sum_{j \in \mathcal{D} \setminus \mathcal{D}_p} d^2(j, CS) + \sum_{j \in \mathcal{D}_p} p_j$. We employ the primal-dual technique to give a pseudo-polynomial time algorithm with an approximation ratio of $(6.357+\varepsilon)$ for the k-means problem with penalties, improving the previous best approximation ratio of $(19.849+\varepsilon)$ for this problem given by Feng et al. in Proceedings of FAW (2019).
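
To make the objective concrete, the sketch below evaluates it for a fixed set of centers. It relies on one observation: once the centers are fixed, each point contributes the smaller of its squared distance to the nearest center and its penalty, so the best penalized subset can be read off point by point. The function name and the data layout (tuples of coordinates) are assumptions for illustration; this only evaluates a candidate solution and is not the primal-dual algorithm of the paper.

```python
def penalized_kmeans_cost(points, penalties, centers):
    """Return (cost, penalized_indices) for fixed centers.

    points and centers are sequences of coordinate tuples; penalties[j] is p_j.
    A point is penalized exactly when its penalty is cheaper than its
    squared distance to the nearest center.
    """
    cost = 0.0
    penalized = []
    for j, (x, p) in enumerate(zip(points, penalties)):
        d2 = min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
        if p < d2:
            penalized.append(j)  # cheaper to pay the penalty than to serve j
            cost += p
        else:
            cost += d2
    return cost, penalized
```

With points `[(0, 0), (1, 0), (10, 10)]`, penalties `[5, 5, 3]` and the single center `(0.5, 0)`, the far point is penalized and the two near points each pay a squared distance of 0.25.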

PARTITIONING TREES OF SUPPLY AND DEMAND

International Journal of Foundations of Computer Science ◽

10.1142/s0129054105003303 ◽

2005 ◽

Vol 16 (04) ◽

pp. 803-827 ◽

Cited By ~ 21

Author(s):

TAKEHIRO ITO ◽

XIAO ZHOU ◽

TAKAO NISHIZEKI

Keyword(s):

Polynomial Time ◽

Approximation Scheme ◽

Linear Time ◽

Supply And Demand ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Partition Problem ◽

Polynomial Time Approximation Scheme ◽

Demand Vertex ◽

Supply Vertex

Assume that a tree T has a number $n_s$ of "supply vertices" and all the other vertices are "demand vertices." Each supply vertex is assigned a positive number called a supply, while each demand vertex is assigned a positive number called a demand. One wishes to partition T into exactly $n_s$ subtrees by deleting edges from T so that each subtree contains exactly one supply vertex whose supply is no less than the sum of demands of all demand vertices in the subtree. The "partition problem" is a decision problem to ask whether T has such a partition. The "maximum partition problem" is an optimization version of the partition problem. In this paper, we give three algorithms for the problems. The first is a linear-time algorithm for the partition problem. The second is a pseudo-polynomial-time algorithm for the maximum partition problem. The third is a fully polynomial-time approximation scheme (FPTAS) for the maximum partition problem.
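
The feasibility condition for the partition problem can be checked directly for a proposed set of deleted edges. The sketch below is only such a verifier, under assumed conventions (vertices numbered 0 to n-1, `supply` and `demand` as dictionaries, a hypothetical function name); it does not search for a partition and is not the linear-time algorithm from the paper.

```python
from collections import defaultdict


def is_valid_partition(n, edges, supply, demand, deleted):
    """Check that deleting `deleted` edges leaves only feasible subtrees.

    Every remaining component must hold exactly one supply vertex, and that
    supply must be at least the total demand inside the component.
    """
    removed = {frozenset(e) for e in deleted}
    adj = defaultdict(list)
    for u, v in edges:
        if frozenset((u, v)) not in removed:
            adj[u].append(v)
            adj[v].append(u)
    seen = set()
    for start in range(n):
        if start in seen:
            continue
        # Depth-first walk over one component, collecting supplies and demand.
        stack, supplies, total_demand = [start], [], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            if u in supply:
                supplies.append(supply[u])
            else:
                total_demand += demand[u]
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        if len(supplies) != 1 or supplies[0] < total_demand:
            return False
    return True
```

On the path 0-1-2-3 with supplies `{0: 5, 3: 4}` and demands `{1: 3, 2: 4}`, deleting edge (1, 2) is feasible, whereas deleting edge (0, 1) leaves supply 4 facing demand 7.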

1/2-Approximation polynomial-time algorithm for a problem of searching a subset

2017 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON) ◽

10.1109/sibircon.2017.8109827 ◽

2017 ◽

Cited By ~ 1

Author(s):

Alexander Ageev ◽

Alexander Kel'manov ◽

Artem Pyatkin ◽

Sergey Khamidullin ◽

Vladimir Shenmaier

Keyword(s):

Polynomial Time ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Approximation Polynomial

Learning Importance of Preferences

10.29007/v68w ◽

2018 ◽

Author(s):

Ying Zhu ◽

Mirek Truszczynski

Keyword(s):

Polynomial Time ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Scoring Rules ◽

Np Hard ◽

Individual Preferences

We study the problem of learning the importance of preferences in preference profiles in two important cases: when individual preferences are aggregated by the ranked Pareto rule, and when they are aggregated by positional scoring rules. For the ranked Pareto rule, we provide a polynomial-time algorithm that finds a ranking of preferences such that the ranked profile correctly decides all the examples, whenever such a ranking exists. We also show that the problem of learning a ranking that maximizes the number of correctly decided examples (also under the ranked Pareto rule) is NP-hard. We obtain similar results for the case of weighted profiles when positional scoring rules are used for aggregation.

A Polynomial-Time Algorithm for Minimizing the Deep Coalescence Cost for Level-1 Species Networks

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2021.3105922 ◽

2021 ◽

pp. 1-1

Author(s):

Matthew Lemay ◽

Ran Libeskind-Hadas ◽

Yi-Chieh Wu

Keyword(s):

Polynomial Time ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Deep Coalescence ◽

Level 1
