EFFICIENT APPROXIMATION ALGORITHMS FOR PAIRWISE DATA CLUSTERING AND APPLICATIONS

2004, Vol. 14 (01n02), pp. 85-104
Author(s): Xiaodong Wu, Danny Z. Chen, James J. Mason, Steven R. Schmid

Data clustering is an important theoretical topic and a sharp tool for various applications, and it arises frequently in geometric computing. The main objective of data clustering is to partition a given data set into clusters such that the data items within the same cluster are "more" similar to each other, with respect to certain measures, than to items in other clusters. In this paper, we study the pairwise data clustering problem with pairwise similarity/dissimilarity measures that need not satisfy the triangle inequality. Using a criterion called the minimum normalized cut, we model the general pairwise data clustering problem as a graph partition problem. The graph partition problem based on minimizing the normalized cut is known to be NP-hard. For an undirected weighted graph of n vertices, we present a ((4+o(1)) ln n)-approximation polynomial-time algorithm for the minimum normalized cut problem; this is the first provably good polynomial-time approximation algorithm for the problem. We also give a more efficient algorithm for this problem by sacrificing the approximation ratio slightly. Further, our scheme achieves a ((2+o(1)) ln n)-approximation polynomial-time algorithm for computing the sparsest cuts in edge-weighted and vertex-weighted undirected graphs, improving the previously best known approximation ratio by a constant factor. Some applications and implementation work of our approximate normalized cut algorithms are also discussed.
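To make the objective concrete, here is a minimal sketch that evaluates the normalized cut of a bipartition and finds the minimum by brute force on a tiny graph. It assumes the common Shi-Malik definition, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), a symmetric weight matrix `W` with a zero diagonal, and no isolated vertices; the function names are hypothetical. The exhaustive search is exponential in n and is not the ln n approximation algorithm described above.

```python
from itertools import combinations


def cut(W, A, B):
    """Total weight of edges with one endpoint in A and the other in B."""
    return sum(W[u][v] for u in A for v in B)


def assoc(W, A):
    """Total weight of edges incident to vertices of A (the 'volume' of A)."""
    return sum(W[u][v] for u in A for v in range(len(W)))


def normalized_cut(W, A):
    """Shi-Malik normalized cut of the bipartition (A, complement of A)."""
    B = [v for v in range(len(W)) if v not in A]
    c = cut(W, A, B)
    return c / assoc(W, A) + c / assoc(W, B)


def min_normalized_cut_bruteforce(W):
    """Exhaustive search over all bipartitions; exponential in n, tiny graphs only."""
    n = len(W)
    best_value, best_side = float("inf"), None
    # Ncut is symmetric in A and B, so sides of size <= n // 2 cover every bipartition.
    for size in range(1, n // 2 + 1):
        for A in combinations(range(n), size):
            value = normalized_cut(W, set(A))
            if value < best_value:
                best_value, best_side = value, set(A)
    return best_value, best_side


# Two unit-weight triangles {0,1,2} and {3,4,5} joined by a light edge (2,3).
W = [[0.0] * 6 for _ in range(6)]
for u, v, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    W[u][v] = W[v][u] = w

value, side = min_normalized_cut_bruteforce(W)
print(side, value)  # one triangle on each side; only the light edge is cut
```

Dividing by the volume of each side is what penalizes lopsided cuts: isolating a single vertex such as {0} cuts weight 2 but costs about 1.2, whereas separating the two triangles costs about 0.033.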

Author(s): David Smith, Sara Rouhani, Vibhav Gogate

We consider the problem of computing r-th order statistics, namely finding an assignment having rank r in a probabilistic graphical model. We show, via a reduction from the partition problem, that the problem is NP-hard even when the graphical model has no edges (zero-treewidth models). We use this reduction, specifically a pseudo-polynomial time algorithm for number partitioning, to yield a pseudo-polynomial time approximation algorithm for solving the r-th order statistics problem in zero-treewidth models. We then extend this algorithm to arbitrary graphical models by generalizing it to tree decompositions, and demonstrate via experimental evaluation on various datasets that our proposed algorithm is more accurate than sampling algorithms.
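The following is a simplified sketch of why a number-partitioning style dynamic program helps in zero-treewidth models; it is not the authors' algorithm. It assumes each variable's log-potentials have already been scaled and rounded to non-negative integers (that rounding step, which is where approximation enters, is omitted), and it returns only the score of the rank-r assignment rather than the assignment itself. `rank_r_score` is a hypothetical name.

```python
def rank_r_score(weights, r):
    """Score of the rank-r assignment (r = 1 is the best) of independent variables.

    weights[i][x] is a non-negative integer score for giving variable i its x-th
    value; an assignment's score is the sum of its variables' scores. Ties are
    counted with multiplicity.
    """
    # counts[s] = number of partial assignments with total score s. The table has
    # at most sum(max(w) for w in weights) + 1 entries, hence pseudo-polynomial.
    counts = {0: 1}
    for w_i in weights:
        nxt = {}
        for s, c in counts.items():
            for w in w_i:
                nxt[s + w] = nxt.get(s + w, 0) + c
        counts = nxt
    # Walk scores from best to worst until r assignments have been passed.
    seen = 0
    for s in sorted(counts, reverse=True):
        seen += counts[s]
        if seen >= r:
            return s
    raise ValueError("r exceeds the number of assignments")


# Three binary variables; their 8 assignments score 6, 5, 4, 3, 3, 2, 1, 0.
weights = [[0, 3], [0, 2], [0, 1]]
print([rank_r_score(weights, r) for r in range(1, 9)])  # [6, 5, 4, 3, 3, 2, 1, 0]
```

The table grows with the magnitude of the integer scores rather than with the 2^n assignments, which is the pseudo-polynomial behavior the abstract refers to.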


Author(s): Chunying Ren, Dachuan Xu, Donglei Du, Min Li

In the k-means problem with penalties, we are given a data set $\mathcal{D} \subseteq \mathbb{R}^\ell$ of $n$ points, where each point $j \in \mathcal{D}$ is associated with a penalty cost $p_j$, and an integer $k$. The goal is to choose a center set $\mathrm{CS} \subseteq \mathbb{R}^\ell$ with $|\mathrm{CS}| \le k$ and a penalized subset $\mathcal{D}_p \subseteq \mathcal{D}$ that minimize the sum of the total squared distance from the points in $\mathcal{D} \setminus \mathcal{D}_p$ to $\mathrm{CS}$ and the total penalty cost of the points in $\mathcal{D}_p$, namely $\sum_{j \in \mathcal{D} \setminus \mathcal{D}_p} d^2(j, \mathrm{CS}) + \sum_{j \in \mathcal{D}_p} p_j$. We employ the primal-dual technique to give a pseudo-polynomial time algorithm with an approximation ratio of $(6.357+\varepsilon)$ for the k-means problem with penalties, improving the previous best approximation ratio of $19.849+\varepsilon$ for this problem, given by Feng et al. in Proceedings of FAW (2019).
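As an illustration of the objective only (not the primal-dual algorithm), the sketch below evaluates $\sum_{j \in \mathcal{D} \setminus \mathcal{D}_p} d^2(j, \mathrm{CS}) + \sum_{j \in \mathcal{D}_p} p_j$ for a fixed center set. Once $\mathrm{CS}$ is fixed, the best $\mathcal{D}_p$ is simply the set of points whose penalty is smaller than their squared distance to the nearest center. The function name and the requirement of at least one center are assumptions made for this example.

```python
def penalized_cost(points, penalties, centers, k):
    """Objective of k-means with penalties for a fixed center set CS = centers.

    With CS fixed, point j is penalized exactly when p_j < d^2(j, CS).
    Returns (cost, D_p as a list of point indices).
    """
    # Assumption for this sketch: at least one center, and at most k.
    if not 1 <= len(centers) <= k:
        raise ValueError("need between 1 and k centers")
    cost, penalized = 0.0, []
    for j, (x, p_j) in enumerate(zip(points, penalties)):
        # Squared Euclidean distance from point j to its nearest center.
        d2 = min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
        if p_j < d2:
            penalized.append(j)
            cost += p_j
        else:
            cost += d2
    return cost, penalized


points = [(0, 0), (1, 0), (10, 10)]
penalties = [5, 5, 3]
print(penalized_cost(points, penalties, centers=[(0, 0)], k=1))  # (4.0, [2])
```

Here the far point (10, 10) would contribute 200 as a squared distance, so paying its penalty of 3 is cheaper; this per-point choice is what makes the problem robust to outliers.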


2005, Vol. 16 (04), pp. 803-827
Author(s): Takehiro Ito, Xiao Zhou, Takao Nishizeki

Assume that a tree T has n_s "supply vertices" and that all the other vertices are "demand vertices." Each supply vertex is assigned a positive number called a supply, while each demand vertex is assigned a positive number called a demand. One wishes to partition T into exactly n_s subtrees by deleting edges from T so that each subtree contains exactly one supply vertex, whose supply is no less than the sum of the demands of all demand vertices in the subtree. The "partition problem" is a decision problem that asks whether T has such a partition. The "maximum partition problem" is an optimization version of the partition problem. In this paper, we give three algorithms for these problems. The first is a linear-time algorithm for the partition problem. The second is a pseudo-polynomial-time algorithm for the maximum partition problem. The third is a fully polynomial-time approximation scheme (FPTAS) for the maximum partition problem.
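A minimal bottom-up sketch of the decision version is shown below, written for illustration; it is not claimed to be the linear-time algorithm in the paper. It roots the tree and, for each vertex, tracks the component still attached to it: either one that already owns a supply vertex (with its remaining surplus) or one that still needs supply (with its total unmet demand). The greedy rule, absorbing the supply child with the largest surplus whenever it covers the need, is an assumption of this sketch; `has_partition` and its parameters are hypothetical.

```python
def has_partition(n, edges, supply, demand, root=0):
    """Decide whether a tree can be cut into subtrees that each contain exactly
    one supply vertex whose supply covers the total demand of the subtree.

    Vertices are 0..n-1; supply[v] > 0 marks a supply vertex and demand[v] > 0 a
    demand vertex (the other entry is 0).
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Iterative DFS from the root: record a preorder and each vertex's children.
    children = [[] for _ in range(n)]
    visited = [False] * n
    visited[root] = True
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if not visited[w]:
                visited[w] = True
                children[v].append(w)
                stack.append(w)

    # state[v] describes the component still attached to v once its subtree is done:
    # ("supply", surplus) if it already owns a supply vertex, else ("deficit", need).
    state = [None] * n
    for v in reversed(order):  # children are finished before their parent
        deficit = sum(amt for kind, amt in (state[c] for c in children[v]) if kind == "deficit")
        surpluses = [amt for kind, amt in (state[c] for c in children[v]) if kind == "supply"]
        if supply[v] > 0:
            # v cannot share a component with another supply vertex, so every
            # supply child is cut off; every deficit child must be served by v.
            left = supply[v] - deficit
            if left < 0:
                return False
            state[v] = ("supply", left)
        else:
            need = demand[v] + deficit
            best = max(surpluses, default=float("-inf"))
            if best >= need:
                # Absorb the largest-surplus supply child; cut off the others.
                state[v] = ("supply", best - need)
            else:
                # Pass the whole need upward; all supply children are cut off.
                state[v] = ("deficit", need)
    return state[root][0] == "supply"


path = [(0, 1), (1, 2), (2, 3)]
print(has_partition(4, path, supply=[3, 0, 0, 3], demand=[0, 2, 2, 0]))  # True
print(has_partition(3, path[:2], supply=[5, 0, 0], demand=[0, 3, 3]))   # False
```

In the first call the path splits into {0, 1} and {2, 3}; in the second, the single supply of 5 cannot cover a total demand of 6. Each vertex is processed once, so the sketch runs in time linear in the number of vertices.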


DOI: 10.29007/v68w, 2018
Author(s): Ying Zhu, Mirek Truszczynski

We study the problem of learning the importance of preferences in preference profiles in two important cases: when individual preferences are aggregated by the ranked Pareto rule, and when they are aggregated by positional scoring rules. For the ranked Pareto rule, we provide a polynomial-time algorithm that finds a ranking of preferences such that the ranked profile correctly decides all the examples, whenever such a ranking exists. We also show that the problem of learning a ranking that maximizes the number of correctly decided examples (also under the ranked Pareto rule) is NP-hard. We obtain similar results for the case of weighted profiles when positional scoring rules are used for aggregation.
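For the positional-scoring case, the sketch below shows what it means for a weighted profile to "correctly decide" examples. It assumes each preference is a ranking of outcomes (best first), each example is a pair (better, worse) that the aggregate should strictly order, and Borda scores as the scoring vector; these representations and the function names are illustrative assumptions, not the paper's formalism. Learning then amounts to choosing the weights that maximize this count.

```python
def weighted_scores(profile, weights, scoring_vector):
    """Score of each outcome under a positional scoring rule with weighted preferences.

    profile[i] ranks the outcomes best-first; weights[i] is the importance of that
    preference; scoring_vector[pos] is the points for being in position pos.
    """
    scores = {}
    for ranking, w in zip(profile, weights):
        for pos, outcome in enumerate(ranking):
            scores[outcome] = scores.get(outcome, 0) + w * scoring_vector[pos]
    return scores


def correctly_decided(profile, weights, scoring_vector, examples):
    """Count examples (better, worse) in which the weighted profile strictly prefers better."""
    scores = weighted_scores(profile, weights, scoring_vector)
    return sum(1 for better, worse in examples if scores[better] > scores[worse])


profile = [["a", "b", "c"], ["c", "b", "a"]]
borda = [2, 1, 0]
examples = [("a", "b"), ("c", "a"), ("b", "c")]
print(correctly_decided(profile, [2, 1], borda, examples))  # 2
print(correctly_decided(profile, [1, 2], borda, examples))  # 1
```

With weights (2, 1) the scores are a = 4, b = 3, c = 2, so two of the three examples are decided correctly; swapping the weights reverses the aggregate order and only one example survives.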

