Active Selection Constraints for Semi-supervised Clustering Algorithms

Semi-supervised clustering algorithms aim to enhance clustering performance using pairwise constraints. However, selecting these constraints randomly or improperly can degrade clustering performance in certain situations and applications. In this paper, we select the most informative constraints to improve semi-supervised clustering algorithms. We present an active selection of constraints, including active must-link (AML) and active cannot-link (ACL) constraints. Based on the radial basis function, we compute a lower bound and an upper bound between data points to select the constraints that improve performance. We test the proposed algorithm against baseline methods and show that our proposed active pairwise constraints outperform other algorithms.
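
As an illustration only, the sketch below shows one way that bounds on a radial basis function (RBF) similarity could be used to propose constraint candidates. The abstract does not say how the bounds are computed, so the function name `propose_constraints`, the thresholds `lower` and `upper`, and the bandwidth `gamma` are all assumptions and not the authors' method.

```python
import numpy as np

def rbf_similarity(X, gamma=1.0):
    """Pairwise RBF similarity exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def propose_constraints(X, lower=0.1, upper=0.9, gamma=1.0):
    """Hypothetical rule: similarity >= upper suggests a must-link,
    similarity <= lower suggests a cannot-link; other pairs are skipped."""
    S = rbf_similarity(X, gamma)
    must_link, cannot_link = [], []
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            if S[i, j] >= upper:
                must_link.append((i, j))
            elif S[i, j] <= lower:
                cannot_link.append((i, j))
    return must_link, cannot_link

# Two nearby points and one distant point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]])
must_link, cannot_link = propose_constraints(X)
```

Here the two nearby points form a must-link candidate and each of them forms a cannot-link candidate with the distant point. In an actual active-selection setting the candidates would still have to be confirmed by a user or oracle.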

An Enhanced Wu-Huberman Algorithm with Pole Point Selection Strategy

Abstract and Applied Analysis ◽

10.1155/2013/589386 ◽

2013 ◽

Vol 2013 ◽

pp. 1-6

Author(s):

Yan Sun ◽

Shuxue Ding

Keyword(s):

Clustering Algorithms ◽

Selection Strategy ◽

Point Selection ◽

Kirchhoff Equations ◽

Linear Algorithm ◽

Data Points ◽

Pole Point ◽

Selection Of

The Wu-Huberman clustering algorithm is a typical linear algorithm among the many clustering algorithms; it represents the relationships among data points as an artificial “circuit” and then applies the Kirchhoff equations to obtain the voltage values on the complex circuit. However, the performance of the algorithm depends crucially on the selection of pole points. In this paper, we present a novel pole point selection strategy for the Wu-Huberman algorithm (named the PSWH algorithm), which aims at preserving the merits and increasing the robustness of the algorithm. The pole point selection strategy filters the pole points by introducing a sparse rate. Experimental results demonstrate that the PSWH algorithm significantly improves clustering accuracy and efficiency compared with the original Wu-Huberman algorithm.
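
The circuit idea can be sketched as follows: fix one pole point at 1 V and another at 0 V, require every other node's voltage to be the average of its neighbours' voltages (Kirchhoff's equations with unit resistances), and split the nodes by voltage. This is a minimal sketch under stated assumptions: it solves the equations exactly with a dense solver rather than with a linear-time iterative scheme, the 0.5 threshold is a simplification, and the function name `wu_huberman_voltages` is hypothetical. It does not implement the PSWH pole point selection.

```python
import numpy as np

def wu_huberman_voltages(adj, source, sink):
    """Solve Kirchhoff's equations with source fixed at 1 V and sink at 0 V."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    laplacian = np.diag(adj.sum(axis=1)) - adj
    free = [i for i in range(n) if i not in (source, sink)]
    # For each free node i: sum_j L[i, j] * V[j] = 0,
    # with V[source] = 1 and V[sink] = 0 moved to the right-hand side.
    A = laplacian[np.ix_(free, free)]
    b = -laplacian[free, source]
    V = np.zeros(n)
    V[source] = 1.0
    V[free] = np.linalg.solve(A, b)
    return V

# Two triangles (0, 1, 2) and (3, 4, 5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

V = wu_huberman_voltages(adj, source=0, sink=5)
clusters = V > 0.5  # assumed threshold halfway between the poles
```

With the poles placed in different triangles, the voltages fall from 1 to 0 across the bridge edge and the threshold separates the two triangles. Placing both poles in the same triangle would give a poor split, which illustrates why the choice of pole points matters.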

Learning Kernels for Semi-Supervised Clustering

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch177 ◽

2011 ◽

pp. 1142-1145

Author(s):

Bojun Yan

Keyword(s):

Domain Knowledge ◽

Kernel Method ◽

Clustering Algorithms ◽

Pairwise Constraints ◽

Supervised Clustering ◽

Significant Research ◽

Clustering Approach ◽

Class Labels ◽

Qualitative Measure ◽

Parameter Values

As a recently emerging technique, semi-supervised clustering has attracted significant research interest. Compared to traditional clustering algorithms, which only use unlabeled data, semi-supervised clustering employs both unlabeled and supervised data to obtain a partitioning that conforms more closely to the user’s preferences. Several recent papers have discussed this problem (Cohn, Caruana, & McCallum, 2003; Bar-Hillel, Hertz, Shental, & Weinshall, 2003; Xing, Ng, Jordan, & Russell, 2003; Basu, Bilenko, & Mooney, 2004; Kulis, Dhillon, & Mooney, 2005). In semi-supervised clustering, limited supervision is provided as input. The supervision can take the form of labeled data or pairwise constraints. In many applications it is natural to assume that pairwise constraints are available (Bar-Hillel, Hertz, Shental, & Weinshall, 2003; Wagstaff, Cardie, Rogers, & Schroedl, 2001). For example, in protein interaction and gene expression data (Segal, Wang, & Koller, 2003), pairwise constraints can be derived from background domain knowledge. Similarly, in information and image retrieval, it is easy for the user to provide feedback concerning a qualitative measure of similarity or dissimilarity between pairs of objects. Thus, in these cases, although class labels may be unknown, a user can still specify whether pairs of points belong to the same cluster (Must-Link) or to different ones (Cannot-Link). Furthermore, a set of classified points implies an equivalent set of pairwise constraints, but not vice versa. Recently, a kernel method for semi-supervised clustering has been introduced (Kulis, Dhillon, & Mooney, 2005). This technique extends semi-supervised clustering to a kernel space, thus enabling the discovery of clusters with non-linear boundaries in input space. While this is a powerful technique, the applicability of a kernel-based semi-supervised clustering approach is limited in practice because it depends critically on the settings of the kernel’s parameters. In fact, the chosen parameter values can greatly affect the quality of the results. While solutions have been proposed in supervised learning to estimate the kernel’s optimal parameters, the problem presents open challenges when no labeled data are provided and all we have available is a set of pairwise constraints.
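
The remark that classified points imply pairwise constraints can be made concrete with a short sketch. The function name `constraints_from_labels` and the toy labels are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def constraints_from_labels(labels):
    """labels: dict mapping point id -> class label for the labeled subset."""
    must_link, cannot_link = [], []
    for (i, yi), (j, yj) in combinations(sorted(labels.items()), 2):
        if yi == yj:
            must_link.append((i, j))    # same class: Must-Link
        else:
            cannot_link.append((i, j))  # different classes: Cannot-Link
    return must_link, cannot_link

labels = {0: "a", 1: "a", 2: "b"}
must_link, cannot_link = constraints_from_labels(labels)
```

The reverse direction does not work in general: a Cannot-Link between two points says only that they lie in different clusters, not which class either one belongs to.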

Detection of Buried Targets via Active Selection of Labeled Data: Application to Sensing Subsurface UXO

10.21236/ada520344 ◽

2007 ◽

Author(s):

Lawrence Carin

Keyword(s):

Active Selection ◽

Data Application ◽

Selection Of

A novel bidirectional clustering algorithm based on local density

Scientific Reports ◽

10.1038/s41598-021-93244-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Baicheng Lyu ◽

Wenhua Wu ◽

Zhiqiang Hu

Keyword(s):

Clustering Algorithm ◽

Local Density ◽

Clustering Algorithms ◽

Cluster Number ◽

Denoising Method ◽

Number Of Clusters ◽

Data Points ◽

Cutoff Distance ◽

Large Clusters ◽

Small Clusters

With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty of selecting indicators for judging the number of clusters. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and reduces the number of parameters to be tuned to a minimum. On the basis of the robustness of the cluster number to noise, a denoising method suitable for BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which results in improved clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city light satellite images.
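
The abstract does not define local density precisely. A common definition in density-based clustering counts the neighbours of each point that lie within a cutoff distance, and the sketch below assumes that definition. The function name `local_density` and the single global `cutoff` are assumptions; BCALoD itself assigns a different cutoff to each cluster.

```python
import numpy as np

def local_density(X, cutoff):
    """Number of other points strictly within `cutoff` of each point."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Each point is at distance 0 from itself, so subtract 1.
    return (dists < cutoff).sum(axis=1) - 1

# Three tightly packed points and one isolated point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
rho = local_density(X, cutoff=0.5)
```

The isolated point receives density 0, which is the kind of signal a density-based method can use to separate small or sparse groups from large ones.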

Multiple closed orbits for N-body-type problems

Bulletin of the Australian Mathematical Society ◽

10.1017/s0004972700031968 ◽

1998 ◽

Vol 58 (1) ◽

pp. 1-13 ◽

Cited By ~ 1

Author(s):

Shiqing Zhang

Keyword(s):

Lower Bound ◽

Periodic Solutions ◽

Upper Bound ◽

Critical Value ◽

Body Type ◽

Fixed Energy ◽

Closed Orbits

Using the equivariant Ljusternik-Schnirelmann theory and the estimate of the upper bound of the critical value and lower bound for the collision solutions, we obtain some new results in the large concerning multiple geometrically distinct periodic solutions of fixed energy for a class of planar N-body type problems.

Limit Cycle Bifurcations for Piecewise Smooth Hamiltonian Systems with a Generalized Eye-Figure Loop

International Journal of Bifurcation and Chaos ◽

10.1142/s0218127416502047 ◽

2016 ◽

Vol 26 (12) ◽

pp. 1650204 ◽

Cited By ~ 6

Author(s):

Jihua Yang ◽

Liqin Zhao

Keyword(s):

Lower Bound ◽

Limit Cycle ◽

Hamiltonian Systems ◽

Limit Cycles ◽

Upper Bound ◽

Melnikov Function ◽

Piecewise Smooth ◽

First Order ◽

Period Annulus

This paper deals with the limit cycle bifurcations for piecewise smooth Hamiltonian systems. By using the first order Melnikov function of piecewise near-Hamiltonian systems given in [Liu & Han, 2010], we give a lower bound and an upper bound of the number of limit cycles that bifurcate from the period annulus between the center and the generalized eye-figure loop up to the first order of Melnikov function.

Isolated minima of the product of n linear forms

Mathematical Proceedings of the Cambridge Philosophical Society ◽

10.1017/s0305004100028048 ◽

1953 ◽

Vol 49 (1) ◽

pp. 59-62 ◽

Cited By ~ 6

Author(s):

E. S. Barnes

Keyword(s):

Lower Bound ◽

Upper Bound ◽

Linear Forms ◽

Image Position

Let $X_i = a_{i1}u_1 + \dots + a_{in}u_n$ ($i = 1, \dots, n$) be $n$ linear forms with real coefficients and determinant $\Delta = \|a_{ij}\| \neq 0$; and denote by $M(X)$ the lower bound of $|X_1 X_2 \cdots X_n|$ over all integer sets $(u) \neq (0)$. It is well known that $\gamma_n$, the upper bound of $M(X)/|\Delta|$ over all sets of forms $X_i$, is finite, and the value of $\gamma_n$ has been determined when $n = 2$ and $n = 3$.

The Algebraic Degree of Phase-Type Distributions

Journal of Applied Probability ◽

10.1017/s0021900200006963 ◽

2010 ◽

Vol 47 (03) ◽

pp. 611-629

Author(s):

Mark Fackrell ◽

Qi-Ming He ◽

Peter Taylor ◽

Hanqin Zhang

Keyword(s):

Lower Bound ◽

Upper Bound ◽

Polynomial Algorithm ◽

Algebraic Degree ◽

Nonnegative Matrices ◽

Stieltjes Transform ◽

Special Cases ◽

Phase Type ◽

Phase Type Distributions ◽

The Matrix

This paper is concerned with properties of the algebraic degree of the Laplace-Stieltjes transform of phase-type (PH) distributions. The main problem of interest is: given a PH generator, how do we find the maximum and the minimum algebraic degrees of all irreducible PH representations with that PH generator? Based on the matrix exponential (ME) order of ME distributions and the spectral polynomial algorithm, a method for computing the algebraic degree of a PH distribution is developed. The maximum algebraic degree is identified explicitly. Using Perron-Frobenius theory of nonnegative matrices, a lower bound and an upper bound on the minimum algebraic degree are found, subject to some conditions. Explicit results are obtained for special cases.

Encoding Two-Dimensional Range Top-k Queries

Algorithmica ◽

10.1007/s00453-021-00856-1 ◽

2021 ◽

Author(s):

Seungbum Jo ◽

Rahul Lingala ◽

Srinivasa Rao Satti

Keyword(s):

Lower Bound ◽

Lower Bounds ◽

Upper Bound ◽

Total Order ◽

Two Dimensional ◽

Information Theoretic ◽

Cartesian Tree ◽

Dimensional Range

We consider the problem of encoding two-dimensional arrays, whose elements come from a total order, for answering $\text{Top-}k$ queries. The aim is to obtain encodings that use space close to the information-theoretic lower bound and that can be constructed efficiently. For an $m \times n$ array, with $m \le n$, we first propose an encoding for answering 1-sided $\text{Top-}k$ queries, whose query range is restricted to $[1 \dots m][1 \dots a]$, for $1 \le a \le n$. Next, we propose an encoding for answering the general (4-sided) $\text{Top-}k$ queries that takes $m \lg \binom{(k+1)n}{n} + 2nm(m-1) + o(n)$ bits, which generalizes the joint Cartesian tree of Golin et al. [TCS 2016]. Compared with the trivial $O(nm \lg n)$-bit encoding, our encoding takes less space when $m = o(\lg n)$. In addition to the upper bound results for the encodings, we also give lower bounds on encodings for answering 1-sided and 4-sided $\text{Top-}k$ queries, which show that our upper bound results are almost optimal.
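
To get a feel for the condition on m, the two space bounds can be evaluated numerically. This is a rough sketch under stated assumptions: the o(n) term is dropped, the hidden constant in the trivial O(nm lg n) bound is taken to be 1, and the function names `proposed_bits` and `trivial_bits` are hypothetical.

```python
from math import comb, log2

def proposed_bits(m, n, k):
    """m * lg C((k+1)n, n) + 2nm(m-1), ignoring the o(n) term."""
    return m * log2(comb((k + 1) * n, n)) + 2 * n * m * (m - 1)

def trivial_bits(m, n):
    """n * m * lg n, assuming a hidden constant of 1."""
    return n * m * log2(n)

n, k = 1024, 1
small_m = (proposed_bits(2, n, k), trivial_bits(2, n))
large_m = (proposed_bits(8, n, k), trivial_bits(8, n))
```

Under these assumptions, with n = 1024 and k = 1 the proposed bound is smaller than the trivial one for m = 2 but larger for m = 8, in line with the statement that the encoding saves space only when m is small relative to lg n.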

Selection of the Optimal Distribution for the Upper Bound Theorem in Indentation Processes

Materials Science Forum ◽

10.4028/www.scientific.net/msf.797.117 ◽

2014 ◽

Vol 797 ◽

pp. 117-122 ◽

Cited By ~ 3

Author(s):

Carolina Bermudo ◽

F. Martín ◽

Lorenzo Sevilla

Keyword(s):

Upper Bound ◽

Geometric Distribution ◽

Optimal Choice ◽

Optimal Distribution ◽

Upper Bound Theorem ◽

Modular Model ◽

Bound Theorem ◽

Rigid Zones ◽

Dimensionless Relation ◽

Selection Of

Previous studies have established the best adaptation and solution for implementing the modular model; the current choice is based on minimizing the dimensionless relation p/2k obtained for each of the models, analyzed under the same boundary conditions and efforts. Among the different cases covered, this paper presents the study of the optimal choice of the geometric distribution of zones. The Upper Bound Theorem (UBT), through its Triangular Rigid Zones (TRZ) consideration under modular distribution, is applied to indentation processes. To extend the application of the model, cases of different thicknesses are considered.
