Active Selection Constraints for Semi-supervised Clustering Algorithms

Author(s):  
Walid Atwa ◽  
Abdulwahab Ali Almazroi

Semi-supervised clustering algorithms aim to improve clustering performance by using pairwise constraints. However, selecting these constraints randomly or improperly can degrade clustering performance in certain situations and applications. In this paper, we select the most informative constraints to improve semi-supervised clustering algorithms. We present an active selection of constraints, including active must-link (AML) and active cannot-link (ACL) constraints. Based on the radial basis function, we compute a lower bound and an upper bound between data points to select the constraints that improve performance. We compare the proposed algorithm with baseline methods and show that our active pairwise constraints outperform other algorithms.
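The abstract does not spell out how the bounds are computed, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes a Gaussian radial basis function similarity and two hypothetical thresholds, `lower` and `upper`: pairs whose similarity is at least `upper` are proposed as must-link candidates, and pairs whose similarity is at most `lower` as cannot-link candidates. The function names, the `gamma` parameter, and the threshold values are all illustrative assumptions.

```python
import math
from itertools import combinations


def rbf_similarity(x, y, gamma=1.0):
    """Gaussian RBF similarity exp(-gamma * ||x - y||^2), a value in (0, 1]."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def select_candidate_constraints(points, lower=0.1, upper=0.9, gamma=1.0):
    """Split point pairs into must-link and cannot-link candidates.

    Assumption (not from the paper): a pair is a must-link candidate when its
    similarity is >= upper, a cannot-link candidate when it is <= lower, and
    is left unconstrained otherwise.
    """
    must_link, cannot_link = [], []
    for i, j in combinations(range(len(points)), 2):
        similarity = rbf_similarity(points[i], points[j], gamma)
        if similarity >= upper:
            must_link.append((i, j))
        elif similarity <= lower:
            cannot_link.append((i, j))
    return must_link, cannot_link
```

Calling `select_candidate_constraints([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)])` pairs the two nearby points as a must-link candidate and separates each of them from the distant third point.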

2013 ◽  
Vol 2013 ◽  
pp. 1-6
Author(s):  
Yan Sun ◽  
Shuxue Ding

The Wu-Huberman algorithm is a typical linear-time clustering algorithm that models the relationships among data points as an artificial “circuit” and then applies the Kirchhoff equations to obtain the voltage values on the complex circuit. However, the performance of the algorithm depends crucially on the selection of pole points. In this paper, we present a novel pole point selection strategy for the Wu-Huberman algorithm (named the PSWH algorithm), which aims to preserve the merits of the algorithm while increasing its robustness. The strategy filters the pole points by introducing a sparse rate. Experimental results demonstrate that the PSWH algorithm significantly improves clustering accuracy and efficiency compared with the original Wu-Huberman algorithm.
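The circuit idea can be illustrated with a short sketch. It assumes an unweighted, connected graph in which every edge is a unit resistor, with the two pole points fixed at voltages 1 and 0. Under Kirchhoff's laws, every other node's voltage is then the average of its neighbours' voltages, which the code approximates by repeated sweeps. The function name, the `sweeps` parameter, and the fixed 0.5 split are illustrative assumptions, and the sketch takes the pole points as given rather than implementing the PSWH selection strategy.

```python
def wu_huberman_voltages(adjacency, source, sink, sweeps=200):
    """Approximate node voltages with the source pole at 1.0 and the sink at 0.0.

    adjacency maps each node to a list of its neighbours (unit resistors).
    Each sweep replaces every non-pole voltage by the mean of its neighbours,
    which is the Kirchhoff condition for a network of equal resistors.
    """
    voltage = {node: 0.0 for node in adjacency}
    voltage[source] = 1.0
    for _ in range(sweeps):
        for node, neighbours in adjacency.items():
            if node == source or node == sink:
                continue  # pole voltages stay fixed
            voltage[node] = sum(voltage[n] for n in neighbours) / len(neighbours)
    return voltage


# Two triangles joined by a single bridge edge (2-3); poles at nodes 0 and 5.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
voltages = wu_huberman_voltages(graph, source=0, sink=5)
high_side = {node for node, v in voltages.items() if v > 0.5}
```

In this toy graph the largest voltage drop falls across the bridge edge, so splitting at 0.5 separates the two triangles. How well this works in general depends on where the poles are placed, which is the sensitivity the abstract addresses.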


Author(s):  
Bojun Yan

As a recently emerging technique, semi-supervised clustering has attracted significant research interest. Compared to traditional clustering algorithms, which use only unlabeled data, semi-supervised clustering employs both unlabeled and supervised data to obtain a partitioning that conforms more closely to the user’s preferences. Several recent papers have discussed this problem (Cohn, Caruana, & McCallum, 2003; Bar-Hillel, Hertz, Shental, & Weinshall, 2003; Xing, Ng, Jordan, & Russell, 2003; Basu, Bilenko, & Mooney, 2004; Kulis, Dhillon, & Mooney, 2005). In semi-supervised clustering, limited supervision is provided as input. The supervision can take the form of labeled data or pairwise constraints. In many applications it is natural to assume that pairwise constraints are available (Bar-Hillel, Hertz, Shental, & Weinshall, 2003; Wagstaff, Cardie, Rogers, & Schroedl, 2001). For example, in protein interaction and gene expression data (Segal, Wang, & Koller, 2003), pairwise constraints can be derived from background domain knowledge. Similarly, in information and image retrieval, it is easy for the user to provide feedback concerning a qualitative measure of similarity or dissimilarity between pairs of objects. Thus, in these cases, although class labels may be unknown, a user can still specify whether pairs of points belong to the same cluster (Must-Link) or to different ones (Cannot-Link). Furthermore, a set of classified points implies an equivalent set of pairwise constraints, but not vice versa. Recently, a kernel method for semi-supervised clustering has been introduced (Kulis, Dhillon, & Mooney, 2005). This technique extends semi-supervised clustering to a kernel space, thus enabling the discovery of clusters with non-linear boundaries in input space. Although the technique is powerful, the applicability of kernel-based semi-supervised clustering is limited in practice because the results depend critically on the settings of the kernel’s parameters. In fact, the chosen parameter values can strongly affect the quality of the results. While solutions have been proposed in supervised learning to estimate the optimal kernel parameters, the problem presents open challenges when no labeled data are provided and all we have available is a set of pairwise constraints.
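The remark that a set of classified points implies an equivalent set of pairwise constraints can be made concrete with a small sketch. The function name and the dictionary input format are illustrative assumptions: every pair of labeled points with the same label yields a Must-Link constraint, and every pair with different labels yields a Cannot-Link constraint.

```python
from itertools import combinations


def constraints_from_labels(labels):
    """Derive pairwise constraints from a {point_id: class_label} mapping.

    Same label      -> Must-Link pair
    Different label -> Cannot-Link pair
    """
    must_link, cannot_link = [], []
    for (i, label_i), (j, label_j) in combinations(sorted(labels.items()), 2):
        if label_i == label_j:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link
```

The reverse direction fails because constraints carry no class names. A single Cannot-Link pair, for instance, says that two points differ but not which class either belongs to.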


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Baicheng Lyu ◽  
Wenhua Wu ◽  
Zhiqiang Hu

With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty of selecting indicators for judging the number of clusters. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and reduces the number of parameters to be tuned to a minimum. On the basis of the robustness of the cluster number to noise, a denoising method suitable for BCALoD is proposed. Different cutoff distances and cutoff densities are assigned to each data cluster, which results in improved clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city light satellite images.


1998 ◽  
Vol 58 (1) ◽  
pp. 1-13 ◽  
Author(s):  
Shiqing Zhang

Using the equivariant Ljusternik-Schnirelmann theory and the estimate of the upper bound of the critical value and lower bound for the collision solutions, we obtain some new results in the large concerning multiple geometrically distinct periodic solutions of fixed energy for a class of planar N-body type problems.


2016 ◽  
Vol 26 (12) ◽  
pp. 1650204 ◽  
Author(s):  
Jihua Yang ◽  
Liqin Zhao

This paper deals with the limit cycle bifurcations for piecewise smooth Hamiltonian systems. By using the first order Melnikov function of piecewise near-Hamiltonian systems given in [Liu & Han, 2010], we give a lower bound and an upper bound of the number of limit cycles that bifurcate from the period annulus between the center and the generalized eye-figure loop up to the first order of Melnikov function.


Author(s):  
E. S. Barnes

Let X1, X2, …, Xn be n linear forms with real coefficients and determinant Δ = ∥aij∥ ≠ 0; and denote by M(X) the lower bound of |X1X2 … Xn| over all integer sets (u) ≠ (0). It is well known that γn, the upper bound of M(X)/|Δ| over all sets of forms Xi, is finite, and the value of γn has been determined when n = 2 and n = 3.


2010 ◽  
Vol 47 (03) ◽  
pp. 611-629
Author(s):  
Mark Fackrell ◽  
Qi-Ming He ◽  
Peter Taylor ◽  
Hanqin Zhang

This paper is concerned with properties of the algebraic degree of the Laplace-Stieltjes transform of phase-type (PH) distributions. The main problem of interest is: given a PH generator, how do we find the maximum and the minimum algebraic degrees of all irreducible PH representations with that PH generator? Based on the matrix exponential (ME) order of ME distributions and the spectral polynomial algorithm, a method for computing the algebraic degree of a PH distribution is developed. The maximum algebraic degree is identified explicitly. Using Perron-Frobenius theory of nonnegative matrices, a lower bound and an upper bound on the minimum algebraic degree are found, subject to some conditions. Explicit results are obtained for special cases.


Algorithmica ◽  
2021 ◽  
Author(s):  
Seungbum Jo ◽  
Rahul Lingala ◽  
Srinivasa Rao Satti

We consider the problem of encoding two-dimensional arrays, whose elements come from a total order, for answering $\textsf{Top-}k$ queries. The aim is to obtain encodings that use space close to the information-theoretic lower bound and that can be constructed efficiently. For an $m \times n$ array, with $m \le n$, we first propose an encoding for answering 1-sided $\textsf{Top-}k$ queries, whose query range is restricted to $[1 \dots m][1 \dots a]$, for $1 \le a \le n$. Next, we propose an encoding for answering general (4-sided) $\textsf{Top-}k$ queries that takes $m\lg\binom{(k+1)n}{n} + 2nm(m-1) + o(n)$ bits, which generalizes the joint Cartesian tree of Golin et al. [TCS 2016]. Compared with the trivial $O(nm\lg n)$-bit encoding, our encoding takes less space when $m = o(\lg n)$. In addition to the upper bound results for the encodings, we also give lower bounds on encodings for answering 1-sided and 4-sided $\textsf{Top-}k$ queries, which show that our upper bound results are almost optimal.


2014 ◽  
Vol 797 ◽  
pp. 117-122 ◽  
Author(s):  
Carolina Bermudo ◽  
F. Martín ◽  
Lorenzo Sevilla

Previous studies have established the best adaptation and solution for implementing the modular model; the current choice is based on minimizing the dimensionless ratio p/2k obtained for each model, analyzed under the same boundary conditions and efforts. Among the different cases covered, this paper presents the study of the optimal choice of the geometric distribution of zones. The Upper Bound Theorem (UBT), through its Triangular Rigid Zones (TRZ) consideration under a modular distribution, is applied to indentation processes. To extend the application of the model, cases of different thicknesses are considered.

