Robust Multi-view Learning via Half-quadratic Minimization

Author(s):  
Yonghua Zhu ◽  
Xiaofeng Zhu ◽  
Wei Zheng

Although multi-view clustering can use more information than single-view clustering, existing multi-view clustering methods still have open issues, such as initialization sensitivity, the need to specify the number of clusters, and the influence of outliers. In this paper, we propose a robust multi-view clustering method to address these issues. Specifically, we first propose a multi-view sum-of-square error estimation to make initialization simple, and use a sum-of-norm regularization to automatically learn the number of clusters according to the data distribution. We further employ robust estimators constructed by half-quadratic theory to reduce the influence of outliers, yielding robust estimates of both the sum-of-square error and the number of clusters. Experimental results on both synthetic and real datasets demonstrate that our method outperforms state-of-the-art methods.
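
The role of a half-quadratic robust estimator can be illustrated on a much smaller problem than the one in the abstract. The sketch below estimates a robust mean of scalar data by alternating between auxiliary weights and a weighted least-squares update. The choice of the Welsch loss, the scale `sigma`, and the function names are assumptions made for illustration; the abstract does not say which estimator the paper uses.

```python
import math
from statistics import median


def welsch_weight(residual, sigma):
    """Auxiliary weight for the Welsch loss 1 - exp(-r^2 / (2 sigma^2)) (assumed estimator)."""
    return math.exp(-(residual ** 2) / (2.0 * sigma ** 2))


def robust_mean(values, sigma=1.0, n_iter=50, tol=1e-9):
    """Half-quadratic style alternation: fix the mean and update the weights,
    then fix the weights and update the mean by weighted least squares."""
    mu = median(values)  # start from the median so a gross outlier does not dominate
    for _ in range(n_iter):
        weights = [welsch_weight(x - mu, sigma) for x in values]
        new_mu = sum(w * x for w, x in zip(weights, values)) / sum(weights)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu


data = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]  # the last value is an outlier
plain = sum(data) / len(data)
robust = robust_mean(data)
print(f"plain mean = {plain:.3f}, robust mean = {robust:.3f}")
```

The outlier receives a weight close to zero, so the robust estimate stays near the bulk of the data while the plain mean is pulled far away from it.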

Author(s):  
Nurshazwani Muhamad Mahfuz ◽  
Marina Yusoff ◽  
Zakiah Ahmad

Clustering plays an important role as an unsupervised learning method in data analytics, supporting many real-world problems such as image segmentation, object recognition, and information retrieval. Traditional clustering techniques often struggle because outliers and noisy data lead to non-optimal results. This paper reviews single clustering methods that have been applied in various domains. The aim is to identify potentially suitable applications and aspects of the methods that could be improved. Three categories of single clustering methods are suggested, which should help researchers understand the clustering aspects and determine the requirements for choosing a clustering method, based on the state of the art in previous research findings.


2020 ◽  
Vol 34 (04) ◽  
pp. 4412-4419 ◽  
Author(s):  
Zhao Kang ◽  
Wangtao Zhou ◽  
Zhitong Zhao ◽  
Junming Shao ◽  
Meng Han ◽  
...  

A plethora of multi-view subspace clustering (MVSC) methods have been proposed over the past few years. Researchers have managed to boost clustering accuracy from different points of view. However, many state-of-the-art MVSC algorithms typically have quadratic or even cubic complexity, which makes them inefficient and inherently difficult to apply at large scales. In the era of big data, the computational issue becomes critical. To fill this gap, we propose a large-scale MVSC (LMVSC) algorithm with linear complexity. Inspired by the idea of the anchor graph, we first learn a smaller graph for each view. Then, a novel approach is designed to integrate those graphs so that we can implement spectral clustering on a smaller graph. Interestingly, it turns out that our model also applies to the single-view scenario. Extensive experiments on various large-scale benchmark data sets validate the effectiveness and efficiency of our approach with respect to state-of-the-art clustering methods.
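
To make the anchor-graph idea concrete, the sketch below builds an n-by-m matrix that links each of n points to its nearest few of m anchors, so the cost grows linearly in n for a fixed m. This is a common kernel-weighted construction and only an assumption here: LMVSC learns its per-view graphs, and the abstract does not give that formulation. The function name, `n_neighbors`, and `sigma` are hypothetical.

```python
import math


def anchor_graph(points, anchors, n_neighbors=2, sigma=1.0):
    """Return an n x m row-normalized matrix connecting each point to its
    n_neighbors nearest anchors with Gaussian weights (illustrative variant)."""
    rows = []
    for x in points:
        sq_dists = [sum((a - b) ** 2 for a, b in zip(x, u)) for u in anchors]
        nearest = sorted(range(len(anchors)), key=sq_dists.__getitem__)[:n_neighbors]
        weights = {j: math.exp(-sq_dists[j] / (2.0 * sigma ** 2)) for j in nearest}
        total = sum(weights.values())
        # Anchors outside the nearest set keep a weight of exactly zero.
        rows.append([weights.get(j, 0.0) / total for j in range(len(anchors))])
    return rows


points = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0)]
anchors = [(0.0, 0.0), (5.0, 5.0), (10.0, 10.0)]
Z = anchor_graph(points, anchors)
for row in Z:
    print([round(v, 3) for v in row])
```

Each row sums to one and has at most `n_neighbors` nonzero entries, which is what keeps the graph small compared with a full n-by-n affinity matrix.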


2020 ◽  
Vol 1 (1) ◽  
pp. 57-67
Author(s):  
Steven Pranata ◽  
Derry Alamsyah

Segmentation divides an image into parts or segments that are simpler and more meaningful so they can be analyzed further. The approach used here combines Maximum Likelihood Estimation (MLE) with the Gaussian Mixture Model (GMM). GMM is a clustering method: a function consisting of several Gaussians, each identified by k ∈ {1, ..., K}, where K is the number of clusters in the dataset. Maximum likelihood estimation is a technique for finding the point that maximizes a function, and it is very widely used for estimating the parameters of a data distribution. Tests were carried out using mango images with 10 different backgrounds. GMM clusters the pixels of the mango image to produce means and covariances, which MLE then uses to classify each pixel of the mango image. Based on the results obtained, the GMM and MLE methods have an error rate of 13.07% for 3 clusters, 8.06% for 4 clusters, and 6.63% for 5 clusters, and good cluster quality, with silhouette coefficient values of 0.37686 for 3 clusters, 0.29577 for 4 clusters, and 0.26162 for 5 clusters.
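
The two stages described above, fitting Gaussian components and then assigning each pixel to its most likely component, can be sketched on one-dimensional grayscale intensities. The toy pixel values, the initial means, and the function names are assumptions for illustration; the study works on color mango images, and its exact procedure is not given in the abstract.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, init_means, init_var=0.05, n_iter=30, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with expectation-maximization."""
    k = len(init_means)
    means = list(init_means)
    variances = [init_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each pixel.
        resp = []
        for x in xs:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate mixing weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, min_var)  # floor keeps the density finite
            weights[j] = nj / len(xs)
    return weights, means, variances


def assign_pixels(xs, weights, means, variances):
    """Label each pixel with the component that gives it the highest weighted likelihood."""
    labels = []
    for x in xs:
        scores = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        labels.append(scores.index(max(scores)))
    return labels


# Hypothetical intensities in [0, 1]: five dark background pixels, five bright object pixels.
pixels = [0.10, 0.12, 0.15, 0.11, 0.14, 0.80, 0.85, 0.82, 0.88, 0.79]
weights, means, variances = fit_gmm_1d(pixels, init_means=[0.0, 1.0])
labels = assign_pixels(pixels, weights, means, variances)
print(labels)
```

On real images the same idea applies per pixel with multivariate Gaussians over the color channels, where each component has a mean vector and a covariance matrix.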


2021 ◽  
Author(s):  
Faizan Ur Rahman ◽  
Soosan Beheshti

Transforming data to feature space using a kernel function can result in better expression of its features, resulting in better separability for some datasets. The parameters of the kernel function govern the structure of data in feature space and need to be optimized simultaneously while also estimating the number of clusters in a dataset. The proposed method, denoted by kernel k-Minimum Average Central Error (kernel k-MACE), estimates the number of clusters in a dataset while simultaneously clustering the dataset in feature space by finding the optimum value of the Gaussian kernel parameter σk. A cluster initialization technique has also been proposed based on an existing method for k-means clustering. Simulations show that for self-generated datasets with Gaussian clusters having 10% to 50% overlap and for real benchmark datasets, the proposed method outperforms multiple state-of-the-art unsupervised clustering methods including k-MACE, the clustering scheme that inspired kernel k-MACE.
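
Clustering in feature space never needs the mapped points themselves, only kernel evaluations. The sketch below computes the squared feature-space distance from a point to a cluster centroid with a Gaussian kernel, which is the basic quantity used by kernel k-means style methods. It is not an implementation of kernel k-MACE; the function names and the fixed `sigma` are assumptions, whereas the method described above searches for the kernel parameter.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def feature_space_sq_dist(x, cluster, sigma):
    """Squared distance between phi(x) and the centroid of phi(cluster):
    k(x, x) - 2 * mean_y k(x, y) + mean_{y, z} k(y, z)."""
    n = len(cluster)
    cross = sum(gaussian_kernel(x, y, sigma) for y in cluster) / n
    within = sum(gaussian_kernel(y, z, sigma) for y in cluster for z in cluster) / (n * n)
    return gaussian_kernel(x, x, sigma) - 2.0 * cross + within


cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
query = (0.5, 0.5)
d_a = feature_space_sq_dist(query, cluster_a, sigma=1.0)
d_b = feature_space_sq_dist(query, cluster_b, sigma=1.0)
print(f"distance to A = {d_a:.3f}, distance to B = {d_b:.3f}")
```

Changing `sigma` changes these distances and therefore the cluster structure seen in feature space, which is why the kernel parameter has to be chosen together with the clustering.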


2021 ◽  
Author(s):  
Abdul Wahid

Clustering is an unsupervised machine learning technique that involves discovering different clusters (groups) of similar objects in unlabeled data, and it is generally considered to be an NP-hard problem. Clustering methods are widely used in a variety of disciplines for analyzing different types of data, and a small improvement in a clustering method can cause a ripple effect in advancing research in multiple fields. Clustering any type of data is challenging and there are many open research questions. The clustering problem is exacerbated in the case of text data because of additional challenges such as capturing the semantics of a document, handling the rich features of text data, and dealing with the well-known curse of dimensionality. In this thesis, we investigate the limitations of existing text clustering methods and address these limitations by providing five new text clustering methods: Query Sense Clustering (QSC), Dirichlet Weighted K-means (DWKM), Multi-View Multi-Objective Evolutionary Algorithm (MMOEA), Multi-objective Document Clustering (MDC), and Multi-Objective Multi-View Ensemble Clustering (MOMVEC). These five new clustering methods showed that the use of rich features in text clustering methods could outperform the existing state-of-the-art text clustering methods. The first new text clustering method, QSC, exploits user queries (one of the rich features in text data) to generate better quality clusters and cluster labels. The second text clustering method, DWKM, uses a probability-based weighting scheme to formulate a semantically weighted distance measure to improve the clustering results. The third text clustering method, MMOEA, is based on a multi-objective evolutionary algorithm. MMOEA exploits rich features to generate a diverse set of candidate clustering solutions, and forms a better clustering solution using a cluster-oriented approach.
The fourth and fifth text clustering methods, MDC and MOMVEC, address the limitations of MMOEA. MDC and MOMVEC differ in the implementation of their multi-objective evolutionary approaches. All five methods are compared with existing state-of-the-art methods. The results of the comparisons show that the newly developed text clustering methods outperform existing methods, achieving up to 16% improvement for some comparisons. In general, almost all newly developed clustering algorithms showed statistically significant improvements over other existing methods. The key ideas of the thesis highlight that exploiting user queries improves Search Result Clustering (SRC); utilizing rich features in weighting schemes and distance measures improves soft subspace clustering; utilizing multiple views and a multi-objective cluster-oriented method improves clustering ensemble methods; and better evolutionary operators and objective functions improve multi-objective evolutionary clustering ensemble methods. The new text clustering methods introduced in this thesis can be widely applied in various domains that involve analysis of text data. The contributions of this thesis, which include five new text clustering methods, will help not only researchers in the data mining field but also a wide range of researchers in other fields.


2013 ◽  
Vol 457-458 ◽  
pp. 919-925
Author(s):  
Yu Hua Liu ◽  
Cui Xu ◽  
Ke Xu ◽  
Jian Zhi Jin

By analyzing the k-means problem, we find that the traditional k-means algorithm suffers from several shortcomings: it requires the user to specify the number of clusters k in advance, it is sensitive to the initial cluster centers, it is sensitive to noise and isolated data, it is applicable only to globular clusters, and it is easily trapped in a local solution. The improved algorithm uses the potential of the data to find the center data and eliminate the noise data. It decomposes big or extended clusters into several small clusters, then merges adjacent small clusters into a big cluster using the information provided by the Safety Area. Experimental results demonstrate that the improved k-means algorithm can determine the number of clusters, distinguish irregular clusters to a certain extent, decrease the dependence on the initial cluster centers, eliminate the effects of the noise data, and achieve better clustering accuracy.


Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 522
Author(s):  
Minhui Hu ◽  
Kaiwei Zeng ◽  
Yaohua Wang ◽  
Yang Guo

Unsupervised domain adaptation is a challenging task in person re-identification (re-ID). Recently, cluster-based methods have achieved good performance; clustering and training are the two important phases in these methods. For clustering, one major issue of existing methods is that they do not fully exploit the information in outliers, because they either discard outliers or simply merge them into clusters. For training, existing methods only use source features for pretraining and target features for fine-tuning, and do not make full use of all valuable information in the source and target datasets. To solve these problems, we propose a Threshold-based Hierarchical clustering method with Contrastive loss (THC). THC has two features: (1) it regards outliers as single-sample clusters that participate in training, which preserves the information in outliers without setting the number of clusters and combines the advantages of existing clustering methods; (2) it uses contrastive loss to make full use of all valuable information, including source-class centroids, target-cluster centroids, and single-sample clusters, thus achieving better performance. We conduct extensive experiments on Market-1501, DukeMTMC-reID, and MSMT17. Results show that our method achieves state-of-the-art performance.
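
The first feature, keeping outliers as single-sample clusters rather than discarding them, can be sketched with a simple threshold-based grouping. The code merges any two points closer than a distance threshold (equivalent to cutting a single-linkage hierarchy at that threshold) and leaves isolated points as clusters of size one. The linkage rule, the Euclidean distance, and the function name are assumptions; the abstract does not specify THC's merging criterion.

```python
import math


def threshold_clusters(points, threshold):
    """Group points whose pairwise distance is within the threshold.
    Isolated points stay as their own single-sample clusters."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(j)] = find(i)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    label_of_root = {}
    labels = []
    for i in range(len(points)):
        root = find(i)
        labels.append(label_of_root.setdefault(root, len(label_of_root)))
    return labels


features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
labels = threshold_clusters(features, threshold=0.5)
print(labels)  # the last point is an outlier and keeps its own label
```

Because the number of clusters falls out of the threshold, no cluster count has to be set in advance, and every sample, including the outlier, still receives a label that can be used in training.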


Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7501
Author(s):  
Cunli Mao ◽  
Haoyuan Liang ◽  
Zhengtao Yu ◽  
Yuxin Huang ◽  
Junjun Guo

Finding news about the same case among large volumes of case-involved news is an important basis for public opinion analysis. Existing text clustering methods are usually based on topic models, which use only topic and case information as global document features, so distinguishing between different cases of similar types remains a challenge. The contents of documents contain rich local features. Taking into account the internal features of news, the case information, and the contributions of different topics, we propose a clustering method for case-involved news that combines a topic network with a multi-head attention mechanism. Case information and topic information are used to construct a topic network, and global features are then extracted by a graph convolution network, thereby combining case information and topic information. At the same time, local features are extracted by the multi-head attention mechanism. Finally, global and local features are fused by a variational auto-encoder, and the learned latent representations are used for clustering. Experiments show that the proposed method significantly outperforms state-of-the-art unsupervised clustering methods.


Author(s):  
Peng Zhou ◽  
Liang Du ◽  
Xuejun Li

Consensus clustering provides a framework for combining multiple clustering results into a single robust consensus result. Most existing consensus clustering methods apply all data to ensemble learning while ignoring the side effects caused by some difficult or unreliable instances. To tackle this problem, we propose a novel self-paced consensus clustering method that gradually involves instances in the ensemble learning, from more reliable to less reliable ones. We first construct an initial bipartite graph from the multiple base clustering results, where the nodes represent the instances and clusters and the edges indicate that an instance belongs to a cluster. Then, we learn a structured bipartite graph from the initial one by self-paced learning, i.e., we automatically decide the reliability of each edge and involve the edges in graph learning in order of their reliability. Finally, we obtain the consensus clustering result from the learned bipartite graph. Extensive experimental results demonstrate the effectiveness and superiority of the proposed method.
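
The initial bipartite graph described above can be written as an instance-by-cluster incidence matrix. The sketch below builds it from a list of base clusterings and also derives the co-association between instances, a standard consensus-clustering quantity that is added here only for illustration. The function names and the toy base clusterings are assumptions, and the self-paced learning step of the paper is not implemented.

```python
def bipartite_incidence(base_clusterings):
    """Return (B, cluster_nodes), where B[i][c] is 1 if instance i belongs to
    cluster node c. Each cluster of each base clustering is one node."""
    cluster_nodes = []
    for b, labels in enumerate(base_clusterings):
        for label in sorted(set(labels)):
            cluster_nodes.append((b, label))
    index = {node: c for c, node in enumerate(cluster_nodes)}
    n = len(base_clusterings[0])
    B = [[0] * len(cluster_nodes) for _ in range(n)]
    for b, labels in enumerate(base_clusterings):
        for i, label in enumerate(labels):
            B[i][index[(b, label)]] = 1
    return B, cluster_nodes


def co_association(B, n_base):
    """Fraction of base clusterings that place each pair of instances together."""
    n = len(B)
    return [[sum(x * y for x, y in zip(B[i], B[j])) / n_base for j in range(n)]
            for i in range(n)]


# Two hypothetical base clusterings of four instances.
base = [[0, 0, 1, 1],
        [0, 1, 1, 1]]
B, nodes = bipartite_incidence(base)
C = co_association(B, n_base=len(base))
for row in B:
    print(row)
```

Every instance has exactly one edge per base clustering, so each row of `B` sums to the number of base clusterings; a self-paced scheme would then weight or defer individual edges according to their estimated reliability.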

