A Robust Information Clustering Algorithm

2005 ◽  
Vol 17 (12) ◽  
pp. 2672-2698 ◽  
Author(s):  
Qing Song

We focus on the scenario of robust information clustering (RIC) based on the minimax optimization of mutual information (MI). The minimization of MI leads to the standard mass-constrained deterministic annealing clustering, which is an empirical risk-minimization algorithm. The maximization of MI yields an upper bound on the empirical risk via the identification of outliers (noisy data points). Furthermore, we estimate the real-risk VC bound and determine the optimal number of clusters for RIC based on the structural risk-minimization principle. One of the main advantages of the minimax optimization of MI is that it is a nonparametric approach: it identifies outliers through a robust density estimate and yields a simple data-clustering algorithm based on the squared Euclidean distance.
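The mass-constrained deterministic annealing step at the heart of the MI minimization can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the temperature schedule, inner-iteration count, and initialization below are illustrative choices, and the outlier-detection (MI maximization) half is omitted.

```python
import numpy as np

def da_cluster(X, k, T0=5.0, Tmin=0.05, alpha=0.9, inner=30, seed=0):
    """Sketch of mass-constrained deterministic annealing clustering.

    Soft assignments follow a Gibbs distribution over squared Euclidean
    distances, weighted by cluster masses; the temperature T is lowered
    on a geometric schedule so assignments harden gradually.
    """
    rng = np.random.default_rng(seed)
    # standard DA start: all centroids at the data mean, slightly perturbed
    centers = X.mean(axis=0) + 0.01 * rng.standard_normal((k, X.shape[1]))
    masses = np.full(k, 1.0 / k)
    T = T0
    while T > Tmin:
        for _ in range(inner):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            logits = np.log(masses + 1e-300) - d2 / T
            logits -= logits.max(axis=1, keepdims=True)
            p = np.exp(logits)
            p /= p.sum(axis=1, keepdims=True)   # association probabilities
            masses = p.mean(axis=0)             # mass constraint update
            centers = (p.T @ X) / (p.sum(axis=0)[:, None] + 1e-12)
        T *= alpha
    return centers, p.argmax(axis=1)
```

As T falls below a critical value the initially coincident centroids split, which is what makes the annealing robust to initialization.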

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Baicheng Lyu ◽  
Wenhua Wu ◽  
Zhiqiang Hu

Abstract: With the widespread application of cluster analysis, the number of clusters encountered in practice is gradually increasing, as is the difficulty of selecting judgment indicators for the number of clusters. Moreover, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes connections between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and reduces the number of tunable parameters to a minimum. Building on the robustness of the cluster number to noise, a denoising method suited to BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which improves clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city-light satellite images.
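BCALoD itself is not reproduced here, but the local-density linking it builds on can be sketched in a few lines: each point's density is the number of neighbours within a cutoff distance, and each point links to its nearest denser neighbour, so chains of links form clusters. The cutoff `d_c`, the tie-breaking rule, and the link-distance test are illustrative; the paper's per-cluster cutoffs and denoising step are omitted.

```python
import numpy as np

def local_density_labels(X, d_c):
    """Sketch of clustering via local density (density-peaks style)."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (d < d_c).sum(axis=1) - 1                 # neighbours within cutoff
    key = rho + (n - np.arange(n)) / (n + 1.0)      # strict order; ties by index
    parent = np.arange(n)
    for i in range(n):
        higher = np.where(key > key[i])[0]
        if len(higher):
            j = higher[np.argmin(d[i, higher])]
            if d[i, j] < d_c:        # only link to a *nearby* denser point
                parent[i] = j
    def root(i):                     # follow links up to a density peak
        while parent[i] != i:
            i = parent[i]
        return i
    roots, labels = {}, np.empty(n, dtype=int)
    for i in range(n):
        labels[i] = roots.setdefault(root(i), len(roots))
    return labels
```

Points whose nearest denser neighbour lies farther than `d_c` become their own roots, which is how the number of clusters emerges without being specified.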


Author(s):  
Ege Beyazit ◽  
Jeevithan Alagurajah ◽  
Xindong Wu

We study the problem of online learning with varying feature spaces. The problem is challenging because, unlike traditional online learning problems, varying feature spaces can introduce new features, or stop providing existing ones, without following any pattern. Existing methods such as online streaming feature selection (Wu et al. 2013), online learning from trapezoidal data streams (Zhang et al. 2016), and learning with feature evolvable streams (Hou, Zhang, and Zhou 2017) cannot learn from arbitrarily varying feature spaces because they make assumptions about the feature space dynamics. In this paper, we propose OLVF, a novel online learning algorithm that learns from data with arbitrarily varying feature spaces. OLVF learns to classify the feature spaces and the instances from those feature spaces simultaneously. To classify an instance, the algorithm dynamically projects the instance classifier and the training instance onto their shared feature subspace. The feature space classifier predicts the projection confidences for a given feature space. The instance classifier is updated following the empirical risk minimization principle, with the strength of the constraints scaled by the projection confidences. Afterwards, a feature sparsity method is applied to reduce the model complexity. Experiments on 10 datasets with varying feature spaces demonstrate the performance of the proposed OLVF algorithm. Moreover, experiments with trapezoidal data streams on the same datasets show that OLVF outperforms the state-of-the-art algorithm (Zhang et al. 2016).
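The projection-onto-shared-subspace idea can be illustrated with a minimal mistake-driven learner over dictionary-encoded instances. This is a plain perceptron stand-in, not OLVF itself: the learned projection confidences, the constraint scaling, and the sparsity step are all omitted, and the learning rate is arbitrary.

```python
from typing import Dict

def predict(w: Dict[str, float], x: Dict[str, float]) -> float:
    """Score an instance using only the features shared by the classifier
    and the instance -- the projection onto the shared feature subspace."""
    shared = w.keys() & x.keys()
    return sum(w[f] * x[f] for f in shared)

def update(w: Dict[str, float], x: Dict[str, float], y: int, lr: float = 0.5):
    """Mistake-driven update on a varying feature space (y is +1 or -1).
    Features never seen before are added to the model on the fly."""
    if y * predict(w, x) <= 0:
        for f, v in x.items():
            w[f] = w.get(f, 0.0) + lr * y * v

# a toy stream whose feature space changes from instance to instance
w = {}
stream = [({"a": 1.0}, +1), ({"b": 1.0}, -1), ({"a": 1.0, "b": 1.0, "c": 1.0}, +1)]
for _ in range(3):                      # a few online passes
    for x, y in stream:
        update(w, x, y)
```

Because both scoring and updating operate only on the features actually present, the learner never needs a fixed, global feature space.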


2013 ◽  
Vol 438-439 ◽  
pp. 1167-1170
Author(s):  
Xu Chao Shi ◽  
Ying Fei Gao

The compression index is an important soil property that is essential to many geotechnical designs, but its determination from consolidation tests is relatively time-consuming. The Support Vector Machine (SVM) is a statistical learning method based on the structural risk minimization principle, which minimizes both the error and the weight terms. Because the parameters of an SVM model are difficult to determine, a genetic SVM is presented in which the SVM parameters are optimized by a Genetic Algorithm (GA). Taking the plasticity index, water content, void ratio, and density of soil as the primary influencing factors, a prediction model of the compression index based on the GA-SVM approach was obtained. The results of this study show that the GA-SVM approach has the potential to be a practical tool for predicting the compression index of soil.
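The GA half of the approach can be sketched as a tiny real-coded genetic algorithm searching over SVM hyper-parameters such as (C, gamma). To keep the sketch self-contained, the fitness below is a quadratic stand-in for the SVM's cross-validation error, with a hypothetical optimum at C = 10, gamma = 0.5; in practice the callable would train and cross-validate an SVM. Population size, mutation rate, and crossover rule are all illustrative.

```python
import random

def ga_optimize(fitness, bounds, pop=20, gens=60, seed=0):
    """Minimal real-coded GA: elitist selection, averaging crossover,
    Gaussian mutation clipped to the given bounds. `fitness` is minimised."""
    rng = random.Random(seed)
    P = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop)]
    for _ in range(gens):
        P.sort(key=fitness)
        elite = P[: pop // 2]                          # keep the best half
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # averaging crossover
            for i, (lo, hi) in enumerate(bounds):        # Gaussian mutation
                if rng.random() < 0.2:
                    child[i] += rng.gauss(0, 0.1 * (hi - lo))
                    child[i] = min(max(child[i], lo), hi)
            children.append(child)
        P = elite + children
    return min(P, key=fitness)

# stand-in for SVM cross-validation error over (C, gamma)
best = ga_optimize(lambda p: (p[0] - 10) ** 2 + (p[1] - 0.5) ** 2,
                   bounds=[(0.1, 100.0), (0.001, 1.0)])
```

Swapping the lambda for a function that returns SVM cross-validation error gives the GA-SVM search loop the abstract describes.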


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Li Mao ◽  
Lidong Zhang ◽  
Xingyang Liu ◽  
Chaofeng Li ◽  
Hong Yang

The extreme learning machine (ELM) is a class of single-hidden-layer feedforward neural network (SLFN) that is simple in theory and fast in implementation. Zong et al. proposed a weighted extreme learning machine for learning from data with imbalanced class distributions, which retains the advantages of the original ELM. However, the reported ELM and its improved versions are based only on the empirical risk minimization principle and may therefore suffer from overfitting. To address overfitting, in this paper we incorporate the structural risk minimization principle into the (weighted) ELM and propose a modified (weighted) extreme learning machine (M-ELM and M-WELM). Experimental results show that our proposed M-WELM outperforms the reported extreme learning machine algorithms in image quality assessment.
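The core modification, adding an L2 (structural-risk) term to the weighted ELM objective, can be sketched as a weighted ridge solve over random hidden features. The exact M-WELM formulation in the paper may differ; hidden-layer size, the sigmoid activation, and the regularization constant C below are illustrative.

```python
import numpy as np

def train_welm(X, y, n_hidden=40, C=100.0, weights=None, seed=0):
    """Sketch of a weighted ELM with an L2 regularisation term.

    The hidden layer is a fixed random sigmoid map; the output weights
    solve the weighted ridge problem beta = (H'WH + I/C)^{-1} H'Wy,
    i.e. weighted empirical risk plus a structural-risk penalty.
    """
    rng = np.random.default_rng(seed)
    Win = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ Win + b)))        # random hidden features
    W = np.eye(len(X)) if weights is None else np.diag(weights)
    beta = np.linalg.solve(H.T @ W @ H + np.eye(n_hidden) / C, H.T @ W @ y)
    return Win, b, beta

def predict_welm(model, X):
    Win, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ Win + b)))
    return H @ beta
```

Setting per-sample `weights` higher for the minority class recovers the weighted-ELM behaviour for imbalanced data; the I/C term is the structural-risk addition.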




2012 ◽  
Vol 43 (6) ◽  
pp. 851-861 ◽  
Author(s):  
Sharad K. Jain

A variety of data-driven approaches have been developed in the recent past to capture the properties of hydrological data for improved modeling, including artificial neural networks (ANNs), fuzzy logic, and evolutionary algorithms. Of late, kernel-based machine learning approaches have become popular due to their inherent advantages over traditional modeling techniques. In this work, support vector machines (SVMs), a kernel-based learning approach, have been investigated for their suitability to model the relationship between river stage, discharge, and sediment concentration. SVMs are an approximate implementation of the structural risk minimization principle, which aims at minimizing a bound on the generalization error of a model; they have been found promising in many areas, including hydrology. The application of SVMs to regression problems is known as support vector regression (SVR). This paper presents an application of SVR to model the river discharge and sediment concentration rating relation. The results obtained using SVR were compared with those from ANNs, and the SVR approach was found to perform better.
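A minimal SVR rating-curve fit might look like the following sketch, using scikit-learn's `SVR`. The synthetic power-law stage-discharge data and all hyper-parameters (kernel, C, epsilon) are illustrative stand-ins; real gauged stage, discharge, and sediment records would replace them.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical synthetic rating data: discharge grows roughly as a power
# law of stage, a common rating-curve form, plus observation noise.
rng = np.random.default_rng(0)
stage = rng.uniform(1.0, 5.0, 200)
discharge = 3.0 * stage ** 1.8 + rng.normal(0, 0.5, 200)

# epsilon-insensitive RBF regression; C and epsilon would normally be
# tuned by cross-validation on the gauging record
model = SVR(kernel="rbf", C=100.0, epsilon=0.1)
model.fit(stage.reshape(-1, 1), discharge)
pred = model.predict(np.array([[3.0]]))   # discharge estimate at stage 3.0
```

The same recipe extends to the sediment-concentration relation by swapping the target variable.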

