A Robust Information Clustering Algorithm

2005 ◽  
Vol 17 (12) ◽  
pp. 2672-2698 ◽  
Author(s):  
Qing Song

We focus on the scenario of robust information clustering (RIC) based on the minimax optimization of mutual information (MI). The minimization of MI leads to the standard mass-constrained deterministic annealing clustering, which is an empirical risk-minimization algorithm. The maximization of MI yields an upper bound on the empirical risk via the identification of outliers (noisy data points). Furthermore, we estimate the real-risk VC bound and determine the optimal number of clusters for RIC based on the structural risk-minimization principle. One of the main advantages of the minimax optimization of MI is that it is a nonparametric approach: it identifies outliers through a robust density estimate and yields a simple data-clustering algorithm based on the squared Euclidean distance.
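The mass-constrained deterministic annealing step at the heart of the MI minimization can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the temperature schedule, inner-iteration count, and initialization below are illustrative choices, and the outlier-detection (MI maximization) half is omitted.

```python
import numpy as np

def da_cluster(X, k, T0=5.0, Tmin=0.05, alpha=0.9, inner=30, seed=0):
    """Sketch of mass-constrained deterministic annealing clustering.

    Soft assignments follow a Gibbs distribution over squared Euclidean
    distances, weighted by cluster masses; the temperature T is lowered
    on a geometric schedule so assignments harden gradually.
    """
    rng = np.random.default_rng(seed)
    # standard DA start: all centroids at the data mean, slightly perturbed
    centers = X.mean(axis=0) + 0.01 * rng.standard_normal((k, X.shape[1]))
    masses = np.full(k, 1.0 / k)
    T = T0
    while T > Tmin:
        for _ in range(inner):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            logits = np.log(masses + 1e-300) - d2 / T
            logits -= logits.max(axis=1, keepdims=True)
            p = np.exp(logits)
            p /= p.sum(axis=1, keepdims=True)   # association probabilities
            masses = p.mean(axis=0)             # mass constraint update
            centers = (p.T @ X) / (p.sum(axis=0)[:, None] + 1e-12)
        T *= alpha
    return centers, p.argmax(axis=1)
```

As T falls below a critical value the initially coincident centroids split, which is what makes the annealing robust to initialization.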

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Baicheng Lyu ◽  
Wenhua Wu ◽  
Zhiqiang Hu

Abstract: With the widespread application of cluster analysis, the number of clusters encountered in practice is gradually increasing, as is the difficulty of selecting judgment indicators for the number of clusters. Moreover, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes connections between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and reduces the number of tunable parameters to a minimum. Building on the robustness of the cluster number to noise, a denoising method suited to BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which improves clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city-light satellite images.
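BCALoD itself is not reproduced here, but the local-density linking it builds on can be sketched in a few lines: each point's density is the number of neighbours within a cutoff distance, and each point links to its nearest denser neighbour, so chains of links form clusters. The cutoff `d_c`, the tie-breaking rule, and the link-distance test are illustrative; the paper's per-cluster cutoffs and denoising step are omitted.

```python
import numpy as np

def local_density_labels(X, d_c):
    """Sketch of clustering via local density (density-peaks style)."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (d < d_c).sum(axis=1) - 1                 # neighbours within cutoff
    key = rho + (n - np.arange(n)) / (n + 1.0)      # strict order; ties by index
    parent = np.arange(n)
    for i in range(n):
        higher = np.where(key > key[i])[0]
        if len(higher):
            j = higher[np.argmin(d[i, higher])]
            if d[i, j] < d_c:        # only link to a *nearby* denser point
                parent[i] = j
    def root(i):                     # follow links up to a density peak
        while parent[i] != i:
            i = parent[i]
        return i
    roots, labels = {}, np.empty(n, dtype=int)
    for i in range(n):
        labels[i] = roots.setdefault(root(i), len(roots))
    return labels
```

Points whose nearest denser neighbour lies farther than `d_c` become their own roots, which is how the number of clusters emerges without being specified.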


Author(s):  
Ege Beyazit ◽  
Jeevithan Alagurajah ◽  
Xindong Wu

We study the problem of online learning with varying feature spaces. The problem is challenging because, unlike traditional online learning problems, varying feature spaces can introduce new features, or stop providing existing ones, without following any pattern. Existing methods such as online streaming feature selection (Wu et al. 2013), online learning from trapezoidal data streams (Zhang et al. 2016), and learning with feature evolvable streams (Hou, Zhang, and Zhou 2017) cannot learn from arbitrarily varying feature spaces because they make assumptions about the feature space dynamics. In this paper, we propose OLVF, a novel online learning algorithm that learns from data with arbitrarily varying feature spaces. OLVF learns to classify the feature spaces and the instances from those feature spaces simultaneously. To classify an instance, the algorithm dynamically projects the instance classifier and the training instance onto their shared feature subspace. The feature space classifier predicts the projection confidences for a given feature space. The instance classifier is updated following the empirical risk minimization principle, with the strength of the constraints scaled by the projection confidences. Afterwards, a feature sparsity method is applied to reduce the model complexity. Experiments on 10 datasets with varying feature spaces demonstrate the performance of the proposed OLVF algorithm. Moreover, experiments with trapezoidal data streams on the same datasets show that OLVF outperforms the state-of-the-art algorithm (Zhang et al. 2016).
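The projection-onto-shared-subspace idea can be illustrated with a minimal mistake-driven learner over dictionary-encoded instances. This is a plain perceptron stand-in, not OLVF itself: the learned projection confidences, the constraint scaling, and the sparsity step are all omitted, and the learning rate is arbitrary.

```python
from typing import Dict

def predict(w: Dict[str, float], x: Dict[str, float]) -> float:
    """Score an instance using only the features shared by the classifier
    and the instance -- the projection onto the shared feature subspace."""
    shared = w.keys() & x.keys()
    return sum(w[f] * x[f] for f in shared)

def update(w: Dict[str, float], x: Dict[str, float], y: int, lr: float = 0.5):
    """Mistake-driven update on a varying feature space (y is +1 or -1).
    Features never seen before are added to the model on the fly."""
    if y * predict(w, x) <= 0:
        for f, v in x.items():
            w[f] = w.get(f, 0.0) + lr * y * v

# a toy stream whose feature space changes from instance to instance
w = {}
stream = [({"a": 1.0}, +1), ({"b": 1.0}, -1), ({"a": 1.0, "b": 1.0, "c": 1.0}, +1)]
for _ in range(3):                      # a few online passes
    for x, y in stream:
        update(w, x, y)
```

Because both scoring and updating operate only on the features actually present, the learner never needs a fixed, global feature space.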


2013 ◽  
Vol 438-439 ◽  
pp. 1167-1170
Author(s):  
Xu Chao Shi ◽  
Ying Fei Gao

The compression index is an important soil property that is essential to many geotechnical designs, but its determination from consolidation tests is relatively time-consuming. The Support Vector Machine (SVM) is a statistical learning method based on the structural risk minimization principle, which minimizes both the error and the weight terms. Because the parameters of an SVM model are difficult to determine, a genetic SVM is presented in which the SVM parameters are optimized by a Genetic Algorithm (GA). Taking the plasticity index, water content, void ratio, and density of soil as the primary influencing factors, a prediction model of the compression index based on the GA-SVM approach was obtained. The results of this study show that the GA-SVM approach has the potential to be a practical tool for predicting the compression index of soil.
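The GA half of the approach can be sketched as a tiny real-coded genetic algorithm searching over SVM hyper-parameters such as (C, gamma). To keep the sketch self-contained, the fitness below is a quadratic stand-in for the SVM's cross-validation error, with a hypothetical optimum at C = 10, gamma = 0.5; in practice the callable would train and cross-validate an SVM. Population size, mutation rate, and crossover rule are all illustrative.

```python
import random

def ga_optimize(fitness, bounds, pop=20, gens=60, seed=0):
    """Minimal real-coded GA: elitist selection, averaging crossover,
    Gaussian mutation clipped to the given bounds. `fitness` is minimised."""
    rng = random.Random(seed)
    P = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop)]
    for _ in range(gens):
        P.sort(key=fitness)
        elite = P[: pop // 2]                          # keep the best half
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # averaging crossover
            for i, (lo, hi) in enumerate(bounds):        # Gaussian mutation
                if rng.random() < 0.2:
                    child[i] += rng.gauss(0, 0.1 * (hi - lo))
                    child[i] = min(max(child[i], lo), hi)
            children.append(child)
        P = elite + children
    return min(P, key=fitness)

# stand-in for SVM cross-validation error over (C, gamma)
best = ga_optimize(lambda p: (p[0] - 10) ** 2 + (p[1] - 0.5) ** 2,
                   bounds=[(0.1, 100.0), (0.001, 1.0)])
```

Swapping the lambda for a function that returns SVM cross-validation error gives the GA-SVM search loop the abstract describes.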


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Li Mao ◽  
Lidong Zhang ◽  
Xingyang Liu ◽  
Chaofeng Li ◽  
Hong Yang

The extreme learning machine (ELM) is a class of single-hidden-layer feedforward neural network (SLFN) that is simple in theory and fast in implementation. Zong et al. proposed a weighted extreme learning machine for learning from data with imbalanced class distributions, which retains the advantages of the original ELM. However, the reported ELM and its improved versions are based only on the empirical risk minimization principle and may therefore suffer from overfitting. To address overfitting, in this paper we incorporate the structural risk minimization principle into the (weighted) ELM and propose a modified (weighted) extreme learning machine (M-ELM and M-WELM). Experimental results show that our proposed M-WELM outperforms the reported extreme learning machine algorithms in image quality assessment.
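The core modification, adding an L2 (structural-risk) term to the weighted ELM objective, can be sketched as a weighted ridge solve over random hidden features. The exact M-WELM formulation in the paper may differ; hidden-layer size, the sigmoid activation, and the regularization constant C below are illustrative.

```python
import numpy as np

def train_welm(X, y, n_hidden=40, C=100.0, weights=None, seed=0):
    """Sketch of a weighted ELM with an L2 regularisation term.

    The hidden layer is a fixed random sigmoid map; the output weights
    solve the weighted ridge problem beta = (H'WH + I/C)^{-1} H'Wy,
    i.e. weighted empirical risk plus a structural-risk penalty.
    """
    rng = np.random.default_rng(seed)
    Win = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ Win + b)))        # random hidden features
    W = np.eye(len(X)) if weights is None else np.diag(weights)
    beta = np.linalg.solve(H.T @ W @ H + np.eye(n_hidden) / C, H.T @ W @ y)
    return Win, b, beta

def predict_welm(model, X):
    Win, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ Win + b)))
    return H @ beta
```

Setting per-sample `weights` higher for the minority class recovers the weighted-ELM behaviour for imbalanced data; the I/C term is the structural-risk addition.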




2012 ◽  
Vol 43 (6) ◽  
pp. 851-861 ◽  
Author(s):  
Sharad K. Jain

A variety of data-driven approaches have been developed in the recent past to capture the properties of hydrological data for improved modeling, including artificial neural networks (ANNs), fuzzy logic, and evolutionary algorithms. Of late, kernel-based machine learning approaches have become popular due to their inherent advantages over traditional modeling techniques. In this work, support vector machines (SVMs), a kernel-based learning approach, have been investigated for their suitability to model the relationship between river stage, discharge, and sediment concentration. SVMs are an approximate implementation of the structural risk minimization principle, which aims at minimizing a bound on the generalization error of a model; they have been found promising in many areas, including hydrology. The application of SVMs to regression problems is known as support vector regression (SVR). This paper presents an application of SVR to model the river discharge and sediment concentration rating relation. The results obtained using SVR were compared with those from ANNs, and the SVR approach was found to perform better.
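A minimal SVR rating-curve fit might look like the following sketch, using scikit-learn's `SVR`. The synthetic power-law stage-discharge data and all hyper-parameters (kernel, C, epsilon) are illustrative stand-ins; real gauged stage, discharge, and sediment records would replace them.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical synthetic rating data: discharge grows roughly as a power
# law of stage, a common rating-curve form, plus observation noise.
rng = np.random.default_rng(0)
stage = rng.uniform(1.0, 5.0, 200)
discharge = 3.0 * stage ** 1.8 + rng.normal(0, 0.5, 200)

# epsilon-insensitive RBF regression; C and epsilon would normally be
# tuned by cross-validation on the gauging record
model = SVR(kernel="rbf", C=100.0, epsilon=0.1)
model.fit(stage.reshape(-1, 1), discharge)
pred = model.predict(np.array([[3.0]]))   # discharge estimate at stage 3.0
```

The same recipe extends to the sediment-concentration relation by swapping the target variable.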

