K-Nearest Neighbor Intervals Based AP Clustering Algorithm for Large Incomplete Data

The Affinity Propagation (AP) algorithm is an effective algorithm for clustering analysis, but it can not be directly applicable to the case of incomplete data. In view of the prevalence of missing data and the uncertainty of missing attributes, we put forward a modified AP clustering algorithm based onK-nearest neighbor intervals (KNNI) for incomplete data. Based on an Improved Partial Data Strategy, the proposed algorithm estimates the KNNI representation of missing attributes by using the attribute distribution information of the available data. The similarity function can be changed by dealing with the interval data. Then the improved AP algorithm can be applicable to the case of incomplete data. Experiments on several UCI datasets show that the proposed algorithm achieves impressive clustering results.

Download Full-text

A SOFT-LINK SPECTRAL MODEL FOR LINK PREDICTION

International Journal of Semantic Computing ◽

10.1142/s1793351x09000847 ◽

2009 ◽

Vol 03 (04) ◽

pp. 399-419

Author(s):

ASLI CELIKYILMAZ

Keyword(s):

Spectral Clustering ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Classification Performance ◽

Underlying Structure ◽

Similarity Function ◽

Clustering Methods ◽

K Nearest Neighbor ◽

Learning Stage

Unsupervised spectral clustering methods can yield good performance when identifying crisp clusters with low complexity since the learning algorithm does not rely on finding the local minima of an objective function and rather uses spectral properties of the graph. Nonetheless, the performance of such approaches are usually affected by their uncertain parameters. Using the underlying structure of a general spectral clustering method, in this paper a new soft-link spectral clustering algorithm is introduced to identify clusters based on fuzzy k-nearest neighbor approach. We construct a soft weight matrix of a graph by identifying the upper and lower boundaries of learning parameters of the similarity function, specifically the fuzzifier parameter (fuzziness) of the Fuzzy k-Nearest Neighbor algorithm. The algorithm allows perturbations on the graph Laplace during the learning stage by the changes on such learning parameters. With the empirical analysis using an artificial and a real textual entailment dataset, we demonstrate that our initial hypothesis of implementing soft links for spectral clustering can improve the classification performance of final outcome.

Download Full-text

DRSA: a non-hierarchical clustering algorithm using k-NN graph and its application in vegetation classification

Vegetation of Russia ◽

10.31111/vegrus/2015.27.125 ◽

2015 ◽

pp. 125-138 ◽

Cited By ~ 2

Author(s):

I. V. Goncharenko

Keyword(s):

Cluster Analysis ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Clustering Algorithms ◽

Protein Structures ◽

Hierarchical Cluster ◽

Vegetation Classification ◽

K Nearest Neighbor ◽

Neighbor Graph ◽

Nearest Neighbor Graph

In this article we proposed a new method of non-hierarchical cluster analysis using k-nearest-neighbor graph and discussed it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classiﬁcation was originally developed in 1951 (Fix, Hodges, 1951). Later a term “k-NN graph” and a few algorithms of k-NN clustering appeared (Cover, Hart, 1967; Brito et al., 1997). In biology k-NN is used in analysis of protein structures and genome sequences. Most of k-NN clustering algorithms build «excessive» graph firstly, so called hypergraph, and then truncate it to subgraphs, just partitioning and coarsening hypergraph. We developed other strategy, the “upward” clustering in forming (assembling consequentially) one cluster after the other. Until today graph-based cluster analysis has not been considered concerning classification of vegetation datasets.

Download Full-text

Symmetry Breaking and Training from Incomplete Data with Radial Basis Boltzmann Machines

International Journal of Neural Systems ◽

10.1142/s0129065797000318 ◽

1997 ◽

Vol 08 (03) ◽

pp. 301-315 ◽

Cited By ~ 8

Author(s):

Marcel J. Nijman ◽

Hilbert J. Kappen

Keyword(s):

Symmetry Breaking ◽

Incomplete Data ◽

Missing Values ◽

Nearest Neighbor ◽

Boltzmann Machine ◽

K Nearest Neighbor ◽

Data Set ◽

Input Space ◽

Learning Rules ◽

Radial Basis

A Radial Basis Boltzmann Machine (RBBM) is a specialized Boltzmann Machine architecture that combines feed-forward mapping with probability estimation in the input space, and for which very efficient learning rules exist. The hidden representation of the network displays symmetry breaking as a function of the noise in the dynamics. Thus, generalization can be studied as a function of the noise in the neuron dynamics instead of as a function of the number of hidden units. We show that the RBBM can be seen as an elegant alternative of k-nearest neighbor, leading to comparable performance without the need to store all data. We show that the RBBM has good classification performance compared to the MLP. The main advantage of the RBBM is that simultaneously with the input-output mapping, a model of the input space is obtained which can be used for learning with missing values. We derive learning rules for the case of incomplete data, and show that they perform better on incomplete data than the traditional learning rules on a 'repaired' data set.

Download Full-text

An improved OPTICS clustering algorithm for discovering clusters with uneven densities

Intelligent Data Analysis ◽

10.3233/ida-205497 ◽

2021 ◽

Vol 25 (6) ◽

pp. 1453-1471

Author(s):

Chunhua Tang ◽

Han Wang ◽

Zhiwen Wang ◽

Xiangkun Zeng ◽

Huaran Yan ◽

...

Keyword(s):

Time Complexity ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Clustering Algorithms ◽

Substantial Improvement ◽

Experimental Results ◽

High Time ◽

Parameter Setting ◽

K Nearest Neighbor ◽

Density Based Clustering

Most density-based clustering algorithms have the problems of difficult parameter setting, high time complexity, poor noise recognition, and weak clustering for datasets with uneven density. To solve these problems, this paper proposes FOP-OPTICS algorithm (Finding of the Ordering Peaks Based on OPTICS), which is a substantial improvement of OPTICS (Ordering Points To Identify the Clustering Structure). The proposed algorithm finds the demarcation point (DP) from the Augmented Cluster-Ordering generated by OPTICS and uses the reachability-distance of DP as the radius of neighborhood eps of its corresponding cluster. It overcomes the weakness of most algorithms in clustering datasets with uneven densities. By computing the distance of the k-nearest neighbor of each point, it reduces the time complexity of OPTICS; by calculating density-mutation points within the clusters, it can efficiently recognize noise. The experimental results show that FOP-OPTICS has the lowest time complexity, and outperforms other algorithms in parameter setting and noise recognition.

Download Full-text

A Deep Learning Based Method for the Non-Destructive Measuring of Rock Strength through Hammering Sound

Applied Sciences ◽

10.3390/app9173484 ◽

2019 ◽

Vol 9 (17) ◽

pp. 3484

Author(s):

Shuai Han ◽

Heng Li ◽

Mingchao Li ◽

Timothy Rose

Keyword(s):

Clustering Algorithm ◽

Nearest Neighbor ◽

Rock Strength ◽

Support Vector ◽

K Nearest Neighbor ◽

Strength Measurement ◽

Regression Algorithms ◽

Almost All ◽

The Relationship ◽

Non Destructive

Hammering rocks of different strengths can make different sounds. Geological engineers often use this method to approximate the strengths of rocks in geology surveys. This method is quick and convenient but subjective. Inspired by this problem, we present a new, non-destructive method for measuring the surface strengths of rocks based on deep neural network (DNN) and spectrogram analysis. All the hammering sounds are transformed into spectrograms firstly, and a clustering algorithm is presented to filter out the outliers of the spectrograms automatically. One of the most advanced image classification DNN, the Inception-ResNet-v2, is then re-trained with the spectrograms. The results show that the training accurate is up to 94.5%. Following this, three regression algorithms, including Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest (RF) are adopted to fit the relationship between the outputs of the DNN and the strength values. The tests show that KNN has the highest fitting accuracy, and SVM has the strongest generalization ability. The strengths (represented by rebound values) of almost all the samples can be predicted within an error of [−5, 5]. Overall, the proposed method has great potential in supporting the implementation of efficient rock strength measurement methods in the field.

Download Full-text

MRI brain tumor detection using optimal possibilistic fuzzy C-means clustering algorithm and adaptive k-nearest neighbor classifier

Journal of Ambient Intelligence and Humanized Computing ◽

10.1007/s12652-020-02444-7 ◽

2020 ◽

Author(s):

D. Maruthi Kumar ◽

D. Satyanarayana ◽

M. N. Giri Prasad

Keyword(s):

Brain Tumor ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Tumor Detection ◽

K Nearest Neighbor ◽

Nearest Neighbor Classifier ◽

Fuzzy C Means ◽

Mri Brain ◽

Fuzzy C Means Clustering ◽

Neighbor Classifier

Download Full-text

3D Point Cloud Simplification Based on k-Nearest Neighbor and Clustering

Advances in Multimedia ◽

10.1155/2020/8825205 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Abdelaaziz Mahdaoui ◽

El Hassan Sbai

Keyword(s):

Point Cloud ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

3D Point Cloud ◽

K Nearest Neighbor ◽

Test Dataset ◽

New Approach ◽

Entropy Estimation ◽

3D Objects ◽

Point Cloud Simplification

While the reconstruction of 3D objects is increasingly used today, the simplification of 3D point cloud, however, becomes a substantial phase in this process of reconstruction. This is due to the huge amounts of dense 3D point cloud produced by 3D scanning devices. In this paper, a new approach is proposed to simplify 3D point cloud based on k-nearest neighbor (k-NN) and clustering algorithm. Initially, 3D point cloud is divided into clusters using k-means algorithm. Then, an entropy estimation is performed for each cluster to remove the ones that have minimal entropy. In this paper, MATLAB is used to carry out the simulation, and the performance of our method is testified by test dataset. Numerous experiments demonstrate the effectiveness of the proposed simplification method of 3D point cloud.

Download Full-text

Combining kNN Imputation and Bootstrap Calibrated

International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2010100104 ◽

2010 ◽

Vol 6 (4) ◽

pp. 61-73 ◽

Cited By ~ 4

Author(s):

Yongsong Qin ◽

Shichao Zhang ◽

Chengqi Zhang

Keyword(s):

Confidence Intervals ◽

Incomplete Data ◽

Asymptotic Theory ◽

Nearest Neighbor ◽

Normal Approximation ◽

Simple Procedure ◽

Important Research ◽

K Nearest Neighbor ◽

Research Topics ◽

Data Discovery

The k-nearest neighbor (kNN) imputation, as one of the most important research topics in incomplete data discovery, has been developed with great successes on industrial data. However, it is difficult to obtain a mathematical valid and simple procedure to construct confidence intervals for evaluating the imputed data. This paper studies a new estimation for missing (or incomplete) data that is a combination of the kNN imputation and bootstrap calibrated EL (Empirical Likelihood). The combination not only releases the burden of seeking a mathematical valid asymptotic theory for the kNN imputation, but also inherits the advantages of the EL method compared to the normal approximation method. Simulation results demonstrate that the bootstrap calibrated EL method performs quite well in estimating confidence intervals for the imputed data with kNN imputation method.

Download Full-text

Demand Prediction of Emergency Supplies under Fuzzy and Missing Partial Data

Discrete Dynamics in Nature and Society ◽

10.1155/2019/6823921 ◽

2019 ◽

Vol 2019 ◽

pp. 1-15

Author(s):

Ming Zhang ◽

Hanlin Wu ◽

Zhifeng Qiu ◽

Yifan Zhang ◽

Boquan Li

Keyword(s):

Prediction Model ◽

Prediction Accuracy ◽

Nearest Neighbor ◽

Uncertain Information ◽

K Nearest Neighbor ◽

Emergency Rescue ◽

Demand Prediction ◽

Partial Data ◽

Grey Relation ◽

Fuzzy Interval

An accurate demand prediction of emergency supplies according to disaster information and historical data is an important research subject in emergency rescue. This study aims at improving supplies demand prediction accuracy under partial data fuzziness and missing. The main contributions of this study are summarized as follows. (1) In view that it is difficult for the turning point of the whitenization weight function to determine fuzzy data, two computational formulas solving “core” of fuzzy interval grey numbers were proposed, and the obtained “core” replaced primary fuzzy information so as to reach the goal of transforming uncertain information into certain information. (2) For partial data missing, the improved grey k-nearest neighbor (GKNN) algorithm was put forward based on grey relation degree and K-nearest neighbor (KNN) algorithm. Weights were introduced in the filling and logic test conditions were added after filling so that filling results were of higher truthfulness and accuracy. (3) The preprocessed data are input into the improved algorithm based on the genetic algorithm and BP neural networks (GABP) to obtain the demand prediction model. Finally the calculation presents that the prediction accuracy and its stability are improved at the five-group comparative tests of calculated examples of actual disasters. The experiments indicated that the supplies demand prediction model under data fuzziness and missing proposed in this study was of higher prediction accuracy.

Download Full-text

Clustering algorithm based on mutual K-nearest neighbor relationships

Statistical Analysis and Data Mining The ASA Data Science Journal ◽

10.1002/sam.10149 ◽

2012 ◽

Vol 5 (2) ◽

pp. 100-113 ◽

Cited By ~ 8

Author(s):

Zhen Hu ◽

Raj Bhatnagar

Keyword(s):

Clustering Algorithm ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Neighbor Relationships

Download Full-text