An improved OPTICS clustering algorithm for discovering clusters with uneven densities

2021 ◽  
Vol 25 (6) ◽  
pp. 1453-1471
Author(s):  
Chunhua Tang ◽  
Han Wang ◽  
Zhiwen Wang ◽  
Xiangkun Zeng ◽  
Huaran Yan ◽  
...  

Most density-based clustering algorithms suffer from difficult parameter setting, high time complexity, poor noise recognition, and weak clustering performance on datasets with uneven density. To solve these problems, this paper proposes the FOP-OPTICS algorithm (Finding of the Ordering Peaks Based on OPTICS), a substantial improvement of OPTICS (Ordering Points To Identify the Clustering Structure). The proposed algorithm finds the demarcation point (DP) in the Augmented Cluster-Ordering generated by OPTICS and uses the reachability-distance of the DP as the neighborhood radius eps of its corresponding cluster, which overcomes the weakness of most algorithms in clustering datasets with uneven densities. By computing the distance to the k-nearest neighbor of each point, it reduces the time complexity of OPTICS; by calculating density-mutation points within clusters, it recognizes noise efficiently. The experimental results show that FOP-OPTICS has the lowest time complexity and outperforms the other algorithms in parameter setting and noise recognition.
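As background for the abstract above, here is a minimal Python sketch of plain OPTICS (not FOP-OPTICS). It assumes eps is effectively infinite, uses Euclidean distance, and takes the core-distance of a point as the distance to its min_pts-th nearest neighbor counting the point itself, which is one common convention. The function names are hypothetical, and the demarcation-point and density-mutation steps of FOP-OPTICS are not reproduced.

```python
import math


def core_distance(points, i, min_pts):
    """Distance from points[i] to its min_pts-th nearest neighbour.

    The point itself is counted (distance 0), so min_pts=2 gives the
    distance to the closest *other* point.
    """
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[min_pts - 1]


def optics(points, min_pts):
    """Plain O(n^2) OPTICS with eps treated as infinite.

    Returns (order, reach): order is the cluster ordering and reach[i]
    is the reachability-distance finally assigned to point i.
    """
    n = len(points)
    core = [core_distance(points, i, min_pts) for i in range(n)]
    reach = [math.inf] * n
    processed = [False] * n
    order = []
    for _ in range(n):
        # Next point: the unprocessed one with the smallest reachability
        # (ties broken by index; the very first pick has reach = inf).
        p = min((i for i in range(n) if not processed[i]), key=lambda i: reach[i])
        processed[p] = True
        order.append(p)
        for q in range(n):
            if not processed[q]:
                # reachability-distance of q w.r.t. p = max(core(p), d(p, q))
                r = max(core[p], math.dist(points[p], points[q]))
                reach[q] = min(reach[q], r)
    return order, reach
```

Large jumps in the reachability values along the ordering mark the boundaries between clusters; these peaks are the kind of structure that the Augmented Cluster-Ordering exposes.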

2015 ◽  
pp. 125-138 ◽  
Author(s):  
I. V. Goncharenko

In this article we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later, the term “k-NN graph” and a few algorithms of k-NN clustering appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an «excessive» graph, the so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed a different strategy: “upward” clustering, which forms (assembles sequentially) one cluster after another. Until now, graph-based cluster analysis has not been applied to the classification of vegetation datasets.
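For readers unfamiliar with k-NN graphs, the Python sketch below builds a k-NN graph and, as one simple and commonly used way to cluster on it, takes the connected components of the mutual k-NN graph (an edge is kept only when each point is among the other's k nearest neighbors). This is not the “upward” strategy described above, whose details the abstract does not give; Euclidean distance and the function names are assumptions.

```python
import math


def knn_graph(points, k):
    """Map each index i to the set of indices of its k nearest neighbours."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = {j for _, j in others[:k]}
    return graph


def mutual_knn_components(points, k):
    """Label points by connected component of the mutual k-NN graph."""
    g = knn_graph(points, k)
    # Keep edge i-j only if i and j are in each other's k-NN lists.
    adj = {i: {j for j in g[i] if i in g[j]} for i in g}
    labels = [-1] * len(points)
    cluster = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        # Depth-first search over the mutual k-NN edges.
        labels[start] = cluster
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = cluster
                    stack.append(v)
        cluster += 1
    return labels
```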


2021 ◽  
Vol 10 (8) ◽  
pp. 548
Author(s):  
Jang-You Park ◽  
Dong-June Ryu ◽  
Kwang-Woo Nam ◽  
Insung Jang ◽  
Minseok Jang ◽  
...  

Density-based clustering algorithms have been the most commonly used algorithms for discovering regions and points of interest in cities from the global positioning system (GPS) information in geo-tagged photos. However, users sometimes find more specific areas of interest through the real objects captured in pictures. Recent advances in deep learning technology make it possible to recognize these objects in photos. However, since deep learning detection is very time-consuming, simply combining deep learning detection with density-based clustering is very costly. In this paper, we propose a novel algorithm supporting deep content and density-based clustering, called deep density-based spatial clustering of applications with noise (DeepDBSCAN). DeepDBSCAN incorporates object detection by deep learning into the density clustering algorithm using the nearest neighbor graph technique. Additionally, it supports a graph-based reduction algorithm that reduces the number of deep detections. We performed experiments with pictures shared by users on Flickr and compared the performance of multiple algorithms to demonstrate the effectiveness of the proposed algorithm.
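DeepDBSCAN builds on DBSCAN. As a reference point, a minimal O(n²) Python sketch of plain DBSCAN follows; it contains no object detection and no nearest-neighbor-graph reduction, so it illustrates only the density-clustering baseline. Euclidean distance, min_pts counting the point itself, and the label -1 for noise are assumptions of this sketch.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)
    # Precompute eps-neighbourhoods (each point is its own neighbour).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        # i is a core point: grow a new cluster from it.
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is core: expand through it
        cluster += 1
    return labels
```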


2015 ◽  
Vol 09 (03) ◽  
pp. 307-331 ◽  
Author(s):  
Wei Zhang ◽  
Gongxuan Zhang ◽  
Yongli Wang ◽  
Zhaomeng Zhu ◽  
Tao Li

Nearest neighbor search is a key technique in hierarchical clustering, and its computational complexity determines the performance of the hierarchical clustering algorithm. The time complexity of standard agglomerative hierarchical clustering is O(n³), while that of more advanced hierarchical clustering algorithms (such as nearest-neighbor chain, SLINK, and CLINK) is O(n²). This paper presents a new nearest neighbor search method called nearest neighbor boundary (NNB), which first divides a large dataset into independent subsets and then finds the nearest neighbor of each point within its subset. When NNB is used, the time complexity of hierarchical clustering can be reduced to O(n log₂ n). Based on NNB, we propose a fast hierarchical clustering algorithm called nearest-neighbor boundary clustering (NBC), which can be adapted to parallel and distributed computing frameworks. The experimental results demonstrate that our algorithm is practical for large datasets.
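To make the quadratic baseline mentioned above concrete, the Python sketch below computes single-linkage merges through a minimum spanning tree built with Prim's algorithm in O(n²) time, relying on the known equivalence between single-linkage clustering and the MST. It is not the NNB/NBC method; the function name and Euclidean distance are assumptions.

```python
import math


def single_linkage_mst(points):
    """MST edges (i, j, distance) sorted by distance, via O(n^2) Prim."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # O(n) scan for the closest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # O(n) update of the remaining points' connection costs.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return sorted(edges, key=lambda e: e[2])
```

Processing the returned edges in ascending order with a union-find structure yields the single-linkage dendrogram; dropping the largest k - 1 edges leaves k clusters.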


2011 ◽  
Vol 291-294 ◽  
pp. 344-348
Author(s):  
Lin Lin ◽  
Shu Yan ◽  
Yi Nian

The hierarchical topology of wireless sensor networks can effectively reduce communication consumption. Clustering algorithms are the foundation for realizing a hierarchical structure, so they have been extensively researched. Based on the LEACH algorithm, a distance-density-based clustering algorithm (DDBC) is proposed, which jointly considers the distribution density of surrounding nodes and the remaining energy of each node when selecting cluster heads, in order to dynamically balance the energy usage of nodes. We analyzed the performance of DDBC in simulation experiments by comparing it with other existing clustering algorithms. Results show that the proposed method generates a stable number of cluster heads and balances the energy load effectively.
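DDBC extends LEACH, whose cluster-head election uses the threshold T(n) = P / (1 − P · (r mod 1/P)) for nodes that have not been cluster heads in the last 1/P rounds, and 0 otherwise; a node becomes a cluster head when a uniform random number falls below T(n). The Python sketch below implements only this LEACH baseline, assuming 1/P is an integer number of rounds. DDBC's density and residual-energy terms are not described in enough detail in the abstract to reproduce, and the function names are hypothetical.

```python
import random


def leach_threshold(p, r, eligible):
    """LEACH threshold T(n) for round r and desired cluster-head fraction p."""
    if not eligible:
        return 0.0  # node was a cluster head within the last 1/p rounds
    period = round(1 / p)  # assumes 1/p is an integer number of rounds
    return p / (1 - p * (r % period))


def elect_cluster_heads(node_ids, p, r, last_head_round, rng):
    """Return the nodes elected in round r; updates last_head_round in place."""
    period = round(1 / p)
    heads = []
    for n in node_ids:
        last = last_head_round.get(n)
        eligible = last is None or r - last >= period
        if rng.random() < leach_threshold(p, r, eligible):
            heads.append(n)
            last_head_round[n] = r
    return heads
```

At the start of each 1/P-round epoch the threshold equals P, and it rises to 1 in the last round, so every eligible node serves as cluster head once per epoch.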


2015 ◽  
Vol 13 (2) ◽  
pp. 50-58
Author(s):  
R. Khadim ◽  
R. El Ayachi ◽  
Mohamed Fakir

This paper focuses on the recognition of 3D objects using 2D attributes. To increase the recognition rate, we present a hybridization of three approaches for calculating the attributes of a color image; this hybridization is based on the combination of Zernike moments, Gist descriptors, and a color descriptor (statistical moments). In the classification phase, three methods are adopted: Neural Network (NN), Support Vector Machine (SVM), and k-nearest neighbor (KNN). The COIL-100 database is used in the experiments.


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Cheng Lu ◽  
Shiji Song ◽  
Cheng Wu

The Affinity Propagation (AP) algorithm is an effective algorithm for clustering analysis, but it cannot be directly applied to incomplete data. In view of the prevalence of missing data and the uncertainty of missing attributes, we put forward a modified AP clustering algorithm based on K-nearest neighbor intervals (KNNI) for incomplete data. Based on an Improved Partial Data Strategy, the proposed algorithm estimates the KNNI representation of missing attributes by using the attribute distribution information of the available data. The similarity function is modified to handle the resulting interval data, so the improved AP algorithm can then be applied to incomplete data. Experiments on several UCI datasets show that the proposed algorithm achieves impressive clustering results.
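The abstract does not spell out how the KNN interval is formed. One plausible reading, sketched below in Python as an assumption rather than the authors' exact procedure, replaces a missing attribute value with the interval [min, max] of that attribute over the K nearest objects that do observe it, measuring distance only on attributes both objects observe (a partial-distance strategy). Missing values are encoded as None, and the function name is hypothetical.

```python
import math


def knn_interval(data, row, attr, k):
    """Interval estimate (lo, hi) for the missing value data[row][attr].

    data is a list of equal-length rows; None marks a missing value.
    Hypothetical helper illustrating a KNN-interval idea, not the paper's code.
    """
    target = data[row]
    candidates = []
    for j, other in enumerate(data):
        if j == row or other[attr] is None:
            continue  # a neighbour must observe the attribute being estimated
        shared = [a for a in range(len(target))
                  if a != attr and target[a] is not None and other[a] is not None]
        if not shared:
            continue
        # Partial distance: RMS difference over commonly observed attributes.
        d = math.sqrt(sum((target[a] - other[a]) ** 2 for a in shared) / len(shared))
        candidates.append((d, j))
    if not candidates:
        raise ValueError("no neighbour observes this attribute")
    candidates.sort()
    values = [data[j][attr] for _, j in candidates[:k]]
    return min(values), max(values)
```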


2012 ◽  
Vol 9 (4) ◽  
pp. 1645-1661 ◽  
Author(s):  
Ray-I Chang ◽  
Shu-Yu Lin ◽  
Jan-Ming Ho ◽  
Chi-Wen Fann ◽  
Yu-Chun Wang

Image retrieval has been popular for several years, and there are different system designs for content-based image retrieval (CBIR). This paper proposes a novel system architecture for CBIR that combines content-based image and color analysis with data mining techniques. To the best of our knowledge, this is the first work to combine a segmentation and grid module, a feature extraction module, K-means and k-nearest neighbor clustering algorithms, and a neighborhood module to build a CBIR system. The concept of a neighborhood color analysis module, which also recognizes the sides of every grid cell of an image, is first contributed in this paper. The results show that the CBIR system performs well in training, and they also indicate that many interesting issues remain to be optimized in the query stage of image retrieval.


2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al. ◽  

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness (MMR) algorithm, a top-down hierarchical clustering algorithm that can handle uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome this shortcoming, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real datasets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.
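MMR is built on rough-set approximations. The Python sketch below, offered as background rather than as MMR itself, computes the lower and upper approximations of the object set X = {objects with attribute a_i equal to v} with respect to the partition induced by another attribute a_j, and returns Pawlak's ratio |lower| / |upper|. How MMR aggregates such ratios over values and attributes to pick a splitting attribute is not reproduced; the function names and the tuple-based table layout are assumptions.

```python
from collections import defaultdict


def partition(table, attr):
    """Equivalence classes (sets of row indices) induced by one attribute."""
    blocks = defaultdict(set)
    for idx, row in enumerate(table):
        blocks[row[attr]].add(idx)
    return list(blocks.values())


def approximation_ratio(table, ai, value, aj):
    """|lower| / |upper| of X = {rows with row[ai] == value} w.r.t. attribute aj."""
    x = {idx for idx, row in enumerate(table) if row[ai] == value}
    if not x:
        raise ValueError("value does not occur in attribute ai")
    lower, upper = set(), set()
    for block in partition(table, aj):
        if block <= x:
            lower |= block  # block lies entirely inside X
        if block & x:
            upper |= block  # block overlaps X
    return len(lower) / len(upper)
```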


2019 ◽  
Vol 9 (17) ◽  
pp. 3484
Author(s):  
Shuai Han ◽  
Heng Li ◽  
Mingchao Li ◽  
Timothy Rose

Hammering rocks of different strengths produces different sounds. Geological engineers often use this method to approximate the strengths of rocks in geological surveys. This method is quick and convenient but subjective. Inspired by this problem, we present a new, non-destructive method for measuring the surface strengths of rocks based on a deep neural network (DNN) and spectrogram analysis. All the hammering sounds are first transformed into spectrograms, and a clustering algorithm is presented to filter out outlier spectrograms automatically. One of the most advanced image classification DNNs, Inception-ResNet-v2, is then re-trained with the spectrograms. The results show that the training accuracy reaches 94.5%. Following this, three regression algorithms, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest (RF), are adopted to fit the relationship between the outputs of the DNN and the strength values. The tests show that KNN has the highest fitting accuracy and SVM has the strongest generalization ability. The strengths (represented by rebound values) of almost all the samples can be predicted within an error of [−5, 5]. Overall, the proposed method has great potential in supporting the implementation of efficient rock strength measurement methods in the field.
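The KNN regression step can be illustrated with a minimal Python sketch: the predicted strength is the mean target of the k training samples closest to the query. The abstract does not state which KNN variant was used, so Euclidean distance and an unweighted mean are assumptions, and the function name is hypothetical.

```python
import math


def knn_regress(train_x, train_y, query, k):
    """Predict a target as the unweighted mean of its k nearest training targets."""
    # Indices of training samples sorted by distance to the query (stable on ties).
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)
```

In the study's setting, each training vector would be a DNN output for one spectrogram and each target a measured rebound value.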

