Adaptive K-Means Algorithm with Dynamically Changing Cluster Centers and K-Value

2012 ◽  
Vol 532-533 ◽  
pp. 1373-1377 ◽  
Author(s):  
Ai Ping Deng ◽  
Ben Xiao ◽  
Hui Yong Yuan

To address the K-means algorithm's drawbacks of requiring the number of clusters in advance and its sensitivity to the choice of initial cluster centers, an improved K-means algorithm is proposed in which the cluster centers and the number of clusters change dynamically. The new algorithm determines the cluster centers by calculating the density of data points and shared-nearest-neighbor similarity, and controls the number of clusters using the average shared-nearest-neighbor self-similarity. Experimental results on the IRIS test data set show that the algorithm selects cluster centers well and distinguishes between different types of clusters efficiently.
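The shared-nearest-neighbor similarity that drives the center selection can be sketched in a few lines of Python. This is a generic illustration of SNN similarity, not the authors' exact procedure, and the neighborhood size `k` is an assumed parameter:

```python
import math

def snn_similarity(points, k=2):
    """Shared-nearest-neighbor (SNN) similarity: for each pair of points,
    count how many points their k-nearest-neighbor lists have in common."""
    n = len(points)
    knn = []
    for i in range(n):
        # rank all other points by Euclidean distance, keep the k closest
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        knn.append(set(order[:k]))
    # similarity = size of the intersection of the two neighbor sets
    return [[len(knn[i] & knn[j]) for j in range(n)] for i in range(n)]
```

Points inside the same dense region share many neighbors and so score high; points in different clusters share none, which is what makes the measure useful for picking well-separated centers.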

2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix of size N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.


Author(s):  
Md. Zakir Hossain ◽  
Md.Nasim Akhtar ◽  
R.B. Ahmad ◽  
Mostafijur Rahman

Data mining is the process of finding structure in large data sets. With this process, decision makers can make particular decisions for further development of real-world problems. Several data clustering techniques are used in data mining for finding specific patterns in data. The K-means method is one of the familiar clustering techniques for clustering large data sets. The K-means clustering method partitions the data set based on the assumption that the number of clusters is fixed. The main problem of this method is that if the number of clusters is chosen to be small, there is a higher probability of adding dissimilar items to the same group; on the other hand, if the number of clusters is chosen to be high, there is a higher chance of adding similar items to different groups. In this paper, we address this issue by proposing a new K-means clustering algorithm. The proposed method performs data clustering dynamically: it initially calculates a threshold value as a centroid of K-means, and based on this value the number of clusters is formed. At each iteration of K-means, if the Euclidean distance between two points is less than or equal to the threshold value, these two data points will be in the same group. Otherwise, the proposed method creates a new cluster for the dissimilar data point. The results show that the proposed method outperforms the original K-means method.
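The threshold rule described above can be sketched as a single pass over the data: assign a point to the nearest existing centroid if it lies within the threshold, otherwise open a new cluster. This is a simplified illustration of the idea, with names chosen for illustration rather than taken from the paper:

```python
import math

def threshold_cluster(points, threshold):
    """One-pass clustering: join the nearest existing cluster if its centroid
    is within `threshold`, otherwise start a new cluster. (For simplicity the
    centroid stays at the cluster's founding point; the paper's method
    iterates as in K-means.)"""
    centroids, labels = [], []
    for p in points:
        if centroids:
            dists = [math.dist(p, c) for c in centroids]
            j = dists.index(min(dists))
            if dists[j] <= threshold:   # close enough: same group
                labels.append(j)
                continue
        centroids.append(p)             # dissimilar point: new cluster
        labels.append(len(centroids) - 1)
    return labels, centroids
```

Note how the number of clusters falls out of the data and the threshold rather than being fixed up front, which is the point the abstract makes.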


Author(s):  
Peter Wagstaff ◽  
Pablo Minguez Gabina ◽  
Ricardo Mínguez ◽  
John C Roeske

Abstract A shallow neural network was trained to accurately calculate the microdosimetric parameters ⟨z₁⟩ and ⟨z₁²⟩ (the first and second moments of the single-event specific energy spectra, respectively) for use in alpha-particle microdosimetry calculations. The regression network of four inputs and two outputs was created in MATLAB and trained on a data set consisting of both previously published microdosimetric data and recent Monte Carlo simulations. The input data consisted of the alpha-particle energies (3.97–8.78 MeV), cell nuclei radii (2–10 µm), cell radii (2.5–20 µm), and eight different source-target configurations. These configurations included both single cells in suspension and cells in geometric clusters. The mean square error (MSE) was used to measure the performance of the network. The sizes of the hidden layers were chosen to minimize MSE without overfitting. The final neural network consisted of two hidden layers with 13 and 20 nodes, respectively, each with tangential sigmoid transfer functions, and was trained on 1932 data points. The overall training/validation resulted in an MSE of 3.71×10⁻⁷. A separate testing data set included input values that were not seen by the trained network. The final test on 892 separate data points resulted in an MSE of 2.80×10⁻⁷. The 95th-percentile testing errors were within ±1.4% for ⟨z₁⟩ outputs and ±2.8% for ⟨z₁²⟩ outputs. Cell survival was also predicted using actual vs. neural-network-generated microdosimetric moments and showed overall agreement within ±3.5%. In summary, this trained neural network can accurately produce microdosimetric parameters used for the study of alpha-particle emitters. The network can be exported and shared for tests on independent data sets and new calculations.


2020 ◽  
Vol 61 (2) ◽  
pp. 116-125
Author(s):  
Yen Quoc Phan ◽  
Nga Thu Thi Nguyen ◽  

Surface modeling is performed by many classic and modern algorithms, such as polynomial interpolation, Delaunay triangulation, Nearest Neighbor, Natural Neighbor, Kriging, Inverse Distance Weighting (IDW), and spline functions. The important issue is to experiment, evaluate, and select algorithms suitable to the reality of the data and the study area. The paper used three algorithms, IDW, Kriging, and Natural Neighbor, to model the terrain on two map sheets representing different types of terrain. From there, the results were compared and the accuracy of the methods evaluated using random test data extracted from the original map. In addition, contours determined by each algorithm were checked against the original contours across the entire map sheet. Results show that the Natural Neighbor algorithm gives better results in both experimental areas, followed by the IDW and Kriging algorithms, with root mean square errors of 15.2922, 16.4754, and 17.9949 m, respectively, for average-height terrain and 13.9728, 15.2466, and 15.7613 m for high mountainous terrain.
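Of the three interpolators compared, IDW is the simplest to state: the estimate at a query point is a weighted mean of the known samples, with weights falling off as an inverse power of distance. The sketch below is a generic textbook IDW estimator; the power parameter p=2 is a common default, not a value taken from the paper:

```python
import math

def idw(samples, query, p=2, eps=1e-12):
    """Inverse distance weighting: estimate z at `query` as the weighted mean
    of known (xy, z) samples, with weights 1 / d**p."""
    num = den = 0.0
    for xy, z in samples:
        d = math.dist(xy, query)
        if d < eps:               # query coincides with a known sample
            return z
        w = 1.0 / d ** p
        num += w * z
        den += w
    return num / den
```

A higher p makes the surface honor nearby samples more strongly; comparing such settings against Kriging and Natural Neighbor on held-out points is exactly the kind of evaluation the paper performs.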


2013 ◽  
Vol 443 ◽  
pp. 456-461
Author(s):  
Ru Dan Lin ◽  
Ling Jian Wang

In intrusion detection, unknown attacks are mainly detected through anomaly detection. Traditional anomaly detection methods need to construct a reference profile of normal behavior features. Establishing this profile requires a large, purely normal data set, which is usually not easy to obtain from a real network. Worse, excessive false alarms and missed detections are pervasive problems in anomaly detection. To overcome these shortcomings, this paper proposes an anomaly detection method that combines clustering analysis and HMM. The method does not need any manually labeled training data set and can discover many different types of intrusion behaviors. The experimental results indicate that this method detects effectively, with a higher detection rate and a lower false alarm rate.


2012 ◽  
Vol 253-255 ◽  
pp. 1675-1681 ◽  
Author(s):  
Yuan Wen ◽  
Shu Yan Chen ◽  
Qin Yuan Xiong ◽  
Ru Bi Han ◽  
Shi Yu Chen

Prediction of incident duration is very important in advanced intelligent traffic incident management, and accurate predictions provide exact information for travellers. It is widely used in the area of ITS. In this paper, K-nearest neighbor (KNN) is employed to predict incident duration, putting forward a new distance metric and a new weight determination method. The KNN model is built on the incident data set collected by DVS-Center for Transport and Navigation, Ministry of Transport, Public Works and Water Management, the Netherlands. Moreover, a Matlab simulation is used for incident duration prediction and for selecting the best k value. Finally, an error analysis is made based on this simulation. As a result, the KNN method obtains high accuracy and performs better than the Bayesian Decision Method-Based Tree Algorithm, so it can be effectively applied to intelligent traffic incident detection and clearance systems.
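The core of a KNN duration predictor can be sketched as below. The paper's custom distance metric and weight determination method are not reproduced here; plain Euclidean distance and inverse-distance weights stand in for them:

```python
import math

def knn_duration(history, features, k=3):
    """Predict incident duration as the inverse-distance-weighted mean of the
    k most similar past incidents. `history` is a list of
    (feature_vector, duration) records."""
    # rank past incidents by similarity to the new incident's features
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], features))[:k]
    num = den = 0.0
    for feat, dur in nearest:
        w = 1.0 / (math.dist(feat, features) + 1e-9)  # closer weighs more
        num += w * dur
        den += w
    return num / den
```

Sweeping k over a validation split, as the paper's Matlab simulation does, is how the best k value would be chosen in practice.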


2011 ◽  
Vol 145 ◽  
pp. 189-193 ◽  
Author(s):  
Horng Lin Shieh

In this paper, a hybrid method combining rough set and shared nearest neighbor algorithms is proposed for clustering data with non-globular shapes. The rough k-means algorithm is based on the distances between data and cluster centers. It partitions a data set with globular shapes well, but when the clusters are non-globular, the results obtained by rough k-means are not very satisfactory. To resolve this problem, a combined rough set and shared nearest neighbor algorithm is proposed. The proposed algorithm first adopts a shared nearest neighbor algorithm to evaluate the similarity among data points; then the lower and upper approximations of a rough set algorithm are used to partition the data set into clusters.
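The lower/upper-approximation step can be illustrated with a simple ratio test: a point clearly closest to one center goes into that cluster's lower approximation, while a point nearly equidistant from two centers is placed in both clusters' upper approximations (the boundary region). This is a generic rough-clustering illustration, not the authors' exact rule, and the threshold `tau` is an assumed tuning parameter:

```python
import math

def rough_assign(points, centers, tau=1.2):
    """Assign each point to lower/upper approximations of clusters.
    lower[c]: indices certainly in cluster c; upper[c]: possibly in c."""
    lower = [[] for _ in centers]
    upper = [[] for _ in centers]
    for i, p in enumerate(points):
        order = sorted(range(len(centers)),
                       key=lambda c: math.dist(p, centers[c]))
        nearest, second = order[0], order[1]
        upper[nearest].append(i)
        if math.dist(p, centers[second]) <= tau * math.dist(p, centers[nearest]):
            upper[second].append(i)   # ambiguous: boundary region
        else:
            lower[nearest].append(i)  # clearly closest: lower approximation
    return lower, upper
```

The boundary region is what lets rough clustering defer the assignment of ambiguous points instead of forcing a hard, distance-only split.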


2014 ◽  
Vol 644-650 ◽  
pp. 2009-2012 ◽  
Author(s):  
Hai Tao Zhang ◽  
Bin Jun Wang

To address the low efficiency of KNN- and K-means-like algorithms in classification, a novel interval extension distance is proposed to measure the similarity between testing data and the class domain. The method constructs representatives of the data points in less time than traditional methods; the representatives replace the original data set as the basis of classification. In effect, the model built from representatives makes classification faster. Experimental results on two benchmark data sets verify the effectiveness and applicability of the proposed work. The model-based method using the extension distance can effectively build data models that represent the whole training data, thereby solving the problem of the high cost of classifying new instances.


2011 ◽  
Vol 1 (3) ◽  
pp. 1-14 ◽  
Author(s):  
Wan Maseri Binti Wan Mohd ◽  
A.H. Beg ◽  
Tutut Herawan ◽  
A. Noraziah ◽  
K. F. Rabbi

K-means is an unsupervised partitioning clustering algorithm. It is popular and widely used for its simplicity and speed. K-means clustering produces a number of separate, flat (non-hierarchical) clusters and is suitable for generating globular clusters. The main drawback of the K-means algorithm is that the user must specify the number of clusters in advance. This paper presents an improved version of the K-means algorithm that auto-generates an initial number of clusters (k), together with a new approach to defining initial centroids for an effective and efficient clustering process. The underlying mechanism has been analyzed and experimented with. The experimental results show that the number of iterations is reduced by 50% and that the run time is lower and depends on the maximum distance between data points rather than on how many data points there are.
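One way to auto-generate k from the maximum distance between data points, in the spirit described above, is a greedy farthest-point pass: accept a point as a new initial centroid only if it is farther than a fraction of the maximum pairwise distance from every centroid chosen so far. This is a hedged reconstruction, not the authors' published procedure, and `alpha` is an assumed parameter:

```python
import math

def auto_k_centroids(points, alpha=0.5):
    """Greedy initial-centroid selection: k emerges as the number of points
    that are farther than alpha * (max pairwise distance) from all
    previously chosen centroids."""
    dmax = max(math.dist(a, b) for a in points for b in points)
    centroids = [points[0]]
    for p in points[1:]:
        if min(math.dist(p, c) for c in centroids) > alpha * dmax:
            centroids.append(p)
    return centroids
```

The returned list gives both the initial centroids and, via its length, the value of k, so no cluster count needs to be specified in advance.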

