IKM-NCS: A Novel Clustering Scheme Based on Improved K-Means Algorithm

To address distorted center selection and slow iterative convergence in traditional clustering algorithms, a novel clustering scheme based on an improved k-means algorithm is proposed. First, based on an analysis of all user behavior sets contained in the initial sample, a weight calculation method for abnormal behaviors and an eigenvalue extraction method for abnormal behavior sets are proposed, and an abnormal behavior set is constructed for each user from the behavior data generated by abnormal users. Then, building on the traditional k-means clustering algorithm, an improved algorithm is proposed: the compactness of all data points is calculated, and the initial cluster centers are selected among the data points with high and low compactness, which enhances clustering performance. Finally, the eigenvalues of the abnormal behavior sets are used as the input of the algorithm, which outputs the clustering results for the abnormal behaviors. Experimental results show that the proposed algorithm outperforms the traditional clustering algorithm and effectively improves the clustering of abnormal behavior.
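For reference, the traditional k-means baseline that this abstract builds on can be sketched as follows. This is a minimal sketch of the standard Lloyd-style iteration, not the IKM-NCS algorithm: the abstract does not define its compactness measure, so the initial centers are simply passed in by the caller. The function name `kmeans` and its parameters are illustrative assumptions.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Plain k-means started from caller-supplied initial centers (illustrative)."""
    centers = [tuple(float(v) for v in c) for c in centers]
    labels = []
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, labels, iteration
```

Starting from one point in each natural group, a small example converges in fewer iterations than starting from two points in the same group, which is the sensitivity to center selection that the abstract refers to.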

2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Ziqi Jia ◽  
Ling Song

The k-prototypes algorithm is a hybrid clustering algorithm that can process both categorical and numerical data. In this study, the method of initial cluster center selection was improved and a new hybrid dissimilarity coefficient was proposed. Based on this coefficient, a weighted k-prototypes clustering algorithm (WKPCA) was proposed. The WKPCA algorithm not only improves the selection of initial cluster centers but also puts forward a new method for calculating the dissimilarity between data objects and cluster centers. A real dataset from UCI was used to test the WKPCA algorithm. Experimental results show that the WKPCA algorithm is more efficient and robust than other k-prototypes algorithms.
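To make the notion of a mixed dissimilarity concrete, the sketch below shows the classic k-prototypes measure: squared Euclidean distance on the numerical attributes plus a weighted count of mismatches on the categorical attributes. It is not the hybrid dissimilarity coefficient proposed in the paper, whose definition the abstract does not give. The function name `mixed_dissimilarity` and the weight `gamma` are illustrative assumptions.

```python
def mixed_dissimilarity(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """Classic k-prototypes dissimilarity between an object and a prototype.

    x_num, c_num: numerical attributes of the object and the prototype.
    x_cat, c_cat: categorical attributes of the object and the prototype.
    gamma: assumed weight balancing the categorical part against the numerical part.
    """
    # Numerical part: squared Euclidean distance.
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    # Categorical part: number of attributes whose values differ.
    categorical = sum(a != b for a, b in zip(x_cat, c_cat))
    return numeric + gamma * categorical
```

For instance, an object `([1.0, 2.0], ["red", "small"])` compared with a prototype `([1.0, 4.0], ["red", "large"])` at `gamma=0.5` has a numerical part of 4.0 and one categorical mismatch, giving 4.5.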


Author(s):  
SANGHAMITRA BANDYOPADHYAY ◽  
UJJWAL MAULIK ◽  
MALAY KUMAR PAKHIRA

An efficient partitional clustering technique, called SAKM-clustering, is proposed in this article. It integrates the power of simulated annealing for obtaining a minimum energy configuration with the searching capability of the K-means algorithm. The clustering methodology is used to search for appropriate clusters in multidimensional feature space such that a similarity metric of the resulting clusters is optimized. Data points are redistributed among the clusters probabilistically, so that points that are farther away from the cluster center have higher probabilities of migrating to other clusters than those that are closer to it. The superiority of the SAKM-clustering algorithm over the widely used K-means algorithm is extensively demonstrated for artificial and real-life data sets.
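The probabilistic redistribution described above can be illustrated with a small sketch. The abstract does not state the exact migration probability, so the Boltzmann-style weighting below is an assumption, as are the names `migration_probabilities` and `reassign` and the `temperature` parameter. It reproduces only the qualitative behavior: a point far from its nearest center is more likely to move, and lowering the temperature approaches the hard assignment of K-means.

```python
import math
import random

def migration_probabilities(point, centers, temperature):
    """Probability of placing `point` in each cluster (assumed Boltzmann form)."""
    dists = [math.dist(point, c) for c in centers]
    d_min = min(dists)
    # Clusters whose centers are farther than the nearest one are penalized
    # exponentially; a high temperature flattens the penalty.
    weights = [math.exp(-(d - d_min) / temperature) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

def reassign(point, centers, temperature, rng=random):
    """Draw a cluster index for `point` according to the probabilities above."""
    probs = migration_probabilities(point, centers, temperature)
    return rng.choices(range(len(centers)), weights=probs, k=1)[0]
```

In an annealing loop, the temperature would be lowered between sweeps so that the assignment becomes progressively more deterministic.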


2015 ◽  
Vol 15 (03n04) ◽  
pp. 1540002
Author(s):  
YANJING HU ◽  
QINGQI PEI ◽  
LIAOJUN PANG

Analysis of a protocol's abnormal behavior is an important task in protocol reverse analysis. Traditional protocol reverse analysis focuses on the protocol message format, whereas protocol behavior, especially abnormal behavior, is rarely studied. In this paper, protocol behavior is represented by labeled behavior instruction sequences; similar behavior instruction sequences indicate similar protocol behavior. Using our virtual analysis platform HiddenDisc, we can capture the behavior instruction sequences of a variety of known and unknown protocols. All kinds of executed and unexecuted instruction sequences can be clustered automatically by our instruction clustering algorithm, which allows us to distinguish and mine the potential abnormal behavior of unknown protocols. The mined potential abnormal behavior instruction sequences are executed, monitored, and analyzed on HiddenDisc to determine whether each one is an abnormal behavior and what its nature is. Using the instruction clustering algorithm, we analyzed 1297 protocol samples, mined 193 potential abnormal instruction sequences, and determined 187 malicious abnormal behaviors by regression testing. Experimental results show that the proposed instruction clustering algorithm is highly efficient and accurate, can effectively mine the abnormal behaviors of unknown protocols, and enhances the active defense capability of network security.


Author(s):  
Simon Tongbram ◽  
Benjamin A. Shimray ◽  
Loitongbam Surajkumar Singh

Image segmentation has widespread applications in medical science, for example, classification of different tissues, identification of tumors, estimation of tumor size, surgery planning, and atlas matching. Clustering is a widely implemented unsupervised technique for image segmentation, mainly because of its simplicity and fast computation. However, the quality and efficiency of clustering-based segmentation are highly dependent on the initial values of the cluster centroids. In this paper, a new hybrid segmentation approach based on k-means clustering and modified subtractive clustering is proposed. K-means clustering is a very efficient and powerful algorithm, but it requires initialization of the cluster centroids, and the consistency of its clustering outcomes depends on the initial selection of the cluster centers. To overcome this drawback, a modified subtractive clustering algorithm based on distance relations between cluster centers and data points is proposed, which finds more accurate cluster centers than conventional subtractive clustering. The cluster centroids obtained from the modified subtractive clustering are used in the k-means algorithm to segment the image. The proposed method is compared with existing conventional segmentation methods on several synthetic and real images, and the experimental findings validate the superiority of the proposed method.
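As background for the initialization step, conventional subtractive clustering can be sketched as follows. This is the conventional potential-based procedure, not the modified version proposed in the paper, whose distance relations the abstract does not spell out. The function name `subtractive_centers`, the radii `r_a` and `r_b`, and the default `r_b = 1.5 * r_a` are assumptions for illustration, and the data are assumed to be on a comparable scale in every dimension.

```python
import math

def subtractive_centers(points, k, r_a=1.0, r_b=None):
    """Pick k initial centers by conventional subtractive clustering (sketch)."""
    r_b = 1.5 * r_a if r_b is None else r_b
    alpha, beta = 4.0 / r_a ** 2, 4.0 / r_b ** 2

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Potential of a point: high when many other points lie within about r_a.
    potential = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points)
                 for p in points]
    centers = []
    for _ in range(k):
        best = max(range(len(points)), key=lambda i: potential[i])
        centers.append(points[best])
        peak = potential[best]
        # Suppress the potential near the chosen center so that the next
        # center is taken from a different dense region.
        potential = [pot - peak * math.exp(-beta * sq_dist(p, points[best]))
                     for p, pot in zip(points, potential)]
    return centers
```

The returned centers could then be passed to k-means as its starting centroids in place of a random choice.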


Author(s):  
Qiu-Xia Hu ◽  
Jie Tian ◽  
Dong-Jian He

To improve the segmentation accuracy of plant lesion images, a multi-channel segmentation algorithm for plant disease images was proposed, based on linear discriminant analysis (LDA) mapping and K-means clustering. First, six color channels were obtained from the RGB and HSV models, and the six channel values of all pixels were laid out in six columns. One of these channels was regarded as the label and the others as sample features. These data were grouped for linear discriminant analysis, and the other five channels were mapped into the eigenvector space according to the three largest eigenvalues. Second, the mapped values were used as the input data for K-means, and the points with the minimum and maximum pixel values were used as the initial cluster centers, which removed the randomness in selecting the initial cluster centers in K-means. The segmented pixels were then divided into background and foreground, so that the proposed segmentation method became a two-class clustering of background and foreground. Finally, the experimental results showed that the segmentation performance of the proposed LDA mapping-based method is better than that of the K-means, ExR, and CIVE methods.
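The deterministic initialization described above can be illustrated on one-dimensional values. This is a minimal sketch that assumes a single scalar per pixel, whereas the paper clusters LDA-mapped values; the function name `two_class_kmeans` is hypothetical. The two initial centers are the minimum and maximum values, so repeated runs always give the same result.

```python
def two_class_kmeans(values, max_iter=100):
    """Two-class k-means on scalars, seeded with the min and max values (sketch)."""
    lo, hi = float(min(values)), float(max(values))
    labels = []
    for _ in range(max_iter):
        # Label 0 is the cluster around the low center, label 1 the high center.
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo = sum(group0) / len(group0)
        # If every value is identical, the high cluster is empty; keep its center.
        new_hi = sum(group1) / len(group1) if group1 else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo, hi), labels
```

Which label corresponds to background and which to foreground depends on the image, so that mapping is left to the caller.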


2016 ◽  
Vol 2016 ◽  
pp. 1-10
Author(s):  
Ning Li ◽  
Yunxia Gu ◽  
Zhongliang Deng

Limited prior knowledge and randomly chosen initial cluster centers directly affect the accuracy of iterative clustering algorithms. In this paper, we propose a new algorithm that computes the initial cluster centers for k-means clustering and the best number of clusters with little prior knowledge, and that optimizes the clustering result. It constructs a Euclidean distance control factor based on the aggregation density sparse degree to select the initial cluster centers of nonuniform sparse data, and obtains initial data clusters by multidimensional diffusion density distribution. A multiobjective clustering approach based on dynamic cumulative entropy is adopted to optimize the initial data clusters and the best number of clusters. The experimental results show that the proposed algorithm performs well in obtaining the initial cluster centers for the k-means algorithm and improves the clustering accuracy of nonuniform sparse data by about 5%.


2010 ◽  
Vol 439-440 ◽  
pp. 605-610
Author(s):  
Xiao Yong Liu

In this paper, a new RBF neural network (RBFNN) algorithm, called ar-RBFNN, is presented. In traditional clustering-based RBFNNs, called oRBFNN in this paper, the width (or radius) of the Gaussian basis function ignores the number of data points in each cluster, that is, the density of the data points. The new algorithm takes into account the effect of the radius on performance in function approximation problems. Mean square error is used to evaluate the performance of the two algorithms, oRBFNN and ar-RBFNN. Several function approximation experiments show that ar-RBFNN performs better than oRBFNN.


2010 ◽  
Vol 29-32 ◽  
pp. 802-808
Author(s):  
Min Min

After analyzing the common problems in fuzzy clustering algorithms, we put forward a combined fuzzy clustering algorithm that automatically generates a reasonable number of clusters and initial cluster centers. The algorithm has been tested on real evaluation data of teaching designs. The results show that the combined fuzzy clustering based on the F-statistic is more effective.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Xiaolei Chen ◽  
Sikun Ge

Based on the parallel K-means algorithm, this article conducts in-depth research on marketing node detection on the Internet, including designing a new Internet marketing node detector and a location summary network based on an FCN (Fully Convolutional Network) to preprocess the input nodes, and verifying its performance on the data sets. At the same time, to address the shortage of data sets of Internet marketing nodes, Internet data sets are artificially generated and used for detector training. First, the multiclass K-means algorithm is changed to two categories suitable for Internet marketing node detection: marketing nodes and background. Second, the weights in the K-means algorithm are mostly applicable only to target detection tasks. Therefore, when processing Internet marketing node detection tasks, the K-means algorithm is used to regress the training set and calculate 5 weights. During the simulation experiment, a weight calculation formula is used to calculate the weight of each feature term. The basic idea is that if a feature word appears more often in the current document but less frequently in other nodes, the word is assigned a higher weight. At the same time, this article specifically improves some shortcomings of the k-means clustering algorithm. By standardizing the data participating in the clustering, the data are transformed from an irregular distribution to a cluster-like distribution, which facilitates the clustering process. Density is introduced to determine the initial cluster centers, and a purity metric is introduced to determine the appropriate density radius of each cluster center, to achieve the most effective reduction of the support vector machine training samples.
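The idea behind the feature-term weight, that a word frequent in one document but rare elsewhere receives a higher weight, matches the well-known TF-IDF scheme. The abstract does not give its actual formula, so the sketch below uses plain TF-IDF as a stand-in under that assumption; the function name `tf_idf` and the representation of documents as token lists are illustrative.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document, using plain TF-IDF (sketch).

    documents: list of token lists, one list per document or node.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            # Term frequency times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that occurs in every document gets weight zero under this formula, while a term concentrated in a single document gets the largest weight there.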


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Hong Xia ◽  
Qingyi Dong ◽  
Hui Gao ◽  
Yanping Chen ◽  
ZhongMin Wang

It is difficult to accurately classify a service into specific service clusters because of the multirelationships between services. To solve this problem, this paper proposes a service partition method based on particle swarm fuzzy clustering, which can effectively consider the multirelationships between services by using a fuzzy clustering algorithm. First, an algorithm for automatically determining the number of clusters sets the number of service clusters based on the density of the service core points. Second, fuzzy c-means is combined with the particle swarm optimization algorithm to find the optimal cluster centers of the services. Finally, the fuzzy clustering algorithm uses the improved Gram-cosine similarity to obtain the final results. Extensive experiments on real web service data show that the proposed method is more accurate than mainstream clustering algorithms.
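Fuzzy clustering lets a service belong to several clusters at once, with a membership degree for each. The sketch below shows the standard fuzzy c-means membership update for a single point with Euclidean distance. It is not the paper's method, which adds particle swarm optimization and an improved Gram-cosine similarity; the function name `fcm_memberships` and the fuzzifier default `m=2.0` are assumptions.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of `point` in each cluster (sketch)."""
    dists = [math.dist(point, c) for c in centers]
    if 0.0 in dists:
        # The point coincides with one or more centers: share full membership
        # among those centers and give zero to the rest.
        hits = [d == 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)); memberships sum to 1.
    return [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]
```

For a point at distance 1 from one center and distance 2 from another, with `m=2.0`, the memberships are 0.8 and 0.2.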

