K-Means Clustering Algorithm Based on Prim Improvement

2014 ◽  
Vol 644-650 ◽  
pp. 2063-2066
Author(s):  
He Wei Zhang ◽  
Lei Sun ◽  
Hong Zhang

The K-means algorithm is a classical algorithm for solving clustering problems in data mining; when the sample data meets certain conditions, its clustering results are good. However, the algorithm is sensitive to the initial cluster centers, and the clustering results change with the choice and number of initial centers. To address this shortcoming, this paper proposes a new algorithm that uses Prim's algorithm to select the initial cluster centers, describes the basic idea of the algorithm along with the specific improvements and implementation steps, and finally presents a test for comparative analysis. The results show that the improved K-means clustering algorithm does not require the initial cluster centers to be specified in advance, is not sensitive to outliers, and, through its greedy strategy, achieves a better clustering effect than the usual algorithms.
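The sensitivity to initial centers described in this abstract can be illustrated with a minimal sketch of plain Lloyd-style K-means. This is not the paper's Prim-based method; the function names (`assign`, `kmeans`, `sse`) and the four-point data set are assumptions made up for illustration.

```python
import math

def assign(points, centers):
    """Attach each point to its nearest center; returns one list per center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd-style K-means started from caller-supplied centers."""
    centers = [tuple(map(float, c)) for c in centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def sse(centers, clusters):
    """Sum of squared distances from each point to its cluster center."""
    return sum(math.dist(p, c) ** 2 for c, cl in zip(centers, clusters) for p in cl)

# Hypothetical toy data: the corners of a wide rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Two different starting points lead to two different converged solutions.
good_sse = sse(*kmeans(points, [(0, 0.5), (4, 0.5)]))  # left/right split
bad_sse = sse(*kmeans(points, [(2, 0), (2, 1)]))       # bottom/top split
print(good_sse, bad_sse)
```

Both runs converge, but the second start settles on a split with a much larger within-cluster error, which is the kind of local optimum that initial-center selection schemes aim to avoid.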

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Honglei Zhu ◽  
Yingying Zhao ◽  
Xueyun Wang ◽  
Yulong Xu

Medical data analysis is an important part of intelligent medicine, and clustering analysis is a commonly used method for data analysis of Traditional Chinese Medicine (TCM); however, the classical K-Means algorithm is strongly affected by the selection of the initial clustering centers and easily falls into a local optimum. To avoid this problem, an improved differential evolution clustering algorithm is proposed in this paper. The proposed algorithm selects the initial clustering centers randomly, optimizes and locates the clustering centers during the evolutionary iterations, and improves the mutation mode of differential evolution to enhance the overall optimization ability, so that the clustering effect approaches the global optimum as far as possible. Three University of California, Irvine (UCI) data sets are selected to compare the clustering effect of the classical K-Means algorithm, the standard DE-K-Means algorithm, the K-Means++ algorithm, and the proposed algorithm. The experimental results show that, in terms of global optimization, the proposed algorithm is clearly superior to the other three algorithms, and in terms of convergence speed, it is better than the DE-K-Means algorithm. Finally, the proposed algorithm is applied to analyze drug data of Traditional Chinese Medicine in the treatment of pulmonary diseases, and the analysis results are consistent with the theory of Traditional Chinese Medicine.
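As a rough illustration of how differential evolution can search for cluster centers, the sketch below uses the textbook DE/rand/1 mutation with a simplified binomial-style crossover (without the usual forced index) and greedy selection. It is not the improved mutation mode proposed in the paper; the function names, parameter defaults (`F`, `CR`, `pop_size`, `generations`), and the toy data are all assumptions.

```python
import math
import random

def sse(centers, points):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)

def de_cluster(points, k, pop_size=20, generations=50, F=0.5, CR=0.9, seed=0):
    """Search for k cluster centers with a basic DE/rand/1 scheme (a sketch)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each individual is a list of k centers drawn at random from the data.
    pop = [[rng.choice(points) for _ in range(k)] for _ in range(pop_size)]
    fit = [sse(ind, points) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            trial = []
            for c in range(k):
                center = []
                for d in range(dim):
                    if rng.random() < CR:
                        # Mutation: base vector plus scaled difference vector.
                        center.append(pop[r1][c][d] + F * (pop[r2][c][d] - pop[r3][c][d]))
                    else:
                        # Crossover: keep the parent's coordinate.
                        center.append(pop[i][c][d])
                trial.append(tuple(center))
            trial_fit = sse(trial, points)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_fit <= fit[i]:
                pop[i], fit[i] = trial, trial_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

# Hypothetical toy data: two loose groups of 2-D points.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
best_centers, best_sse = de_cluster(data, k=2)
print(best_centers, best_sse)
```

Because selection is greedy, the fitness of each individual never gets worse from one generation to the next; a fixed `seed` keeps the run reproducible.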


2014 ◽  
Vol 543-547 ◽  
pp. 2028-2031
Author(s):  
Yang Zheng

This paper proposes a K-means clustering algorithm based on an improved filtering process. The algorithm improves the filtering process; the two minimum sample points are taken as reasonable initial clustering centers. It makes the probability summary of data within a cluster as large as possible and the probability summary of data in different clusters as small as possible. Experimental results show that the proposed algorithm can select proper initial clustering centers and that it is more compact and robust than the traditional K-means clustering algorithm.


2015 ◽  
Vol 1 (4) ◽  
pp. 270
Author(s):  
Muhammad Syukri Mustafa ◽  
I. Wayan Simpen

This research aims to predict whether new students are likely to complete their studies on time, using data mining analysis with the K-Nearest Neighbor (KNN) algorithm to explore accumulated historical data. The application produced in this research uses a range of attributes classified in the data mining process, including national examination (Ujian Nasional, UN) scores, school/region of origin, gender, parents' occupation and income, number of siblings, and others, so that KNN analysis can predict, based on the proximity of existing historical data to new data, whether a student is likely to complete their studies on time or not. Testing the KNN algorithm with sample data from alumni who graduated between 2004 and 2010 as the old cases and alumni who graduated in 2011 as the new cases gave an accuracy of 83.36%.
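The proximity-based prediction described in this abstract can be sketched as a majority vote among the k nearest historical cases. This is a generic KNN illustration, not the authors' application; the function name `knn_predict`, the two normalized features, the labels, and every data value below are made up for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training cases nearest to query.

    train is a list of (feature_vector, label) pairs with numeric features.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical historical cases: two features scaled to [0, 1]
# (for example an exam score and a second numeric attribute).
old_cases = [
    ((0.90, 0.80), "on_time"),
    ((0.80, 0.90), "on_time"),
    ((0.85, 0.70), "on_time"),
    ((0.20, 0.30), "late"),
    ((0.30, 0.10), "late"),
    ((0.10, 0.20), "late"),
]

# Hypothetical new cases to classify.
pred_a = knn_predict(old_cases, (0.75, 0.75))
pred_b = knn_predict(old_cases, (0.25, 0.20))
print(pred_a, pred_b)
```

Categorical attributes such as gender or region of origin would need to be encoded numerically, and all features scaled to comparable ranges, before a Euclidean distance like this one is meaningful.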


2019 ◽  
Vol 1 (1) ◽  
pp. 31-39
Author(s):  
Ilham Safitra Damanik ◽  
Sundari Retno Andani ◽  
Dedi Sehendro

Milk is an important intake for meeting nutritional needs, whether consumed by children or adults. Indonesia has many producers of fresh milk, but production is not sufficient for national milk needs. Data mining is a field of computer science that is widely used in research, and one of its techniques is clustering, a method that groups data. The clustering method works better when a lot of data is used. The data used are provincial data for Indonesia from 2000 to 2017, obtained from the Central Statistics Agency. The study clusters the data into 2 milk-producing groups: high-producing and low-producing regions. From 27 data records on fresh milk production in Indonesia, two provinces fall in the high-level cluster, namely West Java and East Java. The other 25, along with 7 provinces that were not included in the K-Means Clustering Algorithm calculation, fall in the low-level cluster.


2018 ◽  
Vol 3 (1) ◽  
pp. 001
Author(s):  
Zulhendra Zulhendra ◽  
Gunadi Widi Nurcahyo ◽  
Julius Santony

This study uses a data mining technique, namely K-Means Clustering. Data mining can be used to analyze fairly large data sets; here the aim is to enable Indocomputer to understand and classify service data based on customer complaints, using the Weka software. The study uses the K-Means Clustering algorithm to predict or classify complaints about hardware damage at Indocomputer Payakumbuh. It can also identify the laptop brands most often serviced at Indocomputer Payakumbuh, as one recommendation to consumers when selecting a laptop.


2021 ◽  
Vol 13 (14) ◽  
pp. 7585
Author(s):  
Yunmei Liu ◽  
Shuai Zhang ◽  
Min Chen ◽  
Yenchun Wu ◽  
Zhengxian Chen

Blockchain technology is the most cutting-edge technology in the field of financial technology, which has attracted extensive attention from governments, financial institutions and investors of various countries. Blockchain and finance, as an interdisciplinary, cross-technology and cross-field topic, has certain limitations in both theory and application. Based on the bibliometrics data of Web of Science, this paper conducts data mining on 759 papers related to blockchain technology in the financial field by means of co-word analysis, bi-clustering algorithm and strategic coordinate analysis, so as to explore hot topics in this field and predict the future development trend. The experimental results found ten research topics in the field of blockchain combined with finance, including blockchain crowdfunding, Fintech, encryption currency, consensus mechanism, the Internet of Things, digital financial, medical insurance, supply chain finance, intelligent contract and financial innovation. Among them, blockchain crowdfunding, Fintech, encryption currency and supply chain finance are the key research directions in this research field. Finally, this paper also analyzes the opportunities and risks of blockchain development in the financial field and puts forward targeted suggestions for the government and financial institutions.


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 370
Author(s):  
Shuangsheng Wu ◽  
Jie Lin ◽  
Zhenyu Zhang ◽  
Yushu Yang

The fuzzy clustering algorithm has become a research hotspot in many fields because of its better clustering effect and data expression ability. However, little research focuses on the clustering of hesitant fuzzy linguistic term sets (HFLTSs). To fill in the research gaps, we extend the data type of clustering to hesitant fuzzy linguistic information. A kind of hesitant fuzzy linguistic agglomerative hierarchical clustering algorithm is proposed. Furthermore, we propose a hesitant fuzzy linguistic Boole matrix clustering algorithm and compare the two clustering algorithms. The proposed clustering algorithms are applied in the field of judicial execution, which provides decision support for the executive judge to determine the focus of the investigation and the control. A clustering example verifies the clustering algorithm’s effectiveness in the context of hesitant fuzzy linguistic decision information.


2021 ◽  
pp. 1-14
Author(s):  
Yujia Qu ◽  
Yuanjun Wang

BACKGROUND: The corpus callosum in the midsagittal plane plays a crucial role in the early diagnosis of diseases. When the anisotropy of the diffusion tensor in the midsagittal plane is calculated, the anisotropy of the corpus callosum is close to that of the fornix, which leads to a blurred boundary of the segmentation region. OBJECTIVE: To apply a fuzzy clustering algorithm combined with new spatial information to achieve accurate segmentation of the corpus callosum in the midsagittal plane in diffusion tensor images. METHODS: In this algorithm, a fixed region of interest is selected from the midsagittal plane, and a tensor-based anisotropic filtering algorithm is implemented by replacing the gradient direction of the structural tensor with an eigenvector, thus filtering the diffusion tensor of the region of interest. Then, the iterative clustering center obtained from K-means clustering is used as the initial clustering center of the tensor fuzzy clustering algorithm. Taking the filtered diffusion tensor as input data and different metrics as similarity measures, the neighborhood diffusion tensor pixel calculation method of the Log-Euclidean framework is introduced in the membership function calculation, and a tensor fuzzy clustering algorithm is proposed. In this study, MGH35 data from the Human Connectome Project (HCP) are tested and the variance, accuracy and specificity of the experimental results are discussed. RESULTS: Segmentation results of three groups of subjects in MGH35 data are reported. The average segmentation accuracy is 97.34%, and the average specificity is 98.43%. CONCLUSIONS: When segmenting the corpus callosum in diffusion tensor imaging, our method can not only effectively denoise images but also achieve high accuracy and specificity.

