A Density-Peak-Based Clustering Method for Multiple Densities Dataset

2021 ◽  
Vol 10 (9) ◽  
pp. 589
Author(s):  
Zhicheng Shi ◽  
Ding Ma ◽  
Xue Yan ◽  
Wei Zhu ◽  
Zhigang Zhao

Clustering methods in data mining are widely used to detect hotspots in many domains, and they play an increasingly important role in the era of big data. As an advanced algorithm, the density peak clustering (DPC) algorithm is able to deal with arbitrary datasets, although it does not perform well when the dataset includes multiple densities. In addition, its cut-off distance dc is normally chosen from the user's experience, and this choice can affect the clustering result. In this study, a density-peak-based clustering method is proposed to detect clusters in datasets with multiple densities and shapes. Two improvements address the limitations of existing clustering methods. First, DPC has difficulty detecting clusters in a dataset with multiple densities, where each cluster has a unique shape and its interior includes different densities; the proposed method adopts a step-by-step merging approach to solve this problem. Second, high-density points are selected automatically, without manual intervention, which is more efficient than existing methods that require user-specified parameters. According to the experimental results, the method can be applied to various datasets and performs better than traditional methods and DPC.
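For readers unfamiliar with DPC, the two quantities the baseline algorithm relies on can be sketched as follows. This is a minimal illustration of standard DPC scoring (local density within the cut-off distance dc, and distance to the nearest denser point), not the merging method proposed in the paper. The function names `dpc_scores` and `pick_centers` and the sample points are assumptions made for illustration.

```python
import math

def dpc_scores(points, dc):
    """Return (rho, delta) for each point, as used by baseline DPC."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cut-off distance dc
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]
    delta = []
    for i in range(n):
        # delta: distance to the nearest point with strictly higher density
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # a point with no denser neighbour gets its largest distance instead
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

def pick_centers(points, dc, k):
    """Pick the k points with the largest rho * delta as candidate centers."""
    rho, delta = dpc_scores(points, dc)
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:k]

# Hypothetical data: two small blobs, each with one central point
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
centers = pick_centers(points, dc=1.0, k=2)
```

Both `dc` and `k` are supplied by hand here, which is the dependence on user experience that the abstract identifies as a limitation.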

2022 ◽  
Vol 2022 ◽  
pp. 1-13
Author(s):  
Zhihe Wang ◽  
Yongbiao Li ◽  
Hui Du ◽  
Xiaofen Wei

Because density peaks clustering requires cluster centers to be selected manually, this paper proposes a fast new clustering method that selects cluster centers automatically. First, the method groups the data and marks each group as a core or boundary group according to its density. Second, it determines clusters by iteratively merging pairs of core groups whose distance is less than a threshold, and selects the cluster center at the densest position in each cluster. Finally, it assigns each boundary group to the cluster of the nearest cluster center. According to the experimental results, the method eliminates the need for manual selection of cluster centers and improves clustering efficiency.
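The three steps can be sketched roughly as below. The abstract does not say how groups are formed or how density and distance are measured, so this sketch assumes grid cells as groups, point count as density, and centroid distance between groups. The function `cluster_by_groups` and its parameters `cell`, `min_count`, and `merge_dist` are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

def cluster_by_groups(points, cell, min_count, merge_dist):
    # Step 1 (assumed grouping rule): bucket 2-D points into square grid cells
    groups = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        groups[(math.floor(x / cell), math.floor(y / cell))].append(idx)
    centroid = {
        k: tuple(sum(points[i][d] for i in members) / len(members)
                 for d in range(2))
        for k, members in groups.items()
    }
    # A group is "core" if it holds at least min_count points, else "boundary"
    core = [k for k in groups if len(groups[k]) >= min_count]
    boundary = [k for k in groups if len(groups[k]) < min_count]
    if not core:
        raise ValueError("no core groups; lower min_count or enlarge cell")

    # Step 2: merge core groups whose centroids are closer than merge_dist
    parent = {k: k for k in core}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for a_pos, a in enumerate(core):
        for b in core[a_pos + 1:]:
            if math.dist(centroid[a], centroid[b]) < merge_dist:
                parent[find(a)] = find(b)

    # The densest core group of each merged cluster serves as its center
    densest = {}
    for k in core:
        root = find(k)
        if root not in densest or len(groups[k]) > len(groups[densest[root]]):
            densest[root] = k
    labels = {}
    for k in core:
        for i in groups[k]:
            labels[i] = densest[find(k)]

    # Step 3: each boundary group joins the cluster of the nearest center
    centers = list(densest.values())
    for k in boundary:
        nearest = min(centers,
                      key=lambda c: math.dist(centroid[k], centroid[c]))
        for i in groups[k]:
            labels[i] = nearest
    return [labels[i] for i in range(len(points))]

# Hypothetical data: two dense cells and one sparse point near the first
pts = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3),
       (5.1, 5.1), (5.2, 5.2), (5.3, 5.3),
       (1.5, 0.5)]
labels = cluster_by_groups(pts, cell=1.0, min_count=2, merge_dist=2.0)
```

Each label is the grid cell of the cluster's densest group, so no cluster center has to be picked by hand, although the three thresholds still have to be chosen.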


2021 ◽  
Vol 11 (23) ◽  
pp. 11476
Author(s):  
Jianjie Sun ◽  
Xi Chen ◽  
Zhengwu Fu ◽  
Giuseppe Lacidogna

This study investigates the clustering of concrete matrix rupture and rubber fracture damage, as well as the prediction of the ultimate load of crumb rubber concrete, using the acoustic emission (AE) technique. The specimens were loaded in four-point bending. Six clustering methods were analyzed: k-means, fuzzy c-means (FCM), self-organizing map (SOM), Gaussian mixture model (GMM), hierarchical clustering, and density peak clustering. The results showed that density peak clustering has the best performance. Next, this best-performing algorithm was used to cluster AE signals in order to study the evolution of different damage modes, and the ultimate load of crumb rubber concrete was predicted by an artificial neural network. The results indicated that the combination of AE techniques, an appropriate clustering method such as density peak clustering, and an artificial neural network could be used as a practical tool for structural health monitoring of crumb rubber concrete.


2018 ◽  
Vol 6 (2) ◽  
Author(s):  
Elly Muningsih - AMIK BSI Yogyakarta

Abstract ~ The K-Means method is a clustering method widely used in data clustering research, while the K-Medoids method is an efficient method for processing small datasets. This study compares the two clustering methods by grouping customers into 3 clusters according to their characteristics: very potential (loyal) customers, potential customers, and non-potential customers. The data used are online sales transactions. The clustering methods are tested using a Fuzzy RFM (Recency, Frequency and Monetary) model, in which the average (mean) of the three values is taken. The tests show that the K-Means method performs better than the K-Medoids method, with an accuracy value of 90.47%. The data processing shows that cluster 1 has 16 members (customers), cluster 2 has 11 members, and cluster 3 has 15 members. Keywords: clustering, K-Means method, K-Medoids method, customer, Fuzzy RFM model.
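As a rough illustration of the K-Means half of this comparison, the sketch below averages three RFM values per customer and clusters the resulting scores into three groups with a plain one-dimensional K-Means. The customer tuples, the initial centers, and the function names `rfm_score` and `kmeans_1d` are invented for the example. The fuzzy membership step of the study's Fuzzy RFM model is not reproduced here.

```python
def rfm_score(recency, frequency, monetary):
    """Mean of the three RFM values, as described in the abstract."""
    return (recency + frequency + monetary) / 3

def kmeans_1d(values, centers, iters=100):
    """Plain Lloyd-style K-Means on scalar values."""
    centers = list(centers)
    for _ in range(iters):
        # assignment step: each value goes to its nearest center
        labels = [min(range(len(centers)), key=lambda c: abs(v - centers[c]))
                  for v in values]
        # update step: each center moves to the mean of its members
        new_centers = []
        for c in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members
                               else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical customers as (recency, frequency, monetary) scores from 1 to 5
customers = [(1, 1, 1), (1, 2, 1), (3, 3, 3), (3, 3, 4), (5, 5, 5), (5, 4, 5)]
scores = [rfm_score(*c) for c in customers]
labels, centers = kmeans_1d(scores, centers=[1.0, 3.0, 5.0])
```

K-Medoids follows the same assign-and-update loop but restricts each center to be one of the data points, which is the main structural difference between the two methods being compared.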


Author(s):  
Mohamed Aymen Ben HajKacem ◽  
Chiheb-Eddine Ben N′Cir ◽  
Nadia Essoussi

Big Data clustering has become an important challenge in data analysis, since several applications require scalable clustering methods to organize such data into groups of similar objects. Given the computational cost of most existing clustering methods, we propose in this paper a new clustering method, referred to as STiMR [Formula: see text]-means, that provides a good tradeoff between scalability and clustering quality. The proposed method combines three acceleration techniques: sampling, triangle inequality, and MapReduce. Sampling is used to reduce the number of data points when building cluster prototypes, triangle inequality is used to reduce the number of comparisons when looking for the nearest clusters, and MapReduce is used to configure a parallel framework for running the proposed method. Experiments performed on simulated and real datasets have shown the effectiveness of the proposed method, compared with existing ones, in terms of running time, scalability, and internal validity measures.
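The triangle-inequality idea can be illustrated in isolation. If the distance between two centers c and c' is at least twice the distance from a point x to c, then x cannot be closer to c' than to c, so that distance never needs to be computed. The sketch below applies this generic pruning rule to a nearest-center search; it is not the paper's implementation, and `nearest_center` and the sample centers are assumptions made for illustration.

```python
import math

def nearest_center(x, centers, center_dist):
    """Return (index of nearest center, number of distances computed).

    center_dist[i][j] holds the precomputed distance between centers i and j.
    """
    best = 0
    best_d = math.dist(x, centers[0])
    computed = 1
    for j in range(1, len(centers)):
        # d(c_best, c_j) >= 2 * d(x, c_best) implies d(x, c_j) >= d(x, c_best)
        if center_dist[best][j] >= 2 * best_d:
            continue  # pruned: center j cannot beat the current best
        d = math.dist(x, centers[j])
        computed += 1
        if d < best_d:
            best, best_d = j, d
    return best, computed

# Hypothetical centers and their pairwise distances
centers = [(0, 0), (10, 0), (0, 10)]
center_dist = [[math.dist(a, b) for b in centers] for a in centers]

near_first = nearest_center((1, 0), centers, center_dist)
near_second = nearest_center((9, 0), centers, center_dist)
```

In this toy case the first query needs only one distance computation instead of three; the saving grows with the number of centers and with how well separated they are.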


2015 ◽  
Vol 27 (07) ◽  
pp. 75-81
Author(s):  
Pavel Viktorovich Matrenin ◽  
Viktor Gilyachevich Sekaev

2021 ◽  
Author(s):  
Yizhang Wang ◽  
Di Wang ◽  
You Zhou ◽  
Chai Quek ◽  
Xiaofeng Zhang

Clustering is an important unsupervised knowledge acquisition method that divides unlabeled data into different groups \cite{atilgan2021efficient,d2021automatic}. Different clustering algorithms make different assumptions about cluster formation; thus, most clustering algorithms handle at least one particular type of data distribution well but may not handle other types of distributions well. For example, K-means identifies convex clusters well \cite{bai2017fast}, and DBSCAN is able to find clusters with similar densities \cite{DBSCAN}.

Therefore, most clustering methods may not work well on data distribution patterns that differ from the assumptions being made, or on a mixture of different distribution patterns. Taking DBSCAN as an example, it is sensitive to loosely connected points between dense natural clusters, as illustrated in Figure~\ref{figconnect}. The density of the connecting points shown in Figure~\ref{figconnect} differs from that of the natural clusters on both ends; however, DBSCAN with fixed global parameter values may wrongly assign these connecting points and treat all the data points in Figure~\ref{figconnect} as one big cluster.
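The bridging effect can be reproduced on a toy example. The sketch below is a minimal textbook-style DBSCAN written for illustration, not the authors' code, and the point layout (two dense groups joined by three sparse points) is invented. The names `dbscan`, `eps`, and `min_pts` follow common usage and are assumptions here.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, with -1 meaning noise."""
    n = len(points)
    # each neighbourhood includes the point itself
    neighbors = [[j for j in range(n)
                  if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n  # None means not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
    return labels

# Hypothetical layout: dense group, sparse bridge, dense group (on a line)
dense_a = [(x, 0) for x in (0, 1, 2, 3, 4)]
bridge = [(x, 0) for x in (9, 14, 19)]
dense_b = [(x, 0) for x in (24, 25, 26, 27, 28)]
pts = dense_a + bridge + dense_b

merged = dbscan(pts, eps=5, min_pts=3)     # bridge chains both groups together
separated = dbscan(pts, eps=2, min_pts=3)  # bridge points become noise
```

With `eps=5` every bridge point is a core point, so the expansion runs from one dense group to the other and all points share one label. Shrinking `eps` separates this particular layout, but a single global value is not guaranteed to suit data whose clusters have different densities, which is the limitation the passage describes.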


2021 ◽  
Vol 15 (5) ◽  
pp. 114-120
Author(s):  
A. M. Lila ◽  
I. Yu. Torshin ◽  
A. N. Gromov ◽  
V. A. Semenov ◽  
O. A. Gromova

The pharmacoinformation approach to the assessment and modeling of drugs involves the use of modern methods of data mining. These methods include: 1) analysis of big data (selection of texts of scientific publications, search for new biomarkers); 2) computer analysis of texts (automatic classification of texts by content, identification of pseudoscientific texts); 3) analysis of metric maps (visualization and analysis of complex patterns, including clustering) and 4) chemoinformation analysis, including the assessment of the effect of drugs on the transcriptome, proteome and microbiome of a person. The article provides examples of the application of these methods of pharmacoinformatics to chondroprotectors containing standardized forms of chondroitin sulfate and glucosamine sulfate.


2018 ◽  
Vol 27 (2) ◽  
pp. 263-273
Author(s):  
Sesham Anand ◽  
P. Padmanabham ◽  
A. Govardhan ◽  
Rajesh H. Kulkarni

Data mining techniques support numerous applications of intelligent transportation systems (ITSs). This paper critically reviews various data mining techniques for trip planning in ITSs. The literature review begins with the contributions of descriptive and predictive mining techniques in ITSs and then turns to the contributions of clustering techniques. Because cluster analysis is a widely used approach, its use in ITSs is assessed. However, big data analysis is risky with clustering methods; thus, evolutionary computational algorithms are used for data mining. Although unsupervised clustering models are widely used, they have drawbacks such as selecting the optimal number of clustering points, defining a termination criterion, and lacking an objective function. Finally, various drawbacks of evolutionary computational algorithms are also addressed in this paper.

