A Density-Peak-Based Clustering Method for Multiple Densities Dataset

Clustering methods in data mining are widely used to detect hotspots in many domains, and they play an increasingly important role in the era of big data. The density peak clustering (DPC) algorithm is an advanced method that can handle datasets of arbitrary shape, but it does not perform well when a dataset contains multiple densities. In addition, the cut-off distance d_c is normally chosen from the user's experience, and this choice can affect the clustering result. In this study, a density-peak-based clustering method is proposed to detect clusters in datasets with multiple densities and shapes. Two improvements are made to address the limitations of existing clustering methods. First, DPC has difficulty detecting clusters in a dataset with multiple densities, where each cluster has a unique shape and its interior contains regions of different density; the proposed method addresses this with a step-by-step merging approach. Second, high-density points are selected automatically without manual intervention, which is more efficient than existing methods that require user-specified parameters. Experimental results show that the method can be applied to various datasets and performs better than traditional methods and DPC.
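As background for the cut-off distance discussed above (`dc` in the code), the sketch below computes the two standard DPC quantities: the local density ρ (the number of neighbours closer than `dc`) and δ (the distance to the nearest denser point). It then takes the `n_clusters` points with the largest γ = ρ·δ as centers and lets every other point inherit the label of its nearest denser neighbour. This is a minimal illustration of plain DPC, not the multi-density merging method proposed in this paper; the function name `dpc`, the top-γ center rule, and the index-based tie-breaking are assumptions made for the example.

```python
import math


def dpc(points, dc, n_clusters):
    """Plain density peak clustering sketch: returns (labels, center indices)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho_i: number of other points closer than the cut-off distance dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index (an assumption).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # the global density peak gets the largest distance
            continue
        # delta_i: distance to the nearest point that is denser (earlier in the order).
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], nearest_higher[i] = dist[i][j], j
    # Centers: the n_clusters points with the largest gamma = rho * delta.
    # Sorting `order` stably keeps denser points first among equal gamma values.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(order, key=lambda i: -gamma[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Remaining points, in density order, join the cluster of their nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers


# Two small, well-separated blobs (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = dpc(points, dc=1.0, n_clusters=2)
print(labels, centers)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1] [4, 9]
```

With `dc = 1.0` the middle point of each blob is its density peak, so the two middles become the centers and each corner joins its own blob. Because ρ depends on `dc` for every point, a different cut-off can change which points look like peaks.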

A Fast Density Peak Clustering Method with Autoselect Cluster Centers

Mobile Information Systems, doi:10.1155/2022/4176101, 2022, Vol 2022, pp. 1-13

Author(s): Zhihe Wang, Yongbiao Li, Hui Du, Xiaofen Wei

Keyword(s): Experimental Results, Cluster Center, Clustering Method, Density Peak, Density Peaks, Density Peaks Clustering, Manual Selection, Density Peak Clustering, Selection Of

To remove the need to manually select cluster centers in density peaks clustering, this paper proposes a fast clustering method that selects cluster centers automatically. First, the method divides the data into groups and marks each group as a core or boundary group according to its density. Second, it forms clusters by iteratively merging pairs of core groups whose distance is less than a threshold, and selects each cluster's center at the densest position in that cluster. Finally, it assigns each boundary group to the cluster of the nearest cluster center. Experimental results show that the method eliminates the manual selection of cluster centers and improves clustering efficiency.
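The core-group merging and boundary assignment steps described above can be sketched as follows. The sketch assumes the data have already been split into groups, each summarized by a centroid and a density value; the grouping procedure, the parameters `core_density` and `merge_dist`, and the function name `merge_groups` are hypothetical and not taken from the paper. It also assumes that the distance between groups is measured between their original centroids, in which case repeatedly merging pairs below the threshold yields the connected components computed here with union-find.

```python
import math


def merge_groups(centroids, densities, core_density, merge_dist):
    """Hypothetical sketch: merge close core groups, pick densest centers, attach boundary groups."""
    n = len(centroids)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Core vs boundary by an assumed density threshold.
    core = [i for i in range(n) if densities[i] >= core_density]
    boundary = [i for i in range(n) if densities[i] < core_density]
    # Merge every pair of core groups whose centroids are closer than merge_dist.
    for a in core:
        for b in core:
            if a < b and math.dist(centroids[a], centroids[b]) < merge_dist:
                parent[find(a)] = find(b)
    # Each merged cluster's center is its densest core group.
    best = {}
    for i in core:
        r = find(i)
        if r not in best or densities[i] > densities[best[r]]:
            best[r] = i
    centers = sorted(best.values())
    labels = [-1] * n
    if not centers:
        return labels, centers  # no core groups: nothing to assign to
    label_of_root = {find(c): k for k, c in enumerate(centers)}
    for i in core:
        labels[i] = label_of_root[find(i)]
    # Boundary groups join the cluster of the nearest center.
    for i in boundary:
        labels[i] = min(range(len(centers)),
                        key=lambda k: math.dist(centroids[i], centroids[centers[k]]))
    return labels, centers


# Illustrative groups: (centroid, density) pairs on a line.
centroids = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (5, 0)]
densities = [5, 9, 6, 8, 7, 1]
print(merge_groups(centroids, densities, core_density=4, merge_dist=1.5))
# ([0, 0, 0, 1, 1, 0], [1, 3])
```

The three core groups near x = 0 to 2 chain into one cluster and the two near x = 10 to 11 into another; the densest group of each becomes its center, and the low-density group at x = 5 is attached to the closer center.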

Damage Pattern Recognition and Crack Propagation Prediction for Crumb Rubber Concrete Based on Acoustic Emission Techniques

Applied Sciences, doi:10.3390/app112311476, 2021, Vol 11 (23), pp. 11476

Author(s): Jianjie Sun, Xi Chen, Zhengwu Fu, Giuseppe Lacidogna

Keyword(s): Neural Network, Artificial Neural Network, Ultimate Load, Crumb Rubber, Clustering Methods, Clustering Method, Density Peak, Crumb Rubber Concrete, Density Peak Clustering, Rubber Concrete

In this study, the clustering of concrete matrix rupture and rubber fracture damage, and the prediction of the ultimate load of crumb rubber concrete, were investigated using the acoustic emission (AE) technique. The specimens were loaded in four-point bending. Six clustering methods, namely k-means, fuzzy c-means (FCM), self-organizing map (SOM), Gaussian mixture model (GMM), hierarchical clustering, and density peak clustering, were analyzed, and the results showed that density peak clustering performed best. This optimal clustering algorithm was then used to cluster the AE signals in order to study how the different damage modes evolve, and the ultimate load of crumb rubber concrete was predicted with an artificial neural network. The results indicate that combining AE techniques with appropriate clustering methods, such as density peak clustering, and an artificial neural network can serve as a practical tool for structural health monitoring of crumb rubber concrete.

COMPARISON OF K-MEANS AND K-MEDOIDS CLUSTERING METHODS WITH A FUZZY RFM MODEL FOR CUSTOMER GROUPING

Evolusi : Jurnal Sains dan Manajemen, doi:10.31294/evolusi.v6i2.4600, 2018, Vol 6 (2)

Author(s): Elly Muningsih (AMIK BSI Yogyakarta)

Keyword(s): Data Clustering, Small Data, Clustering Methods, Monetary Model, Clustering Method, Online Sales, RFM Model, Potential Customers, Cluster 2, Better Than

Abstract ~ The K-Means method is one of the most widely used clustering methods in data clustering research, while the K-Medoids method is an efficient method for processing small datasets. This study compares the two clustering methods by grouping customers into 3 clusters according to their characteristics: highly potential (loyal) customers, potential customers, and non-potential customers. The methods used are K-Means clustering and K-Medoids clustering, applied to online sales transaction data. The clustering methods are tested using a Fuzzy RFM (Recency, Frequency, Monetary) model, in which the average (mean) of the three values is taken. Testing showed that the K-Means method performs better than the K-Medoids method, with an accuracy of 90.47%. In the resulting grouping, cluster 1 has 16 members (customers), cluster 2 has 11 members, and cluster 3 has 15 members. Keywords: clustering, K-Means method, K-Medoids method, customer, Fuzzy RFM model.
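The two methods compared above differ mainly in how each cluster's representative is updated: K-Means uses the mean of the members, while K-Medoids uses the member with the smallest total distance to the others. The sketch below assumes each customer has already been reduced to a single score (for example, the mean of their R, F and M values); it does not model the fuzzy part of the paper's Fuzzy RFM model, and it uses the simple alternating (Voronoi-iteration) form of K-Medoids rather than PAM. The function name `cluster_1d` and the scores are made-up illustration data.

```python
def cluster_1d(scores, init, use_medoid, iters=100):
    """Alternate assignment and update steps; returns (representatives, groups)."""
    reps = list(init)
    groups = []
    for _ in range(iters):
        # Assignment step: each score goes to its nearest representative.
        groups = [[] for _ in reps]
        for s in scores:
            k = min(range(len(reps)), key=lambda j: abs(s - reps[j]))
            groups[k].append(s)
        # Update step: K-Means takes the mean; K-Medoids takes the member
        # with the smallest total distance to the rest of its group.
        new = []
        for k, g in enumerate(groups):
            if not g:
                new.append(reps[k])  # keep an empty cluster's representative
            elif use_medoid:
                new.append(min(g, key=lambda m: sum(abs(m - x) for x in g)))
            else:
                new.append(sum(g) / len(g))
        if new == reps:
            break  # converged: representatives no longer change
        reps = new
    return reps, groups


# Hypothetical per-customer scores, one of them unusually large.
scores = [1, 2, 3, 10, 11, 12, 20, 21, 22, 60]
print(cluster_1d(scores, [1, 10, 20], use_medoid=False)[0])  # [2.0, 16.0, 60.0]
print(cluster_1d(scores, [1, 10, 20], use_medoid=True)[0])   # [2, 11, 21]
```

On this toy data the single large score (60) pulls a K-Means mean toward it and ends up alone in its own cluster, while the medoids stay on actual, typical members. This illustrates only the behavioural difference between the update rules; it says nothing about which method is more accurate on the paper's data.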

STiMR k-Means: An Efficient Clustering Method for Big Data

International Journal of Pattern Recognition and Artificial Intelligence, doi:10.1142/s0218001419500137, 2019, Vol 33 (08), pp. 1950013, cited by 3

Author(s): Mohamed Aymen Ben HajKacem, Chiheb-Eddine Ben N′Cir, Nadia Essoussi

Keyword(s): Big Data, Triangle Inequality, Computational Cost, Internal Validity, Clustering Methods, Clustering Method, Scalable Clustering, Acceleration Techniques, Clustering Quality, Important Challenge

Big Data clustering has become an important challenge in data analysis, since several applications require scalable clustering methods to organize such data into groups of similar objects. Given the computational cost of most existing clustering methods, we propose in this paper a new clustering method, referred to as STiMR k-means, that provides a good trade-off between scalability and clustering quality. The proposed method combines three acceleration techniques: sampling, the triangle inequality, and MapReduce. Sampling reduces the number of data points used when building cluster prototypes, the triangle inequality reduces the number of comparisons needed to find the nearest cluster, and MapReduce provides a parallel framework for running the method. Experiments on simulated and real datasets show the effectiveness of the proposed method compared with existing ones in terms of running time, scalability, and internal validity measures.
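The triangle-inequality idea mentioned above can be illustrated with the bound used by accelerated k-means variants: if the distance between the current best center c_b and another center c_j is at least twice the distance from the point to c_b, then c_j cannot be strictly closer, so its distance need not be computed. The sketch below applies only this bound to one assignment pass; it is a generic illustration, not the authors' STiMR implementation, and it leaves out sampling and MapReduce. The function name `assign_with_pruning` is hypothetical.

```python
import math


def assign_with_pruning(points, centers):
    """Nearest-center assignment that skips centers ruled out by the triangle inequality."""
    k = len(centers)
    # Pairwise center distances, computed once per assignment pass.
    cc = [[math.dist(a, b) for b in centers] for a in centers]
    labels, skipped = [], 0
    for p in points:
        best = 0
        d_best = math.dist(p, centers[0])
        for j in range(1, k):
            # d(p, c_j) >= d(c_best, c_j) - d(p, c_best) >= d_best when
            # d(c_best, c_j) >= 2 * d_best, so c_j cannot be strictly closer.
            if cc[best][j] >= 2 * d_best:
                skipped += 1
                continue
            d = math.dist(p, centers[j])
            if d < d_best:
                best, d_best = j, d
        labels.append(best)
    return labels, skipped


points = [(0, 0), (1, 0), (9, 0), (10, 1)]
centers = [(0, 0), (10, 0), (20, 0)]
print(assign_with_pruning(points, centers))  # ([0, 0, 1, 1], 6)
```

For these four points, 6 of the 8 point-to-center distances after the first are skipped, and the labels match a brute-force nearest-center search because a skipped center can never be strictly closer than the current best.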

Focusing on a probability element: Parameter selection of message importance measure in big data

2017 IEEE International Conference on Communications (ICC), doi:10.1109/icc.2017.7996803, 2017, cited by 4

Author(s): Rui She, Shanyun Liu, Yunquan Dong, Pingyi Fan

Keyword(s): Big Data, Parameter Selection, Importance Measure, Probability Element, Selection Of

DATA MINING FOR PARAMETER SELECTION OF SWARM INTELLIGENCE ALGORITHMS

Theoretical & Applied Science, doi:10.15863/tas.2015.07.27.13, 2015, Vol 27 (07), pp. 75-81

Author(s): Pavel Viktorovich Matrenin, Viktor Gilyachevich Sekaev

Keyword(s): Data Mining, Swarm Intelligence, Parameter Selection, Selection Of

VDPC: Variational Density Peak Clustering Algorithm

doi:10.36227/techrxiv.17597669.v1, 2021

Author(s): Yizhang Wang, Di Wang, You Zhou, Chai Quek, Xiaofeng Zhang

Keyword(s): Clustering Algorithm, Cluster Formation, Clustering Algorithms, Data Distribution, Distribution Patterns, Clustering Methods, Density Peak, Global Parameter, Density Peak Clustering, Parameter Values

<div>Clustering is an important unsupervised knowledge acquisition method that divides unlabeled data into different groups \cite{atilgan2021efficient,d2021automatic}. Different clustering algorithms make different assumptions about how clusters form; thus, most clustering algorithms handle at least one particular type of data distribution well but may not handle other types of distributions well. For example, K-means identifies convex clusters well \cite{bai2017fast}, and DBSCAN is able to find clusters with similar densities \cite{DBSCAN}. </div><div>Therefore, most clustering methods may not work well on data distribution patterns that differ from their assumptions, or on a mixture of different distribution patterns. Taking DBSCAN as an example, it is sensitive to loosely connected points between dense natural clusters, as illustrated in Figure~\ref{figconnect}. The density of the connected points shown in Figure~\ref{figconnect} differs from that of the natural clusters at both ends; however, DBSCAN with fixed global parameter values may wrongly assign these connected points and treat all the data points in Figure~\ref{figconnect} as one big cluster.</div>
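The bridging problem described above can be reproduced with a minimal DBSCAN. The sketch below is a plain O(n²) implementation on made-up data: two dense groups with spacing 0.5, joined by a chain of sparser bridge points with spacing 1.0. The function name `dbscan`, the parameter values, and the data are illustrative assumptions, not taken from the VDPC paper.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, -1 for noise."""
    n = len(points)
    # Neighbourhoods include the point itself, as in the usual DBSCAN definition.
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(nbrs[j]) >= min_pts:  # only core points expand the cluster
                queue.extend(nbrs[j])
    return labels


dense_a = [(x * 0.5, 0) for x in range(5)]        # 0.0 .. 2.0
bridge = [(float(x), 0) for x in range(3, 10)]    # 3.0 .. 9.0
dense_b = [(10 + x * 0.5, 0) for x in range(5)]   # 10.0 .. 12.0
points = dense_a + bridge + dense_b
print(dbscan(points, eps=1.0, min_pts=3))  # all 17 points in cluster 0
print(dbscan(points, eps=0.6, min_pts=3))  # groups split, bridge points labeled -1
```

With `eps = 1.0` and `min_pts = 3`, every bridge point is a core point, so the chain merges both dense groups into one cluster. Shrinking `eps` to 0.6 separates the two groups but labels every bridge point as noise, which shows how strongly the outcome depends on a single global setting.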

Pharmacoinformation studies of chondroprotectors

Modern Rheumatology Journal, doi:10.14412/1996-7012-2021-5-114-120, 2021, Vol 15 (5), pp. 114-120

Author(s): A. M. Lila, I. Yu. Torshin, A. N. Gromov, V. A. Semenov, O. A. Gromova

Keyword(s): Data Mining, Big Data, Chondroitin Sulfate, Data Selection, Scientific Publications, Modern Methods, Content Identification, New Biomarkers, Selection Of

The pharmacoinformation approach to the assessment and modeling of drugs involves the use of modern data mining methods. These include: 1) big data analysis (selection of scientific publication texts, search for new biomarkers); 2) computer analysis of texts (automatic classification of texts by content, identification of pseudoscientific texts); 3) analysis of metric maps (visualization and analysis of complex patterns, including clustering); and 4) chemoinformation analysis, including assessment of the effect of drugs on the human transcriptome, proteome and microbiome. The article provides examples of applying these pharmacoinformatics methods to chondroprotectors containing standardized forms of chondroitin sulfate and glucosamine sulfate.

A methodology for automatic parameter-tuning and center selection in density-peak clustering methods

Soft Computing, doi:10.1007/s00500-020-05244-5, 2020

Author(s): José Carlos García-García, Ricardo García-Ródenas

Keyword(s): Parameter Tuning, Clustering Methods, Density Peak, Density Peak Clustering, Automatic Parameter Tuning

An Extensive Review on Data Mining Methods and Clustering Models for Intelligent Transportation System

Journal of Intelligent Systems, doi:10.1515/jisys-2016-0159, 2018, Vol 27 (2), pp. 263-273, cited by 4

Author(s): Sesham Anand, P. Padmanabham, A. Govardhan, Rajesh H. Kulkarni

Keyword(s): Data Mining, Intelligent Transportation Systems, Intelligent Transportation System, Intelligent Transportation, Transportation Systems, Optimal Number, Clustering Methods, Trip Planning, Data Mining Techniques, Selection Of

Data mining techniques support numerous applications of intelligent transportation systems (ITSs). This paper critically reviews various data mining techniques for trip planning in ITSs. The literature review begins by discussing the contributions of descriptive and predictive mining techniques to ITSs and then turns to the contributions of clustering techniques. Because cluster analysis is the most widely used approach, its use in ITSs is assessed. However, clustering methods are risky for big data analysis, so evolutionary computational algorithms are used for data mining. Although unsupervised clustering models are widely used, they have drawbacks such as selecting the optimal number of clustering points, defining a termination criterion, and the lack of an objective function. Finally, the drawbacks of evolutionary computational algorithms are also addressed in this paper.
