A Density Peak Clustering Algorithm Based on the K-Nearest Shannon Entropy and Tissue-Like P System

Cluster analysis plays a crucial role in consumer behavior segmentation. The density peak clustering (DPC) algorithm is a novel density-based clustering method. However, it performs poorly on high-dimensional datasets and handles the local density of boundary points poorly. In addition, its fault tolerance is limited by its one-step allocation strategy. To overcome these disadvantages, this paper proposes an adaptive density peak clustering algorithm based on dimensional-free and reverse k-nearest neighbors (ERK-DPC). First, we compute the Euler cosine distance to obtain the similarity of sample points in high-dimensional datasets. Then, an adaptive local density formula is used to measure the local density of each point. Finally, the reverse k-nearest neighbor idea is added to a two-step allocation strategy, which assigns the remaining points accurately and effectively. The proposed algorithm is evaluated on several benchmark and real-world datasets. Comparison on these benchmarks demonstrates that the ERK-DPC algorithm is superior to some state-of-the-art methods.
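
For orientation, the baseline that the abstract improves on can be sketched in a few lines. This is a minimal sketch of classic DPC only (cutoff-based local density, distance to the nearest denser point, one-step allocation), not of ERK-DPC: the Euler cosine distance, the adaptive density formula, and the reverse k-nearest-neighbor allocation are not specified in the abstract and are not reproduced here. The function name and the parameters `d_c` and `n_centers` are assumptions made for illustration.

```python
import math

def density_peak_clustering(points, d_c, n_centers):
    """Baseline DPC sketch: cutoff density, delta distance, one-step allocation."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point earlier in the density order;
    # the densest point gets its largest distance to any point instead.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Centers: the points with the largest rho * delta (hypothetical selection
    # rule; the original method reads them off a decision graph).
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_centers]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # One-step allocation: each remaining point inherits the label of its
    # nearest denser neighbor, so one wrong parent propagates downstream.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

# Two well-separated toy groups, each a unit square plus its midpoint.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = density_peak_clustering(points, d_c=1.0, n_centers=2)
```

The final loop is the one-step allocation the abstract criticizes: a point's label depends entirely on a single parent, which is why an error made for one dense point can spread to every point assigned after it.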

VDPC: Variational Density Peak Clustering Algorithm

10.36227/techrxiv.17597669.v1, 2021

Author(s): Yizhang Wang, Di Wang, You Zhou, Chai Quek, Xiaofeng Zhang

Keyword(s): Clustering Algorithm, Cluster Formation, Clustering Algorithms, Data Distribution, Distribution Patterns, Clustering Methods, Density Peak, Global Parameter, Density Peak Clustering, Parameter Values

Clustering is an important unsupervised knowledge acquisition method that divides unlabeled data into different groups \cite{atilgan2021efficient,d2021automatic}. Different clustering algorithms make different assumptions about cluster formation; thus, most clustering algorithms handle at least one particular type of data distribution well but may not handle other types well. For example, K-means identifies convex clusters well \cite{bai2017fast}, and DBSCAN is able to find clusters with similar densities \cite{DBSCAN}.

Therefore, most clustering methods may not work well on data distribution patterns that differ from their assumptions or on a mixture of different distribution patterns. Taking DBSCAN as an example, it is sensitive to the loosely connected points between dense natural clusters, as illustrated in Figure~\ref{figconnect}. The density of the connected points shown in Figure~\ref{figconnect} differs from that of the natural clusters on both ends; however, DBSCAN with fixed global parameter values may wrongly assign these connected points and treat all the data points in Figure~\ref{figconnect} as one big cluster.
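
The DBSCAN behavior described above can be reproduced on a toy layout. The following is a minimal sketch of plain DBSCAN, not of VDPC; the data (two dense 3x3 blobs joined by three sparse bridge points) and the parameter values are assumptions chosen for illustration. With a single global `eps` large enough to cover the bridge spacing, the bridge points become core points and both blobs merge into one cluster.

```python
import math

def dbscan(points, eps, min_pts):
    """Plain DBSCAN sketch; returns one label per point, -1 meaning noise."""
    n = len(points)
    # eps-neighborhoods include the point itself, as in the usual definition.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # only core points expand further
                frontier.extend(neighbors[j])
    return labels

# Hypothetical layout: dense blobs (spacing 0.5) and a sparse bridge (spacing 1.0).
blob_a = [(x * 0.5, y * 0.5) for x in range(3) for y in range(3)]
bridge = [(2.0, 0.5), (3.0, 0.5), (4.0, 0.5)]
blob_b = [(5 + x * 0.5, y * 0.5) for x in range(3) for y in range(3)]
points = blob_a + bridge + blob_b

merged = dbscan(points, eps=1.0, min_pts=3)  # every point ends up in cluster 0
split = dbscan(points, eps=0.6, min_pts=3)   # blobs are 0 and 1, bridge is noise
```

In this layout a smaller `eps` happens to separate the blobs, but that relies on both blobs having the same density; when natural clusters differ in density, no single global value is guaranteed to work, which is the limitation the abstract points to.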

P System–Based Clustering Methods Using NoSQL Databases

Computation, 10.3390/computation9100102, 2021, Vol 9 (10), pp. 102

Author(s): Péter Lehotay-Kéry, Tamás Tarczali, Attila Kiss

Keyword(s): Management System, Database Management, Clustering Algorithm, Clustering Algorithms, Database Systems, Database Management System, P Systems, Main Element, P System, Clustering Methods

Models of computation are fundamental notions in computer science; consequently, they have been the subject of countless research papers, with numerous novel models proposed even in recent years. Among the many different approaches, several draw inspiration from biological processes observed in nature. P systems, or membrane systems, draw an analogy between communication in computing and the flow of information that can be observed in living organisms. These systems serve as a basis for various concepts, ranging from computational economics and robotics to data clustering techniques. In this paper, one such use of these systems, membrane system–based clustering, is brought into focus. As the amount of data stored worldwide grows, clustering algorithms have to handle more and more data as well. To address this, bringing these methods closer to the data, their main element, provides several benefits. Database systems equip their users with, for instance, well-integrated security features and more direct control over the data itself. We consider the case where the type of database management system is given (e.g., NoSQL) but the corporation or research team can choose which specific system to use. Our goal is to show how algorithms written this way behave in such an environment, so that a better-substantiated decision can be made about which database management system to connect to the system. For this purpose, we explore the possibilities of a clustering algorithm based on P systems when used alongside NoSQL database systems, which are designed to manage big data. Variants over two competing databases, MongoDB and Redis, are evaluated and compared to identify the advantages and limitations of using such a solution in these systems.

VDPC: Variational Density Peak Clustering Algorithm

10.36227/techrxiv.17597669, 2021

Author(s): Yizhang Wang, Di Wang, You Zhou, Chai Quek, Xiaofeng Zhang

Keyword(s): Clustering Algorithm, Cluster Formation, Clustering Algorithms, Data Distribution, Distribution Patterns, Clustering Methods, Density Peak, Global Parameter, Density Peak Clustering, Parameter Values

Clustering is an important unsupervised knowledge acquisition method that divides unlabeled data into different groups \cite{atilgan2021efficient,d2021automatic}. Different clustering algorithms make different assumptions about cluster formation; thus, most clustering algorithms handle at least one particular type of data distribution well but may not handle other types well. For example, K-means identifies convex clusters well \cite{bai2017fast}, and DBSCAN is able to find clusters with similar densities \cite{DBSCAN}.

Therefore, most clustering methods may not work well on data distribution patterns that differ from their assumptions or on a mixture of different distribution patterns. Taking DBSCAN as an example, it is sensitive to the loosely connected points between dense natural clusters, as illustrated in Figure~\ref{figconnect}. The density of the connected points shown in Figure~\ref{figconnect} differs from that of the natural clusters on both ends; however, DBSCAN with fixed global parameter values may wrongly assign these connected points and treat all the data points in Figure~\ref{figconnect} as one big cluster.

Clustering with Missing Features: A Density-Based Approach

Symmetry, 10.3390/sym14010060, 2022, Vol 14 (1), pp. 60

Author(s): Kun Gao, Hassan Ali Khan, Wenwen Qu

Keyword(s): Real World, Incomplete Data, Clustering Algorithm, Clustering Algorithms, Distance Matrix, Density Peak, Density Clustering, Real World Datasets, Density Peak Clustering, Feature Values

Density clustering has been widely used in many research disciplines to determine the structure of real-world datasets. Existing density clustering algorithms only work well on complete datasets. In real-world datasets, however, feature values may be missing due to technical limitations. Many imputation methods used for density clustering cause the aggregation phenomenon. To solve this problem, a novel two-stage density peak clustering approach for data with missing features is proposed. First, the density peak clustering algorithm is applied to the data with complete features, and the labeled core points that can represent the whole data distribution are used to train a classifier. Second, we calculate a symmetrical FWPD distance matrix for the incomplete data points; the incomplete data are then imputed using this matrix and classified by the classifier. The experimental results show that the proposed approach performs well on both synthetic and real datasets.

Density Peak Clustering algorithm using knowledge learning-based fruit fly optimization

International Journal of Computers and Applications, 10.1080/1206212x.2018.1440340, 2018, Vol 40 (3), pp. 1-10

Author(s): Ruihong Zhou, Qiaoming Liu, Xuming Han, Limin Wang

Keyword(s): Clustering Algorithm, Fruit Fly, Density Peak, Fruit Fly Optimization, Density Peak Clustering, Knowledge Learning

A Fast Density Peak Clustering Algorithm Optimized by Uncertain Number Neighbors for Breast MR Image

Journal of Physics Conference Series, 10.1088/1742-6596/1229/1/012024, 2019, Vol 1229, pp. 012024, Cited By ~ 1

Author(s): Fan Hong, Yang Jing, Hou Cun-cun, Zhang Ke-zhen, Yao Ruo-xia

Keyword(s): Clustering Algorithm, Mr Image, Density Peak, Breast Mr, Density Peak Clustering

A Grid-Density Based Algorithm by Weighted Spiking Neural P Systems with Anti-Spikes and Astrocytes in Spatial Cluster Analysis

Processes, 10.3390/pr8091132, 2020, Vol 8 (9), pp. 1132

Author(s): Deting Kong, Yuan Wang, Xinyan Wu, Xiyu Liu, Jianhua Qu, ...

Keyword(s): Dimensional Space, P Systems, High Dimensional, P System, Inhibitory Influence, Spiking Neural P Systems, Clustering Approach, Spatial Cluster Analysis, Effectiveness And Efficiency, Real World Datasets

In this paper, we propose a novel clustering approach based on P systems and a grid-density strategy. We present a grid-density based approach for clustering high-dimensional data, which first projects the data patterns onto a two-dimensional space to overcome the curse of dimensionality. Then, clusters are found by meshing the plane with grid lines and deleting sparse grids. In particular, we present weighted spiking neural P systems with anti-spikes and astrocytes (WSNPA2 for short) to implement the grid-density based approach in parallel. Each neuron in a weighted SN P system contains a spike, which can be expressed by a computable real number. Spikes and anti-spikes are inspired by neurons communicating through excitatory and inhibitory impulses. Astrocytes have excitatory and inhibitory influence on synapses. Experimental results on multiple real-world datasets demonstrate the effectiveness and efficiency of our approach.
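
The grid-density step (mesh the plane, delete sparse cells, read clusters off what remains) can be sketched sequentially. This is a minimal sketch under stated assumptions, not the paper's WSNPA2 implementation: it takes points already projected to two dimensions, omits the P system entirely, and assumes that dense cells touching by an edge or a corner belong to the same cluster. The names `cell_size` and `min_count` are hypothetical parameters.

```python
from collections import Counter, deque

def grid_density_clusters(points, cell_size, min_count):
    """Label 2-D points by connected groups of dense grid cells (-1 = sparse cell)."""
    # Mesh the plane: map each point to the integer coordinates of its cell.
    cell_of = [(int(x // cell_size), int(y // cell_size)) for x, y in points]
    counts = Counter(cell_of)

    # Delete sparse cells: keep only those holding at least min_count points.
    dense = {cell for cell, k in counts.items() if k >= min_count}

    # Flood-fill over dense cells; neighbors are the 8 surrounding cells
    # (an assumption, the abstract does not state the adjacency rule).
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = next_label
                        queue.append(nb)
        next_label += 1
    return [cell_label.get(cell, -1) for cell in cell_of]

# Two adjacent dense cells, one distant dense cell, and one isolated point.
points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.4),
          (1.1, 0.2), (1.5, 0.5), (1.8, 0.9),
          (5.2, 5.2), (5.5, 5.5), (5.9, 5.1),
          (3.5, 3.5)]
labels = grid_density_clusters(points, cell_size=1.0, min_count=3)
```

Each cell's density count and dense/sparse decision is independent of every other cell, which is presumably what makes the step a natural fit for a parallel model such as the one the paper uses.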

Nearest-Neighbour-Induced Isolation Similarity and Its Impact on Density-Based Clustering

Proceedings of the AAAI Conference on Artificial Intelligence, 10.1609/aaai.v33i01.33014755, 2019, Vol 33, pp. 4755-4762, Cited By ~ 3

Author(s): Xiaoyu Qin, Kai Ming Ting, Ye Zhu, Vincent CS Lee

Keyword(s): Clustering Algorithm, Distance Measure, Nearest Neighbour, Density Peak, Density Based Clustering, New Type, Density Peak Clustering, The Impact, First Time, Tree Method

A recent proposal of a data-dependent similarity called Isolation Kernel/Similarity has enabled SVM to produce better classification accuracy. We identify shortcomings of using a tree method to implement Isolation Similarity and propose a nearest-neighbour method instead. We formally prove the characteristic of Isolation Similarity with the use of the proposed method. The impact of Isolation Similarity on density-based clustering is studied here. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly uplifted to surpass that of the recent density-peak clustering algorithm DP. This is achieved by simply replacing the distance measure with the proposed nearest-neighbour-induced Isolation Similarity in DBSCAN, leaving the rest of the procedure unchanged. A new type of cluster called mass-connected clusters is formally defined. We show that DBSCAN, which detects density-connected clusters, becomes one which detects mass-connected clusters when the distance measure is replaced with the proposed similarity. We also provide the condition under which mass-connected clusters can be detected, while density-connected clusters cannot.
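
One way to picture a nearest-neighbour-induced similarity is as follows. This is a minimal sketch under assumptions the abstract does not spell out: each of `t` rounds samples `psi` points from the data, every location is assigned to its nearest sampled point (a Voronoi cell), and the similarity of two points is the fraction of rounds in which they fall into the same cell. The function name, the defaults for `psi` and `t`, and the fixed `seed` are all illustrative choices, not values from the paper.

```python
import math
import random

def isolation_similarity(x, y, data, psi=4, t=200, seed=0):
    """Fraction of t random Voronoi partitionings in which x and y share a cell."""
    rng = random.Random(seed)  # fixed seed so repeated calls are reproducible
    same = 0
    for _ in range(t):
        # psi sampled points act as the centres of one Voronoi partitioning.
        sample = rng.sample(data, psi)
        nearest_x = min(range(psi), key=lambda k: math.dist(x, sample[k]))
        nearest_y = min(range(psi), key=lambda k: math.dist(y, sample[k]))
        same += nearest_x == nearest_y
    return same / t

# Toy 1-D data: twenty evenly spaced points stored as 1-tuples.
data = [(float(i),) for i in range(20)]
near = isolation_similarity((0.0,), (0.1,), data)   # both always map to the leftmost centre
far = isolation_similarity((0.0,), (19.0,), data)   # leftmost vs rightmost centre, never equal
mid = isolation_similarity((5.0,), (8.0,), data)    # depends on where the centres land
```

Because the cells are drawn from the data itself, the value depends on the local distribution and not only on the geometric distance between the two points. To use such a similarity inside a distance-based procedure like DBSCAN, one would work with a dissimilarity such as one minus the similarity; that conversion is again an assumption of this sketch.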

A privacy‐preserving density peak clustering algorithm in cloud computing

Concurrency and Computation Practice and Experience, 10.1002/cpe.5641, 2020, Vol 32 (11), Cited By ~ 1

Author(s): Liping Sun, Shang Ci, Xiaoqing Liu, Xiaoyao Zheng, Qingying Yu, ...

Keyword(s): Cloud Computing, Clustering Algorithm, Privacy Preserving, Density Peak, Density Peak Clustering

Adaptive density peak clustering based on dimensional-free and reverse k-nearest neighbors
