Grid-Based and Outlier Detection-Based Data Clustering and Classification

Abstract. Data clustering is an important method for discovering naturally occurring structures in datasets. One of the most popular approaches is grid-based clustering. Methods of this kind are characterized by fast processing and can discover clusters of arbitrary shape. These properties allow them to be used in many different applications, and researchers have created many variants of grid-based clustering. However, the key issue is choosing the right number of grid cells. This paper proposes a novel grid-based algorithm that determines the number of grid cells automatically. The method is based on the kdist function, which computes the distance between each element of a dataset and its kth nearest neighbor. Experimental results obtained for several different datasets confirm the very good performance of the newly proposed method.
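The abstract does not say how the kdist values are turned into a grid size, so the following Python sketch fills that gap with a hypothetical rule: the cell width is set to the mean kdist value, and each dimension gets as many cells as fit into the data range. The names `kdist`, `cells_per_dimension` and `grid_cluster` and the parameters `k` and `min_pts` are illustrative assumptions, and the clustering step is a generic grid-based scheme (dense cells joined to face-adjacent dense cells), not the paper's algorithm.

```python
from collections import deque

import numpy as np


def kdist(points: np.ndarray, k: int) -> np.ndarray:
    """Distance from every point to its k-th nearest neighbour (self excluded)."""
    diff = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean, shape (n, n)
    # Row-wise sort puts the zero self-distance in column 0,
    # so column k is the distance to the k-th nearest neighbour.
    return np.sort(dists, axis=1)[:, k]


def cells_per_dimension(points: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical rule (assumption): grid cell width = mean k-dist value."""
    width = kdist(points, k).mean()
    span = points.max(axis=0) - points.min(axis=0)
    if width == 0:
        return np.ones(points.shape[1], dtype=int)
    return np.maximum(1, np.ceil(span / width).astype(int))


def grid_cluster(points, k: int = 4, min_pts: int = 2) -> np.ndarray:
    """Label points by joining face-adjacent dense cells; -1 marks noise."""
    points = np.asarray(points, dtype=float)
    n_cells = cells_per_dimension(points, k)
    lo = points.min(axis=0)
    span = points.max(axis=0) - lo
    span[span == 0] = 1.0  # flat dimension: every point falls in cell 0
    # Cell index of every point; the maximum is clipped into the last cell.
    idx = np.minimum(np.floor((points - lo) / span * n_cells).astype(int), n_cells - 1)

    cells: dict[tuple, list[int]] = {}
    for i, cell in enumerate(map(tuple, idx)):
        cells.setdefault(cell, []).append(i)
    dense = {c for c, members in cells.items() if len(members) >= min_pts}

    labels = np.full(len(points), -1)
    seen: set[tuple] = set()
    cluster = 0
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search over connected dense cells
            cell = queue.popleft()
            labels[cells[cell]] = cluster
            for d in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cluster += 1
    return labels
```

For example, with four copies of (0, 0), four copies of (10, 10) and a single point at (5, 0), `grid_cluster(points, k=1)` puts each group of copies in its own cluster and labels the lone point -1. The full pairwise-distance matrix makes `kdist` quadratic in memory, so a spatial index would be needed for large datasets.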


ONLINE PROBABILISTIC FUZZY CLUSTERING METHOD BASED ON EVOLUTIONARY OPTIMIZATION OF CAT SWARM

Radio Electronics Computer Science Control, 2021, pp. 65-70. DOI: 10.15588/1607-3274-2021-2-7

Author(s): Ye. V. Bodyanskiy, A. Yu. Shafronenko, I. N. Klymova

Keyword(s): Big Data; Fuzzy Clustering; Data Clustering; Clustering Algorithm; Evolutionary Optimization; Clustering Methods; Classification Problems; Probabilistic Data; Fuzzy Clustering Method; Clustering And Classification

Context. Big data clustering is a highly relevant area of artificial intelligence today. The task arises in many applications related to data mining, deep learning, etc. To solve such problems, traditional approaches and methods require the entire data sample to be submitted in batch form. Objective. The aim of the work is to propose a method of fuzzy probabilistic data clustering using evolutionary optimization of a cat swarm that is free of the drawbacks of traditional data clustering approaches. Method. A procedure of fuzzy probabilistic data clustering is proposed that uses evolutionary algorithms to determine sample extrema, cluster centroids and adaptive functions faster. It does not spend machine resources on storing intermediate calculations and requires no additional time to solve the data clustering problem, regardless of the data dimension and the way the data are presented for processing. Results. The proposed data clustering algorithm based on evolutionary optimization is simple to implement numerically, is free of the drawbacks inherent in traditional fuzzy clustering methods, and can handle large volumes of input data processed online in real time. Conclusions. The experimental results allow us to recommend the developed method for automatic clustering and classification of big data, where the extrema of the sample must be found as quickly as possible regardless of how the data are submitted for processing. The proposed method of online probabilistic fuzzy data clustering based on evolutionary optimization of a cat swarm is intended for use in hybrid computational intelligence systems and neuro-fuzzy systems, in training artificial neural networks, and in clustering and classification problems.
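The cat swarm optimization step is not described here in enough detail to reproduce, so the Python sketch below shows only the online probabilistic fuzzy clustering part, using a standard online fuzzy c-means rule in its place: memberships u_j proportional to (||x - w_j||^2)^(1/(1 - beta)), normalized to sum to one, and the centroid update w_j <- w_j + eta * u_j^beta * (x - w_j). This is a swapped-in, gradient-style update, not the authors' evolutionary procedure; the class name `OnlineFuzzyCMeans` and the constant learning rate `eta` are assumptions.

```python
import numpy as np


class OnlineFuzzyCMeans:
    """Online probabilistic fuzzy c-means: one observation per update step."""

    def __init__(self, centroids, beta: float = 2.0, eta: float = 0.1):
        self.w = np.array(centroids, dtype=float)  # (m, n) cluster centroids
        self.beta = beta  # fuzzifier, must be > 1
        self.eta = eta    # learning rate (kept constant here: an assumption)

    def memberships(self, x: np.ndarray) -> np.ndarray:
        d2 = ((self.w - x) ** 2).sum(axis=1)  # squared distances to centroids
        if np.any(d2 == 0):
            # x sits exactly on a centroid: give it full membership there.
            u = (d2 == 0).astype(float)
            return u / u.sum()
        p = d2 ** (1.0 / (1.0 - self.beta))
        return p / p.sum()  # probabilistic constraint: memberships sum to 1

    def update(self, x) -> np.ndarray:
        x = np.asarray(x, dtype=float)
        u = self.memberships(x)
        # Move each centroid toward x in proportion to u_j ** beta.
        self.w += self.eta * (u ** self.beta)[:, None] * (x - self.w)
        return u
```

Each observation is processed once and then discarded (`for x in stream: model.update(x)`), which is what lets an online scheme avoid holding the whole sample in memory.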


Significant locations in auxiliary data as seeds for typical use cases of point clustering

Proceedings of the ICA, Vol. 1, 2018, pp. 1-6. DOI: 10.5194/ica-proc-1-63-2018

Author(s): Johannes Kröger

Keyword(s): Population Density; Data Clustering; Use Cases; Auxiliary Data; Density Data; Local Maxima; Point Data; Seed Points; Apparent Distribution; Grid Based

Random greedy clustering and grid-based clustering are highly sensitive to their initial parameters. When used to cluster point data on maps, they often change the apparent distribution of the underlying data. We propose a process that uses precomputed weighted seed points to initialize the clusters, for example seed points taken from local maxima in population density data. Example results from clustering a dataset of petrol stations are presented.
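A minimal Python sketch of the seeding idea, assuming the auxiliary data is a population density raster: cells strictly higher than all eight neighbours become seed points at the cell centre, weighted by their density value, and each map point is assigned to the seed with the smallest distance divided by weight. The function names, the grid geometry (`origin`, `cell_size`, row 0 at the origin's y coordinate) and the multiplicatively weighted assignment rule are all assumptions; the abstract does not state how the weights enter the clustering.

```python
import numpy as np


def local_maxima(density, min_value: float = 0.0):
    """(row, col, value) of cells strictly greater than all 8 neighbours."""
    density = np.asarray(density, dtype=float)
    rows, cols = density.shape
    padded = np.pad(density, 1, mode="constant", constant_values=-np.inf)
    is_max = density > min_value
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            # Shifted view: neighbour[i, j] == density[i + dr, j + dc].
            neighbour = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            is_max &= density > neighbour
    r, c = np.nonzero(is_max)
    return [(int(i), int(j), float(density[i, j])) for i, j in zip(r, c)]


def seeds_from_grid(density, origin, cell_size, min_value: float = 0.0) -> np.ndarray:
    """Seed points (x, y, weight) at the centres of local-maximum cells."""
    x0, y0 = origin  # assumption: row 0 lies at y0, column 0 at x0
    return np.array([
        (x0 + (j + 0.5) * cell_size, y0 + (i + 0.5) * cell_size, v)
        for i, j, v in local_maxima(density, min_value)
    ])


def assign_to_seeds(points, seeds: np.ndarray) -> np.ndarray:
    """Index of the seed minimising distance / weight (hypothetical rule)."""
    points = np.asarray(points, dtype=float)
    d = np.linalg.norm(points[:, None, :] - seeds[None, :, :2], axis=-1)
    return np.argmin(d / seeds[None, :, 2], axis=1)
```

Because the seeds come from the auxiliary data rather than from random initialization, repeated runs produce the same clusters, which addresses the sensitivity to initial parameters described above.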
