Improved Bidirectional CABOSFV Based on Multi-Adjustment Clustering and Simulated Annealing

Abstract Although Clustering Algorithm Based on Sparse Feature Vector (CABOSFV) and its related algorithms are efficient for high dimensional sparse data clustering, there exist several imperfections. Such imperfections as subjective parameter designation and order sensibility of clustering process would eventually aggravate the time complexity and quality of the algorithm. This paper proposes a parameter adjustment method of Bidirectional CABOSFV for optimization purpose. By optimizing Parameter Vector (PV) and Parameter Selection Vector (PSV) with the objective function of clustering validity, an improved Bidirectional CABOSFV algorithm using simulated annealing is proposed, which circumvents the requirement of initial parameter determination. The experiments on UCI data sets show that the proposed algorithm, which can perform multi-adjustment clustering, has a higher accurateness than single adjustment clustering, along with a decreased time complexity through iterations.

Download Full-text

A SELF-ORGANIZING MAP FOR MIXED CONTINUOUS AND CATEGORICAL DATA

International Journal of Computing ◽

10.47839/ijc.10.1.733 ◽

2011 ◽

pp. 24-32 ◽

Cited By ~ 1

Author(s):

Nicoleta Rogovschi ◽

Mustapha Lebbah ◽

Younès Bennani

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Mixed Data ◽

Categorical Variables ◽

Data Sets ◽

Self Organizing Map ◽

Data Set ◽

Public Data ◽

Self Organizing

Most traditional clustering algorithms are limited to handle data sets that contain either continuous or categorical variables. However data sets with mixed types of variables are commonly used in data mining field. In this paper we introduce a weighted self-organizing map for clustering, analysis and visualization mixed data (continuous/binary). The learning of weights and prototypes is done in a simultaneous manner assuring an optimized data clustering. More variables has a high weight, more the clustering algorithm will take into account the informations transmitted by these variables. The learning of these topological maps is combined with a weighting process of different variables by computing weights which influence the quality of clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, Zoo data set and other three mixed data sets. The results show a good quality of the topological ordering and homogenous clustering.

Download Full-text

On Fuzzy Non-Metric Model for Data with Tolerance and its Application to Incomplete Data Clustering

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2016.p0571 ◽

2016 ◽

Vol 20 (4) ◽

pp. 571-579 ◽

Cited By ~ 1

Author(s):

Yasunori Endo ◽

◽

Tomoyuki Suzuki ◽

Naohiko Kinoshita ◽

Yukihiro Hamasuna ◽

...

Keyword(s):

Data Clustering ◽

Incomplete Data ◽

Clustering Algorithm ◽

Uncertain Data ◽

Data Sets ◽

Membership Degree ◽

Clustering Methods ◽

Clustering Method ◽

Numerical Examples ◽

Metric Model

The fuzzy non-metric model (FNM) is a representative non-hierarchical clustering method, which is very useful because the belongingness or the membership degree of each datum to each cluster can be calculated directly from the dissimilarities between data and the cluster centers are not used. However, the original FNM cannot handle data with uncertainty. In this study, we refer to the data with uncertainty as “uncertain data,” e.g., incomplete data or data that have errors. Previously, a methods was proposed based on the concept of a tolerance vector for handling uncertain data and some clustering methods were constructed according to this concept, e.g. fuzzyc-means for data with tolerance. These methods can handle uncertain data in the framework of optimization. Thus, in the present study, we apply the concept to FNM. First, we propose a new clustering algorithm based on FNM using the concept of tolerance, which we refer to as the fuzzy non-metric model for data with tolerance. Second, we show that the proposed algorithm can handle incomplete data sets. Third, we verify the effectiveness of the proposed algorithm based on comparisons with conventional methods for incomplete data sets in some numerical examples.

Download Full-text

CLUSTERING USING SIMULATED ANNEALING WITH PROBABILISTIC REDISTRIBUTION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001401000927 ◽

2001 ◽

Vol 15 (02) ◽

pp. 269-285 ◽

Cited By ~ 54

Author(s):

SANGHAMITRA BANDYOPADHYAY ◽

UJJWAL MAULIK ◽

MALAY KUMAR PAKHIRA

Keyword(s):

Simulated Annealing ◽

Clustering Algorithm ◽

Minimum Energy ◽

Real Life ◽

Feature Space ◽

Cluster Center ◽

Data Sets ◽

Partitional Clustering ◽

Real Life Data ◽

Data Points

An efficient partitional clustering technique, called SAKM-clustering, that integrates the power of simulated annealing for obtaining minimum energy configuration, and the searching capability of K-means algorithm is proposed in this article. The clustering methodology is used to search for appropriate clusters in multidimensional feature space such that a similarity metric of the resulting clusters is optimized. Data points are redistributed among the clusters probabilistically, so that points that are farther away from the cluster center have higher probabilities of migrating to other clusters than those which are closer to it. The superiority of the SAKM-clustering algorithm over the widely used K-means algorithm is extensively demonstrated for artificial and real life data sets.

Download Full-text

DBSCANI: Noise-Resistant Method for Missing Value Imputation

Journal of Intelligent Systems ◽

10.1515/jisys-2014-0172 ◽

2016 ◽

Vol 25 (3) ◽

pp. 431-440 ◽

Cited By ~ 1

Author(s):

Archana Purwar ◽

Sandeep Kumar Singh

Keyword(s):

Spatial Data ◽

Missing Values ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Data Sets ◽

Quality Of Data ◽

Data Set ◽

Dbscan Clustering ◽

Density Based Clustering

AbstractThe quality of data is an important task in the data mining. The validity of mining algorithms is reduced if data is not of good quality. The quality of data can be assessed in terms of missing values (MV) as well as noise present in the data set. Various imputation techniques have been studied in MV study, but little attention has been given on noise in earlier work. Moreover, to the best of knowledge, no one has used density-based spatial clustering of applications with noise (DBSCAN) clustering for MV imputation. This paper proposes a novel technique density-based imputation (DBSCANI) built on density-based clustering to deal with incomplete values in the presence of noise. Density-based clustering algorithm proposed by Kriegal groups the objects according to their density in spatial data bases. The high-density regions are known as clusters, and the low-density regions refer to the noise objects in the data set. A lot of experiments have been performed on the Iris data set from life science domain and Jain’s (2D) data set from shape data sets. The performance of the proposed method is evaluated using root mean square error (RMSE) as well as it is compared with existing K-means imputation (KMI). Results show that our method is more noise resistant than KMI on data sets used under study.

Download Full-text

Hebbian Self-Organizing Integrate-and-Fire Networks for Data Clustering

Neural Computation ◽

10.1162/neco.2009.12-08-926 ◽

2010 ◽

Vol 22 (1) ◽

pp. 273-288 ◽

Cited By ~ 16

Author(s):

Florian Landis ◽

Thomas Ott ◽

Ruedi Stoop

Keyword(s):

Data Clustering ◽

Time Complexity ◽

Arbitrary Shape ◽

Clustering Algorithm ◽

Hebbian Learning ◽

Spiking Neurons ◽

Integrate And Fire ◽

Background Data ◽

Homogeneous Regions ◽

Noisy Background

We propose a Hebbian learning-based data clustering algorithm using spiking neurons. The algorithm is capable of distinguishing between clusters and noisy background data and finds an arbitrary number of clusters of arbitrary shape. These properties render the approach particularly useful for visual scene segmentation into arbitrarily shaped homogeneous regions. We present several application examples, and in order to highlight the advantages and the weaknesses of our method, we systematically compare the results with those from standard methods such as the k-means and Ward's linkage clustering. The analysis demonstrates that not only the clustering ability of the proposed algorithm is more powerful than those of the two concurrent methods, the time complexity of the method is also more modest than that of its generally used strongest competitor.

Download Full-text

Algorithm to forming a rule base for a fuzzy classifier designed on the basis of the K-means clustering algorithm and the whale optimization algorithm

10.21293/1818-0442-2021-24-1-42-47 ◽

2021 ◽

Vol 24 (1) ◽

pp. 42-47

Author(s):

N. P. Koryshev ◽

◽

I. A. Hodashinsky ◽

Keyword(s):

Data Clustering ◽

Clustering Algorithm ◽

Performance Testing ◽

Real Data ◽

Rule Base ◽

Data Sets ◽

Fuzzy Classifier ◽

Whale Optimization ◽

Clustering Quality ◽

Using Data

The article presents a description of the algorithm for generating fuzzy rules for a fuzzy classifier using data clustering, metaheuristic, and the clustering quality index, as well as the results of performance testing on real data sets.

Download Full-text

Density-based clustering with constraints

Computer Science and Information Systems ◽

10.2298/csis180601007l ◽

2019 ◽

Vol 16 (2) ◽

pp. 469-489 ◽

Cited By ~ 1

Author(s):

Piotr Lasek ◽

Jarek Gryz

Keyword(s):

Data Clustering ◽

Clustering Algorithms ◽

Background Knowledge ◽

Data Sets ◽

Benchmark Data ◽

Density Based Clustering

In this paper we present our ic-NBC and ic-DBSCAN algorithms for data clustering with constraints. The algorithms are based on density-based clustering algorithms NBC and DBSCAN but allow users to incorporate background knowledge into the process of clustering by means of instance constraints. The knowledge about anticipated groups can be applied by specifying the so-called must-link and cannot-link relationships between objects or points. These relationships are then incorporated into the clustering process. In the proposed algorithms this is achieved by properly merging resulting clusters and introducing a new notion of deferred points which are temporarily excluded from clustering and assigned to clusters based on their involvement in cannot-link relationships. To examine the algorithms, we have carried out a number of experiments. We used benchmark data sets and tested the efficiency and quality of the results. We have also measured the efficiency of the algorithms against their original versions. The experiments prove that the introduction of instance constraints improves the quality of both algorithms. The efficiency is only insignificantly reduced and is due to extra computation related to the introduced constraints.

Download Full-text

An Ordinal Data Clustering Algorithm with Automated Distance Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6168 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6869-6876

Author(s):

Yiqun Zhang ◽

Yiu-ming Cheung

Keyword(s):

Machine Learning ◽

Data Mining ◽

Categorical Data ◽

Data Clustering ◽

Time Complexity ◽

Ordinal Data ◽

Clustering Algorithm ◽

Major Type ◽

Common Task ◽

Consecutive Integers

Clustering ordinal data is a common task in data mining and machine learning fields. As a major type of categorical data, ordinal data is composed of attributes with naturally ordered possible values (also called categories interchangeably in this paper). However, due to the lack of dedicated distance metric, ordinal categories are usually treated as nominal ones, or coded as consecutive integers and treated as numerical ones. Both these two common ways will roughly define the distances between ordinal categories because the former way ignores the order relationship and the latter way simply assigns identical distances to different pairs of adjacent categories that may have intrinsically unequal distances. As a result, they may produce unsatisfactory ordinal data clustering results. This paper, therefore, proposes a novel ordinal data clustering algorithm, which iteratively learns: 1) The partition of ordinal dataset, and 2) the inter-category distances. To the best of our knowledge, this is the first attempt to dynamically adjust inter-category distances during the clustering process to search for a better partition of ordinal data. The proposed algorithm features superior clustering accuracy, low time complexity, fast convergence, and is parameter-free. Extensive experiments show its efficacy.

Download Full-text

A FAST k-MEANS IMPLEMENTATION USING CORESETS

International Journal of Computational Geometry & Applications ◽

10.1142/s0218195908002787 ◽

2008 ◽

Vol 18 (06) ◽

pp. 605-625 ◽

Cited By ~ 17

Author(s):

GEREON FRAHLING ◽

CHRISTIAN SOHLER

Keyword(s):

Clustering Algorithm ◽

Data Sets ◽

Starting Solution ◽

Silhouette Coefficient ◽

3D Data ◽

Speed Up ◽

Previous Solution ◽

Point Set ◽

Set Of Points

In this paper we develop an efficient implementation for a k-means clustering algorithm. The algorithm is based on a combination of Lloyd's algorithm with random swapping of centers to avoid local minima. This approach was proposed by Mount 30. The novel feature of our algorithms is the use of coresets to speed up the algorithm. A coreset is a small weighted set of points that approximates the original point set with respect to the considered problem. We use a coreset construction described in 12. Our algorithm first computes a solution on a very small coreset. Then in each iteration the previous solution is used as a starting solution on a refined, i.e. larger, coreset. To evaluate the performance of our algorithm we compare it with algorithm KMHybrid 30 on typical 3D data sets for an image compression application and on artificially created instances. Our data sets consist of 300,000 to 4.9 million points. Our algorithm outperforms KMHybrid on most of these input instances. Additionally, the quality of the solutions computed by our algorithm deviates significantly less than that of KMHybrid. We conclude that the use of coresets has two effects. First, it can speed up algorithms significantly. Secondly, in variants of Lloyd's algorithm, it reduces the dependency on the starting solution and thus makes the algorithm more stable. Finally, we propose the use of coresets as a heuristic to approximate the average silhouette coefficient of clusterings. The average silhouette coefficient is a measure for the quality of a clustering that is independent of the number of clusters k. Hence, it can be used to compare the quality of clusterings for different sizes of k. To show the applicability of our approach we computed clusterings and approximate average silhouette coefficient for k = 1,…, 100 for our input instances and discuss the performance of our algorithm in detail.

Download Full-text

Improving the Efficiency of Information Flow Routing in Wireless Self-Organizing Networks Based on Natural Computing

Energies ◽

10.3390/en14082255 ◽

2021 ◽

Vol 14 (8) ◽

pp. 2255

Author(s):

Krzysztof Przystupa ◽

Julia Pyrih ◽

Mykola Beshley ◽

Mykhailo Klymash ◽

Andriy Branytskyy ◽

...

Keyword(s):

Simulated Annealing ◽

Clustering Algorithm ◽

Random Search ◽

Signal Propagation ◽

Propagation Model ◽

Route Search ◽

Annealing Algorithms ◽

Self Organizing Networks ◽

Self Organizing

With the constant growth of requirements to the quality of infocommunication services, special attention is paid to the management of information transfer in wireless self-organizing networks. The clustering algorithm based on the Motley signal propagation model has been improved, resulting in cluster formation based on the criterion of shortest distance and maximum signal power value. It is shown that the use of the improved clustering algorithm compared to its classical version is more efficient for the route search process. Ant and simulated annealing algorithms are presented to perform route search in a wireless sensor network based on the value of the quality of service parameter. A comprehensive routing method based on finding the global extremum of an ordered random search with node addition/removal is proposed by using the presented ant and simulated annealing algorithms. It is shown that the integration of the proposed clustering and routing solutions can reduce the route search duration up to two times.

Download Full-text