Pattern Recognition in Numerical Data Sets and Color Images through the Typicality Based on the GKPFCM Clustering Algorithm

We take the concept of typicality from the field of cognitive psychology and apply it to the interpretation of numerical data sets and color images through fuzzy clustering algorithms, particularly the GKPFCM, in order to extract better information from the processed data. The Gustafson-Kessel Possibilistic Fuzzy c-means (GKPFCM) is a hybrid algorithm based on a relative typicality (the membership degree, from Fuzzy c-means) and an absolute typicality (the typicality value, from Possibilistic c-means). Using both typicalities makes it possible to learn from and analyze the data, as well as to relate the results to the theory of prototypes. To demonstrate these results we use a synthetic data set and a digitized image of a glass in a first example, and images from the Berkeley database in a second example. The results demonstrate the advantages of the information obtained about numerical data sets when the different meanings of the two typicalities are taken into account and both values are available from the clustering algorithm used. This approach allows the identification of small homogeneous regions, which are difficult to find.
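The difference between the two typicalities can be illustrated with a minimal sketch. It assumes fixed cluster centers, plain squared Euclidean distance (not the Gustafson-Kessel covariance-adapted distance the GKPFCM uses), and the standard Fuzzy c-means and Possibilistic c-means update formulas. The function names, the `gammas` scale parameters, and the sample values are hypothetical and chosen only for illustration.

```python
def sq_dist(x, c):
    """Squared Euclidean distance between point x and center c."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def fcm_memberships(x, centers, m=2.0):
    """Relative typicality: FCM membership degrees, which sum to 1 over clusters."""
    d2 = [sq_dist(x, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in d2):
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    exponent = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** exponent for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def pcm_typicalities(x, centers, gammas, eta=2.0):
    """Absolute typicality: PCM typicality values, each computed per cluster."""
    d2 = [sq_dist(x, c) for c in centers]
    exponent = 1.0 / (eta - 1.0)
    return [1.0 / (1.0 + (d / g) ** exponent) for d, g in zip(d2, gammas)]


# Hypothetical centers and scale parameters, for illustration only.
centers = [(0.0, 0.0), (10.0, 0.0)]
gammas = [4.0, 4.0]
for point in [(0.0, 0.0), (5.0, 0.0), (5.0, 100.0)]:
    print(point, fcm_memberships(point, centers), pcm_typicalities(point, centers, gammas))
```

In this sketch the midpoint (5, 0) and the distant outlier (5, 100) receive the same memberships, 0.5 for each cluster, because memberships are relative. Their typicality values differ sharply (about 0.138 versus about 0.0004), which is the kind of extra information the abstract attributes to having both values available.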


A SELF-ORGANIZING MAP FOR MIXED CONTINUOUS AND CATEGORICAL DATA

International Journal of Computing ◽

10.47839/ijc.10.1.733 ◽

2011 ◽

pp. 24-32 ◽

Cited By ~ 1

Author(s):

Nicoleta Rogovschi ◽

Mustapha Lebbah ◽

Younès Bennani

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Mixed Data ◽

Categorical Variables ◽

Data Sets ◽

Self Organizing Map ◽

Data Set ◽

Public Data ◽

Self Organizing

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in the data mining field. In this paper we introduce a weighted self-organizing map for the clustering, analysis and visualization of mixed (continuous/binary) data. The weights and prototypes are learned simultaneously, which ensures an optimized clustering of the data. The higher the weight of a variable, the more the clustering algorithm takes into account the information carried by that variable. The learning of these topological maps is combined with a weighting process for the different variables, which computes weights that influence the quality of the clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set and three other mixed data sets. The results show good topological ordering and homogeneous clustering.
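How variable weights can steer a self-organizing map is easiest to see in the best-matching-unit search. The sketch below assumes a weighted squared Euclidean distance and fixed, hand-picked weights. The abstract does not give the rule for learning the weights, so none is shown here, and all names and values are hypothetical.

```python
def weighted_sq_dist(x, prototype, weights):
    """Squared distance in which each variable is scaled by its weight."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, prototype))


def best_matching_unit(x, prototypes, weights):
    """Index of the prototype closest to x under the weighted distance."""
    dists = [weighted_sq_dist(x, p, weights) for p in prototypes]
    return min(range(len(prototypes)), key=dists.__getitem__)


# Two hypothetical map units and one sample with two variables.
prototypes = [[1.0, 1.0], [0.0, 0.0]]
sample = [0.9, 0.0]

print(best_matching_unit(sample, prototypes, [1.0, 1.0]))  # equal weights
print(best_matching_unit(sample, prototypes, [1.9, 0.1]))  # first variable dominates
```

With equal weights the sample is assigned to the second unit. Once the first variable carries most of the weight, the same sample is assigned to the first unit, so a highly weighted variable has a larger say in the clustering.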


Data Fusion Using a Multi-Sensor Sparse-Based Clustering Algorithm

Remote Sensing ◽

10.3390/rs12234007 ◽

2020 ◽

Vol 12 (23) ◽

pp. 4007

Author(s):

Kasra Rafiezadeh Shahi ◽

Pedram Ghamisi ◽

Behnood Rasti ◽

Robert Jackisch ◽

Paul Scheunders ◽

...

Keyword(s):

Clustering Algorithm ◽

Spatial Information ◽

Clustering Algorithms ◽

Hyperspectral Data ◽

Sensor Data ◽

Data Sets ◽

Data Types ◽

Data Set ◽

Multiple Data Sets ◽

Imaging Sensors

The increasing amount of information acquired by imaging sensors in Earth Sciences results in the availability of a multitude of complementary data (e.g., spectral, spatial, elevation) for monitoring of the Earth’s surface. Many studies were devoted to investigating the usage of multi-sensor data sets in the performance of supervised learning-based approaches at various tasks (i.e., classification and regression) while unsupervised learning-based approaches have received less attention. In this paper, we propose a new approach to fuse multiple data sets from imaging sensors using a multi-sensor sparse-based clustering algorithm (Multi-SSC). A technique for the extraction of spatial features (i.e., morphological profiles (MPs) and invariant attribute profiles (IAPs)) is applied to high spatial-resolution data to derive the spatial and contextual information. This information is then fused with spectrally rich data such as multi- or hyperspectral data. In order to fuse multi-sensor data sets a hierarchical sparse subspace clustering approach is employed. More specifically, a lasso-based binary algorithm is used to fuse the spectral and spatial information prior to automatic clustering. The proposed framework ensures that the generated clustering map is smooth and preserves the spatial structures of the scene. In order to evaluate the generalization capability of the proposed approach, we investigate its performance not only on diverse scenes but also on different sensors and data types. The first two data sets are geological data sets, which consist of hyperspectral and RGB data. The third data set is the well-known benchmark Trento data set, including hyperspectral and LiDAR data. Experimental results indicate that this novel multi-sensor clustering algorithm can provide an accurate clustering map compared to the state-of-the-art sparse subspace-based clustering algorithms.


A Dynamic Genetic Algorithm for Clustering Problems

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.411-414.1884 ◽

2013 ◽

Vol 411-414 ◽

pp. 1884-1893

Author(s):

Yong Chun Cao ◽

Ya Bin Shao ◽

Shuang Liang Tian ◽

Zheng Qi Cai

Keyword(s):

Genetic Algorithm ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Real Life ◽

Search Space ◽

Adaptive Mutation ◽

Data Sets ◽

Data Set ◽

Local Optima ◽

Clustering Problems

Because many clustering algorithms based on genetic algorithms (GAs) suffer from degeneracy and easily fall into local optima, a novel dynamic genetic algorithm for clustering problems (DGA) is proposed. The algorithm adopts variable-length coding to represent individuals and performs the crossover operation in parallel within subpopulations of individuals of the same length, which allows the DGA to explore the search space more effectively and to obtain automatically the proper number of clusters and the proper partition of a given data set. The algorithm also uses a dynamic crossover probability and an adaptive mutation probability, which prevent it from getting stuck at a locally optimal solution. The clustering results of experiments on three artificial data sets and two real-life data sets show that the DGA achieves better performance and higher accuracy on clustering problems.


SPSM: A NEW HYBRID DATA CLUSTERING ALGORITHM FOR NONLINEAR DATA ANALYSIS

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001409007685 ◽

2009 ◽

Vol 23 (08) ◽

pp. 1701-1737 ◽

Cited By ~ 3

Author(s):

UREERAT WATTANACHON ◽

CHIDCHANOK LURSINSAP

Keyword(s):

Clustering Algorithm ◽

Color Image ◽

Clustering Algorithms ◽

Noisy Data ◽

Second Phase ◽

Data Sets ◽

Data Set ◽

Cluster Distance ◽

Data Points ◽

Hybrid Data

Existing clustering algorithms, such as single-link clustering, k-means, CURE, and CSM, are designed to find clusters based on predefined parameters specified by users. These algorithms may be unsuccessful if the choice of parameters is inappropriate for the data set being clustered. Most of these algorithms work very well for compact and hyper-spherical clusters. In this paper, a new hybrid clustering algorithm called Self-Partition and Self-Merging (SPSM) is proposed. The SPSM algorithm partitions the input data set into several subclusters in the first phase and then removes the noisy data in the second phase. In the third phase, the normal subclusters are repeatedly merged to form larger clusters based on inter-cluster and intra-cluster distance criteria. The experimental results show that the SPSM algorithm handles noisy data sets efficiently and clusters data sets of arbitrary shape and differing density. Several color image examples show the versatility of the proposed method and are compared with results reported in the literature for the same images. The computational complexity of the SPSM algorithm is O(N^2), where N is the number of data points.


GRAPH BASED CLUSTERING WITH CONSTRAINTS AND ACTIVE LEARNING

Journal of Computer Science and Cybernetics ◽

10.15625/1813-9663/37/1/15773 ◽

2021 ◽

Vol 37 (1) ◽

pp. 71-89

Author(s):

Vu-Tuan Dang ◽

Viet-Vu Vu ◽

Hong-Quan Do ◽

Thi Kieu Oanh Le

Keyword(s):

Active Learning ◽

Clustering Algorithm ◽

Side Information ◽

Clustering Algorithms ◽

Real Data ◽

Data Sets ◽

Data Set ◽

Supervised Clustering ◽

Class Labels ◽

Graph Based Clustering

During the past few years, semi-supervised clustering has emerged as an interesting new direction in machine learning research. In a semi-supervised clustering algorithm, the clustering results can be significantly improved by using side information, which is either already available or collected from users. There are two main kinds of side information that can be used in semi-supervised clustering algorithms: class labels, called seeds, and pairwise constraints. The first semi-supervised clustering algorithm was introduced in 2000, and since then many algorithms have been presented in the literature. However, it is not easy to use both types of side information in the same algorithm. To address this problem, this paper proposes a semi-supervised graph-based clustering algorithm, called MCSSGC, that tries to use both seeds and constraints in the clustering process. Moreover, we introduce a simple but efficient active learning method, named KMMFFQS, to collect constraints that can boost the performance of MCSSGC. To verify the effectiveness of the proposed algorithm, we conducted a series of experiments not only on real data sets from UCI, but also on a document data set used for information extraction from Vietnamese documents. The results show that the proposed algorithm can significantly improve the clustering process compared to some recent algorithms.


Rough ISODATA Algorithm

International Journal of Fuzzy System Applications ◽

10.4018/ijfsa.2013100101 ◽

2013 ◽

Vol 3 (4) ◽

pp. 1-14 ◽

Cited By ~ 2

Author(s):

S. Sampath ◽

B. Ramya

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Real Life ◽

Vital Role ◽

Data Sets ◽

Clustering Method ◽

Data Set ◽

Number Of Clusters ◽

Real Life Data ◽

Nonparametric Statistical

Cluster analysis is a branch of data mining that plays a vital role in bringing out hidden information in databases. Clustering algorithms help medical researchers identify the presence of natural subgroups in a data set. Different types of clustering algorithms are available in the literature, the most popular being k-means clustering. Although k-means is a popular and widely used clustering method, its application requires knowledge of the number of clusters present in the given data set. Several solutions are available in the literature to overcome this limitation. The k-means clustering method creates a disjoint and exhaustive partition of the data set. However, in some situations one can come across objects that belong to more than one cluster. In this paper, a clustering algorithm is proposed that produces rough clusters automatically, without requiring the user to supply the number of clusters as input. The efficiency of the algorithm in detecting the number of clusters present in the data set has been studied with the help of some real-life data sets. Further, a nonparametric statistical analysis of the results of the experimental study has been carried out, in order to analyze the efficiency of the proposed algorithm in automatically detecting the number of clusters in the data set with the help of a rough version of the Davies-Bouldin index.
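For orientation, the standard (crisp) Davies-Bouldin index can be sketched as follows. This is not the rough version used in the paper, whose definition the abstract does not give. The sketch assumes Euclidean distance, at least two clusters, and distinct centroids, and the sample partitions are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points, returned as a tuple."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))


def davies_bouldin(clusters):
    """Crisp Davies-Bouldin index; lower values mean compact, well-separated clusters."""
    cents = [centroid(pts) for pts in clusters]
    # Scatter: mean distance of each cluster's points to its own centroid.
    scatter = [
        sum(math.dist(p, c) for p in pts) / len(pts)
        for pts, c in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two hypothetical partitions of the same four points.
tight = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
loose = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
print(davies_bouldin(tight), davies_bouldin(loose))
```

The partition that groups nearby points gets the lower index. Comparing the index across candidate numbers of clusters is one common way to choose that number, which matches the role the abstract assigns to its rough variant.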


AN EFFICIENT FUZZY C-MEANS CLUSTERING ALGORITHM FOR MULTI-VALUED DATA SETS

INFORMATION TECHNOLOGY IN INDUSTRY ◽

10.17762/itii.v9i1.265 ◽

2021 ◽

Vol 9 (1) ◽

pp. 1250-1264

Author(s):

P Gopala Krishna, D Lalitha Bhaskari

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Data Sets ◽

Similar Data ◽

Job Descriptions ◽

Fuzzy C Means ◽

Fuzzy C Means Clustering ◽

Ship Function ◽

Partition Clustering ◽

Diverse Data

In data analysis, items are mostly described by a set of characteristics called features, where each feature holds only a single value for each object. In practice, however, some features may hold more than one value, such as a person with several job descriptions, activities, phone numbers, skills and mailing addresses. Such features may be called multi-valued features, and they are mostly treated as null features when data are analyzed with machine learning and data mining techniques. In this paper, a proximity function between two objects with multi-valued features is proposed and applied to clustering. The suggested distance approach allows iterative measurement of the similarities among objects as well as among their characteristics. To accommodate the most suitable multi-valued features, we put forward a model for determining each feature's relative importance in diverse data mining problems. The proposed algorithm is a partition clustering strategy that uses fuzzy c-means clustering with a novel membership function built on the proposed similarity measure. The proposed clustering algorithm is called fuzzy c-means based Clustering of Multivalued Attribute Data (FCM-MVA). It therefore becomes feasible to use any cluster analysis mechanism to group similar data. The findings demonstrate that the proposed measure not only improves on the performance of the traditional similarity measure but also outperforms other clustering algorithms on the multi-valued clustering framework.


Comparative study on textual data set using fuzzy clustering algorithms

Kybernetes ◽

10.1108/k-11-2015-0301 ◽

2016 ◽

Vol 45 (8) ◽

pp. 1232-1242 ◽

Cited By ~ 3

Author(s):

Rjiba Sadika ◽

Moez Soltani ◽

Saloua Benammou

Keyword(s):

Comparative Study ◽

Clustering Algorithms ◽

Fuzzy Model ◽

Data Sets ◽

Data Set ◽

Content Type ◽

Fuzzy C Means ◽

Textual Data ◽

Tunisian Revolution ◽

Fuzzy C Means Algorithm

Purpose: The purpose of this paper is to apply the Takagi-Sugeno (T-S) fuzzy model techniques in order to treat and classify textual data sets with and without noise. A comparative study is done in order to select the most accurate T-S algorithm in the textual data sets.

Design/methodology/approach: From a survey about what has been termed the “Tunisian Revolution,” the authors collect a textual data set from a questionnaire targeted at students. Five clustering algorithms are mainly applied: the Gath-Geva (G-G) algorithm, the modified G-G algorithm, the fuzzy c-means algorithm and the kernel fuzzy c-means algorithm. The authors examine the performances of the four clustering algorithms and select the most reliable one to cluster textual data.

Findings: The proposed methodology was to cluster textual data based on the T-S fuzzy model. On one hand, the results obtained using the T-S models are in the form of numerical relationships between selected keywords and the rest of words constituting a text. Consequently, it allows the authors to interpret these results not only qualitatively but also quantitatively. On the other hand, the proposed method is applied for clustering text taking into account the noise.

Originality/value: The originality comes from the fact that the authors validate some economical results based on textual data, even if they have not been written by experts in the linguistic fields. In addition, the results obtained in this study are easy and simple to interpret by the analysts.


Determination of the Number of Clusters in a Data Set

Management Theories and Strategic Practices for Decision Making ◽

10.4018/978-1-4666-2473-3.ch004 ◽

2012 ◽

pp. 59-73

Author(s):

Derrick S. Boone

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Monte Carlo Study ◽

Stopping Rules ◽

Data Sets ◽

Data Set ◽

Number Of Clusters ◽

True Number ◽

Clustering Criterion

The accuracy of “stopping rules” for determining the number of clusters in a data set is examined as a function of the underlying clustering algorithm being used. Using a Monte Carlo study, various stopping rules, used in conjunction with six clustering algorithms, are compared to determine which rule/algorithm combinations best recover the true number of clusters. The rules and algorithms are tested using disparately sized, artificially generated data sets that contained multiple numbers and levels of clusters, variables, noise, outliers, and elongated and unequally sized clusters. The results indicate that stopping rule accuracy depends on the underlying clustering algorithm being used. The cubic clustering criterion (CCC), when used in conjunction with mixture models or Ward’s method, recovers the true number of clusters more accurately than other rules and algorithms. However, the CCC was more likely than other stopping rules to report more clusters than are actually present. Implications are discussed.


Fuzzy C-Means Clustering Algorithm Based on Coefficient of Variation

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.998-999.873 ◽

2014 ◽

Vol 998-999 ◽

pp. 873-877

Author(s):

Zhen Bo Wang ◽

Bao Zhi Qiu

Keyword(s):

Coefficient Of Variation ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Real Data ◽

Cluster Center ◽

Data Set ◽

Fuzzy C Means ◽

Initial Cluster ◽

Fuzzy C Means Clustering ◽

The Impact

To reduce the impact of irrelevant attributes on clustering results and to increase the importance of relevant attributes, this paper proposes a fuzzy C-means clustering algorithm based on the coefficient of variation (CV-FCM). In the algorithm, the coefficient of variation is used to assign a different weight to each attribute in the data set, and the magnitude of the weight expresses the importance of that attribute to the clusters. In addition, because the fuzzy C-means clustering algorithm is sensitive to the initial cluster centers, a method for selecting the initial cluster centers based on maximum distance is introduced, built on the weighted coefficient of variation. Experiments on real data sets show that this algorithm selects cluster centers effectively and produces clustering results superior to those of general fuzzy C-means clustering algorithms.
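The two ingredients named in the abstract can be sketched under explicit assumptions. The abstract gives no formulas, so the sketch assumes that each attribute's weight is its coefficient of variation (population standard deviation divided by the absolute mean) normalized to sum to 1, and that maximum-distance initialization means a farthest-first selection under the weighted distance, starting from the first data point. Both choices, and all names, are hypothetical.

```python
import statistics


def cv_weights(data):
    """Attribute weights proportional to each column's coefficient of variation.

    Assumes every column has a nonzero mean and that at least one column varies.
    """
    columns = list(zip(*data))
    cvs = [statistics.pstdev(col) / abs(statistics.mean(col)) for col in columns]
    total = sum(cvs)
    return [cv / total for cv in cvs]


def max_distance_init(data, weights, k):
    """Farthest-first choice of k initial centers under the weighted distance."""

    def wdist(p, q):
        return sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q))

    centers = [data[0]]  # arbitrary starting point (an assumption of this sketch)
    while len(centers) < k:
        # Pick the point whose nearest chosen center is farthest away.
        centers.append(max(data, key=lambda p: min(wdist(p, c) for c in centers)))
    return centers


# Hypothetical data: the first attribute varies, the second is constant.
data = [(1.0, 10.0), (3.0, 10.0), (1.0, 10.0), (3.0, 10.0)]
weights = cv_weights(data)
print(weights)
print(max_distance_init(data, weights, 2))
```

In this example the constant second attribute receives zero weight, so it plays no part in the distance, and the two initial centers are taken from the two distinct values of the first attribute.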
