Incremental kernel fuzzy c-means with optimizing cluster center initialization and delivery

Purpose: The sheer volume of big data makes traditional clustering algorithms, which are usually designed to process an entire data set at once, impractical. This paper focuses on incremental clustering, which divides the data into a series of data chunks so that only a small amount of data needs to be clustered at a time. Little research on incremental clustering addresses the problems of optimizing cluster center initialization for each data chunk and selecting multiple passing points for each cluster. Design/methodology/approach: Optimizing the initial cluster centers improves the quality of the clustering results for each data chunk and thereby the quality of the final clustering results. Selecting multiple passing points passes more accurate information from one chunk to the next, which further improves the final clustering results. A method addressing both problems is proposed and applied in an algorithm based on the streaming kernel fuzzy c-means (stKFCM) algorithm. Findings: Experimental results show that the proposed algorithm is more accurate and performs better than the stKFCM algorithm. Originality/value: This paper addresses the problem of improving the performance of incremental clustering by optimizing cluster center initialization and selecting multiple passing points. The paper analyzes the performance of the proposed scheme and demonstrates its effectiveness.
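The chunk-by-chunk idea can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: it works in the input space rather than a kernel space, passes a single weighted point (the cluster center) per cluster instead of multiple passing points, and takes the initial centers as given rather than optimizing them. The function names `weighted_fcm` and `incremental_fcm` and the rule for the carried weight are assumptions made for illustration.

```python
import numpy as np

def weighted_fcm(X, w, V, m=2.0, iters=50):
    """Weighted fuzzy c-means: point k contributes with weight w[k]."""
    for _ in range(iters):
        # squared distances, shape (c, n); epsilon avoids division by zero
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2) + 1e-12
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)   # memberships, columns sum to 1
        Wm = (U ** m) * w                          # weighted fuzzy memberships
        V = (Wm @ X) / Wm.sum(axis=1, keepdims=True)
    # weight carried forward with each center: total weighted membership (assumption)
    return V, (U * w).sum(axis=1)

def incremental_fcm(chunks, initial_centers, m=2.0):
    """Cluster chunk by chunk, passing weighted centers to the next chunk."""
    V = np.asarray(initial_centers, dtype=float)
    carried = None
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        if carried is None:
            X, w = chunk, np.ones(len(chunk))
        else:
            # previous centers join the new chunk as weighted summary points
            X = np.vstack([V, chunk])
            w = np.concatenate([carried, np.ones(len(chunk))])
        V, carried = weighted_fcm(X, w, V, m)
    return V
```

Only one chunk plus a handful of summary points is held in memory at any time, which is what makes the incremental scheme practical for large data sets.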

Fuzzy C-Means Clustering Algorithm Based on Coefficient of Variation

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.998-999.873 ◽ 2014 ◽ Vol 998-999 ◽ pp. 873-877

Author(s): Zhen Bo Wang ◽ Bao Zhi Qiu

Keyword(s): Coefficient Of Variation ◽ Clustering Algorithm ◽ Clustering Algorithms ◽ Real Data ◽ Cluster Center ◽ Data Set ◽ Fuzzy C Means ◽ Initial Cluster ◽ Fuzzy C Means Clustering ◽ The Impact

To reduce the impact of irrelevant attributes on clustering results and to increase the influence of relevant attributes, this paper proposes a fuzzy C-means clustering algorithm based on the coefficient of variation (CV-FCM). The algorithm uses the coefficient of variation to assign a different weight to each attribute in the data set, and the magnitude of the weight expresses the importance of that attribute to the clusters. In addition, because fuzzy C-means clustering is sensitive to the initial cluster centers, a method for selecting the initial cluster centers based on maximum distance is introduced on top of the coefficient-of-variation weighting. Experiments on real data sets show that the algorithm selects cluster centers effectively and gives clustering results superior to those of general fuzzy C-means clustering algorithms.
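A minimal sketch of the two ingredients described above, under stated assumptions: the abstract does not give the exact formulas, so the weight normalization, the zero-mean guard, the choice of the first center and the function names `cv_weights` and `max_distance_init` are all illustrative rather than the authors' definitions.

```python
import numpy as np

def cv_weights(X):
    """Attribute weights from the coefficient of variation (std / |mean|),
    normalized to sum to 1 (the normalization is an assumption)."""
    mean = np.abs(X.mean(axis=0))
    std = X.std(axis=0)
    cv = std / np.where(mean == 0, 1.0, mean)  # guard against zero mean (assumption)
    return cv / cv.sum()

def max_distance_init(X, c, w):
    """Pick c initial centers using a weighted squared Euclidean distance:
    start from the point farthest from the data mean, then repeatedly add
    the point whose nearest already-chosen center is farthest away."""
    def wdist2(A, b):
        return ((A - b) ** 2 * w).sum(axis=1)

    first = np.argmax(wdist2(X, X.mean(axis=0)))
    centers = [X[first]]
    min_d = wdist2(X, X[first])
    for _ in range(c - 1):
        idx = np.argmax(min_d)
        centers.append(X[idx])
        min_d = np.minimum(min_d, wdist2(X, X[idx]))
    return np.array(centers)
```

Calling `max_distance_init(X, c, cv_weights(X))` yields centers that are spread out in the weighted space and can be handed to any fuzzy C-means routine as its starting point.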

A Self-Organizing Map for Mixed Continuous and Categorical Data

International Journal of Computing ◽ 10.47839/ijc.10.1.733 ◽ 2011 ◽ pp. 24-32 ◽ Cited By ~ 1

Author(s): Nicoleta Rogovschi ◽ Mustapha Lebbah ◽ Younès Bennani

Keyword(s): Clustering Algorithm ◽ Clustering Algorithms ◽ Mixed Data ◽ Categorical Variables ◽ Data Sets ◽ Self Organizing Map ◽ Data Set ◽ Public Data ◽ Self Organizing

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in data mining. In this paper we introduce a weighted self-organizing map for the clustering, analysis and visualization of mixed data (continuous/binary). The weights and prototypes are learned simultaneously, ensuring an optimized clustering of the data. The higher the weight of a variable, the more the clustering algorithm takes into account the information carried by that variable. The learning of these topological maps is combined with a weighting process that computes weights for the different variables, which influence the quality of the clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set and three other mixed data sets. The results show good topological ordering and homogeneous clustering.

Unleashing analytics to reduce electricity consumption using incremental clustering algorithm

International Journal of Energy Sector Management ◽ 10.1108/ijesm-11-2019-0016 ◽ 2021 ◽ Vol ahead-of-print (ahead-of-print)

Author(s): Archana Yashodip Chaudhari ◽ Preeti Mulay

Keyword(s): Real Time ◽ Clustering Algorithm ◽ Electricity Consumption ◽ Incremental Clustering ◽ Science Data ◽ Load Curve ◽ Data Set ◽ Content Type ◽ Validity Indices ◽ Reduce Electricity Consumption

Purpose: To reduce electricity consumption in homes, a first step is to make users aware of it. Reading a meter once a month is not enough; real-time meter reading is required. A smart electricity meter (SEM) can provide quick and exact meter readings in real time at regular intervals. SEMs generate a considerable amount of household electricity consumption data incrementally. Such data contain embedded load patterns and hidden information from which consumer behavior can be extracted and learned. The load patterns extracted by data clustering should be updated because consumer behavior may change over time. The purpose of this study is to update the clustering results for new data based on the old data rather than re-clustering all of the data from scratch. Design/methodology/approach: This paper proposes an incremental clustering with nearness factor (ICNF) algorithm that updates load patterns without re-clustering all daily load curves. Findings: Extensive experiments are conducted on real-world SEM data from the Irish Social Science Data Archive (Ireland) data set. The results are evaluated with both accuracy measures and clustering validity indices, which indicate that the proposed method is useful for exploiting the enormous amount of smart meter data to understand customers’ electricity consumption behavior. Originality/value: ICNF can provide an efficient response for electricity consumption pattern analysis to end consumers via SEMs.

Extended Single-Iteration Fuzzy C-Means, and Gustafson-Kessel Algorithms for Medium-Sized (10^6) Multisource Weber Problem

International Journal of Operations Research and Information Systems ◽ 10.4018/ijoris.2019070101 ◽ 2019 ◽ Vol 10 (3) ◽ pp. 1-15 ◽ Cited By ~ 1

Author(s): Tarik Kucukdeniz ◽ Sakir Esnaf ◽ Engin Bayturk

Keyword(s): Fuzzy Clustering ◽ Clustering Algorithm ◽ Clustering Algorithms ◽ Cluster Center ◽ Medium Size ◽ Weber Problem ◽ Solution Approach ◽ Fuzzy C Means ◽ Novel Approach ◽ New Facilities

An uncapacitated multisource Weber problem involves finding facility locations for known customers. When this problem is restated as finding locations for additional new facilities while keeping the current facilities, a new solution approach is needed. In this study, two new and cooperative fuzzy clustering algorithms are developed to solve a variant of the uncapacitated version of a multisource Weber problem (MWP). The first algorithm proposed is an extended version of the single-iteration fuzzy c-means (SIFCM) algorithm. The SIFCM algorithm assigns customers to existing facilities. The new extended SIFCM (ESIFCM), which is first proposed in this study, allocates discrete locations (coordinates) with the SIFCM and simultaneously locates and allocates continuous locations (coordinates) with the original FCM. As long as the differences between successive cluster center values are still decreasing, the SIFCM and the FCM share customer points among facilities. The approach can be summarized as single-iteration fuzzy c-means combined with fuzzy c-means. The second algorithm, also proposed here, runs like the ESIFCM but uses a Gustafson-Kessel (GK) fuzzy clustering algorithm instead of the FCM under the same framework. This algorithm is based on the single-iteration GK (SIGK) and the GK algorithms. Numerical results are reported for two MWP problems in the medium-size data class (10^6 bytes). Using clustering algorithms to locate and allocate new facilities while keeping current facilities is a novel approach. When applied to large problems, the speed of the proposed algorithms makes it possible to find a solution where a mathematical programming solution is not feasible because of its high computational cost.

Study of Combined Fuzzy Clustering Algorithm Based on F-Statistics Hierarchy Clustering

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.29-32.802 ◽ 2010 ◽ Vol 29-32 ◽ pp. 802-808

Author(s): Min Min

Keyword(s): Fuzzy Clustering ◽ Clustering Algorithm ◽ Clustering Algorithms ◽ Cluster Center ◽ Evaluation Data ◽ Fuzzy Clustering Algorithm ◽ Initial Cluster ◽ The Common ◽ Common Problems ◽ F Statistics

After analyzing the common problems of fuzzy clustering algorithms, we propose a combined fuzzy clustering algorithm that automatically generates a reasonable number of clusters and the initial cluster centers. The algorithm has been tested on real evaluation data of teaching designs. The results show that the combined fuzzy clustering based on the F-statistic is more effective.

Comparative study on textual data set using fuzzy clustering algorithms

Kybernetes ◽ 10.1108/k-11-2015-0301 ◽ 2016 ◽ Vol 45 (8) ◽ pp. 1232-1242 ◽ Cited By ~ 3

Author(s): Rjiba Sadika ◽ Moez Soltani ◽ Saloua Benammou

Keyword(s): Comparative Study ◽ Clustering Algorithms ◽ Fuzzy Model ◽ Data Sets ◽ Data Set ◽ Content Type ◽ Fuzzy C Means ◽ Textual Data ◽ Tunisian Revolution ◽ Fuzzy C Means Algorithm

Purpose: The purpose of this paper is to apply Takagi-Sugeno (T-S) fuzzy model techniques to treat and classify textual data sets with and without noise. A comparative study is carried out to select the most accurate T-S algorithm for the textual data sets. Design/methodology/approach: From a survey about what has been termed the “Tunisian Revolution,” the authors collect a textual data set from a questionnaire targeted at students. Four clustering algorithms are applied: the Gath-Geva (G-G) algorithm, the modified G-G algorithm, the fuzzy c-means algorithm and the kernel fuzzy c-means algorithm. The authors examine the performance of the four clustering algorithms and select the most reliable one for clustering textual data. Findings: The proposed methodology clusters textual data based on the T-S fuzzy model. On the one hand, the results obtained with the T-S models take the form of numerical relationships between selected keywords and the remaining words of a text, which allows the authors to interpret the results not only qualitatively but also quantitatively. On the other hand, the proposed method clusters text while taking noise into account. Originality/value: The originality lies in the fact that the authors validate some economic results based on textual data, even though the texts were not written by experts in linguistics. In addition, the results obtained in this study are simple for analysts to interpret.

An Edge Exposure using Caliber Fuzzy C-means With Canny Algorithm

Indonesian Journal of Electrical Engineering and Computer Science ◽ 10.11591/ijeecs.v8.i1.pp59-68 ◽ 2017 ◽ Vol 8 (1) ◽ pp. 59

Author(s): Gowri Jeyaraman ◽ Janakiraman Subbiah

Keyword(s): Edge Detection ◽ Clustering Algorithm ◽ Detection Algorithm ◽ Cluster Center ◽ Detection Techniques ◽ Fuzzy C Means ◽ Classical Study ◽ Detection Algorithms ◽ Initial Cluster ◽ Canny Algorithm

Edge exposure, or edge detection, is an important and classical topic in medical imaging and computer vision. The Caliber Fuzzy C-means (CFCM) clustering algorithm for edge detection depends on the selection of the initial cluster center values. It groups a collection of pixels into clusters such that the pixels within a cluster are more similar to one another. The BSDS image is first clustered with CFCM, and the clustered image is then given as input to the basic Canny edge detection algorithm. Applying new parameters with fewer operations to CFCM proves fruitful. In the reported calculations, the CFCM clustering function typically divides the image into four clusters. The proposed method, a modification of fuzzy c-means combined with the Canny algorithm, is reported to be robust. The algorithm converges very quickly compared with the other edge detection algorithms. The proposed algorithm yields enhanced edge detection and better results than other traditional image edge detection techniques.

Improved Fuzzy C-Means Based on the Optimal Number of Clusters

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.392.803 ◽ 2013 ◽ Vol 392 ◽ pp. 803-807 ◽ Cited By ~ 1

Author(s): Xue Bo Feng ◽ Fang Yao ◽ Zhi Gang Li ◽ Xiao Jing Yang

Keyword(s): Convergence Rate ◽ Clustering Algorithm ◽ Optimal Number ◽ Data Set ◽ Number Of Clusters ◽ Fuzzy C Means ◽ Initial Cluster ◽ Fuzzy C Means Clustering ◽ Fcm Clustering ◽ Optimal Number Of Clusters

The fuzzy C-means clustering algorithm (FCM) clusters a data set according to the number of cluster centers, the initial cluster centers, the fuzzy factor, the number of iterations and a threshold. FCM suffers from the problem of initializing the cluster prototypes. First, the article combines the maximum-minimum distance algorithm and the K-means algorithm to determine the number of clusters and the initial cluster centers. Second, it determines the optimal number of clusters with the Silhouette index. Finally, it improves the convergence rate of FCM by continually revising the memberships. The improved FCM clusters well, has a stronger optimization capability, and is more efficient and effective. Compared with the traditional FCM clustering method, it has better within-class compactness, between-class separation and cluster stability, and a faster convergence rate.
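For orientation, here is a minimal sketch of the standard (unimproved) FCM iteration, showing where the parameters named in the abstract enter: the initial centers, the fuzzy factor `m`, the iteration limit and the stopping threshold. It is the textbook algorithm, not the article's improved variant; the function name `fcm` and the default values are assumptions.

```python
import numpy as np

def fcm(X, centers, m=2.0, max_iter=100, tol=1e-5):
    """Standard fuzzy c-means from given initial centers.

    Returns (V, U): the final centers and the memberships computed in the
    last iteration, with U[i, k] = membership of point k in cluster i.
    """
    V = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        # squared distances, shape (c, n); epsilon avoids division by zero
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2) + 1e-12
        # membership update: u_ik proportional to d_ik^(-2/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
        # center update: mean of the points weighted by u_ik^m
        Um = U ** m
        V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:  # stop once the centers move less than the threshold
            break
    return V, U
```

Because the result depends on `centers`, a poor starting point can slow convergence or lead to a worse local optimum, which is the initialization problem the article targets.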

Pattern Recognition in Numerical Data Sets and Color Images through the Typicality Based on the GKPFCM Clustering Algorithm

Mathematical Problems in Engineering ◽ 10.1155/2013/716753 ◽ 2013 ◽ Vol 2013 ◽ pp. 1-11 ◽ Cited By ~ 1

Author(s): B. Ojeda-Magaña ◽ R. Ruelas ◽ M. A. Corona Nakamura ◽ D. W. Carr Finch ◽ L. Gómez-Barba

Keyword(s): Clustering Algorithm ◽ Clustering Algorithms ◽ Synthetic Data ◽ Numerical Data ◽ Color Images ◽ Data Sets ◽ Data Set ◽ Fuzzy C Means ◽ Homogeneous Regions ◽ Analyze Data

We take the concept of typicality from the field of cognitive psychology and apply it to the interpretation of numerical data sets and color images through fuzzy clustering algorithms, particularly the GKPFCM, in order to obtain better information from the processed data. The Gustafson Kessel Possibilistic Fuzzy c-means (GKPFCM) is a hybrid algorithm based on a relative typicality (membership degree, Fuzzy c-means) and an absolute typicality (typicality value, Possibilistic c-means). Using both typicalities makes it possible to learn and analyze data and to relate the results to the theory of prototypes. To demonstrate these results we use a synthetic data set and a digitized image of a glass in a first example, and images from the Berkeley database in a second example. The results clearly demonstrate the advantages of the information obtained about numerical data sets, taking into account the different meanings of the typicalities and the availability of both values with the clustering algorithm used. This approach allows the identification of small homogeneous regions, which are difficult to find.
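For reference, the two kinds of typicality are commonly written in the following standard forms from the fuzzy c-means and possibilistic c-means literature. The abstract does not state the exact GKPFCM expressions, so the notation is an assumption: $d_{ik}$ is the distance between point $k$ and prototype $i$ (in a Gustafson-Kessel setting this would be a cluster-specific Mahalanobis-type distance), $m > 1$ is the fuzzifier, and $\eta_i > 0$ is a per-cluster scale parameter.

```latex
u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{\frac{2}{m-1}} \right]^{-1},
\qquad
t_{ik} = \left[ 1 + \left( \frac{d_{ik}^{2}}{\eta_i} \right)^{\frac{1}{m-1}} \right]^{-1}
```

The membership $u_{ik}$ is relative because it depends on the distances to all $c$ prototypes and sums to 1 over the clusters, whereas the typicality $t_{ik}$ is absolute because it depends only on the distance to prototype $i$.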

A Hard C-Means Clustering Algorithm Incorporating Membership KL Divergence and Local Data Information for Noisy Image Segmentation

International Journal of Pattern Recognition and Artificial Intelligence ◽ 10.1142/s021800141850012x ◽ 2017 ◽ Vol 32 (04) ◽ pp. 1850012 ◽ Cited By ~ 5

Author(s): R. R. Gharieb ◽ G. Gendy ◽ H. Selim

Keyword(s): Image Segmentation ◽ Membership Function ◽ Clustering Algorithm ◽ Clustering Algorithms ◽ Cluster Center ◽ Local Data ◽ Cluster Membership ◽ Kl Divergence ◽ Clustering Approach ◽ Center Distance

In this paper, the standard hard C-means (HCM) clustering approach to image segmentation is modified by incorporating weighted membership Kullback–Leibler (KL) divergence and local data information into the HCM objective function. The membership KL divergence, used for fuzzification, measures the proximity between each cluster membership function of a pixel and the locally-smoothed value of the membership in the pixel vicinity. The fuzzification weight is a function of the pixel-to-cluster-center distances. The pixel-to-cluster-center distance used is composed of the original pixel data distance plus a fraction of the distance generated from the locally-smoothed pixel data. It is shown that the obtained membership function of a pixel is proportional to the locally-smoothed membership function of this pixel multiplied by an exponentially distributed function of the negative pixel distance relative to the minimum distance, that is, the distance from the pixel to its nearest cluster center. Therefore, because it incorporates the locally-smoothed membership and data information in addition to the relative distance, which is more tolerant of additive noise than the absolute distance, the proposed algorithm has a threefold noise-handling process. The presented algorithm, named local data and membership KL divergence based fuzzy C-means (LDMKLFCM), is tested on synthetic and real-world noisy images, and its results are compared with those of several FCM-based clustering algorithms.
