Likelihood Based Fuzzy Clustering for Data Sets of Mixed Features

Disparities in regional development affect a country's entire national economy. Detecting these disparities can help formulate appropriate economic policies for each region by addressing the factors that slow economic growth. This study applied clustering methods to analyse regional disparities based on the economic development indicators of the regions of Ukraine. Fuzzy clustering methods were considered; these generalize partition clustering methods by allowing objects to be partially classified into more than one cluster. Fuzzy clustering was applied, using R packages, to data sets of statistical indicators of economic activity in all administrative regions of Ukraine in 2017. Sets of development indicators for different sectors of economic activity, such as industry, agriculture, construction and services, were reviewed and analysed. The study showed that the regional cluster classification results depend strongly on the input development indicators and on the clustering technique used. Considering different partitions into fuzzy clusters opens up new opportunities for developing recommendations on how to differentiate economic policies in order to achieve maximum growth for the regions and the entire country.
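
To make the idea of partial classification concrete, here is a minimal sketch of how one object can receive a degree of membership in every cluster. It assumes the standard fuzzy c-means membership formula with fuzzifier m = 2; the abstract does not say which fuzzy method or R package was used, and the region, indicators, and cluster centers below are hypothetical.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Membership of one point in each cluster, fuzzy c-means style.

    `m` > 1 is the fuzzifier; m=2.0 is a common default, assumed here.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on one or more centers belongs fully to them.
    zero = [d == 0.0 for d in dists]
    if any(zero):
        return [1.0 / sum(zero) if z else 0.0 for z in zero]
    power = 2.0 / (m - 1.0)
    # u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)); the values sum to 1.
    return [1.0 / sum((d_k / d_j) ** power for d_j in dists) for d_k in dists]

# Hypothetical region described by two indicators, and two hypothetical centers.
region = (1.0, 0.0)
centers = [(0.0, 0.0), (4.0, 0.0)]
print(fuzzy_memberships(region, centers))
```

The region lies three times closer to the first center than to the second, so it receives a membership of about 0.9 in the first cluster and about 0.1 in the second, rather than being assigned to one cluster outright.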

BUILDING CUSTOMER MODELS FROM BUSINESS DATA: AN AUTOMATIC APPROACH BASED ON FUZZY CLUSTERING AND MACHINE LEARNING

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026809002692 ◽

2009 ◽

Vol 08 (04) ◽

pp. 445-465 ◽

Cited By ~ 1

Author(s):

LOTFI BEN ROMDHANE ◽

NADIA FADHEL ◽

BECHIR AYEB

Keyword(s):

Fuzzy Clustering ◽

Information Loss ◽

Second Step ◽

Future Research ◽

Data Sets ◽

Real World Data ◽

Useful Knowledge ◽

The Third ◽

Validity Criteria ◽

Business Data

Data mining (DM) is an emerging discipline that aims to extract knowledge from data using several techniques. DM has proved useful in business, where the data describing customers and their transactions is on the order of terabytes. In this paper, we propose an approach for building customer models (also called profiles in the literature) from business data. Our approach has three steps. In the first step, we use fuzzy clustering to categorize customers, i.e., to determine groups of customers. A key feature is that the number of groups (or clusters) is computed automatically from the data using the partition entropy as a validity criterion. In the second step, we perform a dimensionality reduction that keeps, for each group of customers, only the most informative attributes. For this, we define the information loss to quantify how informative an attribute is. As a result of this second step, we obtain groups of customers, each described by a distinct set of attributes. In the third and final step, we use backpropagation neural networks to extract useful knowledge from these groups. Experimental results on real-world data sets show that our approach performs well and should stimulate future research.
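
The first step's use of partition entropy to choose the number of clusters can be sketched as follows. This is only an illustration under stated assumptions: it uses Bezdek's usual definition of partition entropy, it assumes that a lower value indicates a crisper and therefore preferable partition, and it takes the membership matrices as given rather than computing them. The function names and the example memberships are hypothetical and are not taken from the paper.

```python
import math

def partition_entropy(memberships):
    """Partition entropy: -(1/n) * sum_i sum_k u_ik * ln(u_ik).

    `memberships` is a list of rows, one per object, each row summing to 1.
    """
    n = len(memberships)
    total = 0.0
    for row in memberships:
        for u in row:
            if u > 0.0:  # the term 0 * ln(0) is taken as 0
                total += u * math.log(u)
    return -total / n

def pick_cluster_count(candidates):
    """Return the cluster count whose membership matrix has the lowest entropy."""
    return min(candidates, key=lambda c: partition_entropy(candidates[c]))

# Hypothetical memberships of four customers for c = 2 and c = 3 clusters,
# as they might come out of two separate fuzzy clustering runs.
candidates = {
    2: [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]],
    3: [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.2, 0.3, 0.5], [0.1, 0.4, 0.5]],
}
best = pick_cluster_count(candidates)
print(best)
```

With these example values the two-cluster partition is much crisper, so it is the one selected. In practice the index tends to grow with the number of clusters, so it is usually compared over a bounded range of candidate counts.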

Support Vector Machine Classification Based on Fuzzy Clustering for Large Data Sets

Lecture Notes in Computer Science - MICAI 2006: Advances in Artificial Intelligence ◽

10.1007/11925231_54 ◽

2006 ◽

pp. 572-582 ◽

Cited By ~ 18

Author(s):

Jair Cervantes ◽

Xiaoou Li ◽

Wen Yu

Keyword(s):

Support Vector Machine ◽

Fuzzy Clustering ◽

Large Data ◽

Large Data Sets ◽

Support Vector ◽

Data Sets ◽

Support Vector Machine Classification

Scaling Fuzzy Models

Scalable Fuzzy Algorithms for Data Management and Analysis ◽

10.4018/978-1-60566-858-1.ch002 ◽

2010 ◽

pp. 31-53 ◽

Cited By ~ 1

Author(s):

Lawrence O. Hall ◽

Dmitry B. Goldgof ◽

Juana Canul-Reich ◽

Prodip Hore ◽

Weijian Cheng ◽

...

Keyword(s):

Fuzzy Clustering ◽

Approaches To Learning ◽

Large Data ◽

Medical Data ◽

Large Data Sets ◽

Data Sets ◽

Data Repositories ◽

Fuzzy Models ◽

Fuzzy Learning ◽

Existing Data

This chapter examines how to scale algorithms which learn fuzzy models from the increasing amounts of labeled or unlabeled data that are becoming available. Large data repositories are increasingly available, such as records of network transmissions, customer transactions, medical data, and so on. A question arises about how to utilize the data effectively for both supervised and unsupervised fuzzy learning. This chapter will focus on ensemble approaches to learning fuzzy models for large data sets which may be labeled or unlabeled. Further, the authors examine ways of scaling fuzzy clustering to extremely large data sets. Examples from existing data repositories, some quite large, will be given to show the approaches discussed here are effective.

Research on Fuzzy Clustering Algorithms for Large Dimensional Data Sets Under Cloud Computing

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering - Advanced Hybrid Information Processing ◽

10.1007/978-3-030-67871-5_27 ◽

2021 ◽

pp. 295-305

Author(s):

Shuang-cheng Jia ◽

Feng-ping Yang

Keyword(s):

Cloud Computing ◽

Fuzzy Clustering ◽

Clustering Algorithms ◽

Data Sets

A ROBUST APPROACH OF FUZZY CLUSTERING ON DATA SETS WITH NOISE AND OUTLIERS

Cybernetics & Systems ◽

10.1080/01969720600887061 ◽

2006 ◽

Vol 37 (7) ◽

pp. 755-778 ◽

Cited By ~ 7

Author(s):

Horng-Lin Shieh ◽

Ying-Kuei Yang ◽

Chien-Nan Lee

Keyword(s):

Fuzzy Clustering ◽

Data Sets ◽

Robust Approach

Semi-Supervised Outlier Detection with Only Positive and Unlabeled Data Based on Fuzzy Clustering

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213015500037 ◽

2015 ◽

Vol 24 (03) ◽

pp. 1550003 ◽

Cited By ~ 1

Author(s):

Armin Daneshpazhouh ◽

Ashkan Sami

Keyword(s):

Intrusion Detection ◽

Outlier Detection ◽

Fuzzy Clustering ◽

Real World ◽

State Of The Art ◽

Real Data ◽

Experimental Results ◽

The Other ◽

Data Sets ◽

Real World Applications

The task of semi-supervised outlier detection is to find the instances that differ markedly from the rest of the data, using some labeled examples. This task is especially important in applications such as fraud detection and intrusion detection. Most existing techniques are unsupervised. Semi-supervised approaches, on the other hand, use both negative and positive instances to detect outliers. However, in many real-world applications, very few positive labeled examples are available. This paper proposes an innovative approach to address this problem. The proposed method works as follows. First, some reliable negative instances are extracted by a kNN-based algorithm. Afterwards, fuzzy clustering using both negative and positive examples is used to detect outliers. Experimental results on real data sets demonstrate that the proposed approach outperforms previous unsupervised state-of-the-art methods in detecting outliers.

SEMI-SUPERVISED FUZZY CLUSTERING WITH LEARNABLE CLUSTER DEPENDENT KERNELS

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213013500139 ◽

2013 ◽

Vol 22 (03) ◽

pp. 1350013 ◽

Cited By ~ 2

Author(s):

OUIEM BCHIR ◽

HICHEM FRIGUI ◽

MOHAMED MAHER BEN ISMAIL

Keyword(s):

Fuzzy Clustering ◽

Side Information ◽

Metric Learning ◽

Real Data ◽

Distance Functions ◽

Gaussian Kernel ◽

Cost Functions ◽

Data Sets ◽

Learning Approaches ◽

Data Set

Many machine learning applications rely on learning distance functions with side information. Most of these distance metric learning approaches learn a Mahalanobis distance. While these approaches may work well when the data is low-dimensional, they become computationally expensive or even infeasible for high-dimensional data. In this paper, we propose a novel method of learning nonlinear distance functions with side information while clustering the data. The new semi-supervised clustering approach is called Semi-Supervised Fuzzy clustering with Learnable Cluster dependent Kernels (SS-FLeCK). The proposed algorithm learns the underlying cluster-dependent dissimilarity measure while finding compact clusters in the given data set. The learned dissimilarity is based on a Gaussian kernel function with cluster-dependent parameters. The objective function integrates penalty and reward cost functions. These cost functions are weighted by fuzzy membership degrees. Moreover, they use side information in the form of a small set of constraints on which instances should or should not reside in the same cluster. The proposed algorithm uses only the pairwise relation between the feature vectors. This makes it applicable when similar objects cannot be represented by a single prototype. Using synthetic and real data sets, we show that SS-FLeCK outperforms several other algorithms.
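
As a rough illustration of a Gaussian-kernel dissimilarity with cluster-dependent parameters, the sketch below assumes the common form 1 - exp(-||x - y||^2 / sigma_k), where sigma_k is a scaling parameter attached to cluster k. The abstract does not give the exact expression used by SS-FLeCK, so this form, the function name, and the sigma values are assumptions; in the paper the parameters are learned, whereas here they are fixed by hand.

```python
import math

def gaussian_dissimilarity(x, y, sigma):
    """Dissimilarity in [0, 1) derived from a Gaussian kernel with scale `sigma`."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1.0 - math.exp(-squared_distance / sigma)

# Hypothetical per-cluster scale parameters (learned in the paper, fixed here).
sigmas = {"tight_cluster": 0.5, "spread_cluster": 8.0}

x, y = (0.0, 0.0), (1.0, 1.0)
for name, sigma in sigmas.items():
    # The same pair looks far apart in a tight cluster, close in a spread one.
    print(name, round(gaussian_dissimilarity(x, y, sigma), 3))
```

Because the measure needs only the pairwise relation between two feature vectors, no cluster prototype has to be computed, which matches the property the abstract highlights.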

Different Approaches for Missing Data Handling in Fuzzy Clustering: A Review

Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering) ◽

10.2174/2352096512666191127121710 ◽

2020 ◽

Vol 13 (6) ◽

pp. 833-846

Author(s):

Sonia Goel ◽

Meena Tushir

Keyword(s):

Missing Data ◽

Fuzzy Clustering ◽

Incomplete Data ◽

Clustering Algorithm ◽

Linear Interpolation ◽

Performance Criteria ◽

Data Sets ◽

Data Set ◽

FCM Clustering ◽

Missing Attributes

Introduction: Incomplete data sets with missing attributes are a prevalent problem in many research areas. Attributes may be missing for several reasons: human error in tabulating or recording the data, machine failure, errors in data acquisition, or the refusal of a patient or customer to answer a few questions in a questionnaire or survey. Clustering such data sets is therefore a challenge. Objective: In this paper, we present a critical review of various methodologies proposed for handling missing data in clustering. The focus is the comparison of FCM clustering based on various imputation techniques with the four clustering strategies proposed by Hathway and Bezdek. Methods: We handle the missing values in incomplete data sets using various imputation/non-imputation techniques to complete the data set, and then apply the conventional fuzzy clustering algorithm to obtain the clustering results. Results: Experiments are carried out on various synthetic data sets and on real data sets from the UCI repository. To evaluate the performance of the various imputation/non-imputation based FCM clustering algorithms, several performance criteria and statistical tests are considered. Experimental results on various data sets show that linear interpolation based FCM clustering performs significantly better than the other imputation and non-imputation techniques. Conclusion: Clustering performance is data-specific: no clustering technique gives good results on all data sets. Performance depends on both the data type and the percentage of missing attributes in the data set. This study shows that the linear interpolation based FCM clustering algorithm can be used effectively for clustering incomplete data sets.
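
The imputation step named in the results, linear interpolation, can be sketched for a single attribute as shown below. The abstract does not specify along which axis the interpolation runs or how gaps at the ends are treated, so this sketch assumes interpolation along the record order of one attribute and copies the nearest observed value into edge gaps; the function name is hypothetical.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between observed neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one observed value is required")
    filled = list(values)
    # Edge gaps have only one observed neighbour, so copy it
    # (an assumption of this sketch, not stated in the abstract).
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps: walk in equal steps from the left neighbour to the right one.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled

# Hypothetical attribute column with missing entries marked as None.
column = [None, 2.0, None, None, 5.0, None]
print(interpolate_missing(column))
```

Once every attribute has been completed in this way, the data set can be passed to a conventional fuzzy c-means implementation without further changes.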
