Likelihood Based Fuzzy Clustering for Data Sets of Mixed Features

Author(s):  
Mahnhoon Lee ◽  
Roelof K. Brouwer
2019 ◽  
Vol 65 ◽  
pp. 04008
Author(s):  
Kateryna Gorbatiuk ◽  
Olha Mantalyuk ◽  
Oksana Proskurovych ◽  
Oleksandr Valkov

Disparities in the development of regions in any country affect the entire national economy. Detecting these disparities can help formulate appropriate economic policies for each region by acting against the factors that slow economic growth. This study applies clustering methods to analyse regional disparities based on the economic development indicators of the regions of Ukraine. Fuzzy clustering methods were considered; these generalize partition clustering methods by allowing objects to be partially classified into more than one cluster. Fuzzy clustering was applied, using R packages, to data sets of statistical indicators describing economic activity in all administrative regions of Ukraine in 2017. Sets of development indicators for different sectors of economic activity, such as industry, agriculture, construction and services, were reviewed and analysed. The study showed that the regional cluster classification results depend strongly on the input development indicators and on the clustering technique used. Considering different partitions into fuzzy clusters opens up new opportunities for developing recommendations on how to differentiate economic policies in order to achieve maximum growth for the regions and the entire country.
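The idea of partial membership in more than one cluster can be illustrated with a minimal fuzzy c-means sketch. The abstract above reports using R packages and does not name the algorithm or its settings, so the Python/NumPy code below is only an illustration: the function name `fuzzy_c_means`, the fuzzifier `m=2.0`, and the stopping rule are all assumptions.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Illustrative fuzzy c-means; returns (centers, membership matrix U)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers are membership-weighted means of the data.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distances, floored to avoid division by zero.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)
        # Standard membership update: closer centers get higher membership.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Each row of `U` holds one object's degrees of membership across the clusters. An object lying between two groups receives intermediate memberships in both instead of being forced into one.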


Author(s):  
LOTFI BEN ROMDHANE ◽  
NADIA FADHEL ◽  
BECHIR AYEB

Data mining (DM) is an emerging discipline that aims to extract knowledge from data using several techniques. DM has proved useful in business, where the data describing customers and their transactions is on the order of terabytes. In this paper, we propose an approach for building customer models (also called profiles in the literature) from business data. Our approach has three steps. In the first step, we use fuzzy clustering to categorize customers, i.e., to determine groups of customers. A key feature is that the number of groups (or clusters) is computed automatically from the data using the partition entropy as a validity criterion. In the second step, we perform a dimensionality reduction that keeps, for each group of customers, only the most informative attributes. For this, we define the information loss to quantify the information degree of an attribute. As a result of this second step, we obtain groups of customers, each described by a distinct set of attributes. In the third and final step, we use backpropagation neural networks to extract useful knowledge from these groups. Experimental results on real-world data sets reveal a good performance of our approach and should stimulate future research.
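The partition entropy mentioned as a validity criterion can be sketched as follows. This uses the usual definition, the mean over objects of the entropy of each membership row. The abstract does not say how the authors search over candidate cluster counts, so the helper `pick_cluster_count` and its "lowest entropy wins" rule are assumptions for illustration.

```python
import numpy as np

def partition_entropy(U, eps=1e-12):
    """Mean entropy of the membership rows of U (objects x clusters)."""
    U = np.clip(np.asarray(U, dtype=float), eps, 1.0)
    return float(-(U * np.log(U)).sum() / U.shape[0])

def pick_cluster_count(memberships_by_c):
    """Hypothetical helper: given {c: U_c}, return the c with lowest entropy."""
    return min(memberships_by_c, key=lambda c: partition_entropy(memberships_by_c[c]))
```

A crisp partition (every row has a single membership equal to 1) has entropy close to zero. A maximally fuzzy partition over c clusters has entropy log(c), so lower values indicate a sharper clustering.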


Author(s):  
Lawrence O. Hall ◽  
Dmitry B. Goldgof ◽  
Juana Canul-Reich ◽  
Prodip Hore ◽  
Weijian Cheng ◽  
...  

This chapter examines how to scale algorithms which learn fuzzy models from the increasing amounts of labeled or unlabeled data that are becoming available. Large data repositories are increasingly available, such as records of network transmissions, customer transactions, medical data, and so on. A question arises about how to utilize the data effectively for both supervised and unsupervised fuzzy learning. This chapter will focus on ensemble approaches to learning fuzzy models for large data sets which may be labeled or unlabeled. Further, the authors examine ways of scaling fuzzy clustering to extremely large data sets. Examples from existing data repositories, some quite large, will be given to show the approaches discussed here are effective.


2006 ◽  
Vol 37 (7) ◽  
pp. 755-778 ◽  
Author(s):  
Horng-Lin Shieh ◽  
Ying-Kuei Yang ◽  
Chien-Nan Lee

2015 ◽  
Vol 24 (03) ◽  
pp. 1550003 ◽  
Author(s):  
Armin Daneshpazhouh ◽  
Ashkan Sami

The task of semi-supervised outlier detection is to find instances that deviate from the rest of the data, using some labeled examples. This issue is particularly important in applications such as fraud detection and intrusion detection. Most existing techniques are unsupervised. Semi-supervised approaches, on the other hand, use both negative and positive instances to detect outliers. However, in many real-world applications, very few positive labeled examples are available. This paper proposes an innovative approach to address this problem. The proposed method works as follows. First, some reliable negative instances are extracted by a kNN-based algorithm. Afterwards, fuzzy clustering using both negative and positive examples is used to detect outliers. Experimental results on real data sets demonstrate that the proposed approach outperforms the previous unsupervised state-of-the-art methods in detecting outliers.
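The abstract does not describe the kNN-based extraction step, so the sketch below is not the authors' algorithm. It shows one plausible reading under a stated assumption: an unlabeled instance is treated as a reliable negative when none of its k nearest neighbours is a labeled positive (outlier). The function name `reliable_negatives` and the default `k=3` are hypothetical.

```python
import numpy as np

def reliable_negatives(X, positive_idx, k=3):
    """Indices of unlabeled points with no labeled positive among their k nearest neighbours."""
    X = np.asarray(X, dtype=float)
    is_pos = np.zeros(len(X), dtype=bool)
    is_pos[list(positive_idx)] = True
    # Pairwise Euclidean distances; a point is never its own neighbour.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    # Indices of the k nearest neighbours of every point.
    nn = np.argsort(d, axis=1)[:, :k]
    has_pos_neighbour = is_pos[nn].any(axis=1)
    # Assumed rule: unlabeled and far from every known positive.
    return np.flatnonzero(~is_pos & ~has_pos_neighbour)
```

The returned indices could then serve as the negative examples, alongside the few labeled positives, in a subsequent fuzzy clustering stage.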


2013 ◽  
Vol 22 (03) ◽  
pp. 1350013 ◽  
Author(s):  
OUIEM BCHIR ◽  
HICHEM FRIGUI ◽  
MOHAMED MAHER BEN ISMAIL

Many machine learning applications rely on learning distance functions with side information. Most of these distance metric learning approaches learn a Mahalanobis distance. While these approaches may work well when the data is low-dimensional, they become computationally expensive or even infeasible for high-dimensional data. In this paper, we propose a novel method of learning nonlinear distance functions with side information while clustering the data. The new semi-supervised clustering approach is called Semi-Supervised Fuzzy clustering with Learnable Cluster dependent Kernels (SS-FLeCK). The proposed algorithm learns the underlying cluster-dependent dissimilarity measure while finding compact clusters in the given data set. The learned dissimilarity is based on a Gaussian kernel function with cluster-dependent parameters. The objective function integrates penalty and reward cost functions. These cost functions are weighted by fuzzy membership degrees. Moreover, they use side information in the form of a small set of constraints on which instances should or should not reside in the same cluster. The proposed algorithm uses only the pairwise relation between the feature vectors. This makes it applicable when similar objects cannot be represented by a single prototype. Using synthetic and real data sets, we show that SS-FLeCK outperforms several other algorithms.


Author(s):  
Sonia Goel ◽  
Meena Tushir

Introduction: Incomplete data sets with missing attribute values are a prevailing problem in many research areas. Attributes may be missing for several reasons: human error in tabulating or recording the data, machine failure, errors in data acquisition, or refusal of a patient or customer to answer a few questions in a questionnaire or survey. Clustering such data sets is therefore a challenge. Objective: In this paper, we present a critical review of the methodologies proposed for handling missing data in clustering. The focus is a comparison of FCM clustering based on various imputation techniques with the four clustering strategies proposed by Hathway and Bezdek. Methods: We handle the missing values in incomplete data sets with various imputation/non-imputation techniques and then apply a conventional fuzzy clustering algorithm to obtain the clustering results. Results: Experiments are carried out on various synthetic data sets and on real data sets from the UCI repository. Several performance criteria and statistical tests are used to evaluate the imputation/non-imputation based FCM clustering algorithms. The experimental results show that linear interpolation based FCM clustering performs significantly better than the other imputation and non-imputation techniques. Conclusion: Clustering performance is data specific; no clustering technique gives good results on all data sets. It depends on both the data type and the percentage of missing attributes in the data set. This study shows that the linear interpolation based FCM clustering algorithm can be used effectively for clustering incomplete data sets.
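The imputation step that precedes clustering can be sketched as below. The abstract does not say along which ordering the linear interpolation is performed, so this sketch assumes interpolation over the row index within each feature column, with values at the edges held constant. The function name `interpolate_missing` is hypothetical, and missing values are assumed to be encoded as NaN.

```python
import numpy as np

def interpolate_missing(X):
    """Fill NaNs in each column by linear interpolation over the row index."""
    X = np.array(X, dtype=float)  # work on a copy
    rows = np.arange(X.shape[0])
    for j in range(X.shape[1]):
        col = X[:, j]  # view into X, so assignments below update X
        missing = np.isnan(col)
        # Skip columns with nothing to fill or nothing to interpolate from.
        if missing.any() and not missing.all():
            col[missing] = np.interp(rows[missing], rows[~missing], col[~missing])
    return X
```

The completed matrix can then be passed to any conventional fuzzy c-means implementation. Columns that are entirely missing are left unchanged here, which is a design choice of this sketch, not something stated in the source.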

