A COMPARISON OF CLUSTERING BY IMPUTATION AND SPECIAL CLUSTERING ALGORITHMS ON THE REAL INCOMPLETE DATA

Missing values inhibit the clustering process. Two families of solutions have been proposed to overcome this: imputation and special clustering algorithms. This paper compares the clustering results obtained with both on incomplete data. The k-means algorithm was applied to the imputed data. The algorithms used were distribution free multiple imputation (DFMI), Gabriel eigen (GE), expectation maximization-singular value decomposition (EM-SVD), biplot imputation (BI), four modified fuzzy c-means (FCM) algorithms, k-means soft constraints (KSC), distance estimation strategy fuzzy c-means (DESFCM), and k-means soft constraints imputed-observed (KSC-IO). The data used were the 2018 environmental performance index (EPI) and simulation data. The optimal clustering of the 2018 EPI data was chosen using the silhouette index, whose capability had first been tested on the simulation dataset. The results showed that the silhouette index is capable of validating clustering results on incomplete datasets, and that the optimal clustering of the 2018 EPI dataset was obtained by k-means using BI, with a silhouette index of 0.613 and a time complexity of 0.063. Based on these results, k-means using BI is suggested for clustering analysis of the 2018 EPI dataset.
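The impute-then-cluster-then-validate pipeline described in this abstract can be sketched as follows. This is only a minimal illustration using the Python standard library: column-mean imputation stands in for the paper's imputation methods (it is not BI, DFMI, GE, or EM-SVD), the toy data are made up, and the helper names `mean_impute`, `kmeans`, and `silhouette_index` are hypothetical.

```python
import math
import random

def mean_impute(rows):
    """Replace None entries with the column mean of the observed values.
    Assumption: a simple stand-in, not one of the paper's imputation methods."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct data points as initial centers
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette_index(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        def mean_dist(c):
            ds = [math.dist(p, q) for j, q in enumerate(points)
                  if labels[j] == c and j != i]
            return sum(ds) / len(ds) if ds else None
        a = mean_dist(labels[i])  # cohesion: mean distance within own cluster
        if a is None:
            continue
        b = min(mean_dist(c) for c in clusters if c != labels[i])  # separation
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)

# Made-up toy data with missing entries marked as None.
data = [
    [1.0, 1.1], [0.9, None], [1.2, 0.8], [None, 1.0],
    [5.0, 5.2], [5.1, 4.9], [4.8, None], [5.2, 5.0],
]
imputed = mean_impute(data)
labels = kmeans(imputed, k=2)
score = silhouette_index(imputed, labels)
print(labels, round(score, 3))
```

A higher silhouette value indicates tighter, better separated clusters, which is how the index can rank the outputs of different imputation or clustering methods on the same data.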

A General Framework for Mixed and Incomplete Data Clustering Based on Swarm Intelligence Algorithms

Mathematics ◽

10.3390/math9070786 ◽

2021 ◽

Vol 9 (7) ◽

pp. 786

Author(s):

Yenny Villuendas-Rey ◽

Eley Barroso-Cubas ◽

Oscar Camacho-Nieto ◽

Cornelio Yáñez-Márquez

Keyword(s):

Swarm Intelligence ◽

Data Clustering ◽

Incomplete Data ◽

Missing Values ◽

Clustering Algorithms ◽

Bat Algorithm ◽

Hybrid Features ◽

Bee Colony ◽

Learning Tasks ◽

Clustering Data

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally determine adequate parameter values for these three modified algorithms in order to apply them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of the data.

MODIFIED POSSIBILISTIC FUZZY C-MEANS ALGORITHM FOR CLUSTERING INCOMPLETE DATA SETS

Acta Polytechnica ◽

10.14311/ap.2021.61.0364 ◽

2021 ◽

Vol 61 (2) ◽

pp. 364-377

Author(s):

Rustam ◽

Koredianto Usman ◽

Mudyawati Kamaruddin ◽

Dina Chamidah ◽

Nopendri ◽

...

Keyword(s):

Experimental Data ◽

Incomplete Data ◽

Missing Values ◽

Complete Data ◽

Noise Sensitivity ◽

Data Sets ◽

Fuzzy C Means ◽

Number Of Iterations ◽

Fuzzy C Means Algorithm

The possibilistic fuzzy c-means (PFCM) algorithm is a reliable algorithm proposed to address the noise sensitivity and coincident-cluster weaknesses of fuzzy c-means (FCM) and possibilistic c-means (PCM). However, the PFCM algorithm is only applicable to complete data sets. Therefore, this research modified the PFCM for clustering incomplete data sets, yielding OCSPFCM and NPSPFCM, and evaluated their performance on three aspects: 1) accuracy percentage, 2) the number of iterations, and 3) centroid errors. The results showed that the NPSPFCM outperforms the OCSPFCM with missing values ranging from 5% to 30% for all experimental data sets. Furthermore, the two algorithms provide average accuracies between 97.75% and 78.98% and between 98.86% and 92.49%, respectively.

Robust K-Median and K-Means Clustering Algorithms for Incomplete Data

Mathematical Problems in Engineering ◽

10.1155/2016/4321928 ◽

2016 ◽

Vol 2016 ◽

pp. 1-8 ◽

Cited By ~ 6

Author(s):

Jinhua Li ◽

Shiji Song ◽

Yuli Zhang ◽

Zhen Zhou

Keyword(s):

Incomplete Data ◽

Missing Values ◽

Clustering Algorithms ◽

Interval Data ◽

Accurate Estimation ◽

Data Sets ◽

Clustering Methods ◽

Estimation Errors ◽

Feature Values ◽

Time And Space Complexity

Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply the classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimation of the missing values, which deteriorates the performance of clustering. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results, which are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.
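The idea of representing a missing value by an interval and guarding against the worst case can be illustrated with a small sketch. It is not the authors' robust K-means algorithm: it only shows a worst-case (minimax-style) assignment step, under the assumption that each feature of a point lies in a known interval `[lo, hi]` (an observed feature has `lo == hi`). The interval bounds, the toy data, and the helper names `worst_case_sq_dist` and `robust_assign` are hypothetical.

```python
def worst_case_sq_dist(lo, hi, center):
    """Largest squared Euclidean distance from `center` to any point in the
    box [lo, hi]. The maximum separates per feature and is reached at one of
    the two interval endpoints."""
    return sum(max((c - l) ** 2, (h - c) ** 2) for l, h, c in zip(lo, hi, center))

def robust_assign(intervals, centers):
    """Assign each interval-valued point to the center with the smallest
    worst-case squared distance."""
    labels = []
    for lo, hi in intervals:
        dists = [worst_case_sq_dist(lo, hi, c) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels

# Hypothetical toy data: each point is a (lower bounds, upper bounds) pair.
intervals = [
    ((1.0, 1.0), (1.0, 1.0)),  # fully observed point (1, 1)
    ((4.0, 0.0), (4.0, 5.0)),  # second feature missing, assumed to lie in [0, 5]
]
centers = [(1.0, 1.0), (4.0, 4.0)]

labels = robust_assign(intervals, centers)
print(labels)  # [0, 1]
```

For the second point, the worst-case squared distance is 9 + 16 = 25 to the first center and 0 + 16 = 16 to the second, so it is assigned to the second center regardless of where in the interval the missing value actually lies.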

Fuzzy c-Means Classifier for Incomplete Data Sets with Outliers and Missing Values

International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC'06) ◽

10.1109/cimca.2005.1631511 ◽

2006 ◽

Cited By ~ 14

Author(s):

H. Ichihashi ◽

K. Honda

Keyword(s):

Incomplete Data ◽

Missing Values ◽

Data Sets ◽

Fuzzy C Means

Lookahead selective sampling for incomplete data

International Journal of Applied Mathematics and Computer Science ◽

10.1515/amcs-2016-0062 ◽

2016 ◽

Vol 26 (4) ◽

pp. 871-884 ◽

Cited By ~ 1

Author(s):

Loai Abdallah ◽

Ilan Shimshoni

Keyword(s):

Incomplete Data ◽

Missing Values ◽

Clustering Algorithms ◽

Mean Shift ◽

Ensemble Clustering ◽

Selective Sampling ◽

Mean Shift Clustering ◽

Sampling Algorithms ◽

Instance Space ◽

Incomplete Datasets

Missing values in data are common in real-world applications. There are several methods that deal with this problem. In this paper we present lookahead selective sampling (LSS) algorithms for datasets with missing values. We developed two versions of selective sampling. The first integrates a distance function that can measure the similarity between pairs of incomplete points within the framework of the LSS algorithm. The second uses ensemble clustering to represent the data in a cluster matrix without missing values and then runs the LSS algorithm on the ensemble clustering instance space (LSS-EC). To construct the cluster matrix, we use the k-means and mean shift clustering algorithms, specially modified to deal with incomplete datasets. We tested our algorithms on six standard numerical datasets from different fields. On these datasets we simulated missing values and compared the performance of the LSS and LSS-EC algorithms for incomplete data to two other basic methods. Our experiments show that the suggested selective sampling algorithms outperform the other methods.
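The abstract does not specify the distance function used for pairs of incomplete points. As a rough illustration of what such a function can look like, the sketch below uses the partial-distance idea: compare only the features observed in both points and rescale by the fraction of features used. This is an assumed stand-in, not the authors' distance, and the name `partial_distance` is hypothetical.

```python
import math

def partial_distance(x, y):
    """Euclidean distance over the features observed in both x and y,
    rescaled by (total features / shared features). Missing entries are None.
    Returns NaN when the two points share no observed feature."""
    shared = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not shared:
        return math.nan
    sq = sum((a - b) ** 2 for a, b in shared)
    return math.sqrt(sq * len(x) / len(shared))

# Features 0 and 3 are observed in both points: squared differences 1 and 4.
d = partial_distance([1.0, None, 3.0, 4.0], [2.0, 2.0, None, 6.0])
print(d)  # sqrt(5 * 4 / 2) = sqrt(10)
```

The rescaling keeps distances between points with few shared features comparable to distances between fully observed points.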

Fuzzy C-Means clustering of incomplete data based on probabilistic information granules of missing values

Knowledge-Based Systems ◽

10.1016/j.knosys.2016.01.048 ◽

2016 ◽

Vol 99 ◽

pp. 51-70 ◽

Cited By ~ 51

Author(s):

Liyong Zhang ◽

Wei Lu ◽

Xiaodong Liu ◽

Witold Pedrycz ◽

Chongquan Zhong

Keyword(s):

Incomplete Data ◽

Missing Values ◽

Information Granules ◽

Fuzzy C Means ◽

Probabilistic Information ◽

Fuzzy C Means Clustering

A New semi-supervised clustering for incomplete data

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189744 ◽

2021 ◽

pp. 1-13

Author(s):

Sonia Goel ◽

Meena Tushir

Keyword(s):

Incomplete Data ◽

Missing Values ◽

Clustering Algorithms ◽

Complete Data ◽

Unlabeled Data ◽

Misclassification Rate ◽

Data Sets ◽

Clustering Methods ◽

Data Set ◽

Supervised Clustering

Semi-supervised clustering partitions unlabeled data using prior knowledge from labeled data. Most semi-supervised clustering algorithms exist only for complete data, i.e., data sets with no missing features. This paper examines the effectiveness of semi-supervised clustering when applied to incomplete data sets. The novelty of the approach is that it considers the missing features along with the available knowledge (labels) of the data set. A linear interpolation imputation technique first imputes the missing features, thus completing the data set. Semi-supervised clustering is then applied to this completed data set, and the missing features are regularly updated within the clustering process. In the proposed work, the labeled portion is 30, 40, 50, or 60% of the total data. The data are further altered by arbitrarily eliminating certain features of their components, which makes the data incomplete with partial labeling. The proposed algorithm utilizes both labeled and unlabeled data, along with certain missing values in the data. It is evaluated using three performance indices, namely the misclassification rate, random index metric, and error rate. Despite the additional missing features, the proposed algorithm was successfully applied to real data sets and showed better or competitive results compared with well-known standard semi-supervised clustering methods.

Comparing SOM neural network with Fuzzy c-means, K-means and traditional hierarchical clustering algorithms

European Journal of Operational Research ◽

10.1016/j.ejor.2005.03.039 ◽

2006 ◽

Vol 174 (3) ◽

pp. 1742-1759 ◽

Cited By ~ 150

Author(s):

Sueli A. Mingoti ◽

Joab O. Lima

Keyword(s):

Neural Network ◽

Hierarchical Clustering ◽

Clustering Algorithms ◽

Fuzzy C Means ◽

Som Neural Network

A novel GPU based intrusion detection system using deep autoencoder with Fruitfly optimization

SN Applied Sciences ◽

10.1007/s42452-021-04579-4 ◽

2021 ◽

Vol 3 (6) ◽

Author(s):

R. Sekhar ◽

K. Sasirekha ◽

P. S. Raja ◽

K. Thangavel

Keyword(s):

Intrusion Detection ◽

Intrusion Detection System ◽

Missing Values ◽

Detection System ◽

Radial Basis Function Network ◽

Detection Methods ◽

Support Vector ◽

Parameter Method ◽

List Type ◽

Fuzzy C Means

Intrusion Detection Systems (IDSs) have received increasing attention for safeguarding the vital information in an organization's network. Hackers can often enter a secured network through loopholes and smart attacks. In such situations, distinguishing attacks from normal packets is tedious, challenging, time consuming, and highly technical. As a result, different algorithms with varying learning and training capacity have been explored in the literature. However, the existing intrusion detection methods could not meet the desired performance requirements. Hence, this work proposes a new intrusion detection technique using a Deep Autoencoder with Fruitfly Optimization. Initially, missing values in the dataset are imputed with the Fuzzy C-Means Rough Parameter (FCMRP) algorithm, which handles the imprecision in datasets by exploiting fuzzy and rough sets while preserving crucial information. Then, robust features are extracted from an Autoencoder with multiple hidden layers. Finally, the obtained features are fed to a Back Propagation Neural Network (BPN) to classify the attacks. Furthermore, the neurons in the hidden layers of the Deep Autoencoder are optimized with the population-based Fruitfly Optimization algorithm. Experiments have been conducted on the NSL_KDD and UNSW-NB15 datasets. The computational results of the proposed intrusion detection system using the deep autoencoder with BPN are compared with Naive Bayes, Support Vector Machine (SVM), Radial Basis Function Network (RBFN), BPN, and Autoencoder with Softmax.

Article Highlights

A hybridized model using Deep Autoencoder with Fruitfly Optimization is introduced to classify the attacks.

Missing values have been imputed with the Fuzzy C-Means Rough Parameter method.

The discriminative features are extracted using Deep Autoencoder with more hidden layers.

Symmetry Breaking and Training from Incomplete Data with Radial Basis Boltzmann Machines

International Journal of Neural Systems ◽

10.1142/s0129065797000318 ◽

1997 ◽

Vol 08 (03) ◽

pp. 301-315 ◽

Cited By ~ 8

Author(s):

Marcel J. Nijman ◽

Hilbert J. Kappen

Keyword(s):

Symmetry Breaking ◽

Incomplete Data ◽

Missing Values ◽

Nearest Neighbor ◽

Boltzmann Machine ◽

K Nearest Neighbor ◽

Data Set ◽

Input Space ◽

Learning Rules ◽

Radial Basis

A Radial Basis Boltzmann Machine (RBBM) is a specialized Boltzmann Machine architecture that combines feed-forward mapping with probability estimation in the input space, and for which very efficient learning rules exist. The hidden representation of the network displays symmetry breaking as a function of the noise in the dynamics. Thus, generalization can be studied as a function of the noise in the neuron dynamics instead of as a function of the number of hidden units. We show that the RBBM can be seen as an elegant alternative to k-nearest neighbor, leading to comparable performance without the need to store all data. We show that the RBBM has good classification performance compared to the MLP. The main advantage of the RBBM is that, simultaneously with the input-output mapping, a model of the input space is obtained which can be used for learning with missing values. We derive learning rules for the case of incomplete data, and show that they perform better on incomplete data than the traditional learning rules on a 'repaired' data set.
