A Global Clustering Approach Using Hybrid Optimization for Incomplete Data Based on Interval Reconstruction of Missing Value

2015 ◽  
Vol 31 (4) ◽  
pp. 297-313 ◽  
Author(s):  
Liyong Zhang ◽  
Wei Lu ◽  
Xiaodong Liu ◽  
Witold Pedrycz ◽  
Chongquan Zhong ◽  
...  
2021 ◽  
Vol 18 (1) ◽  
pp. 22-30
Author(s):  
Erna Nurmawati ◽  
Robby Hasan Pangaribuan ◽  
Ibnu Santoso

One way to handle missing values or incomplete data is to impute them using the EM algorithm. Because large data sets must be processed quickly, parallel computing is applied to the serial EM program. In the parallel program architecture used in this study, the controller interacts only with the EM module, while the EM module itself makes intensive use of the matrix and vector modules. Parallelization is done with OpenMP in the EM module, which gives the parallel program a shorter compute time than the serial program. Compared with two threads, running with four threads increases speedup and reduces compute time, but also lowers efficiency.
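The trade-off between speedup and efficiency follows from the usual definitions: speedup is serial time divided by parallel time, and efficiency is speedup divided by the number of threads. The short Python sketch below illustrates this. The timings are hypothetical placeholders chosen for illustration and are not measurements from the study.

```python
def speedup(t_serial, t_parallel):
    """Speedup: how many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_threads):
    """Efficiency: speedup per thread (1.0 would be perfect scaling)."""
    return speedup(t_serial, t_parallel) / n_threads


# Hypothetical timings in seconds (assumed values, not taken from the paper).
t_serial = 12.0
t_parallel_by_threads = {2: 6.5, 4: 4.0}

for n_threads, t_parallel in t_parallel_by_threads.items():
    s = speedup(t_serial, t_parallel)
    e = efficiency(t_serial, t_parallel, n_threads)
    print(f"threads={n_threads}  speedup={s:.2f}  efficiency={e:.2f}")
```

With these assumed timings, four threads finish sooner and reach a higher speedup than two threads, yet each thread contributes less, so efficiency drops. This is the same pattern the abstract reports.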


Author(s):  
Hemant Rana ◽  
Manohar Lal

Handling missing attribute values is a major challenge in data analysis. Several well-known approaches address this problem, including Rough Set Theory (RST) and classification via clustering. In the work reported here, RSES (Rough Set Exploration System), a tool based on the RST approach, and WEKA (Waikato Environment for Knowledge Analysis), a data mining tool used here for classification via clustering, are applied to predict learning styles from data that may contain missing values. The experimental results show that the RST approach handles missing attribute values better than the classification-via-clustering approach. Further, RSES yields better decision rules when the missing values are simply ignored than when substitute values are assigned in their place.


2014 ◽  
Vol 39 (2) ◽  
pp. 107-127 ◽  
Author(s):  
Artur Matyja ◽  
Krzysztof Siminski

Missing values are not uncommon in real data sets. Algorithms and methods designed for the analysis of complete data sets cannot always be applied to data with missing values. To use the existing methods for complete data, data sets with missing values are preprocessed. The other solution is to create new algorithms dedicated to data sets with missing values. The objective of our research is to compare the preprocessing techniques with the specialised algorithms and to identify where each is most advantageous.


Author(s):  
Jesmeen Mohd Zebaral Hoque ◽  
Jakir Hossen ◽  
Shohel Sayeed ◽  
Chy. Mohammed Tawsif K. ◽  
Jaya Ganesan ◽  
...  

Recently, the healthcare industry has started generating large volumes of data. If hospitals can make use of this data, they could more easily predict outcomes and provide better treatment at an early stage and at low cost. Here, data analytics (DA) was used to support correct decisions through proper analysis and prediction. However, inappropriate data may lead to flawed analysis and thus to unacceptable conclusions. Hence, transforming the improper data in a data set into useful data is essential. A machine learning (ML) technique was used to overcome the issues caused by incomplete data. A new architecture, automatic missing value imputation (AMVI), was developed to predict missing values in the dataset; it includes data sampling and feature selection. Four well-known classification algorithms (logistic regression, support vector machine (SVM), AdaBoost, and random forest) were selected as prediction models. The performance of the complete AMVI architecture was evaluated using a structured data set obtained from the UCI repository. An accuracy of around 90% was achieved. Cross-validation also confirmed that the trained ML model is suitable and not over-fitted. The model is trained on the dataset itself and does not depend on a specific environment. It trains and selects the best-performing model depending on the data available.


2017 ◽  
Vol 1 (1) ◽  
pp. 13-23
Author(s):  
Frisca Rizki Ananda ◽  
Asep Saefuddin ◽  
Bagus Sartono

Cluster analysis is a method for classifying observations into several clusters. A common clustering strategy uses distance as a similarity index. However, a distance-based approach cannot be applied when the data are incomplete. To solve this problem, a Genetic Algorithm involving variance (GACV) is applied. This study employed GACV on the Iris data introduced by Sir Ronald Fisher. Incomplete data were produced by deleting some values from the Iris data. The algorithm was developed in R 3.0.2 and gave satisfying results for clustering complete data, with 95.99% sensitivity and 98% consistency. GACV can cluster observations with missing values without filling in the missing values or excluding those observations. Performance on incomplete observations is also satisfying but tends to decrease as the proportion of incomplete values increases. The proportion of incomplete values should be less than or equal to 40% to keep sensitivity and consistency at no less than 90%. Keywords: Cluster Analysis, Genetic Algorithm, Incomplete Data.
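To illustrate how a variance-based criterion can score a clustering without imputing or discarding incomplete observations, the Python sketch below computes a within-cluster sum of squared deviations using only the observed values of each feature. This is an assumption made for illustration and not the authors' GACV objective. The function name and the toy data are hypothetical, and the genetic search over candidate label assignments is omitted.

```python
import numpy as np


def within_cluster_scatter(X, labels):
    """Sum of squared deviations per cluster and feature, skipping NaN entries.

    Lower is better. A genetic algorithm could use this as the quantity to
    minimise when evolving candidate label vectors (search loop not shown).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        for j in range(X.shape[1]):
            column = members[:, j]
            observed = column[~np.isnan(column)]  # use only the available values
            if observed.size > 1:
                # size * population variance = sum of squared deviations
                total += observed.size * observed.var()
    return total


# Toy data with missing entries (hypothetical, two well-separated groups).
X = np.array([
    [1.0, 2.0],
    [1.2, np.nan],
    [np.nan, 2.1],
    [8.0, 9.0],
    [8.1, np.nan],
    [np.nan, 9.2],
])
good_labels = [0, 0, 0, 1, 1, 1]
bad_labels = [0, 1, 0, 1, 0, 1]

print(within_cluster_scatter(X, good_labels))
print(within_cluster_scatter(X, bad_labels))
```

The labelling that matches the two groups gets a much smaller score than the mixed labelling, even though every incomplete row stays in the data set.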


Author(s):  
Xiaochen Lai ◽  
Xin Liu ◽  
Liyong Zhang ◽  
Chi Lin ◽  
Mohammad S. Obaidat ◽  
...  

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Heru Nugroho ◽  
Nugraha Priya Utama ◽  
Kridanto Surendro

Missing values are one of the factors that often cause incomplete data in almost all studies, even those that are well designed and controlled. They can also decrease a study’s statistical power or result in inaccurate estimates and conclusions. Hence, data normalization and missing value handling are considered major problems in the data pre-processing stage, while classification algorithms are adopted to handle numerical features. When the observed data contain outliers, the estimated missing values are sometimes unreliable or even differ greatly from the true values. Therefore, this study proposes combining normalization and outlier removal before imputing missing values with the class center-based firefly algorithm method (ON + C3FA). Standard imputation techniques such as mean, random value, regression, multiple imputation, KNN imputation, and decision tree (DT)-based missing value imputation were used for comparison with the proposed method. Experimental results on the sonar dataset showed the effect of normalization and outlier removal on the methods. With the proposed method (ON + C3FA), AUC, accuracy, F1-Score, Precision, Recall, and AUC-PR were 0.972, 0.906, 0.906, 0.908, 0.906, and 0.61, respectively. The results showed that combining normalization and outlier removal in C3-FA (ON + C3FA) was an efficient technique for recovering actual data when handling missing values, and it also outperformed the methods of previous studies, with r and RMSE values of 0.935 and 0.02. Meanwhile, the Dks value obtained from this technique was 0.04, which indicated that it could maintain the values or distribution accuracy.
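The order of operations described above (normalize, set outliers aside, then impute from class centers) can be sketched in Python as follows. This is a simplified illustration under stated assumptions and not the ON + C3FA method itself. It assumes min-max normalization and a 1.5 × IQR outlier rule, and it fills gaps with the plain class mean. The firefly optimization step is omitted entirely, and all function names are hypothetical.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each column to [0, 1], ignoring NaN entries (assumed scheme)."""
    lo = np.nanmin(X, axis=0)
    hi = np.nanmax(X, axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span


def iqr_outlier_mask(X, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] per column (assumed rule)."""
    q1 = np.nanpercentile(X, 25, axis=0)
    q3 = np.nanpercentile(X, 75, axis=0)
    iqr = q3 - q1
    return (X < q1 - k * iqr) | (X > q3 + k * iqr)


def class_center_impute(X, y):
    """Normalize, exclude outliers from the class centers, then fill the gaps.

    Outliers are only left out when computing the centers; the observed
    values themselves are kept. A class column with no usable values stays NaN.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Xn = min_max_normalize(X)
    trusted = np.where(iqr_outlier_mask(Xn), np.nan, Xn)
    filled = Xn.copy()
    for label in np.unique(y):
        rows = y == label
        center = np.nanmean(trusted[rows], axis=0)  # class center per feature
        block = filled[rows]
        gaps = np.isnan(block)
        block[gaps] = np.broadcast_to(center, block.shape)[gaps]
        filled[rows] = block
    return filled


# Toy data (hypothetical): the value 100.0 is an outlier in class 0.
X = np.array([
    [1.0, 10.0], [2.0, np.nan], [3.0, 12.0], [100.0, 11.0], [np.nan, 11.0],
    [5.0, 50.0], [6.0, 52.0], [4.0, 51.0],
])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1])
print(class_center_impute(X, y))
```

In the toy data, the missing first feature in class 0 is filled from the center of 1.0, 2.0, and 3.0 only, so the outlier 100.0 does not pull the imputed value upward.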

