Takagi-Sugeno Modeling of Incomplete Data for Missing Value Imputation With the Use of Alternate Learning

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 83633-83644
Author(s):  
Xiaochen Lai ◽  
Liyong Zhang ◽  
Xin Liu
Author(s):  
Jesmeen Mohd Zebaral Hoque ◽  
Jakir Hossen ◽  
Shohel Sayeed ◽  
Chy. Mohammed Tawsif K. ◽  
Jaya Ganesan ◽  
...  

Recently, the healthcare industry has begun generating large volumes of data. If hospitals can employ these data, they can predict outcomes and provide better treatment at early stages and at low cost. Data analytics (DA) is used to make correct decisions through proper analysis and prediction. However, inappropriate data may lead to flawed analysis and thus yield unacceptable conclusions. Hence, transforming the improper data in a data set into useful data is essential. Machine learning (ML) techniques were used to overcome the issues caused by incomplete data. A new architecture, automatic missing value imputation (AMVI), was developed to predict missing values in the dataset, incorporating data sampling and feature selection. Four prediction models (logistic regression, support vector machine (SVM), AdaBoost, and random forest) were selected from among well-known classification algorithms. The performance of the complete AMVI architecture was evaluated using a structured data set obtained from the UCI repository, achieving accuracy of around 90%. Cross-validation also confirmed that the trained ML model is suitable and not over-fitted. The trained model is built from the dataset itself and is not dependent on a specific environment; it trains and selects the best-performing model based on the data available.
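The abstract describes predicting missing values with trained models rather than filling them with a constant. As a minimal sketch of that idea (not the AMVI architecture itself; the function name, toy data, and column indices are illustrative assumptions, and a one-variable least-squares fit stands in for the four classifiers), a feature with gaps can be regressed on a fully observed feature over the complete rows:

```python
def impute_by_regression(rows, target_idx, predictor_idx):
    """Fill missing values (None) in column target_idx by regressing it
    on column predictor_idx over the rows where both are observed.
    Assumes the predictor column itself is complete."""
    pairs = [(r[predictor_idx], r[target_idx]) for r in rows
             if r[target_idx] is not None and r[predictor_idx] is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    for r in rows:
        if r[target_idx] is None:
            # Predict the missing entry from the observed predictor value.
            r[target_idx] = intercept + slope * r[predictor_idx]
    return rows

data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.1], [4.0, None]]
impute_by_regression(data, target_idx=1, predictor_idx=0)
```

In AMVI the imputation model is chosen among the four named classifiers and is preceded by data sampling and feature selection; the point of the sketch is only the model-based prediction step.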


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Heru Nugroho ◽  
Nugraha Priya Utama ◽  
Kridanto Surendro

Abstract
A missing value is one of the factors that often causes incomplete data in almost all studies, even those that are well designed and controlled. Missing values can also decrease a study's statistical power or lead to inaccurate estimates and conclusions. Hence, data normalization and missing value handling are considered major problems in the data pre-processing stage, while classification algorithms are adopted to handle numerical features. When the observed data contain outliers, the estimated missing values are sometimes unreliable, or even differ greatly from the true values. Therefore, this study proposes combining normalization and outlier removal before imputing missing values with the class center-based firefly algorithm method (ON + C3FA). Several standard imputation techniques, namely mean, random value, regression, multiple imputation, kNN imputation, and decision tree (DT)-based missing value imputation, were used for comparison with the proposed method. Experimental results on the sonar dataset showed the effect of normalization and outlier removal on the methods. With the proposed method (ON + C3FA), the AUC, accuracy, F1-score, precision, recall, and AUC-PR were 0.972, 0.906, 0.906, 0.908, 0.906, and 0.61, respectively. The results showed that combining normalization and outlier removal in C3-FA (ON + C3FA) is an efficient technique for recovering actual data when handling missing values, and it outperformed the methods of previous studies, with r and RMSE values of 0.935 and 0.02. Meanwhile, the Dks value obtained with this technique was 0.04, indicating that it can maintain the accuracy of the values and their distribution.
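The pre-processing order described above (outlier removal and normalization before imputation) can be illustrated with a plain-Python sketch. This is generic min-max scaling, IQR-based outlier masking, and mean imputation, not the class center-based firefly algorithm itself; the quartile computation is deliberately rough, and masking outliers as missing (so the imputer re-estimates them) is an assumption made here for illustration:

```python
import statistics

def remove_outliers_iqr(col, k=1.5):
    """Mark values outside [Q1 - k*IQR, Q3 + k*IQR] as missing (None).
    Quartiles are approximated by simple rank indexing."""
    vals = sorted(v for v in col if v is not None)
    q1 = vals[len(vals) // 4]
    q3 = vals[(3 * len(vals)) // 4]
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v if v is None or lo <= v <= hi else None for v in col]

def minmax_normalize(col):
    """Scale observed values to [0, 1], leaving missing entries as None."""
    vals = [v for v in col if v is not None]
    lo, hi = min(vals), max(vals)
    span = hi - lo or 1.0  # avoid division by zero for constant columns
    return [None if v is None else (v - lo) / span for v in col]

def mean_impute(col):
    """Replace missing entries with the mean of the observed values."""
    m = statistics.mean(v for v in col if v is not None)
    return [m if v is None else v for v in col]

col = [1.0, 2.0, 3.0, 4.0, 100.0, None]
clean = mean_impute(minmax_normalize(remove_outliers_iqr(col)))
```

Here the extreme value 100.0 is masked by the IQR step, the rest of the column is scaled to [0, 1], and both the original gap and the masked outlier are filled with the column mean.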


2018 ◽  
Author(s):  
Stefan Bischof ◽  
Andreas Harth ◽  
Benedikt Kämpgen ◽  
Axel Polleres ◽  
Patrik Schneider

Author(s):  
Caio Ribeiro ◽  
Alex A. Freitas

Abstract
Longitudinal datasets from human ageing studies usually have a high volume of missing data, and one way to handle missing values in a dataset is to replace them with estimates. However, there are many methods for estimating missing values, and no single method is best for all datasets. In this article, we propose a data-driven missing value imputation approach that performs a feature-wise selection of the best imputation method, using known information in the dataset to rank the five methods we selected based on their estimation error rates. We evaluated the proposed approach in two sets of experiments: a classifier-independent scenario, where we compared the applicability and error rates of each imputation method; and a classifier-dependent scenario, where we compared the predictive accuracy of Random Forest classifiers generated from datasets prepared with each imputation method and with a baseline approach of doing no imputation (letting the classification algorithm handle the missing values internally). Based on the results from both sets of experiments, we conclude that the proposed data-driven missing value imputation approach generally resulted in more accurate estimates for missing data and better-performing classifiers on longitudinal datasets of human ageing. We also observed that imputation methods devised specifically for longitudinal data produced very accurate estimates. This reinforces the idea that using the temporal information intrinsic to longitudinal data is a worthwhile endeavour for machine learning applications, and one that can be achieved through the proposed data-driven approach.
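The feature-wise "pick the best imputer" idea above can be sketched by hiding a fraction of the observed values in a feature, refilling them with each candidate method, and ranking the candidates by their estimation error on the hidden entries. This is a schematic version under stated assumptions: two toy imputers (mean, and last-observation-carried-forward as a crude stand-in for a longitudinal method) replace the five methods of the paper, and mean absolute error replaces whatever error measure the authors used:

```python
import random

def rank_imputers(values, imputers, mask_frac=0.2, seed=0):
    """Hide a fraction of the observed values in one feature, let each
    candidate imputer fill them back in, and rank imputers by mean
    absolute estimation error on the hidden entries (best first)."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(values) if v is not None]
    hidden = rng.sample(observed, max(1, int(len(observed) * mask_frac)))
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    scores = {}
    for name, impute in imputers.items():
        filled = impute(masked)
        scores[name] = sum(abs(filled[i] - values[i]) for i in hidden) / len(hidden)
    return sorted(scores.items(), key=lambda kv: kv[1])

def mean_imputer(col):
    obs = [v for v in col if v is not None]
    m = sum(obs) / len(obs)
    return [m if v is None else v for v in col]

def last_observed_imputer(col):
    # Carry the most recent observed value forward, a simple
    # longitudinal-style imputer; fall back to 0.0 before the
    # first observation.
    out, last = [], 0.0
    for v in col:
        last = last if v is None else v
        out.append(last)
    return out
```

The winning imputer for each feature would then be refit on the actually missing entries; repeating this per feature gives the feature-wise selection described in the abstract.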


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Tressy Thomas ◽  
Enayat Rajabi

Purpose The primary aim of this study is to review studies on data imputation from different dimensions, including the type of method, the experimentation setup and the evaluation metrics used in the novel approaches proposed, particularly in the machine learning (ML) area. This ultimately provides an understanding of how well the proposed frameworks are evaluated and what types and ratios of missingness are addressed in the proposals. The review questions in this study are: (1) What ML-based imputation methods were studied and proposed during 2010–2020? (2) How were the experimentation setup, the characteristics of the data sets and the missingness employed in these studies? (3) What metrics were used to evaluate the imputation methods? Design/methodology/approach The review followed the standard identification, screening and selection process. The initial search of electronic databases for missing value imputation (MVI) based on ML algorithms returned a large number of papers, totaling 2,883. Most of the papers at this stage were not MVI techniques relevant to this study. The papers were first screened by title for relevance, and 306 were identified as appropriate. Upon reviewing the abstracts, 151 papers not eligible for this study were dropped, leaving 155 research papers for full-text review. Of these, 117 papers were used to assess the review questions. Findings This study shows that clustering- and instance-based algorithms are the most frequently proposed MVI methods. Percentage of correct prediction (PCP) and root mean square error (RMSE) are the most used evaluation metrics in these studies. For experimentation, the majority of the studies sourced data sets from publicly available repositories. A common approach is to use the complete data set as a baseline to evaluate the effectiveness of imputation on test data sets with artificially induced missingness. Data set size and missingness ratio varied across the experimentations, while the missing data type and mechanism pertain to the capability of the imputation. Computational expense is a concern, and experimentation using large data sets appears to be a challenge. Originality/value It is understood from the review that there is no single universal solution to the missing data problem. Variants of ML approaches work well with missingness depending on the characteristics of the data set. Most of the methods reviewed lack generalization with regard to applicability. Another concern related to applicability is the complexity of the formulation and implementation of the algorithm. Imputation based on k-nearest neighbors (kNN) and clustering algorithms, which are simple and easy to implement, is popular across various domains.
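The common evaluation protocol noted in the findings (a complete data set as baseline, artificially induced missingness, then RMSE on the hidden entries) can be sketched as follows; the masking scheme, missingness rate and seed here are illustrative assumptions, not a protocol taken from any single reviewed study:

```python
import math
import random

def rmse_of_imputation(truth, impute, miss_rate=0.3, seed=42):
    """Knock values out of a complete column at random, impute them,
    and score the imputer by RMSE on the knocked-out positions."""
    rng = random.Random(seed)
    n_hidden = max(1, round(len(truth) * miss_rate))
    hidden = rng.sample(range(len(truth)), n_hidden)
    masked = [None if i in hidden else v for i, v in enumerate(truth)]
    filled = impute(masked)
    return math.sqrt(sum((filled[i] - truth[i]) ** 2 for i in hidden) / n_hidden)

def mean_imputer(col):
    """Baseline imputer: fill gaps with the mean of the observed values."""
    obs = [v for v in col if v is not None]
    m = sum(obs) / len(obs)
    return [m if v is None else v for v in col]
```

Because the ground truth is known for the induced gaps, any imputer can be scored this way; PCP would replace RMSE for categorical features.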


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Sonia Goel ◽  
Meena Tushir

Purpose In real-world decision-making, high-accuracy data analysis is essential in a ubiquitous environment. However, missing data are often encountered while collecting user-related information because of various privacy concerns on the part of users. This paper deals with incomplete data for fuzzy model identification and proposes a new method of parameter estimation for a Takagi–Sugeno model in the presence of missing features. Design/methodology/approach In this work, the authors propose a three-fold approach to fuzzy model identification: an imputation-based linear interpolation technique is used to estimate the missing features of the data; fuzzy c-means clustering is then used to determine the optimal number of rules and the parameters of the membership functions of the fuzzy model; finally, all antecedent and consequent parameters, along with the widths of the antecedent (Gaussian) membership functions, are optimized by a gradient descent algorithm based on minimization of the root mean square error. Findings The proposed method is tested on two well-known simulation examples as well as on a real data set, and its performance is compared with some traditional methods. The result analysis and statistical analysis show that the proposed model achieves a considerable improvement in accuracy in the presence of varying degrees of data incompleteness. Originality/value The proposed method works well for fuzzy model identification, offering a new method of parameter estimation for a Takagi–Sugeno model in the presence of missing features, across varying degrees of missing data, as compared with some well-known methods.
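The first step of the three-fold approach, imputation by linear interpolation, can be sketched in a few lines. This is a generic interpolation over an ordered feature vector, not the authors' exact formulation; the edge handling (nearest-observed-value fill for leading and trailing gaps) is an assumption made here, not taken from the paper:

```python
def interpolate_missing(series):
    """Linearly interpolate None gaps in an ordered feature series.
    Gaps before the first or after the last observation are filled
    with the nearest observed value."""
    out = list(series)
    obs = [i for i, v in enumerate(out) if v is not None]
    for i, v in enumerate(out):
        if v is not None:
            continue
        left = max((j for j in obs if j < i), default=None)
        right = min((j for j in obs if j > i), default=None)
        if left is None:
            out[i] = out[right]      # leading gap: copy first observation
        elif right is None:
            out[i] = out[left]       # trailing gap: copy last observation
        else:
            t = (i - left) / (right - left)
            out[i] = out[left] + t * (out[right] - out[left])
    return out
```

The completed data would then feed the fuzzy c-means clustering and gradient descent steps described above.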

