A Two-stage Deep Autoencoder-based Missing Data Imputation Method for Wind Farm SCADA Data

The accuracy and completeness of wind farm data are of great significance in wind power research. Wind farm data become distorted or go missing during collection and transmission, which greatly reduces their accuracy and integrity, so the data need outlier detection and missing data imputation. This paper detects outliers with a statistical method based on the 3σ criterion under the normal distribution. It then uses nearest-distance interpolation and regression interpolation to replace outliers and fill in missing data. After filling, the completeness and accuracy of the data are improved.
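The abstract combines a 3σ outlier rule with two gap-filling strategies. The following is a minimal, stdlib-only sketch of that pipeline on a single 1-D series, not the authors' implementation. It rests on three assumptions. "Nearest-distance interpolation" is read as copying the closest observed sample in time. "Regression interpolation" is read as a least-squares line fitted over the time index. Outliers are masked to `None` so that both fill functions treat them as gaps. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional

Series = List[Optional[float]]


def mask_outliers_3sigma(values: List[float]) -> Series:
    """Replace points more than 3 population standard deviations from the mean with None."""
    mu = mean(values)
    sigma = pstdev(values)
    return [None if abs(v - mu) > 3 * sigma else v for v in values]


def nearest_fill(series: Series) -> List[float]:
    """Fill each gap with the closest observed sample (ties go to the earlier sample)."""
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled = []
    for i, v in enumerate(series):
        if v is None:
            # Closest observed index by distance, then by position for ties.
            j = min(observed, key=lambda k: (abs(k - i), k))
            v = series[j]
        filled.append(v)
    return filled


def regression_fill(series: Series) -> List[float]:
    """Fill gaps from an ordinary least-squares line y = a + b*t fitted on observed points."""
    pts = [(t, y) for t, y in enumerate(series) if y is not None]
    if len(pts) < 2:
        raise ValueError("need at least two observed values to fit a line")
    n = len(pts)
    t_bar = sum(t for t, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    sxx = sum((t - t_bar) ** 2 for t, _ in pts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in pts)
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return [y if y is not None else intercept + slope * t for t, y in enumerate(series)]


if __name__ == "__main__":
    raw = [10.0] * 19 + [100.0]          # one spike in an otherwise flat signal
    cleaned = mask_outliers_3sigma(raw)  # the spike becomes None
    print(nearest_fill(cleaned)[-1])     # filled from the neighbouring sample
    print(regression_fill([1.0, 2.0, 3.0, None, 5.0]))
```

A single-pass 3σ rule uses a mean and standard deviation that the outliers themselves inflate. On short series, an isolated spike may therefore fall inside the 3σ band and go undetected. Real SCADA pipelines often apply the rule per variable over a rolling window.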

A simple and efficient incremental missing data imputation method for evolving neo-fuzzy network

Evolving Systems ◽

10.1007/s12530-021-09376-3 ◽

2021 ◽

Author(s):

Giovanni Amormino da Silva Júnior ◽

Alisson Marques da Silva

Keyword(s):

Missing Data ◽

Imputation Method ◽

Data Imputation ◽

Missing Data Imputation ◽

Fuzzy Network

Advanced methods for missing values imputation based on similarity learning

PeerJ Computer Science ◽

10.7717/peerj-cs.619 ◽

2021 ◽

Vol 7 ◽

pp. e619

Author(s):

Khaled M. Fouad ◽

Mahmoud M. Ismail ◽

Ahmad Taher Azar ◽

Mona M. Arafa

Keyword(s):

Missing Data ◽

Missing Values ◽

Imputation Accuracy ◽

Nearest Neighbors ◽

Imputation Method ◽

Data Imputation ◽

K Nearest Neighbors ◽

Missing Data Imputation ◽

K Value ◽

Imputation Methods

Real-world data analysis and processing with data mining techniques often encounter observations that contain missing values, and missing values are the main challenge in mining such datasets. They should be imputed to improve the accuracy and performance of data mining methods. Existing techniques use the k-nearest neighbors algorithm to impute missing values, but choosing an appropriate k value can be difficult. Other existing imputation techniques are based on hard clustering algorithms. When records are not well separated, as is the case with missing data, hard clustering is often a poor description tool. In general, imputation based on similar records is more accurate than imputation based on all of the dataset's records, so increasing the similarity among the records used can improve imputation performance. This paper proposes two numerical missing data imputation methods. The first, a hybrid method called KI, combines the k-nearest neighbors (kNN) and iterative imputation algorithms. kNN finds the best set of nearest neighbors for each incomplete record based on record similarity, and a suitable k value is estimated automatically to improve that similarity. The iterative imputation method then imputes the missing values of the incomplete records using the global correlation structure among the selected records. The second, an enhanced hybrid method called FCKI, extends KI by integrating fuzzy c-means, k-nearest neighbors, and iterative imputation. Fuzzy c-means is chosen because records can belong to multiple clusters at the same time, which can further improve similarity.
FCKI searches within a cluster, instead of the whole dataset, to find the best k nearest neighbors, so it applies two levels of similarity to achieve higher imputation accuracy. The performance of the proposed techniques is assessed on fifteen datasets of different sizes with varying missing ratios for three types of missing data: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). These missing data types are generated in this work. The proposed techniques are compared with other missing data imputation methods using three measures: the root mean square error (RMSE), the normalized root mean square error (NRMSE), and the mean absolute error (MAE). The results show that the proposed methods achieve better imputation accuracy and require significantly less time than the other missing data imputation methods.
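The following is a minimal, stdlib-only sketch of the two ingredients the abstract relies on: kNN-based imputation and the RMSE/NRMSE/MAE scores. It is a plain kNN mean imputer, not KI or FCKI. It uses a fixed `k`, with no automatic k estimation, no iterative imputation step, and no fuzzy c-means clustering. Several choices here are assumptions: donors are drawn only from complete records, and distance uses the features that both records observe, rescaled to full dimensionality. NRMSE is normalized by the range of the true values, though some papers divide by the standard deviation instead. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]


def _distance(a: Row, b: Row) -> float:
    """Euclidean distance over features observed in both rows, rescaled to full width."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(sq * len(a) / len(shared))


def knn_impute(rows: List[Row], k: int) -> List[List[float]]:
    """Fill each missing cell with the mean of that feature over the k nearest complete rows."""
    complete = [r for r in rows if all(v is not None for v in r)]
    if not complete:
        raise ValueError("no complete rows to use as donors")
    out = []
    for r in rows:
        if all(v is not None for v in r):
            out.append(list(r))
            continue
        donors = sorted(complete, key=lambda c: _distance(r, c))[:k]
        out.append([
            v if v is not None else sum(d[j] for d in donors) / len(donors)
            for j, v in enumerate(r)
        ])
    return out


def rmse(true: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def mae(true: Sequence[float], pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


def nrmse(true: Sequence[float], pred: Sequence[float]) -> float:
    """RMSE divided by the range of the true values (one common normalization)."""
    spread = max(true) - min(true)
    if spread == 0:
        raise ValueError("true values have zero range")
    return rmse(true, pred) / spread


if __name__ == "__main__":
    data = [[1.0, 2.0], [1.1, 2.1], [5.0, 9.0], [1.05, None]]
    print(knn_impute(data, k=2)[3])  # second feature filled from the two closest rows
```

To score an imputer, values are typically hidden from a complete dataset under an MCAR, MAR, or MNAR mechanism and then imputed. The metrics are computed only on the hidden cells.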

A Hybrid Missing Data Imputation Method for Constructing City Mobility Indices

Communications in Computer and Information Science - Data Mining ◽

10.1007/978-981-13-6661-1_11 ◽

2019 ◽

pp. 135-148

Author(s):

Sanaz Nikfalazar ◽

Chung-Hsing Yeh ◽

Susan Bedingfield ◽

Hadi Akbarzadeh Khorshidi

Keyword(s):

Missing Data ◽

Imputation Method ◽

Data Imputation ◽

Missing Data Imputation

Missing Data Imputation Method for Autism Prediction

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d4551.018520 ◽

2020 ◽

Vol 8 (5) ◽

pp. 940-944

Keyword(s):

Machine Learning ◽

Missing Data ◽

Missing Values ◽

Imputation Method ◽

Support Vector ◽

Data Imputation ◽

Missing Data Imputation ◽

Imputation Methods ◽

Significant Difference ◽

Friedman's Test

Missing data imputation is an essential task because removing all records with missing values discards useful information from other attributes. This paper estimates prediction performance on an autism dataset with imputed missing values. Statistical imputation methods, such as mean imputation and imputation with zero or a constant, and machine learning imputation methods, such as K-nearest neighbour and chained equation methods, were compared with the proposed deep learning imputation method. Predictions for patients with autistic spectrum disorder were evaluated using a support vector machine on each imputed dataset. Among the imputation methods, the deep learning algorithm outperformed the statistical and machine learning methods. This result is supported by the significant differences in p-values revealed by Friedman's test.
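The following sketch shows, under stated assumptions, two pieces of the comparison described above. The first is the statistical baselines: column-mean imputation and zero/constant imputation. The second is the Friedman statistic used to test whether methods differ across blocks, such as datasets or cross-validation folds. It computes the uncorrected statistic χ²_F = 12·ΣR_j² / (n·k·(k+1)) − 3·n·(k+1). Here R_j is method j's rank sum over n blocks and k methods, and tied scores receive average ranks. No tie correction or p-value is computed. The statistic is referred to a chi-square distribution with k − 1 degrees of freedom, and SciPy's `scipy.stats.friedmanchisquare` provides the full test with a p-value. The function names are hypothetical.

```python
from typing import List, Optional, Sequence

Column = List[Optional[float]]


def mean_impute(column: Column) -> List[float]:
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    m = sum(observed) / len(observed)
    return [m if v is None else v for v in column]


def constant_impute(column: Column, fill: float = 0.0) -> List[float]:
    """Replace missing entries with a fixed constant (zero by default)."""
    return [fill if v is None else v for v in column]


def average_ranks(scores: Sequence[float]) -> List[float]:
    """1-based ascending ranks; tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def friedman_statistic(table: Sequence[Sequence[float]]) -> float:
    """table[b][j] is the score of method j on block b; returns the uncorrected chi-square."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)


if __name__ == "__main__":
    # Hypothetical accuracies of three methods on three blocks (not from the paper).
    scores = [[0.70, 0.80, 0.90], [0.60, 0.75, 0.95], [0.65, 0.70, 0.85]]
    print(friedman_statistic(scores))
```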