Data Missing: Recently Published Documents


TOTAL DOCUMENTS: 134 (five years: 37)

H-INDEX: 17 (five years: 2)

2022 ◽ Vol 9 (3) ◽ pp. 0-0

Missing data is a universal problem across most research fields, introducing uncertainty into data analysis. It can arise for many reasons, such as mishandled samples, observations that could not be collected, measurement errors, deletion of aberrant values, or simple gaps in the study. The nutrition field is no exception to the missing-data problem. Most often, the difficulty is addressed by computing means or medians from the existing data, an approach that needs improvement. This paper proposes a hybrid scheme of MICE and ANN, called extended ANN, to locate and analyze missing values and impute them in a given dataset. The proposed mechanism efficiently identifies blank entries and fills them by examining their neighboring records, improving the accuracy of the dataset. To validate the proposed scheme, the extended ANN is compared against several recent algorithms to assess both the efficiency and the accuracy of the results.
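The paper's extended ANN is not spelled out in the abstract, but the general idea of combining MICE-style chained imputation with a neural-network column model can be sketched with scikit-learn's IterativeImputer (a MICE-style imputer) driving an MLPRegressor. The dataset and hyperparameters below are purely illustrative, not from the paper:

```python
# Sketch: MICE-style iterative imputation with a neural-network column
# model, illustrating the MICE + ANN combination described above.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + 0.5 * X[:, 1]        # make one column predictable
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.1] = np.nan   # 10 % missing at random

# Each column with blanks is regressed on the others, round-robin,
# so a blank entry is filled from its neighboring records.
imputer = IterativeImputer(
    estimator=MLPRegressor(hidden_layer_sizes=(16,), max_iter=500,
                           random_state=0),
    max_iter=3, random_state=0)
X_filled = imputer.fit_transform(X_miss)
print(np.isnan(X_filled).sum())          # 0: every blank entry filled
```

The same skeleton accepts any regressor as the per-column model, which is what makes swapping in a custom ANN straightforward.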


2021 ◽ Vol 2021 ◽ pp. 1-12
Author(s): Wei Wei, Chongshi Gu, Xiao Fu

A large amount of data obtained by dam safety monitoring provides the basis for evaluating the dam operation state. Due to interference caused by equipment failure and human error, loss of measurement data is common, even inevitable. Most traditional data processing methods for dam monitoring ignore the actual correlation between different measurement points, which hinders the objective diagnosis of dam safety and can even lead to misdiagnosis. Further study is therefore needed on how to process missing data in dam safety monitoring. In this study, a data processing method combining partial-distance fuzzy C-means with long short-term memory (PDS-FCM-LSTM) was proposed to handle missing dam monitoring data. Measurement points of the same category deployed on the dam were clustered with the fuzzy C-means algorithm based on partial distance (PDS-FCM), which describes the membership degree of each measurement point to each cluster center, so as to determine the clustering results and preprocess the missing data of the corresponding measurement points. Then, a bidirectional long short-term memory (LSTM) network was applied to learn the pattern of changes in measurement values within each cluster, thereby processing the missing monitoring data effectively.
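The partial-distance step is the part that lets clustering proceed despite gaps: distances are computed only over observed components and rescaled by the fraction observed. A minimal sketch of the PDS-FCM membership computation (variable names and data are illustrative):

```python
# Sketch of partial-distance fuzzy C-means (PDS-FCM) memberships:
# samples may contain NaN; distance is taken over observed components
# and scaled by n / (number observed).
import numpy as np

def partial_distance(x, c):
    """Squared partial distance between sample x (may contain NaN)
    and cluster centre c."""
    obs = ~np.isnan(x)
    d2 = np.sum((x[obs] - c[obs]) ** 2)
    return (x.size / obs.sum()) * d2

def memberships(X, centres, m=2.0, eps=1e-12):
    """Fuzzy membership of each sample to each centre; rows sum to 1."""
    D = np.array([[partial_distance(x, c) for c in centres] for x in X])
    D = np.maximum(D, eps)                  # guard exact-zero distances
    inv = D ** (-1.0 / (m - 1.0))           # standard FCM update
    return inv / inv.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.1], [0.2, np.nan], [5.0, 5.1]])
C = np.array([[0.0, 0.0], [5.0, 5.0]])
U = memberships(X, C)   # sample 1 is assigned despite its missing value
```

In the full algorithm these memberships and the centres would be updated alternately until convergence; only the membership update is shown here.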


PLoS ONE ◽ 2021 ◽ Vol 16 (6) ◽ pp. e0252129
Author(s): Guobo Wang, Minglu Ma, Lili Jiang, Fengyun Chen, Liansheng Xu

Based on the missing-data characteristics and actual needs of maritime search and rescue data, multiple imputation methods were used to construct complete datasets under different missing patterns. Probability density curves and overimputation diagnostics were used to explore the effects of multiple imputation. The results showed that the Data Augmentation (DA) algorithm had high operational efficiency and a good imputation effect, but it was not suitable for imputation when the missing rate was high. The Expectation-Maximization with Bootstrap (EMB) algorithm effectively restored the distribution of datasets with different missing rates and was less affected by the missing positions; it achieved a good imputation effect even at high missing rates. Overimputation diagnostics not only reflected the imputation effect but also showed the correlation between different datasets, which is of great importance for deep data mining and for improving imputation. The EMB algorithm, however, estimated extreme values poorly and failed to reflect the dataset's variability characteristics.
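Whatever the imputation engine (DA, EMB, or another sampler), multiple imputation ends with the same pooling step: generate m completed datasets, compute the statistic on each, and combine with Rubin's rules. A sketch using scikit-learn's MICE-style sampler as a stand-in engine, on synthetic data (not the paper's maritime dataset):

```python
# Multiple-imputation sketch: m completed datasets, then Rubin's rules.
# The DA and EMB algorithms in the study are different imputation
# engines; a MICE-style posterior sampler stands in here.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[1, .8], [.8, 1]], size=300)
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.2] = np.nan   # 20 % missing rate

m = 5
estimates, variances = [], []
for k in range(m):
    imp = IterativeImputer(sample_posterior=True, random_state=k)
    Xk = imp.fit_transform(X_miss)           # k-th completed dataset
    col = Xk[:, 0]
    estimates.append(col.mean())             # statistic of interest
    variances.append(col.var(ddof=1) / len(col))

# Rubin's rules: total variance = within + (1 + 1/m) * between.
qbar = np.mean(estimates)
within = np.mean(variances)
between = np.var(estimates, ddof=1)
total_var = within + (1 + 1 / m) * between
```

The between-imputation term is what captures the extra uncertainty due to the missing data; a single imputation would omit it.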


2021 ◽ Vol 25 (3) ◽ pp. 1643-1670
Author(s): Song Shu, Hongxing Liu, Richard A. Beck, Frédéric Frappart, Johanna Korhonen, ...

Abstract. A total of 13 satellite missions have been launched since 1985, with different types of radar altimeters on board. This study makes a comprehensive evaluation of historic and currently operational satellite radar altimetry missions for lake water level retrieval over the same set of lakes, and develops a strategy for constructing consistent long-term water level records for inland lakes at global scale. The lake water level estimates produced by different retracking algorithms (retrackers) of the satellite missions were compared with gauge measurements over 12 lakes in four countries. The performance of each retracker was assessed in terms of the data missing rate, the correlation coefficient r, the bias, and the root mean square error (RMSE) between the altimetry-derived lake water level estimates and the concurrent gauge measurements. The results show that the model-free retrackers (e.g., OCOG/Ice-1/Ice) outperform the model-based retrackers for most of the missions, particularly over small lakes. Among the satellite altimetry missions, Sentinel-3 gave the best results, followed by SARAL. ENVISAT gave slightly better lake water level estimates than Jason-1 and Jason-2, but its data missing rate is higher. For small lakes, the ERS-1 and ERS-2 missions provided more accurate lake water level estimates than the TOPEX/Poseidon mission. In contrast, for large lakes, TOPEX/Poseidon is a better option due to its lower data missing rate and shorter repeat cycle. GeoSat and GeoSat Follow-On (GFO) both have an extremely high data missing rate for lake water level estimates. Although several contemporary radar altimetry missions provide more accurate lake level estimates than GFO, GeoSat was the sole radar altimetry mission between 1985 and 1990 that provided lake water level estimates.
With a full consideration of the performance and the operational duration, the best strategy for constructing long-term lake water level records should be a two-step bias correction and normalization procedure. In the first step, use Jason-2 as the initial reference to estimate the systematic biases with TOPEX/Poseidon, Jason-1, and Jason-3 and then normalize them to form a consistent TOPEX/Poseidon–Jason series. Then, use the TOPEX/Poseidon–Jason series as the reference to estimate and remove systematic biases with other radar altimetry missions to construct consistent long-term lake water level series for ungauged lakes.
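For each mission pair, the normalization step above reduces to estimating a systematic offset over the overlap period and subtracting it. A minimal synthetic sketch (mission names, levels, and noise figures are illustrative, not the paper's data):

```python
# Sketch of one bias-correction step: estimate the systematic offset
# between a reference series (e.g. Jason-2) and another mission over
# their overlap period, then shift the latter onto the same datum.
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(100)
truth = 10.0 + 0.3 * np.sin(2 * np.pi * t / 50)     # "true" lake level
jason2 = truth + rng.normal(0, 0.02, t.size)         # reference mission
other = truth + 0.15 + rng.normal(0, 0.03, t.size)   # biased mission

overlap = slice(40, 80)                              # common period
bias = np.mean(other[overlap] - jason2[overlap])     # systematic bias
other_norm = other - bias                            # normalized series
```

Chaining this step mission by mission, as the abstract describes, is what yields one consistent long-term series across altimeter generations.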


2021 ◽ Vol 2021 ◽ pp. 1-11
Author(s): Qinming Liu, Wenyi Liu, Jiajian Mei, Guojin Si, Tangbin Xia, ...

In practice, it is difficult to obtain large samples of equipment data due to equipment failure, and even small samples may contain missing values. This paper proposes a novel method for filling missing values in small samples, based on support vector regression (SVR) and a genetic algorithm (GA), to improve the effect of equipment health diagnosis. First, the genetic algorithm is used to optimize the support vector regression, yielding the GA-SVR method. The GA-SVR model is trained on the other data of the variable to which the missing value belongs, giving a single-variable prediction method. Correlation analysis is then used to reconstruct the training set, and GA-SVR is trained on the variables correlated with the missing data to obtain a multivariate prediction method. A dynamic weight combines the single-variable and multivariate predictions according to certain principles, and the missing data are filled with the combined predictions. The filled data are used as input to GA-SVM to diagnose equipment failure. Finally, a case study verifies the applicability and effectiveness of the proposed method.
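The combination step can be sketched independently of the GA: fit one SVR on the variable's own history and one on a correlated variable, then weight the two predictions by their inverse training errors. The GA hyperparameter search is omitted here, and the SVR settings and data are illustrative only:

```python
# Sketch: combine a single-variable and a multivariate SVR prediction
# with error-based dynamic weights, in the spirit of the GA-SVR scheme.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
t = np.arange(120, dtype=float)
y = np.sin(t / 10) + 0.05 * rng.normal(size=t.size)  # target variable
z = y + 0.1 * rng.normal(size=t.size)                # correlated variable

train, test = slice(0, 100), slice(100, 120)         # last 20 "missing"

# Single-variable model: predict y from its own index (time).
uni = SVR(C=10.0, gamma=0.1).fit(t[train, None], y[train])
# Multivariate model: predict y from the correlated variable z.
multi = SVR(C=10.0, gamma=1.0).fit(z[train, None], y[train])

p_uni = uni.predict(t[test, None])
p_multi = multi.predict(z[test, None])

# Dynamic weights: inverse in-sample mean squared error of each model.
e_uni = np.mean((uni.predict(t[train, None]) - y[train]) ** 2)
e_multi = np.mean((multi.predict(z[train, None]) - y[train]) ** 2)
w_uni = (1 / e_uni) / (1 / e_uni + 1 / e_multi)
combined = w_uni * p_uni + (1 - w_uni) * p_multi     # filled values
```

In the paper the weights and the SVR hyperparameters are tuned by the GA; inverse-error weighting is one simple concrete choice.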


2021 ◽ Vol 2021 ◽ pp. 1-9
Author(s): Jianqi Yu

This article first defines the hierarchical data missing pattern, a generalization of the monotone missing pattern. The multivariate Behrens–Fisher problem with hierarchical missing data is then considered, to illustrate how ideas for dealing with monotone missing data can be extended to the hierarchical missing pattern. A pivotal quantity similar to Hotelling's T^2 is presented, and the moment-matching method is used to derive its approximate distribution for testing and interval estimation. The precision of the approximation is illustrated through Monte Carlo simulation. The results indicate that the approximate method is very satisfactory even for moderately small samples.
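For context, the complete-data pivotal quantity that the paper's statistic generalizes is the Behrens–Fisher form of Hotelling's T^2, with unpooled covariances since the two populations' covariance matrices may differ. A minimal sketch on synthetic complete data (the paper's hierarchical-missing extension is not reproduced here):

```python
# Complete-data Behrens-Fisher pivotal quantity:
# T^2 = d' (S1/n1 + S2/n2)^(-1) d, with d the mean difference and
# S1, S2 the sample covariances (unpooled, unlike classical Hotelling).
import numpy as np

def bf_t2(X, Y):
    d = X.mean(axis=0) - Y.mean(axis=0)
    S = np.cov(X, rowvar=False) / len(X) + np.cov(Y, rowvar=False) / len(Y)
    return float(d @ np.linalg.solve(S, d))

rng = np.random.default_rng(4)
X = rng.normal(0.0, 1.0, size=(30, 3))   # sample 1: n1 = 30, p = 3
Y = rng.normal(0.0, 1.5, size=(40, 3))   # sample 2: different scale
t2 = bf_t2(X, Y)                         # nonnegative quadratic form
```

Under unequal covariances this statistic has no exact F distribution, which is why moment-matching approximations of the kind the paper derives are needed.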

