Imputation Methods Outperform Missing-Indicator for Data Missing Completely at Random

Author(s):  
Antonio Pereira Barata ◽  
Frank W. Takes ◽  
H. Jaap van den Herik ◽  
Cor J. Veenman
PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0252129
Author(s):  
Guobo Wang ◽  
Minglu Ma ◽  
Lili Jiang ◽  
Fengyun Chen ◽  
Liansheng Xu

Multiple imputation methods were used to construct complete datasets from maritime search and rescue data under different missing patterns, reflecting how the data are actually missing and what is needed in practice. Probability density curves and overimputation diagnostics were used to assess the quality of the imputations. The results showed that the Data Augmentation (DA) algorithm was computationally efficient and imputed well, but was not suitable when the missing rate was high. The Expectation-Maximization with Bootstrap (EMB) algorithm effectively restored the distribution of datasets across different missing rates, was less affected by the position of the missing values, and imputed well even at high missing rates. However, the EMB algorithm estimated extreme values poorly and failed to reflect the dataset’s variability. Overimputation diagnostics not only reflected imputation quality but also showed the correlation between different datasets, which is important for deeper data mining and for improving imputation.
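The idea behind an overimputation diagnostic can be sketched as follows: each observed value is hidden in turn, re-imputed, and compared with its true value. This is only an illustration. A plain mean imputer stands in for the DA and EMB algorithms named in the abstract, and the function names and toy data are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Column = List[Optional[float]]


def mean_imputer(values: Column) -> List[float]:
    """Stand-in imputer (NOT EMB or DA): fill gaps with the observed mean."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def overimpute(values: Column,
               imputer: Callable[[Column], List[float]]) -> List[Tuple[float, float]]:
    """Hide each observed value in turn, re-impute it, and pair it with the truth."""
    pairs = []
    for i, v in enumerate(values):
        if v is None:
            continue  # only observed cells can be checked against a known value
        masked = values[:i] + [None] + values[i + 1:]
        pairs.append((v, imputer(masked)[i]))
    return pairs


# Hypothetical toy column with one genuine gap and one extreme value.
data: Column = [1.0, 2.0, None, 3.0, 10.0]
pairs = overimpute(data, mean_imputer)
errors = [abs(observed - imputed) for observed, imputed in pairs]
```

In this toy column the largest error belongs to the extreme value 10.0, which is the kind of weakness such a diagnostic is meant to expose.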


2022 ◽  
Vol 9 (3) ◽  
pp. 0-0

Missing data is a pervasive complication in most research fields and introduces uncertainty into data analysis. It can arise for many reasons, such as mishandled samples, observations that could not be collected, measurement errors, deleted aberrant values, or simply a lack of study. The field of nutrition is no exception. Most often, the problem is handled by computing means or medians from the existing data, an approach that needs improvement. The paper proposes a hybrid scheme combining MICE and ANN, referred to as extended ANN, to locate and analyze missing values and impute them in a given dataset. The proposed mechanism analyzes the blank entries and fills them by examining their neighboring records, in order to improve the accuracy of the dataset. To validate the scheme, the extended ANN is compared against various recent algorithms in terms of both efficiency and accuracy.


Methodology ◽  
2017 ◽  
Vol 13 (2) ◽  
pp. 41-60
Author(s):  
Shahab Jolani ◽  
Maryam Safarkhani

Abstract. In randomized controlled trials (RCTs), a common strategy to increase power to detect a treatment effect is adjustment for baseline covariates. However, adjustment with partly missing covariates, where only complete cases are used, is inefficient. We consider different alternatives in trials with discrete-time survival data, where subjects are measured in discrete-time intervals while they may experience an event at any point in time. The results of a Monte Carlo simulation study, as well as a case study of randomized trials in smokers with attention deficit hyperactivity disorder (ADHD), indicated that single and multiple imputation methods outperform the other methods and increase precision in estimating the treatment effect. The missing indicator method, which adds a dummy variable to the statistical model to indicate whether the value of a covariate is missing and sets all missing values of that covariate to the same value, is comparable to the imputation methods. Nevertheless, the power to detect the treatment effect with the missing indicator method is marginally lower than with the imputation methods, particularly when the missingness depends on the outcome. In conclusion, it appears that imputation of partly missing (baseline) covariates should be preferred in the analysis of discrete-time survival data.
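The missing indicator method described in this abstract can be sketched in a few lines, next to a simple single (mean) imputation for comparison. This is a minimal illustration, not the authors' implementation. The function names, the fill value of 0.0, and the toy covariate are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Column = List[Optional[float]]


def missing_indicator(values: Column, fill: float = 0.0) -> Tuple[List[float], List[int]]:
    """Set every missing value to the same constant and return a 0/1 dummy
    marking which entries were missing. Both columns would enter the model."""
    filled = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator


def mean_impute(values: Column) -> List[float]:
    """Single imputation: replace each missing value with the observed mean."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical baseline covariate with two missing entries.
baseline: Column = [2.0, None, 4.0, None, 6.0]
filled, indicator = missing_indicator(baseline)
imputed = mean_impute(baseline)
```

With the indicator approach the model receives both `filled` and `indicator` as covariates, so the constant used for `fill` is absorbed by the coefficient of the dummy variable.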


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Hanji He ◽  
Guangming Deng

We extend mean empirical likelihood inference to the response mean when data are missing at random. Empirical likelihood ratio confidence regions perform poorly when the response is missing at random, especially when the covariate is high-dimensional and the sample size is small. Hence, we develop three bias-corrected mean empirical likelihood approaches to obtain efficient inference for the response mean. For the three bias-corrected estimating equations, we obtain a new set of equations by constructing a pairwise-mean dataset. This increases the sample size available for estimation and reduces the impact of the curse of dimensionality. Consistency and asymptotic normality of the maximum mean empirical likelihood estimators are established. The finite-sample performance of the proposed estimators is examined through simulation, and an application to the Boston Housing dataset is shown.
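How a pairwise-mean dataset enlarges the sample can be sketched as below. The abstract does not define the construction, so this assumes the dataset consists of the averages of all pairs (X_i, X_j) with i <= j, which turns n points into n(n+1)/2. The function name and toy sample are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import List


def pairwise_means(sample: List[float]) -> List[float]:
    """Average every pair (x_i, x_j) with i <= j (assumed construction)."""
    return [(a + b) / 2 for a, b in combinations_with_replacement(sample, 2)]


# Hypothetical sample of n = 3 points.
sample = [1.0, 3.0, 5.0]
augmented = pairwise_means(sample)

n = len(sample)
original_mean = sum(sample) / n
augmented_mean = sum(augmented) / len(augmented)
```

Under this construction the augmented dataset has the same mean as the original sample, so estimating equations for the mean can be evaluated on more points without shifting their target.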


2021 ◽  
Author(s):  
M. B. Mohammed ◽  
H. S. Zulkafli ◽  
M. B. Adam ◽  
N. Ali ◽  
I. A. Baba
