Single Imputation Methods

Abstract

Objective: Food frequency questionnaires (FFQs) are a popular method of capturing dietary information in epidemiological studies and may be used to derive dietary exposures such as nutrient intake or overall dietary patterns and diet quality. Because FFQs can involve large numbers of questions, participants may not answer every question, leaving researchers to decide how to deal with missing data when deriving intake measures. The aim of the present commentary is to discuss current practice for dealing with item non-response in FFQs and to propose a research agenda for reporting and handling missing data in FFQs.

Results: Single imputation techniques, such as zero imputation (assuming no consumption of the item) or mean imputation, are commonly used to deal with item non-response in FFQs. However, single imputation methods make strong assumptions about the missing data mechanism and do not reflect the uncertainty created by the missing data. This can lead to incorrect inference about associations between diet and health outcomes. Although the use of multiple imputation methods in epidemiology has increased, these have seldom been used in nutritional epidemiology to address missing data in FFQs. We discuss methods for dealing with item non-response in FFQs, highlighting the assumptions made under each approach.

Conclusions: Researchers analysing FFQs should ensure that missing data are handled appropriately and clearly report how missing data were treated in analyses. Simulation studies are required to enable systematic evaluation of the utility of various methods for handling item non-response in FFQs under different assumptions about the missing data mechanism.
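A minimal Python sketch of the two single imputation techniques named above, using a hypothetical FFQ item (servings per week) with three skipped responses. The data and function names are illustrative assumptions, not taken from the commentary. It makes the assumption behind each method explicit, and shows that the filled list is afterwards treated as complete, which is why single imputation does not reflect the uncertainty caused by the missing answers.

```python
from statistics import mean, pstdev

# Hypothetical FFQ item: servings per week of one food,
# with None marking a question the participant skipped.
responses = [3, None, 7, 0, None, 5, 2, None, 4, 6]


def zero_impute(values):
    """Zero imputation: assume a skipped item means the food is never eaten."""
    return [0 if v is None else v for v in values]


def mean_impute(values):
    """Mean imputation: replace each skipped item with the observed mean."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


observed = [v for v in responses if v is not None]
print(f"observed only: mean={mean(observed):.2f}, sd={pstdev(observed):.2f}")
for label, filled in [("zero", zero_impute(responses)), ("mean", mean_impute(responses))]:
    # Each filled list is then analysed as if fully observed, so downstream
    # standard errors ignore the fact that three values were guessed.
    print(f"{label} imputation: mean={mean(filled):.2f}, sd={pstdev(filled):.2f}")
```

With these numbers, zero imputation lowers the mean intake relative to the observed responses, while mean imputation preserves the observed mean but shrinks the standard deviation, because every filled value sits exactly at the mean.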

Filling the Missing Data of Air Pollutant Concentration Using Single Imputation Methods

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.754-755.923 ◽

2015 ◽

Vol 754-755 ◽

pp. 923-932 ◽

Cited By ~ 2

Author(s):

Norazian Mohamed Noor ◽

A.S. Yahaya ◽

N.A. Ramli ◽

Mohd Mustafa Al Bakri Abdullah

Keyword(s):

Missing Data ◽

Missing Values ◽

Mean Squared Error ◽

Absolute Error ◽

Linear Interpolation ◽

Peninsular Malaysia ◽

Air Pollutant ◽

Imputation Methods ◽

Single Imputation ◽

Averaging Time

Hourly PM10 concentrations measured at eight monitoring stations in Peninsular Malaysia in 2006 were used to simulate missing data. The gap lengths of the simulated missing values were limited to 12 hours, since actual missing gaps are considered to be short. Simulated missing gaps were generated at two percentages, 5% and 15%. A number of single imputation methods (linear interpolation (LI), nearest neighbour interpolation (NN), mean above below (MAB), daily mean (DM), mean 12-hour (12M), mean 6-hour (6M), row mean (RM) and previous year (PY)) were used to fill in the simulated missing data. Multiple imputation (MI) was also performed for comparison with the single imputation methods. Performance was evaluated using four statistical criteria: mean absolute error, root mean squared error, prediction accuracy and index of agreement. The results show that 6M performs comparably to LI, suggesting that a shorter averaging time gives better prediction. The other single imputation methods also predict the missing data well, except for PY. RM and MI perform moderately, with performance improving at the higher fraction of missing gaps, whereas LR is the worst method at both simulated missing-data percentages.
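The sketch below illustrates three of the gap-filling methods listed above (LI, NN and MAB) together with three of the four evaluation criteria (mean absolute error, root mean squared error and Willmott's index of agreement) on a hypothetical 7-hour PM10 series with one simulated 3-hour gap. It assumes that MAB fills a gap with the mean of the observations immediately before and after it, and it scores only the hidden hours. The paper's exact definitions, the averaging-window methods (DM, 12M, 6M), RM, PY, MI and the prediction accuracy criterion are not reproduced here.

```python
import math


def gaps(series):
    """Yield (first, last) indices of each run of missing (None) values."""
    i, n = 0, len(series)
    while i < n:
        if series[i] is None:
            j = i
            while j + 1 < n and series[j + 1] is None:
                j += 1
            yield i, j
            i = j + 1
        else:
            i += 1


def neighbours(series, first, last):
    """Observed values just before and just after a gap."""
    if first == 0 or last == len(series) - 1:
        raise ValueError("gap at the edge of the series has only one neighbour")
    return series[first - 1], series[last + 1]


def linear_interpolation(series):
    """LI: straight line between the observations bracketing each gap."""
    filled = list(series)
    for first, last in gaps(series):
        before, after = neighbours(series, first, last)
        step = (after - before) / (last - first + 2)
        for k in range(first, last + 1):
            filled[k] = before + step * (k - first + 1)
    return filled


def nearest_neighbour(series):
    """NN: copy the closer bracketing observation (ties go to the earlier one)."""
    filled = list(series)
    for first, last in gaps(series):
        before, after = neighbours(series, first, last)
        for k in range(first, last + 1):
            filled[k] = before if (k - first + 1) <= (last + 1 - k) else after
    return filled


def mean_above_below(series):
    """MAB (as assumed here): mean of the two observations bracketing the gap."""
    filled = list(series)
    for first, last in gaps(series):
        before, after = neighbours(series, first, last)
        for k in range(first, last + 1):
            filled[k] = (before + after) / 2
    return filled


def mae(obs, pred):
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def index_of_agreement(obs, pred):
    """Willmott's index of agreement d (1 means perfect agreement)."""
    o_bar = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    return 1 - num / den


# Hypothetical hourly PM10 values; hours 2-4 are hidden to simulate a gap.
truth = [40, 42, 45, 50, 48, 44, 41]
masked = [40, 42, None, None, None, 44, 41]
hidden = [k for k, v in enumerate(masked) if v is None]
obs_h = [truth[k] for k in hidden]

for name, method in [("LI", linear_interpolation), ("NN", nearest_neighbour),
                     ("MAB", mean_above_below)]:
    pred_h = [method(masked)[k] for k in hidden]
    print(name, round(mae(obs_h, pred_h), 2), round(rmse(obs_h, pred_h), 2),
          round(index_of_agreement(obs_h, pred_h), 2))
```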

Evaluating Methods for Imputing Missing Data from Longitudinal Monitoring of Athlete Workload

Journal of Sports Science and Medicine ◽

10.52082/jssm.2021.188 ◽

2021 ◽

pp. 188-196 ◽

Cited By ~ 1

Author(s):

Lauren C. Benson ◽

Carlyn Stilling ◽

Oluwatoyosi B.A. Owoeye ◽

Carolyn A. Emery

Keyword(s):

Machine Learning ◽

Missing Data ◽

Multiple Imputation ◽

Perceived Exertion ◽

Pearson Correlation ◽

Imputation Method ◽

Rating Of Perceived Exertion ◽

Imputation Methods ◽

Single Imputation ◽

The Individual

Missing data can influence calculations of accumulated athlete workload. The objectives were to identify the best single imputation methods and to examine workload trends using multiple imputation. External (jumps per hour) and internal (rating of perceived exertion; RPE) workload were recorded for 93 high school basketball players (45 females, 48 males) throughout a season. Recorded data were simulated as missing and imputed using ten imputation methods based on the context of the individual, the team and the session. Both single imputation and machine learning methods were used to impute the simulated missing data. The difference between the imputed data and the actual workload values was computed as the root mean squared error (RMSE). A generalized estimating equation was used to determine the effect of imputation method on RMSE. Multiple imputation of the original dataset, with all known and actual missing workload data, was used to examine trends in longitudinal workload data. Following multiple imputation, a Pearson correlation evaluated the longitudinal association between jump count and session RPE (sRPE) over the season. A single imputation method based on the specific context of the session for which data are missing (team mean) was outperformed only by methods that combine information about the session and the individual (machine learning models). There was a strong, significant association between jump count and sRPE in both the original data and the multiply imputed datasets. The amount and nature of the missing data should be considered when choosing a method for single imputation of workload data in youth basketball. Multiple imputation using several predictor variables in a regression model can be used for analyses where workload is accumulated across an entire season.
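As a rough illustration of comparing context-based single imputation methods by simulated RMSE, this sketch hides two known records in a hypothetical three-player, three-session dataset and imputes them with a session-context method (team mean) and an individual-context method (the player's mean over other sessions). The data, keys and function names are assumptions; the study's ten methods, its machine learning models and the generalized estimating equation are not shown.

```python
import math
from statistics import mean

# Hypothetical external workload (jumps per hour), keyed by (session, player).
workload = {
    ("s1", "A"): 30.0, ("s1", "B"): 34.0, ("s1", "C"): 26.0,
    ("s2", "A"): 50.0, ("s2", "B"): 54.0, ("s2", "C"): 46.0,
    ("s3", "A"): 20.0, ("s3", "B"): 24.0, ("s3", "C"): 16.0,
}


def team_mean(data, session, player):
    """Session context: mean of teammates' values from the same session."""
    return mean(v for (s, p), v in data.items() if s == session and p != player)


def individual_mean(data, session, player):
    """Individual context: mean of the same player's values from other sessions."""
    return mean(v for (s, p), v in data.items() if p == player and s != session)


def simulated_rmse(data, method, masked_keys):
    """Hide the masked records, impute each one, and compute RMSE against the truth."""
    remaining = {k: v for k, v in data.items() if k not in masked_keys}
    errors = [method(remaining, s, p) - data[(s, p)] for s, p in masked_keys]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


masked_keys = [("s2", "A"), ("s3", "C")]
for name, method in [("team mean", team_mean), ("individual mean", individual_mean)]:
    print(name, round(simulated_rmse(workload, method, masked_keys), 2))
```

In this toy data the team mean recovers the masked values much better, because the session drives most of the variation; with real workload data the ranking depends on the amount and nature of the missingness.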

Single Imputation Methods

Statistical Analysis with Missing Data - Wiley Series in Probability and Statistics ◽

10.1002/9781119013563.ch4 ◽

2014 ◽

pp. 59-74 ◽

Cited By ~ 3

Author(s):

Roderick J. A. Little ◽

Donald B. Rubin

Keyword(s):

Imputation Methods ◽

Single Imputation

Single Imputation Methods and Confidence Intervals for the Gini Index

Mathematics ◽

10.3390/math9243252 ◽

2021 ◽

Vol 9 (24) ◽

pp. 3252

Author(s):

Encarnación Álvarez-Verdejo ◽

Pablo J. Moya-Fernández ◽

Juan F. Muñoz-Rosas

Keyword(s):

Missing Data ◽

Correlation Coefficient ◽

Confidence Intervals ◽

Gini Index ◽

Imputation Method ◽

Empirical Measures ◽

Imputation Methods ◽

Single Imputation ◽

Regression Imputation ◽

Mean Square Errors

Missing data are a common problem in empirical studies, and a single imputation method is often applied to deal with it. The first contribution of this paper is to analyse the empirical performance of some traditional single imputation methods when they are applied to the estimation of the Gini index, a popular measure of inequality used in many studies. Various methods for constructing confidence intervals for the Gini index are also empirically evaluated. We consider several empirical measures to analyse the performance of estimators and confidence intervals, allowing us to quantify the magnitude of the non-response bias problem. We find extremely large biases under certain non-response mechanisms, and this problem gets noticeably worse as the proportion of missing data increases. For a large correlation coefficient between the target and auxiliary variables, the regression imputation method may notably mitigate this bias problem, yielding appropriate mean square errors. We also find that confidence intervals have poor coverage rates when the probability of data being missing is not uniform, and that the regression imputation method substantially improves the handling of this problem as the correlation coefficient increases.
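The following sketch computes the Gini index in its mean-difference form (written out in the docstring) and compares mean imputation with regression imputation on an auxiliary variable. The incomes are hypothetical, and the two largest values are deliberately missing to mimic a non-uniform non-response mechanism; the paper's own estimators, imputation variants and confidence-interval methods are not reproduced.

```python
from statistics import mean


def gini(values):
    """Gini index: sum over all i, j of |y_i - y_j|, divided by 2 * n^2 * mean(y)."""
    n = len(values)
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mean(values))


def mean_impute(y):
    """Fill each missing y with the respondents' mean."""
    fill = mean(v for v in y if v is not None)
    return [fill if v is None else v for v in y]


def regression_impute(y, x):
    """Fill each missing y with an OLS prediction from the auxiliary variable x."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    x_bar = mean(xi for xi, _ in pairs)
    y_bar = mean(yi for _, yi in pairs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in pairs)
    sxx = sum((xi - x_bar) ** 2 for xi, _ in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


# Hypothetical incomes (y) with a strongly correlated auxiliary variable (x).
x = [10, 12, 15, 20, 25, 30, 40, 60]
y_full = [11, 13, 16, 21, 27, 31, 42, 63]
# Non-uniform missingness: the two largest incomes are not reported.
y_obs = [11, 13, 16, 21, 27, 31, None, None]

print("complete data:", round(gini(y_full), 3))
print("mean imputation:", round(gini(mean_impute(y_obs)), 3))
print("regression imputation:", round(gini(regression_impute(y_obs, x)), 3))
```

Here mean imputation pulls the two missing top incomes down to the respondents' mean and understates inequality, whereas regression imputation, helped by the strong correlation between x and y, lands close to the complete-data value.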

Single Imputation Methods Applied to a Global Geothermal Database

Advances in Soft Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-030-04491-6_14 ◽

2018 ◽

pp. 183-194

Author(s):

Román-Flores Mariana Alelhí ◽

Santamaría-Bonfil Guillermo ◽

Díaz-González Lorena ◽

Arroyo-Figueroa Gustavo

Keyword(s):

Imputation Methods ◽

Single Imputation

Some of the Single-Imputation Methods for Handling Missing Values

International Journal Of Data Mining And Emerging Technologies ◽

10.5958/j.2249-3220.2.2.007 ◽

2012 ◽

Vol 2 (2) ◽

pp. 49

Author(s):

Hitesh Chhinkaniwala ◽

Bhavisha Suthar ◽

Sanjay Garg

Keyword(s):

Missing Values ◽

Imputation Methods ◽

Single Imputation

Comparison of Single Imputation Methods in 2×2 Cross-Over Design with Missing Observations

Korean Journal of Applied Statistics ◽

10.5351/kjas.2015.28.3.529 ◽

2015 ◽

Vol 28 (3) ◽

pp. 529-540

Author(s):

Bobae Jo ◽

Dongjae Kim

Keyword(s):

Missing Observations ◽

Imputation Methods ◽

Single Imputation

The Effect of Partly Missing Covariates on Statistical Power in Randomized Controlled Trials With Discrete-Time Survival Endpoints

Methodology ◽

10.1027/1614-2241/a000121 ◽

2017 ◽

Vol 13 (2) ◽

pp. 41-60

Author(s):

Shahab Jolani ◽

Maryam Safarkhani

Keyword(s):

Randomized Controlled Trials ◽

Discrete Time ◽

Treatment Effect ◽

Survival Data ◽

Controlled Trials ◽

Missing Covariates ◽

Indicator Method ◽

Imputation Methods ◽

Randomized Controlled ◽

Baseline Covariates

Abstract. In randomized controlled trials (RCTs), a common strategy to increase power to detect a treatment effect is adjustment for baseline covariates. However, when covariates are partly missing, adjusting using only the complete cases is inefficient. We consider different alternatives in trials with discrete-time survival data, where subjects are measured in discrete time intervals although they may experience an event at any point in time. The results of a Monte Carlo simulation study, as well as a case study of randomized trials in smokers with attention deficit hyperactivity disorder (ADHD), indicated that single and multiple imputation methods outperform the other methods and increase precision in estimating the treatment effect. The missing indicator method, which adds a dummy variable to the statistical model to indicate whether the value of that variable is missing and assigns the same value to all missing values, is comparable to the imputation methods. Nevertheless, the power to detect the treatment effect with the missing indicator method is marginally lower than with the imputation methods, particularly when the missingness depends on the outcome. In conclusion, imputation of partly missing (baseline) covariates appears preferable in the analysis of discrete-time survival data.
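A minimal sketch of how the missing indicator method described above builds model covariates: every missing baseline value receives the same fill value, and a 0/1 dummy flags which values were missing. The treatment arms, covariate values and helper names are hypothetical, and the discrete-time survival model that would consume these rows (for example, a logistic hazard model fitted to person-period data) is not shown.

```python
def missing_indicator(covariate, fill_value):
    """Missing indicator method: constant fill plus a 0/1 missingness dummy."""
    filled = [fill_value if v is None else v for v in covariate]
    dummy = [1 if v is None else 0 for v in covariate]
    return filled, dummy


def design_rows(treatment, covariate, fill_value):
    """Rows [intercept, treatment, filled covariate, missing dummy] for a regression model."""
    filled, dummy = missing_indicator(covariate, fill_value)
    return [[1, t, c, m] for t, c, m in zip(treatment, filled, dummy)]


# Hypothetical RCT: treatment arm and a partly missing baseline covariate.
treatment = [1, 0, 1, 0, 1, 0]
baseline = [12.0, None, 20.0, 8.0, None, 16.0]

# Fill with the observed mean (an assumed, commonly used convention).
observed = [v for v in baseline if v is not None]
fill = sum(observed) / len(observed)
for row in design_rows(treatment, baseline, fill):
    print(row)
```

In a model with a linear predictor, including the dummy means the choice of fill value only shifts the dummy's coefficient and leaves the treatment-effect estimate unchanged; filling with the observed mean is simply a convenient convention.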
