Proposal for a new single imputation method using Taguchi’s T-method

2020 ◽  
Vol 5 (3) ◽  
pp. 102-110
Author(s):  
Yuto Nakao ◽  
Yasushi Nagata
2013 ◽  
Vol 6 (10) ◽  
pp. 1780-1784 ◽  
Author(s):  
Nurulkamal Masseran ◽  
Ahmad Mahir Razali ◽  
Kamarulzaman Ibrahim ◽  
Azami Zaharim ◽  
Kamaruzzaman Sopian

2021 ◽  
pp. 188-196 ◽  
Author(s):  
Lauren C. Benson ◽  
Carlyn Stilling ◽  
Oluwatoyosi B.A. Owoeye ◽  
Carolyn A. Emery

Missing data can influence calculations of accumulated athlete workload. The objectives were to identify the best single imputation methods and examine workload trends using multiple imputation. External (jumps per hour) and internal (rating of perceived exertion; RPE) workload were recorded for 93 (45 females, 48 males) high school basketball players throughout a season. Recorded data were simulated as missing and imputed using ten imputation methods based on the context of the individual, team and session. Both single imputation and machine learning methods were used to impute the simulated missing data. The difference between the imputed data and the actual workload values was computed as root mean squared error (RMSE). A generalized estimating equation determined the effect of imputation method on RMSE. Multiple imputation of the original dataset, with all known and actual missing workload data, was used to examine trends in longitudinal workload data. Following multiple imputation, a Pearson correlation evaluated the longitudinal association between jump count and sRPE over the season. A single imputation method based on the specific context of the session for which data are missing (team mean) was only outperformed by methods that combine information about the session and the individual (machine learning models). There was a significant and strong association between jump count and sRPE in the original data and imputed datasets using multiple imputation. The amount and nature of the missing data should be considered when choosing a method for single imputation of workload data in youth basketball. Multiple imputation using several predictor variables in a regression model can be used for analyses where workload is accumulated across an entire season.
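The evaluation loop described above (mask known values, impute them, score with RMSE) can be sketched for the team-mean method as follows. This is a minimal illustration, not the authors' code: the workload numbers, player labels, the choice of which entries to mask, and the function names `team_mean_impute` and `rmse` are all hypothetical.

```python
import math

def team_mean_impute(workload, missing):
    """Fill each masked (session, player) entry with the mean of the
    teammates' recorded values from the same session."""
    imputed = {}
    for session, player in missing:
        observed = [v for (s, p), v in workload.items()
                    if s == session and (s, p) not in missing]
        imputed[(session, player)] = sum(observed) / len(observed)
    return imputed

def rmse(actual, predicted):
    """Root mean squared error over the masked entries only."""
    keys = list(predicted)
    squared = sum((actual[k] - predicted[k]) ** 2 for k in keys)
    return math.sqrt(squared / len(keys))

# Hypothetical jumps-per-hour records: 3 sessions x 4 players.
workload = {
    (1, "A"): 40.0, (1, "B"): 44.0, (1, "C"): 48.0, (1, "D"): 52.0,
    (2, "A"): 60.0, (2, "B"): 62.0, (2, "C"): 58.0, (2, "D"): 64.0,
    (3, "A"): 30.0, (3, "B"): 36.0, (3, "C"): 33.0, (3, "D"): 39.0,
}

# Entries treated as missing for the simulation (chosen by hand here).
missing = {(1, "D"), (2, "A"), (3, "C")}

imputed = team_mean_impute(workload, missing)
error = rmse(workload, imputed)
print(f"RMSE of team-mean imputation: {error:.3f}")
```

In a fuller simulation the masked entries would be drawn at random and the procedure repeated for each imputation method, so that the resulting RMSE values can be compared across methods.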


Mathematics ◽  
2021 ◽  
Vol 9 (24) ◽  
pp. 3252
Author(s):  
Encarnación Álvarez-Verdejo ◽  
Pablo J. Moya-Fernández ◽  
Juan F. Muñoz-Rosas

The problem of missing data is a common feature in any study, and a single imputation method is often applied to deal with this problem. The first contribution of this paper is to analyse the empirical performance of some traditional single imputation methods when they are applied to the estimation of the Gini index, a popular measure of inequality used in many studies. Various methods for constructing confidence intervals for the Gini index are also empirically evaluated. We consider several empirical measures to analyse the performance of estimators and confidence intervals, allowing us to quantify the magnitude of the non-response bias problem. We find extremely large biases under certain non-response mechanisms, and this problem gets noticeably worse as the proportion of missing data increases. For a large correlation coefficient between the target and auxiliary variables, the regression imputation method may notably mitigate this bias problem, yielding appropriate mean square errors. We also find that confidence intervals have poor coverage rates when the probability of data being missing is not uniform, and that the regression imputation method substantially improves the handling of this problem as the correlation coefficient increases.
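To illustrate why regression imputation can reduce bias in the Gini index when the target and auxiliary variables are strongly correlated, the sketch below compares it with mean imputation on a toy data set. Everything here is an assumption made for illustration: the data are hypothetical and perfectly linear, the Gini estimator is the common sorted-rank formula, and the function names are not taken from the paper.

```python
def gini(values):
    """Gini index via the sorted-rank formula:
    G = 2 * sum(i * y_(i)) / (n * sum(y)) - (n + 1) / n."""
    y = sorted(values)
    n = len(y)
    weighted = sum((i + 1) * v for i, v in enumerate(y))
    return 2.0 * weighted / (n * sum(y)) - (n + 1.0) / n

def regression_impute(x, y):
    """Replace None entries of y with predictions from a simple
    least-squares line fitted on the observed (x, y) pairs."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(obs)
    mean_x = sum(xi for xi, _ in obs) / n
    mean_y = sum(yi for _, yi in obs) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in obs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in obs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi if yi is not None else intercept + slope * xi
            for xi, yi in zip(x, y)]

def mean_impute(y):
    """Replace None entries of y with the mean of the observed values."""
    obs = [yi for yi in y if yi is not None]
    mean_y = sum(obs) / len(obs)
    return [yi if yi is not None else mean_y for yi in y]

# Hypothetical data: target y is exactly 2 * x (correlation of 1).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_full = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
y_obs = [2.0, None, 6.0, 8.0, 10.0, None]  # two non-responses

g_full = gini(y_full)
g_reg = gini(regression_impute(x, y_obs))
g_mean = gini(mean_impute(y_obs))
print(g_full, g_reg, g_mean)
```

With a perfect linear relationship the regression-imputed Gini index matches the full-data value, while mean imputation pulls the imputed values toward the centre and understates inequality. Real data with weaker correlation would show a smaller gap between the two methods.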


2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 519.1-519
Author(s):  
A. Alsaber ◽  
A. Al-Herz ◽  
J. Pan ◽  
K. Saleh ◽  
A. Al-Awadhi ◽  
...  

Background: Missing data in clinical epidemiological research violate the intention-to-treat principle, reduce statistical power, and can induce bias if they are related to patients' response to treatment. In multiple imputation (MI), covariates are included in the imputation equation to predict the values of missing data.
Objectives: To find the best approach to estimate and impute the missing values in Kuwait Registry for Rheumatic Diseases (KRRD) patient data.
Methods: A number of methods were implemented for dealing with missing data. These included Multivariate Imputation by Chained Equations (MICE), K-Nearest Neighbors (KNN), Bayesian Principal Component Analysis (BPCA), EM with Bootstrapping (Amelia II), Sequential Random Forest (MissForest), and mean imputation. The best imputation method was judged by the minimum scores of Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the Kolmogorov–Smirnov D test statistic (KS) between the imputed data points and the original data points that were subsequently set to missing.
Results: A total of 1,685 rheumatoid arthritis (RA) patients and 10,613 hospital visits were included in the registry. Among them, we found a number of variables that had missing values exceeding 5% of the total values. These included duration of RA (13.0%), smoking history (26.3%), rheumatoid factor (7.93%), anti-citrullinated peptide antibodies (20.5%), anti-nuclear antibodies (20.4%), sicca symptoms (19.2%), family history of a rheumatic disease (28.5%), steroid therapy (5.94%), ESR (5.16%), CRP (22.9%), and SDAI (38.0%). The results showed that, among the methods used, MissForest gave the highest level of accuracy in estimating the missing values. It had the lowest imputation errors for both continuous and categorical variables at each frequency of missingness, and it had the smallest prediction differences when the models used imputed laboratory values. In both data sets, MICE had the second-lowest imputation errors and prediction differences, followed by KNN and mean imputation.
Conclusion: MissForest is a highly accurate method of imputation for missing data in KRRD and outperforms other common imputation techniques in terms of imputation error and maintenance of predictive ability with imputed values in clinical predictive models. This approach can be used in registries to improve the accuracy of data, including those for rheumatoid arthritis patients.
Disclosure of Interests: None declared
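The three comparison scores named in the Methods (RMSE, MAE, and the Kolmogorov–Smirnov D statistic) can be computed between the original values and their imputed replacements roughly as follows. This is a sketch under stated assumptions: the numbers are hypothetical, the KS statistic is the plain two-sample version based on empirical CDFs, and the function names are illustrative rather than those of any package used in the study.

```python
import math
from bisect import bisect_right

def rmse(original, imputed):
    """Root mean square error between paired values."""
    n = len(original)
    return math.sqrt(sum((o - i) ** 2 for o, i in zip(original, imputed)) / n)

def mae(original, imputed):
    """Mean absolute error between paired values."""
    n = len(original)
    return sum(abs(o - i) for o, i in zip(original, imputed)) / n

def ks_statistic(a, b):
    """Two-sample KS D: the largest gap between the empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for v in sorted(set(sa) | set(sb)):
        fa = bisect_right(sa, v) / len(sa)  # share of a that is <= v
        fb = bisect_right(sb, v) / len(sb)  # share of b that is <= v
        d = max(d, abs(fa - fb))
    return d

# Hypothetical values that were set to missing, and their imputations.
original = [10.0, 12.0, 15.0, 20.0]
imputed = [11.0, 12.0, 13.0, 24.0]

scores = {
    "RMSE": rmse(original, imputed),
    "MAE": mae(original, imputed),
    "KS": ks_statistic(original, imputed),
}
print(scores)
```

RMSE and MAE measure pointwise accuracy, whereas the KS statistic checks whether the imputed values follow the same distribution as the originals, so a method can score well on one and poorly on the other.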


Author(s):  
Ahmad R. Alsaber ◽  
Jiazhu Pan ◽  
Adeeba Al-Hurban 

In environmental research, missing data are often a challenge for statistical modeling. This paper addresses advanced techniques for handling missing values in an air quality data set using a multiple imputation (MI) approach. The MCAR, MAR, and NMAR missing data mechanisms are applied to the data set. Five missing data levels are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is an iterative imputation method, missForest, which is related to the random forest approach. Air quality data sets were gathered from five monitoring stations in Kuwait and aggregated to a daily basis. A logarithmic transformation was applied to all pollutant data in order to normalize their distributions and minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%) data. Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that imputation under the MAR mechanism had the lowest RMSE and MAE. We conclude that MI using the missForest approach has a high level of accuracy in estimating missing values. MissForest had the lowest imputation error (RMSE and MAE) among the imputation methods compared and, thus, can be considered appropriate for analyzing air quality data.
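The difference between MCAR and MAR missingness at the five levels listed above can be sketched as follows. This is only an illustration under assumptions: the pollutant and wind-speed series are randomly generated stand-ins, the MAR rule (records with higher wind speed are more likely to be deleted) is one arbitrary choice of dependence on an observed covariate, and the imputation step itself is left out.

```python
import math
import random

LEVELS = [0.05, 0.10, 0.20, 0.30, 0.40]
N = 200  # hypothetical number of daily records

def mcar_mask(n, level, rng):
    """MCAR: every record has the same chance of being deleted."""
    return set(rng.sample(range(n), round(level * n)))

def mar_mask(covariate, level, rng):
    """MAR: deletion probability depends on an observed covariate.
    Weighted sampling without replacement using keys u ** (1 / w);
    larger weights w tend to produce larger keys."""
    k = round(level * len(covariate))
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(covariate)]
    keys.sort(reverse=True)
    return {i for _, i in keys[:k]}

rng = random.Random(0)

# Stand-in series: positive wind speeds and a skewed pollutant.
wind_speed = [rng.uniform(1.0, 10.0) for _ in range(N)]
pollutant = [math.exp(rng.gauss(3.0, 0.5)) for _ in range(N)]

# Log transform to reduce skewness before imputation.
log_pollutant = [math.log(v) for v in pollutant]

mcar_masks = {level: mcar_mask(N, level, rng) for level in LEVELS}
mar_masks = {level: mar_mask(wind_speed, level, rng) for level in LEVELS}

for level in LEVELS:
    print(level, len(mcar_masks[level]), len(mar_masks[level]))
```

Each mask gives the indices to blank out in `log_pollutant` before running an imputation method, after which the imputed values can be scored against the held-back originals with RMSE and MAE.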

