MIGHT: Statistical Methodology for Missing-Data Imputation in Food Composition Databases

2019 ◽  
Vol 9 (19) ◽  
pp. 4111 ◽  
Author(s):  
Gordana Ispirova ◽  
Tome Eftimov ◽  
Peter Korošec ◽  
Barbara Koroušić Seljak

This paper addresses the problem of missing data in food composition databases (FCDBs). Data can be missing either for selected foods or for specific components only. Most often, the problem is solved by human experts who subjectively borrow data from other FCDBs for estimation or imputation. Such an approach is not only time-consuming but may also lead to wrong decisions, as the value of certain components in certain foods may vary from database to database due to differences in analytical methods. To ease missing-data borrowing and increase the quality of missing-data selection, we propose a new computer-based methodology, named MIGHT (Missing Nutrient Value Imputation UsinG Null Hypothesis Testing), that enables optimal selection of missing data from different FCDBs. The evaluation on a subset of European FCDBs, available through EuroFIR and compliant with the Food data structure and format standard BS EN 16104 published in 2012, shows that, in more than 80% of selected cases, MIGHT gives more accurate results than techniques currently applied for missing value imputation in FCDBs. MIGHT deals with missing data in FCDBs by introducing rules for missing data imputation based on the idea that proper statistical analysis can decrease the error of data borrowing.

2019 ◽  
Vol 27 (2) ◽  
pp. 313-334
Author(s):  
Canhua Xiao ◽  
Deborah W. Bruner ◽  
Tian Dai ◽  
Ying Guo ◽  
Alexandra Hanlon

Background and Purpose: To compare the effects of missing-data imputation techniques, mean imputation, group mean imputation, regression imputation, and multiple imputation (MI), on the results of exploratory factor analysis under different missing assumptions. Methods: Missing data with different missing assumptions were generated from true data. The quality of imputed data was examined by correlation coefficients. Factor structures were compared indirectly by coefficients of congruence and directly by factor structures. Results: MI had the best quality and matching factor structure to the true data for all missing assumptions with different missing rates. Mean imputation had the least favorable results in factor analysis. The imputation techniques revealed no important differences with 10% of data missing. Conclusion: MI showed the best results, especially with larger proportions of missing data.
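The three single-imputation techniques named in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the toy data, and the use of a one-predictor least-squares line for regression imputation are all assumptions made for illustration, and multiple imputation (MI) is not shown because it requires repeated stochastic draws and pooling.

```python
from statistics import mean

def mean_impute(values):
    """Replace each None with the mean of all observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def group_mean_impute(values, groups):
    """Replace each None with the mean of observed values in the same group.

    Assumes every group has at least one observed value.
    """
    observed = {}
    for v, g in zip(values, groups):
        if v is not None:
            observed.setdefault(g, []).append(v)
    fills = {g: mean(vs) for g, vs in observed.items()}
    return [fills[g] if v is None else v for v, g in zip(values, groups)]

def regression_impute(y, x):
    """Replace each None in y with the prediction of a least-squares
    line fitted on the complete (x, y) pairs."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    x_bar = mean(xi for xi, _ in pairs)
    y_bar = mean(yi for _, yi in pairs)
    sxx = sum((xi - x_bar) ** 2 for xi, _ in pairs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi if yi is None else yi
            for xi, yi in zip(x, y)]

# Hypothetical toy data: y is exactly 2 * x, with two values missing.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, None, 8.0, 10.0, None]
groups = ["a", "a", "a", "b", "b", "b"]

print(mean_impute(y))                # both gaps get the overall mean
print(group_mean_impute(y, groups))  # each gap gets its group's mean
print(regression_impute(y, x))       # each gap gets the fitted-line value
```

On this toy data only regression imputation recovers the underlying linear relationship. Mean imputation fills both gaps with the same value regardless of x.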


Author(s):  
Tshilidzi Marwala

In this chapter, the traditional missing data imputation issues such as missing data patterns and mechanisms are described. Attention is paid to the best models to deal with particular missing data mechanisms. A review of traditional missing data imputation methods, namely case deletion and prediction rules, is conducted. For case deletion, list-wise and pair-wise deletions are reviewed. In addition, for prediction rules, the imputation techniques such as mean substitution, hot-deck, regression and decision trees are also reviewed. Two missing data examples are studied, namely: the Sudoku puzzle and a mechanical system. The major conclusions drawn from these examples are that there is a need for an accurate model that describes inter-relationships and rules that define the data and that a good optimization method is required for a successful missing data estimation procedure.
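The difference between the two case-deletion strategies reviewed in the chapter can be sketched as follows. This is an illustrative example only: the variable names `a`, `b`, `c`, the toy records, and the helper functions are hypothetical and do not come from the chapter.

```python
from itertools import combinations

# Hypothetical records; None marks a missing value.
rows = [
    {"a": 1.0, "b": 2.0, "c": 3.0},
    {"a": 2.0, "b": None, "c": 5.0},
    {"a": 3.0, "b": 6.0, "c": None},
    {"a": 4.0, "b": 8.0, "c": 9.0},
]

def listwise_delete(rows):
    """List-wise deletion: keep only records with no missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise_complete(rows, var_x, var_y):
    """Pair-wise deletion: for one pair of variables, keep every record
    in which both of those variables are observed."""
    return [(r[var_x], r[var_y]) for r in rows
            if r[var_x] is not None and r[var_y] is not None]

complete = listwise_delete(rows)
pair_counts = {
    (vx, vy): len(pairwise_complete(rows, vx, vy))
    for vx, vy in combinations(["a", "b", "c"], 2)
}

print(len(complete))  # records usable under list-wise deletion
print(pair_counts)    # records usable per variable pair
```

List-wise deletion discards any record with a gap, whereas pair-wise deletion retains more records per statistic at the cost of each pairwise statistic being computed on a different subset.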


2009 ◽  
Vol 28 (2) ◽  
pp. S270
Author(s):  
E.C. Wang ◽  
K.L. Grady ◽  
B. Rybarczyk ◽  
D.C. Naftel ◽  
S. Myers ◽  
...  

2021 ◽  
pp. 147592172110219
Author(s):  
Huachen Jiang ◽  
Chunfeng Wan ◽  
Kang Yang ◽  
Youliang Ding ◽  
Songtao Xue

Wireless sensors are key components of structural health monitoring systems. Sensor failures during signal transmission are inevitable, and data loss is the most common type. Missing data pose a major challenge to subsequent damage detection and condition assessment and therefore deserve close attention. Conventional missing-data imputation mostly relies on correlation-based methods, especially for strain monitoring data. However, such methods often require delicate model selection, and the correlations for vehicle-induced strains are much harder to capture than those for temperature-induced strains. In this article, a novel data-driven generative adversarial network (GAN) for imputing missing strain responses is proposed. As opposed to traditional approaches, in which correlations between strains are explicitly modeled, the proposed method imputes the missing data directly by considering the spatial–temporal relationships with other strain sensors based on the remaining observed data. Furthermore, an intact and complete dataset is not necessary during training, which is another advantage over model-based imputation. The proposed method is implemented and verified on a real concrete bridge. To demonstrate the applicability and robustness of the GAN, imputation for single and multiple sensors is studied. Results show that the proposed method provides excellent imputation accuracy and efficiency.

