imputation model
Recently Published Documents


TOTAL DOCUMENTS: 49 (FIVE YEARS: 30)

H-INDEX: 6 (FIVE YEARS: 1)

Author(s): C. V. S. R. Syavasya, M. A. Lakshmi

With the rapid growth of data streams from applications, accurate data analysis is essential for effective real-time decision-making. Data stream applications frequently encounter missing values, which degrade the performance of classification models. Several imputation models have adopted deep learning algorithms to estimate missing values; however, the lack of parameter and structure tuning in classification degrades imputation performance. This work presents a missing-data imputation model that uses an adaptive deep incremental learning algorithm for streaming applications. The proposed approach has two main processes: enhancing the deep incremental learning algorithm and enhancing deep incremental learning-based imputation. First, the approach tunes the learning rate using both the Adaptive Moment Estimation (Adam) and Stochastic Gradient Descent (SGD) optimizers, and tunes the number of hidden neurons. Second, it applies the enhanced deep incremental learning algorithm to estimate the imputed values in two steps: (i) an imputation process that predicts the missing values based on temporal proximity and (ii) generation of a complete IoT dataset by imputing the missing values from both sets of predicted values. The experimental outcomes show that the proposed imputation model effectively transforms the incomplete dataset into a complete dataset with minimal error.
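The idea of imputing from temporal proximity can be illustrated with a much simpler stand-in. The sketch below is an assumption, not the authors' deep incremental learning model: it fills each gap in a single sensor stream by linear interpolation between the nearest observed readings before and after it, and copies the nearest reading at the edges of the stream. The function name `impute_by_temporal_proximity` is hypothetical.

```python
from typing import List, Optional


def impute_by_temporal_proximity(stream: List[Optional[float]]) -> List[float]:
    """Fill None gaps using the temporally nearest observed readings.

    Interior gaps are linearly interpolated between the previous and next
    observed values; leading/trailing gaps copy the nearest observed value.
    """
    observed = [i for i, v in enumerate(stream) if v is not None]
    if not observed:
        raise ValueError("stream has no observed values")
    filled: List[float] = []
    for i, value in enumerate(stream):
        if value is not None:
            filled.append(float(value))
            continue
        prev = max((j for j in observed if j < i), default=None)
        nxt = min((j for j in observed if j > i), default=None)
        if prev is None:          # gap at the start: use the next reading
            filled.append(float(stream[nxt]))
        elif nxt is None:         # gap at the end: carry the last reading forward
            filled.append(float(stream[prev]))
        else:                     # interior gap: weight by distance in time
            w = (i - prev) / (nxt - prev)
            filled.append((1 - w) * stream[prev] + w * stream[nxt])
    return filled


print(impute_by_temporal_proximity([None, 1.0, None, 3.0, None]))
# [1.0, 1.0, 2.0, 3.0, 3.0]
```

In the authors' approach a learned model supplies the predictions; interpolation is used here only because it makes "temporal proximity" concrete.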


2021, Vol 11 (23), pp. 11491
Author(s): Laura Sofía Hoyos-Gomez, Belizza Janet Ruiz-Mendoza

Solar irradiance is an available resource that could support electrification in regions with low socio-economic indices. It is therefore increasingly important to understand the behavior of solar irradiance and to have data on it. Some locations, especially those with a low socio-economic population, do not have measured solar irradiance data, and where such data exist, they are incomplete. Approaches for estimating solar irradiance range from learning models to empirical models. The latter have the advantage of low computational cost, which allows their wide use. Researchers estimate solar energy resources using information from other meteorological variables, such as temperature. However, there is no broad analysis of these techniques in tropical and mountainous environments. To address this gap, our research analyzes the performance of three well-known empirical temperature-based models (Hargreaves and Samani, Bristol and Campbell, and Okundamiya and Nzeako) and proposes a new one for tropical and mountainous environments. In some areas, the new empirical technique models daily solar irradiance better than the other three models. Comparing statistical errors allows us to select the best model for each location and thereby determine the data imputation model. Hargreaves and Samani's model performed best in the Pacific zone, with an average RMSE of 936.195 Wh/m² day, SD of 36.01%, MAE of 748.435 Wh/m² day, and U95 of 1,836.325 Wh/m² day. The proposed model performed best in the Andean and Amazon zones, with an average RMSE of 1,032.99 Wh/m² day, SD of 34.455 Wh/m² day, MAE of 825.46 Wh/m² day, and U95 of 2,025.84 Wh/m² day. Another result was a linear relationship between the new empirical model's constants and the altitude of 2500 MASL (meters above sea level).
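For reference, the Hargreaves and Samani model is commonly written (for example in FAO-56) as Rs = k_Rs · √(Tmax − Tmin) · Ra, where Ra is extraterrestrial irradiance and k_Rs is an empirical coefficient, typically about 0.16 for interior and 0.19 for coastal locations. The sketch below implements that form together with the RMSE and MAE used in the comparison. The default coefficient and the example inputs are illustrative assumptions, not the calibrated constants or data from this study.

```python
import math
from typing import Sequence


def hargreaves_samani(t_max: float, t_min: float, ra: float, k_rs: float = 0.16) -> float:
    """Estimate daily global solar irradiance with the Hargreaves-Samani form.

    Rs = k_rs * sqrt(t_max - t_min) * ra
    ra and the result share the same unit (e.g. Wh/m^2 per day); temperatures in deg C.
    k_rs = 0.16 is an assumed default (FAO-56 suggests ~0.16 interior, ~0.19 coastal).
    """
    diurnal_range = max(t_max - t_min, 0.0)  # guard against inverted sensor readings
    return k_rs * math.sqrt(diurnal_range) * ra


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between paired observations and estimates."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between paired observations and estimates."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical day: 9 deg C diurnal range, Ra = 10,000 Wh/m^2 per day
print(hargreaves_samani(30.0, 21.0, 10_000.0))  # roughly 4800
```

Selecting a model per location, as the abstract describes, then amounts to computing these error metrics for each candidate model on that location's measured days and keeping the one with the lowest error.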


2021
Author(s): Maoxin Luo

The Food Nutrition Environment Survey (FNES) is a survey of New Zealand early childhood centres and schools and the food and nutritional services they provide for their pupils. The 2007 and 2009 FNES surveys were managed by the Ministry of Health. Like other social surveys, the FNES suffers from unit and item non-response; in other words, it has missing data. In this thesis, we have surveyed a wide variety of missing-data handling techniques and applied most of them to the FNES datasets. The thesis can be roughly divided into two parts. In the first part, we have studied the different natures of missing data (i.e., missing-data mechanisms) and the common imputation methods, using the Synthetic Unit Record File (SURF) developed by Statistics New Zealand for educational purposes. Comparing these imputation methods, we found Bayesian Multiple Imputation (MI) to be the preferred option in terms of reducing non-response bias and properly propagating imputation uncertainty. Because the samples selected for the 2007 and 2009 FNES surveys overlap, we discovered that Bayesian MI can be improved by incorporating the matched dataset. Hence, we have proposed a couple of new approaches that use the extra information from the matched dataset. We believe that adapting Bayesian MI to use this extra information is a preferable strategy for imputing the FNES missing data, because the matched dataset gives the imputation model more predictive power.
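The "properly propagating imputation uncertainty" that favours Bayesian MI comes largely from how results across the m imputed datasets are combined, usually with Rubin's rules: the pooled estimate is the mean of the per-dataset estimates, and the total variance is T = W + (1 + 1/m)B, where W is the mean within-imputation variance and B is the between-imputation variance. The sketch below shows only this pooling step. The function name `pool_rubin` and the example numbers are hypothetical, and the thesis may use a different combining procedure.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine m completed-data analyses with Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b          # total variance
    return q_bar, t


# Hypothetical results from m = 3 imputed datasets
est, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, total_var ** 0.5)  # pooled estimate and its standard error
```

The (1 + 1/m)B term is what a single imputation omits: treating one filled-in dataset as if it were observed ignores the between-imputation spread and understates the standard error.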


2021
Author(s): Sara Javadi, Abbas Bahrampour, Mohammad Mehdi Saber, Mohammad Reza Baneshi

Abstract Background: Among the newer multiple imputation methods, Multiple Imputation by Chained Equations (MICE) is a popular approach because of its flexibility. The main focus of this study is to compare the performance of parametric imputation models based on predictive mean matching (PMM) with recursive partitioning methods in MICE when the data contain an interaction. Methods: We compared the performance of parametric and tree-based imputation methods via simulation using two data generation models. For each combination of data generation model and imputation method, the following steps were performed: data generation, removal of observations, imputation, logistic regression analysis, and calculation of bias, coverage probability (CP), and confidence interval (CI) width for each coefficient. Furthermore, the model-based and empirical standard errors (SE) and the estimated proportion of the variance attributable to the missing data (λ) were calculated. Results: We have shown by simulation that, when imputing a binary response in observations involving an interaction, manually entering the interaction term into the PMM imputation model improves the performance of PMM compared with the recursive partitioning models in MICE. The parametric method in which we entered the interaction term into the imputation model (MICE-Interaction) led to smaller bias and slightly higher coverage probability for the interaction effect, but it had slightly wider confidence intervals than tree-based imputation (especially classification and regression trees).
Conclusions: MICE-Interaction performed better than the recursive partitioning methods in MICE. However, when the user is interested in estimating the interaction but does not know enough about the structure of the observations, recursive partitioning methods can be suggested for imputing the missing values.
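To make "manually entering the interaction term" concrete, the sketch below imputes a binary variable with a simplified predictive mean matching step whose design matrix includes the product x1·x2 alongside x1 and x2. It is an assumption-laden illustration, not the study's code or the `mice` package: it makes a single pass for one variable, uses the ordinary least-squares fit instead of drawing coefficients from their posterior (as full MICE-PMM does), and picks each donor at random among the k observed cases with the closest predicted means. The function name `pmm_impute` and the simulated data are hypothetical.

```python
import numpy as np


def pmm_impute(y, covariates, k=5, seed=None):
    """Single-variable predictive mean matching (simplified sketch).

    y: 1-D array with np.nan marking missing entries
    covariates: 2-D array of fully observed predictors (one column per term)
    Missing y values are replaced by observed y values of donors whose
    predicted means are closest, so imputations stay on the observed scale
    (0/1 for a binary variable).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    observed = ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    predicted = design @ beta
    obs_idx = np.flatnonzero(observed)
    imputed = y.copy()
    for i in np.flatnonzero(~observed):
        distance = np.abs(predicted[obs_idx] - predicted[i])
        donors = obs_idx[np.argsort(distance)[:k]]  # k closest observed cases
        imputed[i] = y[rng.choice(donors)]          # borrow one donor's real value
    return imputed


# Hypothetical data with a true x1*x2 interaction on the logit scale
rng = np.random.default_rng(0)
n = 200
x1 = rng.integers(0, 2, n).astype(float)
x2 = rng.normal(size=n)
p = 1 / (1 + np.exp(-(-0.5 + x1 + 0.5 * x2 + 1.0 * x1 * x2)))
y = (rng.random(n) < p).astype(float)
y_missing = y.copy()
y_missing[rng.random(n) < 0.2] = np.nan
# Entering the interaction term manually: columns x1, x2, x1*x2
y_imputed = pmm_impute(y_missing, np.column_stack([x1, x2, x1 * x2]), k=5, seed=1)
```

Dropping the `x1 * x2` column gives the main-effects-only model, which cannot reflect the interaction when choosing donors; tree-based methods, by contrast, can pick up such structure without it being specified.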


2021, pp. 096228022110473
Author(s): Lauren J Beesley, Irina Bondarenko, Michael R Elliot, Allison W Kurian, Steven J Katz, ...

Multiple imputation is a well-established general technique for analyzing data with missing values. A convenient way to implement it is sequential regression multiple imputation (SRMI), also called chained equations multiple imputation. In this approach, we impute missing values using a regression model for each variable, conditional on the other variables in the data. This approach, however, assumes that the missingness mechanism is missing at random, and it is not well justified under missingness not at random without additional modification. In this paper, we describe how to generalize the SRMI procedure to handle missingness not at random in the setting where, conditional on fully observed variables, missingness may depend on other variables that are also missing but not on the missing variable itself. We provide algebraic justification for several generalizations of standard SRMI using Taylor series and other approximations of the target imputation distribution under missingness not at random. The resulting regression model approximations include indicators for missingness, interactions, or other functions of the missing-not-at-random missingness model and the observed data. In a simulation study, we demonstrate that the proposed SRMI modifications reduce bias in the final analysis compared with standard SRMI, with an approximation strategy that includes an offset in the imputation model performing best overall. The method is illustrated in a breast cancer study, where the goal is to estimate the prevalence of a specific genetic pathogenic variant.
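The offset strategy can be pictured with a generic delta-adjusted regression imputation, sketched below under stated assumptions: one incomplete variable `y`, one fully observed predictor `x`, a normal linear imputation model, and a user-chosen offset `delta` added to every imputed value. This is not the paper's derived offset (which comes from its Taylor-series approximations of the missing-not-at-random model), and it is an "improper" single draw that does not resample the regression parameters. `impute_with_offset` is a hypothetical name. With `delta = 0` it reduces to ordinary stochastic regression imputation under missing at random.

```python
import math
import random
from statistics import mean
from typing import List, Optional, Sequence


def impute_with_offset(
    y: Sequence[Optional[float]], x: Sequence[float], delta: float, seed: Optional[int] = None
) -> List[float]:
    """Stochastic linear-regression imputation with an additive offset.

    Fits y = a + b*x on complete pairs, then fills each missing y with
    a + b*x + delta + Normal(0, sigma) noise, where sigma is the residual SD.
    delta is an assumed sensitivity parameter for not-at-random missingness.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    if len(pairs) < 3:
        raise ValueError("need at least three observed pairs")
    x_bar = mean(p[0] for p in pairs)
    y_bar = mean(p[1] for p in pairs)
    sxx = sum((xi - x_bar) ** 2 for xi, _ in pairs)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in pairs) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in pairs]
    sigma = math.sqrt(sum(r * r for r in residuals) / (len(pairs) - 2))
    rng = random.Random(seed)
    return [
        float(yi) if yi is not None else intercept + slope * xi + delta + rng.gauss(0.0, sigma)
        for xi, yi in zip(x, y)
    ]


filled = impute_with_offset([1.0, 3.0, None, 7.0, None], [0.0, 1.0, 2.0, 3.0, 4.0], delta=2.0, seed=0)
```

In a sensitivity analysis one would typically repeat this over several imputations and a range of `delta` values, and check how much the final estimate moves.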


2021
Author(s): Melissa Middleton, Cattram Nguyen, Margarita Moreno-Betancur, John B Carlin, Katherine J Lee

Abstract Background: In case-cohort studies, a random subcohort is selected from the inception cohort and acts as the sample of controls for several outcome investigations. Analysis is conducted using only the cases and the subcohort, with inverse probability weighting (IPW) used to account for the unequal sampling probabilities resulting from the study design. Like all epidemiological studies, case-cohort studies are susceptible to missing data. Multiple imputation (MI) has become increasingly popular for addressing missing data in epidemiological studies, but it is currently unclear how best to incorporate the weights from a case-cohort analysis in MI procedures used to address missing covariate data. Method: A simulation study was conducted with missingness in two covariates, motivated by a case study within the Barwon Infant Study. The MI methods considered were: using the outcome (a proxy for the weights in the simple case-cohort design considered) as a predictor in the imputation model, with and without exposure and covariate interactions; imputing separately within each weight category; and using a weighted imputation model. These methods were compared with a complete case analysis (CCA) within the context of a standard IPW analysis model estimating either the risk ratio or the odds ratio. The strength of associations, missing data mechanism, proportion of observations with incomplete covariate data, and subcohort selection probability varied across the simulation scenarios. The methods were also applied to the case study. Results: All MI methods performed similarly in terms of relative bias and precision across the scenarios considered, with the expected improvements compared with the CCA. Slight underestimation of the standard error was seen throughout, but the nominal level of coverage (95%) was generally achieved. All MI methods showed a similar increase in precision as the subcohort selection probability increased, irrespective of the scenario.
A similar pattern of results was seen in the case study. Conclusions: How weights were incorporated into the imputation model had minimal effect on the performance of MI; this may be because case-cohort studies have only two weight categories. In this context, including the outcome in the imputation model was sufficient to account for the unequal sampling probabilities in the analysis model.
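In the simple case-cohort design described here, the analysis weights depend only on the outcome, which is why the outcome can stand in for them in the imputation model: every case is sampled and gets weight 1, while non-case subcohort members represent the whole cohort and get weight 1/α, where α is the subcohort selection probability. The sketch below computes those weights and a weighted risk ratio. The function names and the five-record example are hypothetical, and a real analysis would typically fit a weighted regression with robust standard errors rather than compute this ratio directly.

```python
from typing import List, Sequence


def case_cohort_weights(is_case: Sequence[bool], alpha: float) -> List[float]:
    """IPW weights for a simple case-cohort sample (cases plus random subcohort).

    Cases are all sampled (weight 1); non-cases come only from the subcohort,
    selected with probability alpha (weight 1 / alpha).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    return [1.0 if case else 1.0 / alpha for case in is_case]


def weighted_risk_ratio(
    exposed: Sequence[bool], is_case: Sequence[bool], weights: Sequence[float]
) -> float:
    """Risk in the exposed divided by risk in the unexposed, using IPW weights."""
    def weighted_risk(group: bool) -> float:
        total = sum(w for e, w in zip(exposed, weights) if e == group)
        cases = sum(w for e, c, w in zip(exposed, is_case, weights) if e == group and c)
        return cases / total

    return weighted_risk(True) / weighted_risk(False)


# Hypothetical sample: subcohort selected with probability 0.25
exposed = [True, True, True, False, False]
is_case = [True, True, False, True, False]
w = case_cohort_weights(is_case, alpha=0.25)
rr = weighted_risk_ratio(exposed, is_case, w)
print(rr)
```

Because the weight is a deterministic function of the outcome here, conditioning on the outcome in the imputation model carries the same information as the weight, which is consistent with the abstract's conclusion.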


2021, Vol 17 (4), pp. 48-66
Author(s): Han Li, Zhao Liu, Ping Zhu

Missing values in industrial data restrict its applications. Although this incomplete data contains enough information for engineers to support subsequent development, there are still too many missing values for algorithms to establish precise models, because engineering domain knowledge is not considered and valuable information is not fully captured. This article therefore proposes an engineering domain knowledge-based framework for modelling incomplete industrial data. The raw datasets are partitioned and processed at different scales. First, hierarchical features are combined to decrease the missing ratio. To fill the missing values in special data, which are identified for classifying the samples, samples with only some of the features present are fully used to establish a local imputation model instead of being removed. Then the samples are divided into different groups to transfer information. A series of industrial datasets is analyzed to verify the feasibility of the proposed method.
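The point about keeping partially observed samples can be illustrated with a deliberately simple local imputation model. In the sketch below, which is an assumption and not the article's framework, each group (for example a product class) gets its own mean for a feature, computed from every sample where that feature is present, even if that sample's other features are missing; those local means then fill the gaps. The field names, the group key `cls`, and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List

Row = Dict[str, Any]


def missing_ratio(rows: List[Row], feature: str) -> float:
    """Fraction of rows whose value for `feature` is missing (None)."""
    return sum(row.get(feature) is None for row in rows) / len(rows)


def local_mean_impute(rows: List[Row], group_key: str, feature: str) -> List[Row]:
    """Fill missing `feature` values with the mean of the same group.

    Rows missing other features still contribute to the group mean,
    so partially observed samples are used rather than discarded.
    """
    values_by_group: Dict[Any, List[float]] = defaultdict(list)
    for row in rows:
        value = row.get(feature)
        if value is not None:
            values_by_group[row[group_key]].append(value)
    group_means = {g: mean(vs) for g, vs in values_by_group.items()}

    filled = []
    for row in rows:
        new_row = dict(row)  # do not mutate the caller's data
        if new_row.get(feature) is None and row[group_key] in group_means:
            new_row[feature] = group_means[row[group_key]]
        filled.append(new_row)
    return filled


rows = [
    {"cls": "A", "x": 1.0, "y": None},
    {"cls": "A", "x": None, "y": 2.0},
    {"cls": "A", "x": 3.0, "y": None},
    {"cls": "B", "x": None, "y": 1.0},
    {"cls": "B", "x": 10.0, "y": None},
]
print(missing_ratio(rows, "x"))    # 0.4
filled = local_mean_impute(rows, "cls", "x")
print(missing_ratio(filled, "x"))  # 0.0
```

Listwise deletion would have discarded every row above, since each has at least one missing field; the local means instead use all five.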

