Evaluating FIML And Multiple Imputation In Joint Ordinal-Continuous Measurements Models With Missing Data

2021 · Author(s): Aaron Lim, Mike W.-L. Cheung

Missing data are a common occurrence in confirmatory factor analysis (CFA). Much work has evaluated the performance of different techniques when all observed variables are either continuous or ordinal, but few studies have investigated these techniques when the observed variables are a mix of the two. This study investigated the performance of four approaches to handling missing data in such models: a joint ordinal-continuous full information maximum likelihood (JOC-FIML) approach, and three multiple imputation approaches (fully conditional specification, fully conditional specification with a latent variable formulation, and expectation-maximization with bootstrapping), each combined with the weighted least squares with mean and variance adjustment (WLSMV) estimator. In a Monte Carlo simulation, the JOC-FIML approach produced unbiased estimates of factor loadings and standard errors in almost all conditions. Fully conditional specification combined with WLSMV was second best, producing accurate estimates when the sample size was large. We recommend JOC-FIML across most conditions, except when certain ordinal categories have extremely low frequencies, as it was then less likely to converge. If the sample is large, fully conditional specification combined with WLSMV is recommended when the FIML approach is not feasible (e.g., non-convergence, or when variables that predict missingness are not of interest to the analysis).
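
As a rough illustration of the fully conditional specification approach evaluated above, the sketch below imputes a hypothetical mix of continuous and ordinal indicators with the R package mice; all variable names, the missingness pattern, and the method choices are invented for the example, and the completed data sets would then be passed to a WLSMV-based CFA and pooled.

```r
# Minimal FCS sketch with 'mice': hypothetical continuous (x1, x2) and
# ordinal (o1, o2) indicators; names and missingness are invented.
library(mice)

set.seed(2021)
n <- 500
dat <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  o1 = factor(sample(1:4, n, replace = TRUE), ordered = TRUE),
  o2 = factor(sample(1:4, n, replace = TRUE), ordered = TRUE)
)
dat$x1[sample(n, 0.2 * n)] <- NA   # 20% missing, for illustration only
dat$o1[sample(n, 0.2 * n)] <- NA

# Predictive mean matching for the continuous variable, a proportional-odds
# model ("polr") for the ordinal one; complete variables need no method.
imp <- mice(dat, m = 20,
            method = c(x1 = "pmm", x2 = "", o1 = "polr", o2 = ""),
            printFlag = FALSE)

# Each of the m completed data sets could now be analysed with a WLSMV
# estimator (e.g., lavaan::cfa with ordered indicators) and pooled.
head(complete(imp, 1))
```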

2021 · Vol 30 (10) · pp. 2256-2268 · Author(s): Luís Antunes, Denisa Mendonça, Maria José Bento, Edmund Njeru Njagi, Aurélien Belot, ...

Missing data are a common issue in epidemiological databases. Among the different ways of dealing with missing data, multiple imputation has become increasingly available in standard statistical software packages. However, incompatibility between the imputation model and the substantive model, which can arise when the associations between variables in the substantive model are not taken into account in the imputation models or when the substantive model is itself nonlinear, can lead to invalid inference. Aiming to analyse population-based cancer survival data, we extended the substantive model compatible fully conditional specification (SMC-FCS) multiple imputation approach, proposed by Bartlett et al. in 2015, to accommodate excess hazard regression models. The proposed approach was compared with the standard fully conditional specification multiple imputation procedure and with complete-case analysis in a simulation study. The SMC-FCS approach produced unbiased estimates in both scenarios tested, while fully conditional specification produced biased estimates and poor empirical coverage probabilities. The SMC-FCS algorithm was then used to handle missing data in an evaluation of socioeconomic inequalities in survival among colorectal cancer patients diagnosed in the North Region of Portugal. The analysis using SMC-FCS showed a clearer trend of higher excess hazards for patients coming from more deprived areas. The proposed algorithm was implemented in R and is presented as Supplementary Material.
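
The authors' excess hazard extension ships as their Supplementary Material; as a rough sketch of the underlying SMC-FCS machinery, the following uses Bartlett et al.'s smcfcs package with an ordinary Cox substantive model on simulated data (all variable names and the data-generating mechanism are invented):

```r
# SMC-FCS sketch with a Cox substantive model via the 'smcfcs' package;
# the paper's excess hazard variant is analogous but relies on the authors'
# supplementary code. Data and variable names are simulated/invented.
library(smcfcs)
library(survival)
library(mitools)

set.seed(2021)
n <- 1000
age    <- rnorm(n)
depriv <- rnorm(n, mean = 0.3 * age)            # partially observed covariate
t      <- rexp(n, rate = exp(0.5 * depriv + 0.2 * age))
d      <- as.integer(t < 2); t <- pmin(t, 2)    # administrative censoring at 2
dat    <- data.frame(t, d, depriv, age)
dat$depriv[rbinom(n, 1, 0.3) == 1] <- NA        # ~30% missingness

# Impute 'depriv' compatibly with the Cox substantive model
imps <- smcfcs(dat,
               smtype    = "coxph",
               smformula = "Surv(t, d) ~ depriv + age",
               method    = c("", "", "norm", ""),
               m         = 10)

# Fit the substantive model on each completed data set and pool (Rubin's rules)
fits <- with(imputationList(imps$impDatasets),
             coxph(Surv(t, d) ~ depriv + age))
summary(MIcombine(fits))
```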


2019 · Vol 109 (3) · pp. 504-508 · Author(s): Peng Li, Elizabeth A Stuart

Abstract Missing data occur ubiquitously in randomized controlled trials and may compromise causal inference if handled inappropriately. Problematic missing data methods such as complete-case (CC) analysis and last observation carried forward (LOCF) are unfortunately still common in nutrition trials, a situation caused in part by investigator confusion about the assumptions that different methods make about the missing data. In this statistical guidance, we provide a brief introduction to missing data mechanisms and to the unreasonable assumptions that underlie CC and LOCF, and we recommend 2 appropriate missing data methods: multiple imputation and full information maximum likelihood.
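
For readers wanting a starting point, a minimal sketch of the FIML alternative in R (lavaan) follows; the data, and the names trial, outcome, group, and baseline, are invented, and FIML here rests on the missing at random assumption:

```r
# FIML sketch with 'lavaan' for a simple covariate-adjusted trial analysis;
# all names are hypothetical and the data are simulated. FIML assumes MAR.
library(lavaan)

set.seed(42)
n <- 200
baseline <- rnorm(n)
group    <- rbinom(n, 1, 0.5)
outcome  <- 0.5 * group + 0.6 * baseline + rnorm(n)
trial    <- data.frame(outcome, group, baseline)
trial$outcome[rbinom(n, 1, plogis(-1 + baseline)) == 1] <- NA  # MAR dropout

# fixed.x = FALSE lets lavaan model the covariates so FIML can use all cases
fit <- sem("outcome ~ group + baseline",
           data = trial, missing = "fiml", fixed.x = FALSE)
summary(fit)
```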


2020 · Vol 28 (108) · pp. 599-621 · Author(s): Maria Eugénia Ferrão, Paula Prata, Maria Teresa Gonzaga Alves

Abstract Almost all quantitative studies in educational assessment, evaluation, and educational research are based on incomplete data sets, a problem that has persisted for years without a single solution. The use of big identifiable data poses new challenges for dealing with missing values. In the first part of this paper, we present the state of the art of the topic in the Brazilian education scientific literature and how researchers have dealt with missing data since the turn of the century. Next, we use open-access software to analyze real-world data, the 2017 Prova Brasil, for several federation units, documenting how the naïve assumption of missing completely at random may substantially affect statistical conclusions, researcher interpretations, and subsequent implications for policy and practice. We conclude with straightforward suggestions for education researchers on applying R routines to test the hypothesis of missing completely at random and, if the null hypothesis is rejected, on how to implement multiple imputation, which appears to be one of the most appropriate methods for handling missing data.
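
A compact version of that workflow, assuming Little's MCAR test from the naniar package as the R routine in question (the data here are simulated, not Prova Brasil):

```r
# Sketch: test MCAR, and if it is rejected, fall back to multiple imputation.
# Little's MCAR test via 'naniar' is one common choice; data are simulated.
library(naniar)
library(mice)

set.seed(108)
n <- 300
dat <- data.frame(ses = rnorm(n))
dat$score <- 250 + 30 * dat$ses + rnorm(n, sd = 40)
dat$score[rbinom(n, 1, plogis(-1 - 0.8 * dat$ses)) == 1] <- NA  # MAR, not MCAR

mcar_test(dat)   # a small p-value rejects the MCAR null hypothesis

imp <- mice(dat, m = 20, method = "pmm", printFlag = FALSE)
fit <- with(imp, lm(score ~ ses))
summary(pool(fit))   # Rubin's-rules pooled estimates
```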


2019 · Vol 6 (339) · pp. 73-98 · Author(s): Małgorzata Aleksandra Misztal

The problem of incomplete data and its implications for drawing valid conclusions from statistical analyses is not confined to any particular scientific domain; it arises in economics, sociology, education, the behavioural sciences, and medicine. Almost all standard statistical methods presume that every object has information on every variable to be included in the analysis, and the typical approach to missing data is simply to delete the affected cases. However, this leads to inefficient and biased results and is not recommended in the literature. The state-of-the-art technique for handling missing data is multiple imputation. In this paper, selected multiple imputation methods are considered, with special attention paid to using principal component analysis (PCA) as an imputation method. The goal of the study was to assess the quality of PCA-based imputation compared with two other multiple imputation techniques: multivariate imputation by chained equations (MICE) and missForest. The comparison was made by artificially simulating different proportions (10–50%) and mechanisms of missing data in 10 complete data sets from the UCI repository of machine learning databases. Missing values were then imputed using MICE, missForest, and the PCA-based method (MIPCA). The normalised root mean square error (NRMSE) was calculated as a measure of imputation accuracy. On the basis of the analyses conducted, missForest can be recommended as the multiple imputation method providing the lowest imputation error rates for all types of missingness; PCA-based imputation does not perform well in terms of accuracy.
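
To make the evaluation design concrete, here is a hedged sketch of the missForest-plus-NRMSE part of such a comparison, with the built-in iris data standing in for the UCI data sets used in the paper:

```r
# Sketch: imputation accuracy via NRMSE with 'missForest'; 'iris' stands in
# for the UCI benchmark data sets.
library(missForest)

set.seed(339)
x_true <- iris[, 1:4]                  # complete numeric data
x_mis  <- prodNA(x_true, noNA = 0.3)   # knock out 30% of entries at random

imp <- missForest(x_mis)

# NRMSE = sqrt(mean((true - imputed)^2) / var(true)), computed over the
# originally missing entries; mixError() returns it directly. Lower is better.
mixError(imp$ximp, x_mis, x_true)
```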

