Rounding non-binary categorical variables following multivariate normal imputation: evaluation of simple methods and implications for practice

Abstract: Imputing missing values is an important preprocessing step in data analysis, but the literature offers little guidance on how to choose between imputation models. This letter suggests adopting the imputation model that generates a density of imputed values most similar to that of the observed values for an incomplete variable after balancing all other covariates. We recommend stable balancing weights as a practical approach to balance covariates whose distribution is expected to differ if the values are not missing completely at random. After balancing, discrepancy statistics can be used to compare the density of imputed and observed values. We illustrate the application of the suggested approach using simulated and real-world survey data from the American National Election Study, comparing popular imputation approaches including random forests, hot-deck, predictive mean matching, and multivariate normal imputation. An R package implementing the suggested approach accompanies this letter.

Multiple imputation of unordered categorical missing data: A comparison of the multivariate normal imputation and multiple imputation by chained equations

Brazilian Journal of Probability and Statistics ◽

10.1214/15-bjps292 ◽

2016 ◽

Vol 30 (4) ◽

pp. 521-539

Author(s):

Innocent Karangwa ◽

Danelle Kotze ◽

Renette Blignaut

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Multivariate Normal ◽

Multivariate Normal Imputation

Multiple Imputation for Missing Data: Fully Conditional Specification Versus Multivariate Normal Imputation

American Journal of Epidemiology ◽

10.1093/aje/kwp425 ◽

2010 ◽

Vol 171 (5) ◽

pp. 624-632 ◽

Cited By ~ 346

Author(s):

K. J. Lee ◽

J. B. Carlin

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Multivariate Normal ◽

Fully Conditional Specification ◽

Conditional Specification ◽

Multivariate Normal Imputation

Multiple Imputation for Continuous and Categorical Data: Comparing Joint Multivariate Normal and Conditional Approaches

Political Analysis ◽

10.1093/pan/mpu007 ◽

2014 ◽

Vol 22 (4) ◽

pp. 497-519 ◽

Cited By ~ 26

Author(s):

Jonathan Kropko ◽

Ben Goodrich ◽

Andrew Gelman ◽

Jennifer Hill

Keyword(s):

Multiple Imputation ◽

Categorical Data ◽

Missing Values ◽

Missing At Random ◽

Model Fit ◽

Categorical Variables ◽

Data Sets ◽

Multivariate Normal ◽

Evaluating Methods ◽

Election Studies

We consider the relative performance of two common approaches to multiple imputation (MI): joint multivariate normal (MVN) MI, in which the data are modeled as a sample from a joint MVN distribution; and conditional MI, in which each variable is modeled conditionally on all the others. In order to use the multivariate normal distribution, implementations of joint MVN MI typically assume that categories of discrete variables are probabilistically constructed from continuous values. We use simulations to examine the implications of these assumptions. For each approach, we assess (1) the accuracy of the imputed values; and (2) the accuracy of coefficients and fitted values from a model fit to completed data sets. These simulations consider continuous, binary, ordinal, and unordered-categorical variables. One set of simulations uses multivariate normal data, and one set uses data from the 2008 American National Election Studies. We implement a less restrictive approach than is typical when evaluating methods using simulations in the missing data literature: in each case, missing values are generated by carefully following the conditions necessary for missingness to be “missing at random” (MAR). We find that in these situations conditional MI is more accurate than joint MVN MI whenever the data include categorical variables.
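
As a rough illustration of the "missing at random" (MAR) condition mentioned in the abstract above, the sketch below deletes values of one variable with a probability that depends only on another, fully observed variable. It is a generic example rather than the authors' simulation design: the function name `make_mar`, the logistic form, and the `intercept` and `slope` parameters are all assumptions made for illustration.

```python
import math
import random


def make_mar(rows, target, driver, intercept=-1.0, slope=1.0, seed=0):
    """Return copies of `rows` with `target` set to None under a MAR mechanism.

    Hypothetical helper: the chance that `target` is missing follows a
    logistic function of the fully observed `driver` column only, so the
    missingness never depends on the (possibly unobserved) target value.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    result = []
    for row in rows:
        # Probability of missingness depends on the observed driver alone.
        p_missing = 1.0 / (1.0 + math.exp(-(intercept + slope * row[driver])))
        new_row = dict(row)  # copy so the input data stay untouched
        if rng.random() < p_missing:
            new_row[target] = None
        result.append(new_row)
    return result


if __name__ == "__main__":
    data = [{"age": a / 10.0, "vote": a % 3} for a in range(-20, 21)]
    incomplete = make_mar(data, target="vote", driver="age")
    n_missing = sum(r["vote"] is None for r in incomplete)
    print(f"{n_missing} of {len(incomplete)} values of 'vote' set to missing")
```

Because the deletion probability is a function of `age` only, an imputation model that conditions on `age` can in principle recover the distribution of `vote`; making the probability depend on `vote` itself would violate MAR.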

Comparison of methods for imputing ordinal data using multivariate normal imputation: a case study of non-linear effects in a large cohort study

Statistics in Medicine ◽

10.1002/sim.5445 ◽

2012 ◽

Vol 31 (30) ◽

pp. 4164-4174 ◽

Cited By ~ 11

Author(s):

Katherine J. Lee ◽

John C. Galati ◽

Julie A. Simpson ◽

John B. Carlin

Keyword(s):

Cohort Study ◽

Ordinal Data ◽

Large Cohort ◽

Comparison Of Methods ◽

Multivariate Normal ◽

Non Linear ◽

Large Cohort Study ◽

Linear Effects ◽

Multivariate Normal Imputation

On the admissibility and nonadmissibility of the usual estimator for the mean of multivariate normal population and conclusions to optimal design

Optimization ◽

10.1080/02331937408842218 ◽

1974 ◽

Vol 5 (7) ◽

pp. 591-597

Author(s):

H. Lätter

Keyword(s):

Optimal Design ◽

Normal Population ◽

Multivariate Normal ◽

Usual Estimator ◽

Multivariate Normal Population ◽

The Mean

Influence of General and Crystallized Intelligence on Vocabulary Test Performance

European Journal of Psychological Assessment ◽

10.1027//1015-5759.18.1.78 ◽

2002 ◽

Vol 18 (1) ◽

pp. 78-84 ◽

Cited By ~ 10

Author(s):

Eva Ullstadius ◽

Jan-Eric Gustafsson ◽

Berit Carlstedt

Keyword(s):

Hierarchical Model ◽

Test Performance ◽

Verbal Ability ◽

Testing Procedure ◽

Intellectual Ability ◽

Analysis Of Covariance ◽

Categorical Variables ◽

Vocabulary Test ◽

Crystallized Intelligence ◽

General Ability

Summary: Vocabulary tests, part of most test batteries of general intellectual ability, measure both verbal and general ability. Newly developed techniques for confirmatory factor analysis of dichotomous variables make it possible to analyze the influence of different abilities on the performance on each item. In the testing procedure of the Computerized Swedish Enlistment test battery, eight different subtests of a new vocabulary test were given randomly to subsamples of a representative sample of 18-year-old male conscripts (N = 9001). Three central dimensions of a hierarchical model of intellectual abilities, general (G), verbal (Gc'), and spatial ability (Gv') were estimated under different assumptions of the nature of the data. In addition to an ordinary analysis of covariance matrices, assuming linearity of relations, the item variables were treated as categorical variables in the Mplus program. All eight subtests fit the hierarchical model, and the items were found to load about equally on G and Gc'. The results also indicate that if nonlinearity is not taken into account, the G loadings for the easy items are underestimated. These items, moreover, appear to be better measures of G than the difficult ones. The practical utility of the outcome for item selection and the theoretical implications for the question of the origin of verbal ability are discussed.

An Alternative to Cohen's κ

European Psychologist ◽

10.1027/1016-9040.11.1.12 ◽

2006 ◽

Vol 11 (1) ◽

pp. 12-24 ◽

Cited By ~ 19

Author(s):

Alexander von Eye

Keyword(s):

Simulation Study ◽

Null Hypothesis ◽

Categorical Variables ◽

Alternative Measure ◽

Rater Agreement ◽

Verbal Processing ◽

Heavy Tailed ◽

Applicant Selection

At the level of manifest categorical variables, a large number of coefficients and models for the examination of rater agreement have been proposed and used. The most popular of these is Cohen's κ. In this article, a new coefficient, κ_s, is proposed as an alternative measure of rater agreement. Both κ and κ_s allow researchers to determine whether agreement in groups of two or more raters is significantly beyond chance. Stouffer's z is used to test the null hypothesis that κ_s = 0. In addition to evaluating rater agreement in a fashion parallel to κ, the coefficient κ_s allows one to (1) examine subsets of cells in agreement tables, (2) examine cells that indicate disagreement, (3) consider alternative chance models, (4) take covariates into account, and (5) compare independent samples. Results from a simulation study are reported, which suggest that (a) the four measures of rater agreement, Cohen's κ, Brennan and Prediger's κ_n, raw agreement, and κ_s, are sensitive to the same data characteristics when evaluating rater agreement and (b) both the z-statistic for Cohen's κ and Stouffer's z for κ_s are unimodally and symmetrically distributed, but slightly heavy-tailed. Examples use data from verbal processing and applicant selection.
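
For readers unfamiliar with the established measures named in the abstract above, the following sketch computes raw agreement, Cohen's κ, and Brennan and Prediger's κ_n from a square two-rater agreement table, using their standard textbook definitions. The new coefficient κ_s is not implemented because the abstract does not define it. The function names and the example table are illustrative assumptions.

```python
def raw_agreement(table):
    """Proportion of cases on the main diagonal of a square agreement table."""
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("agreement table is empty")
    return sum(table[i][i] for i in range(len(table))) / n


def cohens_kappa(table):
    """Cohen's kappa: agreement corrected for chance based on the marginals."""
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("agreement table must be square")
    n = sum(sum(row) for row in table)
    p_o = raw_agreement(table)
    row_props = [sum(row) / n for row in table]
    col_props = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Expected agreement if the two raters judged independently.
    p_e = sum(r * c for r, c in zip(row_props, col_props))
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


def brennan_prediger_kappa(table):
    """Brennan and Prediger's kappa_n: chance agreement fixed at 1/k."""
    k = len(table)
    if k < 2:
        raise ValueError("need at least two categories")
    p_o = raw_agreement(table)
    return (p_o - 1.0 / k) / (1.0 - 1.0 / k)


if __name__ == "__main__":
    # Hypothetical table: rows are rater A's categories, columns rater B's.
    ratings = [[30, 5], [5, 10]]
    print(round(raw_agreement(ratings), 3))
    print(round(cohens_kappa(ratings), 3))
    print(round(brennan_prediger_kappa(ratings), 3))
```

The two corrected coefficients differ only in the chance model: Cohen's κ derives expected agreement from the observed marginals, whereas κ_n assumes every category is equally likely.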

Blood Management in Total Knee Arthroplasty: A Nationwide Analysis from 2011 to 2018

The Journal of Knee Surgery ◽

10.1055/s-0040-1721414 ◽

2020 ◽

Author(s):

Jared A. Warren ◽

John P. McLaughlin ◽

Robert M. Molloy ◽

Carlos A. Higuera ◽

Jonathan L. Schaffer ◽

...

Keyword(s):

Total Knee Arthroplasty ◽

Platelet Count ◽

Knee Arthroplasty ◽

Categorical Variables ◽

Bleeding Disorders ◽

Blood Management ◽

Improvement Program ◽

Preoperative Anemia ◽

High Incidence ◽

Total Knee

Abstract: Advances in perioperative blood management, anesthesia, and surgical technique have improved transfusion rates following primary total knee arthroplasty (TKA) and have driven substantial change in preoperative blood ordering protocols. Blood management in TKA has therefore changed substantially with the implementation of preoperative screening, patient optimization, and intra- and postoperative advances. The purpose of this study was to examine changes in blood management in primary TKA in a nationwide sample, to assess gaps and opportunities. The American College of Surgeons National Surgical Quality Improvement Program database was used to identify TKA cases (n = 337,160) from 2011 to 2018. The following variables were examined: preoperative hematocrit (HCT), anemia (HCT <35.5% for females and <38.5% for males), platelet count, thrombocytopenia (platelet count < 150,000/µL), international normalized ratio (INR), INR > 2.0, bleeding disorders, and preoperative and postoperative transfusions. Analyses of variance were used to examine changes in continuous variables, and chi-squared tests were used for categorical variables. There was a substantial decrease in postoperative transfusions from a high of 18.3% in 2011 to a low of 1.0% in 2018 (p < 0.001), as well as in preoperative anemia from a high of 13.3% in 2011 to a low of 9.5% in 2016 to 2017 (p < 0.001). There were statistically significant but clinically irrelevant changes in the other variables examined. There was a HCT high of 41.2 in 2016 and a low of 40.4 in 2011 to 2012 (p < 0.001). There was a platelet count high of 247,400 in 2018 and a low of 242,700 in 201 (p < 0.001). There was a high incidence of thrombocytopenia of 5.2% in 2017 and a low of 4.4% in 2018 (p < 0.001). There was a high INR of 1.037 in 2011 and a low of 1.021 in 2013 (p < 0.001). There was a high incidence of INR >2.0 of 1.0% in 2012 to 2015 and a low of 0.8% in 2016 to 2018 (p = 0.027). There was a high incidence of bleeding disorders of 2.9% in 2013 and a low of 1.8% in 2017 to 2018 (p < 0.001). There was a high incidence of preoperative transfusions of 0.1% in 2011 to 2014 and a low of <0.1% in 2015 to 2018 (p = 0.021). From 2011 to 2018, there have been substantial decreases in patients receiving postoperative transfusions after primary TKA. Similarly, although a decrease in patients with anemia was seen, 1 out of 10 patients still has preoperative anemia, highlighting the opportunity to further improve and address this potentially modifiable risk factor before surgery. These findings may reflect changes during TKA patient selection, optimization, or management, and emphasize the need to further advance multimodal approaches for perioperative blood management of TKA patients. This is a Level III study.
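
The categorical variables in the abstract above are derived from continuous laboratory values by fixed cutoffs. A minimal sketch of that derivation follows, using only the thresholds stated in the abstract; the function name `blood_flags`, its argument names, and the returned dictionary keys are assumptions for illustration and are not taken from the study or the database.

```python
def blood_flags(sex, hct_percent, platelets_per_ul, inr):
    """Derive binary indicators from lab values using the abstract's cutoffs.

    Hypothetical helper. Cutoffs as stated in the abstract:
    anemia: HCT < 35.5% (females) or < 38.5% (males);
    thrombocytopenia: platelet count < 150,000/µL; elevated INR: INR > 2.0.
    """
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    anemia_cutoff = 35.5 if sex == "female" else 38.5
    return {
        "anemia": hct_percent < anemia_cutoff,
        "thrombocytopenia": platelets_per_ul < 150_000,
        "inr_above_2": inr > 2.0,
    }


if __name__ == "__main__":
    # Made-up patient values, for demonstration only.
    print(blood_flags("male", hct_percent=37.0, platelets_per_ul=210_000, inr=1.0))
```

Indicators built this way are the kind of categorical variables the abstract compares across years with chi-squared tests.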

Bridging the Gap between Quantitative and Qualitative Historical Research: an application of multiple regression analysis and homogeneity analysis with alternating least squares

History and Computing ◽

10.3366/hac.1996.8.3.133 ◽

1996 ◽

Vol 8 (3) ◽

pp. 133-144 ◽

Cited By ~ 1

Author(s):

María del Mar del Pozo Andrés ◽

Jacques F A Braster

Keyword(s):

Least Squares ◽

Multiple Regression ◽

Historical Research ◽

Alternating Least Squares ◽

Categorical Variables ◽

Two Dimensional ◽

Homogeneity Analysis ◽

Regression Approach ◽

Dimensional Picture ◽

Selection Of

In this article we propose two research techniques that can bridge the gap between quantitative and qualitative historical research. These are: (1) a multiple regression approach that gives information about general patterns between numerical variables and the selection of outliers for qualitative analysis; (2) a homogeneity analysis with alternating least squares that results in a two-dimensional picture in which the relationships between categorical variables are graphically presented.
