Handling Missing Responses in Psychometrics: Methods and Software

Psych ◽  
2021 ◽  
Vol 3 (4) ◽  
pp. 673-693
Author(s):  
Shenghai Dai

The presence of missing responses in assessment settings is inevitable and may yield biased parameter estimates in psychometric modeling if ignored or handled improperly. Many methods have been proposed to handle missing responses in assessment data, which are often dichotomous or polytomous. Their application remains limited, however, partly because (1) the literature does not offer sufficient support for an optimal method; (2) many practitioners and researchers are not familiar with these methods; and (3) these methods are usually not implemented in psychometric software, so missing responses need to be handled separately. This article introduces and reviews the missing response handling methods commonly used in psychometrics, along with the literature that examines and compares their performance. Further, the use of the TestDataImputation package in R is introduced and illustrated with an example data set and a simulation study. The corresponding R code is provided.
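To give a flavor of the kind of method reviewed here, the sketch below hand-rolls two-way imputation (person mean + item mean − grand mean) for a dichotomous response matrix in base R, on simulated data. It is an illustration of one classical approach, not the TestDataImputation package's own interface.

```r
# Two-way imputation sketch for dichotomous item responses (illustration only).
set.seed(1)
resp <- matrix(rbinom(200, 1, 0.6), nrow = 20, ncol = 10)  # 20 persons x 10 items
resp[sample(length(resp), 30)] <- NA                       # inject ~15% missing responses

two_way_impute <- function(x) {
  pm <- rowMeans(x, na.rm = TRUE)   # person means over observed responses
  im <- colMeans(x, na.rm = TRUE)   # item means over observed responses
  gm <- mean(x, na.rm = TRUE)       # grand mean
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(ncol(x))) {
      if (is.na(x[i, j])) {
        # person mean + item mean - grand mean, clamped to [0, 1] and rounded to 0/1
        x[i, j] <- round(min(max(pm[i] + im[j] - gm, 0), 1))
      }
    }
  }
  x
}

imputed <- two_way_impute(resp)
table(imputed)   # no NA values remain
```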

2019 ◽  
Author(s):  
Leili Tapak ◽  
Omid Hamidi ◽  
Majid Sadeghifar ◽  
Hassan Doosti ◽  
Ghobad Moradi

Abstract
Objectives: Zero-inflated proportion or rate data nested in clusters due to the sampling structure can be found in many disciplines. Sometimes the rate response is not observed for some study units because of limitations such as failures in recording the data (false negatives), and zeros are recorded instead of the actual rates/proportions (low incidence). In this study, we propose a multilevel zero-inflated censored Beta regression model that can address zero-inflated rate data with low incidence.
Methods: We assumed that the random effects are independent and normally distributed. The performance of the proposed approach was evaluated through application to a three-level real data set and a simulation study. We applied the proposed model to analyze brucellosis diagnosis rate data and to investigate the effects of climatic factors and geographical position. For comparison, we also applied the standard zero-inflated censored Beta regression model, which does not account for correlation.
Results: The proposed model performed better than the zero-inflated censored Beta model based on the AIC criterion. Height (p-value <0.0001), temperature (p-value <0.0001), and precipitation (p-value = 0.0006) significantly affected brucellosis rates, whereas precipitation was not statistically significant in the ZICBETA model (p-value = 0.385). The simulation study also showed that the estimates obtained by the maximum likelihood approach were reasonable in terms of mean square error.
Conclusions: The results showed that the proposed method can capture the correlations in the real data set and yields accurate parameter estimates.
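For orientation only: a related multilevel zero-inflated Beta regression (without the censoring component of the proposed model) can be sketched with the glmmTMB package on simulated data. The variable names below (cluster, temp, precip) and all numbers are invented for the example; this is not the authors' implementation.

```r
# Multilevel zero-inflated Beta regression sketch with glmmTMB (no censoring component).
library(glmmTMB)

set.seed(42)
n <- 300
dat <- data.frame(
  cluster = factor(rep(1:30, each = 10)),  # higher-level grouping (e.g., district)
  temp    = rnorm(n),
  precip  = rnorm(n)
)
mu <- plogis(-0.5 + 0.4 * dat$temp)               # mean of the Beta component
y  <- rbeta(n, mu * 20, (1 - mu) * 20)            # rates in (0, 1)
y[rbinom(n, 1, 0.3) == 1] <- 0                    # excess zeros (low incidence)
dat$rate <- y

fit <- glmmTMB(rate ~ temp + precip + (1 | cluster),  # random intercept per cluster
               ziformula = ~1,                        # constant zero-inflation probability
               family = beta_family(),
               data = dat)
summary(fit)
AIC(fit)   # criterion used for model comparison in the abstract
```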


2020 ◽  
Vol 80 (5) ◽  
pp. 932-954 ◽  
Author(s):  
Jiaying Xiao ◽  
Okan Bulut

Large amounts of missing data can distort item parameter estimation and lead to biased ability estimates in educational assessments. Therefore, missing responses should be handled properly before estimating any parameters. In this study, two Monte Carlo simulation studies were conducted to compare the performance of four methods of handling missing data when estimating ability parameters. The methods were full-information maximum likelihood (FIML), zero replacement, and multiple imputation by chained equations using either classification and regression trees (MICE-CART) or random forest imputation (MICE-RFI). For the two imputation methods, missing responses were treated as a valid response category to enhance the accuracy of the imputations. Bias, root mean square error, and the correlation between true and estimated ability parameters were used to evaluate the accuracy of the ability estimates for each method. Results indicated that FIML outperformed the other methods under most conditions. Zero replacement yielded accurate ability estimates when missing proportions were very high. The performances of MICE-CART and MICE-RFI were quite similar, but these two methods appeared to be affected differently by the missing data mechanism. As the number of items increased and missing proportions decreased, all the methods performed better. In addition, information on the missingness could improve the performance of MICE-RFI and MICE-CART when the data set is sparse and the missing data mechanism is missing at random.
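A minimal sketch of the two MICE variants, using the mice package's built-in "cart" and "rf" imputation methods on simulated dichotomous responses; the study's exact settings (number of imputations, treatment of missingness indicators) are not reproduced here.

```r
# Multiple imputation by chained equations with CART and random-forest models.
library(mice)

set.seed(123)
resp <- as.data.frame(matrix(rbinom(500, 1, 0.5), nrow = 50, ncol = 10))
for (j in 1:10) resp[sample(50, 5), j] <- NA      # scatter ~10% missing responses

imp_cart <- mice(resp, m = 5, method = "cart", printFlag = FALSE)  # MICE-CART
imp_rf   <- mice(resp, m = 5, method = "rf",   printFlag = FALSE)  # MICE-RFI

completed <- complete(imp_cart, 1)   # first of the five imputed data sets
```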


2021 ◽  
Vol 45 (3) ◽  
pp. 159-177
Author(s):  
Chen-Wei Liu

Missing not at random (MNAR) modeling for non-ignorable missing responses usually assumes that the latent variable distribution is bivariate normal. This assumption is rarely verified but is often treated as a standard in practice. Recent studies of "complete" item responses (i.e., no missing data) have shown that ignoring the nonnormal distribution of a unidimensional latent variable, especially a skewed or bimodal one, can yield biased estimates and misleading conclusions. However, dealing with a bivariate nonnormal latent variable distribution in the presence of MNAR data has not yet been investigated. This article proposes extending the unidimensional empirical histogram and Davidian curve methods to deal simultaneously with a nonnormal latent variable distribution and MNAR data. A simulation study is carried out to demonstrate the consequences of ignoring a bivariate nonnormal distribution for parameter estimates, followed by an empirical analysis of "don't know" item responses. The results presented in this article show that examining the latent variable distribution for bivariate nonnormality should become a routine step for MNAR data, to minimize the impact of nonnormality on parameter estimates.


2010 ◽  
Vol 14 (3) ◽  
pp. 545-556 ◽  
Author(s):  
J. Rings ◽  
J. A. Huisman ◽  
H. Vereecken

Abstract. Coupled hydrogeophysical methods infer hydrological and petrophysical parameters directly from geophysical measurements. Widely used methods do not explicitly quantify the uncertainty in parameter estimates. Therefore, we apply a sequential Bayesian framework that provides updates of the state, the parameters, and their uncertainty whenever measurements become available. We have coupled a hydrological and an electrical resistivity tomography (ERT) forward code in a particle filtering framework. First, we analyze a synthetic data set of lysimeter infiltration monitored with ERT. In a second step, we apply the approach to field data measured during an infiltration event on a full-scale dike model. For the synthetic data, the water content distribution and the hydraulic conductivity are accurately estimated after a few time steps. For the field data, hydraulic parameters are successfully estimated from water content measurements made with spatial time domain reflectometry and ERT, and the development of their posterior distributions is shown.
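To make the sequential update concrete, here is a generic, self-contained sketch of one sequential importance resampling (SIR) step in R. The toy linear "forward model", particle count, and noise levels are placeholders standing in for the coupled hydrological/ERT forward codes used in the study.

```r
# One particle-filter update step: predict, weight by likelihood, resample.
set.seed(7)
n_part  <- 500
theta   <- rnorm(n_part, mean = 0.3, sd = 0.10)   # particles: e.g., a hydraulic parameter
state   <- rnorm(n_part, mean = 0.2, sd = 0.05)   # particles: e.g., water content

obs     <- 0.25                                   # a new measurement becomes available
sigma_o <- 0.02                                   # observation error

pred    <- state + 0.1 * theta                    # toy forward-model prediction per particle
w       <- dnorm(obs, mean = pred, sd = sigma_o)  # likelihood weights
w       <- w / sum(w)

idx     <- sample.int(n_part, n_part, replace = TRUE, prob = w)  # resample particles
theta   <- theta[idx]; state <- state[idx]

c(mean = mean(theta), sd = sd(theta))             # updated parameter estimate and uncertainty
```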


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10681
Author(s):  
Jake Dickinson ◽  
Marcel de Matas ◽  
Paul A. Dickinson ◽  
Hitesh B. Mistry

Purpose: To assess whether a model-based analysis increases statistical power over an analysis of final-day volumes, and to provide insights into more efficient patient-derived xenograft (PDX) study designs.
Methods: Tumour xenograft time-series data were extracted from a public PDX drug treatment database. For all two-arm studies, the percent tumour growth inhibition (TGI) at days 14, 21, and 28 was calculated. The treatment effect was analysed using an unpaired, two-tailed t-test (empirical) and a model-based analysis with a likelihood ratio test (LRT). In addition, a simulation study was performed to assess the difference in power between the two data-analysis approaches for PDX and standard cell-line-derived xenograft (CDX) studies.
Results: The model-based analysis had greater statistical power than the empirical approach within the PDX data set. The model-based approach was able to detect TGI values as low as 25%, whereas the empirical approach required at least 50% TGI. The simulation study confirmed these findings and highlighted that CDX studies require fewer animals than PDX studies showing an equivalent level of TGI.
Conclusions: This study adds to the growing literature showing that a model-based analysis of xenograft data improves statistical power over the common empirical approach. The analysis showed that a model-based approach, built on the first mathematical model of tumour growth, was able to detect a smaller effect size than the empirical approach that is common in such studies. A model-based analysis should allow studies to reduce animal use and experiment length while providing effective insights into compound anti-tumour activity.
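A minimal illustration of the contrast between the two analysis strategies, on simulated two-arm data with a simple log-linear (exponential) growth model rather than the authors' tumour growth model; all numbers below are invented.

```r
# (i) Empirical analysis: t-test on final-day volumes.
# (ii) Model-based analysis: likelihood ratio test on the whole time series.
set.seed(11)
days  <- rep(c(0, 7, 14, 21), times = 16)
group <- rep(c("control", "treated"), each = 32)
id    <- rep(1:16, each = 4)
kg    <- ifelse(group == "control", 0.10, 0.07)              # growth rate per group
vol   <- 100 * exp(kg * days) * exp(rnorm(length(days), 0, 0.15))
dat   <- data.frame(id, group, days, vol)

## empirical approach
final <- subset(dat, days == 21)
t.test(vol ~ group, data = final)

## model-based approach: does treatment alter the growth rate?
m0  <- lm(log(vol) ~ days, data = dat)                       # no treatment effect
m1  <- lm(log(vol) ~ days + days:group, data = dat)          # treatment modifies slope
lrt <- 2 * (logLik(m1) - logLik(m0))
pchisq(as.numeric(lrt), df = 1, lower.tail = FALSE)          # LRT p-value
```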


2016 ◽  
Author(s):  
Rui J. Costa ◽  
Hilde Wilkinson-Herbots

Abstract. The isolation-with-migration (IM) model is commonly used to make inferences about gene flow during speciation, using polymorphism data. However, Becquet and Przeworski (2009) report that the parameter estimates obtained by fitting the IM model are very sensitive to the model's assumptions (including the assumption of constant gene flow until the present). This paper is concerned with the isolation-with-initial-migration (IIM) model of Wilkinson-Herbots (2012), which drops precisely this assumption. In the IIM model, one ancestral population divides into two descendant subpopulations, between which there is an initial period of gene flow and a subsequent period of isolation. We derive a very fast method of fitting an extended version of the IIM model, which also allows for asymmetric gene flow and unequal population sizes. This is a maximum-likelihood method, applicable to data on the number of segregating sites between pairs of DNA sequences from a large number of independent loci. In addition to obtaining parameter estimates, our method can also be used to distinguish between alternative models representing different evolutionary scenarios, by means of likelihood ratio tests. We illustrate the procedure on pairs of Drosophila sequences from approximately 30,000 loci. The computing time needed to fit the most complex version of the model to this data set is only a couple of minutes. The R code to fit the IIM model can be found in the supplementary files of this paper.


Geophysics ◽  
2020 ◽  
Vol 85 (3) ◽  
pp. R163-R175
Author(s):  
Huaizhen Chen ◽  
Junxiao Li ◽  
Kristopher A. Innanen

Based on a model of attenuative cracked rock, we have derived a simplified and frequency-dependent stiffness matrix associated with (1) a rock volume containing aligned and partially saturated cracks and (2) a new indicator of oil-bearing fractured reservoirs, which is related to pressure relaxation in cracked rocks and influenced by fluid viscosity and saturation. Starting from the mathematical form of a perturbation in this stiffness matrix across a reflecting interface separating two attenuative cracked media, we set up a linearized P-wave to P-wave reflection coefficient as an azimuthally and frequency-dependent function of dry rock elastic properties, dry fracture weaknesses, and the new indicator. By varying this reflection coefficient with azimuthal angle, we derive a further expression referred to as the quasidifference in elastic impedance, or [Formula: see text], which is primarily affected by the dry fracture weaknesses and the new indicator. An inversion approach is established that uses differences in the frequency components of seismic amplitudes to estimate these weaknesses and the indicator based on the derived [Formula: see text]. In synthetic inversion tests, we determine that the approach produces interpretable parameter estimates in the presence of data with a moderate signal-to-noise ratio (S/N). Testing on a real data set suggests that reliable fracture weaknesses and indicator values are generated by the approach; fractured and oil-bearing reservoirs are identified through a combination of the dry fracture weaknesses and the new indicator.


2020 ◽  
Vol 44 (6) ◽  
pp. 431-446 ◽  
Author(s):  
Pablo Nájera ◽  
Miguel A. Sorrel ◽  
Jimmy de la Torre ◽  
Francisco José Abad

In the context of cognitive diagnosis models (CDMs), a Q-matrix reflects the correspondence between attributes and items. The Q-matrix construction process is typically subjective in nature, which may lead to misspecifications, and these can negatively affect attribute classification accuracy. In response, several methods of empirical Q-matrix validation have been developed. The general discrimination index (GDI) method has some relevant advantages, such as the possibility of being applied to several CDMs. However, the estimation of the GDI relies on the estimation of the latent group sizes and success probabilities, which is made with the original (and possibly misspecified) Q-matrix. This can be a problem, especially when there is great uncertainty about the Q-matrix specification. To address this, the present study investigates the iterative application of the GDI method, in which only one item is modified at each step of the iterative procedure and the required cutoff is updated considering the new parameter estimates. A simulation study was conducted to test the performance of the new procedure. Results showed that the performance of the GDI method improved when it was applied iteratively at the item level and an appropriate cutoff point was used. This was most notable when the original Q-matrix misspecification rate was high, where the proposed procedure performed better 96.5% of the time. The results are illustrated using Tatsuoka's fraction-subtraction data set.
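For context, the non-iterative GDI/PVAF validation that the article builds on is available in the GDINA R package. The sketch below assumes that package's GDINA() and Qval() functions and its bundled frac20 fraction-subtraction data (argument names may differ across package versions); it does not implement the iterative, item-by-item variant proposed in the article.

```r
# Fit a CDM with a provisional Q-matrix, then run GDI/PVAF-based validation.
library(GDINA)

dat <- frac20$dat   # 20-item fraction-subtraction responses bundled with the package
Q   <- frac20$Q     # provisional Q-matrix

fit  <- GDINA(dat = dat, Q = Q, model = "GDINA", verbose = 0)  # fit the CDM
qval <- Qval(fit, method = "PVAF", eps = 0.95)                 # cutoff assumed to be 'eps'
qval                                                           # suggested Q-matrix modifications
```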


2013 ◽  
Vol 19 (3) ◽  
pp. 344-353 ◽  
Author(s):  
Keith R. Shockley

Quantitative high-throughput screening (qHTS) experiments can simultaneously produce concentration-response profiles for thousands of chemicals. In a typical qHTS study, a large chemical library is subjected to a primary screen to identify candidate hits for secondary screening, validation studies, or prediction modeling. Different algorithms, usually based on the Hill equation logistic model, have been used to classify compounds as active or inactive (or inconclusive). However, observed concentration-response activity relationships may not adequately fit a sigmoidal curve. Furthermore, it is unclear how to prioritize chemicals for follow-up studies given the large uncertainties that often accompany parameter estimates from nonlinear models. Weighted Shannon entropy can address these concerns by ranking compounds according to profile-specific statistics derived from estimates of the probability mass distribution of response at the tested concentration levels. This strategy can be used to rank all tested chemicals in the absence of a prespecified model structure, or the approach can complement existing activity call algorithms by ranking the returned candidate hits. The weighted entropy approach was evaluated here using data simulated from the Hill equation model. The procedure was then applied to a chemical genomics profiling data set interrogating compounds for androgen receptor agonist activity.
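The core ranking quantity is easy to illustrate: normalize a (weighted) concentration-response profile into a probability mass distribution over the tested concentrations and compute its Shannon entropy, so that concentrated, dose-dependent activity yields low entropy. The weighting scheme below is a placeholder, not necessarily the one used in the article.

```r
# Weighted Shannon entropy of a concentration-response profile (illustration).
shannon_entropy <- function(resp, weights = rep(1, length(resp))) {
  a <- abs(resp) * weights   # weighted absolute responses at each tested concentration
  p <- a / sum(a)            # probability mass distribution over concentrations
  p <- p[p > 0]
  -sum(p * log2(p))          # entropy in bits; low entropy = concentrated response
}

# Two toy profiles over 8 concentrations
flat   <- c(1, 1, 1, 1, 1, 1, 1, 1)      # inactive-like: high entropy (3 bits)
active <- c(0, 0, 0, 1, 5, 20, 60, 95)   # sigmoidal-like: low entropy
shannon_entropy(flat)
shannon_entropy(active)
```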


Mathematics ◽  
2020 ◽  
Vol 8 (10) ◽  
pp. 1786 ◽  
Author(s):  
A. M. Abd El-Raheem ◽  
M. H. Abu-Moussa ◽  
Marwa M. Mohie El-Din ◽  
E. H. Hafez

In this article, a progressive-stress accelerated life test (ALT) based on progressive type-II censoring is studied. The cumulative exposure model is used, with the lifetime of the test units following the Pareto-IV distribution. Different estimates of the model parameters, namely the maximum likelihood estimates (MLEs) and Bayes estimates (BEs), are discussed. The Bayes estimates are derived using the Tierney and Kadane (TK) approximation method and the importance sampling method. The asymptotic and bootstrap confidence intervals (CIs) of the parameters are constructed. A real data set is analyzed to illustrate the methods proposed in this paper. Two types of progressive-stress tests, the simple ramp-stress test and the multiple ramp-stress test, are compared through a simulation study. Finally, some interesting conclusions are drawn.
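As a small, self-contained illustration of the likelihood machinery involved, the sketch below computes MLEs of the Pareto-IV parameters for a complete (uncensored) simulated sample by direct optimisation, with the location parameter fixed at zero. The progressive type-II censored, progressive-stress likelihood treated in the article is substantially more involved and is not reproduced here.

```r
# Pareto-IV (mu = 0) survival: S(x) = [1 + (x/sigma)^(1/gamma)]^(-alpha), x > 0.
negll_pareto4 <- function(par, x) {
  alpha <- exp(par[1]); gamma <- exp(par[2]); sigma <- exp(par[3])  # keep parameters positive
  z <- (x / sigma)^(1 / gamma)
  -sum(log(alpha / (gamma * sigma)) + (1 / gamma - 1) * log(x / sigma) -
         (alpha + 1) * log1p(z))
}

set.seed(3)
# Inverse-transform simulation: x = sigma * ((1 - u)^(-1/alpha) - 1)^gamma
u <- runif(500)
x <- 1.5 * ((1 - u)^(-1 / 2) - 1)^0.7   # true alpha = 2, gamma = 0.7, sigma = 1.5

fit <- optim(c(0, 0, 0), negll_pareto4, x = x)
exp(fit$par)   # MLEs of (alpha, gamma, sigma)
```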

