Improving Robustness in Q-Matrix Validation Using an Iterative and Dynamic Procedure

2020, Vol. 44(6), pp. 431-446
Author(s): Pablo Nájera, Miguel A. Sorrel, Jimmy de la Torre, Francisco José Abad

In the context of cognitive diagnosis models (CDMs), a Q-matrix reflects the correspondence between attributes and items. The Q-matrix construction process is typically subjective in nature, which may lead to misspecifications, and these in turn can negatively affect attribute classification accuracy. In response, several methods of empirical Q-matrix validation have been developed. The general discrimination index (GDI) method has relevant advantages, such as its applicability to several CDMs. However, the estimation of the GDI relies on the estimation of the latent group sizes and success probabilities, which is carried out with the original (possibly misspecified) Q-matrix. This can be a problem, especially when there is great uncertainty about the Q-matrix specification. To address this, the present study investigates the iterative application of the GDI method, in which only one item is modified at each step of the iterative procedure and the required cutoff is updated using the new parameter estimates. A simulation study was conducted to test the performance of the new procedure. Results showed that the performance of the GDI method improved when it was applied iteratively at the item level with an appropriate cutoff point. This was most notable when the original Q-matrix misspecification rate was high, where the proposed procedure performed better 96.5% of the time. The results are illustrated using Tatsuoka’s fraction-subtraction data set.
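A minimal sketch of the iterative item-level validation loop described in this abstract, assuming hypothetical helpers `fit_cdm` and `suggest_q_vector` stand in for a CDM estimation routine and a GDI-based q-vector suggestion (including its updated cutoff); only the single item with the strongest suggested change is modified per iteration, and the model is refit before the next pass.

```python
import numpy as np

def iterative_gdi_validation(responses, Q, fit_cdm, suggest_q_vector, max_iter=50):
    """Iteratively validate a Q-matrix: refit the CDM, let a GDI-based criterion
    propose a q-vector for each item, and change only the single item whose
    suggestion improves the criterion the most."""
    Q = Q.copy()
    for _ in range(max_iter):
        model = fit_cdm(responses, Q)           # re-estimate latent group sizes / success probs
        proposals = []                          # (improvement, item, new_q_vector)
        for j in range(Q.shape[0]):
            new_q, improvement = suggest_q_vector(model, j)   # GDI suggestion above updated cutoff
            if not np.array_equal(new_q, Q[j]):
                proposals.append((improvement, j, new_q))
        if not proposals:                       # no item exceeds the (updated) cutoff
            break
        improvement, j, new_q = max(proposals, key=lambda t: t[0])
        Q[j] = new_q                            # modify exactly one item, then iterate
    return Q
```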

2017, Vol. 41(4), pp. 277-293
Author(s): Jinsong Chen

Q-matrix validation is of increasing concern because of the importance, and the subjective nature, of Q-matrix construction in the modeling process. This research proposes a residual-based approach to empirically validate Q-matrix specifications using a combination of fit measures. The approach separates Q-matrix validation into four logical steps: test-level evaluation, a possible distinction between attribute-level and item-level misspecifications, identification of the hit item, and fit information to aid in item adjustment. Through simulation studies and real-life examples, it is shown that misspecified items can be detected as the hit item and adjusted sequentially when the misspecification occurs at the item level or at random. Adjustment can be based on the maximum reduction of the test-level measures. When adjustment of individual items tends to be useless, attribute-level misspecification is of concern. The approach can accommodate a variety of cognitive diagnosis models (CDMs) and be extended to cover other response formats.
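A schematic of the sequential-adjustment logic sketched above: the hit item is taken to be the one whose re-specification yields the largest reduction in a test-level fit measure, and the procedure stops when no adjustment helps (which points toward attribute-level misspecification). The helper `fit_measure` is a placeholder, not the author's implementation.

```python
import itertools
import numpy as np

def sequential_adjustment(responses, Q, fit_measure, n_attributes, tol=0.0):
    """Repeatedly adjust the single item ('hit item') whose re-specification
    most reduces a test-level residual-based fit measure."""
    Q = Q.copy()
    current = fit_measure(responses, Q)
    while True:
        best = (0.0, None, None)                       # (reduction, item, q_vector)
        for j in range(Q.shape[0]):
            for q in itertools.product([0, 1], repeat=n_attributes):
                q = np.array(q)
                if q.sum() == 0 or np.array_equal(q, Q[j]):
                    continue
                trial = Q.copy()
                trial[j] = q
                reduction = current - fit_measure(responses, trial)
                if reduction > best[0]:
                    best = (reduction, j, q)
        if best[1] is None or best[0] <= tol:          # no useful item-level adjustment left:
            break                                      # suspect attribute-level misspecification
        _, j, q = best
        Q[j] = q
        current = fit_measure(responses, Q)
    return Q
```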


2019, Vol. 79(4), pp. 727-753
Author(s): Pablo Nájera, Miguel A. Sorrel, Francisco José Abad

Cognitive diagnosis models (CDMs) are latent class multidimensional statistical models that help classify people accurately by using a set of discrete latent variables, commonly referred to as attributes. These models require a Q-matrix that indicates the attributes involved in each item. A potential problem is that the Q-matrix construction process, typically performed by domain experts, is subjective in nature. This might lead to Q-matrix misspecifications that can result in inaccurate classifications. For this reason, several empirical Q-matrix validation methods have been developed in recent years. de la Torre and Chiu proposed one of the most popular methods, based on a discrimination index. However, some questions related to the usefulness of the method with empirical data remained open due to the restricted number of conditions examined and the use of a unique cutoff point (EPS) regardless of the data conditions. This article includes two simulation studies that test this validation method under a wider range of conditions, with the purpose of providing greater generalizability and of empirically determining the most suitable EPS given the data conditions. Results show a good overall performance of the method, the relevance of the different studied factors, and that using a single indiscriminate EPS is not acceptable. Specific guidelines for selecting an appropriate EPS are provided in the discussion.
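For reference, a minimal computation of the discrimination index underlying this family of methods: given the estimated proportions of the latent groups implied by a candidate q-vector and their success probabilities on an item, the index is the weighted variance of those success probabilities around the item's overall success rate. Variable names here are illustrative, not tied to any particular package.

```python
import numpy as np

def gdi(group_sizes, success_probs):
    """Discrimination index for one item under a candidate q-vector:
    weighted variance of the latent-group success probabilities.

    group_sizes   : estimated proportions of the latent groups implied by the q-vector
    success_probs : P(X_j = 1 | group) for each of those groups
    """
    w = np.asarray(group_sizes, dtype=float)
    p = np.asarray(success_probs, dtype=float)
    w = w / w.sum()                      # normalize group proportions
    p_bar = np.dot(w, p)                 # overall success probability on the item
    return float(np.dot(w, (p - p_bar) ** 2))

# Example: two latent groups (masters vs. non-masters of the required attributes)
print(gdi([0.4, 0.6], [0.2, 0.9]))       # larger values = more discriminating q-vector
```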


2018, Vol. 44(1), pp. 3-24
Author(s): Steven Andrew Culpepper, Yinghan Chen

Exploratory cognitive diagnosis models (CDMs) estimate the Q matrix, which is a binary matrix that indicates the attributes needed for affirmative responses to each item. Estimation of Q is an important next step for improving classifications and broadening application of CDMs. Prior research primarily focused on an exploratory version of the restrictive deterministic-input, noisy-and-gate model, and research is needed to develop exploratory methods for more flexible CDMs. We consider Bayesian methods for estimating an exploratory version of the more flexible reduced reparameterized unified model (rRUM). We show that estimating the rRUM Q matrix is complicated by a confound between elements of Q and the rRUM item parameters. A Bayesian framework is presented that accurately recovers Q using a spike–slab prior for item parameters to select the required attributes for each item. We present Monte Carlo simulation studies, demonstrating the developed algorithm improves upon prior Bayesian methods for estimating the rRUM Q matrix. We apply the developed method to the Examination for the Certificate of Proficiency in English data set. The results provide evidence of five attributes with a partially ordered attribute hierarchy.
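A toy illustration of the spike-and-slab idea used for attribute selection: each Q-matrix entry gets an inclusion indicator, and its posterior inclusion probability is the normalized product of the prior weight and the marginal likelihood of the item responses under "attribute required" versus "not required." This is a generic sketch under those assumptions, not the authors' Gibbs sampler for the rRUM.

```python
import numpy as np

def inclusion_probability(log_lik_slab, log_lik_spike, prior_inclusion=0.5):
    """Posterior probability that a Q-matrix entry equals 1 (attribute required),
    given marginal log-likelihoods under the 'slab' (attribute active) and
    'spike' (attribute inactive) components of the prior."""
    log_num = np.log(prior_inclusion) + log_lik_slab
    log_den = np.logaddexp(log_num,
                           np.log(1.0 - prior_inclusion) + log_lik_spike)
    return float(np.exp(log_num - log_den))

# In a Gibbs sweep, each entry q_jk would be redrawn as
# q_jk ~ Bernoulli(inclusion_probability(...)), conditional on the other entries.
print(inclusion_probability(-120.3, -124.8))
```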


2017, Vol. 42(4), pp. 405-431
Author(s): Victoria Savalei, Mijke Rhemtulla

In many modeling contexts, the variables in the model are linear composites of the raw items measured for each participant; for instance, regression and path analysis models rely on scale scores, and structural equation models often use parcels as indicators of latent constructs. Currently, no analytic estimation method exists to appropriately handle missing data at the item level. Item-level multiple imputation (MI), however, can handle such missing data straightforwardly. In this article, we develop an analytic approach for dealing with item-level missing data—that is, one that obtains a unique set of parameter estimates directly from the incomplete data set and does not require imputations. The proposed approach is a variant of the two-stage maximum likelihood (TSML) methodology, and it is the analytic equivalent of item-level MI. We compare the new TSML approach to three existing alternatives for handling item-level missing data: scale-level full information maximum likelihood, available-case maximum likelihood, and item-level MI. We find that the TSML approach is the best analytic approach, and its performance is similar to item-level MI. We recommend its implementation in popular software and its further study.
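A small sketch of the composite step that makes the two-stage logic analytic: the saturated item-level mean vector and covariance matrix (estimated in stage 1, e.g., by FIML/EM on the incomplete items) are mapped to scale or parcel level by the weight matrix that defines the composites, and the substantive model is then fit to these stage-1 estimates. The FIML step itself is assumed to come from existing software; the names below are illustrative.

```python
import numpy as np

def composite_moments(mu_items, sigma_items, A):
    """Map item-level moments to composite (scale/parcel) level.

    mu_items    : (p,)   saturated item means from stage 1 (FIML/EM on incomplete items)
    sigma_items : (p, p) saturated item covariance matrix from stage 1
    A           : (c, p) weight matrix; row i holds the item weights of composite i
    """
    mu_c = A @ mu_items                    # composite means
    sigma_c = A @ sigma_items @ A.T        # composite covariance matrix
    return mu_c, sigma_c

# Example: two 3-item scale scores built from 6 items (unit weights);
# stage 2 fits the path/SEM model to these composite moments.
A = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1]], dtype=float)
print(composite_moments(np.zeros(6), np.eye(6), A))
```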


2010, Vol. 14(3), pp. 545-556
Author(s): J. Rings, J. A. Huisman, H. Vereecken

Abstract. Coupled hydrogeophysical methods infer hydrological and petrophysical parameters directly from geophysical measurements. Widely used methods do not explicitly account for uncertainty in parameter estimates. Therefore, we apply a sequential Bayesian framework that provides updates of state, parameters, and their uncertainty whenever measurements become available. We have coupled a hydrological and an electrical resistivity tomography (ERT) forward code in a particle filtering framework. First, we analyze a synthetic data set of lysimeter infiltration monitored with ERT. In a second step, we apply the approach to field data measured during an infiltration event on a full-scale dike model. For the synthetic data, the water content distribution and the hydraulic conductivity are accurately estimated after a few time steps. For the field data, hydraulic parameters are successfully estimated from water content measurements made with spatial time domain reflectometry and ERT, and the development of their posterior distributions is shown.
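A generic sequential importance resampling step of the kind used in such coupled frameworks: particles carry both states and parameters, are propagated through the hydrological model, weighted by the likelihood of the geophysical observation, and resampled. The forward models here are stand-ins, not the authors' codes, and a Gaussian observation error is assumed.

```python
import numpy as np

def particle_filter_step(particles, params, observation, forward_hydro, forward_ert,
                         obs_std, rng=None):
    """One assimilation step: propagate, weight by the ERT observation, resample."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(particles)
    # 1. Propagate each particle's state with its own hydraulic parameters
    states = np.array([forward_hydro(s, p) for s, p in zip(particles, params)])
    # 2. Map states to predicted geophysical data (shape: n_particles x n_data)
    #    and compute Gaussian log-likelihood weights
    predictions = np.array([forward_ert(s) for s in states])
    log_w = -0.5 * np.sum(((predictions - observation) / obs_std) ** 2, axis=1)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # 3. Resample states and parameters according to the weights
    idx = rng.choice(n, size=n, p=w)
    return states[idx], np.asarray(params)[idx]
```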


2016
Author(s): Rui J. Costa, Hilde Wilkinson-Herbots

Abstract. The isolation-with-migration (IM) model is commonly used to make inferences about gene flow during speciation, using polymorphism data. However, Becquet and Przeworski (2009) report that the parameter estimates obtained by fitting the IM model are very sensitive to the model's assumptions (including the assumption of constant gene flow until the present). This paper is concerned with the isolation-with-initial-migration (IIM) model of Wilkinson-Herbots (2012), which drops precisely this assumption. In the IIM model, one ancestral population divides into two descendant subpopulations, between which there is an initial period of gene flow and a subsequent period of isolation. We derive a very fast method of fitting an extended version of the IIM model, which also allows for asymmetric gene flow and unequal population sizes. This is a maximum-likelihood method, applicable to data on the number of segregating sites between pairs of DNA sequences from a large number of independent loci. In addition to obtaining parameter estimates, our method can also be used to distinguish between alternative models representing different evolutionary scenarios, by means of likelihood ratio tests. We illustrate the procedure on pairs of Drosophila sequences from approximately 30,000 loci. The computing time needed to fit the most complex version of the model to this data set is only a couple of minutes. The R code to fit the IIM model can be found in the supplementary files of this paper.
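The model-comparison step can be summarized in a few lines: nested variants of the model (e.g., symmetric versus asymmetric gene flow) are compared with a likelihood ratio test, where twice the log-likelihood difference is referred to a chi-square distribution with degrees of freedom equal to the number of extra parameters. A sketch with hypothetical log-likelihood values, not the authors' R code:

```python
from scipy.stats import chi2

def likelihood_ratio_test(loglik_full, loglik_nested, extra_params):
    """Compare a nested model (e.g., symmetric gene flow) against a fuller one
    (e.g., asymmetric gene flow and unequal population sizes)."""
    statistic = 2.0 * (loglik_full - loglik_nested)
    p_value = chi2.sf(statistic, df=extra_params)
    return statistic, p_value

# Example with hypothetical maximized log-likelihoods
stat, p = likelihood_ratio_test(-10234.6, -10241.9, extra_params=2)
print(f"LRT statistic = {stat:.2f}, p = {p:.4g}")
```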


2019
Author(s): Leili Tapak, Omid Hamidi, Majid Sadeghifar, Hassan Doosti, Ghobad Moradi

Abstract. Objectives: Zero-inflated proportion or rate data nested in clusters due to the sampling structure can be found in many disciplines. Sometimes the rate response may not be observed for some study units because of limitations such as failures in recording data (false negatives), and zeros are observed instead of the actual rate/proportion values (low incidence). In this study, we propose a multilevel zero-inflated censored Beta regression model that can address zero-inflated rate data with low incidence. Methods: We assumed that the random effects are independent and normally distributed. The performance of the proposed approach was evaluated through application to a three-level real data set and a simulation study. We applied the proposed model to analyze brucellosis diagnosis rate data and to investigate the effects of climatic and geographical position. For comparison, we also applied the standard zero-inflated censored Beta regression model, which does not account for correlation. Results: The proposed model performed better than the zero-inflated censored Beta model based on the AIC criterion. Height (p-value < 0.0001), temperature (p-value < 0.0001), and precipitation (p-value = 0.0006) significantly affected brucellosis rates, whereas precipitation was not statistically significant in the ZICBETA model (p-value = 0.385). The simulation study also showed that the estimates obtained by the maximum likelihood approach were reasonable in terms of mean square error. Conclusions: The results showed that the proposed method can capture the correlations in the real data set and yields accurate parameter estimates.
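A single-observation log-likelihood for the zero-inflated censored Beta idea (ignoring the multilevel random effects for brevity): an observed zero arises either from the structural zero-inflation component or from a Beta-distributed rate censored below a detection threshold, while a positive rate contributes an ordinary Beta density. The symbols and the censoring threshold `c` are illustrative assumptions, not the paper's parameterization.

```python
import numpy as np
from scipy.stats import beta

def zicb_loglik_one(y, pi0, a, b, c=0.01):
    """Log-likelihood contribution of one rate observation y in [0, 1).

    pi0 : zero-inflation probability (structural zero)
    a,b : Beta shape parameters for the non-degenerate rate
    c   : censoring threshold below which a true rate is recorded as zero
    """
    if y == 0.0:
        # zero from inflation, or a Beta draw censored below c
        return np.log(pi0 + (1.0 - pi0) * beta.cdf(c, a, b))
    return np.log(1.0 - pi0) + beta.logpdf(y, a, b)

print(zicb_loglik_one(0.0, 0.3, 2.0, 20.0))    # an observed zero
print(zicb_loglik_one(0.08, 0.3, 2.0, 20.0))   # an observed positive rate
```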


Geophysics, 2020, Vol. 85(3), pp. R163-R175
Author(s): Huaizhen Chen, Junxiao Li, Kristopher A. Innanen

Based on a model of attenuative cracked rock, we have derived a simplified and frequency-dependent stiffness matrix associated with (1) a rock volume containing aligned and partially saturated cracks and (2) a new indicator of oil-bearing fractured reservoirs, which is related to pressure relaxation in cracked rocks and influenced by fluid viscosity and saturation. Starting from the mathematical form of a perturbation in this stiffness matrix across a reflecting interface separating two attenuative cracked media, we set up a linearized P-wave to P-wave reflection coefficient as an azimuthally and frequency-dependent function of dry rock elastic properties, dry fracture weaknesses, and the new indicator. By varying this reflection coefficient with azimuthal angle, we derive a further expression, referred to as the quasi-difference in elastic impedance, which is primarily affected by the dry fracture weaknesses and the new indicator. An inversion approach is established that uses differences in frequency components of seismic amplitudes to estimate these weaknesses and the indicator from the derived quasi-difference in elastic impedance. In synthetic inversion tests, we determine that the approach produces interpretable parameter estimates in the presence of data with a moderate signal-to-noise ratio (S/N). Testing on a real data set suggests that reliable fracture weaknesses and indicator values are generated by the approach; fractured and oil-bearing reservoirs are identified through a combination of the dry fracture weaknesses and the new indicator.
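At its core, the inversion step described above is a linearized least-squares problem: the azimuth- and frequency-dependent amplitude differences are stacked into a data vector and related to the dry fracture weaknesses and the fluid indicator through a sensitivity matrix. A damped least-squares sketch with illustrative names (the sensitivity matrix G would come from the derived reflection-coefficient expression, which is not reproduced here):

```python
import numpy as np

def damped_least_squares(G, d, damping=1e-3):
    """Estimate model parameters m (e.g., dry fracture weaknesses and the
    fluid/viscosity indicator) from data d = G m + noise."""
    GtG = G.T @ G
    return np.linalg.solve(GtG + damping * np.eye(GtG.shape[0]), G.T @ d)
```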


1997, Vol. 165, pp. 1-12
Author(s): Donald K. Yeomans

Abstract. To a significant degree, the success of spacecraft missions to comets and asteroids depends upon the accuracy of the target body ephemerides. In turn, accurate ephemerides depend upon the quality of the astrometric data set used in determining the object’s orbit and the accuracy with which the target body’s motion can be modelled. Using error analyses studies of the target bodies for the NEAR, Muses-C, Clementine 2, Stardust, and Rosetta missions, conclusions are drawn as to how to minimize target body position uncertainties at the times of encounter. In general, these uncertainties will be minimized when the object has a good number of optical observations spread over several orbital periods. If a target body lacks a lengthy data interval, its ephemeris uncertainties can be dramatically reduced with the use of radar Doppler and delay data taken when the body is relatively close to the Earth. The combination of radar and optical angle data taken at close Earth distances just before a spacecraft encounter can result in surprisingly small target body ephemeris uncertainties.


2013, Vol. 19(3), pp. 344-353
Author(s): Keith R. Shockley

Quantitative high-throughput screening (qHTS) experiments can simultaneously produce concentration-response profiles for thousands of chemicals. In a typical qHTS study, a large chemical library is subjected to a primary screen to identify candidate hits for secondary screening, validation studies, or prediction modeling. Different algorithms, usually based on the Hill equation logistic model, have been used to classify compounds as active or inactive (or inconclusive). However, observed concentration-response activity relationships may not adequately fit a sigmoidal curve. Furthermore, it is unclear how to prioritize chemicals for follow-up studies given the large uncertainties that often accompany parameter estimates from nonlinear models. Weighted Shannon entropy can address these concerns by ranking compounds according to profile-specific statistics derived from estimates of the probability mass distribution of response at the tested concentration levels. This strategy can be used to rank all tested chemicals in the absence of a prespecified model structure, or the approach can complement existing activity call algorithms by ranking the returned candidate hits. The weighted entropy approach was evaluated here using data simulated from the Hill equation model. The procedure was then applied to a chemical genomics profiling data set interrogating compounds for androgen receptor agonist activity.
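A minimal version of the ranking statistic described above: the responses across the tested concentrations are converted into a probability mass distribution, and a (optionally weighted) Shannon entropy is computed per compound; profiles whose response mass concentrates at a few concentrations receive lower entropy and can be prioritized without assuming a sigmoidal model. The uniform default weighting below is a simplification.

```python
import numpy as np

def weighted_entropy(responses, weights=None, eps=1e-12):
    """Weighted Shannon entropy of one compound's concentration-response profile.

    responses : responses at each tested concentration (absolute values are
                normalized into a probability mass distribution)
    weights   : optional per-concentration weights; defaults to uniform weights
    """
    r = np.abs(np.asarray(responses, dtype=float))
    p = r / (r.sum() + eps)                      # probability mass over concentrations
    w = np.ones_like(p) if weights is None else np.asarray(weights, dtype=float)
    return float(-np.sum(w * p * np.log(p + eps)))

# Flat (inactive) profile vs. a concentration-dependent one
print(weighted_entropy([1, 1, 1, 1, 1]))         # high entropy
print(weighted_entropy([0, 0, 1, 5, 20]))        # low entropy -> prioritized for follow-up
```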

