Polytomous Testlet Response Models for Technology-Enhanced Innovative Items: Implications on Model Fit and Trait Inference

2021 ◽  
pp. 001316442110322
Author(s):  
Hyeon-Ah Kang ◽  
Suhwa Han ◽  
Doyoung Kim ◽  
Shu-Chuan Kao

The development of technology-enhanced innovative items calls for practical models that can describe polytomous testlet items. In this study, we evaluate four measurement models that can characterize polytomous items administered in testlets: (a) generalized partial credit model (GPCM), (b) testlet-as-a-polytomous-item model (TPIM), (c) random-effect testlet model (RTM), and (d) fixed-effect testlet model (FTM). Using data from GPCM, FTM, and RTM, we examine performance of the scoring models in multiple aspects: relative model fit, absolute item fit, significance of testlet effects, parameter recovery, and classification accuracy. The empirical analysis suggests that relative performance of the models varies substantially depending on the testlet-effect type, effect size, and trait estimator. When testlets had no or fixed effects, GPCM and FTM led to most desirable measurement outcomes. When testlets had random interaction effects, RTM demonstrated best model fit and yet showed substantially different performance in the trait recovery depending on the estimator. In particular, the advantage of RTM as a scoring model was discernable only when there existed strong random effects and the trait levels were estimated with Bayes priors. In other settings, the simpler models (i.e., GPCM, FTM) performed better or comparably. The study also revealed that polytomous scoring of testlet items has limited prospect as a functional scoring method. Based on the outcomes of the empirical evaluation, we provide practical guidelines for choosing a measurement model for polytomous innovative items that are administered in testlets.
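As a rough illustration of the baseline model named in this abstract, the sketch below computes category probabilities under a generalized partial credit model in its common textbook parameterization. The function name `gpcm_probs` and the parameter values are hypothetical and chosen only for illustration; they are not taken from the study, and the testlet extensions (TPIM, RTM, FTM) are not shown.

```python
import math

def gpcm_probs(theta, a, thresholds):
    """Category probabilities for one item under a textbook GPCM.

    theta      : trait level of the examinee
    a          : item discrimination (illustrative value, not from the study)
    thresholds : step parameters b_1..b_m (illustrative values)
    """
    # Cumulative sums of a * (theta - b_v); category 0 uses the empty sum 0.
    cum = [0.0]
    for b in thresholds:
        cum.append(cum[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cum)
    expo = [math.exp(c - top) for c in cum]
    total = sum(expo)
    return [e / total for e in expo]

# A three-category item with symmetric steps, evaluated at theta = 0.
probs = gpcm_probs(theta=0.0, a=1.0, thresholds=[-1.0, 1.0])
```

With symmetric step parameters and a trait level at their midpoint, the two extreme categories come out equally likely and the middle category is the most probable.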

2016 ◽  
Vol 76 (6) ◽  
pp. 976-985 ◽  
Author(s):  
Leanne M. Stanley ◽  
Michael C. Edwards

The purpose of this article is to highlight the distinction between the reliability of test scores and the fit of psychometric measurement models, reminding readers why it is important to consider both when evaluating whether test scores are valid for a proposed interpretation and/or use. It is often the case that an investigator judges both the reliability of scores and the fit of a corresponding measurement model to be either acceptable or unacceptable for a given situation, but these are not the only possible outcomes. This article focuses on situations in which model fit is deemed acceptable, but reliability is not. Data were simulated based on the item characteristics of the PROMIS (Patient Reported Outcomes Measurement Information System) anxiety item bank and analyzed using methods from classical test theory, factor analysis, and item response theory. Analytic techniques from different psychometric traditions were used to illustrate that reliability and model fit are distinct, and that disagreement among indices of reliability and model fit may provide important information bearing on a particular validity argument, independent of the data analytic techniques chosen for a particular research application. We conclude by discussing the important information gleaned from the assessment of reliability and model fit.


2005 ◽  
Vol 48 (6) ◽  
pp. 1412-1428 ◽  
Author(s):  
Patrick J. Doyle ◽  
William D. Hula ◽  
Malcolm R. McNeil ◽  
Joseph M. Mikolic ◽  
Christine Matthews

Purpose: The purposes of this investigation were to examine the construct dimensionality and range of ability effectively measured by 28 assessment items obtained from 3 different patient-reported scales of communicative functioning, and to provide a demonstration of how the Rasch approach to measurement may contribute to the definition of latent constructs and the development of instruments to measure them. Method: Item responses obtained from 421 stroke survivors with and without communication disorders were examined using the Rasch partial credit model. The dimensionality of the item pool was evaluated by (a) examining correlations of Rasch person ability scores obtained separately from each of the 3 scales, (b) iteratively excluding items exceeding mean square model fit criteria, and (c) using principal-components analysis of Rasch model residuals. The range of ability effectively measured by the item pool was examined by comparing item difficulty and category threshold calibrations to the distribution of person ability scores and by plotting the modeled standard error of person ability estimates as a function of person ability level. Results: The results indicate that most assessment items fit a unidimensional measurement model, with the notable exception of items relating to the use of written communication. The results also suggest that the range of ability that could be reliably measured by the current item pool was restricted relative to the range of ability observed in the patient sample. Conclusions: It is concluded that (a) a mature understanding of communicative functioning as a measurement construct will require further research, (b) patients with stroke-related communication disorders will be better served by the development of instruments measuring a wider range of communicative functioning ability, and (c) the theoretical and methodological tools provided by the Rasch family of measurement models may be productively applied to these efforts.


2020 ◽  
Vol 16 (3) ◽  
pp. 221-235
Author(s):  
Avinash Chandran ◽  
Loretta DiPietro ◽  
Heather Young ◽  
Angelo Elmi

Abstract In assessments of sports-related injury severity, time loss (TL) is measured as a count of days lost to injury and analyzed using ordinal cut points. This approach ignores various athlete and event-specific factors that determine the severity of an injury. We present a conceptual framework for modeling this outcome using univariate random effects count or survival regression. Using a sample of US collegiate soccer-related injury observations, we fit random effects Poisson and Weibull Regression models to perform “severity-adjusted” evaluations of TL, and use our models to make inferences regarding the recovery process. Injury site, injury mechanism and injury history emerged as the strongest predictors in our sample. In comparing random and fixed effects models, we noted that the incorporation of the random effect attenuated associations between most observed covariates and TL, and model fit statistics revealed that the random effects models (AIC_Poisson = 51875.20; AIC_Weibull-AFT = 51113.00) improved model fit over the fixed effects models (AIC_Poisson = 160695.20; AIC_Weibull-AFT = 53179.00). Our analyses serve as a useful starting point for modeling how TL may actually occur when a player is injured, and suggest that random effects or frailty based approaches can help isolate the effect of potential determinants of TL.
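To make the model comparison in this abstract concrete, the sketch below recalls the standard definition of the Akaike information criterion (AIC = 2k - 2 ln L, where lower values indicate better fit) and computes the fixed-minus-random differences from the four AIC values quoted above. The helper `aic` and the dictionary layout are illustrative assumptions; the log-likelihoods and parameter counts behind the reported values are not given in the abstract and are not reconstructed here.

```python
def aic(log_likelihood, n_params):
    """Standard AIC definition: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# AIC values as quoted in the abstract (no other quantities are assumed).
reported_aic = {
    "Poisson": {"fixed": 160695.20, "random": 51875.20},
    "Weibull-AFT": {"fixed": 53179.00, "random": 51113.00},
}

# Positive differences mean the random effects model has the lower AIC.
delta = {
    model: round(values["fixed"] - values["random"], 2)
    for model, values in reported_aic.items()
}
```

The difference is far larger for the Poisson pair than for the Weibull-AFT pair, which matches the abstract's statement that the random effects versions fit better in both cases.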


Methodology ◽  
2020 ◽  
Vol 16 (3) ◽  
pp. 208-223
Author(s):  
Robert A. Peterson ◽  
Yeolib Kim ◽  
Boreum Choi

The present research examined the distributional properties of construct reliability indices and model fit metrics, explored relationships between and among the indices and metrics, and investigated variables influencing the relative magnitudes of the indices and metrics in structural equation measurement models. A broad-based meta-analysis of reported construct reliability indices and selected model fit metrics revealed modest relationships among reliability indices, minimal relationships among model fit metrics, and a virtual absence of relationships between reliability indices and model fit metrics. Differences in magnitudes of selected reliability indices and model fit metrics were found to primarily be a function of the (total) number of items employed in a measurement model. The implications of the findings suggest that the current practice of indiscriminately computing and reporting of reliability indices and model fit metrics based only on arbitrary heuristics should be abolished and replaced by theoretically justified indices and metrics.


Methodology ◽  
2014 ◽  
Vol 10 (4) ◽  
pp. 138-152 ◽  
Author(s):  
Hsien-Yuan Hsu ◽  
Susan Troncoso Skidmore ◽  
Yan Li ◽  
Bruce Thompson

The purpose of the present paper was to evaluate the effect of constraining near-zero parameter cross-loadings to zero in the measurement component of a structural equation model. A Monte Carlo 3 × 5 × 2 simulation design was conducted (i.e., sample sizes of 200, 600, and 1,000; parameter cross-loadings of 0.07, 0.10, 0.13, 0.16, and 0.19 misspecified to be zero; and parameter path coefficients in the structural model of either 0.50 or 0.70). Results indicated that factor pattern coefficients and factor covariances were overestimated in measurement models when near-zero parameter cross-loadings constrained to zero were higher than 0.13 in the population. Moreover, the path coefficients between factors were misestimated when the near-zero parameter cross-loadings constrained to zero were noteworthy. Our results add to the literature detailing the importance of testing individual model specification decisions, and not simply evaluating omnibus model fit statistics.


2012 ◽  
Vol 69 (11) ◽  
pp. 1881-1893 ◽  
Author(s):  
Verena M. Trenkel ◽  
Mark V. Bravington ◽  
Pascal Lorance

Catch curves are widely used to estimate total mortality for exploited marine populations. The usual population dynamics model assumes constant recruitment across years and constant total mortality. We extend this to include annual recruitment and annual total mortality. Recruitment is treated as an uncorrelated random effect, while total mortality is modelled by a random walk. Data requirements are minimal as only proportions-at-age and total catches are needed. We obtain the effective sample size for aggregated proportion-at-age data based on fitting Dirichlet-multinomial distributions to the raw sampling data. Parameter estimation is carried out by approximate likelihood. We use simulations to study parameter estimability and estimation bias of four model versions, including models treating mortality as fixed effects and misspecified models. All model versions were, in general, estimable, though for certain parameter values or replicate runs they were not. Relative estimation bias of final year total mortalities and depletion rates were lower for the proposed random effects model compared with the fixed effects version for total mortality. The model is demonstrated for the case of blue ling (Molva dypterygia) to the west of the British Isles for the period 1988 to 2011.
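For orientation, the sketch below shows the classical catch-curve estimate that the abstract describes as the usual starting point: with constant recruitment and constant total mortality Z, log catch-at-age falls linearly with age and the slope is -Z. This is not the authors' random effects extension. The function name `catch_curve_z` and the synthetic, noise-free catch numbers are assumptions made purely for illustration.

```python
import math

def catch_curve_z(ages, catches):
    """Estimate total mortality Z as minus the least-squares slope
    of log(catch) on age (classical catch-curve regression)."""
    logs = [math.log(c) for c in catches]
    n = len(ages)
    mean_age = sum(ages) / n
    mean_log = sum(logs) / n
    sxy = sum((a - mean_age) * (l - mean_log) for a, l in zip(ages, logs))
    sxx = sum((a - mean_age) ** 2 for a in ages)
    return -sxy / sxx

# Synthetic catch-at-age generated with a known Z (illustrative only).
ages = list(range(3, 11))
true_z = 0.4
catches = [1000.0 * math.exp(-true_z * a) for a in ages]

z_hat = catch_curve_z(ages, catches)
```

Because the synthetic data contain no noise, the regression recovers the value of Z used to generate them; real proportions-at-age would add sampling error, which is the part the abstract's Dirichlet-multinomial step addresses.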


2021 ◽  
Vol 99 (Supplement_2) ◽  
pp. 22-22
Author(s):  
Charles A Zumbaugh ◽  
Susannah A Gonia ◽  
Kathryn M Payne ◽  
Thomas B Wilson

Abstract The objectives of this experiment were to determine changes in the nutritive value and ergot alkaloid concentrations of endophyte-infected tall fescue hay and haylage during a 180-d storage period. Forage from a single field of Kentucky-31 tall fescue was cut for hay in late June and allowed to dry in the field. The dry matter (DM) of the windrow of cut forage was measured every 2 h after clipping. Forage was sampled from the windrow in 6 location blocks once forage DM reached target levels for haylage and hay treatments. Haylage and hay samples were taken when the DM of the windrow reached 50% and 80%, respectively. Seven subsamples of each treatment within block were chopped to 1.91 cm in length with a lettuce chopper and vacuum sealed in oxygen-excluding bags. Sample bags were stored indoors and opened at 30 d intervals over the 180-d storage period. Samples were analyzed for pH, nutritive value, and individual ergot alkaloid concentrations using high-performance liquid chromatography. Within each storage day, treatment within block was considered the experimental unit. Data were analyzed in SAS using the MIXED procedure with fixed effects of treatment, day, and the treatment by day interaction. Location block was considered a random effect. As expected, pH was decreased for haylage compared to hay at all time points (P < 0.01) and DM was greater (P < 0.01) for hay compared to haylage. Neutral detergent fiber values were greater (P < 0.01) for hay compared to haylage and declined during storage (P < 0.01). Total ergot alkaloid concentrations did not differ by treatment (P = 0.61), but ergovaline concentrations declined (P < 0.01) during storage. Collectively, these results indicate minimal differences in nutritive value and ergot alkaloid concentrations between hay and haylage during storage, and that ergovaline concentrations decline during storage.


Author(s):  
Rachel J Sorensen ◽  
James S Drouillard ◽  
Teresa L Douthit ◽  
Qinghong Ran ◽  
Douglas G Marthaler ◽  
...  

Abstract The effect of hay type on the microbiome of the equine gastrointestinal tract is relatively unexplored. Our objective was to characterize the cecal and fecal microbiome of mature horses consuming alfalfa or Smooth Bromegrass (brome) hay. Six cecally cannulated horses were used in a split plot design run as a crossover in 2 periods. Whole plot treatment was ad libitum access to brome or alfalfa hay fed over two 21-d acclimation periods with subplots of sampling location (cecum and rectum) and sampling hour. Each acclimation period was followed by a 24-h collection period where cecal and fecal samples were collected every 3 h for analysis of pH and volatile fatty acids (VFA). Fecal and cecal samples were pooled and sent to a commercial lab (MR DNA, Shallowater, TX) for amplification of the V4 region of the 16S rRNA gene and sequenced using Illumina HiSeq. Main effects of hay on VFA, pH, and taxonomic abundances were analyzed using the MIXED procedure of SAS 9.4 with fixed effects of hay, hour, location, period, all possible interactions and random effect of horse. Alpha and β diversity were analyzed using the R Dame package. Horses fed alfalfa had greater fecal than cecal pH (P ≤ 0.05) whereas horses fed brome had greater cecal than fecal pH (P ≤ 0.05). Regardless of hay type, total volatile fatty acid (VFA) concentrations were greater (P ≤ 0.05) in the cecum than in feces, and alfalfa resulted in greater (P ≤ 0.05) VFA concentrations than brome in both sampling locations. Alpha diversity was greater (P ≤ 0.05) in fecal compared to cecal samples. Microbial community structure within each sampling location and hay type differed from one another (P ≤ 0.05). Bacteroidetes were greater (P ≤ 0.05) in the cecum compared to the rectum, regardless of hay type. Firmicutes and Firmicutes:Bacteroidetes were greater (P ≤ 0.05) in the feces compared to cecal samples of alfalfa-fed horses. In all, fermentation parameters and bacterial abundances were impacted by hay type and sampling location in the hindgut.


Assessment ◽  
2021 ◽  
pp. 107319112199876
Author(s):  
Shalom H. Schwartz ◽  
Jan Cieciuch

Researchers around the world are applying the recently revised Portrait Value Questionnaire (PVQ-RR) to measure the 19 values in Schwartz’s refined values theory. We assessed the internal reliability, circular structure, measurement model, and measurement invariance of values measured by this questionnaire across 49 cultural groups ( N = 53,472) and 32 language versions. The PVQ-RR reliably measured 15 of the 19 values in the vast majority of groups and two others in most groups. The fit of the theory-based measurement models supported the differentiation of almost all values in every cultural group. Almost all values were measured invariantly across groups at the configural and metric level. A multidimensional scaling analysis revealed that the PVQ-RR perfectly reproduced the theorized order of the 19 values around the circle across groups. The current study established the PVQ-RR as a sound instrument to measure and to compare the hierarchies and correlates of values across cultures.


2021 ◽  
pp. 227797522096830
Author(s):  
Palaniappan Gurusamy

The study examines the relationship between corporate ownership structure and capital structure of BSE-listed manufacturing firms in India. The sample comprises 357 companies covering 16 major sectors during the period 2006–2015. Considering the dynamic panel nature of the data relating to the capital structure and ownership structure variables, the analysis undertakes a novel approach of examining the determinants with both single-equation and reduced-equation models. To determine the most appropriate model, the F test, the Breusch-Pagan LM test, and finally the Hausman test are conducted. The Hausman test result indicates that the fixed effects model is better than the other two models, pooled OLS and random effects estimation. Based on the fixed effects results, size, risk, and profitability have a highly significant relationship with leverage, whereas growth opportunities and tangibility are insignificant. The study found that promoters’ ownership and institutional ownership have a negative impact on leverage, while corporate ownership has a positive influence on the capital structure decision. Individual or public ownership is negatively and significantly related to capital structure, and foreign ownership is inversely related to the firm’s leverage.

