Comparison of Precision and Accuracy of Five Methods to Analyse Total Score Data

2020 ◽  
Vol 23 (1) ◽  
Author(s):  
Gustaf J. Wellhagen ◽  
Mats O. Karlsson ◽  
Maria C. Kjellsson

Abstract. Total score (TS) data is generated from composite scales consisting of several questions/items, such as the Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS). The analysis method that most fully uses the information gathered is item response theory (IRT) modelling, but IRT models are complex and require item-level data, which may not be available. Therefore, the TS is commonly analysed with standard continuous variable (CV) models, which do not respect the bounded nature of the data. Bounded integer (BI) models do respect the nature of the data but are not as extensively researched. Mixed models for repeated measures (MMRM) are an alternative that requires few assumptions and handles dropout without bias. If an IRT model exists, the expected mean and standard deviation of TS can be computed through IRT-informed functions, which allows CV and BI models to estimate parameters on the IRT scale. The fit, performance on external data and parameter precision (when applicable) of CV, BI and MMRM models in analysing simulated TS data from the MDS-UPDRS motor subscale are investigated in this work. All models provided accurate predictions and residuals without trends, but the fit of the CV and BI models was improved by IRT-informed functions. The IRT-informed BI model had more precise parameter estimates than the IRT-informed CV model. The IRT-informed models also had the best performance on external data, while the MMRM model was worst. In conclusion, (1) IRT-informed functions improve TS analyses and (2) IRT-informed BI models had more precise IRT parameter estimates than IRT-informed CV models.
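The IRT-informed functions referred to above can be sketched numerically: under the usual local-independence assumption, the expected mean and standard deviation of the total score follow directly from each item's category probabilities. A minimal Python sketch, using made-up probabilities for a hypothetical 3-item scale scored 0-4 (not actual MDS-UPDRS output):

```python
import numpy as np

# Hypothetical per-item category probabilities for one subject at a given
# latent disability level, as an IRT model might produce.
# Rows: items; columns: P(item score = 0..4). Numbers are illustrative only.
probs = np.array([
    [0.10, 0.20, 0.40, 0.20, 0.10],
    [0.05, 0.15, 0.30, 0.35, 0.15],
    [0.25, 0.35, 0.25, 0.10, 0.05],
])
scores = np.arange(probs.shape[1])            # possible item scores 0..4

# Under local independence, TS is a sum of independent item scores:
item_means = probs @ scores                    # E[item score]
item_vars = probs @ scores**2 - item_means**2  # Var[item score]
ts_mean = item_means.sum()                     # E[TS]
ts_sd = np.sqrt(item_vars.sum())               # SD[TS]
```

These two quantities are what a CV or BI model for TS can be "informed" by, tying its parameters back to the latent IRT scale.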

2021 ◽  
Vol 23 (3) ◽  
Author(s):  
Gustaf J. Wellhagen ◽  
Sebastian Ueckert ◽  
Maria C. Kjellsson ◽  
Mats O. Karlsson

Abstract. Composite scale data is widely used in many therapeutic areas and consists of several categorical questions/items that are usually summarized into a total score (TS). Such data is discrete and bounded by nature. The gold standard to analyse composite scale data is item response theory (IRT) models. However, IRT models require item-level data while sometimes only TS is available. This work investigates models for TS. When an IRT model exists, it can be used to derive the information as well as expected mean and variability of TS at any point, which can inform TS-analyses. We propose a new method: IRT-informed functions of expected values and standard deviation in TS-analyses. The most common models for TS-analyses are continuous variable (CV) models, while bounded integer (BI) models offer an alternative that respects scale boundaries and the nature of TS data. We investigate the method in CV and BI models on both simulated and real data. Both CV and BI models were improved in fit by IRT-informed disease progression, which allows modellers to precisely and accurately find the corresponding latent variable parameters, and IRT-informed SD, which allows deviations from homoscedasticity. The methodology provides a formal way to link IRT models and TS models, and to compare the relative information of different model types. Also, joint analyses of item-level data and TS data are made possible. Thus, IRT-informed functions can facilitate total score analysis and allow a quantitative analysis of relative merits of different analysis methods.


2018 ◽  
Vol 43 (3) ◽  
pp. 195-210 ◽  
Author(s):  
Chen-Wei Liu ◽  
Wen-Chung Wang

It is commonly known that respondents exhibit different response styles when responding to Likert-type items. For example, some respondents tend to select the extreme categories (e.g., strongly disagree and strongly agree), whereas some tend to select the middle categories (e.g., disagree, neutral, and agree). Furthermore, some respondents tend to disagree with every item (e.g., strongly disagree and disagree), whereas others tend to agree with every item (e.g., agree and strongly agree). In such cases, fitting standard unfolding item response theory (IRT) models that assume no response style will yield a poor fit and biased parameter estimates. Although there have been attempts to develop dominance IRT models to accommodate the various response styles, such models are usually restricted to a specific response style and cannot be used for unfolding data. In this study, a general unfolding IRT model is proposed that can be combined with a softmax function to accommodate various response styles via scoring functions. The parameters of the new model can be estimated using Bayesian Markov chain Monte Carlo algorithms. An empirical data set is used for demonstration purposes, followed by simulation studies to assess the parameter recovery of the new model, as well as the consequences of ignoring the impact of response styles by fitting standard unfolding IRT models. The results suggest that the new model exhibits good parameter recovery, and that ignoring response styles yields seriously biased parameter estimates.
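The softmax mechanism described here can be illustrated in miniature: category utilities from an unfolding model are shifted by a style-specific scoring function, and a softmax turns the result into response probabilities. This is a toy Python sketch, not the authors' actual parameterization; the utilities, scoring functions, and the strength parameter `omega` are all invented:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over category utilities
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical 5-category Likert item. Each response style is encoded as a
# scoring function that up-weights the categories that style favours.
base_utility = np.array([0.2, 0.5, 0.8, 0.5, 0.2])  # illustrative unfolding utilities
styles = {
    "extreme":  np.array([1.0, 0.0, 0.0, 0.0, 1.0]),
    "midpoint": np.array([0.0, 0.5, 1.0, 0.5, 0.0]),
    "none":     np.zeros(5),
}
omega = 1.5  # hypothetical style-strength parameter

for name, s in styles.items():
    p = softmax(base_utility + omega * s)
    print(name, np.round(p, 3))
```

With the "extreme" scoring function, probability mass shifts toward the two end categories; with "midpoint", toward the centre, mimicking the styles described in the abstract.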


2020 ◽  
Vol 44 (7-8) ◽  
pp. 563-565
Author(s):  
Hwanggyu Lim ◽  
Craig S. Wells

The R package irtplay provides practical tools for unidimensional item response theory (IRT) models that conveniently enable users to conduct many analyses related to IRT. For example, the irtplay package includes functions for calibrating online items, scoring test-takers’ proficiencies, evaluating IRT model-data fit, and importing item and/or proficiency parameter estimates from the output of popular IRT software. In addition, the irtplay package supports mixed-item formats consisting of dichotomous and polytomous items.
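irtplay itself is an R package, but the kind of proficiency scoring it performs can be illustrated in a language-neutral way. Below is a minimal Python sketch of expected-a-posteriori (EAP) scoring under a two-parameter logistic (2PL) model; the item parameters are invented for illustration and are not from any irtplay example:

```python
import numpy as np

def p_2pl(theta, a, b):
    # Two-parameter logistic item response function: P(correct | theta)
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap_score(responses, a, b, nodes=None):
    # EAP proficiency estimate: posterior mean of theta under a standard
    # normal prior, evaluated on a fixed quadrature grid.
    if nodes is None:
        nodes = np.linspace(-4, 4, 81)
    post = np.exp(-0.5 * nodes**2)               # N(0, 1) prior (unnormalized)
    for u, ai, bi in zip(responses, a, b):
        p = p_2pl(nodes, ai, bi)
        post = post * (p if u else 1.0 - p)      # likelihood of each response
    post = post / post.sum()
    return float(nodes @ post)

# Invented item parameters for a 3-item dichotomous test:
a = [1.0, 1.2, 0.8]
b = [-0.5, 0.0, 0.5]
theta_hi = eap_score([1, 1, 1], a, b)  # all correct: above-average theta
theta_lo = eap_score([0, 0, 0], a, b)  # all incorrect: below-average theta
```

Extending this to the mixed dichotomous/polytomous formats the package supports amounts to swapping in polytomous category probabilities inside the likelihood loop.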


2019 ◽  
Vol 79 (5) ◽  
pp. 827-854 ◽  
Author(s):  
Paul-Christian Bürkner ◽  
Niklas Schulte ◽  
Heinz Holling

Forced-choice questionnaires have been proposed to avoid common response biases typically associated with rating scale questionnaires. To overcome ipsativity issues of trait scores obtained from classical scoring approaches of forced-choice items, advanced methods from item response theory (IRT) such as the Thurstonian IRT model have been proposed. For convenient model specification, we introduce the thurstonianIRT R package, which uses Mplus, lavaan, and Stan for model estimation. Based on practical considerations, we establish that items within one block need to be equally keyed to achieve similar social desirability, which is essential for creating forced-choice questionnaires that have the potential to resist faking intentions. According to extensive simulations, measuring up to five traits using blocks of only equally keyed items does not yield sufficiently accurate trait scores and inter-trait correlation estimates, neither for frequentist nor for Bayesian estimation methods. As a result, persons’ trait scores remain partially ipsative and, thus, do not allow for valid comparisons between persons. However, we demonstrate that trait scores based on only equally keyed blocks can be improved substantially by measuring a sizable number of traits. More specifically, in our simulations of 30 traits, scores based on only equally keyed blocks were non-ipsative and highly accurate. We conclude that in high-stakes situations where persons are motivated to give fake answers, Thurstonian IRT models should only be applied to tests measuring a sizable number of traits.
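The core building block of the Thurstonian model can be sketched independently of the thurstonianIRT package: in a comparative judgment between two items, each item has a normally distributed latent utility, and the probability of preferring one over the other is a probit function of the utility difference. A minimal Python sketch with invented utilities and error variances:

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def p_prefer(mu_i, mu_j, psi2_i, psi2_j):
    # Thurstonian comparative judgment: item i is chosen over item j when its
    # latent utility t_i = mu_i + e_i exceeds t_j = mu_j + e_j. With
    # independent normal errors, the difference t_i - t_j is normal, giving
    # a probit choice probability.
    return phi((mu_i - mu_j) / sqrt(psi2_i + psi2_j))

# Two hypothetical items in one forced-choice block, equally keyed:
p = p_prefer(1.0, 0.2, 1.0, 1.0)  # the higher-utility item is chosen more often
```

In the full model the mean utilities are themselves linear functions of the traits being measured, which is what lets pairwise choices carry information about inter-trait comparisons.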


Assessment ◽  
2020 ◽  
pp. 107319112095288
Author(s):  
Dexin Shi ◽  
E. Rebekah Siceloff ◽  
Rebeca E. Castellanos ◽  
Rachel M. Bridges ◽  
Zhehan Jiang ◽  
...  

This study illustrated the effect of varying the number of response alternatives in clinical assessment using a within-participant, repeated-measures approach. Participants reported the presence of current attention-deficit/hyperactivity disorder symptoms using both a binary and a polytomous (4-point) rating scale across two counterbalanced administrations of the Current Symptoms Scale (CSS). Psychometric properties of the CSS were examined using (a) self-reported binary, (b) self-reported 4-point ratings obtained from each administration of the CSS, and (c) artificially dichotomized responses derived from observed 4-point ratings. Under the same ordinal factor analysis model, results indicated that the number of response alternatives affected item parameter estimates, standard errors, goodness of fit indices, individuals’ test scores, and reliability of the test scores. With fewer response alternatives, the precision of the measurement decreased, and the power of using the goodness-of-fit indices to detect model misfit decreased. These findings add to recent research advocating for the inclusion of a large number of response alternatives in the development of clinical assessments and further suggest that researchers should be cautious about reducing the number of response categories in data analysis.
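The effect of collapsing response categories can be made concrete with a small simulation: artificially dichotomizing a 4-point scale, as in the study design, compresses the total-score distribution onto far fewer distinct values, so respondents are separated more coarsely. This sketch uses random data, not the CSS itself, and the cut between categories 1 and 2 is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated 4-point ratings (0-3) for 200 respondents on 9 hypothetical items
ratings = rng.integers(0, 4, size=(200, 9))

# Artificial dichotomization: collapse {0, 1} -> 0 and {2, 3} -> 1
binary = (ratings >= 2).astype(int)

# Total scores under each coding; the binary version can take at most 10
# distinct values (0-9) versus 28 (0-27) for the polytomous version.
ts_poly = ratings.sum(axis=1)
ts_bin = binary.sum(axis=1)
n_levels_poly = len(np.unique(ts_poly))
n_levels_bin = len(np.unique(ts_bin))
```

Fewer attainable score levels is one simple mechanism behind the reduced measurement precision the study reports.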


2021 ◽  
pp. 089198872199680
Author(s):  
Jared T. Hinkle ◽  
Kelly A. Mills ◽  
Kate Perepezko ◽  
Gregory M. Pontone

Objective: To test the hypothesis that striatal dopamine function influences motivational alterations in Parkinson disease (PD), we compared vesicular monoamine transporter 2 (VMAT2) and dopamine transporter (DaT) imaging data in PD patients with impulse control disorders (ICDs), apathy, or neither. Methods: We extracted striatal binding ratios (SBR) from VMAT2 PET imaging (18F-AV133) and DaTscans from the Parkinson’s Progression Markers Initiative (PPMI) multicenter observational study. Apathy and ICDs were assessed using the Movement Disorders Society-revised Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) and the Questionnaire for Impulsive-Compulsive Disorders in Parkinson’s Disease (QUIP), respectively. We used analysis of variance (ANOVA) and log-linear mixed-effects (LME) regression to model SBRs with neurobehavioral metrics. Results: Among 23 participants (mean age 62.7 years, mean disease duration 1.8 years) with VMAT2 imaging data, 5 had apathy, 5 had an ICD, and 13 had neither. ANOVA indicated strong groupwise differences in VMAT2 binding in right anterior putamen [F(2, 20) = 16.2, p < 0.0001], right posterior putamen [F(2, 20) = 16.9, p < 0.0001], and right caudate [F(2, 20) = 6.8, p = 0.006]. Post-hoc tests and repeated-measures analysis with LME regression also supported right striatal VMAT2 elevation in the ICD group and reduction in the apathy group relative to the group with neither ICD nor apathy. DaT did not exhibit similar correlations, but normalizing VMAT2 with DaT SBR strengthened bidirectional correlations with ICD (high VMAT2/DaT) and apathy (low VMAT2/DaT) in all striatal regions bilaterally. Conclusions: Our findings constitute preliminary evidence that striatal presynaptic dopaminergic function helps describe the neurobiological basis of motivational dysregulation in PD, from high in ICDs to low in apathy.
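As a statistical aside, the reported F statistics come from one-way ANOVA, and their degrees of freedom (2, 20) follow directly from 3 groups and 23 participants. A self-contained Python sketch with invented SBR-like numbers (not the PPMI data) shows the bookkeeping:

```python
import numpy as np

def one_way_anova_F(groups):
    # One-way ANOVA F statistic: between-group over within-group mean squares.
    all_vals = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    grand = all_vals.mean()
    k = len(groups)                # number of groups
    n = all_vals.size              # total sample size
    ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(((np.asarray(g, dtype=float) - np.mean(g)) ** 2).sum()
                    for g in groups)
    df1, df2 = k - 1, n - k        # here: 3 - 1 = 2 and 23 - 3 = 20
    return (ss_between / df1) / (ss_within / df2), df1, df2

# Invented values for three groups of sizes 5 / 5 / 13 (ICD, apathy, neither):
icd = [2.9, 3.1, 3.0, 3.2, 2.8]
apathy = [1.8, 2.0, 1.9, 2.1, 1.7]
neither = [2.4, 2.5, 2.3, 2.6, 2.4, 2.5, 2.3, 2.4, 2.6, 2.5, 2.4, 2.3, 2.5]
F, df1, df2 = one_way_anova_F([icd, apathy, neither])
```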


2017 ◽  
Vol 33 (3) ◽  
pp. 181-189 ◽  
Author(s):  
Christoph J. Kemper ◽  
Michael Hock

Abstract. Anxiety Sensitivity (AS) denotes the tendency to fear anxiety-related sensations. Trait AS is an established risk factor for anxiety pathology. The Anxiety Sensitivity Index-3 (ASI-3) is a widely used measure of AS and its three most robust dimensions with well-established construct validity. At present, the dimensional conceptualization of AS, and thus the construct validity of the ASI-3, is challenged. A latent class structure with two distinct and qualitatively different forms, an adaptive form (normative AS) and a maladaptive form (AS taxon, predisposing for anxiety pathology), was postulated. Item Response Theory (IRT) models were applied to item-level data of the ASI-3 in an attempt to replicate previous findings in a large nonclinical sample (N = 2,603) and to examine possible interpretations for the latent discontinuity observed. Two latent classes with a pattern of distinct responses to ASI-3 items were found. However, the classes were indicative of participants’ differential use of the response scale (midpoint and extreme response styles) rather than differing in AS content (adaptive and maladaptive AS forms). A dimensional structure of AS and the construct validity of the ASI-3 were supported.

