Ice Is Hot and Water Is Dry

Author(s):  
Natalie Förster ◽  
Jörg-Tobias Kuhn

Abstract. To monitor students’ progress and adapt instruction to students’ needs, teachers increasingly use repeated assessments with equivalent tests. The present study investigates whether equivalent reading tests can be successfully developed via rule-based item design. Based on theoretical considerations, we identified three item features each for reading comprehension at the word, sentence, and text levels, which should influence the difficulty and time intensity of reading processes. Using optimal design algorithms, a design matrix was calculated, and four equivalent test forms of the German reading test series for second graders (quop-L2) were developed. A total of N = 7,751 students completed the tests. We estimated item difficulty and time intensity parameters as well as person ability and speed parameters using bivariate item response theory (IRT) models, and we investigated the influence of item features on item parameters. Results indicate that all item properties significantly affected either item difficulty or response time. Moreover, as indicated by the IRT-based test information functions and analyses of variance, the four test forms showed similar levels of difficulty and time intensity at the word, sentence, and text levels (all η2 < .002). Results were successfully cross-validated using a sample of N = 5,654 students.
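A common formalization of such bivariate accuracy-and-speed modeling, given here as a sketch of the general framework rather than the exact parameterization used for quop-L2, is van der Linden's hierarchical model, which pairs a 2PL model for responses with a lognormal model for response times:

P(Y_{pi} = 1 \mid \theta_p) = \frac{1}{1 + \exp[-a_i(\theta_p - b_i)]}, \qquad \ln T_{pi} \sim \mathcal{N}(\beta_i - \tau_p,\ \sigma_i^2)

Here b_i and \beta_i are the item's difficulty and time-intensity parameters, and \theta_p and \tau_p are the person's ability and speed, matching the four parameter types estimated in the study.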

Author(s):  
Rob Kim Marjerison ◽  
Pengfei Liu ◽  
Liam P. Duffy ◽  
Rongjuan Chen

This study explores which types of IELTS Academic Reading strategies are used and the impact of these strategies on test outcomes. The study was quantitative, using a descriptive-correlational design based on data collected from students at Sino-US University in China. Descriptive and inferential statistics were used to analyze the data. The method was a partial replication of a previous researcher's exploration of the reading processes learners engage in when taking IELTS Reading tests. Participants first finished an IELTS reading test and then completed a written retrospective protocol. The analysis reveals a moderately positive relationship between the choice of text preview strategy (from 1 to 5) and the outcome. A common pattern among high-scoring participants was to use expeditious reading strategies to initially locate information and then more careful reading strategies to identify answers to the question tasks.


2019 ◽  
Author(s):  
Daniela Ramona Crișan ◽  
Jorge Tendeiro ◽  
Rob Meijer

In this chapter, the practical consequences of violations of unidimensionality on selection decisions in the framework of unidimensional item response theory (IRT) models are investigated based on simulated data. The factors manipulated include the severity of violations, the proportion of misfitting items, and test length. The outcomes considered were the precision and accuracy of the estimated model parameters; the correlations of estimated ability (θ-hat) and number-correct (NC) scores with the true ability (θ); the ranks of the examinees and the overlap between sets of examinees selected based on either θ, θ-hat, or NC scores; and the bias in criterion-related validity estimates. Results show that the θ-hat values were unbiased by violations of unidimensionality, but their precision decreased as multidimensionality and the proportion of misfitting items increased; the estimated item parameters were robust to violations of unidimensionality. The correlations between θ, θ-hat, and NC scores, the agreement between the three selection criteria, and the accuracy of criterion-related validity estimates were all negatively affected, to some extent, by increasing levels of multidimensionality and the proportion of misfitting items. However, removing the misfitting items improved the results only in the case of severe multidimensionality and a large proportion of misfitting items, and deteriorated them otherwise.
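For readers who want to see the shape of such a robustness simulation, here is a minimal sketch; the dimensions, loadings, and sample sizes are invented for illustration and do not reproduce the chapter's design. It generates two correlated ability dimensions, lets a fraction of "misfitting" items load on the nuisance dimension, and checks how well number-correct scores recover the target trait.

import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items, prop_misfit, rho = 2000, 40, 0.3, 0.4

# True abilities: target trait theta1 and nuisance trait theta2, correlated rho
cov = np.array([[1.0, rho], [rho, 1.0]])
theta = rng.multivariate_normal([0.0, 0.0], cov, size=n_persons)

# Item parameters: discriminations and difficulties
a = rng.uniform(0.8, 2.0, n_items)
b = rng.normal(0.0, 1.0, n_items)

# Misfitting items load on the nuisance dimension instead of the target one
misfit = rng.choice(n_items, int(prop_misfit * n_items), replace=False)
load_dim = np.zeros(n_items, dtype=int)
load_dim[misfit] = 1

# 2PL response probabilities and simulated dichotomous responses
logits = a * (theta[:, load_dim] - b)
responses = (rng.random((n_persons, n_items)) < 1 / (1 + np.exp(-logits))).astype(int)

# Number-correct scores vs. the true target ability
nc = responses.sum(axis=1)
print("corr(NC, theta1):", np.corrcoef(nc, theta[:, 0])[0, 1])

Raising prop_misfit or lowering rho in this sketch mimics more severe violations of unidimensionality and visibly lowers the reported correlation.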


2021 ◽  
Author(s):  
William Goette

Objective: To develop and test an explanatory item response theory (IRT) model that examines properties of both the test (e.g., word order, learning over trials) and the items (e.g., frequency of words in English) on the CERAD List Learning Test immediate recall trials. Methods: Item-level response data from 1,050 participants (Mage = 73.74 [SD = 6.89], Medu = 13.77 [SD = 2.41]) in the Harmonized Cognitive Assessment Protocol were used to construct various IRT models. A Bayesian generalized (non-)linear multilevel modeling framework was utilized to specify the Rasch and two-parameter logistic (2PL) IRT models. Leave-one-out cross-validation information criteria and pseudo-Bayesian model averaging were used to compare models. Posterior predictive checks helped validate model performance in predicting observed data. Fixed effects for learning over trials, the serial position of words, and nine properties of the words (obtained through the English Lexicon Project) were modeled for their effects on item properties. Results: A random-person, random-item 2PL model with an item-specific inter-trial learning effect (i.e., a local dependency effect) provided the best fit of any of the models examined. Of the nine word traits examined, only four had highly probable effects on item difficulty: words became harder to learn with increasing frequency in English, average age of acquisition, and concreteness, and with lower levels of body-object integration. Conclusions: Results support that memory performance depends on more than repetition of words across trials. The finding that word traits affect difficulty and predict learning raises interesting possibilities for test translation, equating word lists, and extending test interpretation to more nuanced semantic deficits.
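A minimal sketch of what a random-item 2PL with an item-specific inter-trial learning effect might look like (the notation and the exact form of the learning term are assumptions; the paper's Bayesian multilevel specification is richer than this):

\operatorname{logit} P(Y_{pit} = 1) = a_i\,[\theta_p - (b_i - \delta_i\, y_{pi,t-1})]

where \theta_p is person ability, a_i and b_i are random item discrimination and difficulty, and \delta_i lets recalling word i on the previous trial lower its effective difficulty on the current one, which is the local dependency effect described above. Word properties then enter as fixed-effect predictors of b_i.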


2016 ◽  
Vol 41 (2) ◽  
pp. 97-114 ◽  
Author(s):  
Yongsang Lee ◽  
Mark Wilson

The Model With Internal Restrictions on Item Difficulty (MIRID; Butter, 1994) has been useful for investigating cognitive behavior in terms of the processes that lead to that behavior. The main objective of the MIRID model is to enable one to test how component processes influence complex cognitive behavior in terms of the item parameters. The original MIRID model is, however, a fairly restricted model for a number of reasons. One of these restrictions is that the model treats items as fixed and so does not fit measurement contexts where the concept of random items is needed. In this article, random-item approaches to the MIRID model are proposed, and both simulation and empirical studies are conducted to test and illustrate the random-item MIRID models. The simulation and empirical studies show that the random-item MIRID models provide more accurate estimates when substantial random errors exist, and thus these models may be more beneficial.
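The internal restriction that gives MIRID its name can be stated in one line: the difficulty of a composite item is a weighted combination of the difficulties of the items measuring its component processes plus a scaling constant,

\beta_{\text{composite}} = \sum_{k=1}^{K} \lambda_k \beta_k + \tau

The random-item extensions proposed in the article replace the fixed \beta's with item-level distributions, so the restriction holds for populations of items rather than for a fixed item set.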


2018 ◽  
Vol 17 (1) ◽  
pp. 1
Author(s):  
Farida Agus Setiawati ◽  
Rita Eka Izzaty ◽  
Veny Hidayat

This study aims to analyze the characteristics of the Scholastic Aptitude Test (SAT), consisting of both verbal and numerical subtests. We used a descriptive quantitative approach, describing the characteristics of the SAT in terms of item difficulty, the item discrimination index, the pseudoguessing index, the test information function, and the standard error of measurement. The data are responses to the SAT instrument, collected from 1,047 subjects in Yogyakarta using the documentation technique. The data were then analyzed with an item response theory (IRT) approach, with the help of the BILOG program, on all logistic parameter models, preceded by an assessment of item fit to the models. The analysis concludes that the verbal subtest tends to fit the 2-PL and 3-PL models, whereas the numerical subtest fits only the 2-PL model. The majority of SAT items have good characteristics in terms of item difficulty, item discrimination, and pseudoguessing, and, based on the test information function, the SAT is accurate for use under the 1-PL, 2-PL, and 3-PL IRT models at all levels of ability.
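For reference, the three logistic models compared in the analysis differ only in which item parameters are freed (this is the standard IRT formulation, not anything specific to the BILOG runs reported here):

P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + \exp[-a_i(\theta - b_i)]}

The 3-PL model estimates difficulty b_i, discrimination a_i, and pseudoguessing c_i; the 2-PL model fixes c_i = 0; and the 1-PL model additionally constrains all discriminations a_i to be equal, leaving only b_i to vary.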


2018 ◽  
Author(s):  
Moritz Körber

The increasing number of interactions with automated systems has sparked researchers’ interest in trust in automation, because it predicts not only whether but also how an operator interacts with an automated system. In this work, a theoretical model of trust in automation is established, and the development and evaluation of a corresponding questionnaire (Trust in Automation, TiA) are described. Building on the model of organizational trust by Mayer, Davis, and Schoorman (1995) and the theoretical account by Lee and See (2004), a model of trust in automation containing six underlying dimensions was established. Following a deductive approach, an initial set of 57 items was generated. In a first online study, these items were analyzed, and based on the criteria of item difficulty, standard deviation, item-total correlation, internal consistency, content overlap with other items, and response rate, 40 items were eliminated and two scales were merged, leaving six scales (Reliability/Competence, Understandability/Predictability, Propensity to Trust, Intention of Developers, Familiarity, and Trust in Automation) containing a total of 19 items. The internal structure of the resulting questionnaire was analyzed in a second online study by means of an exploratory factor analysis. The results show sufficient preliminary evidence for the proposed factor structure and demonstrate that further pursuit of the model is reasonable, although certain revisions may be necessary. The calculated omega coefficients indicated good to excellent reliability for all scales. The results also provide evidence for the questionnaire’s criterion validity: consistent with expectations, an unreliable automated driving system received lower trust ratings than a reliably functioning system. In a subsequent driving simulator study, trust ratings predicted reliance on an automated driving system as well as monitoring in the form of gaze behavior. Possible steps for revision are discussed, and recommendations for the application of the questionnaire are given.
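As a hedged illustration of the classical item-analysis step used for item selection (corrected item-total correlations and internal consistency; the data below are random and the function names are invented for this sketch, not taken from the TiA materials):

import numpy as np

rng = np.random.default_rng(1)
# Fake Likert-type responses: 200 respondents x 10 items on a 5-point scale
data = rng.integers(1, 6, size=(200, 10)).astype(float)

def corrected_item_total(data):
    # Correlate each item with the sum of the remaining items
    total = data.sum(axis=1)
    return np.array([np.corrcoef(data[:, j], total - data[:, j])[0, 1]
                     for j in range(data.shape[1])])

def cronbach_alpha(data):
    # Classical internal-consistency estimate
    k = data.shape[1]
    return k / (k - 1) * (1 - data.var(axis=0, ddof=1).sum()
                          / data.sum(axis=1).var(ddof=1))

print("item means (difficulty):", data.mean(axis=0).round(2))
print("corrected item-total r: ", corrected_item_total(data).round(2))
print("Cronbach's alpha:       ", round(cronbach_alpha(data), 2))

Items with low item-total correlations or extreme means would be candidates for elimination; note that the study itself reports omega rather than alpha for the final scales.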


2018 ◽  
Author(s):  
Maja Olsbjerg ◽  
Karl Bang Christensen

IRT models are often applied when observed items are used to measure a unidimensional latent variable. Originally used in educational research, IRT models are now widely used when the focus is on physical functioning or psychological well-being. Modern applications often require more general models, typically models for multidimensional latent variables or longitudinal models for repeated measurements. This paper describes a collection of SAS macros that can be used for fitting longitudinal IRT models to data, simulating from them, and visualizing them. The macros encompass dichotomous as well as polytomous item response formats and are sufficiently flexible to accommodate changes in item parameters across time points and local dependence between responses at different time points.
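The macros themselves are SAS code; as a language-neutral sketch of the data-generating model they address, the following simulates dichotomous responses from a longitudinal Rasch model with a latent variable correlated across two time points and item-parameter drift (all parameter values invented for illustration):

import numpy as np

rng = np.random.default_rng(2)
n_persons, n_items, rho = 500, 10, 0.6

# Latent variable at two time points, correlated across time; mean change of 0.5
theta = rng.multivariate_normal([0.0, 0.5], [[1.0, rho], [rho, 1.0]], size=n_persons)

# Item difficulties, allowed to shift at time 2 (change in item parameters)
beta_t1 = rng.normal(0.0, 1.0, n_items)
beta_t2 = beta_t1 + rng.normal(0.0, 0.2, n_items)

def rasch_sim(theta_t, beta):
    # Rasch model: logit P(Y = 1) = theta - beta
    p = 1 / (1 + np.exp(-(theta_t[:, None] - beta[None, :])))
    return (rng.random(p.shape) < p).astype(int)

y1 = rasch_sim(theta[:, 0], beta_t1)
y2 = rasch_sim(theta[:, 1], beta_t2)
print("mean proportion correct, t1 vs. t2:", y1.mean().round(3), y2.mean().round(3))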


2017 ◽  
Vol 20 ◽  
Author(s):  
Miguel A. García-Pérez

Abstract. Threshold parameters have distinct referents across models for ordered responses. In difference models, thresholds are trait levels at which responding beyond category k is as likely as responding at or below it; in divide-by-total models, thresholds are trait levels at which responding in category k is as likely as responding in category k – 1. Thus, thresholds in divide-by-total models (but not in difference models) are the crossings of the option response functions for consecutive categories. Thresholds in difference models are always ordered but they may inconsequentially yield ordered or disordered crossings. In contrast, assimilation of thresholds and crossings in divide-by-total models questions category order when crossings are disordered. We analyze these aspects of difference and divide-by-total models, their relation to the order of response categories, and the consequences of collapsing categories to instate ordered crossings under divide-by-total models. We also show that item parameters in models for ordered responses can never contradict the pre-assumed order of categories and that the empirical order can only be established using a polytomous model that does not assume ordered categories, although this often gives rise to spurious outcomes. Practical implications for scale development are discussed.
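In symbols, for an item with ordered categories 0, …, m (standard formulations consistent with the description above): in a difference model such as the graded response model, the threshold b_k is defined through the cumulative probabilities,

P(X > k \mid \theta = b_k) = P(X \leq k \mid \theta = b_k) = 0.5

whereas in a divide-by-total model such as the (generalized) partial credit model, the threshold \delta_k is the trait level at which adjacent categories are equally likely,

P(X = k \mid \theta = \delta_k) = P(X = k - 1 \mid \theta = \delta_k)

which is precisely the crossing point of the option response functions for categories k − 1 and k.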


2017 ◽  
Vol 41 (5) ◽  
pp. 323-337 ◽  
Author(s):  
Bozhidar M. Bashkov ◽  
Christine E. DeMars

The purpose of this study was to examine the performance of the Metropolis–Hastings Robbins–Monro (MH-RM) algorithm in the estimation of multilevel multidimensional item response theory (ML-MIRT) models. The accuracy and efficiency of MH-RM in recovering item parameters, latent variances and covariances, as well as ability estimates within and between clusters (e.g., schools) were investigated in a simulation study, varying the number of dimensions, the intraclass correlation coefficient, the number of clusters, and cluster size, for a total of 24 conditions. Overall, MH-RM performed well in recovering the item, person, and group-level parameters of the model. Ratios of the empirical to analytical standard errors indicated that the analytical standard errors reported in flexMIRT were somewhat overestimated for the cluster-level ability estimates, a little too large for the person-level ability estimates, and essentially accurate for the other parameters. Limitations of the study, implications for educational measurement practice, and directions for future research are offered.
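For orientation, the MH-RM iteration alternates a Metropolis–Hastings imputation of the latent traits with a Robbins–Monro stochastic-approximation update of the item parameters; in generic form (flexMIRT's internal details are not reproduced here),

\xi^{(t+1)} = \xi^{(t)} + \gamma_t\, \tilde{s}\big(\xi^{(t)}\big), \qquad \gamma_t \to 0, \quad \sum_t \gamma_t = \infty

where \tilde{s} is a noisy estimate of the complete-data score computed from the MH draws, and the decreasing gain sequence \gamma_t averages out the Monte Carlo noise over iterations.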


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Masoud Geramipour

Abstract. Rasch testlet and bifactor models are two measurement models that can deal with local item dependency (LID) in assessing the dimensionality of reading comprehension testlets. This study aimed to apply these measurement models to real item response data from Iranian EFL reading comprehension tests and to compare the validity of the bifactor models and the corresponding item parameters with unidimensional and multidimensional Rasch models. The data were collected from the EFL reading comprehension section of the Iranian national university entrance examinations from 2016 to 2018. Various advanced packages of the R system were employed to fit the unidimensional, multidimensional, and testlet Rasch models and the exploratory and confirmatory bifactor models. Item parameters were then estimated and testlet effects identified; moreover, goodness-of-fit indices and the item parameter correlations for the different models were calculated. Results showed that the testlet effects were small but non-negligible for all of the EFL reading testlets. Moreover, the bifactor models were superior in terms of goodness of fit, and the exploratory bifactor model better explained the factor structure of the EFL reading comprehension tests. However, item difficulty parameters in the Rasch models were more consistent than those in the bifactor models. This study has substantial implications for methods of dealing with LID and dimensionality in assessing reading comprehension in EFL testing.
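In the Rasch testlet model, the LID induced by a shared passage is absorbed by a testlet-specific person effect (standard notation; the study's exact parameterization may differ):

\operatorname{logit} P(Y_{pi} = 1) = \theta_p + \gamma_{p\,d(i)} - b_i

where d(i) is the testlet (passage) containing item i and \gamma_{p\,d(i)} is person p's effect for that testlet; the variance of \gamma quantifies the testlet effect, and setting it to zero recovers the unidimensional Rasch model. The bifactor models generalize this by freeing the loadings on the general and group factors.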

