Reliability and Validity of Scores from the Foot and Ankle Ability Measure

2018 ◽  
Vol 3 (3) ◽  
pp. 2473011418S0008
Author(s):  
Lauren Matheny ◽  
Thomas Clanton

Category: Ankle. Introduction/Purpose: A commonly used measure of ankle function is the Foot and Ankle Ability Measure (FAAM). To support interpretation of the FAAM, evidence of reliability and validity must be established. Some studies have assessed FAAM scores; however, these studies had small sample sizes and sample characteristics that may limit generalizability, and did not report reliability estimates. These studies were also unable to account for person ability and item difficulty, a unique feature Rasch modeling offers, which is key when attempting to generalize to other populations. The purpose of this study was to determine whether there is evidence of reliability and validity for the FAAM ADL and Sport scales, utilizing the Rasch model, in patients who have undergone surgical intervention for the treatment of an ankle injury. Methods: Evidence of reliability and validity was determined utilizing the Rasch measurement model, a special case of item response theory, which has been used to develop new patient-reported outcome measures and improve existing measures. This is a widely used technique that may be used as an alternative to classical test theory due to advantages including generalizability across samples, accounting for response options not equally spaced in terms of ability, and identifying poorly functioning items. The Rasch model estimates the locations of individual items (item difficulty) and of persons (ability level) along a common interval-level scale (log-odds). To identify misfitting items, outfit mean-square (MNSQ) and infit MNSQ statistics were assessed. Infit and outfit MNSQ range from 0 to positive infinity (an ideal value of 1.0 means that observed variance equals expected variance; acceptable range 0.5-1.7). Person reliability was also reported (analogous to Cronbach's α). Results: There were 456 patients included in the study (192 females, 264 males; average age 47.6 years, range 18-79). Rasch analysis showed good evidence of reliability for FAAM ADL and FAAM Sport scores (Figure 1). Person reliability was 0.87 for FAAM ADL and 0.89 for FAAM Sport. Outfit MNSQ values for FAAM ADL items 11 (Coming Up On Toes) and 10 (Squatting) were high (2.17, 1.96). Item 19 (Light/Moderate Work) was low (0.48), indicating item redundancy. For FAAM Sport, all outfit values (range 0.67-1.64) were within the acceptable range. For internal scale validity, infit MNSQ values for FAAM ADL items 11 and 10 were high (2.30, 2.05). All other infit values (range 0.61-1.48) were within the acceptable range. For FAAM Sport, all infit values (range 0.74-1.65) were within the acceptable range. Conclusion: This study provides good evidence of reliability for FAAM ADL and Sport scores in a wide range of patients who underwent ankle surgery, which may demonstrate wide clinical applicability. Both scales demonstrated good internal scale validity; however, 3 FAAM ADL items may indicate the need for further scale development for use in a diverse ankle population.
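
As a rough illustration of the infit and outfit MNSQ statistics discussed in this abstract, the sketch below computes both for a single item under the dichotomous Rasch model. This is a simplification: the FAAM uses polytomous response options, and the abstract does not say which software or estimation method was used. The function names (`rasch_prob`, `item_fit`) and the inputs are hypothetical, and person abilities and item difficulty are assumed to have been estimated already.

```python
import math


def rasch_prob(theta, b):
    """Probability of a positive response under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit_mnsq, outfit_mnsq) for one item.

    responses: observed 0/1 responses, one per person
    thetas: person ability estimates in logits (assumed already estimated)
    b: item difficulty in logits (assumed already estimated)
    """
    sq_resid = []   # squared raw residuals (x - p)^2
    variances = []  # model variance p * (1 - p) of each response
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        sq_resid.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    # Outfit: unweighted mean of squared standardized residuals,
    # so it is sensitive to unexpected responses far from the item.
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(responses)
    # Infit: information-weighted mean square (sum of squared residuals
    # divided by the sum of model variances).
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit
```

Values near 1.0 indicate that observed variance is close to what the model expects; a single highly unexpected response (for example, an able person failing an easy item) inflates outfit much more than infit.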

2019 ◽  
Vol 41 (2) ◽  
pp. 229-236 ◽  
Author(s):  
Lauren M. Matheny ◽  
Thomas O. Clanton

Background: The purpose of this study was to determine the reliability and validity of scores from the Foot and Ankle Ability Measure (FAAM) Activities of Daily Living (ADL) and Sports scales in patients who have a variety of ankle injuries. Methods: All patients who underwent surgical treatment for an ankle injury and completed the FAAM ADL and Sport scales were included in this study (n = 456; 192 females, 264 males). The average age was 47.6 years (range, 18-79 years). The average time to follow-up was 3.8 years (range, 2.0-7.7 years). All data were collected prospectively and reviewed retrospectively. A reliability and validity analysis, utilizing the Rasch measurement model, a special case of item response theory (IRT), was conducted. Results: Reliability was very good. For FAAM ADL, person reliability was 0.87 and item reliability was 0.99. For FAAM Sport, person reliability was 0.89 and item reliability was 1.0. Infit mean square (MNSQ) values, which assess internal scale validity, were examined. For FAAM ADL, items 11 (coming up on your toes) and 10 (squatting) were high (2.27 and 2.08, respectively). All other infit values were within the acceptable range of 0.5 to 1.7. For FAAM Sport, all infit values were within the acceptable range. Outfit MNSQ values, which assess the FAAM ADL and Sport rating scale function, were examined. Three items from FAAM ADL were beyond the acceptable range. Items 10 and 11 from FAAM ADL had high outfit MNSQ values (2.15 and 1.98, respectively). Item 19 (light to moderate work) had a marginally low outfit MNSQ of 0.48. For FAAM Sport, all outfit values were within the acceptable range. Conclusion: There was very good evidence of the reliability and validity of FAAM ADL and FAAM Sport scores. Two FAAM ADL items may indicate the need for further scale development for use in a diverse surgical ankle population. Level of Evidence: Level III, comparative series.


2016 ◽  
Vol 12 (28) ◽  
pp. 263 ◽  
Author(s):  
Awopeju, O. A. ◽  
Afolabi, E. R. I.

The study compared Classical Test Theory (CTT) and Item Response Theory (IRT)-estimated item difficulty and item discrimination indices in relation to the ability of examinees in the Senior School Certificate Examination (SSCE) in Mathematics, with a view to providing an empirical basis for informed decisions on the appropriateness of statistical and psychometric tests. The study adopted an ex-post-facto design. A sample of 6,000 students was selected from the population of 35,262 students who sat for the NECO SSCE Mathematics Paper 1 in 2008 in Osun State, Nigeria. The instrument was the May/June 2008 NECO SSCE Mathematics Paper 1, consisting of 60 multiple-choice items. Three sampling plans (random, gender and ability) were employed to study the behaviour of the examinees' scores under the CTT and IRT measurement frameworks. BILOG-MG 3 was used to estimate the item parameters and SPSS 20 was used to compare CTT- and IRT-based item parameters. The results showed that CTT-based item difficulty estimates and one-parameter IRT item difficulty estimates were comparable (the correlations were generally in the -0.702 to -0.988 range in the large sample and the -0.622 to -0.989 range in the small sample). Results also indicated that CTT-based and two-parameter IRT-based item discrimination estimates were comparable (the correlations were in the 0.430 to 0.880 range in the large sample and the 0.531 to 0.950 range in the small sample). The study concluded that CTT and IRT were comparable in estimating item characteristics of statistical and psychometric tests and thus could be used as complementary procedures in the development of national examinations.
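
For readers unfamiliar with the CTT indices being compared here, the following is a minimal sketch using their usual textbook definitions: item difficulty as the proportion of correct responses, and item discrimination as the point-biserial correlation between an item and the total score. The abstract does not state which discrimination index the authors used, so the point-biserial choice is an assumption, and the function names are hypothetical.

```python
import math


def item_difficulty(item_scores):
    """CTT item difficulty: proportion of examinees answering correctly.

    item_scores: list of 0/1 scores for one item. Higher values mean an
    easier item, which is why CTT difficulty correlates negatively with
    IRT difficulty (where higher values mean a harder item).
    """
    return sum(item_scores) / len(item_scores)


def point_biserial(item_scores, total_scores):
    """Point-biserial discrimination: Pearson correlation between a
    dichotomous item score and the total test score.

    Raises ZeroDivisionError if either variable has zero variance.
    """
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_scores, total_scores))
    ss_x = sum((x - mean_x) ** 2 for x in item_scores)
    ss_y = sum((y - mean_y) ** 2 for y in total_scores)
    return cov / math.sqrt(ss_x * ss_y)
```

The sign convention noted in the docstring is consistent with the negative correlations between CTT and one-parameter IRT difficulty reported in the abstract.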


Author(s):  
Kenneth Chukwuemeka Obionu ◽  
Michael Rindom Krogsgaard ◽  
Christian Fugl Hansen ◽  
Jonathan David Comins

2021 ◽  
Vol 11 (13) ◽  
pp. 6048
Author(s):  
Jaroslav Melesko ◽  
Simona Ramanauskaite

Feedback is a crucial component of effective, personalized learning, and is usually provided through formative assessment. Introducing formative assessment into a classroom can be challenging because of test creation complexity and the need to provide time for assessment. The newly proposed formative assessment algorithm uses multivariate Elo rating and multi-armed bandit approaches to solve these challenges. In the case study involving 106 students of the Cloud Computing course, the algorithm shows double the learning path recommendation precision of classical test theory based assessment methods. The algorithm approaches the item response theory benchmark precision with greatly reduced quiz length and without the need for item difficulty calibration.
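
To give a sense of how an Elo-style rating can track student ability and item difficulty without prior calibration, here is a minimal single-skill sketch. It is not the authors' algorithm: the abstract describes a multivariate Elo rating combined with a multi-armed bandit, neither of which is reproduced here. The logistic expected-score form, the function names, and the step size `k` are all assumptions chosen for illustration.

```python
import math


def expected_correct(student_rating, item_rating):
    """Expected probability that the student answers the item correctly,
    using a logistic curve on the rating difference (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(student_rating - item_rating)))


def elo_update(student_rating, item_rating, correct, k=0.4):
    """Return updated (student_rating, item_rating) after one response.

    correct: True if the student answered correctly.
    k: step size (hypothetical default; would need tuning in practice).
    """
    expected = expected_correct(student_rating, item_rating)
    delta = k * ((1.0 if correct else 0.0) - expected)
    # The student gains what the item loses, so a surprising correct
    # answer raises the ability estimate and lowers the item's difficulty.
    return student_rating + delta, item_rating - delta
```

Because both ratings are updated after every response, item difficulty is learned on the fly, which is the property that removes the need for a separate calibration step.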


Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 212
Author(s):  
Jeanette Melin ◽  
Stefan Cano ◽  
Leslie Pendrill

Commonly used rating scales and tests have been found lacking reliability and validity, for example in neurodegenerative diseases studies, owing to not making recourse to the inherent ordinality of human responses, nor acknowledging the separability of person ability and item difficulty parameters according to the well-known Rasch model. Here, we adopt an information theory approach, particularly extending deployment of the classic Brillouin entropy expression when explaining the difficulty of recalling non-verbal sequences in memory tests (i.e., Corsi Block Test and Digit Span Test): a more ordered task, of less entropy, will generally be easier to perform. Construct specification equations (CSEs) as a part of a methodological development, with entropy-based variables dominating, are found experimentally to explain (r=R2 = 0.98) and predict the construct of task difficulty for short-term memory tests using data from the NeuroMET (n = 88) and Gothenburg MCI (n = 257) studies. We propose entropy-based equivalence criteria, whereby different tasks (in the form of items) from different tests can be combined, enabling new memory tests to be formed by choosing a bespoke selection of items, leading to more efficient testing, improved reliability (reduced uncertainties) and validity. This provides opportunities for more practical and accurate measurement in clinical practice, research and trials.
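
The idea that a more ordered sequence carries less entropy can be illustrated with Brillouin's expression for a message of N symbols in which symbol j occurs N_j times: I = ln N! - Σ ln N_j!. The sketch below assumes natural logarithms and a scaling constant of 1; the abstract does not give the exact normalization used in the construct specification equations, and the function name is hypothetical.

```python
import math
from collections import Counter


def brillouin_entropy(sequence):
    """Brillouin information of a sequence: ln N! - sum_j ln N_j!.

    sequence: any iterable of hashable symbols (e.g. a digit string).
    Uses natural logs and no extra scaling constant (assumptions).
    """
    counts = Counter(sequence)
    n = sum(counts.values())
    # math.lgamma(k + 1) equals ln(k!) and avoids huge factorials.
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts.values())
```

Under this expression a four-digit sequence of one repeated digit has zero entropy, while four distinct digits give ln 4!, matching the intuition that the repeated sequence should be easier to recall.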


2021 ◽  
Vol 104 (3) ◽  
pp. 003685042110283
Author(s):  
Meltem Yurtcu ◽  
Hülya Kelecioglu ◽  
Edward L Boone

Bayesian Nonparametric (BNP) modelling can be used to obtain more detailed information in test equating studies and to increase the accuracy of equating by accounting for covariates. In this study, two covariates are included in the equating under the Bayesian nonparametric model: one continuous and the other discrete. Scores equated with this model were obtained for a single-group design with a small group. The equated scores obtained with the model were compared with the mean and linear equating methods in Classical Test Theory. Considering the equated scores obtained from the three different methods, it was found that the equated scores obtained with the BNP model produced a distribution closer to the target test. Even the classical methods will give a good result with the smallest error when using a small sample, making equating studies valuable. The inclusion of covariates in the classical test equating process is based on some assumptions and cannot be achieved, especially when using small groups. The BNP model will be more beneficial than using frequentist methods, regardless of this limitation. Information about booklets and variables can be obtained from the distributions and equated scores that were obtained with the BNP model. In this case, it makes it possible to compare sub-categories. This can be expressed as indicating the presence of differential item functioning (DIF). Therefore, the BNP model can be used actively in test equating studies, and it provides an opportunity to examine the characteristics of the individual participants at the same time. Thus, it allows test equating even in a small sample and offers the opportunity to reach a value closer to the scores in the target test.
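
For context, the two classical baselines mentioned in this abstract can be sketched with their standard textbook formulas: mean equating shifts a form X score by the difference in form means, and linear equating additionally rescales by the ratio of standard deviations. The function names and the use of population standard deviations are assumptions for illustration; the BNP model itself is not reproduced here.

```python
import statistics


def mean_equate(x, scores_x, scores_y):
    """Mean equating: shift score x on form X by the difference in means."""
    return x - statistics.fmean(scores_x) + statistics.fmean(scores_y)


def linear_equate(x, scores_x, scores_y):
    """Linear equating: match both the mean and the standard deviation
    of form Y. Population SDs are used here (an assumption); form X
    scores must not all be identical."""
    mean_x = statistics.fmean(scores_x)
    mean_y = statistics.fmean(scores_y)
    sd_x = statistics.pstdev(scores_x)
    sd_y = statistics.pstdev(scores_y)
    return (sd_y / sd_x) * (x - mean_x) + mean_y
```

Both methods rely only on the first one or two moments of the score distributions, which is why they have no natural way to bring covariates into the equating.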

