Some Psychometric Problems of the Matching Familiar Figures Test

1976 ◽  
Vol 43 (3) ◽  
pp. 731-742 ◽  
Author(s):  
Hideo Kojima

Kagan's Matching Familiar Figures Test and a group intelligence test were administered to 151 male and 130 female Japanese second graders, and responses were analyzed in detail. Matching response time showed high internal consistency, whereas errors were much less consistent. The four groups differed in which variant positions they selected, and the position of the correct variant partially accounted for the variance of error scores. However, it appeared that errors on the matching test could not be made sufficiently reliable simply by refining and lengthening the present version of Kagan's test. Slow-accurate, fast-accurate, and slow-inaccurate children adjusted their response time to item difficulty, but fast-inaccurate children failed to do so. Almost all of these results were replicated in third and fifth graders. Under the ordinary scoring method of the intelligence test, the four matching groups differed from one another only among girls; when scores were adjusted for errors, intelligence test performance correlated with matching performance among boys as well.
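The contrast between consistent response times and inconsistent errors can be illustrated with Cronbach's alpha, the usual index of internal consistency. Below is a minimal sketch with simulated data; the item count, sample size, and effect structure are hypothetical, not taken from the study.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (examinees x items) score matrix."""
    k = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(0)
# Hypothetical data for 100 children on 12 items: response times share a
# strong common factor; errors are nearly independent across items.
ability = rng.normal(size=(100, 1))
response_times = ability + rng.normal(scale=0.5, size=(100, 12))
errors = (rng.random((100, 12)) < 0.3).astype(float)

print(f"alpha, response times: {cronbach_alpha(response_times):.2f}")  # high
print(f"alpha, errors:         {cronbach_alpha(errors):.2f}")          # near zero
```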

2008 ◽  
Vol 30 (1) ◽  
pp. 105
Author(s):  
Christopher Weaver ◽  
Yoko Sato

This empirical study introduces population targeting and cut-off point targeting as a systematic approach to evaluating the performance of items in the English section of university entrance examinations. Using Rasch measurement theory, we found that the item difficulty and the types of items in a series of national university entrance examinations varied considerably over a four-year period. There was, however, progress towards improved test performance: an increased number of items assessed different language skills and content areas, and an increased number targeted test takers' knowledge of English. This study also found that productive items targeted test takers' overall knowledge of English better than receptive items did. Moreover, productive items were more consistently located around the probable cut point for university admissions. The paper concludes with a detailed account of a number of probable factors that could influence item performance, such as the use of rating scales.
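For reference, the dichotomous Rasch model on which the analysis rests expresses the probability that test taker n answers item i correctly in terms of ability θ_n and item difficulty b_i (standard notation, not the authors'):

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

On this scale, population targeting asks how well the item difficulties b_i cover the test-taker ability distribution, while cut-off point targeting asks how well they cover the region around the admissions cut point.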


Author(s):  
Clara Li ◽  
Xiaoyi Zeng ◽  
Judith Neugroschl ◽  
Amy Aloysi ◽  
Carolyn W. Zhu ◽  
...  

ABSTRACT Objectives: This study describes performance on the Multilingual Naming Test (MINT) by Chinese American older adults who are monolingual Chinese speakers. An attempt was also made to identify items that could introduce bias and warrant attention in future investigations. Methods: The MINT was administered to 67 monolingual Chinese older adults as part of the standard dementia evaluation at the Alzheimer's Disease Research Center (ADRC) at the Icahn School of Medicine at Mount Sinai (ISMMS), New York, USA. Diagnoses of normal cognition (n = 38), mild cognitive impairment (n = 12), and dementia (n = 17) were assigned at clinical consensus conferences using criterion sheets developed at the ADRC at ISMMS. Results: MINT scores were negatively correlated with age and positively correlated with education, showing sensitivity to demographic factors. One item, butterfly, showed no variation in responses across diagnostic groups. Accepting responses from different regions of China changed the scoring of 20 items from "incorrect" to "correct." The last five items, porthole, anvil, mortar, pestle, and axle, yielded a high nonresponse rate, with more than 70% of participants responding "I don't know." Four items, funnel, witch, seesaw, and wig, were not ordered by item difficulty in Chinese. Two items, gauge and witch, were identified as culturally biased for the monolingual group. Conclusions: Our study highlights cultural and linguistic differences that might influence test performance. Future studies are needed to revise the MINT using more universally recognized items of similar word frequency across cultural and linguistic groups.
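The item-level checks the authors describe, high nonresponse rates and out-of-order difficulties, are straightforward to compute. A sketch with a hypothetical response matrix follows; the coding scheme and item count are assumptions, not the study's data.

```python
import numpy as np

# Hypothetical coding: rows = participants, columns = MINT items in their
# intended easy-to-hard order; 1 = correct, 0 = incorrect, -1 = "I don't know".
rng = np.random.default_rng(1)
responses = rng.choice([1, 0, -1], size=(67, 32), p=[0.60, 0.25, 0.15])

nonresponse_rate = (responses == -1).mean(axis=0)
print("items with >70% nonresponse:", np.flatnonzero(nonresponse_rate > 0.70))

# Empirical difficulty = proportion correct; an item is mis-ordered when it
# is easier than the (supposedly easier) item placed before it.
p_correct = (responses == 1).mean(axis=0)
print("items easier than their predecessor:", np.flatnonzero(np.diff(p_correct) > 0) + 1)
```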


1970 ◽  
Vol 26 (3) ◽  
pp. 975-984
Author(s):  
Mary Juhan Larsen ◽  
Jerry C. Allen

Item performance on the Stanford-Binet by a sample (n = 289) of Georgia children (CA = 5) was compared with that of equivalent-aged children used in the test standardization in terms of five subject variables: race, sex, socioeconomic status, intelligence level, and community size. The Georgia sample's performance exceeded (p < .01) the norm group's performance on 62% of the items. All five subject variables were associated with these differences: both the presence and the direction of item performance differences varied across levels of the variables, and more than one subject variable generally influenced a given item's performance. These data affirm that certain variables confound intelligence test performance and that norms based on a single variable, such as race, do not eliminate test biases.
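Item-level comparisons of a sample's pass rate against a norm group's are typically run as two-proportion tests. The abstract does not name the procedure used, so the following is only an illustrative sketch with invented pass rates.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p(p1: float, n1: int, p2: float, n2: int) -> float:
    """Two-sided p-value for a difference between two item pass rates."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Invented example: 78% pass in the Georgia sample (n = 289) vs. 65% in the
# norm group (norm-group n also set to 289 purely for illustration).
print(f"p = {two_proportion_p(0.78, 289, 0.65, 289):.5f}")  # well below .01
```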


2016 ◽  
Vol 10 (3) ◽  
pp. 227-231 ◽  
Author(s):  
Bárbara Costa Beber ◽  
Renata Kochhann ◽  
Bruna Matias ◽  
Márcia Lorena Fagundes Chaves

ABSTRACT Background: The Clock Drawing Test (CDT) is a brief cognitive screening tool for dementia. Several presentation formats and scoring methods for the CDT are available in the literature. Objective: We aimed to compare performance on the free-drawn and "incomplete-copy" versions of the CDT, scored with the same short scoring method, in patients with Mild Cognitive Impairment (MCI) or dementia and in healthy elderly participants. Methods: Ninety participants (controlled for age, sex, and education) were recruited and subdivided into a control group (n = 20), an MCI group (n = 30), and a dementia group (n = 40; Alzheimer's disease, AD = 20; vascular dementia, VD = 20). Participants performed the two CDT versions at different times, and a blinded neuropsychologist scored both with the same scoring system. Results: Scores on the free-drawn version were significantly lower than on the incomplete-copy version in all groups. The dementia group scored significantly lower than the control group on the incomplete-copy version, while MCI patients did not differ significantly from either the dementia or the control group. Performance on the free-drawn version differed significantly among all groups. Conclusion: The free-drawn CDT version is more cognitively demanding and more sensitive for detecting mild/early cognitive impairment. Further evaluation of the diagnostic accuracy of the free-drawn CDT in Brazilian MCI patients is needed.
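Because each participant drew both CDT versions, the version comparison is a within-subject test. The abstract does not say which test was used, so the Wilcoxon signed-rank comparison below is only a plausible sketch with invented scores.

```python
import numpy as np
from scipy.stats import wilcoxon

# Invented paired scores on a short 0-5 scoring method for 20 participants.
rng = np.random.default_rng(2)
incomplete_copy = rng.integers(3, 6, size=20).astype(float)
free_drawn = np.clip(incomplete_copy - rng.integers(0, 3, size=20), 0, 5)

stat, p = wilcoxon(free_drawn, incomplete_copy)  # paired, nonparametric
print(f"Wilcoxon W = {stat}, p = {p:.4f}")
```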


Assessment ◽  
2018 ◽  
Vol 27 (6) ◽  
pp. 1198-1212 ◽  
Author(s):  
Gilles E. Gignac ◽  
Ka Ki Wong

The purpose of this investigation was to examine single-anagram, double-anagram, and multi-anagram versions of the Anagram Persistence Task (APT) for factorial validity, reliability, and convergent validity; a battery of intelligence tests was administered to assess convergent validity. An unrestricted factor analysis of the 14 anagram response times (seven very difficult and seven very easy anagrams) uncovered two factors: test-taking persistence and verbal processing speed. The internal consistency reliabilities of the single-anagram, double-anagram, and multi-anagram (seven difficult anagrams) measures were .42, .85, and .86, respectively. All three versions of the APT correlated positively with intelligence test performance (r ≈ .22). However, the double-anagram and multi-anagram versions also evidenced negative, nonlinear effects with intelligence test performance (r ≈ −.15), suggesting possible testee adaptation. Taking psychometric properties and administration time into consideration simultaneously, the double-anagram version of the APT may be regarded as the preferred measure.
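As a rough check (ours, not the authors'), the Spearman-Brown prophecy formula predicts the reliability of a test lengthened by a factor k from a single-item reliability ρ; with ρ = .42 and k = 7 it gives about .84, close to the observed .86 for the seven-anagram version:

```latex
\rho_k = \frac{k\rho}{1 + (k - 1)\rho}, \qquad
\rho_7 = \frac{7 \times .42}{1 + 6 \times .42} \approx .84
```

The double-anagram value of .85 exceeds the k = 2 prediction of about .59, suggesting the two difficult anagrams were more homogeneous than the full set.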


2021 ◽  
Author(s):  
Maciej Karwowski ◽  
Marta Czerwonka ◽  
Ewa Wiśniewska ◽  
Boris Forthmann

This paper presents a meta-analysis of the links between intelligence test scores and creative achievement. A three-level meta-analysis of 117 correlation coefficients from 30 studies found a correlation of r = .16 (95% CI: .12, .19), closely mirroring previous meta-analytic findings. The estimated effects were stronger for overall creative achievement and achievement in scientific domains than for creative achievement in the arts and everyday creativity. No signs of publication bias were found. We discuss theoretical implications and provide recommendations for future studies.
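A bare-bones version of such pooling converts each correlation to Fisher's z, averages with inverse-variance weights, and back-transforms. The study correlations and sample sizes below are invented, and the paper's three-level model additionally handles dependence among coefficients drawn from the same study.

```python
import numpy as np

# Invented study-level correlations and sample sizes (the meta-analysis
# itself pools 117 coefficients from 30 studies with a three-level model).
r = np.array([0.10, 0.18, 0.22, 0.12, 0.16])
n = np.array([120, 85, 200, 150, 90])

z = np.arctanh(r)                 # Fisher r-to-z transform
w = n - 3                         # inverse of var(z) = 1 / (n - 3)
z_bar = np.average(z, weights=w)
se = 1 / np.sqrt(w.sum())
lo, hi = np.tanh([z_bar - 1.96 * se, z_bar + 1.96 * se])
print(f"pooled r = {np.tanh(z_bar):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```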


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ali Khodi

ABSTRACT The present study investigated factors that affect EFL writing scores using generalizability theory (G-theory). One hundred and twenty students completed one independent and one integrated writing task, and their performances were scored by six raters: one self-rating, three peer ratings, and two instructor ratings. The main purpose of the study was to determine the relative and absolute contributions of facets such as student, rater, task, method of scoring, and background of education to the validity of writing assessment scores. The results indicated three major sources of variance: (a) the student-by-task-by-method-of-scoring interaction (nested in background of education; STM:B), contributing 31.8% of the total variance; (b) the student-by-rater-by-task-by-method-of-scoring interaction (nested in background of education; SRTM:B), contributing 26.5%; and (c) the student-by-rater-by-method-of-scoring interaction (nested in background of education; SRM:B), contributing 17.6%. Given the G-coefficients in the G-study (relative G-coefficient ≥ 0.86), the assessment results were highly valid and reliable. The sources of error variance were the student-by-rater interaction (nested in background of education; SR:B) and the rater-by-background-of-education interaction, contributing 99.2% and 0.8% of the error variance, respectively. Additionally, ten separate G-studies were conducted to investigate the contribution of different facets across rater, task, and method of scoring as differentiation facets; these suggested that peer rating, the analytical scoring method, and integrated writing tasks were the most reliable and generalizable writing assessment designs. Finally, five decision studies (D-studies) at the optimization level indicated that at least four raters are necessary for a valid and reliable assessment (G-coefficient = 0.80). Based on these results, to achieve the greatest gain in generalizability, teachers should have their students take two writing assessments and have their performance rated under at least two scoring methods by at least four raters.
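The D-study logic behind the "at least four raters" recommendation follows from how the generalizability coefficient grows as rater error is averaged over n raters. The variance components below are invented so that G crosses .80 at exactly four raters, mirroring the pattern reported; the study's actual design also crosses tasks and scoring methods.

```python
# Invented variance components for a simple persons x raters design.
var_person = 4.0        # universe-score (true-score) variance
var_rel_error = 4.0     # relative error variance for a single rater

def g_coefficient(n_raters: int) -> float:
    """Relative G-coefficient when scores are averaged over n_raters."""
    return var_person / (var_person + var_rel_error / n_raters)

for n in range(1, 7):
    print(n, round(g_coefficient(n), 2))   # reaches 0.80 at n = 4
```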

