Some Psychometric Problems of the Matching Familiar Figures Test

1976 ◽  
Vol 43 (3) ◽  
pp. 731-742 ◽  
Author(s):  
Hideo Kojima

Kagan's Matching Familiar Figures Test and a group intelligence test were administered to 151 male and 130 female Japanese second graders, and responses were analyzed in detail. Matching response time showed high internal consistency, whereas errors were much less consistent. The four groups differed in which variant positions they selected, and the position of the correct variant partially accounted for the variance of error scores. However, it appeared that errors on the matching test could not be made sufficiently reliable simply by refining and lengthening the present version of Kagan's test. Slow-accurate, fast-accurate, and slow-inaccurate children adjusted their response time to item difficulty, but fast-inaccurate children failed to do so. Almost all of these results were replicated in third and fifth graders. Under the ordinary scoring method of the intelligence test, the four matching groups differed from one another only among girls; when scores were adjusted for errors, intelligence test performance correlated with matching performance among boys as well.
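The contrast between consistent response times and inconsistent errors can be illustrated with Cronbach's alpha, the usual index of internal consistency. Below is a minimal sketch with simulated data; the item count, sample size, and effect structure are hypothetical, not taken from the study.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (examinees x items) score matrix."""
    k = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(0)
# Hypothetical data for 100 children on 12 items: response times share a
# strong common factor; errors are nearly independent across items.
ability = rng.normal(size=(100, 1))
response_times = ability + rng.normal(scale=0.5, size=(100, 12))
errors = (rng.random((100, 12)) < 0.3).astype(float)

print(f"alpha, response times: {cronbach_alpha(response_times):.2f}")  # high
print(f"alpha, errors:         {cronbach_alpha(errors):.2f}")          # near zero
```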

2008 ◽  
Vol 30 (1) ◽  
pp. 105
Author(s):  
Christopher Weaver ◽  
Yoko Sato

This empirical study introduces population targeting and cut-off point targeting as a systematic approach to evaluating the performance of items in the English section of university entrance examinations. Using Rasch measurement theory, we found that the item difficulty and the types of items in a series of national university entrance examinations varied considerably over a four-year period. There was, however, progress towards improved test performance: an increased number of items assessed different language skills and content areas, and an increased number targeted test takers' knowledge of English. This study also found that productive items targeted test takers' overall knowledge of English better than receptive items did. Moreover, productive items were more consistently located around the probable cut point for university admissions. The paper concludes with a detailed account of a number of probable factors that could influence item performance, such as the use of rating scales.
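For reference, the dichotomous Rasch model on which the analysis rests expresses the probability that test taker n answers item i correctly in terms of ability θ_n and item difficulty b_i (standard notation, not the authors'):

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

On this scale, population targeting asks how well the item difficulties b_i cover the test-taker ability distribution, while cut-off point targeting asks how well they cover the region around the admissions cut point.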


Author(s):  
Clara Li ◽  
Xiaoyi Zeng ◽  
Judith Neugroschl ◽  
Amy Aloysi ◽  
Carolyn W. Zhu ◽  
...  

ABSTRACT Objectives: This study describes performance on the Multilingual Naming Test (MINT) by Chinese American older adults who are monolingual Chinese speakers. An attempt was also made to identify items that could introduce bias and warrant attention in future investigations. Methods: The MINT was administered to 67 monolingual Chinese older adults as part of the standard dementia evaluation at the Alzheimer's Disease Research Center (ADRC) at the Icahn School of Medicine at Mount Sinai (ISMMS), New York, USA. Diagnoses of normal cognition (n = 38), mild cognitive impairment (n = 12), and dementia (n = 17) were assigned at clinical consensus conferences using criterion sheets developed at the ADRC at ISMMS. Results: MINT scores were negatively correlated with age and positively correlated with education, showing sensitivity to demographic factors. One item, butterfly, showed no variation in responses across diagnostic groups. Accepting responses from different regions of China changed the scoring of 20 items from "incorrect" to "correct." The last five items, porthole, anvil, mortar, pestle, and axle, yielded a high nonresponse rate, with more than 70% of participants responding "I don't know." Four items, funnel, witch, seesaw, and wig, were not ordered by item difficulty in Chinese. Two items, gauge and witch, were identified as culturally biased for the monolingual group. Conclusions: Our study highlights cultural and linguistic differences that might influence test performance. Future studies are needed to revise the MINT using more universally recognized items of similar word frequency across cultural and linguistic groups.
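The item-level checks the authors describe, high nonresponse rates and out-of-order difficulties, are straightforward to compute. A sketch with a hypothetical response matrix follows; the coding scheme and item count are assumptions, not the study's data.

```python
import numpy as np

# Hypothetical coding: rows = participants, columns = MINT items in their
# intended easy-to-hard order; 1 = correct, 0 = incorrect, -1 = "I don't know".
rng = np.random.default_rng(1)
responses = rng.choice([1, 0, -1], size=(67, 32), p=[0.60, 0.25, 0.15])

nonresponse_rate = (responses == -1).mean(axis=0)
print("items with >70% nonresponse:", np.flatnonzero(nonresponse_rate > 0.70))

# Empirical difficulty = proportion correct; an item is mis-ordered when it
# is easier than the (supposedly easier) item placed before it.
p_correct = (responses == 1).mean(axis=0)
print("items easier than their predecessor:", np.flatnonzero(np.diff(p_correct) > 0) + 1)
```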


1970 ◽  
Vol 26 (3) ◽  
pp. 975-984
Author(s):  
Mary Juhan Larsen ◽  
Jerry C. Allen

Item performance on the Stanford-Binet by a sample (n = 289) of Georgia children (CA = 5) was compared with that of equivalent-aged children used in the test standardization in terms of five subject variables: race, sex, socioeconomic status, intelligence level, and community size. The Georgia sample's performance exceeded (p < .01) the norm group's performance on 62% of the items. All five subject variables were associated with these differences: both the presence and the direction of item performance differences varied across levels of the variables, and more than one subject variable generally influenced a given item's performance. These data affirm that certain variables confound intelligence test performance and that norms based on a single variable, such as race, do not eliminate test biases.
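Item-level comparisons of a sample's pass rate against a norm group's are typically run as two-proportion tests. The abstract does not name the procedure used, so the following is only an illustrative sketch with invented pass rates.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p(p1: float, n1: int, p2: float, n2: int) -> float:
    """Two-sided p-value for a difference between two item pass rates."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Invented example: 78% pass in the Georgia sample (n = 289) vs. 65% in the
# norm group (norm-group n also set to 289 purely for illustration).
print(f"p = {two_proportion_p(0.78, 289, 0.65, 289):.5f}")  # well below .01
```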


2016 ◽  
Vol 10 (3) ◽  
pp. 227-231 ◽  
Author(s):  
Bárbara Costa Beber ◽  
Renata Kochhann ◽  
Bruna Matias ◽  
Márcia Lorena Fagundes Chaves

ABSTRACT Background: The Clock Drawing Test (CDT) is a brief cognitive screening tool for dementia. Several presentation formats and scoring methods for the CDT are available in the literature. Objective: We aimed to compare performance on the free-drawn and "incomplete-copy" versions of the CDT, scored with the same short scoring method, in patients with Mild Cognitive Impairment (MCI) or dementia and in healthy elderly participants. Methods: Ninety participants (controlled for age, sex, and education) were recruited and subdivided into a control group (n = 20), an MCI group (n = 30), and a dementia group (n = 40; Alzheimer's disease, AD = 20; vascular dementia, VD = 20). Participants performed the two CDT versions at different times, and a blinded neuropsychologist scored both with the same scoring system. Results: Scores on the free-drawn version were significantly lower than on the incomplete-copy version in all groups. The dementia group scored significantly lower than the control group on the incomplete-copy version, while MCI patients did not differ significantly from either the dementia or the control group. Performance on the free-drawn version differed significantly among all groups. Conclusion: The free-drawn CDT version is more cognitively demanding and more sensitive for detecting mild/early cognitive impairment. Further evaluation of the diagnostic accuracy of the free-drawn CDT in Brazilian MCI patients is needed.
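Because each participant drew both CDT versions, the version comparison is a within-subject test. The abstract does not say which test was used, so the Wilcoxon signed-rank comparison below is only a plausible sketch with invented scores.

```python
import numpy as np
from scipy.stats import wilcoxon

# Invented paired scores on a short 0-5 scoring method for 20 participants.
rng = np.random.default_rng(2)
incomplete_copy = rng.integers(3, 6, size=20).astype(float)
free_drawn = np.clip(incomplete_copy - rng.integers(0, 3, size=20), 0, 5)

stat, p = wilcoxon(free_drawn, incomplete_copy)  # paired, nonparametric
print(f"Wilcoxon W = {stat}, p = {p:.4f}")
```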


Assessment ◽  
2018 ◽  
Vol 27 (6) ◽  
pp. 1198-1212 ◽  
Author(s):  
Gilles E. Gignac ◽  
Ka Ki Wong

The purpose of this investigation was to examine single-anagram, double-anagram, and multi-anagram versions of the Anagram Persistence Task (APT) for factorial validity, reliability, and convergent validity; a battery of intelligence tests was administered to assess convergent validity. An unrestricted factor analysis of the 14 anagram response times (seven very difficult and seven very easy anagrams) uncovered two factors: test-taking persistence and verbal processing speed. The internal consistency reliabilities of the single-anagram, double-anagram, and multi-anagram (seven difficult anagrams) measures were .42, .85, and .86, respectively. All three versions of the APT correlated positively with intelligence test performance (r ≈ .22). However, the double-anagram and multi-anagram versions also evidenced negative, nonlinear effects with intelligence test performance (r ≈ −.15), suggesting possible testee adaptation. Taking psychometric properties and administration time into consideration simultaneously, the double-anagram version of the APT may be regarded as the preferred measure.
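As a rough check (ours, not the authors'), the Spearman-Brown prophecy formula predicts the reliability of a test lengthened by a factor k from a single-item reliability ρ; with ρ = .42 and k = 7 it gives about .84, close to the observed .86 for the seven-anagram version:

```latex
\rho_k = \frac{k\rho}{1 + (k - 1)\rho}, \qquad
\rho_7 = \frac{7 \times .42}{1 + 6 \times .42} \approx .84
```

The double-anagram value of .85 exceeds the k = 2 prediction of about .59, suggesting the two difficult anagrams were more homogeneous than the full set.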


2021 ◽  
Author(s):  
Maciej Karwowski ◽  
Marta Czerwonka ◽  
Ewa Wiśniewska ◽  
Boris Forthmann

This paper presents a meta-analysis of the links between intelligence test scores and creative achievement. A three-level meta-analysis of 117 correlation coefficients from 30 studies found a correlation of r = .16 (95% CI: .12, .19), closely mirroring previous meta-analytic findings. The estimated effects were stronger for overall creative achievement and achievement in scientific domains than for creative achievement in the arts and everyday creativity. No signs of publication bias were found. We discuss theoretical implications and provide recommendations for future studies.
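A bare-bones version of such pooling converts each correlation to Fisher's z, averages with inverse-variance weights, and back-transforms. The study correlations and sample sizes below are invented, and the paper's three-level model additionally handles dependence among coefficients drawn from the same study.

```python
import numpy as np

# Invented study-level correlations and sample sizes (the meta-analysis
# itself pools 117 coefficients from 30 studies with a three-level model).
r = np.array([0.10, 0.18, 0.22, 0.12, 0.16])
n = np.array([120, 85, 200, 150, 90])

z = np.arctanh(r)                 # Fisher r-to-z transform
w = n - 3                         # inverse of var(z) = 1 / (n - 3)
z_bar = np.average(z, weights=w)
se = 1 / np.sqrt(w.sum())
lo, hi = np.tanh([z_bar - 1.96 * se, z_bar + 1.96 * se])
print(f"pooled r = {np.tanh(z_bar):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```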


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ali Khodi

ABSTRACT The present study investigated factors that affect EFL writing scores using generalizability theory (G-theory). One hundred and twenty students completed one independent and one integrated writing task, and their performances were scored by six raters: one self-rating, three peer ratings, and two instructor ratings. The main purpose of the study was to determine the relative and absolute contributions of facets such as student, rater, task, method of scoring, and background of education to the validity of writing assessment scores. The results indicated three major sources of variance: (a) the student-by-task-by-method-of-scoring interaction (nested in background of education; STM:B), contributing 31.8% of the total variance; (b) the student-by-rater-by-task-by-method-of-scoring interaction (nested in background of education; SRTM:B), contributing 26.5%; and (c) the student-by-rater-by-method-of-scoring interaction (nested in background of education; SRM:B), contributing 17.6%. Given the G-coefficients in the G-study (relative G-coefficient ≥ 0.86), the assessment results were highly valid and reliable. The sources of error variance were the student-by-rater interaction (nested in background of education; SR:B) and the rater-by-background-of-education interaction, contributing 99.2% and 0.8% of the error variance, respectively. Additionally, ten separate G-studies were conducted to investigate the contribution of different facets across rater, task, and method of scoring as differentiation facets; these suggested that peer rating, the analytical scoring method, and integrated writing tasks were the most reliable and generalizable writing assessment designs. Finally, five decision studies (D-studies) at the optimization level indicated that at least four raters are necessary for a valid and reliable assessment (G-coefficient = 0.80). Based on these results, to achieve the greatest gain in generalizability, teachers should have their students take two writing assessments and have their performance rated under at least two scoring methods by at least four raters.
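The D-study logic behind the "at least four raters" recommendation follows from how the generalizability coefficient grows as rater error is averaged over n raters. The variance components below are invented so that G crosses .80 at exactly four raters, mirroring the pattern reported; the study's actual design also crosses tasks and scoring methods.

```python
# Invented variance components for a simple persons x raters design.
var_person = 4.0        # universe-score (true-score) variance
var_rel_error = 4.0     # relative error variance for a single rater

def g_coefficient(n_raters: int) -> float:
    """Relative G-coefficient when scores are averaged over n_raters."""
    return var_person / (var_person + var_rel_error / n_raters)

for n in range(1, 7):
    print(n, round(g_coefficient(n), 2))   # reaches 0.80 at n = 4
```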

