Tracking and Targeting: Investigating Item Performance on the English Section of a University Entrance Examination over a 4-Year Period

2008 ◽  
Vol 30 (1) ◽  
pp. 105
Author(s):  
Christopher Weaver ◽  
Yoko Sato

This empirical study introduces population targeting and cut-off point targeting as a systematic approach to evaluating the performance of items in the English section of university entrance examinations. Using Rasch measurement theory, we found that the item difficulty and the types of items in a series of national university entrance examinations varied considerably over a 4-year period. However, there was progress towards improved test performance in terms of an increased number of items assessing different language skills and content areas as well as an increased number targeting test takers’ knowledge of English. This study also found that productive items rather than receptive items better targeted test takers’ overall knowledge of English. Moreover, productive items were more consistently located around the probable cut point for university admissions. The paper concludes with a detailed account of a number of probable factors that could influence item performance, such as the use of rating scales.
Japanese abstract (translated): This paper empirically examines changes in the English section of the entrance examination at a national university. As a systematic approach to examining the performance of test items, it introduces two methods, "population targeting" and "cut-off point targeting". Analysis using Rasch modeling showed that, over the past four years, item difficulty and item types changed considerably: the items measured a variety of skills, covered diverse content, and increasingly tested knowledge of English. Furthermore, items measuring productive ability tended to converge around the cut-off point for admission decisions more than items measuring receptive ability. A variety of factors that may influence item-level performance are examined in detail.
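The idea of cut-off point targeting can be illustrated with a minimal sketch. It assumes the dichotomous Rasch model, which the abstract does not confirm (its mention of rating scales suggests a polytomous variant may have been used). The cut point and item difficulties below are hypothetical values, not figures from the study. Under this model an item is most informative for test takers whose ability is close to its difficulty, so items located near the cut point discriminate best around the admission decision.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """P(correct) under the dichotomous Rasch model; both arguments in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_information(ability: float, difficulty: float) -> float:
    """Fisher information an item contributes at a given ability: p * (1 - p)."""
    p = rasch_probability(ability, difficulty)
    return p * (1.0 - p)


# Hypothetical values for illustration only (not taken from the study).
cut_point = 0.5  # assumed admission cut point, in logits
item_difficulties = {"receptive_item": -1.5, "productive_item": 0.4}

for name, difficulty in item_difficulties.items():
    info = item_information(cut_point, difficulty)
    print(f"{name}: information at cut point = {info:.3f}")
```

In this sketch the item whose difficulty lies nearer the cut point yields the larger information value, which is the sense in which an item "targets" the cut point.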

Author(s):  
Clara Li ◽  
Xiaoyi Zeng ◽  
Judith Neugroschl ◽  
Amy Aloysi ◽  
Carolyn W. Zhu ◽  
...  

ABSTRACT Objectives: This study describes the performance of the Multilingual Naming Test (MINT) by Chinese American older adults who are monolingual Chinese speakers. An attempt was also made to identify items that could introduce bias and warrant attention in future investigation. Methods: The MINT was administered to 67 monolingual Chinese older adults as part of the standard dementia evaluation at the Alzheimer’s Disease Research Center (ADRC) at the Icahn School of Medicine at Mount Sinai (ISMMS), New York, USA. A diagnosis of normal cognition (n = 38), mild cognitive impairment (n = 12), and dementia (n = 17) was assigned to all participants at clinical consensus conferences using criterion sheets developed at the ADRC at ISMMS. Results: MINT scores were negatively correlated with age and positively correlated with education, showing sensitivity to demographic factors. One item, butterfly, showed no variations in responses across diagnostic groups. Inclusion of responses from different regions of China changed the answers from “incorrect” to “correct” on 20 items. The last five items, porthole, anvil, mortar, pestle, and axle, yielded a high nonresponse rate, with more than 70% of participants responding with “I don’t know.” Four items, funnel, witch, seesaw, and wig, were not ordered with respect to item difficulty in the Chinese language. Two items, gauge and witch, were identified as culturally biased for the monolingual group. Conclusions: Our study highlights the cultural and linguistic differences that might influence the test performance. Future studies are needed to revise the MINT using more universally recognized items of similar word frequency across different cultural and linguistic groups.


Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 212
Author(s):  
Jeanette Melin ◽  
Stefan Cano ◽  
Leslie Pendrill

Commonly used rating scales and tests have been found to lack reliability and validity, for example in neurodegenerative disease studies, because they neither take account of the inherent ordinality of human responses nor acknowledge the separability of person ability and item difficulty parameters according to the well-known Rasch model. Here, we adopt an information theory approach, particularly extending deployment of the classic Brillouin entropy expression when explaining the difficulty of recalling non-verbal sequences in memory tests (i.e., Corsi Block Test and Digit Span Test): a more ordered task, of less entropy, will generally be easier to perform. Construct specification equations (CSEs) as a part of a methodological development, with entropy-based variables dominating, are found experimentally to explain (r = √R² = 0.98) and predict the construct of task difficulty for short-term memory tests using data from the NeuroMET (n = 88) and Gothenburg MCI (n = 257) studies. We propose entropy-based equivalence criteria, whereby different tasks (in the form of items) from different tests can be combined, enabling new memory tests to be formed by choosing a bespoke selection of items, leading to more efficient testing, improved reliability (reduced uncertainties) and validity. This provides opportunities for more practical and accurate measurement in clinical practice, research and trials.
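The claim that a more ordered sequence has less entropy can be sketched as follows. The function computes the logarithm of the multinomial coefficient, ln(N! / ∏ nᵢ!), which is the core of the Brillouin expression. How the authors normalise this quantity or map it onto Corsi Block and Digit Span items is not stated in the abstract, so the function name, the absence of normalisation, and the example digit sequences are all assumptions made for illustration.

```python
import math
from collections import Counter


def brillouin_entropy(sequence) -> float:
    """Return ln(N! / prod(n_i!)) for a sequence of N symbols with counts n_i.

    Unnormalised form; dividing by N would give a per-symbol value.
    """
    counts = Counter(sequence)
    n_total = len(sequence)
    # math.lgamma(n + 1) equals ln(n!), which avoids computing large factorials.
    log_n_factorial = math.lgamma(n_total + 1)
    log_count_factorials = sum(math.lgamma(c + 1) for c in counts.values())
    return log_n_factorial - log_count_factorials


# Hypothetical digit sequences, ordered from most to least repetitive.
for digits in ("1111", "1122", "1234"):
    print(digits, brillouin_entropy(digits))
```

A sequence made of one repeated symbol gives zero, and the value grows as the symbols become more varied, consistent with the abstract's statement that lower-entropy tasks are generally easier.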


1987 ◽  
Vol 60 (3_part_2) ◽  
pp. 1023-1040
Author(s):  
Mary E. Farmer ◽  
Lon R. White ◽  
Steven J. Kittner ◽  
Edith Kaplan ◽  
Elizabeth Moes ◽  
...  

In 1976–1978, a battery of eight neuropsychologic tests was administered to 2,123 participants in the Framingham Study aged 55 to 89 yr. The battery was designed to sample multiple areas of cognitive function including language skills, memory, learning, reproduction of designs, attention, and abstract thinking. Performance is described for several groups in this population: a large community-dwelling sample, those with hearing impairments, and those with documented strokes. Performance is described by age, sex, and education strata for the community sample. This normative information should be useful for interpreting individual test performance on neuropsychological tests.


2021 ◽  
Vol 3 (1) ◽  
pp. 1-9
Author(s):  
Adamu Chidubem Deborah ◽  
Babatimehin Temitope ◽  
Adeoye Oluseyi Peter

Antiquity ◽  
1997 ◽  
Vol 71 (271) ◽  
pp. 37-39
Author(s):  
Su Bingqi

This article, first published in Zhongguo jianshe [China Reconstructs 1987(9)], became particularly well known when it was selected for the ‘Language and literature paper’ in the 1988 national university entrance examinations. This English version is translated and footnoted by Wang Tao.


1994 ◽  
Vol 10 (2) ◽  
pp. 157-187
Author(s):  
Carol A. Chapelle

Second language (L2) researchers (Singleton and Little, 1991) have suggested that C-tests, developed as norm-referenced measures for proficiency and placement testing (Klein-Braley, 1985), can be used in L2 vocabulary research. This article illustrates how researchers can bring to bear essentials of measurement theory on L2 research by weighing validity justifications pertaining to use of the C-test method for vocabulary assessment in L2 research. Validity is defined using the predominant framework from current measurement theory (Messick, 1989) and its relevance for L2 research is explained. The cornerstone of the definition is construct validity, which requires a definition of the construct to be measured - interlanguage vocabulary (i.e., vocabulary ability). A theoretical definition of vocabulary ability is presented and used to consider justifications for and against interpreting C-test performance as indicative of vocabulary ability. On the basis of evidence concerning construct validity and utility as well as the consequences of interpretations, the potentials and limitations of the C-test method for L2 vocabulary research are identified.


2017 ◽  
Vol 41 (S1) ◽  
pp. S107-S107
Author(s):  
J. De Jonghe ◽  
T. Schoemaker ◽  
D. Lam ◽  
P. Andre de la Porte

Background and aims: Over 50% of adult disability claimants fail some form of symptom validity test (SVT). While some over-report psychological, affective symptoms, others may report incredible cognitive symptoms. We examined the effects of different types of response bias on free recall and self-reported depression. Participants and methods: This is a single-site cross-sectional study using a convenience sample (n = 224) of disability claimants in the Netherlands. The Green Word Memory Test (GWMT) was administered to all subjects. The Amsterdam Short Term Memory Test (AKTG), the Structured Inventory of Malingered Symptomatology (SIMS), and the Beck Depression Inventory (BDI-II) were administered in subsamples. Participant classification according to GWMT and SIMS outcomes resulted in four groups: G+/S+, G+/S−, G−/S+ and G−/S−. Results: The average age of the participants was 46.3 years (SD 9.9), 41.5% were female, and 43% were higher educated. GWMT was positive in 48.2% of all subjects, and 27.6% scored positive on both GWMT and SIMS. Analysis of variance of GWMT free recall and Beck depression scores showed significant group differences [F(3, 123) = 33.21, P = .000] and [F(3, 106) = 25.17, P = .000], respectively. Conclusions: Non-credible test performance was prevalent in this Dutch study of disability claimants. Insufficient effort and over-reporting of psychological symptoms are associated with different score profiles on regular tests and self-rating scales. Disclosure of interest: The author receives funding for his work as a neuropsychologist in an expertise setting.

