Norming

Author(s):  
Dave Bartram ◽  
Fons J. R. van de Vijver

Chapter 30 focuses on issues relating to norm-referenced measures, in particular the use of norms in international assessments. This chapter highlights some of the complex issues involved in norming scores. While the initial sections of the chapter review some general issues of norm construction and use, this is not a chapter on the mechanics of how to produce norms. Rather, it focuses on issues of when and how to use norms, what aggregations of samples to base them on, and how norm-referenced scores should be interpreted. In particular, it considers issues relating to the development and use of international norms. Test norms are often essential for stakeholders to understand the meaning of test scores by providing information about the standing of the test taker relative to other members of the population. Finally, the chapter notes that culturally related variance may reflect either measurement bias or effects of cultural style.
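As a rough illustration of what a norm-referenced score is, the sketch below places a raw score relative to a norm sample in two common ways: a percentile rank and a z-score. The function names and the sample data are hypothetical and are not taken from the chapter; the mid-rank convention for ties is one common choice among several.

```python
from bisect import bisect_left, bisect_right
from statistics import mean, stdev


def percentile_rank(score, norm_sample):
    """Percentage of the norm sample scoring below `score`,
    counting half of the scores exactly equal to it (mid-rank convention)."""
    ordered = sorted(norm_sample)
    below = bisect_left(ordered, score)
    equal = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


def z_score(score, norm_sample):
    """Distance of `score` from the norm-sample mean, in units of the
    sample standard deviation."""
    return (score - mean(norm_sample)) / stdev(norm_sample)


# Hypothetical norm sample, for illustration only.
norm_sample = [10, 12, 14, 16, 18]
print(percentile_rank(14, norm_sample))
print(z_score(18, norm_sample))
```

The same raw score can receive a different percentile rank or z-score against a different norm sample, which is why the choice of which samples to aggregate matters for interpretation.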

1965 ◽  
Vol 21 (1) ◽  
pp. 199-206 ◽  
Author(s):  
Francis W. King ◽  
Thomas D. Bird

The Trail Making Test was administered to 201 male college undergraduates who were waiting to be seen in an out-patient clinic. Correlations between the various Trail Making Test scores and the CEEB Scholastic Aptitude Test scores are reported. Since the performance of these students is quite different from the performance of Ss reported in other studies, normative tables are presented for male undergraduates.


2018 ◽  
Vol 36 (2) ◽  
pp. 289-309 ◽  
Author(s):  
Hanwook Yoo ◽  
Venessa F. Manna ◽  
Lora F. Monfils ◽  
Hyeon-Joo Oh

This study illustrates the use of score equity assessment (SEA) for evaluating the fairness of reported test scores from assessments intended for test takers from diverse cultural, linguistic, and educational backgrounds, using a workplace English proficiency test. Subgroups were defined by test-taker background characteristics that research has shown to be associated with performance on language tests. The characteristics studied included gender, age, educational background, language exposure, and previous experience with the assessment. Overall, the empirical results indicated that the statistical and psychometric methods used in producing test scores were not strongly influenced by the subgroups of test takers from which the scores were derived. This result provides evidence in support of the comparability and meaning of test scores across the various test-taker groups studied. This example may encourage language testing programs to incorporate SEA analyses to provide evidence to inform the validity and fairness of reported scores for all groups of test takers.


1995 ◽  
Vol 62 (1) ◽  
pp. 7-15 ◽  
Author(s):  
Mary-Ann L. Fulks ◽  
Susan R. Harris

This article presents findings from a retrospective study of 54 children who were prenatally exposed to drugs and who received the Miller Assessment for Preschoolers as part of a developmental follow-up clinic protocol. Data were analyzed using nonparametric descriptive statistics to examine trends in the test scores compared to the test norms and to determine if a distinctive clinical profile was present. Although a distinctive clinical profile did not emerge, the overall test results indicated a skewness toward the lower end of the spectrum with poorer performance identified on test items measuring tactile, proprioceptive and vestibular processing, and language. Performance of items that assessed aspects of non-verbal cognition tended to be within the normal range. The difficulties of conducting studies within this group of children are discussed.


2020 ◽  
Vol 37 (4) ◽  
pp. 503-522
Author(s):  
Yeonsuk Cho ◽  
Ian A. Blood

In this study, we examined how much change in TOEFL® Primary™ listening and reading scores can be expected in relation to the time interval between test administrations. The test records of 5213 young learners of English (aged 8–13 years) in Japan and Turkey who repeated the tests were analyzed to examine test scores as a function of time interval. The effect of time on test scores was analyzed with a multilevel modeling approach, allowing both initial scores and rate of change among individual test takers to vary. In addition, we examined the effects of test-taker age and test-level difficulty on test scores. Separate analyses were conducted by country for ease of interpretation, as Japan and Turkey differ with respect to the number of hours of instruction that students receive and the English-learning goals in their respective curricula. Results showed a positive rate of change, indicating that test scores increase gradually over time. However, the rate of change differed between the two countries. Furthermore, repeaters’ test scores increased with their age and with the length of time between test administrations. Findings provide empirical evidence for schools to refer to when determining the timing of re-administration of the TOEFL Primary tests to their students.
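A generic two-level growth model of the kind described, with both initial scores and rates of change varying across test takers, can be written as follows. The notation is an assumption for illustration and is not taken from the study; the study's actual specification, including its age and test-level covariates, may differ.

```latex
% Level 1: score of test taker i at occasion t, as a function of elapsed time
Y_{ti} = \beta_{0i} + \beta_{1i}\,\mathrm{Time}_{ti} + e_{ti},
\qquad e_{ti} \sim N(0, \sigma^{2})

% Level 2: test-taker-specific initial score and rate of change
\beta_{0i} = \gamma_{00} + u_{0i}, \qquad
\beta_{1i} = \gamma_{10} + u_{1i}
```

Here $\gamma_{10}$ is the average rate of change, and $u_{0i}$ and $u_{1i}$ are the test-taker-level deviations in initial score and in rate of change.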


2018 ◽  
Vol 36 (1) ◽  
pp. 3-25 ◽  
Author(s):  
Khaled Barkaoui

This study aimed to examine the sources of variability in the second-language (L2) writing scores of test-takers who repeated an English language proficiency test, the Pearson Test of English (PTE) Academic, multiple times. Examining repeaters’ test scores can provide important information concerning factors contributing to changes in test scores across test occasions. Data consisted of the scores and background data (e.g., gender, age) and other covariates (e.g., context, interval between tests, number of tests attempted) for a sample of 1,000 test-takers who each took PTE Academic three times or more. Multilevel modeling was used to estimate the contribution of various factors to variability in repeaters’ PTE Academic writing scores across test-takers and test occasions. The findings indicated that changes in PTE Academic writing scores followed a quadratic trajectory (i.e., initial score increases followed by a decline) and that, as expected, test-taker initial overall English language proficiency (as measured on other sections of the test) was the strongest predictor of differences in PTE Academic writing scores at test occasion one as well as variance (across test-takers) in the rate of change in writing scores over time. Measures of retesting effects were not significantly associated with changes in writing scores, while test-taker factors (e.g., age, gender, and purpose for taking the test) were significantly associated with writing scores at test occasion one, but not with the rate of change in writing scores over time. The study highlights the value of examining repeaters’ L2 test scores and concludes with a call for more research on the sensitivity of L2 proficiency tests to changes in L2 proficiency over time and in relation to L2 instruction.


2020 ◽  
Vol 8 (1) ◽  
Author(s):  
Dalit Contini ◽  
Federica Cugnata

The development of international surveys on children’s learning like PISA, PIRLS and TIMSS—delivering comparable achievement measures across educational systems—has revealed large cross-country variability in average performance and in the degree of inequality across social groups. A key question is whether and how institutional differences affect the level and distribution of educational outcomes. In this contribution, we discuss the difference-in-differences strategies employed in the existing literature to evaluate the effect of early tracking on learning inequalities exploiting international assessments administered at different age/grades. In their seminal paper, Hanushek and Woessmann (Econ J 116:C63–C76, 2006) analyze with two-step estimation the effect of early tracking on overall inequalities, measured by test scores’ variability indexes. Later work of other scholars in the economics and sociology of education focuses instead on inequalities among children of different family background, using individual-level models on pooled data from different countries and assessments. In this contribution, we show that individual pooled difference-in-differences models are quite restrictive and that in essence they estimate the effect of tracking by double differentiating the estimated cross-sectional family background regression coefficients between tracking regimes and learning assessments. Starting from a simple learning growth model, we show that if test scores at different surveys are not measured on the same scale, as occurs for international learning assessments, pooled individual models may deliver severely biased results. Instead, the scaling problem does not affect the two-step approach. For this reason, we suggest using two-step estimation also to analyze family-background achievement inequalities.
Against this background, using PIRLS-2006 and PISA-2012 we conduct two-step analyses, finding new evidence that early tracking fosters both overall inequalities and family background differentials in reading literacy.


2012 ◽  
Vol 105 (9) ◽  
pp. 666-670 ◽  
Author(s):  
Zalman Usiskin

A strong curriculum is not the sole reason for Singaporean students' success on international assessments.


1977 ◽  
Vol 8 (1) ◽  
pp. 5-14 ◽  
Author(s):  
David L. Ratusnik ◽  
Roy A. Koenigsknecht

Six speech and language clinicians, three black and three white, administered the Goodenough Drawing Test (1926) to 144 preschoolers. The four groups, lower socioeconomic black and white and middle socioeconomic black and white, were divided equally by sex. The biracial clinical setting was shown to influence test scores in black preschool-age children.


2010 ◽  
Vol 20 (1) ◽  
pp. 27-31
Author(s):  
Lyn Robertson

Learning to listen and speak are well-established preludes for reading, writing, and succeeding in mainstream educational settings. Intangibles beyond the ubiquitous test scores that typically serve as markers for progress in children with hearing loss are embedded in descriptions of the educational and social development of four young women. All were diagnosed with severe-to-profound or profound hearing loss as toddlers, and all were fitted with hearing aids and given listening and spoken language therapy. Compiling stories across the life span provides insights into what we can be doing in the lives of young children with hearing loss.

