Test Administration and Test Items

2019 ◽  
Vol 45 (2) ◽  
pp. 209-226
Author(s):  
Arnond Sakworawich ◽  
Howard Wainer

Test scoring models vary in their generality; some even adjust for examinees who answer multiple-choice items correctly by accident (guessing), but no model that we are aware of automatically adjusts an examinee’s score when there is internal evidence of cheating. In this study, we use a combination of jackknife technology with an adaptive robust estimator to reduce the bias in examinee scores due to contamination through events such as having access to some of the test items in advance of the test administration. We illustrate our methodology with a data set of test items we knew to have been divulged to a subset of the examinees.
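The abstract does not spell out the estimator, but the general idea, jackknife pseudo-scores combined with a robust, adaptive aggregate so that a block of compromised items cannot inflate the final score, can be sketched as follows. This is a minimal illustration under assumed choices (a delete-one-block jackknife and a MAD-based outlier rule); the function names, block structure, and cutoffs are illustrative and are not the authors' implementation.

```python
import numpy as np

def block_jackknife_scores(responses, n_blocks=5):
    """Delete-one-block jackknife: proportion correct with each contiguous
    block of items left out in turn."""
    responses = np.asarray(responses, dtype=float)
    blocks = np.array_split(responses, n_blocks)
    total, n = responses.sum(), responses.size
    return np.array([(total - b.sum()) / (n - b.size) for b in blocks])

def adaptive_robust_score(responses, n_blocks=5, z_cut=2.5):
    """If deleting one block drops the score far below the other pseudo-scores
    (internal evidence that the block is inflating the score, e.g. items seen
    in advance), report the score without that block; otherwise report the
    ordinary proportion correct."""
    responses = np.asarray(responses, dtype=float)
    pseudo = block_jackknife_scores(responses, n_blocks)
    med = np.median(pseudo)
    mad = np.median(np.abs(pseudo - med)) or 1e-9   # crude floor to avoid division by zero
    z = (pseudo - med) / (1.4826 * mad)
    suspect = int(np.argmin(z))
    if z[suspect] < -z_cut:
        return pseudo[suspect]        # score with the suspect block removed
    return responses.mean()

# Toy example: the first 4-item block was divulged and answered perfectly,
# while the rest of the test is near chance for this examinee.
resp = np.array([1, 1, 1, 1,  0, 1, 0, 0,  1, 0, 0, 0,  0, 0, 1, 0,  0, 1, 0, 0])
print(resp.mean())                 # naive score, pulled up by the leaked block
print(adaptive_robust_score(resp)) # adjusted score with the suspect block removed
```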


2019 ◽  
Vol 30 (08) ◽  
pp. 694-702
Author(s):  
Maria E. Pomponio ◽  
Stephanie Nagle ◽  
Jennifer L. Smart ◽  
Shannon Palmer

There is currently no widely accepted objective method for identifying (central) auditory processing disorder ([C]APD). Audiologists often rely on behavioral test methods to diagnose (C)APD, which can be highly subjective. This is problematic in light of literature reporting a lack of adequate graduate-level preparation related to (C)APD, and it is further exacerbated when test procedures are varied from those used to standardize tests of (C)APD, resulting in higher test variability. The consequences of modifying test administration and scoring methods for tests of (C)APD are not currently documented in the literature.

This study aims to examine the effect on test outcome of varying test administration and scoring procedures from those used to standardize tests of (C)APD.

This study used a repeated-measures design in which all participants were evaluated in all test conditions. The effects of varying the number of test items administered and of repeating missed test items on the test outcome score were assessed for the frequency patterns test (FPT), the competing sentences test (CST), and the low-pass filtered speech test (LPFST). For the CST only, two scoring methods (a strict and a lax criterion) were used to determine whether scoring method affected test outcome.

Thirty-three native English-speaking adults served as participants. All participants had normal hearing (defined as thresholds of 25 dB HL or better) at all octave frequencies from 500 to 4000 Hz, with thresholds of 55 dB HL or better at 8000 Hz. All participants had normal cognitive function as assessed by the Mini-Mental State Examination.

Paired-samples t-tests were used to evaluate differences in test outcome when the CST scoring method was varied. A 3 × 2 × 2 repeated-measures factorial analysis of variance (ANOVA) was used to determine the effects of test, length, and repetitions on outcome score across all three tests of auditory processing ability. Individual 2 × 2 repeated-measures ANOVAs were subsequently conducted for each test to further evaluate interactions.

There was no effect of scoring method on the CST outcome. There was a significant main effect of repetition use for the FPT and LPFST, such that test scores were higher when corrected for repetitions. An interaction between test length and repetitions was found for the LPFST only, such that the effect of repetition use was greater when a shorter test was administered than when a longer test was administered.

Test outcome may be affected when test administration procedures are varied from those used to standardize the test, raising the broader possibility that the overall diagnosis of (C)APD may be affected as well.
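For readers who want to reproduce this kind of analysis in outline, the sketch below pairs scipy's paired-samples t-test with statsmodels' repeated-measures ANOVA. The file names and column names (subject, test, length, repetition, score, strict, lax) are assumptions for illustration, not the study's actual data layout or analysis code.

```python
import pandas as pd
from scipy.stats import ttest_rel
from statsmodels.stats.anova import AnovaRM

# Long-format data assumed: one row per subject x test x length x repetition cell.
df = pd.read_csv("apd_scores_long.csv")

# Paired-samples t-test: CST scored with the strict vs. lax criterion.
cst = pd.read_csv("cst_scoring.csv")   # assumed columns: subject, strict, lax
t, p = ttest_rel(cst["strict"], cst["lax"])
print(f"CST scoring method: t = {t:.2f}, p = {p:.3f}")

# 3 x 2 x 2 repeated-measures ANOVA: test (FPT/CST/LPFST) x length x repetition use.
aov = AnovaRM(df, depvar="score", subject="subject",
              within=["test", "length", "repetition"]).fit()
print(aov.anova_table)

# Follow-up 2 x 2 repeated-measures ANOVAs within each test to unpack interactions.
for name, sub in df.groupby("test"):
    aov2 = AnovaRM(sub, depvar="score", subject="subject",
                   within=["length", "repetition"]).fit()
    print(name)
    print(aov2.anova_table)
```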


2012 ◽  
Vol 17 (1) ◽  
pp. 1-10 ◽  
Author(s):  
Rosalind Potts ◽  
Robin Law ◽  
John F. Golding ◽  
David Groome

Retrieval-induced forgetting (RIF) refers to the finding that the retrieval of an item from memory impairs the retrieval of related items. The extent to which this impairment is found in laboratory tests varies between individuals, and recent studies have reported an association between individual differences in the strength of the RIF effect and other cognitive and clinical factors. The present study investigated the reliability of these individual differences in the RIF effect. A RIF task was administered to the same individuals on two occasions (sessions T1 and T2), one week apart. For Experiments 1 and 2 the final retrieval test at each session made use of a category-cue procedure, whereas Experiment 3 employed category-plus-letter cues, and Experiment 4 used a recognition test. In Experiment 2 the same test items that were studied, practiced, and tested at T1 were also studied, practiced, and tested at T2, but for the remaining three experiments two different item sets were used at T1 and T2. A significant RIF effect was found in all four experiments. A significant correlation was found between RIF scores at T1 and T2 in Experiment 2, but for the other three experiments the correlations between RIF scores at T1 and T2 failed to reach significance. This study therefore failed to find clear evidence for reliable individual differences in RIF performance, except where the same test materials were used for both test sessions. These findings have important implications for studies involving individual differences in RIF performance.
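The reliability question here comes down to computing each participant's RIF effect (baseline recall minus recall of unpracticed items from practiced categories) in each session and correlating the two. A minimal sketch, assuming a wide-format table with hypothetical column names rather than the study's actual data layout, might look like this:

```python
import pandas as pd
from scipy.stats import pearsonr

# Wide-format table assumed, one row per participant, with recall proportions:
#   nrp_t1, rp_minus_t1 = baseline (Nrp) and unpracticed-related (Rp-) items at T1
#   nrp_t2, rp_minus_t2 = the same quantities at T2
df = pd.read_csv("rif_recall.csv")

# RIF effect per participant and session: baseline minus Rp- recall.
rif_t1 = df["nrp_t1"] - df["rp_minus_t1"]
rif_t2 = df["nrp_t2"] - df["rp_minus_t2"]

# Test-retest reliability of the individual-difference measure.
r, p = pearsonr(rif_t1, rif_t2)
print(f"T1-T2 correlation of RIF scores: r = {r:.2f}, p = {p:.3f}")
```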


1982 ◽  
Vol 27 (12) ◽  
pp. 966-967
Author(s):  
Jason Millman
