Development of Performance Evaluation Tests for Environmental Research (PETER): Arithmetic Computation

1980 ◽  
Vol 51 (3_suppl2) ◽  
pp. 1023-1031 ◽  
Author(s):  
D. M. Seales ◽  
R. S. Kennedy ◽  
A. C. Bittner

A paper-and-pencil test of simple arithmetic ability was exceptionally well suited for inclusion in a battery of Performance Evaluation Tests for Environmental Research (PETER). Mean performance stabilized after nine days of baseline testing. Variance was constant throughout 15 days of baseline testing. “Task definition” was high, and “differential stability” was present from the outset. Subjects apparently came to this test with well-established differential levels of arithmetic ability.

1979 ◽  
Vol 23 (1) ◽  
pp. 508-512 ◽  
Author(s):  
D. M. Seales ◽  
R. S. Kennedy ◽  
A. C. Bittner

A paper-and-pencil test of simple arithmetic ability was found to be exceptionally well suited for inclusion in a battery of Performance Evaluation Tests for Environmental Research (PETER). Mean performance stabilized after nine days of baseline testing. Variance was constant throughout fifteen days of baseline testing. “Task definition” was high, and “differential stability” was present from the outset. Subjects apparently came to this test with well-established differential levels of arithmetic ability.


1986 ◽  
Vol 63 (2) ◽  
pp. 683-708 ◽  
Author(s):  
Alvah C. Bittner ◽  
Robert C. Carter ◽  
Robert S. Kennedy ◽  
Mary M. Harbeson ◽  
Michele Krause

The goal of the Performance Evaluation Tests for Environmental Research (PETER) Program was to identify a set of measures of human capabilities for use in the study of environmental and other time-course effects. A total of 114 measures studied in the PETER Program were evaluated and categorized into four groups based upon task stability and task definition. The Recommended category contained 30 measures that clearly attained total stabilization and had an acceptable level of reliability efficiency. The Acceptable-But-Redundant category contained 15 measures. The 37 measures in the Marginal category, which included an inordinate number of slope and other derived measures, usually had desirable features that were outweighed by faults. The 32 measures in the Unacceptable category had either differential instability or weak reliability efficiency. In our opinion, the 30 measures in the Recommended category should be given first consideration for environmental research applications. We further recommend that information on preexperimental practice requirements and stabilized reliabilities be used in repeated-measures environmental studies.


1984 ◽  
Vol 58 (2) ◽  
pp. 567-573 ◽  
Author(s):  
Diane L. Damos ◽  
Alvah C. Bittner ◽  
Robert S. Kennedy ◽  
Mary M. Harbeson ◽  
Michele Krause

A critical tracking test was considered for inclusion in the Performance Evaluation Tests for Environmental Research (PETER) Battery, which was designed for use in unusual environments. Baseline measures were obtained by testing 18 subjects for 14 consecutive days. Mean performance increased, but standard deviations were constant over the 14 days. Test-retest reliabilities improved over the first 8 days, after which differential stability was seen. The implications for the use of this test in exotic environments are discussed. The critical tracking test is recommended as a good candidate for environmental research when practiced to total stability.


1982 ◽  
Vol 51 (2) ◽  
pp. 635-644 ◽  
Author(s):  
S. L. Mackaman ◽  
A. C. Bittner ◽  
M. M. Harbeson ◽  
R. S. Kennedy ◽  
D. A. Stone

To ascertain the suitability of the Wonderlic Personnel Test for inclusion in a battery of Performance Evaluation Tests for Environmental Research (PETER), parallel forms were administered daily, without coaching or feedback, for 19 consecutive work days to 13 Navy enlisted men who were high school graduates. Unique forms were administered on Days 1 to 10 and 18 to 19; forms were repeated on Days 11 to 17. The mean score increased significantly from about 23 to 29, amounting to 0.7 standard score units. After Day 4, the change in performance was linear and accounted for 57% of the variation over Days 5 to 19. The standard deviations were homogeneous over all repeated and unrepeated days, and the reliability correlations were differentially stable across all days, with a task definition of r = .70. The group mean increase of more than 21 percentile points on the Wonderlic has implications for selection and counseling. Notably, the average subject in our group scored at the level typical of a “stenographer” or “draftsman” on the first occasion but at the level typical of an “engineer” or “accountant” on the last. It was concluded that the Wonderlic is suitable for inclusion in PETER.


1979 ◽  
Vol 23 (1) ◽  
pp. 536-540 ◽  
Author(s):  
Marshall B. Jones

Most tasks show practice effects with repeated administrations, effects that may appear in the group mean, the variance among subjects, or the correlations over subjects among trials or repeated testings. Fortunately, in many tasks there comes a point after which practice no longer produces changes in performance; as we will put it, the task stabilizes. Stabilization in this sense is a key phenomenon for performance testing, the prediction of individual behavior, and the theory of personality. It is also desirable that a task be well defined, that is, that the average correlation among stabilized trials be high (greater than .80). The paper focuses on differential stability, that is, constancy in the positions of individual subjects relative to one another from one trial to the next. Instability, or differential change, over a set of consecutive trials may appear either within that set of trials (local change) or between the set and other tasks or preceding trials on the same task (general change). Of the two forms of differential stability or change, the latter (general change) is much the more important. The paper concludes with a brief summary of stabilization and task definition in ten tasks currently under consideration for inclusion in a performance test battery for environmental research.


1984 ◽  
Vol 28 (1) ◽  
pp. 11-15 ◽  
Author(s):  
A. C. Bittner ◽  
R. C. Carter ◽  
R. S. Kennedy ◽  
M. M. Harbeson ◽  
M. Krause

The goal of the Performance Evaluation Tests for Environmental Research (PETER) Program was to identify a set of measures of human cognitive, perceptual, and motor capabilities for use in the study of environmental and other time-course effects. Tasks were evaluated as suitable for repeated measures applications when their intertrial means, variances and correlations were well-behaved under constant baseline conditions. This report provides an evaluation of 112 test measures studied in the program. They are categorized into four groups based upon joint consideration of task stability and task definition. Thirty test measures were categorized as Good, 15 as Good-But-Redundant, 35 as Ugly (flawed), and 32 as Bad.


1980 ◽  
Vol 24 (1) ◽  
pp. 330-334 ◽  
Author(s):  
Denise B. McCafferty ◽  
Alvah C. Bittner ◽  
Robert C. Carter

Auditory digit span was evaluated as an instrument for repeated measurements experimentation. Twelve subjects were tested for one hour on each of 12 consecutive workdays in a standard environment. Both forward and backward digit span were measured. Forward digit span was found to be suitable for repeated measures after ten days of practice at 30 minutes per day. The criteria for suitability were predictability of the mean scores, constancy of the standard deviations, and differential stability of the intertrial correlations. These criteria are sufficient conditions both for repeated measures analysis of variance and for interpretation of experimental effects. Although the backward digit span scores did not meet these criteria, they became increasingly correlated with the forward digit span scores as the experiment progressed. This indicates that the mental content of the two memory tests converged with practice. One implication of this finding is that the meaningfulness of factor structures obtained after only limited practice is open to question. The forward auditory digit span test was recommended for inclusion in a battery of Performance Evaluation Tests for Environmental Research (PETER).
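One simple way to screen the constancy-of-standard-deviations criterion is to compare the spread of subject scores from day to day. The sketch below uses Hartley's F-max ratio (largest daily variance divided by smallest daily variance) as a stand-in; the abstract does not say which test the authors applied, and the function names and one-list-per-day input layout are assumptions made for illustration.

```python
from statistics import stdev


def daily_sds(scores_by_day):
    """Sample standard deviation across subjects for each testing day.

    scores_by_day: one list of subject scores per day (hypothetical layout).
    """
    return [stdev(day) for day in scores_by_day]


def fmax_ratio(scores_by_day):
    """Hartley's F-max: largest daily variance over smallest daily variance.

    A ratio near 1 suggests roughly constant spread among subjects; judging
    significance requires comparing against a tabled critical value.
    """
    variances = [sd ** 2 for sd in daily_sds(scores_by_day)]
    return max(variances) / min(variances)


# Three days of scores for the same three subjects.
days = [[1, 2, 3], [2, 3, 4], [10, 11, 12]]
print(daily_sds(days))   # [1.0, 1.0, 1.0]
print(fmax_ratio(days))  # 1.0
```

The example shows that a rising mean (as with practice) does not by itself change the F-max ratio; only a change in between-subject spread does.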


1980 ◽  
Vol 24 (1) ◽  
pp. 344-348 ◽  
Author(s):  
Robert S. Kennedy ◽  
Robert C. Carter ◽  
Alvah C. Bittner

Performance Evaluation Tests for Environmental Research (PETER) are under development at the Naval Biodynamics Laboratory and supporting organizations. The tests, or tasks, studied in this program have been largely derived from the literature. Each task was evaluated for suitability for the repeated-measures experimental designs that are almost universally used in environmental research. Suitability criteria included the “stability” of task means, standard deviations, and between-trial correlations. The magnitude of the “stabilized” between-trial correlations, termed task definition, was also examined with respect to administration time. There are 60 active tasks in the present program. All tasks examined to date exhibit stable means and variances after adequate practice, but (a) less than 30% meet minimal stability criteria for intertrial correlations, and (b) substantial practice (typically more than an hour over five days) is required to achieve stability. A tabular catalogue of the research findings and background for 15 tasks is presented and discussed.
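Examining task definition with respect to administration time amounts to asking how reliable a test would be at a common length. One conventional tool for this, assumed here rather than taken from the abstract, is the Spearman–Brown prophecy formula, which predicts the reliability of a test lengthened by a factor k as r_k = k r / (1 + (k - 1) r). The sketch below applies it; the helper `reliability_at_time` and its 3-minute default target are hypothetical choices for illustration.

```python
def spearman_brown(r, k):
    """Predicted reliability of a test whose length is multiplied by k.

    r: reliability at the original length (e.g. a stabilized intertrial
       correlation).
    k: lengthening factor (k > 0; k < 1 shortens the test).
    """
    return k * r / (1 + (k - 1) * r)


def reliability_at_time(r, minutes, target_minutes=3.0):
    """Hypothetical helper: rescale r, observed for a test lasting `minutes`,
    to a common administration time of `target_minutes`."""
    return spearman_brown(r, target_minutes / minutes)


# A test with r = .50 at 1.5 minutes, projected to 3 minutes (k = 2):
print(round(reliability_at_time(0.50, 1.5), 3))  # 0.667
```

Putting every task on the same time base in this way lets a short, moderately reliable test be compared fairly with a long one; the formula assumes the added time behaves like parallel test content.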


1981 ◽  
Author(s):  
Ross L. Pepper ◽  
Robert S. Kennedy ◽  
Alvah C. Bittner ◽  
Steven F. Wiker
