rating session
Recently Published Documents


TOTAL DOCUMENTS: 3 (FIVE YEARS: 0)

H-INDEX: 2 (FIVE YEARS: 0)

2018 ◽ Vol 11 (2)
Author(s): Hsin-I Liao, Makoto Yoneya, Makio Kashino, Shigeto Furukawa

There are indications that the pupillary dilation response (PDR) reflects surprising moments in an auditory sequence, such as the appearance of a deviant noise against repetitively presented pure tones (Liao, Yoneya, Kidani, Kashino, & Furukawa, 2016), or sounds that human participants subjectively rate as salient and loud (Liao, Kidani, Yoneya, Kashino, & Furukawa, 2016). In the current study, we further examined whether the PDR also reflects auditory surprise in complex yet structured auditory stimuli, i.e., music, and when surprise is defined subjectively. Participants listened to 15 excerpts of music while their pupillary responses were recorded. In the surprise-rating session, participants rated how surprising each instant in an excerpt was, i.e., rich in variation versus monotonous, as they listened to it. In the passive-listening session, they listened to the same 15 excerpts again but were not engaged in any task. The pupil diameter data from both sessions were time-aligned to the ratings obtained in the surprise-rating session. In both sessions, mean pupil diameter was larger at moments rated more surprising than at moments rated unsurprising. These results suggest that the PDR reflects surprise in music automatically.
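The core of the analysis described above, time-aligning a pupil-diameter trace to a continuous surprise rating and comparing mean diameter at surprising versus unsurprising moments, can be sketched as follows. This is a minimal illustration on synthetic data; the sampling rate, the median-split labeling, and all numeric values are assumptions for the example, not details taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: a 60 s excerpt sampled at an assumed 60 Hz.
t = np.arange(0, 60, 1 / 60)

# Hypothetical continuous surprise ratings (0-1) from the rating session,
# with a "surprising" passage around t = 30 s.
surprise = np.clip(rng.normal(0.3, 0.2, t.size), 0, 1)
surprise[1800:1900] = 0.9

# Hypothetical pupil diameter trace (mm) that dilates with surprise.
pupil = 3.0 + 0.5 * surprise + rng.normal(0, 0.05, t.size)

# Time-align: label each pupil sample as surprising vs. unsurprising
# via a median split on the rating trace.
surprising = surprise > np.median(surprise)

mean_surprising = pupil[surprising].mean()
mean_unsurprising = pupil[~surprising].mean()
print(mean_surprising > mean_unsurprising)
```

In a real analysis the two traces come from separate recordings, so the rating trace would first be resampled onto the pupil-sample timestamps before the comparison.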


2016 ◽ Vol 34 (2) ◽ pp. 271-289
Author(s): Chih-Kai Lin

Sparse-rated data are common in operational performance-based language tests, an inevitable result of assigning each examinee response to only a fraction of the available raters. The current study investigates the precision of two generalizability-theory methods (the rating method and the subdividing method) specifically designed to accommodate the technical complexity of estimating score reliability from sparse-rated data. Examining the precision of reliability estimates is of great importance because the utility of any performance-based language test depends on its reliability. Results suggest that when some raters are expected to show greater score variability than others (e.g., a mixture of novice and experienced raters deployed in a rating session), the subdividing method is recommended, as it yields more precise reliability estimates. When all raters are expected to exhibit similar variability in their scoring, the two methods are equally precise in estimating score reliability, and the rating method is recommended for operational use, as it is easier to implement in practice. Informed by these methodological results, the current study also demonstrates a step-by-step analysis of score reliability for sparse-rated data taken from a large-scale English speaking proficiency test. Implications for operational performance-based language tests are discussed.
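The generalizability-theory machinery underlying both methods rests on decomposing score variance into person, rater, and residual components and forming a generalizability coefficient from them. The sketch below shows that decomposition for the simplest fully crossed persons × raters design on synthetic data; the sparse-rating adaptations discussed in the abstract (the rating and subdividing methods) build on this idea but are not reproduced here, and all variance magnitudes are assumptions for the example.

```python
import numpy as np

# Synthetic fully crossed design: n_p persons each scored by n_r raters.
rng = np.random.default_rng(1)
n_p, n_r = 50, 4
person = rng.normal(0, 1.0, (n_p, 1))       # true-score (person) effects
rater = rng.normal(0, 0.3, (1, n_r))        # rater severity effects
error = rng.normal(0, 0.5, (n_p, n_r))      # residual / interaction
scores = person + rater + error

# Mean squares for a p x r random-effects ANOVA.
grand = scores.mean()
p_means = scores.mean(axis=1)
r_means = scores.mean(axis=0)
ms_p = n_r * ((p_means - grand) ** 2).sum() / (n_p - 1)
resid = scores - p_means[:, None] - r_means[None, :] + grand
ms_pr = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

# Expected-mean-squares estimates of the variance components.
var_p = (ms_p - ms_pr) / n_r    # person (universe-score) variance
var_pr = ms_pr                  # residual variance

# Relative generalizability coefficient for the mean of n_r raters.
g_coef = var_p / (var_p + var_pr / n_r)
print(round(g_coef, 2))
```

Averaging over more raters shrinks the error term `var_pr / n_r`, which is why the coefficient rises with the number of raters per response.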


PeerJ ◽ 2016 ◽ Vol 4 ◽ pp. e1881
Author(s): Chris Beardsley, Tim Egerton, Brendon Skinner

Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal-plane pelvic tilt in 18 young, healthy males and females.

Method. The inter-rater and test–retest reliabilities of the DPI for measuring standing pelvic tilt on both the right and left sides of the pelvis were assessed by two raters carrying out two rating sessions with the same subjects, three weeks apart.

Results. For measuring pelvic tilt, inter-rater reliability was designated as good on both sides (ICC = 0.81–0.88), test–retest reliability within a single rating session was designated as good on both sides (ICC = 0.88–0.95), and test–retest reliability between the two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85).

Conclusion. Inter-rater reliability and within-session test–retest reliability of the DPI in measuring pelvic tilt were both good, while between-session test–retest reliability was moderate to good. Caution is required when interpreting the within-session test–retest reliability, as the raters were not blinded. Further research is required to establish validity.
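The ICC values reported above for inter-rater reliability correspond to a two-way random-effects intraclass correlation. A minimal sketch of that computation (ICC(2,1): two-way random effects, absolute agreement, single measurement) on synthetic data is shown below; the tilt values, variances, and the choice of ICC form are assumptions for the example, not the study's actual data or analysis.

```python
import numpy as np

# Synthetic example mirroring the design: 2 raters measure pelvic tilt
# (degrees) in 18 subjects. Values are illustrative only.
rng = np.random.default_rng(2)
n, k = 18, 2
true_tilt = rng.normal(12.0, 4.0, n)                        # between-subject spread
ratings = true_tilt[:, None] + rng.normal(0, 1.0, (n, k))   # per-rater error

# Two-way ANOVA mean squares: rows = subjects, columns = raters.
grand = ratings.mean()
ms_r = k * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n - 1)
ms_c = n * ((ratings.mean(axis=0) - grand) ** 2).sum() / (k - 1)
resid = ratings - ratings.mean(axis=1, keepdims=True) - ratings.mean(axis=0) + grand
ms_e = (resid ** 2).sum() / ((n - 1) * (k - 1))

# ICC(2,1), Shrout & Fleiss convention.
icc_2_1 = (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)
print(round(icc_2_1, 2))
```

With between-subject variability much larger than measurement error, as here, the coefficient lands in the "good" range (commonly taken as ICC above roughly 0.75).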

