Influence of Response-Option Combinations when Measuring Sense of Efficacy for Teaching: Trivial, or Substantial and Substantive?

Certain combinations of the number and labeling of response options on Likert scales might, through their interaction, influence psychometric outcomes. To explore this possibility experimentally, two versions of a scale for assessing sense of efficacy for teaching (SET) were administered to preservice teachers. One version had seven response options with labels at the odd-numbered points; the other had nine response options with labels only at the extremes. Before outliers in the data were adjusted, the first version produced a range of more desirable psychometric outcomes but poorer test–retest reliability. After outliers were addressed, the second version had more undesirable attributes than before, and its previously high test–retest reliability dropped to poor. These results are discussed in relation to the design of scales for assessing SET and other constructs, and in relation to the need for researchers to examine their data carefully, consider whether outlying data need to be addressed, and conduct analyses appropriately and transparently.
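
The advice above about examining data and addressing outliers can be made concrete with a small example. The Python sketch below computes a Pearson test–retest correlation for a handful of hypothetical scores (invented for illustration, not the study's data) and shows how a single respondent with extreme scores on both occasions can inflate the coefficient. It illustrates one possible mechanism by which handling outliers can change a reliability estimate; it is not a reconstruction of the authors' analysis.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between test scores x and retest scores y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least 2 entries")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scale scores for five respondents (illustrative, not study data).
test = [3, 4, 5, 4, 3]
retest = [4, 3, 4, 5, 3]
r_without = pearson_r(test, retest)

# Add one hypothetical respondent who scores at the extreme on both occasions.
r_with = pearson_r(test + [9], retest + [9])

print(f"without outlier: {r_without:.2f}")  # 0.29
print(f"with outlier:    {r_with:.2f}")     # 0.92
```

Because a correlation coefficient can be driven by a few extreme respondents rather than by the bulk of the sample, reporting how such cases were screened and handled matters for interpreting any reliability estimate.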

Validation of a menstrual pictogram and a daily bleeding diary for assessment of uterine fibroid treatment efficacy in clinical studies

Journal of Patient-Reported Outcomes ◽

10.1186/s41687-020-00263-0 ◽

2020 ◽

Vol 4 (1) ◽

Author(s):

Claudia Haberland ◽

Anna Filonenko ◽

Christian Seitz ◽

Matthias Börner ◽

Christoph Gerlinger ◽

...

Keyword(s):

Uterine Fibroid ◽

Full Range ◽

Intraclass Correlation ◽

Phase III ◽

Measurement Properties ◽

Retest Reliability ◽

Response Options ◽

Patient Global Impression ◽

Patient Reported ◽

Test Retest Reliability

Background: To evaluate the psychometric and measurement properties of two patient-reported outcome instruments, the menstrual pictogram superabsorbent polymer-containing version 3 (MP SAP-c v3) and the Uterine Fibroid Daily Bleeding Diary (UF-DBD). Test–retest reliability, criterion and construct validity, responsiveness, missingness, and comparability of the MP SAP-c v3 and UF-DBD versus the alkaline hematin (AH) method and a Patient Global Impression of Severity (PGI-S) were analyzed in post hoc trial analyses. Results: Analyses were based on data from up to 756 patients. The full range of MP SAP-c v3 and UF-DBD response options was used, with score distributions reflecting the cyclic character of the disease. Test–retest reliability of MP SAP-c v3 and UF-DBD scores was supported by acceptable intraclass correlation coefficients when stability was defined by the AH method and by PGI-S scores (0.80–0.96 and 0.42–0.94, respectively). MP SAP-c v3 and UF-DBD scores showed strong and moderate-to-strong correlations, respectively, with menstrual blood loss assessed by the AH method. Scores increased monotonically with greater disease severity, as defined by the AH method and PGI-S scores; differences between groups were mostly statistically significant (P < 0.05). MP SAP-c v3 and UF-DBD were sensitive to changes in disease severity, as defined by the AH method and PGI-S. MP SAP-c v3 and UF-DBD showed a lower frequency of missing patient data than the AH method, and good agreement with it. Conclusions: This evidence supports using the MP SAP-c v3 and UF-DBD, in place of the AH method, to assess clinical efficacy endpoints in UF phase III studies.
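
The abstract does not state which intraclass correlation form was used. As a sketch only, assuming the two-way random-effects, absolute-agreement, single-measure form (ICC(2,1) in Shrout and Fleiss's notation), a test–retest ICC can be computed from the ANOVA mean squares of a patients × occasions score table. The function name `icc_2_1` and the table layout are illustrative choices, not the authors' code.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: one row per patient; each row holds that patient's score on each of
    k occasions (e.g., test and retest). Assumed layout, for illustration only.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two patients")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with at least two occasions")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]                     # per patient
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]  # per occasion

    # Sums of squares for patients, occasions, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

When every patient scores identically at test and retest, as in `icc_2_1([[1, 1], [2, 2], [3, 3]])`, the function returns 1.0. The consistency form, ICC(3,1), would instead drop the occasion term `k * (ms_cols - ms_error) / n` from the denominator, so systematic shifts between sessions would no longer lower the coefficient.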

Healthy decisions in the cued-attribute food choice paradigm have high test-retest reliability

Scientific Reports ◽

10.1038/s41598-021-91933-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Zahra Barakchian ◽

Anjali Raja Beharelle ◽

Todd A. Hare

Keyword(s):

Food Choice ◽

Food Choices ◽

Retest Reliability ◽

Individual Level ◽

Food Items ◽

Intervention Efficacy ◽

The Individual ◽

Test Retest Reliability ◽

Decision Mechanisms ◽

High Test

Food choice paradigms are commonly used to study decision mechanisms, individual differences, and intervention efficacy. Here, we measured the behavior of twenty-three healthy young adults who completed five repetitions of a cued-attribute food choice paradigm over two weeks. This task includes cues that prompt participants either to explicitly consider the healthiness of the food items before making a selection or to choose naturally, based on whatever freely comes to mind. We found that the average patterns of food choices following both cue types, as well as ratings of the palatability (i.e., taste) and healthiness of the food items, were similar across all five repetitions. At the individual level, test-retest reliability was excellent for choices in both conditions and for healthiness ratings. However, test-retest reliability for taste ratings was only fair, suggesting that judgments of palatability may vary more from day to day within the same individual.

The Development of a Paediatric Phoneme Discrimination Test for Arabic Phonemic Contrasts

Audiology Research ◽

10.3390/audiolres11020014 ◽

2021 ◽

Vol 11 (2) ◽

pp. 150-166

Author(s):

Hanin Rayes ◽

Ghada Al-Malky ◽

Deborah Vickers

Keyword(s):

Closed Set ◽

Retest Reliability ◽

Response Options ◽

Phoneme Discrimination ◽

Significant Difference ◽

Age Appropriate ◽

Hearing Children ◽

Test Retest Reliability ◽

Development And Validation ◽

Arabic Speaking

Objective: The aim of this project was to develop the Arabic CAPT (A-CAPT), a Standard Arabic version of the CHEAR auditory perception test (CAPT) that assesses consonant perception ability in children. Method: This closed-set test was evaluated with normal-hearing children aged 5 to 11 years. The speech materials were developed and validated in two experimental phases. Twenty-six children participated in phase I, in which the test materials were piloted to ensure that the selected words were age appropriate and that the form of Arabic used was familiar to the children. Sixteen children participated in phase II, in which test–retest reliability, age effects, and critical differences were measured. A computerized implementation was used to present stimuli and collect responses; on each trial, children selected one of four response options displayed on a screen. Results: Two lists of 32 words were developed with two levels of difficulty, easy and hard. Test–retest reliability for the final version of the lists showed strong agreement, and a within-subject ANOVA showed no significant difference between the test and retest sessions. Performance improved with increasing age. Critical difference values were similar to those of the British English version of the CAPT. Conclusions: The A-CAPT is an appropriate speech perception test for assessing Arabic-speaking children as young as 5 years old. It can reliably assess consonant perception ability and monitor changes over time or after an intervention.

Cardio-ventilatory coupling in young healthy resting subjects

Journal of Applied Physiology ◽

10.1152/japplphysiol.01424.2010 ◽

2012 ◽

Vol 112 (8) ◽

pp. 1248-1257 ◽

Cited By ~ 11

Author(s):

Lee Friedman ◽

Thomas E. Dick ◽

Frank J. Jacono ◽

Kenneth A. Loparo ◽

Amir Yeganeh ◽

...

Keyword(s):

Body Surface ◽

Relative Importance ◽

Statistical Relationship ◽

Sinus Arrhythmia ◽

Retest Reliability ◽

Respiratory Events ◽

Before And After ◽

Test Retest Reliability ◽

Degree Of Coupling ◽

High Test

In this work, cardio-ventilatory coupling (CVC) refers to the statistical relationship between the onset of either inspiration (I) or expiration (E) and the timing of heartbeats (R-waves) before and after these respiratory events. CVC was assessed in healthy, young (<45 yr), resting, supine subjects (n = 19). Four intervals were analyzed: the time from I-onset to both the prior R-wave (R-to-I) and the following R-wave (I-to-R), and the time from E-onset to both the prior R-wave (R-to-E) and the following R-wave (E-to-R). The degree of coupling was quantified with transformed relative Shannon entropy (tRSE) and χ² tests based on histograms of interval times from 200 breaths. Subjects were studied twice, 5 to 27 days apart, and the test-retest reliability of the CVC measures was computed. Several findings pointed to the particular importance of the R-to-I interval relative to the other intervals: coupling was significantly stronger for the R-to-I interval, coupling reliability was highest for the R-to-I interval, and only tRSE for the R-to-I interval was correlated with height, weight, and body surface area. The high test-retest reliability of CVC in the R-to-I interval supports the hypothesis that CVC strength is a subject trait. Across subjects, a peak ∼138 ms before I-onset was characteristic of CVC in the R-to-I interval, although individual subjects also had earlier peaks (longer R-to-I intervals). CVC for the R-to-I interval was unrelated to two separate measures of respiratory sinus arrhythmia (RSA), suggesting that these two forms of coupling (CVC and RSA) are independent.
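
Both coupling measures named above start from a histogram of interval times. The Python sketch below assumes equal-width bins over a chosen window (the window and bin count are illustrative parameters, not values from the study) and computes the relative Shannon entropy, H divided by its maximum log K, together with a χ² statistic against a uniform histogram. The abstract does not give the transformation that turns relative entropy into tRSE, so that step is deliberately not reproduced here.

```python
from math import log


def histogram(values, lo, hi, n_bins):
    """Count values in n_bins equal-width bins spanning [lo, hi); others are ignored."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against float rounding pushing v past the last bin.
            counts[min(int((v - lo) // width), n_bins - 1)] += 1
    return counts


def relative_shannon_entropy(counts):
    """Entropy of the histogram divided by its maximum, log(K).

    1.0: intervals spread evenly over all bins (no coupling).
    Near 0: intervals concentrated in a few bins (strong coupling).
    """
    total = sum(counts)
    h = -sum((c / total) * log(c / total) for c in counts if c > 0)
    return h / log(len(counts))


def chi_square_uniform(counts):
    """Pearson chi-square statistic against a uniform histogram (K - 1 df)."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)
```

With hypothetical R-to-I intervals in milliseconds, `histogram([50, 150, 150, 350], 0, 400, 4)` gives `[1, 2, 0, 1]`. A flat histogram such as `[10, 10, 10, 10]` has a relative entropy of 1 and a χ² of 0, while piling every interval into one bin drives the relative entropy to 0 and the χ² upward.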

ShareDisk: A novel visual tool to assess perceptions about who should be responsible for supporting persons with mental health problems

International Journal of Social Psychiatry ◽

10.1177/0020764020913580 ◽

2020 ◽

Vol 66 (4) ◽

pp. 411-418

Author(s):

Srividya N Iyer ◽

Megan A Pope ◽

Gerald Jordan ◽

Greeshma Mohan ◽

Heleen Loohuis ◽

...

Keyword(s):

Mental Health ◽

Psychometric Properties ◽

Clinical Utility ◽

Mental Health Problems ◽

Health Problems ◽

Language And Literacy ◽

Retest Reliability ◽

Visual Tool ◽

Test Retest Reliability ◽

High Test

Objectives: Views on who bears how much responsibility for supporting individuals with mental health problems may vary across stakeholders (patients, families, clinicians) and cultures. Perceptions about responsibility may influence the extent to which stakeholders become involved in treatment. Our objective was to report on the development, psychometric properties, and usability of the first tool to measure this construct. Methods: We created a visual weighting disk, the ‘ShareDisk’, that measures the perceived extent of responsibility for supporting persons with mental health problems. It was administered twice, 2 weeks apart, to patients, family members, and clinicians in Chennai, India (N = 30, 30, and 15, respectively) and Montreal, Canada (N = 30, 32, and 15, respectively). Feedback on its usability was also collected. Results: The English, French, and Tamil versions of the ShareDisk demonstrated high test–retest reliability (rs = .69–.98) and were deemed easy to understand and use. Conclusion: The ShareDisk is a promising measure of a hitherto unmeasured construct that is easily deployable in settings varying in language and literacy levels. Its clinical utility lies in clarifying stakeholder roles. It can help researchers investigate how stakeholders’ roles are perceived and how these perceptions may be shaped by, and shape, the organization and experience of healthcare across settings.
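
The reliability above is reported as rs, the usual notation for Spearman's rank correlation. Assuming that is the statistic used, a minimal Python sketch for paired test and retest scores (for example, the share of responsibility one respondent assigns to a given stakeholder at each administration; this scoring is hypothetical) ranks each vector, averaging ranks over ties, and then takes the Pearson correlation of the ranks.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for p in range(i, j + 1):
            ranks[order[p]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(test, retest):
    """Spearman's rho: Pearson correlation of the two average-rank vectors."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equal-length score lists with at least 2 entries")
    rx, ry = average_ranks(test), average_ranks(retest)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5
```

Because only ranks enter the calculation, rs rewards respondents who keep the same relative ordering across administrations even if their raw scores shift, which suits bounded, proportion-like ratings.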

A Reliability Analysis of the Revised Competitiveness Index

Psychological Reports ◽

10.2466/pr0.106.3.870-874 ◽

2010 ◽

Vol 106 (3) ◽

pp. 870-874 ◽

Cited By ~ 35

Author(s):

Paul B. Harris ◽

John M. Houston

Keyword(s):

Reliability Analysis ◽

Factor Structure ◽

Dynamic State ◽

Retest Reliability ◽

Stable Factor ◽

Competitiveness Index ◽

Test Retest Reliability ◽

High Test

This study examined the reliability of the Revised Competitiveness Index by investigating the test-retest reliability, interitem reliability, and factor structure of the measure in a sample of 280 undergraduates (200 women, 80 men) aged 18 to 28 years (M = 20.1, SD = 2.1). The findings indicate that the Revised Competitiveness Index has high test-retest reliability, high interitem reliability, and a stable factor structure. The results support the assertion that the Revised Competitiveness Index assesses competitiveness as a stable trait rather than a dynamic state.

High test-retest reliability of a neural index of rapid automatic discrimination of unfamiliar individual faces

Journal of Vision ◽

10.1167/19.10.136c ◽

2019 ◽

Vol 19 (10) ◽

pp. 136c

Author(s):

Milena Dzhelyova ◽

Giulia Dormal ◽

Corentin Jacques ◽

Caroline Michel ◽

Christine Schiltz ◽

...

Keyword(s):

Retest Reliability ◽

Test Retest Reliability ◽

High Test

Development and Approbation of Methods for Diagnostics of Predisposition to Monosemantic or Polysemantic Context Generation

Psychological Science and Education ◽

10.17759/pse.2019240309 ◽

2019 ◽

Vol 24 (3) ◽

pp. 95-107

Author(s):

N.A. Khokhlov ◽

G.D. Laskov

Keyword(s):

Personality Traits ◽

Functional Asymmetry ◽

The Other ◽

Common Variance ◽

Retest Reliability ◽

Cronbach's Alpha ◽

The Common ◽

Test Retest Reliability

This article focuses on the development of methods to measure personality and cognitive predisposition to monosemantic or polysemantic context generation (PCG). In accordance with the concept of V.S. Rotenberg, we assumed that PCG is connected with manual functional asymmetry. We developed four tests: one designed to measure personality PCG and three that measure cognitive PCG. The approbation samples consisted of 160–736 participants. Cronbach's alpha (0.67–0.93) and the split-half coefficient (0.72–0.93) were calculated for all tests; for two of them, test-retest reliability (0.47–0.91) was also measured. The personality traits “reticence-sociability” and “concreteness-abstractness” explain 21.7% of the variance in personality PCG. Personality and cognitive PCG are interconnected but retain a fair amount of specificity. Manual functional asymmetry is weakly connected with personality PCG (no more than 1.5% of shared variance) and is not connected with cognitive PCG.
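
Cronbach's alpha and the split-half coefficient are both computed from a respondents × items score table. Alpha is k/(k − 1) × (1 − Σ item variances / total-score variance). The Python sketch below assumes an odd–even split stepped up with the Spearman–Brown formula, 2r/(1 + r); the abstract does not say how the tests were split, so that choice, like the function names, is an assumption for illustration.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """items: one row per respondent, each holding that respondent's k item scores."""
    k = len(items[0])
    # Population variances throughout; the ratio is unchanged with sample variances.
    item_vars = [pvariance([row[j] for row in items]) for j in range(k)]
    total_var = pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def split_half(items):
    """Odd-even split-half reliability with the Spearman-Brown step-up (assumed split)."""
    odd = [sum(row[0::2]) for row in items]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in items]  # items 2, 4, 6, ...
    r = pearson(odd, even)
    return 2 * r / (1 + r)
```

For a toy two-item table `[[1, 2], [2, 1], [3, 3]]`, both functions give about 0.67; with more items the two coefficients generally differ, since alpha averages over all possible splits while the split-half value depends on the one split chosen.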

A systematic comparison and reliability analysis of formal measures of sentence acceptability

10.31234/osf.io/vrfxn ◽

2019 ◽

Author(s):

Steven Langsford ◽

Andrew T Hendrickson ◽

Amy Perfors ◽

Lauren Kennedy ◽

Danielle Navarro

Keyword(s):

High Reliability ◽

Significant Degree ◽

Response Styles ◽

The Other ◽

Likert Scales ◽

Wide Range ◽

Acceptability Judgments ◽

Test Retest Reliability ◽

Item Effects ◽

Do So

Understanding and measuring sentence acceptability is of fundamental importance to linguists. Although many measures of acceptability have been developed, relatively little is known about some of their psychometric properties. In this paper we evaluate within- and between-participant test-retest reliability on a wide range of measures of sentence acceptability. Doing so allows us to estimate how much of the variability within each measure is due to factors including participant-level individual differences, sample size, response styles, and item effects. The measures examined include Likert scales, two versions of forced-choice judgments, magnitude estimation, and a novel measure based on Thurstonian approaches in psychophysics. We reproduce previous findings of high between-participant reliability within and across measures, and extend these results to a generally high reliability within individual items and individual people. Our results indicate that Likert scales and the Thurstonian approach produce the most stable and reliable acceptability measures, and do so with smaller sample sizes than the other measures. Moreover, their agreement with each other suggests that the discreteness of a Likert scale does not impose a significant degree of structure on the resulting acceptability judgments.

Imagery, Concreteness, Emotionality, Meaningfulness, and Pleasantness of Words

Perceptual and Motor Skills ◽

10.2466/pms.1995.80.3.867 ◽

1995 ◽

Vol 80 (3) ◽

pp. 867-880 ◽

Cited By ~ 9

Author(s):

Alfredo Campos

Keyword(s):

Gender Differences ◽

Retest Reliability ◽

Test Retest Reliability ◽

High Test

This review of the literature on the relationships among imagery, concreteness, emotionality, meaningfulness, and pleasantness reports high test-retest reliability for all five attributes, which are stable for subjects of both genders and of several nationalities. Gender differences and the influence of the attributes on one another are also examined.