Test-retest reliability of the HEXACO-100—And the value of multiple measurements for assessing reliability

Despite the widespread use of the HEXACO model as a descriptive taxonomy of personality traits, there remains limited information on the test-retest reliability of its commonly-used inventories. Studies typically report internal consistency estimates, such as alpha or omega, but there are good reasons to believe that these do not accurately assess reliability. We report 13-day test-retest correlations of the 100- and 60-item English HEXACO Personality Inventory-Revised (HEXACO-100 and HEXACO-60) domains, facets, and items. In order to test the validity of test-retest reliability, we then compare these estimates to correlations between self- and informant-reports (i.e., cross-rater agreement), a widely-used validity criterion. Median estimates of test-retest reliability were .88, .81, and .65 (N = 416) for domains, facets, and items, respectively. Facets’ and items’ test-retest reliabilities were highly correlated with their cross-rater agreement estimates, whereas internal consistencies were not. Overall, the HEXACO Personality Inventory-Revised demonstrates test-retest reliability similar to other contemporary measures. We recommend that short-term retest reliability should be routinely calculated to assess reliability.
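The two kinds of estimate contrasted in this abstract can be illustrated with a minimal sketch. It assumes test-retest reliability is computed as a Pearson correlation between scores from two occasions and that internal consistency is Cronbach's alpha; the function names and the data layout are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists
    (e.g. the same scale at test and at retest)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sample_variance(values):
    """Unbiased sample variance (n - 1 denominator)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(items):
    """Cronbach's alpha from one occasion.

    `items` is a list of item-score lists (hypothetical layout):
    one inner list per item, respondents in the same order.
    """
    k = len(items)
    n = len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_variance_sum = sum(sample_variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / sample_variance(totals))
```

Under the same assumptions, the comparison the abstract describes (whether scales with higher retest reliability also show higher cross-rater agreement) could be run by passing one list of per-facet retest correlations and one list of per-facet cross-rater correlations to `pearson_r`.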

Test-Retest Reliability of the HEXACO-PI-R

10.31234/osf.io/rvpxa ◽

2021 ◽

Author(s):

Samuel Henry ◽

Isabel Thielmann ◽

Tom Booth ◽

René Mõttus

Keyword(s):

Internal Consistency ◽

Limited Information ◽

Rater Agreement ◽

Retest Reliability ◽

Human Personality ◽

Six Traits ◽

Highly Correlated ◽

Test Retest Reliability

Despite the widespread use of the HEXACO as a descriptive taxonomy of human personality, there remains very limited information on the test-retest reliability of commonly used tools to measure the six traits. We report 12-day test-retest correlations (rTT) of the 100-item HEXACO-PI-R (HEXACO-100) at the level of domains, facets and items. We compare test-retest estimates to internal consistency for domains and facets, and to cross-rater agreement (rCA) for all levels of measurement. Median rTTs were r = .65, .81, and .88 (n = 416) for items, facets, and domains, respectively. Facets’ rCAs were highly correlated with rTTs but not with internal consistency estimates. We conclude that the HEXACO-100 demonstrates rTT similar to other contemporary measures, and that rTT data should be routinely collected for scales.

Test-Retest Reliability of a Self-Report Questionnaire for DSM-IV and ICD-10 Personality Disorders

European Journal of Psychological Assessment ◽

10.1027//1015-5759.16.1.53 ◽

2000 ◽

Vol 16 (1) ◽

pp. 53-58 ◽

Cited By ~ 11

Author(s):

Hans Ottosson ◽

Martin Grann ◽

Gunnar Kullgren

Keyword(s):

Personality Disorder ◽

Anxiety Disorders ◽

Personality Disorders ◽

Clinical Sample ◽

Self Report ◽

Anxiety State ◽

Short Term ◽

Retest Reliability ◽

Axis I ◽

Test Retest Reliability

Summary: Short-term stability or test-retest reliability of self-reported personality traits is likely to be biased if the respondent is affected by a depressive or anxiety state. However, in some studies, DSM-oriented self-reported instruments have proved to be reasonably stable in the short term, regardless of co-occurring depressive or anxiety disorders. In the present study, we examined the short-term test-retest reliability of a new self-report questionnaire for personality disorder diagnosis (DIP-Q) on a clinical sample of 30 individuals, having either a depressive, an anxiety, or no axis-I disorder. Test-retest scorings from subjects with depressive disorders were mostly unstable, with a significant change in fulfilled criteria between entry and retest for three out of ten personality disorders: borderline, avoidant and obsessive-compulsive personality disorder. Scorings from subjects with anxiety disorders were unstable only for cluster C and dependent personality disorder items. In the absence of co-morbid depressive or anxiety disorders, mean dimensional scores of DIP-Q showed no significant differences between entry and retest. Overall, the effect from state on trait scorings was moderate, and it is concluded that test-retest reliability for DIP-Q is acceptable.

Short-Term Test-Retest Reliability of an Experimental Version of the Basic Attributes Test Battery

10.21236/ada237484 ◽

1991 ◽

Author(s):

Thomas R. Carretta

Keyword(s):

Test Battery ◽

Short Term ◽

Retest Reliability ◽

Short Term Test ◽

Test Retest Reliability

Short-Term Test–Retest Reliability of Contralateral Suppression of Click-Evoked Otoacoustic Emissions in Normal-Hearing Subjects

Journal of Speech Language and Hearing Research ◽

10.1044/2020_jslhr-20-00393 ◽

2021 ◽

pp. 1-11

Author(s):

Hannah Keppler ◽

Sofie Degeest ◽

Bart Vinck

Keyword(s):

Repeated Measures ◽

Otoacoustic Emissions ◽

Otoacoustic Emission ◽

Normal Hearing ◽

Short Term ◽

Retest Reliability ◽

Reliability Parameters ◽

Contralateral Suppression ◽

Short Term Test ◽

Test Retest Reliability

Purpose: The objective of the current study was to investigate the short-term test–retest reliability of contralateral suppression (CS) of click-evoked otoacoustic emissions (CEOAEs) using commercially available otoacoustic emission equipment. Method: Twenty-three young normal-hearing subjects were tested. An otoscopic evaluation, admittance measures, pure-tone audiometry, and measurements of CEOAEs without and with contralateral acoustic stimulation (CAS) to determine CS were performed at baseline (n = 23), at an immediate retest without and with refitting of the probe (only CS of CEOAEs; n = 11), and at a retest after 1 week (n = 23). Test–retest reliability parameters were determined on CEOAE response amplitudes without and with CAS, and on raw and normalized CS indices between baseline and the other test moments. Results: Repeated-measures analysis of variance indicated no random or systematic changes in CEOAE response amplitudes without and with CAS, and in raw and normalized CS indices between the test moments. Moderate-to-high intraclass correlation coefficients with mostly high significant between-subjects variability between baseline and each consecutive test moment were found for CEOAE response amplitude without and with CAS, and for the raw and normalized CS indices. Other reliability parameters deteriorated between CEOAE response amplitudes with CAS as compared to without CAS, between baseline and retest with probe refitting, and after 1 week, as well as for frequency-specific raw and normalized CS indices as compared to global CS indices. Conclusions: There was considerable variability in raw and normalized CS indices as measured using CEOAEs with CAS using commercially available otoacoustic emission equipment. More research is needed to optimize the measurement of CS of CEOAEs and to reduce influencing factors, as well as to make generalization of test–retest reliability data possible.

Short-term test–retest reliability and continuity of emotional availability in parent–child dyads

International Journal of Behavioral Development ◽

10.1177/0165025419830256 ◽

2019 ◽

Vol 43 (3) ◽

pp. 271-277

Author(s):

Joyce J. Endendijk ◽

Marleen G. Groeneveld ◽

Maja Deković ◽

Carlijn van den Boomen

Keyword(s):

Emotional Availability ◽

Free Play ◽

Book Reading ◽

Short Term ◽

Retest Reliability ◽

Child Involvement ◽

Parenting Dimensions ◽

Test Retest Reliability ◽

Parent Child

The emotional availability scales (EAS), 4th edition, are widely used in research and clinical practice to assess the quality of parent–child interaction. This study examined the short-term reliability and continuity of the EAS (4th ed.) assessed in two similar observational contexts over a one-week interval. Sixty-two Dutch parents (85% mothers) and their 9- to 12-month-old infants (mean age = 10.07 months, SD = 0.47; 53% boys) were videotaped twice while they interacted with each other during several tasks (free play, structured play, book reading, toys taken away). The videotapes were coded with the EAS 4th edition by two reliable coders. Moderate to strong test–retest reliability was found for the three EA parent-dimensions: sensitivity, structuring, and nonintrusiveness. Child involvement was not reliable over a one-week period, and child responsiveness could only be reliably assessed in boys. Test–retest reliability of structuring was also higher for boys than for girls. Regarding continuity, mean levels of sensitivity, structuring, nonintrusiveness, and involvement did not change over a one-week interval, but responsiveness increased for girls only. Thus, the parenting dimensions of the 4th edition of the EAS reflect stable and consistent characteristics of the parent–child dyad in the short term, but the child measures do not.

Short-term test–retest reliability of the ImPACT in healthy young athletes

Applied Neuropsychology Child ◽

10.1080/21622965.2017.1290529 ◽

2017 ◽

Vol 7 (3) ◽

pp. 208-216 ◽

Cited By ~ 1

Author(s):

Amanda M. O’Brien ◽

Joseph E. Casey ◽

Rachel M. Salmon

Keyword(s):

Young Athletes ◽

Short Term ◽

Retest Reliability ◽

Short Term Test ◽

Test Retest Reliability ◽

The Impact

Testing Nursing Competence: Validity and Reliability of the Nursing Performance Profile

Journal of Nursing Measurement ◽

10.1891/1061-3749.25.3.431 ◽

2017 ◽

Vol 25 (3) ◽

pp. 431-458 ◽

Cited By ~ 2

Author(s):

Janine E. Hinton ◽

Mary Z. Mays ◽

Debra Hagler ◽

Pamela Randolph ◽

Ruth Brooks ◽

...

Keyword(s):

High Fidelity ◽

High Fidelity Simulation ◽

Rater Agreement ◽

Validity And Reliability ◽

Retest Reliability ◽

Nursing Competence ◽

Rating Instrument ◽

Simulation Testing ◽

Self Assessments ◽

Test Retest Reliability

Background and Purpose: There is growing evidence that simulation testing is appropriate for assessing nursing competence. We compiled evidence on the validity and reliability of the Nursing Performance Profile (NPP) method for assessing competence. Methods: Participants (N = 67) each completed 3 high-fidelity simulation tests; raters (N = 31) scored the videotaped tests using a 41-item competency rating instrument. Results: The test identified areas of practice breakdown and distinguished among subgroups differing in age, education, and simulation experience. Supervisor assessments were positively correlated, r = .31. Self-assessments were uncorrelated, r = .07. Inter-rater agreement ranged from 93% to 100%. Test–retest reliability ranged from r = .57 to .69. Conclusions: The NPP can be used to assess competence and make decisions supporting public safety.

Short-term test-retest-reliability of conditioned pain modulation using the cold-heat-pain method in healthy subjects and its correlation to parameters of standardized quantitative sensory testing

BMC Neurology ◽

10.1186/s12883-016-0650-z ◽

2016 ◽

Vol 16 (1) ◽

Cited By ~ 19

Author(s):

Julia Gehling ◽

Tina Mainka ◽

Jan Vollert ◽

Esther M. Pogatzki-Zahn ◽

Christoph Maier ◽

...

Keyword(s):

Healthy Subjects ◽

Quantitative Sensory Testing ◽

Pain Modulation ◽

Conditioned Pain Modulation ◽

Heat Pain ◽

Sensory Testing ◽

Short Term ◽

Retest Reliability ◽

Short Term Test ◽

Test Retest Reliability

Validity and reliability of a clinical non-exercise method for assessment of cardiorespiratory fitness using seismocardiography

European Heart Journal ◽

10.1093/eurheartj/ehab724.3172 ◽

2021 ◽

Vol 42 (Supplement_1) ◽

Author(s):

M Thunestvedt Hansen ◽

T Roemer ◽

A Hoejgaard ◽

K Husted ◽

K Soerensen ◽

...

Keyword(s):

Cardiorespiratory Fitness ◽

Exercise Test ◽

Indirect Calorimetry ◽

Recording Device ◽

Private Company ◽

Retest Reliability ◽

Graded Exercise Test ◽

Graded Exercise ◽

Highly Correlated ◽

Test Retest Reliability

Introduction: Low cardiorespiratory fitness expressed as a low maximal oxygen consumption (V̇O2max) is associated with cardiovascular disease and all-cause mortality (1). Thus, V̇O2max is recognised as an important clinical tool in the assessment of patients (1,2). However, assessment of V̇O2max by exercise testing is both physically demanding and methodologically challenging and hence the clinical applicability is limited. Purpose: The aim of this study was to investigate the accuracy and precision of a clinical non-exercise method for assessment of V̇O2max. Methods: On three separate days 20 healthy men (n=10) and women (n=10) with varying age (22–72 years) and fitness levels performed two tests for determination of V̇O2max: (a) a non-exercise test using seismocardiography (SCG V̇O2max) and (b) a graded exercise test to voluntary exhaustion on a cycle ergometer based on indirect calorimetry (IC V̇O2max). These tests were performed in order to examine the day-to-day reliability and the validity of SCG V̇O2max, respectively. Furthermore, SCG V̇O2max was assessed twice on each test day to investigate test-retest reliability. The SCG V̇O2max was performed in prone position following a short resting period by placing the SCG recording device on the xiphisternal joint with double adhesive tape. V̇O2max was assessed during a 5-minute recording of the sternal movement using SCG in combination with demographic data of the participants (3). In addition, body composition was measured and a resting blood sample collected each test day. Results: On average SCG V̇O2max was 3.3±2.4 ml/min/kg (mean ± 95% CI) lower than IC V̇O2max (p=0.013, SCG V̇O2max: 36.6±3.3 ml/min/kg, IC V̇O2max: 39.9±3.0 ml/min/kg). A significant positive correlation was found between SCG V̇O2max and IC V̇O2max (Pearson, r=0.72, p<0.001). Both SCG V̇O2max and IC V̇O2max were similar between test days (p=0.972) and the intra-individual coefficient of variation was 4.5±2.9% and 4.0±2.5%, respectively. Within each test day SCG V̇O2max was highly correlated (r=0.99, p<0.0001) and no difference was observed between tests (p=0.993). Conclusions: The accuracy of the current non-exercise assessment of cardiorespiratory fitness based on seismocardiography is not optimal as SCG V̇O2max was systematically lower than the gold standard assessment applying indirect calorimetry during a graded exercise test. Despite the abovementioned difference, SCG V̇O2max and IC V̇O2max were highly correlated. Furthermore, the precision of SCG V̇O2max is very high as both day-to-day and test-retest reliability were high. Funding Acknowledgement: Type of funding sources: Private company. Main funding source(s): VentriJect ApS, Copenhagen, Denmark
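The two summary statistics reported here, a systematic difference between methods and an intra-individual coefficient of variation, could be computed along the following lines. This is a sketch under assumptions: the abstract does not state the exact formulas, so the definitions below (mean paired difference; per-person SD divided by per-person mean, averaged across people) are common conventions rather than the authors' confirmed procedure, and the function names are hypothetical.

```python
from statistics import mean, stdev


def mean_bias(method_a, method_b):
    """Mean paired difference (a - b) between two methods measured on
    the same participants; a negative value means method A reads lower."""
    if len(method_a) != len(method_b):
        raise ValueError("both methods need one value per participant")
    return mean(a - b for a, b in zip(method_a, method_b))


def intra_individual_cv(repeats):
    """Average within-person coefficient of variation, in percent.

    `repeats` holds one list per participant with that person's repeated
    measurements (e.g. one value per test day). Assumed definition:
    CV = 100 * SD / mean per person, then averaged over participants.
    """
    return mean(100 * stdev(person) / mean(person) for person in repeats)
```

A low average CV indicates that repeated measurements of the same person stay close together, which is a statement about precision; the mean bias addresses accuracy against the reference method, which is why the abstract can report high precision alongside a systematic underestimate.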

Reliability of the performance-based measure of executive functions in people with schizophrenia

BMC Psychiatry ◽

10.1186/s12888-021-03562-y ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

En-Chi Chiu ◽

Ya-Chen Lee ◽

Shu-Chun Lee ◽

I-Ping Hsueh

Keyword(s):

Executive Functions ◽

Intraclass Correlation ◽

Minimal Detectable Change ◽

Rater Agreement ◽

Good Test ◽

Retest Reliability ◽

Purposive Action ◽

And Performance ◽

Agreement Study ◽

Test Retest Reliability

Background: The Performance-based measure of Executive Functions (PEF) with four domains is designed to assess executive functions in people with schizophrenia. The purpose of this study was to examine the test-retest reliability of the PEF administered by the same rater (intra-rater agreement) and by different raters (inter-rater agreement) in people with schizophrenia and to estimate the values of minimal detectable change (MDC) and MDC%. Methods: Two convenience samples of people with schizophrenia (n = 60 each) each completed two assessments two weeks apart. The intraclass correlation coefficient (ICC) was analyzed to examine intra-rater and inter-rater agreements of the test-retest reliability of the PEF. The MDC was calculated through standard error of measurement. Results: For the intra-rater agreement study, the ICC values of the four domains were 0.88–0.92. The MDC (MDC%) of the four domains (volition, planning, purposive action, and performance effective) were 13.0 (13.0%), 12.2 (16.4%), 16.2 (16.2%), and 16.3 (18.8%), respectively. For the inter-rater agreement study, the ICC values of the four domains were 0.82–0.89. The MDC (MDC%) were 15.8 (15.8%), 17.4 (20.0%), 20.9 (20.9%), and 18.6 (18.6%) for the volition, planning, purposive action, and performance effective domains, respectively. Conclusions: The PEF has good test-retest reliability, including intra-rater and inter-rater agreements, for people with schizophrenia. Clinicians and researchers can use the MDC values to verify whether an individual with schizophrenia shows any real change (improvement or deterioration) between repeated PEF assessments by the same or different raters.
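The abstract says only that the MDC was derived from the standard error of measurement. A minimal sketch of one widely used formulation follows; the 95% confidence multiplier (z = 1.96), the choice of baseline SD, and the reference score used for MDC% are assumptions here, not details confirmed by the abstract, and the function names are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), with SD the between-person standard
    deviation of the scores (assumed: taken at baseline)."""
    if not 0 <= icc <= 1:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * sqrt(1 - icc)


def minimal_detectable_change(sd, icc, z=1.96):
    """MDC = z * sqrt(2) * SEM. The sqrt(2) accounts for measurement
    error in both of the two assessments being compared."""
    return z * sqrt(2) * standard_error_of_measurement(sd, icc)


def mdc_percent(mdc, reference_score):
    """MDC expressed as a percentage of a reference score. Whether the
    reference is the sample mean or the maximum possible score is an
    assumption the caller has to make."""
    return 100 * mdc / reference_score
```

With these definitions, an observed change between two assessments that is smaller than the MDC cannot be distinguished from measurement error at the chosen confidence level, which is the use the conclusions recommend.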
