The standard error in the Jacobson and Truax Reliable Change Index: The classical approach to the assessment of reliable change

2004 ◽  
Vol 10 (6) ◽  
pp. 888-893 ◽  
Author(s):  
GERARD H. MAASSEN

Researchers and clinicians using Jacobson and Truax's index to assess the reliability of change in patients, or its counterpart by Chelune et al., which takes practice effects into account, are confronted with confusingly different ways of calculating the standard error in the literature (see the discussion started in this journal by Hinton-Bayre). This article compares the characteristics of (1) the standard error used by Jacobson and Truax, (2) the standard error of difference scores used by Temkin et al., and (3) an adaptation of Jacobson and Truax's approach that accounts for a difference between initial and final variance. It is demonstrated theoretically that the last variant is preferable, a conclusion corroborated by real data. (JINS, 2004, 10, 888–893.)
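For concreteness, the three denominators can be written side by side. A minimal sketch in standard classical-test-theory notation; the formula for variant (3) is our reading of Maassen's error-only adaptation, not a quotation of his paper, and should be checked against the original article:

```python
import math

def se_jacobson_truax(s1: float, r_xx: float) -> float:
    """Variant (1): J&T standard error of the difference,
    assuming equal variances at both occasions."""
    return s1 * math.sqrt(2.0 * (1.0 - r_xx))

def sdiff_temkin(s1: float, s2: float, r12: float) -> float:
    """Variant (2): standard deviation of observed difference scores."""
    return math.sqrt(s1**2 + s2**2 - 2.0 * s1 * s2 * r12)

def se_maassen(s1: float, s2: float, r_xx: float) -> float:
    """Variant (3), as we read it (assumption): a measurement-error-only
    SE that allows initial and final variances to differ."""
    return math.sqrt((s1**2 + s2**2) * (1.0 - r_xx))
```

With equal variances at the two occasions, and a test–retest correlation equal to the reliability coefficient, the three expressions coincide; they diverge exactly in the cases Maassen's adaptation targets.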

2004 ◽  
Vol 10 (6) ◽  
pp. 899-901 ◽  
Author(s):  
NANCY R. TEMKIN

Different authors have used different estimates of variability in the denominator of the Reliable Change Index (RCI). Maassen attempts to clarify some of the differences and the assumptions underlying them. In particular he compares the ‘classical’ approach using an estimate SEd supposedly based on measurement error alone with an estimate SDiff based on the variability of observed differences in a population that should have no true change. Maassen concludes that not only is SEd based on classical theory, but it properly estimates variability due to measurement error and practice effect, while SDiff overestimates variability by accounting twice for the variability due to practice. Simulations show Maassen to be wrong on both counts. With an error rate nominally set to 10%, RCI estimates using SDiff wrongly declare change in 10.4% and 9.4% of simulated cases without true change, while estimates using SEd wrongly declare change in 17.5% and 12.3% of the simulated cases (p < .000000001 and p < .008, respectively). In the simulation that separates measurement error and practice effects, SEd estimates the variability of change due to measurement error to be .34, when the true variability due to measurement error was .014. Neuropsychologists should not use SEd in the denominator of the RCI. (JINS, 2004, 10, 899–901.)
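The disagreement is easy to probe numerically. The sketch below is a hypothetical Monte Carlo in the spirit of Temkin's simulations (not her actual design): no true change, a practice effect that varies across individuals, and independent measurement error at each occasion; all parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sub, n_rep = 200, 2000
rho = 0.90                          # assumed reliability (hypothetical)
mu_p, sd_p = 0.5, 0.5               # assumed practice mean and spread (hypothetical)
sd_t, sd_e = np.sqrt(rho), np.sqrt(1 - rho)   # unit total variance at Time 1
z = 1.645                           # nominal 10% two-sided error rate

hits = {"SEd": 0, "SDiff": 0}
for _ in range(n_rep):
    t = rng.normal(0.0, sd_t, n_sub)                    # stable true scores
    x1 = t + rng.normal(0.0, sd_e, n_sub)
    x2 = t + rng.normal(mu_p, sd_p, n_sub) + rng.normal(0.0, sd_e, n_sub)
    d = x2 - x1
    sed = x1.std(ddof=1) * np.sqrt(2 * (1 - rho))       # measurement error alone
    sdiff = d.std(ddof=1)                               # SD of observed differences
    hits["SEd"] += int(np.sum(np.abs(d - d.mean()) > z * sed))
    hits["SDiff"] += int(np.sum(np.abs(d - d.mean()) > z * sdiff))

for k, v in hits.items():
    print(k, round(v / (n_rep * n_sub), 3))
# SDiff stays near the nominal .10; SEd exceeds it because it ignores
# the person-to-person variability in practice effects.
```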


2000 ◽  
Vol 6 (3) ◽  
pp. 364-364 ◽  
Author(s):  
NANCY R. TEMKIN ◽  
ROBERT K. HEATON ◽  
IGOR GRANT ◽  
SUREYYA S. DIKMEN

Hinton-Bayre (2000) raises a point that may occur to many readers who are familiar with the Reliable Change Index (RCI). In our previous paper comparing four models for detecting significant change in neuropsychological performance (Temkin et al., 1999), we used a formula for calculating Sdiff, the measure of variability for the test–retest difference, that differs from the one Hinton-Bayre has seen employed in other studies of the RCI. In fact, there are two ways of calculating Sdiff—a direct method and an approximate method. As stated by Jacobson and Truax (1991, p. 14), the direct method is to compute “the standard error of the difference between the two test scores” or equivalently $\sqrt{s_1^2 + s_2^2 - 2 s_1 s_2 r_{xx'}}$, where $s_i$ is the standard deviation at time $i$ and $r_{xx'}$ is the test–retest correlation or reliability coefficient. Jacobson and Truax also provide a formula for the approximation of Sdiff when one does not have access to retest data on the population of interest, but does have a test–retest reliability coefficient and an estimate of the cross-sectional standard deviation, i.e., the standard deviation at a single point in time. This approximation assumes that the standard deviations at Time 1 and Time 2 are equal, which may be close to true in many cases. Since we had the longitudinal data to directly calculate the standard error of the difference between scores at Time 1 and Time 2, we used the direct method. Which method is preferable? When the needed data are available, it is the direct method we used.
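A short sketch of the two computations, assuming only the quantities named above; the function names are ours:

```python
import math

def sdiff_direct(s1: float, s2: float, r_xx: float) -> float:
    """Direct method: SE of the difference computed from retest data."""
    return math.sqrt(s1**2 + s2**2 - 2.0 * s1 * s2 * r_xx)

def sdiff_approx(s: float, r_xx: float) -> float:
    """J&T approximation: assumes equal SDs at Time 1 and Time 2, so only
    one cross-sectional SD and a reliability estimate are needed."""
    return s * math.sqrt(2.0 * (1.0 - r_xx))

# With equal SDs at both occasions the two agree exactly:
assert abs(sdiff_direct(10, 10, 0.8) - sdiff_approx(10, 0.8)) < 1e-9
```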


2021 ◽  
Author(s):  
Andrew Athan McAleavey

The reliable change index (RCI) is a widely used statistical tool designed to account for measurement error when evaluating difference scores. Because of its conceptual simplicity and computational ease, it persists in research and applied psychology. However, researchers have repeatedly demonstrated ways in which the RCI is insufficient or invalid for various applications, a concern for both research and clinical psychology given how widely the tool is used. The aims of this manuscript are to describe, non-technically, the formulation and assumptions of the RCI; to offer guidance as to when the RCI is (and is not) appropriate; and to identify what is needed to calculate the RCI properly when it is used. Several criteria are identified to help determine whether the RCI is appropriate for a specific use. It is apparent that the RCI is the best available method in only a small number of situations, is frequently miscalculated, and produces incorrect inferences more often than simple alternatives, largely because it is highly insensitive to real changes. Specific alternatives are offered that may better operationalize common inferential tasks, including when more than two observations are available and when false negatives are as costly as false positives.
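For orientation, the canonical Jacobson–Truax computation that the article critiques is brief. A minimal sketch using the standard textbook formulas (variable names are ours):

```python
import math

def rci_jacobson_truax(x1: float, x2: float, sd_baseline: float, r_xx: float) -> float:
    """Classic J&T RCI: the difference score divided by the SE of the
    difference, built from the SEM of a single administration."""
    sem = sd_baseline * math.sqrt(1.0 - r_xx)   # standard error of measurement
    s_diff = math.sqrt(2.0) * sem               # SE of the difference score
    return (x2 - x1) / s_diff

# |RCI| > 1.96 is conventionally read as reliable change at alpha = .05.
print(rci_jacobson_truax(x1=30.0, x2=22.0, sd_baseline=7.5, r_xx=0.88))
```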


Author(s):  
Dustin B Hammers ◽  
Kevin Duff

Abstract
Objective: This study attempted to clarify the applicability of standard error (SE) terms in clinical research when examining the impact of short-term practice effects on cognitive performance via reliable change methodology.
Method: This study compared McSweeney's SE of the estimate (SEest) with Crawford and Howell's SE for prediction of the regression (SEpred) using a developmental sample of 167 participants with either normal cognition or mild cognitive impairment (MCI), assessed twice over 1 week. Using these SEs, previously published standardized regression-based (SRB) reliable change prediction equations were then applied to an independent sample of 143 participants with MCI.
Results: This clinical developmental sample yielded nearly identical SE values (e.g., 3.697 vs. 3.719 for HVLT-R Total Recall SEest and SEpred, respectively), and the resultant SRB-based discrepancy z scores were comparable and strongly correlated (r = 1.0, p < .001). Observed follow-up scores for our sample with MCI were consistently below expectation compared with predictions based on Duff's SRB algorithms.
Conclusions: These results appear to replicate and extend previous work showing that calculating the SEest and SEpred from a clinical sample of cognitively intact and MCI participants yields similar values that can be incorporated into SRB reliable change statistics with comparable results. As a result, neuropsychologists using reliable change methods in research (or clinical practice) should carefully balance mathematical accuracy and ease of use, among other factors, when determining which SE metric to use.
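Both SE terms have standard regression definitions; the sketch below follows those textbook forms (function and variable names are ours, not the authors' code):

```python
import numpy as np

def se_estimate(x: np.ndarray, y: np.ndarray) -> float:
    """SE of the estimate: RMS residual of the regression of retest (y)
    on baseline (x), with n - 2 degrees of freedom."""
    b, a = np.polyfit(x, y, 1)               # slope, intercept
    resid = y - (a + b * x)
    return float(np.sqrt(np.sum(resid**2) / (len(x) - 2)))

def se_prediction(x: np.ndarray, y: np.ndarray, x0: float) -> float:
    """Crawford & Howell-style SE for predicting a new case at baseline x0:
    widens SEest to reflect sampling error in the fitted line."""
    n = len(x)
    see = se_estimate(x, y)
    return see * float(np.sqrt(1 + 1/n + (x0 - x.mean())**2 / ((n - 1) * x.var(ddof=1))))
```

As the developmental sample grows, the 1/n and distance terms shrink and SEpred converges on SEest, consistent with the near-identical values reported above.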


2021 ◽  
Author(s):  
Ron D. Hays ◽  
Mary E. Slaughter ◽  
Karen L. Spritzer ◽  
Patricia M. Herman

Abstract
Background: Identifying how many individuals significantly improve (“responders”) provides important supplementary information beyond group mean change about the effects of treatment options. This supplemental information can enhance interpretation of clinical trials and observational studies. This study provides a comparison of five ways of estimating the significance of individual change.
Methods: Secondary analyses of the Impact Stratification Score (ISS) for chronic low back pain, which was administered at two timepoints in two samples: (1) three months apart in an observational study of 1,680 patients undergoing chiropractic care; and (2) 6 weeks apart in a randomized trial of 720 active-duty military personnel with low back pain. The ISS is the sum of the PROMIS-29 v2.1 physical function, pain interference, and pain intensity scores and has a possible range of 8 (least impact) to 50 (greatest impact). The five methods of evaluating individual change compared were: (1) standard deviation index; (2) standard error of measurement (SEM); (3) standard error of estimate; (4) standard error of prediction; and (5) reliable change index.
Results: Internal consistency reliability of the ISS at baseline was 0.90 in Sample 1 and 0.92 in Sample 2. Effect size of change on the ISS was -0.16 in Sample 1 and -0.59 in Sample 2. The denominators for the five methods in Sample 1 (Sample 2) were 7.6 (8.4) for the standard deviation index, 2.4 (2.4) for the SEM, 2.3 (2.3) for the standard error of estimate, and 3.3 (3.4) for the standard error of prediction and the reliable change index. The amount of change on the ISS needed for significant individual change in both samples was about 15-16 for the standard deviation index, 5 for the SEM and the standard error of estimate, and 7 for the standard error of prediction and the reliable change index. The percentage of people classified as responders ranged from 1% (standard deviation index in Sample 1) to 57% (SEM and standard error of estimate in Sample 2).
Conclusions: The standard error of prediction and reliable change index estimates of significant change are consistent with retrospective ratings of change of at least moderately better in prior research. These two are less likely than other methods to classify people as responders who have not actually gotten better.
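All five denominators can be derived from the baseline SD and a reliability coefficient under standard psychometric definitions. A sketch that reproduces the Sample 1 values (SD = 7.6, reliability = .90) to rounding:

```python
import math

def change_denominators(sd: float, rel: float) -> dict:
    """Five denominators for judging individual change, from the
    baseline SD and a reliability coefficient."""
    return {
        "sd_index": sd,                                  # raw-SD benchmark
        "sem": sd * math.sqrt(1 - rel),                  # SE of measurement
        "se_estimate": sd * math.sqrt(rel * (1 - rel)),  # SE of estimation
        "se_prediction": sd * math.sqrt(1 - rel**2),     # SE of prediction
        "rci": sd * math.sqrt(2 * (1 - rel)),            # J&T difference SE
    }

print({k: round(v, 1) for k, v in change_denominators(7.6, 0.90).items()})
# -> sd_index 7.6, sem 2.4, se_estimate 2.3, se_prediction 3.3, rci 3.4
# The RCI denominator computes to 3.4 from these inputs; the reported 3.3
# was presumably obtained from the longitudinal difference scores directly.
```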


2004 ◽  
Vol 10 (6) ◽  
pp. 902-903 ◽ 
Author(s):  
GERARD H. MAASSEN

Due to space limitations I have chosen to confine my reply to the comments by Temkin (this issue, pp. 899–901) that touch most directly on the concepts of practice effects and reliable change. Temkin seems to portray my adherence to the classical approach as a private affair. However, Temkin herself (Temkin et al., 1999) reported using the most widely applied procedures, those of Jacobson and Truax and of Chelune et al., which are based on the classical approach. For unexplained reasons they substituted a different standard error. The unsatisfactory justification later given in their reply to Hinton-Bayre's (2000) letter revealed what was presumably the actual reason: unfamiliarity with psychometrics, including classical test theory (CTT). Not surprisingly, Temkin ignores this historical aspect in her comment. Nevertheless, the new post-hoc arguments she brings up deserve, of course, a fair evaluation.


2021 ◽  
Author(s):  
Ron D. Hays ◽  
Mary E. Slaughter ◽  
Patricia M. Herman

Abstract
Background: Identifying how many individuals significantly improve (“responders”) provides important supplementary information beyond group mean change about the effects of treatment options. This supplemental information can enhance interpretation of clinical trials and observational studies. This study provides a comparison of five ways of estimating the significance of individual change.
Methods: Secondary analyses of the Impact Stratification Score (ISS) for chronic low back pain, which was administered at two timepoints three months apart in an observational study of 1,680 patients undergoing chiropractic care. The ISS is the sum of the PROMIS-29 v2.1 physical function, pain interference, and pain intensity scores and has a possible range of 8 (least impact) to 50 (greatest impact). The five methods of evaluating individual change compared were: (1) standard deviation index; (2) confidence interval around the standard error of measurement (SEM); (3) standard error of estimate; (4) standard error of prediction; and (5) reliable change index.
Results: Internal consistency reliability of the ISS at baseline was 0.90. Effect size of change on the ISS was -0.16 using the SD (7.6) at baseline. The denominators for the five methods were 7.6 for the standard deviation index, 2.4 for the confidence interval around the SEM, 2.3 for the standard error of estimate, and 3.3 for the standard error of prediction and the reliable change index. The amount of change on the ISS needed for significant individual change was 15 for the standard deviation index, 5 for the confidence interval around the SEM and the standard error of estimate, and 7 for the standard error of prediction and the reliable change index. The percentage of people classified as responders ranged from 1% (standard deviation index) to 22% (standard error of prediction and reliable change index).
Conclusions: The standard error of prediction and reliable change index estimates of significant change are consistent with retrospective ratings of change of at least moderately better in prior research. These two are less likely than other methods to classify people as responders who have not actually gotten better.
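Turning any of these denominators into a responder rule is a single comparison; a minimal sketch (the 1.96 cutoff and the helper name are our choices, and lower ISS scores mean improvement):

```python
def is_responder(change: float, denominator: float, z: float = 1.96) -> bool:
    """Classify an individual's change as significant improvement if it
    exceeds z times the chosen variability denominator. On the ISS,
    negative change means improvement (lower impact)."""
    return change <= -z * denominator

# With the SE of prediction (about 3.3), a drop of 7 or more ISS points
# counts as significant improvement, matching the thresholds above.
print(is_responder(change=-8.0, denominator=3.3))   # True
```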


Author(s):  
Carolin Szász-Janocha ◽  
Eva Vonderlin ◽  
Katajun Lindenberg

Abstract. Objective: The young diagnostic construct of gaming and Internet addiction has attracted growing research attention in recent years. The inclusion of “Gaming Disorder” in the ICD-11 (International Statistical Classification of Diseases and Related Health Problems) has heightened the need for evidence-based, effective interventions. PROTECT+ is a cognitive-behavioral group therapy program for adolescents with symptoms of gaming and Internet addiction. The present study evaluates its medium-term effects after 4 months. Method: N = 54 patients aged 9 to 19 years (M = 13.48; SD = 1.72) took part in the early-intervention study in Heidelberg between April 2016 and December 2017. Symptom severity was assessed with standardized diagnostic instruments at baseline, at the end of group therapy, and after 4 months. Results: Multilevel analyses showed a significant reduction in symptom severity after 4 months on the Computerspielabhängigkeitsskala (CSAS), with a small effect in self-report (d = 0.35) and a medium effect in parent report (d = 0.77). The Reliable Change Index, computed from the Compulsive Internet Use Scale (CIUS), pointed to strong heterogeneity in individual symptom trajectories. Patients rated the program with high satisfaction at both follow-up assessments. Conclusions: This work is one of the few studies internationally to demonstrate a reduction in symptoms of gaming and Internet addiction in adolescence over 4 months.


1996 ◽  
Vol 2 (6) ◽  
pp. 556-564 ◽  
Author(s):  
Stephen M. Sawrie ◽  
Gordon J. Chelune ◽  
Richard I. Naugle ◽  
Hans O. Lüders

Abstract
Traditional methods for assessing the neurocognitive effects of epilepsy surgery are confounded by practice effects, test-retest reliability issues, and regression to the mean. This study employs 2 methods for assessing individual change that allow direct comparison of changes across both individuals and test measures. Fifty-one medically intractable epilepsy patients completed a comprehensive neuropsychological battery twice, approximately 8 months apart, prior to any invasive monitoring or surgical intervention. First, a Reliable Change (RC) index score was computed for each test score to take into account the reliability of that measure, and a cutoff score was empirically derived to establish the limits of statistically reliable change. These indices were subsequently adjusted for expected practice effects. The second approach used a regression technique to establish “change norms” along a common metric that models both expected practice effects and regression to the mean. The RC index scores provide the clinician with a statistical means of determining whether a patient's retest performance is “significantly” changed from baseline. The regression norms for change allow the clinician to evaluate the magnitude of a given patient's change on 1 or more variables along a common metric that takes into account the reliability and stability of each test measure. Case data illustrate how these methods provide an empirically grounded means for evaluating neurocognitive outcomes following medical interventions such as epilepsy surgery. (JINS, 1996, 2, 556–564.)
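Each approach reduces to a z score; a sketch under the usual definitions (practice adjustment via a control group's mean change, regression norms via retest-on-baseline regression; function names are ours):

```python
import numpy as np

def rc_practice_adjusted(x1: float, x2: float, mean_practice: float, s_diff: float) -> float:
    """RC index with practice adjustment: observed change minus the
    expected practice effect, scaled by the SE of the difference."""
    return (x2 - x1 - mean_practice) / s_diff

def regression_change_z(x1_norm: np.ndarray, x2_norm: np.ndarray,
                        x1_patient: float, x2_patient: float) -> float:
    """Regression-based change norm: z score of the patient's retest
    relative to the retest predicted from baseline, which models both
    practice effects and regression to the mean."""
    b, a = np.polyfit(x1_norm, x2_norm, 1)        # slope, intercept
    resid = x2_norm - (a + b * x1_norm)
    see = np.sqrt(np.sum(resid**2) / (len(x1_norm) - 2))
    return float((x2_patient - (a + b * x1_patient)) / see)
```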

